The Data Stack Show - 232: Building a Business Solo: Streaming Data, Synthetic Testing, and Startup Lessons with Michael Drogalis of ShadowTraffic.io
Episode Date: March 12, 2025

Highlights from this week's conversation include:
Michael's Background and Journey in Data (0:24)
Synthetic Data Challenges (1:49)
Open Source Project Development (4:20)
Founding Distributed Masonry (5:56)
Acquisition by Confluent (7:27)
Introduction to Shadow Traffic (10:57)
Observations on Streaming Data (12:33)
Importance of Timestamps in Testing (16:22)
Customer Workflows with Shadow Traffic (19:09)
Artificial Intelligence in Data Generation (22:13)
Advantages of Domain-Specific Language (DSL) (25:14)
Solopreneurship Insights (26:53)
Exit Criteria for Startup Focus (30:12)
The Feedback Loop (33:51)
Balancing Customer Needs and Vision (35:02)
Navigating Administrative Tasks (38:15)
Expected Value Mindset (41:00)
Solopreneur Efficiency (43:01)
Maximizing Velocity (46:06)
Final Thoughts and Takeaways (47:34)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Hi, I'm Eric Dodds.
And I'm John Wesley.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies. Welcome back to the show.
We are here with Michael Drogalis of Shadow Traffic.
Michael, welcome to the Data Stack Show.
Hey, thanks for having me.
All right.
Well, we have a ton to get into.
Of course, I'm passionate about streaming data. And so we're going to go deep
on that. And we're going to talk about solopreneurship and a number of other things. But first, just
give our guests a brief background. How'd you get into data and end up at Shadow Traffic?
Yeah, by trade, I'm a software engineer. I think the last thing that kind of inspired
me as I was coming out of college was distributed systems and streaming data. They were all
kind of really getting started around like 2010 or 2011. And I went
out and I built an open source project, ended up building a company on top of that. I sold
it to Confluent and then recently I left to go start Shadow Traffic, which we'll talk
about that. It's sort of the inspiration of all the problems that I've seen occurring
in the last 10 years or so. And yeah.
Awesome.
So Michael, we were talking before the show, doing a little bit of show prep.
So many cool topics here.
Eric already mentioned one solopreneur thing.
I've been reading a lot about that and people are like,
who's going to be the first $100 million solopreneur?
So that's a fun topic.
And then the streaming topic is just a fun one.
It's been going on a long time and I think a lot's happening there.
What are some topics you're interested in covering?
Yeah, it's always fun kind of going into the details of the problems around synthetic data.
I think people look at it and they think, well, I can just use chat GPT to create some data
or I can just write a little script to do it.
And in some simple cases you can, but as you start to go down this path
and you need to build more and more cases that reflect production scenarios
It's actually a lot harder than you think, and reaching for a tool that has a defined set of abstractions
can really help you. It's fun to go into the motivation behind those things and the use cases and such.
Well, let's dig in. All right, let's do it.
Michael, I'm so interested in your background because you got interested in distributed
systems and streaming really early.
Of course, in many ways, those technologies have become ubiquitous for certain use cases
in the data stack.
But tell us what you were doing.
You were a software engineer, and what piqued your interest?
What actually got you into that?
Was it a use case at work or just personal research?
Yeah, it's funny.
I had a college professor and there was a class on distributed systems and it was a
pretty small class and so I had a lot of individual attention.
He was just a really inspiring professor.
He was telling me about Erlang and message passing and all these things.
And there just, there wasn't a lot of people who were kind of working on it.
And that's actually great when you're just kind of coming out of school, you want to
find kind of a small community where you can participate in and feel like you can directly
interact with the people who are working on these problems. And that's kind of what led to me
being a little bit involved with Kafka. That project is huge today; at the time, like 15 years
ago, it was just getting off the ground. Everyone was very easy to talk to, it was very easy to follow
all the trends that were going on, and so it was actually a perfect thing to just jump into coming out of school.
Mm-hmm.
Very cool.
And so you came out of school.
Did you get a job as a software engineer or were you working on...
I mean, you were obviously part of the open source community and interacting with that
community.
Yeah.
I kind of got to work as a backend engineer working on analytics systems.
I did some contracting.
And during that time, the other thing I sort of fell
in love with was functional programming.
Clojure is my tool of choice.
And that's another community that was, and still is, rather small and niche,
but I learned a lot there.
And maybe in the first three years out of school, I kind of understood what it meant
to be a professional software engineer and how to work with other people and that kind of thing.
Yeah, yeah.
It's funny, you're taking me back because I founded a company with a technical co-founder
and he loved Erlang and Clojure so much and he was very involved in those communities
and so you're taking me back a little bit.
That's fun.
Okay, so you're working as a professional software engineer.
You're involved in the open source community and then you start your own open source project.
Yeah, that's right.
So there were sort of like multiple pieces to the streaming problem.
There's like the back channel or the backbone, I should say, of how you actually move data
from A to B.
And then there's the problem of like, what do you do with it?
And that's kind of the whole area of stream processing.
And so in 2011 or 12, Apache Storm came out,
which was basically the first mainstream attempt
at processing data in real time.
And I just felt like there were some problems
in that project, it was a great first attempt.
I thought I could do a little bit better,
specifically if I sort of zeroed in on the problem,
solving it the Clojure way, with functional programming.
And because I was part of like a niche community, I got to build a relatively niche solution
that people really liked.
And so I got started on that partly just because I wanted to have something of my own.
I sort of felt like as I left school, I didn't really have like an identity of my work.
And it seemed like everyone who was doing really well had started some kind of an open source project.
So that seemed like the fashionable thing to do.
And I pursued that for a couple of years.
I built a community around it,
ended up meeting the co-founder of my next company through it.
It was just a really great experience.
Again, learning how to get along with people
that you don't immediately work with,
new community members and that kind of thing.
Yeah.
Is it still around?
I mean, that's a long time ago.
No, no longer.
So I, when we eventually ended up selling the company that I co-founded, we kind
of had to part ways with it.
You can only juggle so many things at once.
Yeah, yeah, yeah.
Totally.
My heart goes out to open source maintainers.
It's a really hard thing to do.
Yeah.
And what was the, so you met your co-founder and then what company,
what company did you found?
Yeah, we called the company Distributed Masonry, and it was a little bit of a play on the
project's name, Onyx.
It was sort of a stone-based theme.
And again, our whole premise was that we could build something cool in distributed systems.
And so we tried basically to build a platform on top of Onyx that didn't really work because
there wasn't really a big enough user base to do a SaaS and do it as a service, as a
consumption-based model.
We tried sort of like a function as a service sort of thing in 2015.
Lambda was sort of still getting started.
We just whipped through a whole bunch of product ideas.
And the thing that ended up working really well was actually kind of going back to something
a little bit earlier in my career, which was seeing how we could support Kafka to do tiered
storage. So the problem with Kafka like 10 years ago was that
it was pretty limited in the amount of data
that it could transport.
You basically kind of had to size it per box.
It was very finicky about the way
that you resource these things.
And we basically had this idea of like,
well, what if you could hook up S3 with Kafka
and have unlimited streaming data?
And it turned out that the technology that we built
was actually a pretty good way to do an initial prototype of that.
And so we built a product on that
and that was sort of what we had a little bit of success with
before we eventually sold the company to Confluent.
Yeah.
And tell us just a little bit
about the journey of selling to Confluent.
I mean, that's pretty cool to start a company.
It sounds like you went through sort of rapid development to find
something that would have early product market fit where it's like, okay, this is a pain
point that's big enough to where there's some traction.
And then you sell to a company like Confluent.
They're obviously much larger now than they were back then, but that's a pretty neat journey
for a first at-bat as an entrepreneur.
Yeah, it was a lot of fun.
There's a lot of good stories in there.
Actually the company was just four folks
and we had been working together for maybe two,
two and a half years.
And the very first time we all met in person
at the same time was at the acquisition discussion.
No way.
Yeah, so we're like,
we're actively figuring out like the rapport
and the timing of how we talk together in the room.
But it was fun.
I mean, people like to ask about it
and it was like, surprisingly straightforward.
I asked our lawyer, I was like, well, how do I say this?
And he was like, you just say it.
Like, do we want to be acquired?
Just say it.
And it was like a very, it was a very direct discussion.
And it made me sort of appreciate that like, as you move up in the stakes in business,
it's just worth being very direct.
Nobody wants to waste time.
Tell people what you want.
They'll tell you what they want.
And I learned a lot of things that like really accelerated my career by maybe like five or
10 or 15 years.
Wow.
Yeah.
It's so funny because hearing that story, that the first time you met your co-founders in person
was at the acquisition meeting, feels so much like a post-COVID dynamic, because that is so much more common now, to
have these really intimate business relationships without actually having physically
met the person. Yeah, we pioneered the fully remote model for better or worse. It worked out for us,
but yeah, it's just sort of a fun anecdote. Yeah, very cool. Okay, well, I know John is
burning with a number of questions, but I do want
to hear about Confluent. So you went to work at Confluent, you got into product, and then
were there for a good while, probably at a pretty formative time for Confluent as a company,
because the last five or six years have been pretty crazy just in terms of Kafka generally,
a lot of the stuff that Confluent has shipped.
I mean, on the product side,
so many really cool things they've done.
So tell us a little bit about Confluent
and maybe some of the big lessons that you took away.
Yeah, Confluent was an interesting experience.
I was really lucky in that I got to work across
just a bunch of different things.
Primarily, I headed up product for stream processing.
So the way you can kind of think about Confluent at the time I joined 2018 was that like Kafka
was a relative success.
They were trying to get Confluent Cloud off the ground and pivot from like an on-prem company
to a cloud company.
And then in addition to that, it was like, well, we got to get more products off the ground.
Besides Kafka, how do we get into the compute game?
And that's where all this ties back to the beginning of my career.
I led kind of the early efforts for stream processing. So, right.
Confluent has a number of offerings on that.
They have Kafka streams, which is a Java library.
And then the thing I primarily worked on was KSQL,
which was a streaming SQL variant. And that, I mean,
it was so interesting trying to do this at a time when the company was like
trying to solidify its core offering, move to the cloud,
continue through hyper growth. A lot of lessons learned in there. Yeah, wow. Actually, Brooks may be able to look
it up for us, but we had a guest on the show who is also very involved in KSQL. I think he worked
at Meta for a while, and then he went to Confluent, and I think he worked on KSQL. I'm not sure, but
we'll try to remember his name because I'm sure you interacted with him. Yeah, very cool
Okay, John, I've been monopolizing the mic. So I'm handing it over to you
So a lot of different ways we can go with this
But let me share something I shared with you guys before we started, about your new gig.
So you were at Confluent, and you've started a new thing now called Shadow Traffic. In the moment,
I kind of understood what you were doing. It took me back
almost 15 years, to working on a B2B SaaS app. We were going through like a
major version change, and for various growth reasons
we just had a ton of features we packed into this particular release.
So we get all the code done. I'm more doing like database backend stuff. Got a sysadmin doing his thing and then a bunch of developers. So the thing is finally
in QA, the developers celebrate and say it's done, right? It's not done.
And then we have this problem of like, how do we test all of these like use cases?
And it's like, should we grab a bunch
of production data and run it through here? And then you run into, well, at the time we probably
should have been more concerned about privacy and security type things, so there's that part of
it. But there's also just all these other edge cases, like, well, we need to make sure we turn
this feature off so we don't randomly email thousands of customers, or we
need to make sure we turn this feature off so we don't accidentally pump data into the
accounting system and accounting thinks it's real.
So there's all these like really interesting things that came up and we look for a solution
is like maybe there's a solution out there.
Nothing that we found at least.
So that's my setup for like kind of my personal experience here.
You're obviously really deep on this problem. So at Shadow Traffic, walk us through some of the,
first maybe how you even came up with the idea
and got into the space.
Yeah, I basically observed,
actually over the last 15 years
from when I started my career,
I was just in this streaming space.
And time and again, it was hard to test things.
I mean, we would talk about these features
that are really cool.
Like, oh, we could capture real time data.
We could do real time joins.
Look at how quickly we could update these aggregates.
Who's showing it?
Where do you see this stuff?
I mean, you could see it in unit tests, but like nobody was showing it for real.
You have to actually go behind the scenes and look at people's production metrics
or the systems that are just basically hidden from view
because they're production systems.
And I noticed just a litany of use cases for this.
So engineering teams kind of need to do
testing, stress testing, integration testing, edge case testing. Sales engineers need to be able to
have test data to exercise their systems to prove out what people are buying.
Developer advocates need to be able to put on cool demos. And there's a lot of places where it's
applicable. It's this nice niche problem that I felt like, okay, one person could just go and solve this really well. And there's all kinds of
continuations once you solve the initial problem, you can do more and more. But that's really what
started it all. What is what in terms of, I'm just thinking about your experience at Confluent,
especially thinking about stream processing, right? Because I think that's an area in particular where you really do need to run a lot of data.
I mean, the ideal testing is with your production data,
because you have all of these different messages
coming through and there actually can be many times
a very high amount of cardinality with that.
And that can vary over time periods, right?
Even just through like the cycle of a day,
it'll say you have international traffic, right?
Throughout the cycle of a day,
you'll have very different types of traffic come through.
And so if you're making an update
to a streaming transformation that you're doing,
like it's really hard to test.
Can you give us an example from Confluent,
maybe from like an actual customer or some situation where that was really problematic and how did you face it
and why was it painful?
Yeah, I'll give you two of them that are really easy to understand. So like imagine, let's
do like a retail example. You want to push retail data through an event-driven system
and then you have an application that's sort of processing stuff as it comes in. And you
may have two streams like customers and orders.
If you want to actually test this,
you'll have a whole bunch of customer IDs coming through and saying,
all right, customer John and Eric and Brooks come through.
And then a bunch of orders come through.
Orders almost certainly has an identifier that refers to customers.
How do you do that?
Do you just pick John and Eric and Brooks and randomize those?
What if the messages for each of those three people
don't show up before the orders?
How do you do that over a big enough key space?
That's just like a clear problem you immediately hit
when you start using these systems.
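The customers-and-orders problem Michael describes can be sketched in a few lines of Python. This is an invented illustration (the event shapes and parameters are made up, and this is not ShadowTraffic's implementation): every order must reference a customer ID that has already appeared earlier in the stream, and a hand-rolled script has to do that bookkeeping itself over a growing key space.

```python
import random
import uuid

def generate_stream(total=10_000, p_customer=0.2, seed=42):
    """Interleave customer and order events so that every order references
    a customer ID that has already appeared earlier in the stream -- the
    invariant that is easy to state and fiddly to maintain by hand."""
    rng = random.Random(seed)
    known_customers = []  # the key space of IDs emitted so far
    for _ in range(total):
        # Until at least one customer exists, orders are impossible.
        if not known_customers or rng.random() < p_customer:
            cid = str(uuid.uuid4())
            known_customers.append(cid)
            yield {"type": "customer", "id": cid}
        else:
            yield {
                "type": "order",
                "id": str(uuid.uuid4()),
                "customer_id": rng.choice(known_customers),  # referential integrity
            }

stream = list(generate_stream())
```

Even this toy version carries state (the list of known customers) through the whole run; once you add multiple related streams, realistic key-space sizes, and delivery order, the script grows quickly.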
Yeah.
Another one is like, imagine you're doing
like a checkout process where you're taking in web events
where like, okay, in my shopping cart,
John views an item,
puts it in his cart, takes it out, puts another item in,
and then checks out.
If you start to change the order of those events and you say,
well, the checkout comes before the view item, your application will break.
And again, very basic stuff.
You can write a unit test for these things, but if you want to test
production volume with all of your systems together, which you should,
you immediately hit this problem and it's harder to solve than it looks.
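The event-ordering failure Michael mentions can be shown with a toy example. Everything here is invented for illustration (the event names, the processor logic): a naive consumer that assumes in-order delivery works on the happy path and breaks the moment the checkout event arrives first.

```python
def process(events):
    """A toy checkout processor that assumes events arrive in order."""
    cart = set()
    for e in events:
        if e["action"] == "add":
            cart.add(e["item"])
        elif e["action"] == "remove":
            cart.discard(e["item"])
        elif e["action"] == "checkout":
            if not cart:
                raise ValueError("checkout before any item was added")
            return sorted(cart)

in_order = [
    {"action": "add", "item": "shoes"},
    {"action": "remove", "item": "shoes"},
    {"action": "add", "item": "hat"},
    {"action": "checkout"},
]
# In order, the customer checks out with just the hat.
result = process(in_order)

# Move the checkout to the front and the same code breaks.
out_of_order = [in_order[-1]] + in_order[:-1]
try:
    process(out_of_order)
    broke = False
except ValueError:
    broke = True
```

A unit test catches this for one fixed sequence; the hard part is generating many realistic sequences, at production volume, that exercise the orderings your real traffic will actually produce.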
Can we dig in a little bit to timestamps specifically? Because from a very practical
standpoint, that's super challenging because if you, like let's say you generate a set of data,
right? Because I mean, a very common way to do this and something that we've done in the past is
you just write a script, right? Like, okay, it doesn't seem that hard, right?
It actually can take a lot of work if you're trying to do it, I would say,
if you're trying to do it properly, as it were, right?
Where you're trying to represent the cardinality appropriately,
where you're trying to do sequencing and all that sort of stuff, right?
A lot of that work actually has to do with timestamps, right?
And so the way that you generate data and the way you have to sequence timestamps,
especially when strong ordering is actually very important
for a downstream application or analytics use case.
So let's say you go through all the work to do that, right?
And then it's like,
again, I need to do this again and again, right?
And like, it just is really annoying.
It's so annoying to go back through
and recalculate all the timestamps
and make changes, because you realize, oh, even with all the timestamp stuff that I did,
even if you try to randomize stuff, it's really hard to make it seem real. That sounds
so dumb, but it's very hard.
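One common way to get timestamps that look real rather than uniformly random is to generate inter-arrival gaps from an exponential distribution and modulate the event rate over a daily cycle. This is a generic sketch of that idea, not anyone's product; all parameters are invented for illustration.

```python
import math
import random
from datetime import datetime, timedelta, timezone

def synthetic_timestamps(start, n, base_rate_per_min=10.0, seed=7):
    """Generate n monotonically increasing timestamps whose inter-arrival
    gaps are exponentially distributed, with the event rate modulated by a
    daily cycle so traffic peaks around midday and thins out overnight."""
    rng = random.Random(seed)
    t = start
    out = []
    for _ in range(n):
        hour = t.hour + t.minute / 60
        # Rate swings between 0.25x and 1.75x of base over a 24-hour cycle.
        rate = base_rate_per_min * (1 + 0.75 * math.sin(2 * math.pi * (hour - 6) / 24))
        t = t + timedelta(minutes=rng.expovariate(rate))
        out.append(t)
    return out

ts = synthetic_timestamps(datetime(2025, 3, 12, tzinfo=timezone.utc), 1000)
```

Ordering is preserved by construction, which is exactly the property that gets expensive to maintain once you edit a hand-written dataset after the fact.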
Yeah. And the thing that you kind of want at the end of the day is something to sort
of sit at the front of your architecture and at the front door, just blast data through as if
it were your real customer data and have a set of knobs to be able to say, I don't have
to go down and feel like I'm programming like C or assembly, but I have these very high
level parameters that let me say, what does this data look like?
What are the non-functional characteristics?
And then have it act as shadow traffic.
I mean, that's the name, to act as a shadow
of your actual customer data.
I think that, with simulation testing,
is kind of the right answer.
I'm curious a little bit, like,
want to dig in a little bit on architecture,
because if you told me, like,
here's a problem, how would you solve it?
My immediate thought would be to go to, like,
okay, let's go to production data,
and then, like, scrub it, right?
Like let's hash stuff, there's PII,
let's essentially scrub the production data.
We haven't talked about it, but I don't sense that's the way that you went about solving this.
Maybe it is.
I think there's two ways you can build a product in this space.
You could either do what you're saying, which is to take existing data, use machine learning
or some kind of procedure to basically derive a safe copy of that data.
There are many companies that actually do this really well,
particularly in the relational database space
where you have like thousands of tables.
They're very static.
All you need to do is like find all the addresses
and rip them out.
Not a trivial problem, but like that's kind of its own thing.
And then there's the approach that I took,
which is to say, okay, what if instead you had basically
a very high level language that let you describe
what the data is
and you can do it directly, you could sort of bootstrap off a schema, you can use an LLM to help you write it.
And that has the advantage of being able to say, well, okay, we don't have to modify anything because this is fully fresh.
But also it has the advantage that many times the data doesn't exist yet.
If you've never been to production, there's no production data.
Oh, wow. Yeah, it's blank tables or blank fields or whatever.
Yeah.
Exactly.
And then you could sort of speculate about the future.
You could say, well, let's not just take a copy of production data.
Let's basically use similar characteristics and then say like, well, let's triple the
volume over time or let's make the traffic much spikier over time or stuff like that.
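That "speculate about the future" idea (triple the volume, make the traffic spikier) can be thought of as a transform over a baseline rate series. The following is an invented sketch of that concept, not ShadowTraffic's API; every parameter name is hypothetical.

```python
import random

def stress_profile(base_rates, growth=3.0, spike_prob=0.02, spike_factor=10.0, seed=1):
    """Turn a baseline events-per-minute series into a speculative future:
    ramp linearly from 1x up to `growth`x by the end, and overlay
    occasional short spikes to make the traffic burstier."""
    rng = random.Random(seed)
    n = len(base_rates)
    out = []
    for i, r in enumerate(base_rates):
        ramp = 1 + (growth - 1) * i / max(n - 1, 1)  # 1x at start, growth x at end
        spike = spike_factor if rng.random() < spike_prob else 1.0
        out.append(r * ramp * spike)
    return out

# A flat day of 100 events/minute, tripled by day's end with random spikes.
future = stress_profile([100.0] * 1440)
```

The point is that once data is described rather than copied, "what if traffic triples" becomes a knob rather than a new dataset.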
And so it's probably a smaller market of the two, but I think in some ways it's the one
that's a little bit harder to solve.
And it's maybe why I've had a little bit of traction.
Yeah. This is a super practical question. How do you see your customers...
So I'm guessing that the general workflow is that I set up in dev or QA and then I
point shadow traffic at it and generate this highly realistic stream that allows me to
get as close to testing and production as I can as far as the data goes, right?
Because that's a lot of data.
Sometimes you'll try to run tests with a small data set or a small sample batch or whatever,
right?
But if you're testing a bunch of data in scale, it also creates this really interesting challenge
of – well, a couple of challenges that come to mind.
So one is cost, and then two is the systems
that you're sending it to downstream, right?
Because you're either provisioning something
as part of your dev,
you probably don't ultimately want the data in there,
so are you just dropping it?
Can you just explain kind of the,
what's a typical workflow for a shadow traffic customer
in terms of the environment,
what they do with the data, all that stuff?
Yeah, it kind of depends on intent, as you say.
So if you're like an engineering team,
your goal may be to make sure
that a bunch of systems integrate.
And so you may have a smoke test
where you basically kind of run a minimal set of traffic
through your system,
but you want all components online
to make sure that the schemas work,
that your web sockets are connecting well together,
serialization works really well.
That very same team may take that set
of shadow traffic files and basically use these knobs
that I described, it's like a very high level DSL,
and say, no, no, let's crank up the volume,
let's do a stress test.
A good example, a customer of mine, Raft,
published yesterday that they did a hundred terabyte test
on their systems. They have to serve very low latency queries using historical and streaming data
together, which is a tricky problem, and internally they use Shadow Traffic for more minimal
testing. For this particular case, they turned it way up, generated a hundred terabytes of
data, 50 gigabytes of data a minute, and they were able to do a short-lived, somewhat
more expensive test, tear the whole thing down, be confident, checkpoint, and move
on.
And so there's just a set of use cases where it applies and the fact that it's parameterizable
kind of helps people move from problem to problem.
Yeah, super interesting.
Yeah, I mean, expensive to test, but probably not relative to the cost of making a breaking
change in production at 100 terabyte scale.
It's worth it every time to do the testing before the customer finds the problem.
Yeah, for sure. So interesting,
moving further down this journey, how...
So this makes a lot of sense if I can control it, but what about testing with
some kind of external dependency, like API type thing,
and I don't know the space that well.
So I don't know if you're solving this problem or others are solving this stuff. We just want to have like kind of a dummy stand-in
that will behave about like the Stripe API,
have the same rate limits, et cetera.
Is that part of the scope of what you're doing?
That's a bit of an orthogonal problem.
It reminds me, the company name escapes me,
but there's a company that basically kind of mimics
AWS services where they give you a set of containers
or maybe they host services and then they kind of behave
in a similar way for testing purposes.
And yeah, so Shadow Traffic is sort of meant to find itself in a place where things
are as realistic as possible.
Whether you kind of fake out the rest of your downstream systems is up to you.
If you want to use test containers, that's totally fine.
But it's meant to give you those like degrees of freedom to make those choices.
Yeah, cool.
One thing you mentioned and because I want to make sure we have plenty of time to talk
about you being a solopreneur and what that journey has been like, but this is just such
a fascinating problem.
So one of the really interesting challenges, and this is, it sounds so funny, but it's
actually just very difficult to be creative enough to generate data that mimics reality.
I think part of that is because human behavior is a very complex thing generally, and I think
you see that in streaming data specifically, or even sometimes system behavior.
But to write a script that generates a bunch of data, you have to think in a pretty structured way, right?
Like it enforces a lot of concepts
around taxonomy and stuff.
And so you have these two competing things.
And so it makes it very difficult for a human
to generate something that's like highly realistic.
So how do you do that at Shadow Traffic?
And you even mentioned that there are some tools like LLMs
that can help you express what you're
trying to do, but how do you get close to the bullseye in terms of this feels like real
production data?
Yeah, reality helps because people usually come to me not when they're just bored or
just trying to do something new.
Like they have a problem to solve where they're like, okay, our customer needs to do this.
We have this schema.
We may not have their data, but I know what their data looks like.
And then the 80-20 rule applies where it's like, we need to get it good enough
along these particular characteristics.
And usually they could dial it in where it's like, okay, problem
solved and they move on.
So they're not imagining kind of all possible dimensions.
Right.
But the other thing you mentioned is if you really are starting from scratch,
like many developer advocates would be, if you're building demos to try to
like promote your software, I have a custom trained GPT, which is awesome.
You could just say, Hey, I'm thinking about these domains.
What kind of examples of data streams could you give me?
It'll give you some lists, and then you say,
write the Shadow Traffic file for me.
And then like, it's not perfect, but like 90% of the time,
it gives you a great baseline that you can go
and pick up and just start moving.
And it's like, that's a perfect marriage of AI
and high level programming languages
where you could use AI to be creative
and then take that thing that it generates, check it into Git,
share it with your team, modularize it, and go from there.
Yeah, totally. Yeah, it's just like fast-tracking it.
I mean, you can even run tests and then say,
okay, let's go in and tweak these things to get it the last 20%.
Exactly. Super interesting.
Okay, one last question before we dive into solopreneur stuff.
Why a DSL? That's always like an interesting choice, especially with a startup, right? Because
there's tons of different thoughts on this, right? But like a classic one is,
I mean, a classic one in data and analytics is people trying to write
a DSL that will eventually replace SQL, right? And so there's a lot of people who are like, that's never going to happen, right?
But and there's all sorts of interesting tensions there and different philosophies, but would
love to know why a DSL is a choice for shadow traffic.
Yeah, when I say a DSL, what you actually program in is JSON.
And the reason it's advantageous is imagine you have this like super deep nested gnarly
record, which is actually probably more common
for your listeners than not.
You need some way to basically kind of work with that
without like juggling all these different inner attributes
and seeing whether things line up.
The 30 second explanation of ShadowTraffic's API
is you basically take a specimen of your data,
you look at all the concrete values,
all the strings, all the Booleans, all the integers,
even all the inner collections that you wanna change,
you rip them out and then you put in
these little function markers to say,
what do I put here instead of this specific value?
Now, if you were to build that another way,
if you were to build it with a programming language,
you would have to do all that juggling,
and you would have to figure out,
what infrastructure do I need?
Do I need Maven?
Do I need a JVM?
What do I need to do this?
I package all of that into a Docker container.
So all you need is an editor to write JSON and then a Docker container.
And it takes care of all the complexity of compiling, running,
garbage collecting efficiently, all that.
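The specimen-to-template idea Michael describes can be sketched in a few lines of Python. This is an illustrative toy: the `_gen` marker syntax and the generator names below are guesses made up for this sketch, not ShadowTraffic's actual API.

```python
# A toy sketch of the "specimen" idea: take one concrete example record,
# rip out the literal values, and replace them with function markers
# that say how to generate each field. Marker syntax is hypothetical.

import random
import uuid

# 1. A specimen: one concrete record copied from your real data.
specimen = {
    "order_id": "7f3b2c",
    "customer": {"id": 42, "vip": False},
    "total": 19.99,
}

# 2. The same shape, with literals swapped for function markers.
template = {
    "order_id": {"_gen": "uuid"},
    "customer": {"id": {"_gen": "int", "min": 1, "max": 1000},
                 "vip": {"_gen": "bool"}},
    "total": {"_gen": "float", "min": 1.0, "max": 500.0},
}

# 3. A tiny interpreter that walks the template and fills in values.
def generate(node):
    if isinstance(node, dict) and "_gen" in node:
        kind = node["_gen"]
        if kind == "uuid":
            return str(uuid.uuid4())
        if kind == "int":
            return random.randint(node["min"], node["max"])
        if kind == "bool":
            return random.choice([True, False])
        if kind == "float":
            return round(random.uniform(node["min"], node["max"]), 2)
        raise ValueError(f"unknown generator: {kind}")
    if isinstance(node, dict):
        # Recurse into nested objects, preserving the record's shape.
        return {k: generate(v) for k, v in node.items()}
    return node

record = generate(template)
```

Each call to `generate(template)` yields a fresh record with the same nested shape as the specimen but fresh synthetic values, which is the core of the approach described above.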
Oh, cool. Yeah, so it's more like a tool set with a JSON interface, I guess.
Yeah, that's probably more accurate.
That's right.
Yeah, yeah, very cool.
Okay, man, that's super interesting.
I'm sorry, I can't wait.
I didn't get a chance to use it,
but I totally wanna go play with this now.
Okay, John, you wanted to dig into solopreneur stuff.
I have a million questions
about dominating the competition.
Yeah, I think, yeah, and the solopreneur stuff,
super interesting.
You can go a couple directions here,
but the thing that comes to mind first is, say I'm a software engineer, I've worked on streaming solutions like you have,
or maybe another sector, it doesn't really matter. What kind of framework, like mental
framework, do you have when you're thinking about ideas? Because for a lot of us, it's like,
I can have 10 ideas a day. But do you have a mental checklist or framework to decide,
oh, I should pursue that a little bit?
Just walk us through the thought process.
It's a little bit more of what kind of lifestyle do you want to live?
You can think of really big problems and you can go raise money and live
that sort of life where you have to hire people and scale really fast.
Or in my case, you can try to find problems that can be solved by
one person or just a few people, and try to run maybe a slower-growth business.
And so, I mean, my thinking is, I was at Confluent, and I tried to relax my entrepreneurial drive.
It just wouldn't stop. This is where I learned about myself. I am
built to make and sell things, and I will be for the rest of my life. I can't turn it off. And I
tried to. I worked on this new presentation tool idea on the side. It was like a VC-fundable idea.
It never really went anywhere
just because I wasn't really comfortable
with doing another investor-backed company.
It just felt like this is the wrong time in life for me.
I wanna do something that's a bit more lifestyle-driven.
And so when I left Confluent,
I put up a blog post that said,
I'm launching four startups in four quarters.
And I basically outlined my thesis that, hey, I have a list of 10 ideas.
I'm going to burn through them one a quarter.
I'm obviously not going to run four startups.
I'm going to find one that works.
And I went through a process for 12 weeks until I launched ShadowTraffic.
And by week six, I was pretty confident that I had a winner.
So I kind of cut the whole thing off, but I just took the approach that like,
I'm just going to burn through ideas until it works.
And I'm not going to work on them for years.
I'm going to work on them for at max 12 weeks and that should be enough to tell me.
Right.
So tell us about the process to get to the 10 ideas. And let's stay in the,
we're not going the VC-backed route,
we're going the solopreneur route, or at least bootstrapped, right?
So was there any process behind the 10 ideas, or for you was it just like, well, I just kind
of always have ideas in the back of my mind?
I don't usually have ideas. I sort of forced it. I stood in my backyard, and it was like a summer day, and I was just like, okay,
well, I'm gonna do this. What the heck am I gonna do?
I just started writing stuff that I observed over time.
The first one that came to mind was like, well, everybody needs test data,
maybe I could do something with that. And then I had some other ideas that are maybe lesser quality, around like,
child care is really annoying in my particular area.
And I can't remember what other ideas I had, but I just sort of forced it.
I was like, I have 10 ideas now.
And that was helpful to just step into the creative mindset.
Yeah, that makes sense.
That's interesting.
I met a really successful entrepreneur who was very similar.
People would say, well, you just seem to have these ideas that are great.
And I met up with him for lunch and I was like, I am interested.
How have you come up with multiple very successful ideas?
It's like, it's so funny to think about.
He's like, I just do the ABC thing.
And I was like, what do you mean?
And he's like, you just write down all the letters of the alphabet and then you try to come up with a
company idea or concept that starts with A, and then B, and
then C. And he's done that.
I haven't heard that one before.
That's great.
It's the same thing.
It's like a forcing function, right?
To just sort of get your wheels turning and think about
problems.
That's really cool.
I like that.
So one thing that I'm really fascinated about is that 12 weeks
is an extremely short amount of time to build and validate. What was your exit
criteria for, okay, I'm going to focus on this? Across all the ideas, what
needed to happen in 12 weeks for you to say,
okay, I found the one that I'm going to focus on?
I mean, hopefully the winner, but at least the one that I'm confident enough to give
my full focus.
Yeah.
It's only tight if your problem, I think, is too big.
So what I did was I said, okay, number one on the list that I feel decent about: test
data for Kafka.
People often have trouble doing demos
in the Kafka community.
Small problem.
And so I opened my laptop,
I wrote a social media post that said,
I call it the $10,000 demo problem.
That was like the title of the post.
And it was like, hey, you ever had to do test data?
I bet you it actually costs you $10,000 in your time,
for these reasons: maybe you have to do related data,
or that sequencing that I mentioned, or
any of these other things.
Yep.
Didn't hint at all about the solution.
I just wrote a post that I thought was interesting.
Got a bunch of reactions.
I won't say traction, I got reactions. I got comments, I got likes, on LinkedIn, on Twitter.
I went and I reached out to every single person.
I was like, Hey, thanks for interacting with my thing.
Can you tell me a little bit more about your experience with this problem?
Any hard details?
Just tell me about it.
I started to hear some real use cases,
which is like indicator number one.
Are people saying that's cool or are they saying,
hey, that's useful.
And then here is my background with this specific problem
and there's these very hard details about what happened.
I started to hear that and I was like, okay, good.
Step number one.
Next thing I did was I created a minimal landing page, came up with the name
ShadowTraffic, did like a hero that basically sketched just the
beginnings of the solution, had a CTA that was like, join the wait list, which put you
on an email thing with me. Had maybe a hundred people sign up, reached out
to every single one of those, ran the same process, said, hey, tell me about
your experience with this problem. The details got even a little bit better.
That was really good. It felt like, okay, something's happening here. During that process, I had
two companies reach out that not only did they have the problem, very critically, they had urgency
about it. They had decision makers and they had budget. And I was like, okay, six weeks in.
This is very good. Got it. Yeah. That was enough for me to be like, that's enough of a checkbox for
me to keep going. And had you started to build any product in that six week period or were you still spending
most of your time just doing validation by talking to people?
I did a little bit because my sketch of the idea was like pretty loose.
And so I had to fill in some gaps for like, well, how will this work or what would this
do?
But it was mostly that. I mean, I can't remember, I'll have to go look it up.
It was like 60 customer calls in a couple
months or something like that.
I did it pretty hard.
And all of those conversations,
I mean, they just shaped what I eventually built.
But once I had those two customers that were like,
yeah, we wanna pay for this if you complete it,
that was go time.
I just went, heads down to the keyboard
and just banged out exactly what they needed.
And that was the beginnings of a real product.
Yeah.
Wow.
That's a pretty hard swing, right?
So you're talking with one or more people
every day for two months,
and then you're processing all of that.
You're trying to collate all of the different patterns
that you're seeing across all these conversations.
And then you just go heads down and build product. I mean, that's a pretty crazy swing.
Did you enjoy that? I mean, what was that experience like?
I love it. I'm just super driven to solve problems for people for money. That loop is just so
satisfying to me. And it is a little bit of a swing, as you say, where
you're on the call, then you're doing some marketing content, then you're
coding. But when you find someone on a real team who has a real problem, it's just so direct.
I mean, if you have a question about the requirements, you just ask them, do you want A or
B?
And they'll say, and then you do it.
And they give you the feedback loop that, yeah, that looks good.
And you get to use their satisfaction for more marketing.
It's just a really beautiful flywheel.
Yeah, yeah, yeah.
I love that.
One question I have, and this is just gonna be
a totally selfish question from one product person
to another product person.
Well, I guess like CEO, CTO, chief product officer,
chief marketing officer. Michael is all of those things.
Yeah, yeah, a lot of titles.
So in the early stages, like getting that feedback,
building those direct solutions,
have you come across a situation yet
where a customer asks for something
and you have sort of a vision for the product
where you say, I'm actually not gonna do that,
or you push back because their specific need
doesn't necessarily reflect the larger picture
of what you wanna build?
Yeah, that totally comes up once in a while
and to connect to earlier in our conversation,
I had a few people who were like,
hey, it would be great if you were to take my production data
and automatically do this for me.
No LLM, you just have this black box
that snapshots all my data and outputs it.
I could eventually build that, but I feel like,
okay, that's something I want to kind of come into over time,
maybe take some VC funding to go after.
Another thing people constantly tempt me with is like,
hey, this would be great if you did like unstructured text for AI.
I always resist that because nine out of 10 times,
they don't have a real use case behind it.
I could sense those checkboxes aren't there to really
build a product that people are going to use sustainably.
So yeah, sometimes you just have to decline it.
It's tough, but it's true.
Yeah, yeah, yeah.
Now that makes total sense.
Talk about the ingredients to be a, we'll narrow the scope to solopreneur,
but you could probably extend it to entrepreneur as well, right?
You were a software developer, you were a product manager.
Not everyone can make the transition from being an IC, definitely, or even a manager, to actually
starting a company and building a company. Can you just talk a little bit to that? What
do you think some of the ingredients are that you've noticed in your own experience where
and we're just thinking about those listeners who not everyone's designed to be an entrepreneur
and that's totally okay. But I'm thinking about those people who are listening to this,
maybe on their commute to work or on their way home,
and they've had that itch inside of them just wondering,
could I do that?
Is it possible for me to do that?
So speak to that experience and speak to that person around,
what do they need to hear to push them over the edge, I guess.
I bet that's like nine out of 10 of your listeners. Since I started this,
I've had so many people reach out to me that were like,
I would love to quit my job and just pursue an idea that I feel like is important.
But I think the first thing is to just like look at it objectively and say,
OK, if I want to do this,
there is a much larger range of skills that I need to develop to be good at this.
And I was not born with them.
And I say that like, me, I learned to code. That took me many years.
It's the hardest skill I've ever developed.
But then as soon as I did that, like if I want to build a company, I need to learn how
to do marketing and how to have a sales conversation, how to build pipeline and how to treat customers
during customer service.
You really have to work at it.
And what's so hard, mindset-wise, is that when you do this, you start a company, everything is so scary because there's so much uncertainty. And the thing you're always going
to want to do is to go back to what's comfortable. I'm going to code because I'm good at coding.
But the trap is nine out of 10 times in the beginning, what you need to be doing is not
coding, but probably sales and marketing and working with customers. And it's just hard to
be that uncomfortable all the time. It's very frustrating. If you can push through it and you can actively get mentors to help you and study, you really
can do this.
It's not impossible.
Yeah, I love that.
I love in many ways how simple that is. It's encouraging to just work at it.
You can actually do those things.
I want to extend the question a little bit and get super, super practical.
There are a lot of administrative components to this, right?
So you have to set up a business, right?
And there are a lot of things that go into that, right?
I mean, even you have to set up bank accounts, right?
None of this is rocket science, but again, for someone who has never done that, how do
I structure my organization, all of that, was there anything you learned in that process?
Did you use any tools like Stripe Atlas or anything to sort of accelerate that process
for yourself?
Anything you can share with people?
Yeah, I mean, so first of all, this is my second time around.
So I made a lot of mistakes around paying taxes and bank accounts.
Like when we raised our first money, I didn't know that if you raise a bunch of money, you should probably put it in a place that bears interest
and not just let it sit in a checking account.
Things like that.
Who's going to tell you that?
But I mean, yeah, this time around, there's all the drudgery to get through in the beginning
around legal registration.
But so many people have done all this before you and so you can look up the answers on
the internet.
Just try to figure out what the right questions to ask are.
And then I use a whole bunch of different tools.
I don't use Atlas, but I use Stripe.
I use Calendly to do efficient scheduling.
I use Obsidian to do a lot of my tracking.
You just kind of have to find a system that lets you settle into a routine for
how do I manage my sales pipeline?
How do I know when to do outreach?
How do I know how and when to build marketing content? How can I check its performance?
You just kind of build up these little tools and they don't all cost money. They mostly are free
You just have to figure out what works for you over time
Yeah, totally. John, any questions from your end? I know I've been dominating this whole conversation.
Yeah, I mean, there's so many ways to go with this.
I think I'm gonna stick on the solopreneur topic
because I think that's such an interesting one.
So, Eric, you were asking about the basics.
And I actually have done this over the last two years
and have found, like you're saying,
most of it is Googleable, right?
And in fact, all of it is, as far as doing the basics.
And then I think from there, I very much identify with that, like, everyone has their comfort space. And especially if you're coming from a technical background, you're probably not comfortable doing marketing and sales.
But do it, you know, you got to do it.
And like you said, having mentors,
having people come in that are professionals in marketing and sales, I think is helpful.
But I guess another spin on this is, what's a really practical thing?
So say it's Thursday today and you're like,
I want to build, but I need to do marketing and sales.
Why don't we walk through just a Thursday,
what does that look like?
I mean, internally,
talk to yourself, essentially, of like,
all right, I want to build,
I want to add that cool new feature,
but I know I need to work on marketing and sales.
I think it comes down to this mindset of expected value,
which is, if you looked at it objectively, you take yourself out of it, you could look at the situation and say, what
is probabilistically the highest-value thing that I could do?
What is the thing that's going to move the ball forward and get customers?
If your goal is to just have fun and look cool, yeah, you can code.
But if your goal is to actually get customers and build a company that can sustain your
lifestyle, then the right expected-value thing, if you have no customers or you want
more customers, is to go make more people aware of you.
And for me, believing that is enough to be like, okay, I should go work on that.
And then I think once you see it start to work a little bit, you just believe in it.
Like, okay, for me, Thursday's marketing day. After I hang up this call, I'm going to
go work on my marketing content for the
week.
And I like doing that because I know that results in people who become
customers and pay me money and enjoy my software and say nice things.
And I can't wait to get there.
And so I'm motivated to do it.
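The expected-value framing Michael describes can be made concrete with toy numbers. The probabilities and dollar values below are entirely hypothetical; the point is only the comparison, not the specific figures.

```python
# Hypothetical expected-value comparison between ways to spend a day.
# The probabilities and customer values are made up for illustration.

options = {
    # activity: (probability it leads to a new customer, value of a customer)
    "code a new feature":     (0.02, 5000),
    "publish marketing post": (0.10, 5000),
    "outreach to warm leads": (0.15, 5000),
}

def expected_value(p: float, value: float) -> float:
    # Expected payoff of one day spent on the activity.
    return p * value

# Rank activities by expected payoff, highest first.
ranked = sorted(options.items(),
                key=lambda kv: expected_value(*kv[1]),
                reverse=True)

best = ranked[0][0]
```

With these made-up numbers, the customer-facing work wins, which mirrors the "not coding, but sales and marketing" point in the conversation; change the inputs and the ranking changes with them.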
Well, and you actually just slipped in something that I think is super
important.
I think you just said, oh, I have a predefined time where I do that,
and that's when I should be doing marketing.
I think that can actually be a major thing too.
Yeah, time boxing.
Because you have so many different roles, and,
A, if you're just gonna mix them in,
like alternate every 15 minutes, that's a nightmare.
Right?
So even having a time box of like, cool,
I'm marketing, or I'm talking to customers,
or I'm whatever, and trying to
time box it and then switch hats, I imagine that's helpful too.
Yeah, it's a great point
I mean, you can't sort of leave the week to its own devices and say, oh,
I hope I do all the right things. You need some way of being accountable, of saying, did I work on sales enough?
Did I work on marketing enough?
Did I work on engineering enough? And then you sort of balance that with the macro things that are going on. Like, if all my customers
are coming in and they have things that they need immediately, yeah, I'm going to
put marketing on pause. I'm going to do rerun posts or whatever, just something minimal
to keep it afloat. And so you need to just play this game of balance. But as you
say, having some system to keep you honest is really important.
Yeah, that's so interesting. One other quick thing I'll ask. So you did the original startup with co-founders, I think you said
there were like four of you, then obviously Confluent, near the end,
a fairly large company. What type of additional efficiency do you think
you have as a solopreneur versus working even with a small team? Because
the communication problem, right, is this exponential problem as you add people to a company.
And you essentially have zero of that problem
because it's just you communicating with yourself.
So what's the advantage?
I think there's an advantage there.
How would you think about that advantage?
How would you quantify that advantage
that maybe you have since you can do it all?
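As an aside on the "exponential problem" John mentions: pairwise communication channels actually grow quadratically rather than exponentially. With n people there are n(n-1)/2 channels, a quick sketch:

```python
# Pairwise communication channels in a team of n people: n * (n - 1) / 2.
# A solo founder has 0 channels; growth is quadratic in head count.

def channels(n: int) -> int:
    return n * (n - 1) // 2

team_sizes = [1, 2, 4, 10]
counts = [channels(n) for n in team_sizes]  # [0, 1, 6, 45]
```

Going from 4 people to 10 takes you from 6 channels to 45, which is the overhead Michael avoids entirely as a solopreneur.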
I think it lets you be more objective
about what you're doing.
I think even if you're on a small team, and especially if you're a big company, good things
can be happening across the company that basically it's a rising tide for everyone.
Like we closed the big deal.
Okay, that makes me feel good about whatever I'm doing over here.
When it's just you, I mean, like your ego just can't be in the way.
If your goal is to make money, do things that make you money and don't do things that don't
make you money.
And all of your rewards are your own and all of your failures are your own as well.
And so I think it puts you in this extremely fast learning loop.
And it's true.
Like the trade-off is I can't solve problems as big as a 10 person or a 1,000 person company can.
But I get to learn a whole lot faster.
So if I want to stay with what I'm doing, I'm getting better at it.
And if I eventually want to pivot back to like a bigger company,
I get to take all these learnings and I probably accelerated my career in the last year by like
fivefold because I'm solving so many more problems faster.
It's just like accelerator in a certain way.
Sure.
I was just thinking about solopreneur board meetings.
Those are called sleepless nights wondering if I'm doing the right thing.
But I mean, it is funny though, because if you think through, I mean, Eric, you think through your day, or even myself with a very small team, every time you add
somebody, there's an extra layer of communication. And then if you raise money,
then you have that layer of communication. And then, you know, if you have a board, it adds
so many different stacked layers that, for what you're doing, they're just not there.
Yeah, and it lets you stay extremely customer-focused in the beginning. Like, I work with
a bunch of other companies, advising or just mentoring, and they're all very focused on
raising money and kind of doing all the things to get the company going.
I think there's a huge advantage to starting incredibly simply and saying,
have I nailed the customer problem
and signed customer one or two or maybe even three,
and then go raise money.
Because you just have a much straighter path,
where if the investors aren't aligned with you,
leave them behind.
You know what you're doing.
You've found the right way to go.
Yeah.
I agree with that a hundred percent.
And I think that maximizing for velocity as much as possible
in the early part of a company, and I love the idea of the solopreneur here, when you didn't
raise money, you don't have a choice other than to get to the pain point as quickly
as you possibly can and then solve it as quickly as you possibly can, right?
If you want people to give you their money.
Yeah, exactly. And then if you manage to do it,
you're in an awesome position. Like I'm 18 months in, I'm at six figures ARR. I could
go raise money now on great terms. I could take whichever direction I want because I
was really patient and endured a whole bunch of pain. I may continue with that pain, but
it gives you more options. If you figured out more of the space on your own, you can
decide what you want to do.
Yeah, totally.
Well, I know we're really close to the end here, but my last question will be around
what you just said.
So do you have like a dream for ShadowTraffic, in terms of, I want to sell it to another company,
or raise money for it or is it you just want to keep solving pain points and doing that in
a way that people pay you money and you'll see what happens?
I love not deciding.
It's really fun.
But the only thing that's true for me is I'm just going to do it until it's not fun anymore.
And then I think I've gotten enough customers to a place where I feel like it's sellable
both for the product and for the ideas that I've pioneered or I could open source it or
whatever.
But it's really fun just not deciding, just really doing this for myself.
And my goal at this point is to just build it into a long-term business
and keep going until it's not fun anymore.
But today it's still fun.
Man, that's so great.
I hope that's so encouraging to any of our listeners who are just worried
about the potential of jumping out on their own.
But man, what an encouraging note to end on that.
It is painful and difficult, but it's also really fun.
And that maybe-
I wouldn't trade this year for anything at all, like ever.
Love it, I also love it.
Michael, thank you so much for joining us on the show.
I learned so many lessons.
You reminded me of so many good things,
just about the value of diving in and facing your fears,
putting hard work in and really enjoying what you do.
And we got to nerd out on streaming data,
which is always a bonus.
Thank you for having me.
It was a lot of fun.
The Data Stack Show is brought to you by RudderStack,
the warehouse native customer data platform.
RudderStack is purpose-built to help data teams turn customer data into competitive advantage.
Learn more at rudderstack.com.