The Data Stack Show - 232: Building a Business Solo: Streaming Data, Synthetic Testing, and Startup Lessons with Michael Drogalis of ShadowTraffic.io

Episode Date: March 12, 2025

Highlights from this week’s conversation include:
Michael's Background and Journey in Data (0:24)
Synthetic Data Challenges (1:49)
Open Source Project Development (4:20)
Founding Distributed Masonry (5:...56)
Acquisition by Confluent (7:27)
Introduction to Shadow Traffic (10:57)
Observations on Streaming Data (12:33)
Importance of Timestamps in Testing (16:22)
Customer Workflows with Shadow Traffic (19:09)
Artificial Intelligence in Data Generation (22:13)
Advantages of Domain-Specific Language (DSL) (25:14)
Solopreneurship Insights (26:53)
Exit Criteria for Startup Focus (30:12)
The Feedback Loop (33:51)
Balancing Customer Needs and Vision (35:02)
Navigating Administrative Tasks (38:15)
Expected Value Mindset (41:00)
Solopreneur Efficiency (43:01)
Maximizing Velocity (46:06)
Final Thoughts and Takeaways (47:34)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We are here with Michael Drogalis of ShadowTraffic.
Starting point is 00:00:34 Michael, welcome to the Data Stack Show. Hey, thanks for having me. All right. Well, we have a ton to get into. Of course, I'm passionate about streaming data. And so we're going to go deep on that. And we're going to talk about solopreneurship and a number of other things. But first, just give our guests a brief background. How'd you get into data and end up at ShadowTraffic? Yeah, by trade, I'm a software engineer. I think the last thing that kind of inspired
Starting point is 00:01:00 me as I was coming out of college was distributed systems and streaming data. They were all kind of really getting started around like 2010 or 2011. And I went out and I built an open source project, ended up building a company on top of that. I sold it to Confluent and then recently I left to go start ShadowTraffic, which we'll talk about. It's sort of the inspiration of all the problems that I've seen occurring in the last 10 years or so. And yeah. Awesome. So Michael, we were talking before the show, doing a little bit of show prep.
Starting point is 00:01:29 So many cool topics here. Eric already mentioned one solopreneur thing. I've been reading a lot about that and people are like, who's going to be the first $100 million solopreneur? So that's a fun topic. And then the streaming topic is just a fun one. It's been going on a long time and I think a lot's happening there. What are some topics you're interested in covering?
Starting point is 00:01:49 Yeah, it's always fun kind of going into the details of the problems around synthetic data. I think people look at it and they think, well, I can just use ChatGPT to create some data or I can just write a little script to do it. And in some simple cases you can, but as you start to go down this path and you need to build more and more cases that reflect production scenarios, it's actually a lot harder than you think, and you end up reaching for a tool that has that defined as a set of abstractions that help you. It's fun to go into the motivation behind those things and the use cases and such. Well, let's dig in. All right, let's do it.
Starting point is 00:02:20 Michael, I'm so interested in your background because you got interested in distributed systems and streaming really early. Of course, in many ways, those technologies have become ubiquitous for certain use cases in the data stack. But tell us what you were doing. You were a software engineer, and what piqued your interest? What actually got you into that? Was it a use case at work or just personal research?
Starting point is 00:02:44 Yeah, it's funny. I had a college professor and there was a class on distributed systems and it was a pretty small class and so I had a lot of individual attention. He was just a really inspiring professor. He was telling me about Erlang and message passing and all these things. And there just, there wasn't a lot of people who were kind of working on it. And that's actually great when you're just kind of coming out of school, you want to find kind of a small community where you can participate in and feel like you can directly
Starting point is 00:03:08 interact with the people who are working on these problems. And that's kind of what led to me being a little bit involved with Kafka. That project, it's huge today. At the time, like 15 years ago, it was just getting off the ground. Everyone was very easy to talk to. It was very easy to follow all the trends that were going on. And so it was actually a perfect thing to just jump into coming out of school. Mm-hmm. Very cool. And so you came out of school. Did you get a job as a software engineer or were you working on...
Starting point is 00:03:34 I mean, you were obviously part of the open source community and interacting with that community. Yeah. I kind of got to work as a backend engineer working on analytics systems. I did some contracting. And during that time, the other thing I sort of fell in love with was functional programming. Clojure is my tool of choice.
Starting point is 00:03:50 And that's another community that was just, it was and still is rather small and niche, but I learned a lot there. And in maybe the first three years out of school, I kind of understood what it meant to be a professional software engineer and how to work with other people and that kind of thing. Yeah, yeah. It's funny, you're taking me back because I founded a company with a technical co-founder and he loved Erlang and Clojure so much and he was very involved in those communities and so you're taking me back a little bit.
Starting point is 00:04:21 That's fun. Okay, so you're working as a professional software engineer. You're involved in the open source community and then you start your own open source project. Yeah, that's right. So there were sort of like multiple pieces to the streaming problem. There's like the back channel or the backbone, I should say, of how you actually move data from A to B. And then there's the problem of like, what do you do with it?
Starting point is 00:04:45 And that's kind of the whole area of stream processing. And so in 2011 or 12, Apache Storm came out, which was basically the first mainstream attempt at processing data at real time. And I just felt like there were some problems in that project, it was a great first attempt. I thought I could do a little bit better, specifically if I sort of zeroed in on the problem,
Starting point is 00:05:04 solving it like the Clojure way and functional programming. And because I was part of like a niche community, I got to build a relatively niche solution that people really liked. And so I got started on that partly just because I wanted to have something of my own. I sort of felt like as I left school, I didn't really have like an identity of my work. And it seemed like everyone who was doing really well had started some kind of an open source project. So that seemed like the fashionable thing to do.
Starting point is 00:05:27 And I pursued that for a couple of years. I built a community around it, ended up meeting the co-founder of my next company through it. It was just a really great experience. Again, learning how to get along with people that you don't immediately work with, new community members and that kind of thing. Yeah.
Starting point is 00:05:43 Is it still around? I mean, that's a long time ago. No, no longer. So I, when we eventually ended up selling the company that I co-founded, we kind of had to part ways with it. You can only juggle so many things at once. Yeah, yeah, yeah. Totally.
Starting point is 00:05:54 My heart goes out to open source maintainers. It's a really hard thing to do. Yeah. And what was the, so you met your co-founder and then what company, what company did you found? Yeah, we called the company Distributed Masonry and it was a little bit of a play that the project's name was Onyx. It was sort of like a stone-based thing.
Starting point is 00:06:10 And again, our whole premise was that we could build something cool in distributed systems. And so we tried basically to build a platform on top of Onyx. That didn't really work because there wasn't really a big enough user base to do a SaaS and do it as a service, as a consumption-based model. We tried sort of like a function as a service sort of thing in 2015. Lambda was sort of still getting started. We just whipped through a whole bunch of product ideas. And the thing that ended up working really well was actually kind of going back to something
Starting point is 00:06:38 a little bit earlier in my career, which was seeing how we could support Kafka to do tiered storage. So the problem with Kafka like 10 years ago was that it was pretty limited in the amount of data that it could transport. You basically kind of had to size it per box. It was so very finicky about the way that you resource these things. And we basically had this idea of like,
Starting point is 00:06:57 well, what if you could hook up S3 with Kafka and have unlimited streaming data? And it turned out that the technology that we built on it was actually a pretty good way to do an initial prototype of that. And so we built a product on that and that was sort of what we had a little bit of success with before we eventually sold the company to Confluent.
Starting point is 00:07:14 Yeah. And tell us just a little bit about the journey of selling to Confluent. I mean, that's pretty cool to start a company. It sounds like you went through sort of rapid development to find something that would have early product market fit where it's like, okay, this is a pain point that's big enough to where there's some traction. And then you sell to a company like Confluent.
Starting point is 00:07:33 They're obviously much larger now than they were back then, but that's a pretty neat journey as a first at-bat as an entrepreneur. Yeah, it was a lot of fun. There's a lot of good stories in there. Actually the company was just four folks and we had been working together for maybe two, two and a half years. And the very first time we all met in person
Starting point is 00:07:50 at the same time was at like the acquisition discussion. No way. Yeah, so we're like, we're actively figuring out like the rapport and the timing of how we talk together in the room. But it was fun. I mean, people like to ask about it
Starting point is 00:08:04 and it was like, surprisingly straightforward. I asked our lawyer, I was like, well, how do I say this? And he was like, you just say it. Like, do we want to be acquired? Just say it. And it was like a very, it was a very direct discussion. And it made me sort of appreciate that like, as you move up in the stakes in business, it's just worth being very direct.
Starting point is 00:08:20 Nobody wants to waste time. Tell people what you want. They'll tell you what they want. And I learned a lot of things that like really accelerated my career by maybe like five or 10 or 15 years. Wow. Yeah. It's so funny because hearing that story about the first time I met my co-founders in person
Starting point is 00:08:37 was at the acquisition meeting feels so much like a post-COVID story or post-COVID dynamic because that is so much more common now to have these really like intimate business relationships without actually having physically met the person. Yeah, we pioneered the fully remote model for better or worse. It worked out for us, but yeah, it's just sort of a fun anecdote. Yeah, very cool. Okay, well, I know John is burning with a number of questions, but I do want to hear about Confluent. So you went to work at Confluent, you got into product, and then were there for a good while, probably at a pretty formative time for Confluent as a company, because the last five or six years have been pretty crazy just in terms of Kafka generally,
Starting point is 00:09:24 a lot of the stuff that Confluent has shipped. I mean, on the product side, so many really cool things they've done. So tell us a little bit about Confluent and maybe some of the big lessons that you took away. Yeah, Confluent was an interesting experience. I was really lucky in that I got to work across just a bunch of different things.
Starting point is 00:09:41 Primarily, I headed up product for stream processing. So the way you can kind of think about Confluent at the time I joined 2018 was that like Kafka was a relative success. They were trying to get Confluent Cloud off the ground and pivot from like an on-prem company to a cloud company. And then in addition to that, it was like, well, we got to get more products off the ground. Besides Kafka, how do we get into the compute game? And that's where all this ties back to the beginning of my career.
Starting point is 00:10:03 I led kind of the early efforts for stream processing. So, right. Confluent has a number of offerings on that. They have Kafka Streams, which is a Java library. And then the thing I primarily worked on was KSQL, which was a streaming SQL variant. And that, I mean, it was so interesting trying to do this at a time when the company was like trying to solidify its core offering, move to the cloud, continue through hyper growth. A lot of lessons learned in there. Yeah, wow. Actually, Brooks may be able to look
Starting point is 00:10:29 it up for us, but we had a guest on the show who is also very involved in KSQL. I think he worked at Meta for a while, and then he went to Confluent, and I think he worked on KSQL. I'm not sure, but we'll try to remember his name because I'm sure you interacted with him. Yeah, very cool. Okay, John, I've been monopolizing the mic. So I'm handing it over to you. So a lot of different ways we can go with this. But let me share something I shared with you guys before we started. So your new gig: you were at Confluent, and you've started a new thing now called ShadowTraffic. The moment I kind of understood what you were doing, it took me back
Starting point is 00:11:09 almost 15 years, to working on a B2B SaaS app. We were going through like a major version change. For various growth reasons, we just had a ton of features we packed into this particular release. So we get all the code done. I'm more doing like database backend stuff. Got a sysadmin doing his thing and then a bunch of developers. So the thing's like finally in QA, the developers celebrate and say it's done, right? It's not done. And then we have this problem of like, how do we test all of these like use cases? And it's like, should we grab a bunch
Starting point is 00:11:45 of production data and like run it through here? And then you run into, like, at the time we probably should have been more concerned about like privacy and security type things. So there's that part of it, but there's also just all these other like edge cases of like, well, we need to make sure we turn this feature off so we don't randomly email thousands of customers, or we need to make sure we turn this feature off so we don't accidentally pump data into the accounting system and accounting thinks it's real. So there's all these like really interesting things that came up, and we looked for a solution, like maybe there's a solution out there.
Starting point is 00:12:16 Nothing that we found at least. So that's my setup for like kind of my personal experience here. You're obviously really deep on this problem. So at ShadowTraffic, walk us through some of the, first maybe how you even came up with the idea and got into the space. Yeah, I basically observed, actually over the last 15 years from when I started my career,
Starting point is 00:12:37 I was just in this streaming space. And time and again, it was hard to test things. I mean, we would talk about these features that are really cool. Like, oh, we could capture real time data. We could do real time joins. Look at how quickly we could update these aggregates. Who's showing it?
Starting point is 00:12:49 Where do you see this stuff? I mean, you could see it in unit tests, but like nobody was showing it for real. You have to actually go behind the scenes and look at people's production metrics or the systems that are just basically obscured from view because they're production systems. And I noticed just a litany of use cases for this. So engineering teams kind of need to do testing, stress testing, integration testing, edge case testing. Sales engineers need to be able to
Starting point is 00:13:11 have test data to exercise their systems to prove that they're what people are buying. Developer advocates need to be able to put on cool demos. And there's a lot of places where it's applicable. It's this nice niche problem that I felt like, okay, one person could just go and solve this really well. And there's all kinds of continuations once you solve the initial problem, you can do more and more. But that's really what started it all. In terms of, I'm just thinking about your experience at Confluent, especially thinking about stream processing, right? Because I think that's an area in particular where you really do need to run a lot of data. I mean, the ideal testing is with your production data, because you have all of these different messages
Starting point is 00:13:54 coming through and there actually can be many times a very high amount of cardinality with that. And that can vary over time periods, right? Even just through like the cycle of a day, let's say you have international traffic, right? Throughout the cycle of a day, you'll have very different types of traffic come through. And so if you're making an update
Starting point is 00:14:16 to a streaming transformation that you're doing, like it's really hard to test. Can you give us an example from Confluent, maybe from like an actual customer or some situation where that was really problematic and how did you face it and why was it painful? Yeah, I'll give you two of them that are really easy to understand. So like imagine, let's do like a retail example. You want to push retail data through an event-driven system and then you have an application that's sort of processing stuff as it comes in. And you
Starting point is 00:14:43 may have two streams like customers and orders. If you want to actually test this, you'll have a whole bunch of customer IDs coming through and saying, all right, customer John and Eric and Brooks come through. And then a bunch of orders come through. Orders almost certainly has an identifier that refers to customers. How do you do that? Do you just pick John and Eric and Brooks and randomize those?
Starting point is 00:15:01 What if the messages for each of those three people don't show up before the orders? How do you do that over a big enough key space? That's just like a clear problem you immediately hit when you start using these systems. Yeah. Another one is like, imagine you're doing like a checkout process where you're taking in web events
Starting point is 00:15:16 where like, okay, in my shopping cart, John puts the item in it, views an item, puts it in his cart, takes it out, puts another item in, and then checks out. If you start to change the order of those events and you say, well, the checkout comes before the view item, your application will break. And again, very basic stuff. You can write a unit test for these things, but if you want to test
Starting point is 00:15:35 production volume with all of your systems together, which you should, you immediately hit this problem and it's harder to solve than it looks. Can we dig in a little bit to timestamps specifically? Because from a very practical standpoint, that's super challenging because if you, like let's say you generate a set of data, right? Because I mean, a very common way to do this and something that we've done in the past is you just write a script, right? Like, okay, it doesn't seem that hard, right? It actually can take a lot of work if you're trying to do it, I would say, if you're trying to do it properly, as it were, right?
Starting point is 00:16:12 Where you're trying to represent the cardinality appropriately, where you're trying to do sequencing and all that sort of stuff, right? A lot of that work actually has to do with timestamps, right? And so the way that you generate data and the way you have to sequence timestamps, especially when strong ordering is actually very important for a downstream application or analytics use case. So let's say you go through all the work to do that, right? And then it's like,
Starting point is 00:16:36 again, I need to do this again and again, right? And like, it just is really annoying. It's so annoying to like go back through and like recalculate all the timestamps and make changes because you realize, oh, all the timestamp stuff that I did, like even if you try to randomize stuff, it's really hard to make it seem real. That's so dumb, but like that's very hard. Yeah. And the thing that you kind of want at the end of the day is something to sort
Starting point is 00:17:04 of sit at the front of your architecture and at the front door, just blast data through as if it were your real customer data and have a set of knobs to be able to say, I don't have to go down and feel like I'm programming like C or assembly, but I have these very high level parameters that let me say, what does this data look like? What are the non-functional characteristics? And then have it act as shadow traffic. I mean, that's the name, to act as a shadow of your actual customer data.
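[Editor's sketch] The two problems raised in this stretch of the conversation, orders that must only reference customers already seen and timestamps that must keep moving forward, can be illustrated with a small hand-written generator. This is a minimal Python sketch under stated assumptions, not how ShadowTraffic works: the function name `generate_events`, the field names, the 30% new-customer probability, and the 1 to 500 ms gap between events are all invented for illustration.

```python
import random
from datetime import datetime, timedelta, timezone


def generate_events(n_orders, seed=0, start=None):
    """Yield customer and order events in timestamp order.

    An order only ever references a customer that has already been emitted,
    and each timestamp is strictly later than the previous one.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    now = start or datetime(2025, 1, 1, tzinfo=timezone.utc)
    known_customers = []
    orders_emitted = 0

    while orders_emitted < n_orders:
        # Advance the clock by a random, strictly positive gap (hypothetical range).
        now += timedelta(milliseconds=rng.randint(1, 500))

        # Introduce a new customer if none exist yet, or about 30% of the time.
        if not known_customers or rng.random() < 0.3:
            customer_id = f"cust-{len(known_customers) + 1}"
            known_customers.append(customer_id)
            yield {"type": "customer", "id": customer_id, "ts": now.isoformat()}
        else:
            orders_emitted += 1
            yield {
                "type": "order",
                "id": f"order-{orders_emitted}",
                "customer_id": rng.choice(known_customers),
                "ts": now.isoformat(),
            }


events = list(generate_events(n_orders=5))
for event in events:
    print(event)
```

Even this toy version has to track state across two streams, which hints at why a throwaway script gets harder to maintain as more realistic constraints (larger key spaces, out-of-order arrival, per-user event sequences) are layered on.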
Starting point is 00:17:29 I think that, with simulation testing, is kind of the right answer. I'm curious a little bit, like, want to dig in a little bit on architecture, because if you told me, like, here's a problem, how would you solve it? My immediate would be to go to, like, okay, let's go to production data,
Starting point is 00:17:43 and then, like, scrub it, right? Like let's hash stuff, there's PII, let's essentially scrub from production data. We haven't talked about it, but I don't sense that's the way that you went about solving this. Maybe it is. I think there's two ways you can build a product in this space. You could either do what you're saying, which is to take existing data, use machine learning or some kind of procedure to basically reverse-engineer a safe copy of that data.
Starting point is 00:18:04 There are many companies that actually do this really well, particularly in the relational database space where you have like thousands of tables. They're very static. All you need to do is like find all the addresses and rip them out. Not a trivial problem, but like that's kind of its own thing. And then there's the approach that I took,
Starting point is 00:18:18 which is to say, okay, what if instead you had basically a very high level language that let you describe what the data is and you can do it directly, you could sort of bootstrap off a schema, you can use an LLM to help you write it. And that has the advantage of being able to say, well, okay, we don't have to modify anything because this is fully fresh. But also it has the advantage that many times the data doesn't exist yet. If you've never been to production, there's no production data. Oh, wow. Yeah, it's blank tables or blank fields or whatever.
Starting point is 00:18:45 Yeah. Exactly. And then you could sort of speculate about the future. You could say, well, it's not just take a copy of production data. Let's basically use similar characteristics and then say like, well, let's triple the volume over time or let's make the traffic much spikier over time or stuff like that. And so it's probably a smaller market of the two, but I think in some ways it's the one that's a little bit harder to solve.
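[Editor's sketch] The idea of speculating about the future ("triple the volume over time" or "make the traffic much spikier") amounts to describing a target event rate as a function of time. The Python sketch below is one possible way to express that, assuming a linear ramp plus periodic bursts; the function name `target_rate` and every parameter and default value are hypothetical and not taken from ShadowTraffic.

```python
def target_rate(t_seconds, base_rate=100.0, duration=3600.0,
                growth=3.0, spike_every=600.0, spike_length=30.0,
                spike_multiplier=5.0):
    """Return the desired events per second at time t_seconds."""
    # Ramp linearly from base_rate to growth * base_rate over the run.
    progress = min(max(t_seconds / duration, 0.0), 1.0)
    rate = base_rate * (1.0 + (growth - 1.0) * progress)

    # For the first spike_length seconds of every spike_every window, burst.
    if t_seconds % spike_every < spike_length:
        rate *= spike_multiplier
    return rate


for t in (0, 60, 1800, 3300, 3600):
    print(t, target_rate(t))
```

A load generator could poll a function like this each second to decide how many events to emit, so changing the shape of the traffic means changing a few parameters rather than rewriting the generator.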
Starting point is 00:19:04 And it's maybe why I've had a little bit of traction. Yeah. This is a super practical question. How do you see your customers... So I'm guessing that the general workflow is that I set up in dev or QA and then I point shadow traffic at it and generate this highly realistic stream that allows me to get as close to testing and production as I can as far as the data goes, right? Because that's a lot of data. Sometimes you'll try to run tests with a small data set or a small sample batch or whatever, right?
Starting point is 00:19:40 But if you're testing a bunch of data in scale, it also creates this really interesting challenge of – well, a couple of challenges that come to mind. So one is cost, and then two is the systems that you're sending it to downstream, right? Because you're either provisioning something as part of your dev, you probably don't ultimately want the data in there, so are you just dropping it?
Starting point is 00:20:00 Can you just explain kind of the, what's a typical workflow for a shadow traffic customer in terms of the environment, what they do with the data, all that stuff? Yeah, it kind of depends on intent, as you say. So if you're like an engineering team, your goal may be to make sure that a bunch of systems integrate.
Starting point is 00:20:16 And so you may have a smoke test where you basically kind of run a minimal set of traffic through your system, but you want all components online to make sure that the schemas work, that your web sockets are connecting well together, serialization works really well. That very same team may take that set
Starting point is 00:20:30 of ShadowTraffic files and basically use these knobs that I described, it's like a very high level DSL, and say, no, no, let's crank up the volume, let's do a stress test. A good example, a customer of mine, Raft, published yesterday that they did a hundred terabyte test on their systems. They have to generate very low latency queries using historical and streaming data together, which is a tricky problem, and internally they use ShadowTraffic for like more minimal
Starting point is 00:20:53 testing. For this particular case they turned it way up, generated a hundred terabytes of data, 50 gigabytes of data a minute, and they were able to do sort of a short-lived, somewhat more expensive end-to-end test, tear the whole thing down, be confident, checkpoint and move on. And so there's just a set of use cases where it applies and the fact that it's parameterizable kind of helps people move from problem to problem. Yeah, super interesting. Yeah, I mean, expensive to test, but probably not relative to the cost of making a breaking
Starting point is 00:21:21 change in production at 100 terabyte scale. It's worth it every time to do the testing before the customer finds the problem. Yeah, for sure. So interesting, moving further down this journey, how... So this makes a lot of sense if I can control it, but what about testing with some kind of external dependency, like API type thing, and I don't know the space that well. So I don't know if you're solving this problem or others are like, this stuff. We just want to have like kind of a dummy hold-in
Starting point is 00:22:06 that will behave about like a Stripe API, have the same like rate limits, et cetera. Is that part of this scope of what? That's a bit of an orthogonal problem. It reminds me, the company name escapes me, but there's a company that basically kind of mimics AWS services where they give you a set of containers or maybe they host services and then they kind of behave
Starting point is 00:22:24 in a similar way for testing purposes. And yeah, so ShadowTraffic, it's sort of meant to find itself in a place where things are as realistic as possible. Whether you kind of fake out the rest of your downstream systems is up to you. If you want to use test containers, that's totally fine. But it's meant to give you those like degrees of freedom to make those choices. Yeah, cool. One thing you mentioned and because I want to make sure we have plenty of time to talk
Starting point is 00:22:47 about you being a solopreneur and what that journey has been like, but this is just such a fascinating problem. So one of the really interesting challenges, and this is, it sounds so funny, but it's actually just very difficult to be creative enough to generate data that mimics reality. I think part of that is because human behavior is a very complex thing generally, and I think you see that in streaming data specifically, or even sometimes system behavior. But to write a script that generates a bunch of data, you have to think in a pretty structured way, right? Like enforce it like a lot of concepts
Starting point is 00:23:29 around taxonomy and stuff. And so you have these two competing things. And so it makes it very difficult for a human to generate something that's like highly realistic. So how do you do that at Shadow Traffic? And you even mentioned that there are some tools like LLMs that can help you express what you're trying to do, but how do you get close to the bullseye in terms of this feels like real
Starting point is 00:23:50 production data? Yeah, reality helps because people usually come to me not when they're just bored or just trying to do something new. Like they have a problem to solve where they're like, okay, our customer needs to do this. We have this schema. We may not have their data, but I know what their data looks like. And then the 80-20 rule applies where it's like, we need to get it good enough along these particular characteristics.
Starting point is 00:24:07 And usually they could dial it in where it's like, okay, problem solved and they move on. So they're not imagining kind of all possible dimensions. Right. But the other thing you mentioned is if you really are starting from scratch, like many developer advocates would be, if you're building demos to try to like promote your software, I have a custom trained GPT, which is awesome. You could just say, Hey, I'm thinking about these domains.
Starting point is 00:24:26 What kind of examples could you give me a data streams? They'll give you some lists and say, write the shout traffic file for me. And then like, it's not perfect, but like 90% of the time, it gives you a great baseline that you can go and pick up and just start moving. And it's like, that's a perfect marriage of AI and high level programming languages
Starting point is 00:24:42 where you could use AI to be creative and then take that thing that it generates, check it into Git, share it with your team, modularize it, and go from there. Yeah, totally. Yeah, it's just like fast-tracking it. I mean, you can even run tests and then say, okay, let's go in and tweak these things to get it the last 20%. Exactly. Super interesting. Okay, one last question before we dive into solopreneur stuff.
Starting point is 00:25:05 Why a DSL? That's always like an interesting choice, especially with a startup, right? Because there's tons of different thoughts on this, right? But like a classic one, I mean, a classic one in data and analytics is people trying to write a DSL that will eventually replace SQL, right? And so there's a lot of people who are like, that's never going to happen, right? And there's all sorts of interesting tensions there and different philosophies, but would love to know why a DSL is the choice for ShadowTraffic. Yeah, when I say a DSL, what you actually program in is JSON. And the reason it's advantageous is imagine you have this like super deep nested gnarly
Starting point is 00:25:44 record, which is actually like probably more common for your listeners than not. You need some way to basically kind of work with that without like juggling all these different inner attributes and seeing whether things line up. The 30 second explanation of ShadowTraffic's API is you basically take a specimen of your data,
Starting point is 00:26:01 you look at all the concrete values, all the strings, all the Booleans, all the integers, even all the inner collections that you wanna change, you rip them out and then you put in these little function markers to say, what do I put here instead of this specific value? Now, if you were to build that another way, you were to build that with a programming language,
Starting point is 00:26:17 you would have to do all that juggling, you would have to figure out, what infrastructure do I need? Do I need Maven? Do I need a JVM? What do I need to do this? I package all of that into a Docker container. So all you need is an editor to write JSON and then a Docker container.
Starting point is 00:26:31 And it takes care of all the complexity of compiling, running, garbage collecting efficiently, all that. Oh, cool. Yeah, so it's more like a tool set for JSON interface, I guess. Yeah, that's probably more accurate. That's right. Yeah, yeah, very cool. Okay, man, that's super interesting. I'm sorry, I can't wait.
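The "specimen plus function markers" idea described above can be illustrated with a small sketch. This is not ShadowTraffic's actual syntax or implementation: the `_fn` marker key, the generator names (`uuid`, `one_of`, `int_between`, `bool`), and the `render` and `generate` helpers are all hypothetical, chosen only to show how concrete values in a nested JSON-like record might be swapped for markers that are resolved at generation time.

```python
import json
import random
import uuid

# Hypothetical marker key for this sketch; not ShadowTraffic's real syntax.
MARKER = "_fn"

# Hypothetical generator functions, looked up by the marker's value.
GENERATORS = {
    # Random 128-bit identifier formatted like a UUID string.
    "uuid": lambda spec, rng: str(uuid.UUID(int=rng.getrandbits(128))),
    "one_of": lambda spec, rng: rng.choice(spec["choices"]),
    "int_between": lambda spec, rng: rng.randint(spec["min"], spec["max"]),
    "bool": lambda spec, rng: rng.random() < 0.5,
}


def render(template, rng):
    """Walk the template; replace each marker dict with a generated value."""
    if isinstance(template, dict):
        if MARKER in template:
            return GENERATORS[template[MARKER]](template, rng)
        return {key: render(value, rng) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, rng) for value in template]
    # Plain values (strings, numbers, booleans) pass through unchanged.
    return template


# A "specimen" record: some concrete values kept, others replaced by markers.
TEMPLATE = {
    "orderId": {"_fn": "uuid"},
    "customer": {
        "tier": {"_fn": "one_of", "choices": ["free", "pro", "enterprise"]},
        "address": {"country": "US", "verified": {"_fn": "bool"}},
    },
    "items": [
        {"sku": "A-100", "quantity": {"_fn": "int_between", "min": 1, "max": 5}}
    ],
}


def generate(template, count, seed=0):
    """Produce `count` records from the template using a seeded RNG."""
    rng = random.Random(seed)
    return [render(template, rng) for _ in range(count)]


if __name__ == "__main__":
    for record in generate(TEMPLATE, 3):
        print(json.dumps(record))
```

The point of the sketch is that the template keeps the shape of the real record, so deeply nested fields stay where they are and only the values that should vary are replaced.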
Starting point is 00:26:49 I didn't get a chance to use it, but I totally wanna go play with this now. Okay, John, you wanted to dig into solopreneur stuff. I have a million questions about dominating the competition. Yeah, I think, yeah, and the solopreneur stuff, super interesting. You can go a couple directions here,
Starting point is 00:27:03 but the thing that comes to mind first is say I'm a software engineer, I've worked on streaming solutions like you have, or maybe another sector, it doesn't really matter. What kind of framework, like mental framework, do you have when you're thinking about ideas? Because for a lot of us, it's like, I can have 10 ideas a day. But like, do you have a mental checklist or framework to decide, oh, I should pursue that a little bit? Just walk us through the thought process. It's a little bit more of what kind of lifestyle do you want to live?
Starting point is 00:27:33 You can think of really big problems and you can go raise money and live that sort of life where you have to hire people and scale really fast. Or in my case, you can try to find problems that can be solved by one person or just a few people and try to run maybe like a more slower growth business. And so I mean, my thinking is I was at Confluent and my entrepreneurial drive, I tried to relax it. Like it just wouldn't stop. This is where I learned about myself. Like I am built to make and sell things and I will be for the rest of my life. I can't turn it off. And I tried to, I worked on this like new presentation tool idea on the side. It was like a VC fundable idea. It never really went anywhere
Starting point is 00:28:07 just because I wasn't really comfortable with doing another investor-backed company. It just felt like this is the wrong time in life for me. I wanna do something that's a bit more lifestyle-driven. And so when I left Confluent, I put up a blog post that said, I'm launching four startups in four quarters. And I basically outlined my thesis that like,
Starting point is 00:28:24 hey, I have a list of 10 ideas. I'm gonna burn through them one a quarter. I'm obviously not going to run four startups. I'm going to find one that works. And I went through a process for 12 weeks until I launched ShadowTraffic. And by week six, I was pretty confident that I had a winner. So I kind of cut the whole thing off, but I just took the approach that like, I'm just going to burn through ideas until it works.
Starting point is 00:28:42 And I'm not going to work on them for years. I'm going to work on them for at max 12 weeks and that should be enough to tell me. Right. So, so tell us about like the process to get to the 10 ideas and let's say let's stay in the like, we're not going VC back route. We're going to go like solopreneur route or at least bootstrapped, right? So like any process behind the 10 ideas or for you it's just like, well, I just kind of always have ideas in the back of my mind. I don't usually have ideas
Starting point is 00:29:07 I sort of forced it. I like stood in my backyard and it was like a summer day and I was just like, okay, well, I'm gonna do this. What the heck am I gonna do? I just started writing stuff that I observed over time. The first one that came to mind was like, well, everybody needs test data, like maybe I could do something with that. And then I had some other ideas that are maybe lesser quality around like child care is really annoying in my particular area. And I can't remember what other ideas I had, but I just sort of forced it.
Starting point is 00:29:31 I was like, I have 10 ideas now. And that was helpful to just like stepping in the creative mindset. Yeah, that makes sense. That's interesting. I met a really successful entrepreneur who very similar. People would say like, well, you just seem to like have these ideas that are great. And I was like, I was, I met up with him for lunch and I was like, I am interested. How do, how have you come up with multiple, very successful ideas?
Starting point is 00:29:57 It's like, it's so funny to think about. He's like, I just do the ABC thing. And I was like, what do you mean? And he's like, you just write down all the letters of the alphabet and then you try to come up with like, like a company idea or concept that starts with a and then B and then C and like he's done that. I haven't heard that one before. That's great.
Starting point is 00:30:15 I would it's the same thing. It's like a forcing function, right? To just like sort of get your wheels turning and think about problems. That's really cool. I like that. So one thing that I'm really fascinated about is 12 weeks is an extremely short amount of time to sort of build and validate. What was your exit
Starting point is 00:30:37 criteria for, okay, I'm going to focus on this, right? Across all the ideas, like, what needed to happen in 12 weeks for you to say, okay, I found the one that I'm going to focus on, right? I mean, hopefully the winner, but at least the one that I'm confident enough to give my full focus. Yeah. It's only tight if your problem I think is too big. So what I did was I said, okay, number one on the list that I feel decent about test
Starting point is 00:31:03 data for Kafka. People often have trouble doing demos in the Kafka community. Small problem. And so I opened my laptop, I wrote a social media post that said, I call it the $10,000 demo problem. That was like the title of the post.
Starting point is 00:31:16 And it was like, hey, you ever had to do like test data? I bet you it actually costs you $10,000 in your time. For these reasons, you had, maybe you have to do like related data or that sequencing that I mentioned or any of these other things. Yep. Didn't hint at all about the solution.
Starting point is 00:31:28 I just wrote a post that I thought was interesting. Got a bunch of reactions. I don't say traction. I got reactions: I got comments, I got likes on LinkedIn, on Twitter. I went and I reached out to every single person. I was like, hey, thanks for interacting with my thing. Can you tell me a little bit more about your experience with this problem? Any hard details?
Starting point is 00:31:46 Just tell me about it. I started to hear some real use cases, which is like indicator number one. Are people saying that's cool or are they saying, hey, that's useful. And then here is my background with this specific problem and there's these very hard details about what happened. I started to hear that and I was like, okay, good.
Starting point is 00:32:02 Step number one. Next thing I did was I created a minimal landing page, came up with the name ShadowTraffic, did like a hero that basically sketched just like the beginnings of the solution, had a CTA that was like join the wait list, put you on like an email thing with me, had maybe a hundred people sign up, reached out to every single one of those, ran the same process, said, hey, tell me about your experience with this problem. The details got even a little bit more. That was really good. It felt like, okay, something's happening here. During that process, I had
Starting point is 00:32:28 two companies reach out that not only did they have the problem, very critically, they had urgency about it. They had decision makers and they had budget. And I was like, okay, six weeks in. This is very good. Got it. Yeah. That was enough for me to be like, that's enough of a checkbox for me to keep going. And had you started to build any product in that six week period or were you still spending most of your time just doing validation by talking to people? I did a little bit because my sketch of the idea was like pretty loose. And so I had to fill in some gaps for like, well, how will this work or what would this do?
Starting point is 00:33:00 But it was mostly, I mean, I did, I can't remember, I'll have to go look it up. It was like 60 customer calls in a couple, two months or something like that. I did it pretty hard. And all of those conversations, I mean, they just shaped what I eventually built. But once I had those two customers that were like, yeah, we wanna pay for this if you complete it,
Starting point is 00:33:17 that was go time. I just went, heads down to the keyboard and just banged out exactly what they needed. And that was the beginnings of a real product. Yeah. Wow. That's a pretty hard swing, right? So you're talking with one or more people
Starting point is 00:33:34 every day for two months, and then you're processing all of that. You're trying to collate all of the different patterns that you're seeing across all these conversations. And then you just go heads down and build product. I mean, that's a pretty crazy swing. Did you enjoy that? I mean, what was that experience like? I love it. I'm just super driven to solve problems for people for money. Like that loop is just so satisfying to me. And it is a little bit of like a swing, as you say, where you're
Starting point is 00:34:02 like you're on the call then you're doing some marketing content then you're coding but when you find someone in a real team who has a real problem, it's just so direct. I mean, if you have a question about the requirements, you just ask them, like, do you want A or B? And they'll say, and then you do it. And they give you the feedback loop that like, yeah, that looks good. And you get to use their satisfaction for more marketing. It's just like really beautiful flywheel.
Starting point is 00:34:21 Yeah, yeah, yeah. I love that. One question I have, and this is just gonna be a totally selfish question from one product person to another product person. Well, I guess like CEO, CTO, chief product officer, chief marketing officer, all Michael is all of those things. Yeah, yeah, a lot of titles.
Starting point is 00:34:41 So in the early stages, like getting that feedback, building those direct solutions, have you come across a situation yet where a customer asks for something and you have sort of a vision for the product where you say, I'm actually not gonna do that, or you push back because their specific need doesn't necessarily reflect the larger picture
Starting point is 00:35:01 of what you wanna build? Yeah, that totally comes up once in a while and to connect to earlier in our conversation, I had a few people who were like, hey, it would be great if you were to take my production data and automatically do this for me. No LLM, you just have this black box that snapshots all my data and outputs it.
Starting point is 00:35:15 I could eventually build that, but I feel like, okay, that's something I want to kind of come into over time, maybe take some VC funding to go after. Another thing people constantly tempt me with is like, hey, this would be great if you did like unstructured text for AI. I always resist that because nine out of 10 times, they don't have a real use case behind it. It's just like, I could sense those checkboxes aren't there to really get,
Starting point is 00:35:34 really build a product that people are going to use sustainably. So yeah, sometimes you just have to decline it. It's tough, but it's true. Yeah, yeah, yeah. Now that makes total sense. Talk about the ingredients to be a, we'll just, we'll narrow the scope to solopreneur, but you could probably extend it to entrepreneur as well, right? But we're a software developer, you are a product manager.
Starting point is 00:35:59 Not everyone can make the transition from being an IC definitely or even a manager to actually starting a company and building a company. Can you just talk a little bit to that? What do you think some of the ingredients are that you've noticed in your own experience where and we're just thinking about those listeners who not everyone's designed to be an entrepreneur and that's totally okay. But I'm thinking about those people who are listening to this, maybe on their commute to work or on their way home, and they've had that itch inside of them just wondering, could I do that?
Starting point is 00:36:34 Is it possible for me to do that? So speak to that experience and speak to that person around, what do they need to hear to push them over the edge, I guess. I bet that's like nine out of 10 of your listeners. Since I started this, I've had so many people reach out to me that were like, I would love to quit my job and just pursue an idea that I feel like is important. But I think the first thing is to just like look at it objectively and say, OK, if I want to do this,
Starting point is 00:36:57 there is a much larger range of skills that I need to develop to be good at this. And I was not born with them. And I say that like me, like I I learned to code that took me many years. It's the hardest skill I've ever developed. But then as soon as I did that, like if I want to build a company, I need to learn how to do marketing and how to have a sales conversation, how to build pipeline and how to treat customers during customer service. You really have to work at it.
Starting point is 00:37:20 And what's so hard mindset wise is that when you do this, you start a company, everything is so scary because there's so much uncertainty. And the thing you're always going to want to do is to go back to what's comfortable. I'm going to code because I'm good at coding. But the trap is nine out of 10 times in the beginning, what you need to be doing is not coding, but probably sales and marketing and working with customers. And it's just hard to be that uncomfortable all the time. It's very frustrating. If you can push through it and you can actively get mentors to help you and study, you really can do this. It's not impossible. Yeah, I love that.
Starting point is 00:37:53 I love in many ways how simple that is, is encouraging just to work at it. You can actually do those things. I want to extend the question a little bit and get super, super practical. There are a lot of administrative components to this, right? So you have to set up a business, right? And there are a lot of things that go into that, right? I mean, even you have to set up bank accounts, right? None of this is rocket science, but again, for someone who has never done that, how do
Starting point is 00:38:23 I structure my organization, all of that, was there anything you learned in that process? Did you use any tools like Stripe Atlas or anything to sort of accelerate that process for yourself? Anything you can share with people? Yeah, I mean, so first of all, this is my second time around. So I made a lot of mistakes around paying taxes and bank accounts. Like when we raised our first money, I didn't know that if you raise a bunch of money, you should probably put it in a place that bears interest and not just let it sit in a checking account.
Starting point is 00:38:51 Things like that. Who's going to tell you that? But I mean, yeah, this time around, there's all the drudgery to get through in the beginning around legal registration. But so many people have done all this before you and so you can look up the answers on the internet. Just try to figure out what the right questions to ask are. And then I use a whole bunch of different tools.
Starting point is 00:39:08 I don't use Atlas, but I use Stripe. I use Calendly to do efficient scheduling. I use Obsidian to do a lot of my tracking. You just kind of have to find a system that lets you settle into a routine for how do I manage my sales pipeline? How do I know when to do outreach? How do I know how and when to build marketing content? How can I check its performance? You just kind of build up these little tools and they don't all cost money. They mostly are free
Starting point is 00:39:30 You just have to figure out what works for you over time. Yeah, totally. John, any questions from your end? I know I've been dominating this whole conversation. Yeah, I mean, there's so many ways to go with this. I think I'm gonna like stick on the solopreneur topic because I think that's such an interesting one. So like Eric was asking about the basics. And I've actually done this over the last two years and have found, like you're saying,
Starting point is 00:39:57 most of it is Googleable, right? And in fact, all of it is as far as doing the basics. And then I think from there, I very much identify with that like everyone has their comfort space. And especially if you're coming from a and sales, right? Like you're not comfortable doing marketing and sales. Like, like do it, you know, like you got to do it. And like you said, you've got you've got you like having mentors, having people come in that are like professionals in marketing sales, I think is helpful. But I guess like another spin on this is like, well, what's a really practical thing? So say like it's Thursday today and you're like,
Starting point is 00:40:46 I want to build, I need to do marketing and sales. Like, why don't we throw just like a Thursday, like what does that look like? Like, I mean like internally, talk to yourself essentially of like, all right, I want to spend, I want to build, I want to add that cool new feature, but I know I need to work on marketing and sales.
Starting point is 00:41:01 I think it comes down to this mindset of expected value, which is like, if you looked at it objectively, you take yourself out of it, you could look at the situation and say, what is the probabilistically like highest chance of something that I do? Like, what is the thing that's going to move the ball forward and get customers? Like if your goal is to just have fun and like look cool, yeah, you can code. But if your goal is to actually get customers and build a company that can sustain your lifestyle, then the right expected value thing, if you have no customers or you want more customers is to go make more people aware of you.
Starting point is 00:41:32 And for me, like believing that is enough to be like, okay, I should go work on that. And then I think once you see it start to work a little bit, you just believe in it. You're like, okay. For me, Thursday's marketing day. After I hang up this call, I'm going to go work on my marketing content for the week. And I like doing that because I know that results in people who become customers and pay me money and enjoy my software and say nice things. And I can't wait to get there.
Starting point is 00:41:54 And so I'm motivated to do it. Well, and you actually just slipped in something that I think is super important. I think you just said that like, oh, I have a predefined time where I do that. And like, that's what I should be doing is marketing. I think that's actually like, can be a major thing too. Yeah, time boxing. Cause you have so many different roles and like,
Starting point is 00:42:13 A, if you're just gonna like mix them in and alternate every 15 minutes, that's a nightmare, right? So like, yeah. So even having a time box of like, cool, like I'm marketing or I'm like talking to customers or I'm whatever, and try to like time box it and then switch hats. Like I imagine that, you know, that's helpful, too.
Starting point is 00:42:31 Yeah, it's a great point. I mean, you can't sort of leave the week to its own devices and say like, oh, I hope I do all the right things. You've got to have some way of being accountable for saying like, did I work on sales enough? Did I work on marketing enough? Did I work on engineering enough? And then you sort of balance that with like the macro things that are going on. Like if all my customers are coming in and they have things that like they need immediately, yeah, I'm going to put marketing on pause. I'm going to do like rerun posts or whatever, just something minimal to keep it afloat. And so you need to just like play this game of balance. But as you
Starting point is 00:42:58 say, having some system to keep you honest is really important. Yeah, that's so interesting. One other quick thing I'll ask. So you did the like original startup co-founders, I think you said there was like four of you, then obviously Confluent like near the end of like fairly large company. What type of like additional efficiency do you think you have as a solopreneur versus working even with a small team? Because the communication problem right is this exponential problem as you add people to a company. And you essentially have that zero amount of that problem because it's just you communicating with yourself.
Starting point is 00:43:32 So what's the advantage? I think there's an advantage there. How would you think about that advantage? How would you quantify that advantage that maybe you have since you can do it all? I think it lets you be more objective about what you're doing. I think even if you're on a small team, and especially if you're a big company, good things
Starting point is 00:43:47 can be happening across the company that basically it's a rising tide for everyone. Like we closed the big deal. Okay, that makes me feel good about whatever I'm doing over here. When it's just you, I mean, like your ego just can't be in the way. If your goal is to make money, do things that make you money and don't do things that don't make you money. And all of your rewards are your own and all of your failures are your own as well. And so I think it puts you in this extremely fast learning loop.
Starting point is 00:44:11 And it's true. Like the trade-off is I can't solve problems as big as a 10 person or a 1,000 person company can. But I get to learn a whole lot faster. So if I want to stay with what I'm doing, I'm getting better at. And if I eventually want to pivot back to like a bigger company, I get to take all these learnings and I probably accelerated my career in the last year by like fivefold because I'm solving so many more problems faster. It's just like accelerator in a certain way.
Starting point is 00:44:34 Sure. I was just thinking about solopreneur board meetings. Those are called sleepless nights wondering if I'm doing the right thing. But I mean, it is funny though, because if you think through, I mean, Eric, you think through your day or even myself with a very small team, like there's every time you add somebody, like there's an extra layer of communication. And then if you raise money, and then you have that layer of communication, and then, you know, if you have a board, like it adds so many different stacked layers that for what you're doing they're just not there. Yeah and it lets you stay extremely customer focused in the beginning like I work with
Starting point is 00:45:14 a bunch of other companies advising or just mentoring and they're all very focused on raising money and kind of doing all the things to get the company going. I think there's a huge advantage to starting incredibly simply and saying like, have I nailed the customer problem and signed customer one or two or maybe even three, and then go raise money. Cause you just have such a straighter path where if the investors aren't aligned with you,
Starting point is 00:45:36 leave them behind. You know what you're doing. You've found the right way to go. Yeah. I agree with that a hundred percent. And I think that the maximizing for velocity as much as possible in the early part of a company where, and I love the idea of the solopreneur, I didn't raise money for this, you don't have a choice other than to get to the pain point as quickly
Starting point is 00:45:59 as you possibly can and then solve it as quickly as you possibly can, right? If you want people to give you their money. Yeah, exactly. And then if you manage to do it, you're in an awesome position. Like I'm 18 months in, I'm at six figures ARR. I could go raise money now on great terms. I could take whichever direction I want because I was really patient and endured a whole bunch of pain. I may continue with that pain, but it gives you more options. If you figured out more of the space on your own, you can decide what you want to do. Yeah, totally.
Starting point is 00:46:26 Well, I know we're really close to the end here, but my last question will be around what you just said. So do you have like a dream for shadow traffic in terms of I want to sell it to another company or raise money for it or is it you just want to keep solving pain points and doing that in a way that people pay you money and you'll see what happens? I love not deciding. It's really fun. But the only thing that's true for me is I'm just going to do it until it's not fun anymore.
Starting point is 00:46:54 And then I think I've gotten enough customers to a place where I feel like it's sellable both for the product and for the ideas that I've pioneered or I could open source it or whatever. But it's really fun just not deciding, just really doing this for myself. And my goal at this point is to just build it into a long-term business and keep going until it's not fun anymore. But today it's still fun. Man, that's so great.
Starting point is 00:47:16 I hope that's so encouraging to any of our listeners who are just worried about the potential of jumping out on their own. But man, what an encouraging note to end on. It is painful and difficult, but it's also really fun. And that maybe- I wouldn't trade this year for anything at all, like ever. Love it, I also love it. Michael, thank you so much for joining us on the show.
Starting point is 00:47:42 I learned so many lessons. You reminded me of so many good things, just about the value of diving in and facing your fears, putting hard work in and really enjoying what you do. And we got to nerd out on streaming data, which is always a bonus. Thank you for having me. It was a lot of fun.
Starting point is 00:47:59 The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
