The Data Stack Show - 224: Bridging Gaps: DevRel, Marketing Synergies, and the Future of Data with Pedram Navid of Dagster Labs

Episode Date: January 15, 2025

Highlights from this week’s conversation include:

Pedram’s Background and Journey in Data (0:47)
Joining Dagster Labs (1:41)
Synergies Between Teams (2:56)
Developer Marketing Preferences (6:06)
Bridging Technical Gaps (9:54)
Understanding Data Orchestration (11:05)
Dagster's Unique Features (16:07)
The Future of Orchestration (18:09)
Freeing Up Team Resources (20:30)
Market Readiness of the Modern Data Stack (22:20)
Career Journey into DevRel and Marketing (26:09)
Understanding Technical Audiences (29:33)
Building Trust Through Open Source (31:36)
Understanding Vendor Lock-In (34:40)
AI and Data Orchestration (36:11)
Modern Data Stack Evolution (39:09)
The Cost of AI Services (41:58)
Differentiation Through Integration (44:13)
Language and Frameworks in Orchestration (49:45)
Future of Orchestration and Closing Thoughts (51:54)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. All right, welcome back to the Data Stack Show. We're here with Pedram Navid from Dagster, the Chief Dashboard Officer. Pedram, welcome to the show. Great to be here. Thank you. Yeah, so I think this is your second time on the show. It's been a little over a year.
Starting point is 00:00:41 We'd love a quick kind of update and then tell us a little bit about your current role. Yeah, I think last time I was here, I was enjoying consulting life, which meant lots of birdwatching, lots of looking outside, being outside. Since then, I joined Dagster Labs about a year and a half ago, initially to run data and DevRel, and now also marketing. So far less time to do birdwatching. That's too bad. Yeah, it's too bad.
Starting point is 00:01:07 Back to the grind, as it were. Okay. So we're going to spend a few minutes chatting. We've been spending a few minutes chatting, preparing for the show. I'm excited to kind of get into how you've gotten to this point and orchestrators in general. What are you looking forward to chatting about? Yeah, I mean, I can always talk about orchestration.
Starting point is 00:01:24 We'll talk about data platforms. How we got to where we are could be kind of a fun story. We can always talk about AI. We can talk about data engineering and how you somehow accidentally end up running marketing. Could all be fun. All right, I'm excited. Let's do it.
Starting point is 00:01:41 Hey, Pedram, excited to have you. Let's talk a little bit about how you ended up at Dagster. So you were doing consulting, had some time to kind of work as you pleased, and now you're back at a startup. So tell us about that process. Yeah, I think what happened is I was actually consulting for Dagster initially. We had a great relationship, and Pete and Nick, the CEO and founder, asked me if I wanted to join. Initially, I said no, because I was enjoying my freedom too much. But one thing I found with consulting is your scope of work is often limited, and you don't get to see things, you know, fully end to end. And I also kind of missed the camaraderie. So I said, hey, you know, if that offer is still on the table, I'd love to chat about joining. And so we talked about a role, which was initially just a DevRel role with, I believe, maybe data on the side as well. A small team of two or three people. And that was almost June of last year. And so it's been a year and a half now since I've been here. A couple months ago, we took on marketing as well as part of DevRel, which initially I wasn't so sure about. But now
Starting point is 00:02:51 that I've seen it operate, it makes a ton of sense for DevRel marketing to be close together and working together. Yeah, that's really interesting. So I've had a previous experience too, where I ended up having a data team and marketing as well. Tell me about maybe some of the unexpected synergies there. You've got DevRel, you've got marketing, data kind of on the side. What's come about that you're like, wow, this is cool that this is unified? Yeah, if you had told me initially that I would be on a DevRel team reporting to marketing, I probably wouldn't have taken the job because I've always felt like marketing didn't quite get DevRel. But this way it's kind of flipped. It's like marketing and DevRel are
Starting point is 00:03:27 pointing to me and I'm okay with that. So what I found is like the DevRel side of the house is like the content arm. Daxter is a technical product. We target technical people. And so we just need technical people who have experience in the field to create the content. For me, content's a broad term. It's not just blog posts. It's tutorials, workshops, webinars, how-tos, actual integrations. Our DevRel team has built integrations that have one deal. So DevRel is the producers of the marketing arm of Dengster. And then the rest of the marketing org is really in support of the distribution of that. Where the DevRel team probably doesn't have the expertise
Starting point is 00:04:06 is how to get their content out into the world, whether that's through paid ads or events or campaigns, that type of thing. And so having the two teams together, it's really actually a lot of synergies. I hate to use that word, but it is exactly that. Where they sit together, we're on the same meetings, every week we talk about what we're working on. And the advanced person
Starting point is 00:04:27 picks up on something the DevRel team's working on, and so does the campaign manager. And then three of them together, they're like, all right, let's go build something more holistic around that, rather than just this one-off content that you created. Yeah, that makes a lot of sense. So we've got Matt here co-hosting today in place of Eric. Matt, you've been that technical data audience before, you know, looking to purchase products or things like that. I'm curious, and I'll have to ask you the same thing, Pedram, like what really clicks with you if you think back about like content or maybe even just interacting with people around these technical products? Like what do you think, what mediums or what like what can you think of that's really like clicked with you in the past
Starting point is 00:05:08 yeah i think anything that lets you kind of see how the product actually works in a real kind of way and not just the super trivial kind of look one plus one equals two type of manner right i think that helps especially because previous to a lot more of this, it was very marketing-y. So it was everyone feeling like they were trying to bait you into giving them your information or trying something or whatever.
Starting point is 00:05:36 So things that kind of give you that ability to see it. And I think they have that credibility of professionals who've used it and who can show you, this is what it's actually going to help you with. So not like the 10 tips for personalization in your marketing using data.
Starting point is 00:05:52 So same question to you, Pedro, what have you found that works? Because it's the data, at least in my opinion, that data technical audience is a tricky one. It's a tricky one to find, a tricky one to resonate with. It know resonate with it is and it is like this meme almost of like developers hate being marketed to and i don't think it's true
Starting point is 00:06:12 i think developers need a certain type of marketing that works for them along their journey and their journey often might look different than you know someone like a leadership role for example and you just have to because a developer is going to sit down and what they want to do is almost every single time is they want to try the product. They want to figure out, is this thing what it says it is? Is it useful for me? Will it work in the way that I need it to?
Starting point is 00:06:38 And so a lot of the DevRel focus and the focus of Dynastar's marketing arm is to enable developers to be successful in their entire journey from becoming aware of the product, trying it out, learning about it. And so things like docs matter a lot more to a developer than they might to a technical leader, even. As a director of data, for example, you probably aren't going to sit down and try DynXer. You might care more about what are the features, benefits, how is it solving the five things my CEO keeps yelling at me about.
Starting point is 00:07:06 But your data engineer is going to want to actually try the product and make sure it hits the things that they actually care about. Well, and that can also be a little tricky just because the technical ability of data people, there's a pretty wide spectrum you can fall on there. There's some that are very, like they came from software engineering. And then there's others that are very self-trained and might be coming more from the, I'm doing data engineering or whatever because I have to and no one else here to do it. And so I'm scraping together YouTube tutorials and stuff like that.
Starting point is 00:07:39 How do you kind of, do you guys have a specific part of that you're targeting or do you try to kind of have content more of a wider swath of that spectrum? We definitely do the whole, we try as much as possible of the whole range. You have to. There were like, what I've learned is that not everyone is me. And like, I like a certain way of learning that other people don't.
Starting point is 00:08:00 And people like ways of learning that I refuse to use. A great example is like dexter university it's something we spun up last year it's like an online course it's like structured you go through lessons and that's the last thing in the world i would ever want and when they suggested it i'm like i don't know about this guys but all right we'll try it and people love it they love it they we get five out of five like if you look at our ratings, it's like 4.8 out of five. And we get weekly emails saying how much they enjoy it. And it was completely forward to me because that's not how I learned.
Starting point is 00:08:33 What I've learned is you have to provide scope for everyone. There's people who want structured training. There's people who want to just read the docs. Some people just want to install it and look at the source code. That's the range that you have to deal with. And all of that has to be good. Your source code, your documentation in your app, your code has to look good in a way that people can understand
Starting point is 00:08:53 and interact with it. All the way up to your tutorials and video. People want to sometimes sit down and watch a 30-minute training video on the product as well. And so we do all of it. And we hire for that too, right? We have people on the Dufferil team who are much more focused on the earlier as well. And so we do all of it. And we hire for that too, right? We have people on the Dufferil team who are much more focused on the earlier stage persona,
Starting point is 00:09:09 people who are getting first started. And we have people focused on much more deeper understanding of the product as well. Yeah, I'm definitely one of those that I do not want to watch a video. For whatever reason, I cannot sit through a 30-minute technical video. I'm one of those that wants to pull up the code,
Starting point is 00:09:27 we'll reference the docs when needed, we'll struggle through it, and then we'll think to myself, I should have just watched that video. So yeah, that's a really wide persona. And Daxter is a super flexible tool. You can use it a lot of different ways. And that's got to be a challenge as well, where I come to it with like, oh, I have this specific problem. And you've got a tool like,
Starting point is 00:09:50 well, we can solve lots of problems. Like, how do you bridge that gap? That is also a great question. We are looking to bridge it through product improvement. So we have something coming up called Daxter Components. I don't know if I'm allowed to leak it yet, but it is coming. It will be more focused on providing almost like building blocks to develop a data platform. And so it'll be a command line based tool initially, but you'll have like your YAML schema,
Starting point is 00:10:16 you'll have very easy ways to plug and play different integrations. That's like our approach to sort of addressing that while always being able to expose the underlying Daxter framework, which as you said, is extremely flexible, which has both its pros and cons. The pros is like you'll never really be constrained. If you can do it in Python, you can do it in Daxter. It's essentially your limitations.
Starting point is 00:10:36 Right. The cons can be like for a very simple setup. It can often feel like a lot to go through if you just want to orchestrate one simple task. Yeah, that makes sense. So let's zoom out a little bit for people that have no idea what Daxter is, maybe have never even heard of orchestration, like that kind of analyst persona.
Starting point is 00:10:57 How would you describe just the general field that you all are in, the data orchestration field, to someone that was like, I have no idea what this is? Yeah, it's a great question. Everyone orchestrates. They just might not do it intentionally or they might not know that they're doing it right. Orchestration could be as simple as you log into your computer once a week and you click on a button and you kick off a process. It's a very manual orchestration, but it's totally fine and often it's the right decision for you. It can become a little bit more complicated when you start to use something like a cron scheduler
Starting point is 00:11:28 that runs every single day or every single week at a certain time. And that's often enough for many tasks. When things start to get a little bit complicated is when you need to add dependencies or you need to be resistant to failures, essentially. Once those two things become into play, like you want to make sure that A runs before B every single time, you can't rely on cron.
Starting point is 00:11:49 You sometimes can like fudge it. You'll say, you know, you start at 12 and we'll start this one at three. And I'll hope it never takes more than three hours. And it will always succeed. And if that's true, you probably don't need an orchestrator. But often what happens is I think people realize they need an orchestrator a little too late what they thought was true no longer
Starting point is 00:12:09 becomes true you can't really observe cron that well your tasks take too long something fails or even worse your vendors like oh by the way that thing we sent you two months ago it was wrong here's an update now go and fix that and it's like well i can't rewind time and my cron schedule doesn't know how to rewind. And so once you start to get into these types of things, that's where orchestrators come into play and they start to manage some of these more complexities for you.
Starting point is 00:12:34 I feel like you said there that, you know, where you bring it in probably later than you should. I feel like that's a recurring theme for a lot of successful data things are, you know, if you would have brought this in two months ago, this is a five minute fix. Now we're very limited in what we can do and that type of a thing. So, but I also don't know if there's a way around that. Yeah. I mean, you don't know what you don't know. Right. And if you're, especially if you're doing something for the
Starting point is 00:12:58 first time, that's like, Oh, like this works. And then, or in that, like my favorite, because I think most people, if you're in a data role, at least get to that time gap thing where I'm going to have this run at midnight, this run at 2am, this run at 5am, everything's fine. And then usually, if you get in that world, you have some bad mornings where the first one failed. And then it's like kind of a house of cards and then because some of these take maybe you know hours to run like it takes like you're kind of sunk like you basically lost a full day for having the data correct um i think that's experience right like you get burned hopefully only once and you learn your lesson or you work with people who have been burned and they've learned their lessons and they'll impart that on you. Or you'll listen to the Data Stack show
Starting point is 00:13:45 and you'll learn about things not to do. It's also human nature. I think it's so much easier. This is why ice cream tastes good. You don't really think about the consequences. Running pipelines on Cron feels good because you don't have to think about the consequences until it's too late.
Starting point is 00:14:02 We try to educate people about how it's probably easier. It's not that hard to set up a pipeline in Daxter. It's just Cron. You can do it just in Cron. You don't have to use any of our advanced features.
Starting point is 00:14:13 We have a Cron scheduler. We have it in Daxter. And you'll get a pretty UI, which is more than you normally get out of Cron. Yeah, sure. And that's worth its weight in gold. And then from there,
Starting point is 00:14:23 you can evolve as you need to. You don't have to go and build these complex dependencies if you don't want to but get started with something when it's simple when it's just a few tasks, a simple dbt pipeline very easy to do in Dynaster we've got a great dbt integration or do it in a different orchestrator too, it doesn't have to be Dynaster
Starting point is 00:14:38 there's others out there but get it in something that you can observe because I think every engineer knows observability and logging are critical to any system. Yeah, that makes a lot of sense. I've used XR for a couple of projects. And this was kind of interesting. It was last weekend.
Starting point is 00:14:57 So it was two weekends ago. It was around that New Year's, Christmas holiday. And I got an error. I had set up alerting. I got an error, which was handy. And I thought, like, you know, what is this? Like, it's like, I better check on it. And sure enough, you know, it's an API,
Starting point is 00:15:13 like access denied type error because I was pulling data from an API. So like, what happened? You know, figure out, like, do I need to, did the credentials expire? What happened? It was funny. So essentially what happened is I was pulling data from,
Starting point is 00:15:26 it was like 28 different locations on this project. And essentially one of the locations had closed at the end of the year. But since I had everything like separated out, it was like, okay, cool. Like I can just like turn that location off and like everything keeps going and it's not a big deal. I think those are the types of things that like you know had it been the other way where essentially everything like cascades through and you're like oh like i'm gonna have to like rewrite a bunch of stuff etc those are the fun moments so
Starting point is 00:15:55 i guess i'm curious from your perspective obviously there's lots of different orchestrators out there what's special about daxter and maybe even what's special about Daxter for analytics orchestration specifically? Yeah, orchestration has been around for a long time. I think like Cron is like the classic, right? From there, I think Airflow is probably the next biggest orchestrator most people have heard of.
Starting point is 00:16:17 And that's a task-based orchestrator, right? So you've got a thing you want to do, you tell it and it runs and it's like black box and you sort of hope every box continues the way you want it to, but you have no ability to peer into the box. What Daxter sort of said is like, what if we split that or reverse that and instead of telling us about the task, tell us about the things you actually care about or let us discover those for you.
Starting point is 00:16:40 So a great example is I think a DBbt project, everyone sort of kind of gets what that is. It's a collection of like tables that you want to materialize at some, you know, regular cadence. The traditional airflow way would be to have a dbt task that just runs your dbt project, and then you sort of assume all those models in there are completed. In Daxter, what we do is we flip that around, and we actually expose every single model as an asset. And so Daxter is what we call an asset based orchestrator, because everything you care about is now represented in this big graph of things that you can sort of follow all the way through their logical conclusion. And so you can see all your dbt models within the Daxter view. And you can actually be kind of clever about it, you could run the whole thing at once every
Starting point is 00:17:24 single day, if that's what you want. Or you can actually be kind of clever about it. You could run the whole thing at once every single day if that's what you want. Or you can say, you know what? My stakeholders care about these five models. Run everything that depends on those on a five-minute schedule because they really want those things to be updated. And then these other models over here,
Starting point is 00:17:37 those, put them in a group that runs once a day whenever you feel like it doesn't really matter to me as long as they're refreshed daily. That's something you can start to do with Daxter. And then because you have this like asset view, you can start to connect things outside of DBT as well in a really intuitive way. Maybe you have a BI dashboard in Sigma.
Starting point is 00:17:54 Maybe you have, you know, some stuff happening in Red Hat Stack that you want to connect it to. Some files dropping into S3 bucket, FTP. All these things start to connect and you build lineage on them. And so you can be really clever about the full end-to-end orchestration of this thing rather than just focusing on a specific task. And so DAX has really been, I think, the next level of where we are going with orchestration.
Starting point is 00:18:15 And in fact, Airflow is even starting to move in this direction, which I find really validating that this is really the future of where orchestration is going yeah i think one two two benefits that i've seen from this like asset style orchestration has been essentially what you said one time compression because if i have separate like extract jobs that then load into a warehouse and then i have to transform and it's all like linear the time compression to get that one essentially one report that i need to be fast like fast as in like very up to date is there's just a limit right like if i'm having to do all of it here all the here all the here the there's a time compression but since everything is is compute based now there's also a cost
Starting point is 00:18:57 implication right because if i can compress some of these like times for the the ones that i want to be really fast i can also do the opposite for things that I only need that once a day. Before I was running this whole thing and everything was every five minutes, I can delay this 80%, which I don't care that it's a day old. And that's compute savings in your warehouse, potentially savings in your ETL tool. I think that's a big deal. You could take it even further.
Starting point is 00:19:27 Because you've exposed this data lineage, you get all these side effects almost for free. And that's something we've actually learned ourselves. It's like, now you have this data catalog, essentially. You understand all your data assets, and you have the source of truth of where your data is defined. Well, now you can search that, and now you have a data catalog for free.
Starting point is 00:19:43 You don't have to go and maintain a separate one. Data quality becomes something you bolt on top of your actual execution. It's not an afterthought. It's like as part of your pipelines, you can start to emit what we call asset checks or data quality things. And like you said, time compression becomes a much more interesting problem because we can actually be very declarative in Dijkstra. Instead of saying we want to run these things every day at 5 o'clock, you can say, this asset needs to be updated by this time. Do whatever it takes to make sure that's done. Make sure you run all its parents whenever you need to.
Starting point is 00:20:14 And now you're limited by only the chain of things that matter to that asset and not everything that comes before it. So we get a lot of really, I think, nice side benefits of this asset view that I don't think we really knew we were going to get when we first started going down this path, but it's become really interesting. Well, and that I think speaks to one of those things that you see is that a lot of teams find themselves kind of, they're drowned in whatever their process is. And so they can't really see what the next thing they could be doing is. And it's only once they kind of free up that
Starting point is 00:20:46 space or that mental thing, because, okay, now I've got Dagster that's running this and I don't have to think about it. Oh, now look at these other three things that have popped up that we can do that were never part of our initial plan of, you know, we were just trying to like not have to spend three, four hours every day you know troubleshooting or fixing or running whatever and it's like now that's gone now we can actually see more opportunities that we could have never thought of before 100 there's that old cartoon of like a two cavemen and one has like a square wheel and he's trying to push it and his friend with the circle wheel is like oh you should try a circle wheel and he's like oh i don't have time for that i'm spending all my time pushing the square wheel up the hill right and i feel like that's the circle wheel is like, oh, you should try a circle wheel. And he's like, oh, I don't have time for that.
Starting point is 00:21:25 I'm spending all my time pushing the square wheel up the hill. And I feel like that's the same way with orchestration. Often it feels like, oh, just an extra step that I have to go through. But that extra step is going to compound your productivity down the line. Yeah. So I'm curious a little bit about the software space, software stack. So we're in 2025 now. I think the modern data stack was declared dead last year. I don't know, last year or two.
Starting point is 00:21:53 And which I think practically means like people are seeing like consolidation essentially. I'm curious, like some of your thoughts on where do you think that shakes out? Because we've got so many different layers we've added into a data stack of extraction, observability, orchestration, transformation. The list really goes storage. The list goes on. How do you see that playing out in the next few years? Yeah. I feel that any time, you're not enterprise ready until you've been declared dead.
Starting point is 00:22:25 That's sort of... Yeah, exactly. Love that. So the modern data stack, I think is now enterprise ready. I think it's ready for, you know, the mass market to adopt. And what we might call dead,
Starting point is 00:22:38 I see being implemented still. There's so many companies going through like cloud modernization efforts. For sure. They're moving towards Snowflake. they're moving towards Databricks, they're moving towards DBT and cloud. That's not dead. So if we define modern data stack as cloud data warehouses
Starting point is 00:22:54 and a few really good tools, that's fine. I think modern data stack, if you want to talk about the 2020s version of it, where every function you had to do was its own company. That's probably dead. I don't think people want 27 vendors to do three things at the end of the day. And so consolidation is going to happen. We're seeing it at Dijkstra. Our customers are asking for us to combine catalog and quality into one thing. Our catalog will never be as good as a full-featured catalog that you go out and buy
Starting point is 00:23:25 and pay like a grand for. Like that's not where we're competing. But there's probably some elements of those things that you can combine within the products you're already using. That's going to continue. I mean, I think ByteTran is doing this with like their transforms.
Starting point is 00:23:38 I know you guys at RudderSack are doing this as well. Dykes are doing it. I think it's just natural. And what's going to happen is what happens all the time. We see a bunch of consolidation. People get annoyed at the consolidators. Some new tool comes out and
Starting point is 00:23:52 it's like, I'm really good at this one particular thing. Interests go down again. We get a hundred of those things. It's going to be a cycle. And I think right now we're just in the plateau of productivity area where I think things slowing down has actually been really good for
Starting point is 00:24:07 data teams in general. You don't have to pay attention to 500 different things, you can kind of just put your head down and get your job done and the tools you're using to do that just keep getting better on their own, which is a good feeling. Yeah, I think also during especially that peak like 2020-ish, 2021-ish
Starting point is 00:24:24 time period, a lot of teams got very hooked on all the different tools. And kind of, you know, I mean, I saw where the teams could kind of lose track of like, well, what is this ultimately supposed to be serving? You know, well, look, but we've got all these different things. And we've got all this data in a warehouse. And it's like, okay, but what's happening to it? How is it actually turning into revenue
Starting point is 00:24:47 or savings or profit or whatever? Yeah, I mean, and it wasn't just data. What I realize now, I mean, I'm in marketing land a little bit and the exact same thing was happening there. What was going on in marketing is everyone wanted a tool to solve their
Starting point is 00:25:04 particular issues case and almost like nobody wanted to do the work they just wanted to buy tools to do the work for them and you end up with like these massive marketing sites with like 40 50 different tools to do like three things so it wasn't just us but it was everywhere it felt like at that time but i think we're now in a better place where um i think interest rates solve a lot of problems to be honest like yeah sure yeah money not being free yeah it solved a lot of efficiency problems anyway i'll put it that way right so um we're seeing that consolidation it might not feel good to everybody but i think at the end of the day businesses are operating more leanly and they probably aren't you know losing a
Starting point is 00:25:41 lot at that expense either yeah i think that right. So talked a little bit about orchestration, what that is, Daxter's unique twist on that. I'm curious about your kind of career trajectory. You mentioned when we were talking earlier, data science, data engineering, now you're in DevRel marketing data. Tell us about that journey. I think it's a little bit of a unique journey and be interested how that all played out for you yeah when i did like
Starting point is 00:26:11 i think it was in high school they ask you to fill out the survey and it'll tell you what kind of job you had i don't even remember what it was but it was like a job i'd never heard of and i never knew what i wanted to be when i like grew up. I just sort of fell into different jobs based on what I was interested in at the time. Data science was, you know, a thing that was everyone's mind back in 2018, I think it was. I was listening to all the data science podcasts. Many of them are now defunct RIP.
Starting point is 00:26:39 But they were, it was the next hot thing, right? And so I was like, all right, I'm going to figure out how to become a data scientist. And I did that for a few years. And what I realized was the new batch of data scientists that were coming in, they weren't as technical as I had been. I spent more of my time programming than they had. And so they were great at building models, much better than I was,
Starting point is 00:26:59 because they were trained in it. But they couldn't deploy them at all. And so I started building infrastructure just to make it easier for them to deploy them at all and so yeah i started building like infrastructure just to make it easier for them to deploy because their code was better than mine so i ended up becoming a data engineer by accident and i found that really rewarding it was great to like build something and then the reward is like someone using it whereas the data scientist the reward is like maybe in a year you'll find out if your experiment was correct yeah right so for me like that instant validation of like knowing i built something that clearly like
Starting point is 00:27:30 works or doesn't and the person next to me is benefiting uh was super empowering and so that's how i started in data engineering did that for however many years eventually became a head of data at a company called high touch which back back then was really focused on the data persona. And as part of that, I was also doing what we call DevRel, essentially, talking about the product to data people. Ended up starting a team there, moving on to consulting, where I thought initially I was going to do data consulting and help people with their data problems.
Starting point is 00:28:06 But almost every company that talked to me wanted me to help them with their marketing problems. And even though I didn't think of myself as a marketer, I think they saw the diverse activities I was doing and the success we were having at Hightouch. They wanted me to replicate that for them. A lot of that was just educating them
Starting point is 00:28:21 that copying the thing that I did that won't work for you. It's not the blog post that's successful. I think a lot of people look at DBT, for example, and they saw their massive community. And they thought, oh, I should open a Slack community. And it's like, well, why? How? Where's the value to the actual user? Do you think people want 25 different Slack communities?
Starting point is 00:28:44 Or do you think they want one or two places to hang out? That might already be a place that's covered for them. So it was more about talking through what were really marketing principles, but to me it was just a common sense about how to get to data people in a way that made sense. And that, I guess, put the mark of marketer on my head and eventually I joined Daxter initially as DevRel, and more recently DevRel and marketing and also data. Yeah, that makes sense. There's a trajectory makes sense. And I would imagine, so the alternative here is like, let's just, you know,
Starting point is 00:29:17 for Daxter, like, well, sorry, a marketer, right? And there's got to be, we've already talked to some about the synergies there, but there's also got to be this like scratch your own itch. You kind of get to market to yourself or to your previous self, which like that has to be an advantage. I think for a company like Daxter and for like any technical company that markets to technical people, having a technical person who really gets the audience and the go-to-market motion and like really gets it is critical and i think we've even made mistakes with this as well in the past where like we're an open source core product like by our nature we are and so we shouldn't hide that fact and i think if you talk to a traditional marketer they might be like scared that people might use open source because we're not capturing an email.
Starting point is 00:30:06 So direct them to the email form instead. Get rid of all the open source things from our website. And bury it deeply. Right. It doesn't exist anymore. Kill it. And that's the mentality
Starting point is 00:30:17 of someone who doesn't understand how developers might operate. A developer is not going to want to sign up for a course or fill out a form. They're going to want to try the product. They do that through open source. Open source, to me, is not a competitor to Daxter Plus or Enterprise Offerings.
Starting point is 00:30:34 Open source is like a channel. It's a channel where people get to try it. If people go out and they're successful with open source and they never want to talk to us, that's totally fine by me. That's another Daxter user out there in the wild talking about how great Daxter is. That's free marketing. And so for me, open source is part of it.
Starting point is 00:30:51 And you really have to understand developers to be able to market to them. And that's really kind of why this marketing journey between DevRel and marketing made sense to me. At first, I was suspicious. I think if you asked me as a DevRel person to report into marketing, I probably would have said no. But if you have DevRel and marketing working together and they're all reporting to me,
Starting point is 00:31:11 it kind of felt fine. And I'm seeing it today. It actually works out really well. Yeah, and I think that's also, when you get to the open source stuff, especially when you're trying to do something at scale, it can be, most open source projects are really hard to continue
Starting point is 00:31:26 at scale. So it gives you a way of people like it, they trust it, and then they can go to, okay, how do I make this easier for myself to use over time? Yeah, we see that all the time. Like people don't want to run and maintain infrastructure generally. It can't be the only thing because often the companies that are good enough at using Dijkstra, they can figure out how to deploy Dijkstra themselves eventually. It's not that hard. So you do need to have things that are value-driven in the enterprise offering, hopefully, that will drive people to that. But also, it's easier to get open source into an enterprise than it is a vendor. So if I work at a big company and I really like Dijkstra, will I go and try the open source product and prove its value? Or will I get into this long, lengthy, lawyer-driven
Starting point is 00:32:11 vendor negotiation thing before I've even shown it to my peers that it's a good idea? I'll often start open source. I'll build some momentum. And then once we've proven out its value, we've hit either scaling limits or I just don't want to maintain it or we want additional features. I've proven it's useful. I can go and have that conversation and I'll go contact a sales team and have them start. But knowing that's a journey that people go through is, I think, critical in building out technical orgs that market to technical people. Yeah, I couldn't agree more with that. And there's this other component to where you've got
Starting point is 00:32:47 a team that's vetting a product, proving it works. Imagine that you're going through a traditional enterprise sales process. And I've done multiple of these where you don't get to see, touch, do anything with a product until basically the money has changed hands. It's been a while since I've done one of those type deals, but I've done those before. And those are scary as a technical person. A lot of times, and a lot of times is maybe driven by marketing or sales, for example. They've got to have this product,
Starting point is 00:33:15 and then you as a technical person stuck with you've got to integrate to implement it. So number one, for people who have been around a little bit, they have that in the back of their mind as far as the alternative and hate it. And then number two, you have this other practical competitor in a sense where the open source product keeps you, I think, honest as a company. Where if you ever were to 10x your prices overnight, people
Starting point is 00:33:40 could switch to open source, for example. But if you're a traditional enterprise-type thing and you connect your product and people are kind of stuck because it's hard to replace, then people are stuck and they have a lot of pain to switch. So I think that's another component that I've always appreciated about open source. Well, and I think the other one with that is, when I first came on to WriterStack on the marketing side,
Starting point is 00:34:04 one of the things that I told them was I was talking to someone on the marketing team and they were like well we really want RudderStack to be the reason you get your next promotion and my reply to that was I don't know anyone on the data side who buys software to get promoted
Starting point is 00:34:19 I know people who don't buy it because they don't want to get fired and the open source kind of helps you bridge that gap where we're not saying like I know people who don't buy it because they don't want to get fired. That's funny. And the open source kind of helps you bridge that gap where we're not saying like, hey, I need to make a really big commitment that's going to take time to implement. And I really hope it goes well or I'm not going to be here in a year. Yeah. Yeah.
Starting point is 00:35:10 I mean, the other thing we've seen is like if you really want to get promoted promoted, is you build Dynex for first principles, and it takes three years, and then you quit. Right? You get that staff level engineer, and then you just like, all right, I'm out of here, off to the next one. And then what you've built is like an in-house shitty version of a product you could have bought. Right? So there's two sides of that. I think the open source just makes it easier for everyone. There's this idea, you might be able to avoid vendor lock-in as well which i think really is appealing to people but i mean there's also great software that doesn't have open source and people buy it and love it there's technical things you can do with it but i think we all as engineers have seen those like monster implementations that promise like often the best ones are the ones that promise you have no need to talk to your engineers at all when you implement it. You just plug and play and click a few buttons and you're in. And then as soon as the deal is signed, oh, by the way, where's your engineers? We need them to come implement this thing we've never heard of before. That's the thing I think everyone wants to avoid. The other version of that is, oh, we're going to handle everything for you.
Starting point is 00:35:44 We're going to help you along the way. And then you sign the deal and they and you say okay how do we migrate this data and they go oh well it has to follow these this standard we don't do anything before that that's all on you it's like well that would have been nice to have known a month ago yeah yeah okay so we played this game on the show where we see how far into the show we can get without mentioning AI. I don't know where we're clocking in today. I think we did okay.
Starting point is 00:36:12 But I want to talk a little bit about AI and we got to talk about orchestration. You know, I think Daxter is a tool you can also use to orchestrate when you're, you know, pulling data together for AI or doing other things. I'm curious, like, what are people actually doing?
Starting point is 00:36:27 Maybe people using Daxter that are more on the cutting edge of using LLMs and maybe even AI agents. What are people actually practically doing with AI and orchestrators? Yeah, we see a lot of data prep for AI within Dynastar itself. We even see some companies building foundational models and doing experimentation, but that is like I would say cutting edge.
Starting point is 00:36:55 But bread and butter use cases, at the end of the day, I think AI engineering is data engineering and we even believe data engineering is software engineering. So if you follow this logical conclusion, it's all really the same thing. You're moving around data, you're transforming it, you're storing it, you're converting it, you're embedding it, you're calling APIs. Is that data engineering or is that working with open AI and LLMs? That's one and the same. Often
Starting point is 00:37:19 what we find is actually AI engineering is a little bit easier than ML engineering because you're relying a lot on these third-party providers, for example, for embeddings. You're not training models. It's not for you. You're really just experimenting and putting things out. And so we've seen a lot of companies do things like,
Starting point is 00:37:41 I mean, RAG is the big one, right? Everyone's trying to, like, AI is great, but it needs context. Without context, it's often garbage. If you go to OpenAI or Cloud today and you ask it to write a Daxter pipeline, it's often going to write really terrible code because it was trained on, like, Daxter code
Starting point is 00:37:59 from three years ago, which probably isn't valid anymore. But what we've done is we've built internally a RAG model that uses our documentation, our GitHub issues, our GitHub discussions to power what we call Ask AI. It's a Slack bot in our Slack community. And it does really good. Is it perfect? No, but it's a lot better than nothing.
Starting point is 00:38:19 Yeah, I've used it. It's pretty great. It's pretty good, right? Yeah. Not bad for a POC, and we can always make it better. Sometimes it gets confused, but it's better than it's pretty good right yeah uh not bad for a poc and you know we can always make it better sometimes it gets confused but it's better than not getting an answer which is always what i tell people so context is everything i think in ai and so what is context context is data right so ingesting our data transforming it picking the right ones adding metadata running experimentation on those different context windows, on different models. That's really where the extra thing shines. It's just like running these pipelines. So help me out with this. There is
Starting point is 00:38:52 basically a clone. Think about a data stack or the modern data stack from 2021. There's a clone of almost every single component that's like AI focused, right? Like there's orchestration tool, ETL tool, database specific. And I'm personally not super knowledgeable about each of those components when it comes to AI. Do you think that stays, or do you think it all gets consolidated back? Because it's not that different. Yeah, that's a good question. Maybe the vector databases stay.
Starting point is 00:39:23 If they're lucky, it's my best guess. Or do they, but I don't know technically how hard that would be to implement, you know, for Snowflake and Databricks to implement that. Most databases implement some type of embedding already. Yeah, right. Snowflake already has a vector version of their database.
Starting point is 00:39:43 Postgres has vector embeddings now. I think even MotherDuck, DuckDB have it. Is it that hard to store a vector of numbers? Probably not. There might be added benefits to using a dedicated vector database for I don't know. Those are going to become specialized
Starting point is 00:39:59 cases that you run into. That's my guess. And outside of that, the ETL stuff, I think we love reinventing things. My guess is most people who are getting into AI today, they're not coming into it from a background in data engineering.
Starting point is 00:40:15 Yes. And so they just don't know the tools. So if you don't know the tools, you think you have to invent things, right? Or maybe you just want to build new things
Starting point is 00:40:21 because old things are boring. Some of those will probably stick around because they'll be good enough that everyone uses them and they evolve. I think a lot of them will fall by the wayside when we realize AI problems are actually data problems and we have data tools to solve that already. I think a lot of people still, there was this confusion I feel like I still hear around there, which is this idea of we should be replacing all of our deterministic processes with AI. But I don't need it to give me seven different answers to it. I just want the one answer that's right every time. Yeah, I mean, it's people using AI as a calculator.
Starting point is 00:41:00 And it's like, well, it's a very expensive way to warm up the world. So I don't know. Maybe we don't need to do that. I don't know. Sometimes all you need are if statements and a regex. And maybe AI can replace that. But at the end of the day, whatever is faster is what's going to work for people. Right.
Starting point is 00:41:16 I think on that one, AI is just going to replace me having to look up how to write the regex. Yeah, that is a decent application. So, yeah, along the AI kind of questioning, I mean, you just kind of alluded to this. I mean, it's still very expensive. And the billions of dollars being poured into these companies mask the expense for now. Like, just this week, it came out that the $200 a month plan
Starting point is 00:41:42 still loses money for OpenAI. And I think they weren't even necessarily expecting that. And of course the thought here is like, okay, we're going to keep investing money in this and we'll have better hardware that's going to drive costs down. We'll have better models that don't have to be you know trained as you know in the same way to reduce cost well i mean this is just speculation at this point but it'll be interesting and i'm curious your take what does that curve look like because of it because eventually like the money i think could run out before we get to that spot but i mean i don't know what do you think just speculation on what what might happen there.
Starting point is 00:42:27 I mean, there's already some evidence of plateauing. Do you remember the great VC-funded days of Uber and DoorDash where it didn't cost anything to use these tools? And if you were smart, you would just abuse them as much as you could. You would get the referrals and the $100 here and the credits there there and there's like five cents to cross the city you can get free food pretty much every single day and that was wonderful and then the company's in public and it would cost like fifty dollars to go five miles right i know yeah exactly anywhere near an airport it's like at least fifty dollars even if you're just going across the street yeah it was supposed to be better it's supposed to be this utopia and it ended up just being a company that makes money off people.
Starting point is 00:43:07 And they did so at the expense of killing their competitors. So will AI be the same way? I don't know. Probably. People need to make margins at some point. Cash is not infinite right now. It's really driven off massive amounts of funding.
Starting point is 00:43:24 At some point, that'll change. We'll come down for sure, but when the margins go down, like the research also slows down. And so they will probably plateau and we'll probably find them useful in some limited capacity that's probably not going to fundamentally solve AGI,
Starting point is 00:43:44 for example. And I think we're seeing also that's probably not going to fundamentally solve AGI, for example. And I think we're seeing also that having the best model is not really much of a moat at this point. So it's not like you can say, well, yeah, we're going to spend billions, but once we get it there, we're going to capture everything. It does sound a bit like that Uber time of it's like, profits don't matter, we just need to capture market. And then eventually, once we capture the whole market,
Starting point is 00:44:10 we'll make money off of it. Yeah, it's tough to capture the market when really it's a commodity too. So I think where AI differentiates, it's through products, actually. So anyone can build a model these days. A lot of them are good. There's great open source models out there.
Starting point is 00:44:29 Integrating that model in a workflow is where differentiation I think really happens. Great companies who really understand that can make it a lot better. I think Anthropic and Cloud, for example, do a really good job with their projects and the way they've sort of structured Cloud
Starting point is 00:44:46 to make it very useful in particular contexts for solving these problems and discussions. I use it all the time. OpenAI, maybe not as good, I would say, product-wise as Anthropic these days. They have more features that I don't end up using, but purely from a chat agent with documentation store i think claude does a better job yeah i imagine in a few years we're going to find companies that like really
Starting point is 00:45:09 get the product perspective right and they built really cohesive products which are really powered by ai rather than just like an ai chatbot that is really good at generating responses which i think we've sort of hit a peak on, regardless of how much better they get. Yeah, the other one it makes me think of a little bit is like satellite telephone stuff, where it costs a whole lot of money to get the satellites up and to get the infrastructure there.
Starting point is 00:45:37 And once you had done all of that, it was really hard to make money off of it. But then when the next people came around and were just using the infrastructure that was already out there, you could make a profitable model, like business model off of it. But then when the next people came around and were just using the infrastructure that was already out there, you could make a profitable business model off of it. Like with a GPS, for example. Even satellite phone.
Starting point is 00:45:52 It's still around, and the companies are more profitable with it because they didn't have to pay to put all the satellites in there. Yeah, that's interesting. Yeah, so we have a few minutes left here. I'll throw this to Matt. So Matt, you've spent a little bit of time
Starting point is 00:46:07 with Daxter recently. And I'm curious, and you've got a data background. Matt worked for a publicly traded company in data. I'm curious, yeah, has Daxter and the orchestration landscape strike you with what you used in some of your previous roles? Like, how is it different? What's the evolution like? Well, so most of the places i worked we didn't really have an orchestrator so we had some more like pipeline related things but we didn't have like a dedicated orchestrator
Starting point is 00:46:36 and a lot of it so it's been an interesting little journey having to get to know it a little bit more and you know try to sometimes wrap my brain around the concepts because i think that's usually it because a lot of i mean there's a lot of stuff that you get into like okay i'm planning things i'm putting them in sequence or in parallel those types of ideas a lot of it then comes down to what's the framework that they're using to talk about these things what's the language they're using? What do they label this stuff? So, yeah. So, I mean, overall it's been, I have the added twist
Starting point is 00:47:09 that I'm also including Rudder Stack into this with some new stuff. So that's thrown some interesting frustrations the time just learning the two things at the exact same time. But I mean, overall it's been, it's one of those things that I can look at and i can see like oh here's how i could have used it yeah yeah oh yeah when i had a team of 15 this is how we could have used this right the one thing though i always i had to
Starting point is 00:47:38 think about back then was kind of to go back to a point that you made much earlier in that there's this newer generation of people who are data scientists or whatever, and they got taught a very applied way of doing things, which typically was very software-centric and how do I call the function to train a model or whatever. And so when you get into that more broader, kind of closer to software engineering world, they sometimes get a little scared.
Starting point is 00:48:10 And so you really had to pick stuff that you knew you could quickly get them in and get them learning with. So remember, we had a software engineer as a contractor once, and he was going to show us how to modernize our stuff and he did this whole thing of just basically tearing things apart building it from scratch and trying to show it how great it was and i was like okay that's great but no one but you can run
Starting point is 00:48:37 this right like i got a team of people that when you're not here i need you to run it whereas something like daxter is definitely one that you could see okay i can get a team of people that when you're not here, I need you to run it. Whereas something like Daxter is definitely one that you could see, okay, I can get a team of people to be up and running with this. I think that's a really big deal. Two things I thought of from my previous experiences, because I'd use, it's actually funny, I'd use the product called Rundeck. Adrian, I don't know if you're familiar with that one.
Starting point is 00:49:03 It's like a little bit more than a Windows task scheduler, but before we had that DAG type concept. It's interesting when you go through what you would do every day, and now you have words and language for it.
Starting point is 00:49:19 I think that's the most interesting thing about finding a good framework for oh, I didn't know I was doing orchestration. Like I just, you know, schedule this around this and this. I think that's one of the things. And then the second one,
Starting point is 00:49:31 which Matt just touched on, which I talked a lot about. And I think orchestration is a big deal here, here. When you move, when your data team moves from like one, maybe two people to be more of a team, it was three, four three four five however many people
Starting point is 00:49:46 that conversion from what i call single player mode to multiplayer mode it's a really big deal the tooling becomes a bigger deal the version control the you know and i think like dbt for example is one thing that i think is a big deal if you're moving into multiplayer mode for your data team like dbt and people in that transformation layer having a solution think is a big deal. If you're moving into multiplayer mode for your data team, like DBT and people in that transformation layer, having a solution there is a big deal. With orchestration, same thing. Where you're now using the same framework,
Starting point is 00:50:19 there's less esoteric-ness when how do we schedule a job is defined. We use this, it has specs and documentation. And I think knowing that, because I've been a part of at least one company where orchestration had the name and it was an employee named Gary. And so he ran everything. And when he left, nothing could run. Versus if you, and then we were scrambling, whole group of us to try to get things back together right but we
Starting point is 00:50:46 also didn't have any like we didn't have the language because this was almost 10 years ago now to be able to be like okay now what we need to do is get this into an orchestrator so that we're not dealing with this anymore yeah and even just i think the language of how do i talk about these things okay these are assets and stuff like that. Giving language to that can be very helpful in just helping, I think, a lot of times people get out of the kind of limited mind frame they're in, if that makes sense. Especially when you're talking about things like, what does data as a product mean? Well, to a lot of data scientists who are very new, it means the model I built and explaining to them, well, no, you have to have this. It's the end-to-end
Starting point is 00:51:29 collection to delivery is the product, not just this little part that you build. One last take. Pedro, maybe specifically for Dagster or generally for orchestration, where do you think this goes in the next couple of years?
Starting point is 00:51:47 What are the core problems in the space to solve for orchestrators such as Daxter? Yeah, it's a good question. I think one of it is something we just touched on, is that not everyone knows what an orchestrator is and when they need it. And so I think at Dxter, we have like two sort of big priorities. One is just helping generate awareness of what orchestrators are,
Starting point is 00:52:11 what a data platform is, the fact that you probably already have one and like how to think about observing and having a single place to look at these things, right? You can't just go to Gary every single time. And so having one place where you can understand where everything is supposed to run, that's, I think, a big piece of it. And the other is also just like lowering
Starting point is 00:52:29 that adoption curve for people. So finding ways to make it easier, more plug and play to use Zangster with existing playbooks that you already have and are pretty common across the industry. Building those out without losing sight of sort of the power of Python and Dengster itself is kind of what we're focused on.
Starting point is 00:52:49 Yeah, makes a ton of sense. Well, thanks for being on the show. It's been really fun. Matt, thanks for being here. And we'll catch everybody in the next episode. Thank you. All right, thank you. The Data Stack Show is brought to you by Rudderstack,
Starting point is 00:53:04 the warehouse-native customer data platform. Rudderstack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at ruddersack.com.
the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.