The Data Stack Show - 224: Bridging Gaps: DevRel, Marketing Synergies, and the Future of Data with Pedram Navid of Dagster Labs
Episode Date: January 15, 2025
Highlights from this week's conversation include: Pedram's Background and Journey in Data (0:47), Joining Dagster Labs (1:41), Synergies Between Teams (2:56), Developer Marketing Preferences (6:06), Bridging Technical Gaps (9:54), Understanding Data Orchestration (11:05), Dagster's Unique Features (16:07), The Future of Orchestration (18:09), Freeing Up Team Resources (20:30), Market Readiness of the Modern Data Stack (22:20), Career Journey into DevRel and Marketing (26:09), Understanding Technical Audiences (29:33), Building Trust Through Open Source (31:36), Understanding Vendor Lock-In (34:40), AI and Data Orchestration (36:11), Modern Data Stack Evolution (39:09), The Cost of AI Services (41:58), Differentiation Through Integration (44:13), Language and Frameworks in Orchestration (49:45), and Future of Orchestration and Closing Thoughts (51:54).
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Discussion (0)
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.
All right, welcome back to the Data Stack Show. We're here with Pedram Navid from Dagster, the Chief Dashboard Officer. Pedram, welcome to the show.
Great to be here. Thank you.
Yeah, so I think this is your second time on the show. It's been a little over a year.
We'd love a quick kind of update and then tell us a little bit about your current role.
Yeah, I think last time I was here, I was enjoying consulting life,
which meant lots of birdwatching, lots of looking outside, being outside.
Since then, I joined Dagster Labs about a year and a half ago,
initially to run data and DevRel and now also marketing.
So far less time to do birdwatching.
That's too bad.
Yeah, it's too bad.
Back to the grind, as it were.
Okay.
So we've been spending a few minutes chatting, preparing for the show.
I'm excited to kind of get into how you've gotten to this point
and orchestrators in general.
What are you looking forward to chatting about?
Yeah, I mean, I can always talk about orchestration.
We'll talk about data platforms.
How we got to where we are could be kind of a fun story.
We can always talk about AI.
We can talk about data engineering and how you somehow accidentally end up running marketing.
Could all be fun.
All right, I'm excited.
Let's do it.
Hey, Pedram, excited to have you.
Let's talk a little bit about how you ended up at Dagster.
So you were doing consulting, had some time to kind of work as you pleased, and now you're
back at a startup.
So tell us about that process.
Yeah, I think what happened is I was actually consulting for Dagster initially.
We had a great relationship, and Pete and Nick, the CEO and founder, asked me if I wanted to join. Initially, I said no, because I was enjoying my freedom too much. But one thing I found with consulting is your scope of work is often limited, and you don't get to see things, you know, fully end to end. And I also kind of missed the camaraderie. So I said, hey, you know, if that offer is still on the table, I'd love to chat about joining. And so we talked about a role, which was initially just a DevRel role with, I believe, maybe data on the side as well. A small team of two or three people. And that was almost June of last year. And so it's been a year and a half now since I've been here. A couple months ago,
we took on marketing as well as part of DevRel, which initially I wasn't so sure about. But now
that I've seen it operate, it makes a ton of sense for DevRel marketing to be close together
and working together. Yeah, that's really interesting. So I've had a previous experience
too, where I ended up having a data team and marketing as well. Tell me about maybe some of the
unexpected synergies there. You've got DevRel, you've got marketing, data kind of on the side.
What's come about that you're like, wow, this is cool that this is unified?
Yeah, if you had told me initially that I would be on a DevRel team reporting
to marketing, I probably wouldn't have taken the job because I've always felt like marketing didn't quite
get DevRel. But this way it's kind of flipped. It's like marketing and DevRel are
reporting to me, and I'm okay with that. So what I found is like the DevRel side of the house is like
the content arm. Dagster is a technical product. We target technical people. And so we just need
technical people who have experience in the field to create the content. For me, content's a broad term. It's not just blog posts. It's tutorials,
workshops, webinars, how-tos, actual integrations.
Our DevRel team has built integrations that have won deals. So DevRel is
the producer side of the marketing arm of Dagster.
And then the rest of the marketing org is really in support of the distribution of that.
Where the DevRel team probably doesn't have the expertise
is how to get their content out into the world,
whether that's through paid ads or events or campaigns,
that type of thing.
And so having the two teams together,
it's really actually a lot of synergies.
I hate to use that word, but it is exactly that.
Where they sit together, we're on the same meetings,
every week we talk about what we're working on. And the ads person
picks up on something the DevRel team's working on, and so does the campaign manager.
And then three of them together, they're like, all right, let's go build something more holistic
around that, rather than just this one-off content that you created.
Yeah, that makes a lot of sense. So we've got Matt here co-hosting today in place of Eric.
Matt,
you've been that technical data audience before, you know, looking to purchase products or things like that. I'm curious, and I'll ask you the same thing, Pedram: what really clicks
with you, if you think back about content, or maybe even just interacting with people around
these technical products? What mediums, what can you think of, that's really clicked with you in the past?
Yeah, I think anything that lets you kind of see how the product actually works in a real kind of
way, and not just in the super trivial, look-one-plus-one-equals-two type of manner, right? I
think that helps, especially because, previous to a lot of this,
it was very marketing-y.
So it was everyone feeling like
they were trying to bait you into
giving them your information
or trying something or whatever.
So things that kind of give you that ability to see it.
And I think they have that credibility
of professionals who've used it
and who can show you,
this is what it's actually going to help you with.
So not like the
10 tips for personalization
in your marketing using data.
So same question to you,
Pedram, what have you found that works?
Because it's the data, at least in my
opinion, that data technical audience
is a tricky one. It's a tricky one to find,
a tricky one to
resonate with.
It is. And it's almost like this meme of, developers hate being marketed to, and I don't think it's true.
I think developers need a certain type of marketing that works for them along their journey,
and their journey often might look different than, you know, someone in a leadership role, for
example. And you just have to... because a developer is going to sit down,
and what they want to do, almost every single time,
is they want to try the product.
They want to figure out, is this thing what it says it is?
Is it useful for me?
Will it work in the way that I need it to?
And so a lot of the DevRel focus
and the focus of Dagster's marketing arm
is to enable developers to be successful
in their entire journey from becoming aware of the product, trying it out, learning about it.
And so things like docs matter a lot more to a developer than they might to a technical leader,
even. As a director of data, for example, you probably aren't going to sit down and try Dagster.
You might care more about what are the features, benefits, how is it solving the five things my
CEO keeps yelling at me about.
But your data engineer is going to want to actually try the product
and make sure it hits the things that they actually care about.
Well, and that can also be a little tricky
just because the technical ability of data people,
there's a pretty wide spectrum you can fall on there.
There's some that are very, like they came from software engineering.
And then there's others that are very self-trained and might be coming more from the, I'm doing data engineering or whatever because I have to and no one else is here to do it.
And so I'm scraping together YouTube tutorials and stuff like that.
How do you kind of, do you guys have a specific part of that you're targeting or do you try to kind of have content more of a wider swath of that spectrum?
We definitely try,
as much as possible, to cover the whole range.
You have to.
What I've learned
is that not everyone is me.
And like, I like a certain way of learning
that other people don't.
And people like ways of learning that I refuse to use.
A great example is Dagster
University. It's something we spun up last year. It's an online course, it's structured,
you go through lessons, and that's the last thing in the world I would ever want. And when they
suggested it, I'm like, I don't know about this, guys, but all right, we'll try it. And people love it. They
love it. We get five out of five... like, if you look at our ratings, it's 4.8 out of five.
And we get weekly emails saying how much they enjoy it.
And it was completely foreign to me because that's not how I learn.
What I've learned is you have to provide something for everyone.
There's people who want structured training.
There's people who want to just read the docs.
Some people just want to install it and look at the source code.
That's the range that you have to deal with.
And all of that has to be good.
Your source code, your documentation in your app,
your code has to look good in a way that people can understand
and interact with it.
All the way up to your tutorials and video.
People want to sometimes sit down and watch a 30-minute training video
on the product as well.
And so we do all of it.
And we hire for that too, right?
We have people on the Dufferil team who are much more focused on the earlier as well. And so we do all of it. And we hire for that too, right? We have people on the Dufferil team
who are much more focused on the earlier stage persona,
people who are getting first started.
And we have people focused on much more
deeper understanding of the product as well.
Yeah, I'm definitely one of those
that I do not want to watch a video.
For whatever reason,
I cannot sit through a 30-minute technical video.
I'm one of those that wants to pull up the code,
reference the docs when needed,
struggle through it, and then think to myself,
I should have just watched that video.
So yeah, that's a really wide persona.
And Dagster is a super flexible tool.
You can use it a lot of different ways.
And that's got to be a challenge as well,
where I come to it with like, oh, I have this specific problem. And you've got a tool like,
well, we can solve lots of problems. Like, how do you bridge that gap?
That is also a great question. We are looking to bridge it through product improvement. So we have
something coming up called Dagster Components. I don't know if I'm allowed to leak it yet,
but it is coming. It will be more focused on providing
almost like building blocks
to develop a data platform.
And so it'll be a command line based tool initially,
but you'll have like your YAML schema,
you'll have very easy ways to plug and play
different integrations.
That's like our approach to sort of addressing that
while always being able to expose
the underlying Dagster framework, which as you said, is extremely flexible, which has both its pros and cons.
The pros is like you'll never really be constrained.
If you can do it in Python, you can do it in Dagster.
That's essentially your only limitation.
Right.
The cons can be like for a very simple setup.
It can often feel like a lot to go through if you just want to orchestrate one simple task.
Yeah, that makes sense.
So let's zoom out a little bit for people
that have no idea what Dagster is,
maybe have never even heard of orchestration,
like that kind of analyst persona.
How would you describe just the general field
that you all are in, the data orchestration field,
to someone that was like, I have no idea what this is?
Yeah, it's a great question. Everyone orchestrates. They just might not do it
intentionally, or they might not know that they're doing it, right? Orchestration could be as simple
as you log into your computer once a week and you click on a button and you kick off a process.
It's a very manual orchestration, but it's totally fine and often it's the right decision for you.
It can become a little bit more complicated
when you start to use something like a cron scheduler
that runs every single day or every single week at a certain time.
And that's often enough for many tasks.
When things start to get a little bit complicated
is when you need to add dependencies
or you need to be resistant to failures, essentially.
Once those two things come into play,
like you want to make sure that A runs before B
every single time, you can't rely on cron.
You sometimes can like fudge it.
You'll say, you know, you start at 12
and we'll start this one at three.
And I'll hope it never takes more than three hours.
And it will always succeed.
And if that's true,
you probably don't need an orchestrator.
But often what happens is, I think, people realize they need an orchestrator a little too late. What they thought was true no longer
is true. You can't really observe cron that well, your tasks take too long, something fails,
or even worse, your vendor's like, oh, by the way, that thing we sent you two months ago, it was wrong,
here's an update, now go and fix that. And it's like, well, I can't rewind time, and my cron schedule
doesn't know how to rewind.
And so once you start to get into these types of things,
that's where orchestrators come into play
and they start to manage some of these
more complexities for you.
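To make the "A has to run before B" point concrete, here is a minimal sketch, not from the episode, of how that dependency looks when it's explicit in Dagster's Python API rather than implied by staggered cron times; the asset names and logic are made up for illustration.

```python
from dagster import asset, materialize

@asset
def raw_orders():
    # imagine this pulls from an API or a vendor file drop
    return [{"order_id": 1, "amount": 100}]

@asset
def cleaned_orders(raw_orders):
    # declaring raw_orders as an input makes the dependency explicit:
    # this asset only runs after raw_orders has materialized successfully
    return [order for order in raw_orders if order["amount"] > 0]

if __name__ == "__main__":
    # materialize both assets in dependency order (handy for local testing)
    materialize([raw_orders, cleaned_orders])
```

The dependency lives in the code itself, so a failure in raw_orders simply stops cleaned_orders from running, instead of silently producing stale data three hours later.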
I feel like you said there that, you know,
where you bring it in probably later than you should.
I feel like that's a recurring theme
for a lot of successful data things are,
you know, if you would have brought this in two
months ago, this is a five minute fix. Now we're very limited in what we can do and that type of
a thing. So, but I also don't know if there's a way around that. Yeah. I mean, you don't know
what you don't know. Right. And especially if you're doing something for the
first time, it's like, oh, this works. And then, that's like my favorite, because I
think most people, if you're in a data role,
at least get to that time gap thing where I'm going to have this run at midnight, this run at
2am, this run at 5am, everything's fine. And then usually, if you get in that world, you have some
bad mornings where the first one failed. And then it's kind of a house of cards, and because some of these take, you know, maybe hours to run, you're kind of sunk. You've basically lost
a full day of having the data correct.
I think that's experience, right? Like, you get burned,
hopefully only once, and you learn your lesson. Or you work with people who have been burned and
they've learned their lessons and they'll impart that on you. Or you'll listen to the Data Stack Show
and you'll learn about
things not to do.
It's also human nature.
I think it's so much easier.
This is why ice cream tastes good. You don't
really think about the consequences.
Running pipelines on Cron feels good because you don't have to
think about the consequences until it's too late.
We try to educate
people about how it's probably easier.
It's not that hard
to set up a pipeline in Dagster.
It's just Cron.
You can do it just in Cron.
You don't have to use
any of our advanced features.
We have a Cron scheduler.
We have it in Dagster.
And you'll get a pretty UI,
which is more than you normally get
out of Cron.
Yeah, sure.
And that's worth its weight in gold.
And then from there,
you can evolve as you need to.
You don't have to go and build these complex dependencies
if you don't want to
but get started with something when it's simple
when it's just a few tasks, a simple dbt pipeline
very easy to do in Dagster
we've got a great dbt integration
or do it in a different orchestrator too, it doesn't have to be Dagster
there's others out there
but get it in something that you can observe
because I think every engineer
knows observability and logging are critical to any system.
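As a rough sketch of the "it's just cron, but with a UI" idea, a single asset on a daily cron schedule might look something like this; the names and schedule are illustrative, and running `dagster dev` locally gives you the web UI on top of it.

```python
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job

@asset
def nightly_export():
    # whatever your old cron script did goes here
    ...

nightly_job = define_asset_job("nightly_job", selection="nightly_export")

defs = Definitions(
    assets=[nightly_export],
    jobs=[nightly_job],
    schedules=[
        # the same cron expression you would have put in crontab
        ScheduleDefinition(job=nightly_job, cron_schedule="0 0 * * *"),
    ],
)
```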
Yeah, that makes a lot of sense.
I've used Dagster for a couple of projects.
And this was kind of interesting.
It was last weekend.
So it was two weekends ago.
It was around that New Year's, Christmas holiday.
And I got an error.
I had set up alerting.
I got an error, which was handy.
And I thought, like, you know, what is this?
Like, it's like, I better check on it.
And sure enough, you know, it's an API,
like access denied type error
because I was pulling data from an API.
So like, what happened?
You know, figure out, like, do I need to,
did the credentials expire?
What happened?
It was funny.
So essentially what happened is I was pulling data from,
it was like 28 different locations on this project.
And essentially one of the locations had closed at the end of the year.
But since I had everything like separated out, it was like, okay, cool.
Like I can just like turn that location off and like everything keeps going
and it's not a big deal.
I think those are the types of things
that, like, you know, had it been the other way, where essentially everything cascades through and
you're like, oh, I'm going to have to rewrite a bunch of stuff, etc., those are the fun moments. So
I guess I'm curious from your perspective, obviously there's lots of different orchestrators out there,
what's special about Dagster, and maybe even what's special about Dagster for analytics orchestration
specifically? Yeah, orchestration
has been around for a long time.
I think like Cron is like the classic, right?
From there, I think
Airflow is probably the next biggest orchestrator
most people have heard of.
And that's a task-based orchestrator, right?
So you've got a thing you want to do, you tell it
and it runs and it's like a black box and you sort of
hope every box continues the way you want it to,
but you have no ability to peer into the box.
What Dagster sort of said is like, what if we split that or reverse that
and instead of telling us about the task, tell us about the things you actually care about
or let us discover those for you.
So a great example is, I think, a dbt project, everyone sort of kind of gets what
that is. It's a collection of like tables that you want to materialize at some, you know, regular
cadence. The traditional airflow way would be to have a dbt task that just runs your dbt project,
and then you sort of assume all those models in there are completed. In Dagster, what we do is we
flip that around, and we actually expose every single model as an asset. And so Dagster is what we call an asset-based orchestrator, because everything
you care about is now represented in this big graph of things that you can sort of follow all
the way through to their logical conclusion. And so you can see all your dbt models within the Dagster
view. And you can actually be kind of clever about it. You could run the whole thing at once every
single day if that's what you want.
Or you can say, you know what?
My stakeholders care about these five models.
Run everything that depends on those
on a five-minute schedule
because they really want those things to be updated.
And then these other models over here,
those, put them in a group that runs once a day
whenever you feel like it doesn't really matter to me
as long as they're refreshed daily.
That's something you can start to do with Dagster.
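For a rough picture of that pattern, here is a hedged sketch built on the dagster-dbt integration mentioned later in the conversation: every dbt model is loaded as its own asset, and two different selections run on different schedules. The project path, model name, and cron expressions are assumptions for illustration, not a prescribed setup.

```python
from pathlib import Path

from dagster import AssetSelection, Definitions, ScheduleDefinition, define_asset_job
from dagster_dbt import DbtCliResource, dbt_assets

# manifest.json is produced by the dbt CLI (e.g. `dbt parse`); path is an assumption
DBT_PROJECT_DIR = "my_dbt_project"
DBT_MANIFEST = Path(DBT_PROJECT_DIR, "target", "manifest.json")

@dbt_assets(manifest=DBT_MANIFEST)
def my_dbt_models(context, dbt: DbtCliResource):
    # every dbt model shows up as its own asset in the Dagster graph
    yield from dbt.cli(["build"], context=context).stream()

# the model stakeholders care about, plus everything upstream of it, every 5 minutes
hot = AssetSelection.keys("orders_summary").upstream()
hot_job = define_asset_job("hot_models_job", selection=hot)

# everything else once a day
rest_job = define_asset_job("daily_models_job", selection=AssetSelection.all() - hot)

defs = Definitions(
    assets=[my_dbt_models],
    jobs=[hot_job, rest_job],
    schedules=[
        ScheduleDefinition(job=hot_job, cron_schedule="*/5 * * * *"),
        ScheduleDefinition(job=rest_job, cron_schedule="0 6 * * *"),
    ],
    resources={"dbt": DbtCliResource(project_dir=DBT_PROJECT_DIR)},
)
```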
And then because you have this like asset view,
you can start to connect things outside of DBT as well
in a really intuitive way.
Maybe you have a BI dashboard in Sigma.
Maybe you have, you know,
some stuff happening in RudderStack
that you want to connect it to.
Some files dropping into S3 bucket, FTP.
All these things start to connect
and you build lineage on them.
And so you can be really clever about the full end-to-end orchestration of this thing rather than just focusing on a specific
task. And so Dagster has really been, I think, the next level of where we are going with orchestration.
And in fact, Airflow is even starting to move in this direction, which I find really validating,
that this is really the future of where orchestration is going.
Yeah, I think two benefits that I've seen
from this asset-style orchestration has been, essentially, what you said. One is time compression,
because if I have separate extract jobs that then load into a warehouse, and then I have to
transform, and it's all linear, there's just a limit to how fast, as in how up to date, I can get that one
report that I need to be fast.
But since everything is compute-based now, there's also a cost
implication, right? Because if I can compress some of these times for the ones that I want
to be really fast, I can also do the opposite for things that I only need
once a day. Before, I was running this whole thing and everything was
every five minutes; now I can delay the 80% that I don't
care is a day old. And that's compute savings in your warehouse,
potentially savings in your ETL tool.
I think that's a big deal.
You could take it even further.
Because you've exposed this data lineage,
you get all these side effects almost for free.
And that's something we've actually learned ourselves.
It's like, now you have this data catalog, essentially.
You understand all your data assets,
and you have the source of truth of where your data is defined.
Well, now you can search that,
and now you have a data catalog for free.
You don't have to go and maintain a separate one.
Data quality becomes something you bolt on top of your actual execution. It's not an afterthought. It's like as part of your pipelines, you can start to emit
what we call asset checks or data quality things. And like you said, time compression becomes a
much more interesting problem because we can actually be very declarative in Dagster.
Instead of saying we want to run these things every day at 5 o'clock,
you can say, this asset needs to be updated by this time.
Do whatever it takes to make sure that's done.
Make sure you run all its parents whenever you need to.
And now you're limited by only the chain of things that matter to that asset
and not everything that comes before it.
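Here is a hedged sketch of those two ideas: an asset check bolted onto an asset, plus asset-level policies standing in for "be updated by this time, and run whatever parents you need." The FreshnessPolicy and AutoMaterializePolicy spellings have evolved across Dagster versions, so treat them as one possible form rather than the definitive API; the asset, threshold, and row count are made up.

```python
from dagster import (
    AssetCheckResult,
    AutoMaterializePolicy,
    FreshnessPolicy,
    asset,
    asset_check,
)

@asset(
    # "this asset needs to be updated by this time" -- declarative freshness
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=60),
    # "run all its parents whenever you need to"
    auto_materialize_policy=AutoMaterializePolicy.eager(),
)
def orders_summary():
    return [{"order_id": 1, "amount": 100}]

@asset_check(asset=orders_summary)
def orders_summary_not_empty():
    # in practice you would query the table you just materialized
    row_count = 1
    return AssetCheckResult(passed=row_count > 0, metadata={"row_count": row_count})
```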
So we get a lot of really, I think, nice side benefits of this asset view
that I don't
think we really knew we were going to get when we first started going down this path, but it's
become really interesting.
Well, and that I think speaks to one of those things that you see is that
a lot of teams find themselves kind of, they're drowned in whatever their process is. And so they
can't really see what the next thing they could be doing is. And it's only once they kind of free up that
space or that mental thing, because, okay, now I've got Dagster that's running this and I don't
have to think about it. Oh, now look at these other three things that have popped up that we
can do that were never part of our initial plan of, you know, we were just trying to like not
have to spend three, four hours every day, you know, troubleshooting or
fixing or running whatever. And it's like, now that's gone, now we can actually see more opportunities
that we could have never thought of before.
A hundred percent. There's that old cartoon of two cavemen,
and one has a square wheel and he's trying to push it, and his friend with the circle wheel
is like, oh, you should try a circle wheel. And he's like, oh, I don't have time for that.
I'm spending all my time pushing the square wheel up the hill.
And I feel like that's the same way with orchestration.
Often it feels like, oh, just an extra step that I have to go through.
But that extra step is going to compound your productivity down the line.
Yeah.
So I'm curious a little bit about the software space, software stack.
So we're in 2025 now. I think the modern data stack was declared dead last year.
I don't know, last year or two.
And which I think practically means like people are seeing like consolidation essentially.
I'm curious, like some of your thoughts on where do you think that shakes out? Because we've got so many different layers we've added into a data stack of extraction,
observability, orchestration, transformation.
Storage.
The list goes on.
How do you see that playing out in the next few years?
Yeah.
I feel that any time, you're not enterprise ready until you've been declared dead.
That's sort of...
Yeah, exactly.
Love that.
So the modern data stack,
I think is now enterprise ready.
I think it's ready for, you know,
the mass market to adopt.
And what we might call dead,
I see being implemented still.
There's so many companies going through
like cloud modernization efforts.
For sure.
They're moving towards Snowflake, they're moving towards Databricks,
they're moving towards dbt and the cloud.
That's not dead.
So if we define modern data stack as cloud data warehouses
and a few really good tools, that's fine.
I think modern data stack, if you want to talk about the 2020s version of it,
where every function you had to do was its own company.
That's probably dead. I don't think people want 27 vendors to do
three things at the end of the day. And so consolidation is going to happen.
We're seeing it at Dagster. Our customers are asking for us to combine catalog and quality
into one thing. Our catalog will never be as good as a full-featured catalog
that you go out and buy
and pay like a grand for.
Like that's not where we're competing.
But there's probably some elements
of those things that you can combine
within the products you're already using.
That's going to continue.
I mean, I think Fivetran is doing this
with their transforms.
I know you guys at RudderStack
are doing this as well.
Dagster's doing it.
I think it's just natural.
And what's going to happen is what happens all the time.
We see a bunch of consolidation.
People get annoyed at the consolidators.
Some new tool comes out and
it's like, I'm really good at this one particular
thing.
Interest rates go down again. We get a hundred of those things.
It's going to be a cycle.
And I think right now we're just in the
plateau of productivity
area where I think things
slowing down has actually been really good for
data teams in general. You don't have to
pay attention to 500 different things, you can
kind of just put your head down and get your job done
and the tools you're using to do that
just keep getting better on their own, which is
a good feeling. Yeah, I think also
during especially that peak like
2020-ish, 2021-ish
time period,
a lot of teams got very hooked on all the different tools.
And kind of, you know, I mean, I saw where the teams could kind of lose track of like,
well, what is this ultimately supposed to be serving?
You know, well, look, but we've got all these different things.
And we've got all this data in a warehouse.
And it's like, okay, but what's happening to it? How is it actually turning
into revenue
or savings or profit or whatever?
Yeah, I mean,
and it wasn't just data. What I realize
now, I mean, I'm in marketing land a little
bit and the exact same thing was happening
there. What was
going on in marketing is
everyone wanted a tool to solve their
particular use case,
and almost like nobody wanted to do the work, they just wanted to buy tools to do the work for them.
And you end up with these massive marketing stacks with like 40, 50 different tools to do
three things. So it wasn't just us, it was everywhere, it felt like, at that time. But I think
we're now in a better place, where, um, I think interest rates solved a lot of problems, to be honest.
Yeah, sure. Money not being free.
Yeah, it solved a lot of efficiency problems, anyway, I'll put it that way,
right? So, um, we're seeing that consolidation. It might not feel good to everybody, but I think at
the end of the day, businesses are operating more leanly, and they probably aren't, you know, losing a
lot at that expense either.
Yeah, I think that's right. So we talked a little bit about orchestration,
what that is, Dagster's unique twist on that.
I'm curious about your kind of career trajectory.
You mentioned when we were talking earlier,
data science, data engineering,
now you're in DevRel marketing data.
Tell us about that journey.
I think it's a little bit of a unique journey, and I'd be interested how that all played out for you.
Yeah, when I did, I think it was in high school, they ask you to fill out the survey and it'll tell you what kind of job
you should have. I don't even remember what it was, but it was a job I'd never heard of. And I never knew
what I wanted to be when I, like, grew up. I just sort of fell into different jobs
based on what I was interested in at the time.
Data science was, you know, a thing
that was on everyone's mind back in 2018, I think it was.
I was listening to all the data science podcasts.
Many of them are now defunct RIP.
But they were, it was the next hot thing, right?
And so I was like, all right,
I'm going to figure out how to become a data scientist.
And I did that for a few years.
And what I realized was the new batch of data scientists that were coming in,
they weren't as technical as I had been.
I spent more of my time programming than they had.
And so they were great at building models, much better than I was,
because they were trained in it.
But they couldn't deploy them at all.
And so I started building infrastructure just to make it easier for them to deploy, because their code was better than mine. So
I ended up becoming a data engineer by accident, and I found that really rewarding. It was great
to build something, and then the reward is someone using it. Whereas for the data scientist,
the reward is, maybe in a year you'll find out if your experiment was correct. Yeah, right. So
for me, that instant validation of knowing I built something that clearly
works or doesn't, and the person next to me is benefiting, was super empowering. And so
that's how I started in data engineering. Did that for however many years, eventually became a head of
data at a company called Hightouch, which back then was really focused on the data persona.
And as part of that, I was also doing what we
call DevRel, essentially, talking about the product to data people.
Ended up starting a team there, moving
on to consulting, where I thought initially I was going to do data consulting
and help people with their data problems.
But almost every company that talked to me
wanted me to help them with their marketing problems.
And even though I didn't think of myself
as a marketer, I think they saw
the diverse activities I was doing and the success
we were having at Hightouch.
They wanted me to replicate that for them.
A lot of that was just educating them
that copying
the thing that I did won't work for you.
It's not the blog post that's successful.
I think a lot of people look at DBT, for example, and they saw their massive community.
And they thought, oh, I should open a Slack community.
And it's like, well, why? How?
Where's the value to the actual user?
Do you think people want 25 different Slack communities?
Or do you think they want one or two places to hang out?
That might already be a place that's covered for them.
So it was more about talking through what were really marketing principles,
but to me it was just a common sense about how to get to data people in a way that made sense.
And that, I guess, put the mark of marketer on my head
and eventually I joined Dagster, initially as DevRel,
and more recently DevRel and marketing and also data.
Yeah, that makes sense. That trajectory
makes sense. And I would imagine, so the alternative here is like, let's just, you know,
for Dagster, hire a marketer, right? And there's got to be, we've already talked
some about the synergies there, but there's also got to be this, like, scratch-your-own-itch.
You kind of get to market to yourself or to your previous self, which like that has to be an advantage.
I think for a company like Dagster, and for any technical company that markets to technical people, having a technical person who really gets the audience and the go-to-market motion, and like really gets it,
is critical. And I think we've even made mistakes with this as well in the past. Like, we're
an open-source-core product, by our nature we are, and so we shouldn't hide that fact. And I think
if you talk to a traditional marketer, they might be scared that people might use open source
because we're not capturing an email.
So direct them to the email form instead.
Get rid of all the open source things
from our website.
And bury it deeply.
Right.
It doesn't exist anymore.
Kill it.
And that's the mentality
of someone who doesn't understand
how developers might operate.
A developer is not going to want to sign up
for a course or fill out a form.
They're going to want to try the product.
They do that through open source.
Open source, to me, is not a competitor
to Dagster Plus or enterprise offerings.
Open source is like a channel.
It's a channel where people get to try it.
If people go out
and they're successful with open source and they never want
to talk to us, that's totally fine by me.
That's another Dagster user out there in the wild talking about how great Dagster is.
That's free marketing.
And so for me, open source is part of it.
And you really have to understand developers to be able to market to them.
And that's really kind of why this marketing journey
between DevRel and marketing made sense to me.
At first, I was suspicious.
I think if you asked me as a DevRel person to report into marketing,
I probably would have said no.
But if you have DevRel and marketing working together
and they're all reporting to me,
it kind of felt fine.
And I'm seeing it today.
It actually works out really well.
Yeah, and I think that's also,
when you get to the open source stuff,
especially when you're trying to do something at scale,
it can be that
most open source projects are really hard to continue
at scale. So it gives you a way: people like it, they trust it, and then they can go to, okay,
how do I make this easier for myself to use over time? Yeah, we see that all the time. Like people
don't want to run and maintain infrastructure generally. It can't be the only thing because
often the companies that are good enough at using Dagster, they can figure out how to deploy Dagster themselves eventually. It's not that hard.
So you do need to have things that are value-driven in the enterprise offering, hopefully, that
will drive people to that. But also, it's easier to get open source into an enterprise than it is
a vendor. So if I work at a big company and I really like Dagster, will I go and try the open source product
and prove its value? Or will I get into this long, lengthy, lawyer-driven
vendor negotiation thing before I've even shown it to my peers
that it's a good idea? I'll often start open source. I'll build some momentum.
And then once we've proven out its value, we've hit either scaling limits
or I just don't want to maintain it or we want additional features.
I've proven it's useful.
I can go and have that conversation and I'll go contact a sales team and have them start.
But knowing that's a journey that people go through is, I think, critical in building out technical orgs that market to technical people.
Yeah, I couldn't agree more with that. And there's this other component to where you've got
a team that's vetting a product, proving it works. Imagine that you're going through a
traditional enterprise sales process. And I've done multiple of these where you don't get to
see, touch, do anything with a product until basically the money has changed hands. It's
been a while since I've done one of those type deals, but I've done those before.
And those are scary as a technical person.
A lot of times, and a lot of times
is maybe driven by marketing or sales, for example.
They've got to have this product,
and then you as a technical person
are stuck with having to integrate and implement it.
So number one, for people who have been around a little bit,
they have that in the back of their mind
as far as the alternative and hate it. And then number two, you have this other
practical competitor in a sense where the open source
product keeps you, I think, honest as a company.
Where if you ever were to 10x your prices overnight, people
could switch to open source, for example. But if you're
a traditional enterprise-type thing
and you connect your product and people are kind of stuck
because it's hard to replace, then people are stuck
and they have a lot of pain to switch.
So I think that's another component that I've always appreciated about open source.
Well, and I think the other one with that is,
when I first came on to RudderStack on the marketing side,
one of the things that I told them was
I was talking to someone on the marketing team
and they were like, well, we really want
RudderStack to be the reason you get your next promotion.
And my reply to that was,
I don't know anyone
on the data side who buys
software to get promoted.
I know people who don't buy it because they don't want to get
fired.
That's funny.
And the open source kind of helps you bridge that gap where we're not saying like, hey, I need to make a really big commitment that's going to take time to implement.
And I really hope it goes well or I'm not going to be here in a year.
Yeah.
Yeah.
I mean, the other thing we've seen is, if you really want to get promoted, you build Dagster from first principles, and it takes three years, and then you quit. Right? You get that staff-level engineer title, and then you're just like, all right, I'm out of here, off to the next one. And then what you've built is an in-house shitty version of a product you could have bought. Right? So there's two sides of that. I think the open source just makes it easier for everyone. There's this idea that you might be able to avoid vendor lock-in as well, which I think really is appealing to people. But I mean, there's also great software that doesn't have open source, and people buy it and love it, and there's technical things you can do with it. But I think we all as
engineers have seen those monster implementations. Often the
best ones are the ones that promise you have no need to talk to your engineers at all when you implement it.
You just plug and play and click a few buttons and you're in.
And then as soon as the deal is signed, oh, by the way, where's your engineers?
We need them to come implement this thing we've never heard of before.
That's the thing I think everyone wants to avoid.
The other version of that is, oh, we're going to handle everything for you.
We're going to help you along the way. And then you sign the deal, and you say, okay, how do we
migrate this data? And they go, oh, well, it has to follow this standard, we don't do anything
before that, that's all on you. It's like, well, that would have been nice to have known a month ago.
Yeah. Yeah. Okay, so we played this game on the show
where we see how far into the show
we can get without mentioning AI.
I don't know where we're clocking in today.
I think we did okay.
But I want to talk a little bit about AI
and we got to talk about orchestration.
You know, I think Daxter is a tool
you can also use to orchestrate
when you're, you know,
pulling data together for AI
or doing other things.
I'm curious, like, what are people actually doing?
Maybe people using Dagster that are more on the cutting edge of using LLMs and maybe even
AI agents.
What are people actually practically doing with AI and orchestrators?
Yeah, we see a lot of data prep for AI
within Dagster itself.
We even see some companies building foundational models
and doing experimentation,
but that is like I would say cutting edge.
But bread and butter use cases,
at the end of the day,
I think AI engineering is data engineering
and we even believe data engineering is software engineering.
So if you follow this logical conclusion, it's all really the same thing.
You're moving around data, you're transforming it, you're storing it, you're converting it, you're embedding it, you're
calling APIs. Is that data engineering or is that working with
OpenAI and LLMs? That's one and the same. Often
what we find is actually AI engineering is a little bit easier
than ML engineering
because you're relying a lot on these third-party providers,
for example, for embeddings.
You're not training models.
It's not for you.
You're really just experimenting and putting things out.
And so we've seen a lot of companies do things like,
I mean, RAG is the big one, right?
Everyone's trying to, like, AI is great,
but it needs context.
Without context, it's often garbage.
If you go to OpenAI or Claude today
and you ask it to write a Dagster pipeline,
it's often going to write really terrible code
because it was trained on, like, Dagster code
from three years ago, which probably isn't valid anymore.
But what we've done is we've built internally
a RAG model
that uses our documentation, our GitHub issues,
our GitHub discussions to power what we call Ask AI.
It's a Slack bot in our Slack community.
And it does really good.
Is it perfect? No, but it's a lot better than nothing.
Yeah, I've used it. It's pretty great.
It's pretty good, right? Yeah.
Not bad for a POC, and, you know, we can always make
it better. Sometimes it gets confused, but it's better than not getting an answer, which is always
what I tell people. So context is everything, I think, in AI. And so what is context? Context is data,
right? So ingesting our data, transforming it, picking the right ones, adding metadata,
running experimentation on those different context windows, on different models. That's really where Dagster shines. It's just like running these pipelines.
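As a vendor-agnostic sketch of the retrieval step behind something like Ask AI: embed the question, pull the closest documentation chunks, and stuff them into the prompt as context. The embed and generate_answer callables are hypothetical placeholders for whatever embedding model and LLM API you actually use; this is not Dagster's internal implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question, chunks, chunk_vectors, embed, top_k=3):
    """Return the top_k doc chunks most similar to the question."""
    q_vec = embed(question)
    scored = sorted(
        zip(chunks, chunk_vectors),
        key=lambda pair: cosine_similarity(q_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:top_k]]

def answer(question, chunks, chunk_vectors, embed, generate_answer):
    # stuff the retrieved chunks into the prompt as context for the LLM
    context = "\n\n".join(retrieve(question, chunks, chunk_vectors, embed))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate_answer(prompt)
```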
So help me out with this. There is
basically a clone.
Think about a data stack or the modern data stack from
2021. There's a clone of almost every single component that's like AI
focused, right? Like there's orchestration tool, ETL tool, database
specific. And I'm personally not super knowledgeable
about each of those components when it comes to AI. Do you think that stays, or do you think
it all gets consolidated back? Because it's not that different.
Yeah, that's a good question. Maybe the vector databases stay.
If they're lucky, it's my best guess.
Or do they, but I don't know technically
how hard that would be to implement,
you know, for Snowflake and Databricks
to implement that.
Most databases implement some type of embedding already.
Yeah, right.
Snowflake already has a vector version of their database.
Postgres has vector embeddings now.
I think even MotherDuck, DuckDB have it.
Is it that hard to store
a vector of numbers? Probably not.
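For instance, with the pgvector extension, storing and querying a handful of embeddings in plain Postgres takes only a few lines; the connection string, table, tiny vector dimension, and values below are made up for illustration.

```python
import psycopg2

# connection string is an assumption; point it at your own Postgres instance
conn = psycopg2.connect("dbname=analytics user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS doc_chunks (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(3)  -- tiny dimension for the sketch; real embeddings are 384+
    );
""")

# store a chunk with its (pre-computed) embedding
cur.execute(
    "INSERT INTO doc_chunks (content, embedding) VALUES (%s, %s::vector);",
    ("Dagster assets model the things you care about.", "[0.12, -0.03, 0.88]"),
)

# nearest-neighbour search by cosine distance (pgvector's <=> operator)
cur.execute(
    "SELECT content FROM doc_chunks ORDER BY embedding <=> %s::vector LIMIT 5;",
    ("[0.10, 0.00, 0.90]",),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```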
There might be added benefits to using
a dedicated vector database for
I don't know.
Those are going to become specialized
cases that you run into.
That's my guess. And outside of that, the ETL
stuff,
I think we love reinventing things.
My guess is most people who are getting into AI today,
they're not coming into it
from a background
in data engineering.
Yes.
And so they just don't know
the tools.
So if you don't know the tools,
you think you have to invent things,
right?
Or maybe you just want to
build new things
because old things are boring.
Some of those will probably
stick around because they'll be good enough that everyone uses them and they evolve.
I think a lot of them will fall by the wayside when we realize AI problems are actually data problems and we have data tools to solve that already.
I think a lot of people still, there's this confusion I feel like I still hear out there, which is this idea of, we should be replacing all of our deterministic processes with AI.
But I don't need it to give me seven different answers.
I just want the one answer that's right every time.
Yeah, I mean, it's people using AI as a calculator.
And it's like, well, it's a very expensive way to warm up the world.
So I don't know.
Maybe we don't need to do that.
I don't know.
Sometimes all you need are if statements and a regex.
And maybe AI can replace that.
But at the end of the day, whatever is faster is what's going to work for people.
Right.
I think on that one, AI is just going to replace me having to look up how to write the regex.
Yeah, that is a decent application.
So, yeah, along the AI kind of questioning,
I mean, you just kind of alluded to this.
I mean, it's still very expensive.
And the billions of dollars being poured into these companies
mask the expense for now.
Like, just this week, it came out that the $200 a month plan
still loses money for OpenAI.
And I think they weren't even necessarily expecting that.
And of course the thought here is like, okay, we're going to keep investing money in this
and we'll have better hardware that's going to drive costs down.
We'll have better models that don't have to be, you know, trained in the same way, to reduce cost. Well, I mean, this is just speculation at this point, but it'll be interesting, and I'm curious
your take: what does that curve look like? Because eventually, like, the money, I think,
could run out before we get to that spot. But I mean, I don't know, what do you think? Just speculation
on what might happen there.
I mean, there's already some evidence of plateauing.
Do you remember the great VC-funded days of Uber and DoorDash where it didn't cost anything to use these tools? And if you were smart, you would just abuse them as much as you could. You would get
the referrals and the $100 here and the credits there, and it's like five cents to cross the city, you can get free
food pretty much every single day. And that was wonderful, and then the companies went public, and
it would cost like fifty dollars to go five miles. Right, I know. Yeah, exactly, anywhere near an airport
it's like at least fifty dollars, even if you're just going across the street. Yeah, it was supposed
to be better, it was supposed to be this utopia, and it ended up just being a company that
makes money off people.
And they did so at the expense of killing their competitors.
So will AI
be the same way? I don't know. Probably.
People need to make
margins at some point. Cash
is not infinite right now.
It's really driven off
massive amounts of funding.
At some point, that'll change.
We'll come down for sure,
but when the margins go down,
like the research also slows down.
And so they will probably plateau
and we'll probably find them useful
in some limited capacity
that's probably not going to fundamentally solve AGI,
for example.
And I think we're seeing also that having the best model is not really much of a moat at this point.
So it's not like you can say, well, yeah, we're going to spend billions,
but once we get it there, we're going to capture everything.
It does sound a bit like that Uber time of it's like,
profits don't matter, we just need to capture market.
And then eventually, once we capture the whole market,
we'll make money off of it.
Yeah, it's tough to capture the market
when really it's a commodity too.
So I think where AI differentiates,
it's through products, actually.
So anyone can build a model these days.
A lot of them are good. There's great
open source models out there.
Integrating that model in a
workflow is where differentiation
I think really happens.
Great companies who really understand that
can make it a lot better.
I think Anthropic and Claude, for example,
do a really good job with their projects
and the way they've sort of structured Claude
to make it very useful in particular contexts
for solving these problems and discussions.
I use it all the time.
OpenAI, maybe not as good, I would say,
product-wise as Anthropic these days.
They have more features that I don't end up using,
but purely as a chat agent with a documentation store, I think Claude
does a better job. Yeah, I imagine in a few years we're going to find companies that really
get the product perspective right, and they've built really cohesive products which are really powered
by AI, rather than just an AI chatbot that is really good at generating responses, which I think
we've sort of hit a peak on, regardless of how much better they get.
Yeah, the other one it makes me think of a little bit
is like satellite telephone stuff,
where it costs a whole lot of money
to get the satellites up
and to get the infrastructure there.
And once you had done all of that,
it was really hard to make money off of it.
But then when the next people came around
and were just using the infrastructure
that was already out there,
you could make a profitable business model off of it.
Like with a GPS, for example.
Even satellite phone.
It's still around,
and the companies are more profitable with it
because they didn't have to pay
to put all the satellites in there.
Yeah, that's interesting.
Yeah, so we have a few minutes left here.
I'll throw this to Matt.
So Matt, you've spent a little bit of time
with Dagster recently.
And I'm curious, and you've got a data background.
Matt worked for a publicly traded company in data.
I'm curious, yeah, how does Dagster
and the orchestration landscape strike you
compared with what you used in some of your previous roles?
Like, how is it different?
What's the evolution like?
Well, so most of the places I worked, we didn't really have an orchestrator. We had some more pipeline-related things, but we didn't have a dedicated orchestrator.
So it's been an interesting little journey, having to get to know it a little bit more
and, you know, try to sometimes wrap my brain around the concepts.
Because I think that's usually it. There's a lot of stuff that you get into,
like, okay, I'm planning things, I'm putting them in sequence or in parallel, those types of ideas.
A lot of it then comes down to, what's the framework that they're using to talk about
these things? What's the language they're using? What do they label this stuff? So, yeah.
So, I mean, overall it's been,
I have the added twist
that I'm also including RudderStack
into this with some new stuff.
So that's thrown in some interesting frustrations
at times, just learning the two things
at the exact same time.
But I mean, overall it's been,
it's one of those things that I can look at and I can see, like, oh, here's how I could have used it. Yeah. Oh yeah, when I had a
team of 15, this is how we could have used this, right? The one thing, though, I always had to
think about back then was, kind of to go back to a point that you made much earlier, that there's this newer generation of people
who are data scientists or whatever,
and they got taught a very applied way of doing things,
which typically was very software-centric
and how do I call the function to train a model or whatever.
And so when you get into that more broader,
kind of closer to software engineering world,
they sometimes get a little scared.
And so you really had to pick stuff
that you knew you could quickly get them in
and get them learning with.
I remember we had a software engineer
as a contractor once,
and he was going to show us how to modernize our stuff.
And he did this whole thing of just basically tearing things apart, building it from scratch,
and trying to show us how great it was. And I was like, okay, that's great, but no one but you can run
this, right? Like, I've got a team of people that, when you're not here, I need to run it. Whereas
something like Dagster is definitely one where you could see, okay, I can get a team of people
to be up and running with this.
I think that's a really big deal.
Two things I thought of from my previous experiences,
because I'd used, it's actually funny,
I'd used a product called Rundeck.
Pedram, I don't know if you're familiar with that one.
It's like a little bit more than a
Windows task scheduler, but before
we had that DAG type
concept.
It's interesting when you go through
what you would do
every day, and now you have
words and language for it.
I think that's the most interesting thing about finding
a good framework for
oh, I didn't know I was doing orchestration.
Like I just,
you know,
schedule this around this and this.
I think that's one of the things.
And then the second one,
which Matt just touched on,
which I've talked a lot about,
and I think orchestration is a big deal here.
When you move,
when your data team moves from like one,
maybe two people, to be more of a team,
three, four, five, however many people,
that conversion from what I call single-player mode to multiplayer mode is a really big deal.
The tooling becomes a bigger deal, the version control, the, you know... If you're moving into multiplayer mode for your data team,
like dbt and people in that transformation layer,
having a solution there is a big deal.
With orchestration, same thing.
Where you're now using the same framework,
there's less esoteric-ness in how "how do we schedule a job" gets defined.
We use this, it has specs and documentation.
And I think knowing that, because I've been a part of at least one company where orchestration had a name,
and it was an employee named Gary.
And so he ran everything.
And when he left, nothing could run.
And then we were scrambling,
a whole group of us, to try to get things back together, right? But we
also didn't have the language, because this was almost 10 years ago now,
to be able to be like, okay, now what we need to do is get this into an orchestrator so that
we're not dealing with this anymore.
Yeah, and even just, I think, the language of how do I talk about
these things okay these are assets and stuff like that.
Giving language to that can be very helpful in just helping, I think, a lot of times people get out of the kind of limited mind frame they're in, if that makes sense.
Especially when you're talking about things like, what does data as a product mean? Well, to a lot of data scientists who are very new, it means the model I built and explaining to them, well, no,
you have to have this.
It's the end-to-end
collection to delivery is
the product, not just this little
part that you build.
One last
take. Pedram,
maybe specifically
for Dagster or generally for orchestration,
where do you think this goes in the next couple of years?
What are the core problems in the space to solve
for orchestrators such as Dagster?
Yeah, it's a good question.
I think one of it is something we just touched on,
is that not everyone knows what an orchestrator is
and when they need it.
And so I think at Dagster, we have like two sort of big priorities.
One is just helping generate awareness of what orchestrators are,
what a data platform is, the fact that you probably already have one
and like how to think about observing and having a single place
to look at these things, right?
You can't just go to Gary every single time.
And so having one place where you can understand
where everything is supposed to run,
that's, I think, a big piece of it.
And the other is also just like lowering
that adoption curve for people.
So finding ways to make it easier,
more plug-and-play, to use Dagster
with existing playbooks that you already have
and are pretty common across the industry.
Building those out without losing sight
of sort of the power of Python and Dagster itself
is kind of what we're focused on.
Yeah, makes a ton of sense.
Well, thanks for being on the show.
It's been really fun.
Matt, thanks for being here.
And we'll catch everybody in the next episode.
Thank you.
All right, thank you.
The Data Stack Show is brought to you by Rudderstack,
the warehouse-native customer data platform.
Rudderstack is purpose-built to help data teams turn customer data into competitive advantage.
Learn more at rudderstack.com.