The Data Stack Show - 235: Pete Soderling on the Evolution of Data Engineering
Episode Date: April 2, 2025
Highlights from this week’s conversation include:
Pete's Career Overview (1:00)
AI and Data Engineering Discussion (2:05)
Themes of Data Council (4:19)
High-Frequency Trading Insights (8:04)
San Francisco's Unique Advantages (10:27)
Data Council Conference Preview (13:23)
The Magic of In-Person Events (15:45)
Collapsing Batch and Streaming Systems (19:47)
Leveraging Local Hardware for Data Processing (22:07)
Future of Blockchain in Computing (23:57)
Intersection of AI and Data Management (26:47)
Advice for AI Startup Founders (28:44)
Blurring Lines Between Data Roles (32:46)
The Evolving Role of Engineers (36:56)
Discount Code for Data Council and Parting Thoughts (38:23)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Here on the Data Stack Show,
we have been to many of
the major data conferences across the industry.
But year after year,
one of our favorite ones is Data Council,
and it's because of how much value we get when we go.
I think this year is going to be the best one yet.
It's three days long in-person,
April 22nd to 24th in Oakland,
so back in the Bay Area.
The theme this year is meeting your AI and data heroes, IRL,
and I am personally extremely excited to meet some people
that I have admired for a long time
and a bunch of people that we've had on the show.
I'm really excited to learn what is happening
at the cutting edge of AI and data, and also hear
from people building new tools in the standard data space.
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies.
Before we dig into today's episode, we want to give a huge thanks to our presenting sponsor,
RudderStack.
They give us the equipment and time to do this show
week in, week out, and provide you the valuable content.
RudderStack provides customer data infrastructure
and is used by the world's most innovative companies
to collect, transform, and deliver their event data
wherever it's needed, all in real time.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show. [...] know you. A lot of our listeners have met you at the Data Council conference. But for those who haven't met you, give us the brief flyover of your career and how you got into running the largest
conference for data engineering and investing in data companies. Sure. Thanks, Eric. I'm Pete
Soderling. I'm the founder of Data Council and I'm the founder and general partner at Zero Prime
Ventures. I'm a software engineer from
the first internet bubble, as I like to say, back in the 90s. And I was a self-taught hacker,
programmer in high school, sort of made my way to the East Coast to New York City and
had my first jobs in tech in New York City in the first internet bubble. And so
ended up sort of becoming a founder in New York City, started a couple of companies there.
Then in 2010, I moved to the Bay Area, started a couple of companies there. But one of the
companies was a data infrastructure company. And that sort of got me into the early cloud data
infra world and got me really excited about data and just sort of my geek mind went long on data.
And ultimately that sort of culminated in me starting Data Council, the world's first data
engineering conference in 2012, which was a long time ago now, it's hard to
believe. And over the years I've been sort of building out the data community across
multiple dimensions and ultimately culminating in starting a venture fund
to invest in day zero engineer founders inside the data community and beyond.
That's awesome. So we were talking before the show about AI and the impact on data engineering roles, products.
We're talking about people starting these AI companies
that have no idea about data engineering.
So that's a bunch of topics.
I'm excited about talking about all of that.
What are some things you're excited about?
Yeah, I think that's really astute.
I think there's a whole new generation of hipster hackers,
quote, AI engineers, which is amazing to
see, not to be pejorative because we need new fresh blood in the community. But I think there's
sometimes a gap in what newer folks in the community might understand about older school,
old stodgy data management techniques and architectures and data infra. And so this
year's Data Council is actually an effort to put those two pieces together and to
explain why all the new sexy AI stuff at scale ultimately becomes and sort of demands a data
engineering solution or data infrastructure solution, a data management solution. So we're
excited about sort of pushing that vision forward and putting these two pieces of the
community together and really explaining why AI needs AI, or sorry, why AI needs data management.
I should have inverted it and put it that way around.
I love it.
Well, tons to talk about.
Let's dig in.
Yeah, let's do it.
Pete, the first time we had you on the show was actually just, just under three
years ago, I think we can call it three years ago, towards the beginning of the show.
And you have a really cool superlative for the show.
This is your fourth time on as a guest.
And so you have been on the show more than any other guest,
which is pretty cool.
And I think that is the longest time span as well.
So just really neat for us to look back
at the history of the show and see that you've been a part of it in an important way every single year. So welcome back.
Well, thank you guys. I'm definitely out of good things to say by now then. So you'll forgive me for being a little bit flat today. But it's also been great to have you guys at Data Council and physically recording shows and talking to guests and talking to speakers and your
support and participation over the years and being part of the Data Council community,
both as the Data Stack Show and as RudderStack.
It's been really appreciated, so thank you.
Thank you for that and the community appreciates that.
Absolutely.
Well, Pete, we're super excited about Data Council this year and I want to talk about
the themes because we are
watching the industry change before our eyes.
But before we get there, you have this really unique perspective on being an engineer in the first dot com bubble.
And in New York City, that was an exciting time.
There can be an electric feel when you see an ecosystem emerging and you realize, okay, this is potentially going
to be something that people look back on and I can't help but feel the same way about where
we are now with AI, and you are on the bleeding edge of that, both with Data Council and the
companies that you're investing in.
So just give us a little bit of a sense of what was it like back then?
What is it like now?
How does it feel the same and what's different?
Yeah, it's a really interesting question, and I have an interesting perspective because I was not in the Bay Area,
which is, yeah, typically seen as the pinnacle of engineering-dom, right? Like, that's sort of the Bay Area. I mean, maybe you're in Boston,
you're at MIT, but ultimately you migrate, in the same way I migrated. You sort of end up in the Bay Area, and that's where sort of the best
engineers in the country congregate. Well, that was not me. I was stuck in New York City. And
the interesting counterpoint, however, is that New York City had a pretty strong data culture,
and the data culture came out of the quant teams at the banks. So the banks had really pretty hardcore quant-y data
science teams. Well, they were also backing into early flavors of data engineering to
satisfy the quants. So there was an interesting subculture of data culture inside New York
City engineering that was actually pretty meaningful. And then it's no wonder that a
lot of those engineers and scientists went to
DoubleClick, and DoubleClick turned into this high-frequency, low-latency ad trading platform
because that was like the high-frequency, low-latency system. So the first tech companies
in New York that hit scale were also like sort of pulled out of the quant bank culture and even some architectures and even some business
models. And so there was this interesting bright spot of relatively hardcore data people and some
high-scale engineering as well in New York City in the mid-2000s. And I think some of my interest
in data probably sprung from contacts and touch points with that community. And when we started the data engineering meetup,
originally in 2012, we started it inside Spotify's office in New York City. And our first talk was
from Erik Bernhardsson, who spoke on Luigi, which was this data orchestrator that he had built at
Spotify. It was open source, pre-Airflow. So all this was happening in New York City. And you don't
think of New York City as being like a hot spot for data engineering, but in my particular case, my educational experiences
were mostly there before I moved to the Bay Area. But yeah, there was an interesting significance
of data, not just quant-y stuff, but also some of the early engineering stuff that was centered
in New York City, believe it or not. So that's just an interesting point that a lot of people don't sort of understand I think
from necessarily an engineering culture perspective.
I love it.
There are interesting stories of we've had a couple people on the show who were also
immersed in that quant world and advanced trading and everything.
And one really fascinating thing is that back in those days, physical
proximity to connections for sharing data and the speed was a huge deal, right? And
so like office location and bandwidth and networking was like a serious thing. Data
was a factor of real estate to some extent, which is kind of why.
Yeah, there were data centers popping up in Jersey City, across the Hudson from Manhattan, and there were sort of big ISPs and data centers located over there, and
the big banks were starting to build stuff there because it was a short hop from Manhattan
and those latency milliseconds when you're a trader in sort of a high-frequency environment,
especially when you're trying to automate the trading and do more computerized computer-based
approaches was critical. Yeah.
So I love that sort of inside story from being in New York and the data.
What are you sensing now?
What are you feeling now?
What about the founders, the engineers that you work with, the companies that you
invest in?
What's the feeling like now?
Well, I mean, I guess if my career and life is any example,
I mean, I migrated to San Francisco.
So I don't want to be too pedantic about it
and act like it's a foregone conclusion.
But I do think that a lot of the best engineering
sort of ultimately ends up finding its way to SF.
And I do think that in this current AI world,
there's not many places like San Francisco.
I think there's interesting research, obviously,
in Paris, and London has a bunch of DeepMind people.
And so I'm not really here to pick a winner
in terms of which cities are best,
but for sure there's a lot of super intense,
concentrated interest, experience, funding.
And a lot of that has sort of shifted
maybe back to San Francisco maybe after COVID
if we're thinking about this short timeframe. Were people really thinking that Austin was
going to be a tech capital of the world or Miami? I don't know if people were seriously thinking
that and that's not why we ran Data Council in Austin. We just wanted to be in a warm, cool,
fun place where people could enjoy themselves. So people thought that I was maybe long on Austin.
And again, not to say anything bad about Austin
because it's an amazing city,
and there's smart engineers there.
But San Francisco is still in my mind kind of unrivaled
at the top of the heap when it comes to this intersection
of AI research, product hacking, funding, startups.
And these are all the things that I care about right now, which it's not just AI research.
Maybe there's strong AI research locations around the world, but when you put the full
stacks together of what it takes to actually weaponize the startup product into a real
growing company in the AI world, I think that there's really no better place right now in
terms of community support and ecosystem than the Bay Area.
So that's sort of my current take.
Yeah. As far as that stack, if you had to, if
you pulled one piece, if we think about it like you've got these pillars for it,
if you pulled one piece away that you think makes the biggest difference
about the geography being in SF, what's the one piece that you think really
makes the biggest difference?
I mean, it's hard to say. And I don't want to get too high on my own supply. But I mean,
of course, the funding matters. The concentration of funding is like the icing on the cake. I mean,
you have all the other stuff comes before and is arguably more important. The engineering culture,
the deep experience that folks have, the universities of Stanford and Berkeley, just the DNA of like product building and engineering and going to market. And this is the
stuff that matters most. Like I'm an engineer and a founder, and I did a lot of things with no money
because I figured out how to be scrappy. But then you sort of lay over all the investor interest and
the depth of the funds and the size of the funds and things, and I think you just get this really, like, incredible juggernaut that
is the Bay Area.
Yeah, we've talked a lot about that in the Southeast, right? Like, what are the ingredients
that make this? And I think one of the realities is just almost the compounding economy
of scale,
of different variables, right? Startup companies, universities, research, like all that sort of stuff. If that machine, if that flywheel turns for decades, it's just, I mean, it's
a juggernaut, I think is a great word for it.
And I don't, and I don't want to say that everyone has to, like, be two feet in, like geographically
committed, living in San Francisco. Like, I respect remote work and the equalization of talent across geographies and cheap living
costs. I mean, God knows, when I was a founder, I was living in different places around the world,
partly to be cost efficient and phoning home and talking to the team other places.
So I don't think that everyone has to physically be located in SF, but I think if you're an engineer
founder, you ignore SF at your own peril.
And so that means that you need to somehow
be connected there.
You need to be spending time there regularly.
You need to sort of honor what the ecosystem is,
even if you choose not to live there.
Like we invest in companies in Europe and in New York City
and all over the US.
And we don't demand that every founder lives in SF,
but I do think that
you ignore it at your peril and you have to sort of come to terms with how you are going
to embrace and leverage that ecosystem and sort of be a part of that ecosystem to the
extent that you can, even if you don't live there.
And I think that's what smart engineer founders find themselves doing in some way.
Such a fascinating topic.
Okay.
Well, speaking of the Bay Area, give us, whet our appetite for Data Council this
year.
I know there are a couple of specific subjects that we want to get your expertise on, especially
around data and AI, and I think that's going to be a big emphasis of the conference this
year, but we're super pumped.
Tell us what we're going to talk about at Data Council.
Yeah, well, we are covering lots of sort of good amazing stuff at Data Council as we
always do across a dimension of different tracks. I think we have 10 tracks this year.
We have a new foundation models track, we have an AI engineering track, which is going
to be awesome. We have a generative AI apps track. That's kind of all on the AI side.
And then the classic data side, we have tried and true data and analytics, data science
and algos, databases track. Andy Pavlo is coming from CMU to speak, which we're really excited
about. [...] will be there from MotherDuck, the author of MotherDuck. Or I'm sorry, DuckDB.
MotherDuck will also be there, which is the sort of entity around DuckDB. So yeah, Ryan from
Tabular, of Iceberg fame, will be speaking, and Lloyd from
Looker, who's now the author and the creator of Malloy, which is this drop-in SQL replacement,
which is quite cool. So we have like lots of old stuff, lots of known names in the data,
classic data, infra world. But then we have this new edge of, oh my God, like we're embracing,
we're living in this AI world. And what does this mean for all of us data people?
And I mean, I believe that the mother of AI is data,
but it's sort of explaining to the world
like exactly what that means
and why we believe that to be true
and how these two sides go together.
It's just part of the theme of Data Council this year
and we're particularly excited about that.
I love that.
I'm super excited personally,
because it feels like the pace of what's coming out, even just in terms of [...] I mean, it's creating entirely new categories of problems to solve, especially around data.
And so just to be in one place, to have that concentration of that caliber of people,
I feel like it's going to be, it's going to make it possible to drink from the fire
hose a little bit more than just following Hacker News.
And yeah, our tagline is literally like come and meet
your data and AI heroes IRL and Data Council is such a special event because
we sort of insist on it being in person every year and yep it is tough to get
geeks to like come to the same spot and sometimes I feel like we're dragging
them by their hair I've probably said this before and we cajole and we plead
and we tease them with like amazing speakers and then we put barriers in front
of them because the conference has to like make money to survive.
And so it's like we try and make it as open source friendly as we can and then there's
some commercial things that have to happen and people have to book flights to come and
they have to like take time off work and figure out their schedules.
But then you get everyone in the same room and it's just magic. And all of these genius people, tool builders,
founders, engineers, long-term champion bearers
in the world of data.
And it's just really such a special time.
And it's this IRL component that we think is really special
and we just look forward to every year.
It is totally special.
I can speak from firsthand experience
as a multi-year attendee. Well, Pete, can we dig into a couple of these topics? [...]
we sort of say traditional data eng. That, actually, relative to the world of technology,
is still very young, and AI is happening so rapidly.
I want to rewind for five minutes on the data eng thing.
Because I think we start there and you're like, data eng, like what that used to be and what that became.
Because I'm coming from, I was telling Pete in the intro, a DBA background years ago: database administrator, separate role; system administrator, separate role.
And then data eng is like, okay, let's pull in some of that old DBA stuff, let's do some analytics stuff.
So I think that'd be fun to start there, and then let's move into what we think is coming.
Yeah, yeah, I love it. I mean, we've seen the tool stack change over the last decade
as the roles have blended into each other. And obviously the shift left perspective,
which means software engineers end up ruling the world, means that software engineers also end up
ruling more job titles inside a team. And so probably most modern
startups are hard pressed to like identify who the DBA is. The DBA is kind of like all the
engineering team. And when people are responsible to manage the bits that they put in production,
and that might include everything all the way down to the data storage layer. I mean, obviously,
there's still DevOps teams, but even some engineers sort of cross over, and software engineering is eating into the sysops and the DevOps world, right? In the same way. So no, it's been
fascinating to watch that whole amalgamation and evolution through the lens of data and Data Council.
And then it starts to be more specific like this year, this whole collapsing of batch and streaming
systems like into each other, right? I think Estuary is speaking at
Data Council this year. That's an interesting thing to think about. And then you go down one
more layer. Well, what supports that? Oh, it's the lakehouse architecture and Iceberg tables
and the Hudi tables and these file formats that allow near real-time data streaming use cases on top of them.
And so all of a sudden, that starts to throw into question the orchestration layer.
Because all of a sudden, if you're not orchestrating data into these different formats,
into this long pipeline, and you can just approach the data, query it where it sits,
get access to it where it sits and where it lives, does that obviate some of the ETLing
that we've been doing across these systems
over the last 10 years? So there's all kinds of interesting implications I think that are
buried in this and obviously we see a lot of this evolution in the tooling in the community.
Yeah. That was actually something I'm trying to remember how long ago this was that Andrew
Lamb from InfluxDB had talked about this
before the, I mean Iceberg had certainly been around,
but sort of like right before the big lakehouse sort of wave,
when you had Onehouse and a bunch of the companies
that sort of got developed around it.
It was interesting to hear him,
he was talking about time series data
which has a bunch of its own unique challenges.
And it was interesting to see him kind of like dream, I mean this is a couple years ago, where he was just like, [...]
conceptual thing. [...] control costs really. We had a startup on the show where that was like part of their core architecture.
It's like, well, it's in S3. I think they like segmented buckets by S3, like real-time, like
read it in, process data with some customer data, and then like it was ephemeral and like after that
was done, like, oh, they spun it down.
Yeah, yeah, yeah, yeah, yeah, yeah. Just some interesting things
around that type of stuff.
Yeah, the S3ification of everything is definitely a theme that we see
at Data Council
and some of our investments at Zero Prime.
It's not just that internal companies are trying to go for cheap storage whenever possible
and re-architect internal systems to do that.
It's that also the database vendors and there's a new class of databases coming up where everyone
is trying to run on the cheapest storage possible because hey, people are tired
of their Snowflake bills and they want sort of more scale and better cost, better economics.
And so we're starting to see S3 become a credible sort of base bedrock for a lot of data storage and
a lot of applications and future applications that are starting to pop up. So that's a common theme
that we're seeing across the industry for sure.
I had another thing kind of around that,
like the S3 thing I've seen.
What are your thoughts on this?
Like, I remember the first time I thought of this,
it was like, wow, this makes a ton of sense.
Just better leveraging this really powerful local hardware
that everybody has.
I think that's another interesting theme.
Like, have you seen that play out the last couple of years?
Yeah, I think there's something there.
We have these Mac processors now on,
I mean, most of the developers that I know
are sort of still Mac junkies.
We have the M2, M3, M4 chips, I guess now.
So there's a lot of pent up like power,
and we're seeing this obviously in some of the AI features
in our local machines.
I guess it's
an interesting counterpoint to moving everything to the cloud, because Modal wants you to stop using
your desktop period and just run your Python scripts as if they were locally, but they're
really like on some remote cloud instance. So we're seeing kind of things go both ways. I'm
not sure exactly if I can make a bet as an investor yet on which
one's going to win the day from a development environment standpoint. I do think that for sure,
the models, like small models running locally, are going to become an increasingly powerful thing.
And you see this through Apple Intelligence, and they're trying to push down all this stuff
onto the actual client hardware. So I definitely
think that's a thing. How this actually will impact data, like classic data engineering,
or even engineering workflows and development environments. Is there going to be a battle
between convenience and cloud-based sort of scenarios, sandbox scenarios and integrations
and things versus just the power and the cost effectiveness of developing locally. I think that there's a couple of interesting like credible
factors pushing in both directions, so it's hard for me to know exactly where
that's going to go specifically.
And the security and compliance angle is fascinating for me too,
because Apple, who's obviously very invested in the hardware
side of things, would argue, oh, like, local's better, and all the reasons why that's better for
privacy and stuff.
And the cloud vendors would argue like, oh, well the local device could be compromised.
You want it all controlled in the cloud, that's better. So I think that tension is there,
which also drives people both, I think, two different directions.
Maybe everything's just going to run at the end of the day and we're okay.
Yeah, that's right.
It's like that would be the ultimate irony.
I don't know.
I don't know if I could live with that.
I don't.
Yeah, that would cause a pretty big like, you know, internal crisis for a lot of people.
Right.
Now, it will be interesting.
My screen saver is turning on.
I'm mining some ETH right now.
Sorry about that.
It will be interesting, actually.
I'm going to make it a point to talk to the DuckDB and MotherDuck teams about their local UI that they rolled out.
We talked about that on a recent show, right?
But super interesting.
We talked about what's traditional data engineering, how is that changing, what are the trends? [...] they're going to be focused on that. [...]
more of the advanced tools within a standard data engineering tool set. There's orchestration, there's lakehouses, there's pipelines and jobs running.
Now there are a series of tools that are essentially developed in the world of AI,
but they're data engineering tools. So speak to that a little bit. What are you seeing at
Data Council and especially with the companies that you talk to from an investment standpoint?
Yeah, so I guess there's different ways to slice this, right?
When you talk about AI engineering and how it's colliding with sort of the traditional
data infra world, obviously there's a whole new workflow and tool set and
development process that any developer, any engineer anywhere is getting dragged into
through ChatGPT and Cursor and,
oh my God, vibe coding and all these things. So of course, like at Zero Prime on the investment
side, we've seen data engineering copilots pop up and things like this. So that's all sort of the
data engineer's workflow changing in the same way that many other engineering workflows are changing, just commonly speaking. I think that in addition to that, what I kind of want to bridge to and talk about,
one of the things that we're really passionate about with this year's Data Council, is
really acknowledging the intersection between what AI engineers are likely to face as they have
successful applications and like tried and true work
that the data infrastructure or data management world has done over the last decades. And what
do I mean by that? Well, I think there's this whole new class of AI engineers that maybe think
they can just concatenate strings and throw them against LLMs and have a successful AI app. Well, that might be true,
but everyone knows that the success of your AI app is based on volume and scale. And so as you
like collect more data from your users, that becomes the actual differentiating piece around
your AI wrapper, if you will. So I think there's a whole class of engineers that might become that
successful in their
AI companies and applications.
But then all of a sudden, they have to manage all this data and it becomes a classic data
management problem.
And there's a whole generation, I think, of engineers that might not know and understand
sort of what we've been mucking around in at Data Council for the last 10 years, which
is tried and true best practices of architectures around data storage
and data processing and cleaning and scale and governance and privacy and all these things.
And so I think there's a really interesting complement between these two worlds. And I
think that if folks want to really understand what mature AI engineering looks like from a data
management perspective, they need to put those two things together. And we think the data engineering, the Data Council community is uniquely positioned to
really bridge that gap and help these founders, these new AI founders kind of get dragged into
the world of proper data management. And we think it'll be incredibly useful and powerful
tools for them. So that's one thing that I think puts these two pieces together in a really
interesting way that we're quite passionate about this year at Data Council.
Can you speak to, and I really hope that there are multiple listeners out there who are starting
their own AI startup.
And if so, and if you're listening, please reach out to us.
We'd love to have you on the show.
Pete would love to talk with you if you're growing quickly.
But can you speak to that person who is thinking about starting a company or maybe has started
an AI company and they are realizing now like, oh yeah, that's actually going to be a major
problem.
What do they need to be thinking about?
They may not need to take immediate action now, but or maybe they do, but what do they
need to be thinking about if they 10x, they're going to face problems that they probably
don't see right now.
Yeah I mean I well I think this is just very general advice but I think it's all about the
quality of the people on your team. This is more startup founder advice than it is technical advice
but I think finding an advisor or somebody who's actually been through data management at scale
who's worked at one of the larger internet companies
or at least a scaling startup and has gone through a lot of the orchestration pieces and the data
storage pieces and has had to choose between different kinds of databases. Some of these
are not obvious things to someone who hasn't gotten fully in the weeds on them. Even questions like,
oh, do I need a standalone vector DB right now or should I
just be using vector storage that's getting bolted on and integrated with all the other
major data tooling? These are real things that I think modern engineers have to figure out.
And no better way to do that than to put someone on your cap table either as an advisor or an
angel investor or someone who's actually gone through these challenges before. And they're probably gonna look a little older than you because you're
the young whippersnapper, super smart AI hacker founder, and they've been around data management
for a while. And it's gonna look like, feel like, legacy skills and legacy insights. And
I think that's the point, is that we need to put these two worlds together. And there's gonna be
a time gap and a skills gap that smart founders will want to complement and sort of plug holes in their team
and their cap table and get good advice from a technical standpoint. So that's just general
advice that I would give any AI founder who's starting a new AI company today to think about
that and to try to add that experience in their team maybe before they need it so that they're not making sort of bad architectural
decisions all along the way until they realize they have to be undone.
Yep. So I've got a question that just came to me. So with Data
Council, I think one of your goals is just to have those two people co-located,
right?
kind of traditional. Like how do you commingle those to really grab both people?
I mean, partly it's true. Like, people asked me this year, like, are you going to rename
Data Council?
Like, does this require a top-down, like, rebuild or rebrand?
Is this a completely new thing?
I'm like, well, I don't know, maybe we should, and maybe that'll be a different discussion
for next year and beyond.
But this year we sort of like just segmented them into tracks.
So we have about half of the tracks are AI related tracks, about half the tracks are classic data tracks. But then the
cool thing about Data Council is we have office hours after the end of every single talk.
So if you want to sort of dig, if you're an AI engineer and you're listening to a databases
talk and you want to dig in with that speaker after, you go to the office hours and you can
sort of have a conversation with the speaker. And I think that's where some of the interplay and the cross-functional
skills transfer will come. So that's a very exciting layer of Data Council that we've
baked in over the years: we're very committed to these office hours that happen at the end
of every talk. And so every speaker is totally approachable, which is why we have the sort
of "meet your data and AI heroes IRL" theme, because
you get to talk to them and they can spend quality time with you answering your questions.
And that's part of the magic of how we sort of mesh these communities together.
And we can't even... other than that, how could we structure it?
So we just try and get the right people in the room and give them a basic opportunity
to have time in the schedule to let the minds mingle and then they do the rest of the magic
and the community has always been amazing at that.
So we just try to facilitate.
Very cool.
So great.
We're gonna take a quick break from the episode
to talk about our sponsor, RudderStack.
Now I could say a bunch of nice things
as if I found a fancy new tool,
but John has been implementing RudderStack
for over half a decade.
John, you work with customer event data every day and you know how hard it can be to make sure ...
... one of my team's secret weapons. ... picked RudderStack was that it does not store the data and we can live stream data to our
downstream tools.
One of the things about the implementation that has been so common over all the years
and with so many RudderStack customers is that it wasn't a wholesale replacement of
your stack.
It fit right into your existing tool set.
Yeah, and even with technical tools, Eric, things like Kafka or PubSub, but you don't
have to have all that complicated customer data infrastructure.
some sort of data product
to a data consumer.
One of the really interesting trends there is that
the line between data consumer and let's say engineer acting on data is blurring.
And even before the show, we were kind of talking about the term data engineer.
Maybe one that's a little bit easier
is the term analyst is getting kind of interesting, right?
Because it's like, well, I mean, even on a personal level,
I've never had a formal job as an analyst,
but with a clean set of tables,
and I wouldn't necessarily consider myself an analytics engineer, but the line is blurring just in terms of jurisdiction because of the tool set. Can you speak to that a little bit?
I mean, it's just I think the power of the AI tools, it's so generalizable because it's such
a great sidekick. And in any collaboration environment, you can find the AI just incredibly
useful. Now there's all kinds of workflow hangups and
improvements and sort of what does the full value chain look like and what are the tactical aspects
of how we communicate with the AI and what shape does that take? But there's no question that it's
just changing every aspect of creativity from image creation to content writing to authoring content to music
generation to engineering. Engineering is just another form of creativity. Engineers
are creators. And there's a very artistic thing about being creative. And it's
no wonder that AI, which originally caught fire, gen AI, with all the creative types,
well, that very quickly the engineers got sucked up into that updraft because engineers are creatorial. And the more we realize
that and sort of adapt and are willing to be flexible in using this as a tool to help us.
And the cool thing is, well, I don't know, maybe this is not right. Maybe there are like old,
fraughty engineers who are really like hell-bent on "keep
the AI away from me," just like there are some screenwriter unions that are really fucking scared
about this whole thing. But so far, I think to the engineers' credit, I haven't seen a lot of
manifestations of that. And so it seems like the engineering community overall is sort of down with
being very utilitarian and using the AI to help them
create things better, faster, cheaper, and using these coding sidekicks as real collaboration
partners. So that's just a very generalized thought about it, but I do think it is really
cool when you understand that engineers are creatorial and that's how we fit in this immediate
value chain of gen AI, which has swept the rest of the creative world.
Yeah, yeah, I love that. I was reading an article about the first railroad that they built that was the precursor to the Panama Canal, and
I promise there's a tie-in here.
But it took them like eight years to build this thing and it was less than 50 miles, right?
Because of all the difficulties and all that sort of stuff. Right. But when it was complete, the mind boggling thing to everyone was that they dramatically underestimated the power of the like rapid exchange of goods from east to west and west to east. Right. And it was just dramatically more economically productive than anyone projected. And they had like immense projections, right?
types of additional things could have happened within that eight-year period with all the economic activity, all the different entrepreneurial ideas.
I really feel the same way about AI where it's like, okay, we're just building the railroad
way faster and removing a lot of the manual labor so that there will be a higher flourishing
of human creativity.
Absolutely.
Now, of course, we're going to suck some new creators, quote creators, into the bottom
end of this vacuum that we might not have called engineers before.
So what constitutes an engineer going forward might be an interesting discussion or debate
because the AI enables people who are, let's just, again, not to be mean, but otherwise unqualified
to write code to all of a sudden be more than dangerous at talking to a database or doing
basic data analytics, as you mentioned, Eric.
And I think that's good.
I think overall, we want to increase the surface area of the number of people that can use
these tools and there's real power there.
But of course, that could feel threatening to
engineers who have spent entire careers and degrees and lots of time and effort and blood, sweat, and tears debugging code for a long time. And they have their identity locked up in
"I'm an engineer and this other person is not." So there's going to be a whole shift in how we,
I mean, just think of junior developers leaving boot camps and how difficult it's been for
them to get hired in the last 10 years.
We haven't seen the tip of the iceberg once people start really coding with ChatGPT.
And so how does that fit into the ecosystem going forward?
It's maybe a little unclear, but it's fascinating to think that AI can touch and enable and
empower so many people to do cool technical things that otherwise might have felt like it was beyond their reach.
Yeah, I love it. All right, well, we
are at the buzzer, but of course, Pete, tell our listeners where they can sign
up to attend Data Council and get all the information they need, because if you
haven't signed up yet, definitely check it out and look at getting a ticket
early.
Yeah, come and see us in Oakland.
Data Council is back in the Bay Area this year.
It's April 22nd to 24th.
DataCouncil.ai is the site.
We'll include it.
We'll create a discount code for Data Stack Show listeners.
So we'll just call it DATASTACK20 and pop that in.
You can get a nice discount on your tickets.
Come and see us in person.
Data Council is not online.
It's not live streamed.
You have to sort of commit to be there. We make it worth your while, we promise.
But yeah, come and visit us in Oakland next month and I look forward to seeing you guys there as well.
Awesome. Always a pleasure to have you on the show, Pete.
Thank you guys. It's really fun. Appreciate the work that you're doing.
We have an exciting offer for you for Data Council this year.
It's going back to the Bay Area on April 22nd through 24th.
And if you're a listener of the Data Stack Show,
you can get a discount.
Go to datacouncil.ai and use the discount code DATASTACK20.
I've been to a ton of conferences in this industry
and Data Council is absolutely at the top of my list.
I love going every year because of how much value I get,
and I think this is gonna be the best year ever.
The theme is Meet Your Data and AI Heroes IRL.
We're gonna hear what's happening
on the cutting edge of data and AI,
and I'm personally excited to meet some leaders
in the data industry that I've admired
for a really long time.
You'll also get to shake hands with a lot of people who have been on this show before. Join us in Oakland April 22nd through 24th
at Data Council, and don't forget, as a listener you get a discount: DATASTACK20. Use that when you
purchase your ticket and we'll see you in a couple weeks.
The Data Stack Show is brought to you by
RudderStack, the warehouse native customer data platform.
RudderStack is purpose-built to help data teams
turn customer data into competitive advantage.
Learn more at rudderstack.com.