The Data Stack Show - 235: Pete Soderling on the Evolution of Data Engineering

Starting point is 00:00:00 Here on the Data Stack Show, we have been to many of the major data conferences across the industry. But year after year, one of our favorite ones is Data Council, and it's because of how much value we get when we go. I think this year is going to be the best one yet. It's three days long in-person,

Starting point is 00:00:20 April 22nd to 24th in Oakland, so back in the Bay Area. The theme this year is meeting your AI and data heroes, IRL, and I am personally extremely excited to meet some people that I have admired for a long time and a bunch of people that we've had on the show. I'm really excited to learn what is happening at the cutting edge of AI and data, and also hear

Starting point is 00:00:46 from people building new tools in the standard data space. Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.

Starting point is 00:01:19 Before we dig into today's episode, we want to give a huge thanks to our presenting sponsor, RudderStack. They give us the equipment and time to do this show week in, week out, and provide you the valuable content. RudderStack provides customer data infrastructure and is used by the world's most innovative companies to collect, transform, and deliver their event data wherever it's needed, all in real time.

Starting point is 00:01:41 You can learn more at rudderstack.com. Welcome back to the Data Stack Show. A lot of our listeners have met you at the Data Council conference. But for those who haven't met you, give us the brief flyover of your career and how you got into running the largest conference for data engineering and investing in data companies. Sure. Thanks, Eric. I'm Pete Soderling. I'm the founder of Data Council and I'm the founder and general partner at Zero Prime Ventures. I'm a software engineer from the first internet bubble, as I like to say, back in the 90s. And I was a self-taught hacker, programmer in high school, sort of made my way to the East Coast to New York City and had my first jobs in tech in New York City in the first internet bubble. And so

Starting point is 00:02:40 ended up sort of becoming a founder in New York City, started a couple of companies there. Then in 2010, I moved to the Bay Area, started a couple of companies there. But one of the companies was a data infrastructure company. And that sort of got me into the early cloud data infra world and got me really excited about data, and just sort of my geek mind went long on data. And ultimately that sort of culminated in me starting Data Council, the world's first data engineering conference, in 2012, which was a long time ago now, it's hard to believe. And over the years I've been sort of building out the data community across multiple dimensions and ultimately culminating in starting a venture fund

Starting point is 00:03:15 to invest in day zero engineer founders inside the data community and beyond. That's awesome. So we were talking before the show about AI and the impact on data engineering roles, products. We're talking about people starting these AI companies that have no idea about data engineering. So that's a bunch of topics. I'm excited about talking about all of that. What are some things you're excited about? Yeah, I think that's really astute.

Starting point is 00:03:40 I think there's a whole new generation of hipster hackers, quote, AI engineers, which is amazing to see, not to be pejorative because we need new fresh blood in the community. But I think there's sometimes a gap in what newer folks in the community might understand about older school, old stodgy data management techniques and architectures and data infra. And so this year's Data Council is actually an effort to put those two pieces together and to explain why all the new sexy AI stuff at scale ultimately becomes and sort of demands a data engineering solution or data infrastructure solution, a data management solution. So we're

Starting point is 00:04:17 excited about sort of pushing that vision forward and putting these two pieces of the community together and really explaining why AI needs AI, or sorry, why AI needs data management. I should have inverted it and put it that way around. I love it. Well, tons to talk about. Let's dig in. Yeah, let's do it. Pete, the first time we had you on the show was actually just under three

Starting point is 00:04:39 years ago, I think we can call it three years ago, towards the beginning of the show. And you have a really cool superlative for the show. This is your fourth time on as a guest. And so you have been on the show more than any other guest, which is pretty cool. And I think that is the longest time span as well. So just really neat for us to look back at the history of the show and see that you've been a part of it in an important way every single year. So welcome back.

Starting point is 00:05:10 Well, thank you guys. I'm definitely out of good things to say by now then. So you'll forgive me for being a little bit flat today. But it's also been great to have you guys at Data Council and physically recording shows and talking to guests and talking to speakers, and your support and participation over the years and being part of the Data Council community, both as the Data Stack Show and as RudderStack. It's been really appreciated, so thank you. Thank you for that, and the community appreciates that. Absolutely. Well, Pete, we're super excited about Data Council this year and I want to talk about the themes because we are

Starting point is 00:05:45 watching the industry change before our eyes. But before we get there, you have this really unique perspective on being an engineer in the first dot-com bubble. And in New York City, that was an exciting time. There can be an electric feel when you see an ecosystem emerging and you realize, okay, this is potentially going to be something that people look back on. And I can't help but feel the same way about where we are now with AI, and you are on the bleeding edge of that, both with Data Council and the companies that you're investing in. So just give us a little bit of a sense of what was it like back then?

Starting point is 00:06:24 What is it like now? How does it feel the same and what's different? Yeah, it's a really interesting question, and I have an interesting perspective because I was not in the Bay Area, which is, yeah, typically seen as the pinnacle of engineering-dom, right? Like, that's sort of the Bay Area. I mean, maybe you're in Boston, you're at MIT, but ultimately you migrate in the same way I migrated. You sort of end up in the Bay Area, but that's where sort of the best engineers in the country congregate. Well, that was not me. I was stuck in New York City. And the interesting counterpoint, however, is that New York City had a pretty strong data culture, and the data culture came out of the quant teams at the banks. So the banks had really pretty hardcore quant data

Starting point is 00:07:06 science teams. Well, they were also backing into early flavors of data engineering to satisfy the quants. So there was an interesting subculture of data culture inside New York City engineering that was actually pretty meaningful. And then it's no wonder that a lot of those engineers and scientists went to DoubleClick, and DoubleClick turned into this high-frequency, low-latency ad trading platform because that was like the high-frequency, low-latency system. So the first tech companies in New York that hit scale were also like sort of pulled out of the quant bank culture and even some architectures and even some business models. And so there was this interesting bright spot of relatively hardcore data people and some

Starting point is 00:07:53 high-scale engineering as well in New York City in the mid-2000s. And I think some of my interest in data probably sprung from contacts and touch points with that community. And when we started the data engineering meetup, originally in 2012, we started it inside Spotify's office in New York City. And our first talk was from Erik Bernhardsson, who spoke on Luigi, which was this data orchestration tool that he had built at Spotify. It was open source, pre-Airflow. So all this was happening in New York City. And you don't think of New York City as being like a hot spot for data engineering, but in my particular case, my educational experiences were mostly there before I moved to the Bay Area. But yeah, there was an interesting significance of data, not just quant stuff, but also some of the early engineering stuff that was centered

Starting point is 00:08:42 in New York City, believe it or not. So that's just an interesting point that a lot of people don't sort of understand, I think, from necessarily an engineering culture perspective. I love it. There are interesting stories. We've had a couple people on the show who were also immersed in that quant world and advanced trading and everything. And one really fascinating thing is that back in those days, physical proximity to connections for sharing data and the speed was a huge deal, right? And so like office location and bandwidth and networking was like a serious, it was a data

Starting point is 00:09:17 was a factor of real estate to some extent, which is kind of why. Yeah, there were data centers popping up in Jersey City across the Hudson from Manhattan, and there were sort of big ISPs and data centers located over there, and the big banks were starting to build stuff there because it was a short hop from Manhattan, and those latency milliseconds, when you're a trader in sort of a high-frequency environment, especially when you're trying to automate the trading and do more computerized, computer-based approaches, were critical. Yeah. So I love that sort of inside story from being in New York and the data. What are you sensing now?

Starting point is 00:09:53 What are you feeling now? What are the founders, the engineers that you work with, at the companies that you invest in? What's the feeling like now? Well, I mean, I guess if my career and life is any example, I mean, I migrated to San Francisco. So I don't want to be too pedantic about it and act like it's a foregone conclusion.

Starting point is 00:10:13 But I do think that a lot of the best engineering sort of ultimately ends up finding its way to SF. And I do think that in this current AI world, there's not many places like San Francisco. I think there's interesting research, obviously, in Paris, and London has a bunch of DeepMind people. And so I'm not really here to pick a winner in terms of which cities are best,

Starting point is 00:10:34 but for sure there's a lot of super intense, concentrated interest, experience, funding. And a lot of that has sort of shifted maybe back to San Francisco, maybe after COVID, if we're thinking about this short timeframe. Were people really thinking that Austin was going to be a tech capital of the world, or Miami? I don't know if people were seriously thinking that, and that's not why we ran Data Council in Austin. We just wanted to be in a warm, cool, fun place where people could enjoy themselves. So people thought that I was maybe long on Austin.

Starting point is 00:11:05 And again, not to say anything bad about Austin because it's an amazing city, and there's smart engineers there. But San Francisco is still in my mind kind of unrivaled at the top of the heap when it comes to this intersection of AI research, product hacking, funding, startups. And these are all the things that I care about right now, which it's not just AI research. Maybe there's strong AI research locations around the world, but when you put the full

Starting point is 00:11:32 stacks together of what it takes to actually weaponize the startup product into a real growing company in the AI world, I think that there's really no better place right now in terms of community support and ecosystem than the Bay Area. So that's sort of my current take. Yeah. As far as that stack, if you pulled one piece, if we think about it like you've got these pillars for it, if you pulled one piece away that you think makes the biggest difference about the geography being in SF, what's the one piece that you think really makes the biggest difference? I mean, it's hard to say. And I don't want to get too high on my own supply. But I mean,

Starting point is 00:12:08 of course, the funding matters. The concentration of funding is like the icing on the cake. I mean, you have all the other stuff that comes before and is arguably more important. The engineering culture, the deep experience that folks have, the universities of Stanford and Berkeley, just the DNA of like product building and engineering and going to market. And this is the stuff that matters most. Like I'm an engineer and a founder, and I did a lot of things with no money because I figured out how to be scrappy. But then you sort of lay over all the investor interest and the depth of the funds and the size of the funds and things, and I think you just get this really like incredible juggernaut that is the Bay Area. Yeah, we've talked a lot about that in the southeast, right? Like what are the ingredients

Starting point is 00:12:53 that make this, and I think one of the realities is just almost the compounding economy of scale of different variables, right? Startup companies, universities, research, like all that sort of stuff. If that machine, if that flywheel turns for decades, it's just, I mean, it's a juggernaut, I think is a great word for it. And I don't want to say that everyone has to like be two feet in, like geographically committed, living in San Francisco. Like I respect remote work and the equalization of talent across geographies and cheap living costs. I mean, God knows, when I was a founder, I was living in different places around the world, partly to be cost efficient, and phoning home and talking to the team other places.

Starting point is 00:13:59 So I don't think that everyone has to physically be located in SF, but I think if you're an engineer founder, you ignore SF at your own peril. And so that means that you need to somehow be connected there. You need to be spending time there regularly. You need to sort of honor what the ecosystem is, even if you choose not to live there. Like we invest in companies in Europe and in New York City

Starting point is 00:14:19 and all over the US. And we don't demand that every founder lives in SF, but I do think that you ignore it at your peril and you have to sort of come to terms with how you are going to embrace and leverage that ecosystem and sort of be a part of that ecosystem to the extent that you can, even if you don't live there. And I think that's what smart engineer founders find themselves doing in some way. Such a fascinating topic.

Starting point is 00:14:42 Okay. Well, speaking of the Bay Area, whet our appetite for Data Council this year. I know there are a couple of specific subjects that we want to get your expertise on, especially around data and AI, and I think that's going to be a big emphasis of the conference this year, but we're super pumped. Tell us what we're going to talk about at Data Council. Yeah, well, we are covering lots of sort of good amazing stuff at Data Council as we

Starting point is 00:15:06 always do across a dimension of different tracks. I think we have 10 tracks this year. We have a new foundation models track, we have an AI engineering track, which is going to be awesome. We have a generative AI apps track. That's kind of all on the AI side. And then on the classic data side, we have tried and true data and analytics, data science and algos, a databases track. Andy Pavlo is coming from CMU to speak, which we're really excited about. [name unclear] will be there from MotherDuck, the author of MotherDuck. Or I'm sorry, DuckDB. MotherDuck will also be there, which is the sort of entity around DuckDB. So yeah, Ryan from Tabular of Iceberg fame will be speaking, Lloyd from

Starting point is 00:15:47 Looker, who's now the author and the creator of Malloy, which is this drop-in SQL replacement, which is quite cool. So we have like lots of old stuff, lots of known names in the classic data infra world. But then we have this new edge of, oh my God, like we're embracing, we're living in this AI world. And what does this mean for all of us data people? And I mean, I believe that the mother of AI is data, but it's sort of explaining to the world like exactly what that means and why we believe that to be true

Starting point is 00:16:14 and how these two sides go together. That's just part of the theme of Data Council this year and we're particularly excited about that. I love that. I'm super excited personally, because it feels like the pace of what's coming out, even just in terms of, I mean, it's creating entirely new categories of problems to solve, especially around data. And so just to be in one place to have that concentration of that caliber of people, I feel like it's going to make it possible to drink from the fire

Starting point is 00:16:58 hose a little bit more than just following Hacker News. And yeah, our tagline is literally like, come and meet your data and AI heroes IRL. And Data Council is such a special event because we sort of insist on it being in person every year. And yep, it is tough to get geeks to like come to the same spot, and sometimes I feel like we're dragging them by their hair. I've probably said this before. And we cajole and we plead and we tease them with like amazing speakers, and then we put barriers in front of them because the conference has to like make money to survive.

Starting point is 00:17:30 And so it's like we try and make it as open source friendly as we can and then there's some commercial things that have to happen and people have to book flights to come and they have to like take time off work and figure out their schedules. But then you get everyone in the same room and it's just magic. And all of these genius people, tool builders, founders, engineers, long-term champion bearers in the world of data. And it's just really such a special time. And it's this IRL component that we think is really special

Starting point is 00:17:57 and we just look forward to every year. It is totally special. I can speak from firsthand experience as a multi-year attendee. Well, Pete, can we dig into a couple of these topics? We sort of say traditional data eng. That's actually, relative to the world of technology, still very young, and the AI is happening so rapidly. I want to rewind for five minutes on the data eng thing. Because I think we start there, and you're like, data eng, like what that used to be and what that became.

Starting point is 00:18:45 Because I'm coming from, I was telling Pete in the intro, a DBA background years ago. Database administrator, separate role. System administrator, separate role. And then data eng is like, okay, let's pull in some of that old DBA stuff, let's do some analytics stuff. So I think that'd be fun to start there and then let's move into what we think is coming. Yeah, yeah, I love it. I mean, we've seen the tool stack change over the last decade as the roles have blended into each other. And obviously the shift-left perspective, which means software engineers end up ruling the world, means that software engineers also end up ruling more job titles inside a team. And so probably most modern startups are hard pressed to like identify who the DBA is. The DBA is kind of like all the engineering team. And when people are responsible to manage the bits that they put in production,

Starting point is 00:19:36 and that might include everything all the way down to the data storage layer. I mean, obviously, there's still DevOps teams, but even there, software engineering is eating into the sysops and the DevOps world, right? In the same way. So no, it's been fascinating to watch that whole amalgamation and evolution through the lens of data and Data Council. And then it starts to be more specific, like this year, this whole collapsing of batch and streaming systems into each other, right? I think Estuary is speaking at Data Council this year. That's an interesting thing to think about. And then you go down one more layer. Well, what supports that? Oh, it's the lakehouse architecture and Iceberg tables and Hudi tables and these file formats that allow near real-time data streaming use cases on top of them.
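Editor's sketch, not something shown on the show: one way to picture how table formats such as Iceberg or Hudi can let batch and near-real-time readers share the same files is an append-only log of snapshots. The toy Python below is an illustration under loud assumptions. `ToyTable`, `commit`, `scan`, and `changes_since` are hypothetical names, the file names are made up, and none of this is the API of Iceberg, Hudi, or any real library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyTable:
    """Hypothetical append-only table: each commit records the files it added."""
    commits: List[List[str]] = field(default_factory=list)

    def commit(self, new_files: List[str]) -> int:
        """Append a snapshot and return its 1-based snapshot id."""
        self.commits.append(list(new_files))
        return len(self.commits)

    def scan(self, snapshot_id: Optional[int] = None) -> List[str]:
        """Batch-style read: every file visible as of a snapshot (default: latest)."""
        upto = len(self.commits) if snapshot_id is None else snapshot_id
        return [f for added in self.commits[:upto] for f in added]

    def changes_since(self, snapshot_id: int) -> List[str]:
        """Incremental read: only the files added after the given snapshot."""
        return [f for added in self.commits[snapshot_id:] for f in added]


table = ToyTable()
s1 = table.commit(["events-0001.parquet", "events-0002.parquet"])  # made-up names
s2 = table.commit(["events-0003.parquet"])

batch_view = table.scan()               # a batch job sees all committed files
stream_view = table.changes_since(s1)   # a tailing consumer sees only what is new
```

In this sketch a batch job calls `scan()` whenever it runs, while a streaming-style consumer remembers the last snapshot id it processed and calls `changes_since()` on each poll. Both read the same committed files, which is roughly the sense in which the two styles can collapse onto one storage layer.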

Starting point is 00:20:25 And so all of a sudden, that starts to throw into question the orchestration layer. Because all of a sudden, if you're not orchestrating data into these different formats, into this long pipeline, and you can just query the data where it sits, get access to it where it lives, does that obviate some of the ETLing that we've been doing across these systems over the last 10 years? So there's all kinds of interesting implications, I think, that are buried in this, and obviously we see a lot of this evolution in the tooling in the community. Yeah. That was actually something, I'm trying to remember how long ago this was, that Andrew

Starting point is 00:21:00 Lamb from InfluxDB had talked about this before the, I mean, Iceberg had certainly been around, but sort of like right before the big lakehouse sort of wave, when you had Onehouse and a bunch of the companies that sort of got developed around it. It was interesting to hear him, he was talking about time series data, which has a bunch of its own unique challenges.

Starting point is 00:21:23 And it was interesting to see him kind of like dream, I mean, this is a couple years ago, where he was just like... conceptual thing... control costs, really. We had a startup on the show where that was like part of their core architecture. It's like, well, it's in S3. I think they like segmented buckets in S3, like real-time, like read it in, process data with some customer data, and then like it was ephemeral, and like after that was done, like, oh, they spun it down. Yeah, yeah, yeah. Just some interesting things around that type of stuff. Yeah, the S3ification of everything is definitely a theme that we see at Data Council and in some of our investments at Zero Prime.

Starting point is 00:22:28 It's not just that companies are trying to go for cheap storage whenever possible and re-architect internal systems to do that. It's that also the database vendors, and there's a new class of databases coming up, where everyone is trying to run on the cheapest storage possible because, hey, people are tired of their Snowflake bills and they want sort of more scale and better cost, better economics. And so we're starting to see S3 become a credible sort of base bedrock for a lot of data storage and a lot of applications and future applications that are starting to pop up. So that's a common theme that we're seeing across the industry for sure.

Starting point is 00:23:06 I had another thing kind of around that, like the S3 thing I've seen. What are your thoughts on this? Like, I remember the first time I thought of this, it was like, wow, this makes a ton of sense. Just better leveraging this really powerful local hardware that everybody has. I think that's another interesting theme.

Starting point is 00:23:21 Like, have you seen that play out the last couple of years? Yeah, I think there's something there. We have these Mac processors now. I mean, most of the developers that I know are sort of still Mac junkies. We have the M2, M3, M4 chips, I guess, now. So there's a lot of pent-up like power, and we're seeing this obviously in some of the AI features

Starting point is 00:23:43 in our local machines. I guess it's an interesting counterpoint to moving everything to the cloud, because Modal wants you to stop using your desktop, period, and just run your Python scripts as if they were local, but they're really like on some remote cloud instance. So we're seeing kind of things go both ways. I'm not sure exactly if I can make a bet as an investor yet on which one's going to win the day from a development environment standpoint. I do think that for sure, the models, like small models running locally, is going to become an increasingly powerful thing.

Starting point is 00:24:18 And you see this through Apple Intelligence, and they're trying to push down all this stuff onto the actual client hardware. So I definitely think that's a thing. How will this actually impact data, like classic data engineering, or even engineering workflows and development environments? Is there going to be a battle between convenience and cloud-based sort of scenarios, sandbox scenarios and integrations and things, versus just the power and the cost effectiveness of developing locally? I think that there's a couple of interesting like credible factors pushing in both directions, so it's hard for me to know exactly where that's going to go specifically. And the security and compliance angle is fascinating for me too

Starting point is 00:24:55 because Apple, who's obviously very invested in the hardware side of things, would argue, oh, like, local's better, and all the reasons why that's better for privacy and stuff. And the cloud vendors would argue like, oh, well, the local device could be compromised. You want it all controlled in the cloud, that's better. So I think that tension is there, which also drives people, I think, in two different directions. Maybe everything's just going to run at the end of the day and we're okay. Yeah, that's right.

Starting point is 00:25:26 It's like that would be the ultimate irony. I don't know. I don't know if I could live with that. I don't. Yeah, that would cause a pretty big like, you know, internal crisis for a lot of people. Right. Now, it will be interesting. My screensaver is turning on.

Starting point is 00:25:38 I'm mining some ETH right now. Sorry about that. It will be interesting, actually. I'm going to make it a point to talk to the DuckDB and MotherDuck teams about their local UI that they rolled out. We talked about that on a recent show, right? But super interesting. We talked about what's traditional data engineering, how is that changing, what are the trends... they're going to be focused on that... more of the advanced tools within a standard data engineering tool set. There's orchestration, there's lakehouses, there's pipelines and jobs running.

Starting point is 00:26:31 Now there are a series of tools that are essentially developed in the world of AI, but they're data engineering tools. So speak to that a little bit. What are you seeing at Data Council and especially with the companies that you talk to from an investment standpoint? Yeah, so I guess there's different ways to slice this, right? When you talk about AI engineering and how it's colliding with sort of the traditional data infra world, obviously there's a whole new workflow and tool set and development process that any developer, any engineer anywhere is getting dragged into through ChatGPT and Cursor and, oh my God, vibe coding and all these things. So of course, like at Zero Prime on the investment

Starting point is 00:27:12 side, we've seen data engineering copilots pop up and things like this. So the data engineer's workflow is changing in the same way that many other engineering workflows are changing, just commonly speaking. I think that in addition to that, what I kind of want to bridge to and talk about is one of the things that we're really passionate about with this year's Data Council, which is really acknowledging the intersection between what AI engineers are likely to face as they have successful applications and like the tried and true work that the data infrastructure or data management world has done over the last decades. And what do I mean by that? Well, I think there's this whole new class of AI engineers that maybe think they can just concatenate strings and throw them against LLMs and have a successful AI app. Well, that might be true,

Starting point is 00:28:06 but everyone knows that the success of your AI app is based on volume and scale. And so as you like collect more data from your users, that becomes the actual differentiating piece around your AI wrapper, if you will. So I think there's a whole class of engineers that might become that successful in their AI companies and applications. But then all of a sudden, they have to manage all this data and it becomes a classic data management problem. And there's a whole generation, I think, of engineers that might not know and understand

Starting point is 00:28:37 sort of what we've been mucking around in at Data Council for the last 10 years, which is tried and true best practices of architectures around data storage and data processing and cleaning and scale and governance and privacy and all these things. And so I think there's a really interesting complement between these two worlds. And I think that if folks want to really understand what mature AI engineering looks like from a data management perspective, they need to put those two things together. And we think the data engineering community, the Data Council community, is uniquely positioned to really bridge that gap and help these founders, these new AI founders, kind of get dragged into the world of proper data management. And we think it'll be incredibly useful and powerful

Starting point is 00:29:16 tools for them. So that's one thing that I think puts these two pieces together in a really interesting way that we're quite passionate about this year at Data Council. Can you speak to, and I really hope that there are multiple listeners out there who are starting their own AI startup. And if so, and if you're listening, please reach out to us. We'd love to have you on the show. Pete would love to talk with you if you're growing quickly. But can you speak to that person who is thinking about starting a company or maybe has started an AI company and they are realizing now like, oh yeah, that's actually going to be a major

Starting point is 00:29:52 problem. What do they need to be thinking about? They may not need to take immediate action now, or maybe they do, but what do they need to be thinking about? If they 10x, they're going to face problems that they probably don't see right now. Yeah, I mean, well, I think this is just very general advice, but I think it's all about the quality of the people on your team. This is more startup founder advice than it is technical advice, but I think finding an advisor or somebody who's actually been through data management at scale,

Starting point is 00:30:23 who's worked at one of the larger internet companies or at least a scaling startup and has gone through a lot of the orchestration pieces and the data storage pieces and has had to choose between different kinds of databases. Some of these are not obvious things to someone who hasn't gotten fully in the weeds on them. Even questions like, oh, do I need a standalone vector DB right now or should I just be using vector storage that's getting bolted on and integrated with all the other major data tooling? These are real things that I think modern engineers have to figure out. And no better way to do that than to put someone on your cap table either as an advisor or an

Starting point is 00:31:01 angel investor or someone who's actually gone through these challenges before. And they're probably gonna look a little older than you because you're the young whippersnapper, super smart AI hacker founder, and they've been around data management for a while. And it's gonna look like, feel like, legacy skills and legacy insights. And I think that's the point is that we need to put these two worlds together. And there's gonna be a time gap and a skills gap that smart founders will want to complement and sort of plug holes in their team and their cap table and get good advice from a technical standpoint. So that's just general advice that I would give any AI founder who's starting a new AI company today to think about that and to try to add that experience in their team maybe before they need it so that they're not making sort of bad architectural

Starting point is 00:31:47 decisions all along the way until they realize they have to be undone. Yep. So I've got a question that just came to me. So with Data Council, I think one of your goals is just to have those two people co-located, right? kind of traditional. Like how do you commingle those to really grab both people? I mean partly it's true, like people asked me this year, like are you going to rename Data Council? Like does this require a top-down, like, rebuild or rebrand?

Starting point is 00:32:32 Is this a completely new thing? I'm like well I don't know maybe we should and maybe that'll be a different discussion for next year and beyond. But this year we sort of like just segmented them into tracks. So we have about half of the tracks are AI related tracks, about half the tracks are classic data tracks. But then the cool thing about Data Council is we have office hours after the end of every single talk. So if you want to sort of dig, if you're an AI engineer and you're listening to a databases talk and you want to dig in with that speaker after, you go to the office hours and you can

Starting point is 00:33:03 sort of have a conversation with the speaker. And I think that's where some of the interplay and the cross-functional skills transfer will come. So that's a very exciting layer of Data Council that we've baked in over the years is we're very committed to these office hours that happen at the end of every talk. And so every speaker is totally approachable, which is why we're in the sort of "meet your data and AI heroes IRL" theme, because you get to talk to them and they can spend quality time with you answering your questions. And that's part of the magic of how we sort of mesh these communities together. And we can't even, other than that, how could we structure it?

Starting point is 00:33:35 So we just try and get the right people in the room and give them a basic opportunity to have time in the schedule to let the minds mingle and then they do the rest of the magic and the community has always been amazing at that. So we just try to facilitate. Very cool. So great. We're gonna take a quick break from the episode to talk about our sponsor, RudderStack.

Starting point is 00:33:54 Now I could say a bunch of nice things as if I found a fancy new tool, but John has been implementing RudderStack for over half a decade. John, you work with customer event data every day and you know how hard it can be to make sure... one of my team's secret weapons. ...picked RudderStack was that it does not store the data and we can live stream data to our downstream tools. One of the things about the implementation that has been so common over all the years

Starting point is 00:34:51 and with so many RudderStack customers is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set. Yeah, and even with technical tools, Eric, things like Kafka or PubSub, but you don't have to have all that complicated customer data infrastructure. ...some sort of data product to a data consumer. One of the really interesting trends there is that

Starting point is 00:35:37 the line between data consumer and, let's say, engineer acting on data is blurring. And even before the show, we were kind of talking about the term data engineer. Maybe one that's a little bit easier is the term analyst, which is getting kind of interesting, right? Because it's like, well, I mean, even on a personal level, I've never had a formal job as an analyst, but with a clean set of tables,

Starting point is 00:36:25 and I wouldn't necessarily consider myself an analytics engineer, but the line is blurring just in terms of jurisdiction because of the tool set. Can you speak to that a little bit? I mean, it's just I think the power of the AI tools, it's so generalizable because it's such a great sidekick. And in any collaboration environment, you can find the AI just incredibly useful. Now there's all kinds of workflow hangups and improvements and sort of what does the full value chain look like and what are the tactical aspects of how we communicate with the AI and what shape does that take? But there's no question that it's just changing every aspect of creativity from image creation to content writing to authoring content to music generation to engineering. Engineering is just another form of creativity. Like creators,

Starting point is 00:37:12 engineers are creators. And there's a very artistic thing about being creative. And it's no wonder that AI, which originally caught fire, gen AI, with all the creative types, well, that very quickly the engineers got sucked up into that updraft because engineers are creatorial. And the more we realize that and sort of adapt and are willing to be flexible in using this as a tool to help us. And the cool thing is, well, I don't know, maybe this is not right. Maybe there are like old, fraughty engineers who are really like hell-bent on "keep the AI away from me," just like there are some screenwriter unions that are really fucking scared about this whole thing. But so far, I think to engineering's credit, I haven't seen a lot of

Starting point is 00:37:57 manifestations of that. And so it seems like the engineering community overall is sort of down with being very utilitarian and using the AI to help them create things better, faster, cheaper, and using these coding sidekicks as real collaboration partners. So that's just a very generalized thought about it, but I do think it is really cool when you understand that engineers are creatorial and that's how we fit in this immediate value chain of gen AI, which has swept the rest of the creative world. Yeah, yeah, I love that. I was reading an article about the first railroad that they built that was the precursor to the Panama Canal, and I promise there's a tie-in here. But just the amount of, it took them like eight years to build this thing and it was less than 50 miles, right?

Starting point is 00:38:43 Because of all the difficulties and all that sort of stuff. Right. But when it was complete, the mind boggling thing to everyone was that they dramatically underestimated the power of the like rapid exchange of goods from east to west and west to east. Right. And it was just dramatically more economically productive than anyone projected. And they had like immense projections, right? types of additional things could have happened within that eight-year period with all the economic activity, all the different entrepreneurial ideas. I really feel the same way about AI where it's like, okay, we're just building the railroad way faster and removing a lot of the manual labor so that there will be a higher flourishing of human creativity. Absolutely. Now, of course, we're going to suck some new creators, quote creators, into the bottom end of this vacuum that we might not have called engineers before.

Starting point is 00:39:51 So what constitutes an engineer going forward might be an interesting discussion or debate because the AI enables people who are, let's just, again, not to be mean, but otherwise unqualified to write code to all of a sudden be more than dangerous at talking to a database or doing basic data analytics, as you mentioned, Eric. And I think that's good. I think overall, we want to increase the surface area of the number of people that can use these tools and there's real power there. But of course, that could feel threatening to

Starting point is 00:40:30 engineers who have spent entire careers and degrees and lots of time and effort and blood, sweat, and tears debugging code for a long time. And they have their identity locked up and I'm an engineer and this other person is not. So there's going to be a whole shift in how we, I mean, just think of junior developers leaving boot camps and how difficult it's been for them to get hired in the last 10 years. We haven't seen the tip of the iceberg once people start really coding with ChatGPT. And so how does that fit into the ecosystem going forward? It's maybe a little unclear, but it's fascinating to think that AI can touch and enable and empower so many people to do cool technical things that otherwise might have felt like it was beyond their reach. Yeah I love it. All right well we

Starting point is 00:41:08 are at the buzzer but of course Pete tell our listeners where they can sign up to attend Data Council and get all the information they need because if you haven't signed up yet definitely check it out and look at getting a ticket early. Yeah come and see us in Oakland. Data Council is back in the Bay Area this year. It's April 22nd to 24th. DataCouncil.ai is the site. We'll include it.

Starting point is 00:41:32 We'll create a discount code for Data Stack Show listeners. So we'll just call it DATASTACK20 and pop that in. You can get a nice discount on your tickets. Come and see us in person. Data Council is not online. It's not live streamed. You have to sort of commit to be there. We make it worth your while, we promise. But yeah, come and visit us in Oakland next month and I look forward to seeing you guys there as well.

Starting point is 00:41:52 Awesome. Always a pleasure to have you on the show, Pete. Thank you guys. It's really fun. Appreciate the work that you're doing. We have an exciting offer for you for Data Council this year. It's going back to the Bay Area on April 22nd through 24th. And if you're a listener of the Data Stack Show, you can get a discount. Go to datacouncil.ai and use the discount code DATASTACK20. I've been to a ton of conferences in this industry

Starting point is 00:42:19 and Data Council is absolutely at the top of my list. I love going every year because of how much value I get, and I think this is gonna be the best year ever. The theme is Meet Your Data and AI Heroes IRL. We're gonna hear what's happening on the cutting edge of data and AI, and I'm personally excited to meet some leaders in the data industry that I've admired

Starting point is 00:42:41 for a really long time. You'll also get to shake hands with a lot of people who have been on this show before. Join us in Oakland April 22nd through 24th at Data Council and don't forget as a listener you get a discount. DATASTACK20. Use that when you purchase your ticket and we'll see you in a couple weeks. The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
