The Data Stack Show - Data Council Week (Ep 4): The Data Council Origin Story With Pete Soderling

Episode Date: April 28, 2022

Highlights from this week’s conversation include:
Pete’s start in data and Data Council (2:01)
Learning more from failure (6:42)
Shaping terminology and definitions (9:30)
What investors look for in data technology (12:43)
Working as a data engineer (16:32)
Data Council takeaways (18:16)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. I am on site here at Data Council Austin recording some shows. And you'll notice that I said I in the singular. That's because Kostas is out doing some really cool stuff with the Starburst team at the conference.
Starting point is 00:00:38 And so I am flying solo, which is maybe going to give Brooks some heartburn, but I have a great guest. So I'm going to talk with Pete, who started Data Council, and he was actually an engineer in a former life and has built this amazing conference. And so I'm just going to ask him about his background and actually what led him to Data Council. And if I'm feeling intrepid, I might ask him about his fund as well because he's an investor, which is uncharted territory. But since Kostas and Brooks are gone, I can do whatever I want. So let's dive in and talk with Pete.
Starting point is 00:01:13 Pete, welcome to the Data Stack Show. It's so great to have you here. Thank you. It's really exciting to be here. And we are actually live at Data Council Austin, which is a conference you put on. And seeing the faces in the crowd in the opening was amazing because I think in many ways, people are just so excited to get together and talk about this stuff in person, even though we've been doing it, you know, over for a couple of years. So for sure. Congratulations. And thank you for putting on this amazing event.
Starting point is 00:01:44 Yeah, it's my pleasure. I think everyone feels like they've been let out of prison or something. Yeah, that's true. Okay, so give us the background. So you've been working in and around data for a long time. You know, I want to hear about the founding of Data Council. And you have a fund, which is super interesting. But how did you get your start in data? Yeah, so it starts back a little bit
Starting point is 00:02:05 earlier than Data Council. And I was an engineer turned founder in 2003. I started two companies in New York before 2010. Then I started two companies in SF after 2010, one of which became Data Council. But with one of my New York City companies, in 2008, I started an API-based cloud security platform. And it was designed for businesses to sell streams of data through our proxy software to other companies. So it was a data-oriented play. And I ended up talking to lots of premium data providers, think of like Bloomberg, or Comscore, or Garmin, or these kinds of companies that essentially sell high-value data. And we had built this middleware, which was security, proxy, kiosk, metering, billing, sort of all this stuff. And they would plug their API, their data feed into the back of our proxy, and we would advertise it out to the end user and help them turn their data stream into a business. Oh, interesting. Right. So that was really the first time.
Starting point is 00:03:07 They're producing data, but the infrastructure to monetize, like the data is valuable, but like it's hard to build infrastructure to monetize that. Because like if you're Garmin, you're like building maps. Yeah, you build products. You're responsible to sort of push your data and give a context in your own product. And these companies do that well with the Garmin Nav device or the Bloomberg Terminal. But our thinking at the time was, well, what if you unplug the data stream out of your own product and offered it raw to providers or to other customers? Like what kind of magic could they work with that same data?
Starting point is 00:03:43 Super interesting. Okay. So what took you from, so that sort of like brought you into the world of data and then what took you from there to starting Data Council? Well, I think it was, I mean, partially it was the unsuccessful launch of that company and I had to shut it down, you know, a couple of years after we started. But in the meantime, I had moved to the Bay Area and sort of gotten to the startup community there, which was sort of the next level leveling up for me personally. So even though I had to shut that company down, it was called Stratus Security, I ended up, you know, getting sort of keyed into the data world. And by the time 2013 came around,
Starting point is 00:04:21 I had realized that there was this whole sort of strata of data engineering that was being ignored because everyone was talking about the sexy quanti, you know, data science-y stuff that was kind of glittering and, you know, sexy analytics. Well, back then, I don't, I mean, data engineering, you know, was still probably a fairly new term. Yeah, it definitely was not a role at most, hardly any companies. Maybe Facebook had the notion of a data engineer somewhere, you know, bumping around. But most people in the community were not even really sort of familiar with using that term. Yeah, super interesting. Okay, so you noticed that there's this sort of theme emerging in the type of work that companies are doing in the data space.
Starting point is 00:05:05 And so you decided Data Council. Yeah, so it started off as a meetup inside Spotify's office in New York City. They wanted to attract more machine learning engineers to their projects. And I was doing a consulting project with them. And so we ended up spinning up this meetup because we saw this market opportunity. We call it the Data Engineering Meetup. And it did really well in New York. Then we launched one in SF
Starting point is 00:05:28 and it did equally well there. And by the time 2015 had rolled around, we basically had not just sort of helped the world define what a data engineer was, but we had seen the data scientists come to the group and the analysts come to the group and the researchers come to the group. And it was apparent that everyone wanted to learn
Starting point is 00:05:44 how to work together better with their peers, the adjacent layers of whatever this emerging nascent data stack was going to be. And so we found ourselves as a community kind of thrown right into that conversation. And because we had so much surface area with different kinds of professionals across the data field, Data Council was born out of that meetup, and we've been carrying the torch ever since. Yeah, super interesting. Okay, one question, this is kind of a personal question, but you always hear, you know, sort of the age old wisdom that you learn more from failure than you learn from success, right? And so we're at Data Council. There, you know, are, you know, five or 600
Starting point is 00:06:22 people here, which is huge just coming out of COVID. So, I mean, very successful. But also you said you had to shut your other company down prior to that. Do you think that's true? Did you learn more from sort of shutting down that data company than maybe doing some successful things? Yeah, there's definitely tangible and intangible things that you pick up along the way. And that's part of it.
Starting point is 00:06:47 This is just called experience, right? And I mean, there's a bunch of things that I'm tuned into now, like Data Council is essentially my fourth company. And it was only because of the previous experiences, launching other companies, whether they succeeded or failed, maybe I still would have gotten similar experiences. So I don't know if it's that the failure breeds the wisdom or if
Starting point is 00:07:10 it's just the experience that breeds the wisdom or if it's the same thing. But yeah, like, you know, one thing I'm really aware of that we brought into Data Council is this notion of founder-market fit and also the fact that the founder has to articulate the earliest brand of the company. And I've been consciously infusing Data Council with that brand ever since we started it. And I think, you know, it's becoming sort of bigger than me now because the team is growing and the community is growing. But really, it's kind of like Data Council is Pete's conference, and it's the conference that reflects my values as an engineer. I don't want an over-sponsored conference. I don't want bullshit talks.
Starting point is 00:07:50 I don't want white paper level content. I want to be surrounded by the best, smartest people. And those are software engineers. And so I built a conference that I wanted for myself. And to sort of stick to those values, even through growth is something that's been a bit of a guiding principle for us. Yeah, for sure. Okay. Another personal question. Have you sat in on some of the sessions? And I just know from being involved in conferences, like from a leadership standpoint, you know, a couple of jobs ago, like you're running all over the place, but I just knowing you and the conversations we've had, like you love, you know, getting into the technical stuff. Have you sat in on some of the sessions? A few, a few. It's a little difficult. You know, we have 60 different speakers this week
Starting point is 00:08:28 and four sessions going at one time, plus the office hours track. So there's a lot going on. So unfortunately not too much, but we produce all the videos and upload them for free for the community to YouTube. And so sometimes I consume them. I'm just like the rest of the folks that might not be able to be here. Yeah, yeah. Very, very cool. Okay. One thing I'd love for you to give
Starting point is 00:08:49 our listeners some perspective on. So Data Council has really helped shape some of the, let's say, terminology or definitions around roles and data, right? Because if you go back to, you know, 2012, 2013, data engineering is something that's happening, but, you know, it hasn't been sort of codified like as a role or a specific term, at least as widely as it is now. What are the things that you have seen that have been really positive steps and sort of those definitions across the industry, you know, roles, terminology?
Starting point is 00:09:25 And then what are some of the things that you think are, like the industry is still trying to figure out? Well, I think the Data Council community, just through the sheer innovation and power of engineering, has really helped set forth sort of what the main pieces of infrastructure in a full data stack or a full data system can be. So, you know, we have a few data quality companies that are in Data Council and bump around. We have a few metadata companies, data catalog companies, ETL companies, you know, so there's various folks,
Starting point is 00:10:01 the metrics layers. I mean, I think you see the emergence of all of these categories generally being defined by people in our community or people with some familiarity or adjacency to our community. So I think, you know, we do sort of help each other establish a common vernacular and not just a vernacular, but a common understanding of sort of what the building blocks are. What's been interesting to me is that I think we have these parallel stacks. We have the data analytics ETL stack. Then we have a machine learning stack that sort of runs in parallel to that, but they're actually mostly different pieces.
Starting point is 00:10:36 I'm starting to kind of wait. I'm wondering when we'll start to see some consolidation sort of across those two areas. Like a feature store is kind of like a metric store. And so I think we're starting to see a couple of companies pop up that actually sort of pitch those combined together. So I think we'll start to see maybe some consolidation across these two layers of the stack at some point. Yeah, I think it's interesting. We were talking about this recently in that in some modern companies, you really see the analytics workflow
Starting point is 00:11:08 almost becoming in some ways the front end of the ML workflow, right? Because if you get, I mean, with some of the modern tooling, right, you actually can get a lot of that initial work done, right? Which is super interesting. And that hasn't, to your point, necessarily been fully productized, but like, it's interesting to see that happen within companies, you know, where it's kind of like, oh, wow, actually, like there's less work to do than we thought on the ML side, because sort of the analytics data engineering, like front end of that, that really serves like the BI use cases is now happening in a way that, you know, sort of formats to like an ML workflow. Yeah, for sure. Which is super interesting. Okay. So you also raised a fund, you know, and there are so many podcasts about investing and I know,
Starting point is 00:11:58 you know, very, very little about that. So I don't, I don't want to like get into that, you know, because I don't know what I would say. But what I am interested in is, so you have this really interesting perspective. So practitioner as an engineer, founder in the data space, and then sort of a builder of community that sort of has driven a lot of definition around this. Okay. So that makes me so interested in what do you look for in data technology as an investor, right? Like your thesis or whatever you want to call it. I mean, you really have sort of a really interesting combination of assets there that give you a perspective that I would think is pretty unique as an investor. Yeah, so for me, I mean, it's pretty simple because I'm such an early stage investor.
Starting point is 00:12:49 And also, as I mentioned, I was a founder and sort of have this zero to one sense. And I guess, you know, it sort of dawned on me a few years ago as I was thinking about all the things that I do during my day and organizing, at that time, Data Councils, you know, we were running around the world. But every once in a while I got on a call with a founder from the community who would ask me for advice on their startup or fundraising or something. And those were the calls in my day that I looked back on
Starting point is 00:13:19 and were definitely the high points of my day. And so when I realized that maybe Data Council was just becoming a vehicle or a platform for me to do more of that kind of work, that really made me inspired to take this to the next level. So I raised the Data Community Fund, as you said, in 2020. We have some amazing investors, backers, like Sequoia, Bain Foundation, AngelList, many other folks in the B2B data space.
Starting point is 00:13:45 Oh man, that's amazing. So we're very lucky to get that social proof from those kinds of folks. And in terms of what do I look for, I really invest in team and TAM. I'm a pre-seed, seed stage, very early stage investor. And we don't necessarily have to be right in the same way that a series A or B investor is right. We can sort of, you know, look at the founder's experience, see if they're a great engineer, if they have some key insight that they've learned through their experience, preferably usually at some previous company. Sure. That gives them some key angle and a reason that their startup or their software needs to exist. Sure. So a key insight is one thing that we look for. And then obviously like a really big TAM, a really big market for companies
Starting point is 00:14:46 and for their solution to potentially win the day. We're quite simple in the way we approach things. And we write checks for founders, you know, at inception point, first checks for very, very early stage ideas. Very cool. I mean, what a fun space to be in because you get to play in the technology,
Starting point is 00:15:03 the vision, but that's also a very sort of, I'm not saying later stage investors don't have personal relationships, but the dynamic of that relationship with someone who, you know, has an idea and they're passionate about solving a problem, I would think is pretty energizing. It's very exciting. And to be in a place where, you know, many of the companies that I've invested in now, the reason we even got access to those rounds is because the founder said, oh yeah, like I first spoke about Apache Hudi at Data Council in 2017. And that's why, you know, I've had good vibes from Data Council and you've helped me by promoting the open source and, you know, our video from the conference has racked up thousands of views on YouTube. And,
Starting point is 00:15:45 you know, you really helped sort of expose our open source project in the early days. And, you know, this is why we have such fondness for Data Council as a platform. And that sort of carries on into our investing relationship together as well. Yeah, for sure. I mean, again, I don't know a ton about investing, but I would think if I was a VC and I looked at sort of the platform or deal flow that you have from the community that you've built, I would probably be a little jealous because you get to see these things as they're happening, which is, that's really great. Okay. I'm going to completely flip the question and this may be a little bit unfair. And I know, you know, temper this because I know you're an investor and, you know, you have lots of companies here, but just in terms of your personal interest as an engineer,
Starting point is 00:16:29 not where you would put your money as an investor, but if you were going to go work as an engineer at a company, at a data company, what part of the stack would you go work in? Just out of pure curiosity as an engineer, right? Like I'm going to write code to help solve this problem, right? Is it observability? Is it streaming? Yeah, you sort of take me back because it's been a long time since I've thought about doing any real engineering. I'm very much an ex-engineer now.
Starting point is 00:16:57 But the thing that really made my eyes light up as a young sort of engineering student was when I learned how databases work and SQL and the optimizations across the data structures and the indexing and the query planning and all those things.
Starting point is 00:17:13 So I've kind of always been a little bit of a database junkie. So, you know, I'd probably go work with Kishore at StarTree or something like that on some, you know, newfangled, optimized, or the guys at EraDB, you know, working on some newer version
Starting point is 00:17:28 of some optimized data system. I think that's probably where I would tend to migrate. Yeah, for sure. And, you know, it's interesting to hear you say that because the database space is pretty tough, right? I mean, like, there's so much interesting technology, but if you think about the time it takes to really build the technology itself at scale is very like difficult to achieve.
Starting point is 00:17:50 And then like bringing it to market, you know, is difficult. So, but it actually just based on what you've done, it doesn't necessarily surprise me that you would sort of go for the jugular on the difficulty. Okay. Last question. So we're live here at Data Council. Interesting new thing that you've learned or new person that you've met that you stuff sort of popping up in the perimeter of Data Council. And, you know, we've never been a big Python community like the full-on data science community is. Data engineers are not necessarily Python engineers. But we're seeing like lots of cool open source stuff pop up. I mean, I think, you know,
Starting point is 00:18:47 30 or 40% of the startups that I announced were coming out of stealth on stage at Data Council were probably Python related. So it's just an interesting data point. I don't know if it's here or there, but something that I observed this week that's been interesting to me. Yeah, I agree. The converging of sort of what have been disparate parts of maybe not even technology, but like workflows and sort of interactions is super interesting. Very cool. Well, I can say from experience being on site here,
Starting point is 00:19:17 Data Council has been amazing. So to all of our listeners, you definitely should register and come next year. I've learned a ton. I've met some unbelievable people who have built some unbelievable technology, tons of interesting startups. So Pete, thank you for putting this together. I've personally benefited and best of luck with your fund and investing. Yeah. Thanks for being here and for supporting the conference. Really, really appreciate this opportunity and want to welcome everyone to join
Starting point is 00:19:41 us in Austin next year. What a fun conversation. I think one of the big takeaways that I had from this conversation with Pete was that he really has a lot of experience and background working as an engineer in the data space. And that influences, I think, his empathy for data professionals. And you see that both in the conference that he's running. If you were here, you definitely saw that. You see that in Data Council in general and the types of content and things that they put out.
Starting point is 00:20:15 And then also, I did actually get to talk a little bit about investment, which was uncharted territory, but super fun. And it was amazing just to hear about Pete's empathy and sort of joy in working with the individuals themselves. And as we've said many times in the show, it's really fun when people are doing exciting things in the data space, but with a focus on the people behind the technology. So also we need to give a big thank you to Pete and the whole team who put the conference on and for allowing us to record on site here. So thank you.
Starting point is 00:20:47 Several more good ones coming up from Data Council. So stay tuned and we'll catch you on the next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers.
Starting point is 00:21:15 Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.
