The Data Stack Show - Data Council Week (Ep 4): The Data Council Origin Story With Pete Soderling
Episode Date: April 28, 2022
Highlights from this week’s conversation include:
Pete’s start in data and Data Council (2:01)
Learning more from failure (6:42)
Shaping terminology and definitions (9:30)
What investors look for in data technology (12:43)
Working as a data engineer (16:32)
Data Council takeaways (18:16)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show. I am on site
here at Data Council Austin recording some shows. And you'll notice that I said I in the singular,
that's because Costas is out doing some really cool stuff with the Starburst team at the conference.
And so I am flying solo, which is maybe going to give Brooks some heartburn, but I have a great guest.
So I'm going to talk with Pete, who started Data Council, and he was actually an engineer in a
former life and has built this amazing conference. And so I'm just going to ask him about his
background and actually what led him to Data Council. And if I'm feeling intrepid, I might
ask him about his fund as well because he's an investor,
which is uncharted territory.
But since Costas and Brooks are gone, I can do whatever I want.
So let's dive in and talk with Pete.
Pete, welcome to the Data Stack Show.
It's so great to have you here.
Thank you.
It's really exciting to be here.
And we are actually live at Data Council Austin, which is a conference you put on. And seeing the faces in the crowd
in the opening was amazing because I think in many ways, people are just so excited to get
together and talk about this stuff in person, even though we've been doing it, you know, over
for a couple of years. So for sure. Congratulations. And thank you for putting on this amazing event.
Yeah, it's my pleasure.
I think everyone feels like they've been let out of prison or something.
Yeah, that's true.
Okay, so give us the background.
So you've been working in and around data for a long time.
You know, I want to hear about the founding of Data Council.
And you have a fund, which is super interesting. But how did you get your start in data?
Yeah, so it starts back a little bit
earlier than Data Council. I was an engineer turned founder in 2003. I started two
companies in New York before 2010. Then I started two companies in SF after 2010, one of which
became Data Council. But one of my New York City companies in 2008, I started an API based cloud security platform. And it was designed
for businesses to sell streams of data through our proxy software to other companies. So it was
a data oriented play. And I ended up talking to lots of premium data providers, think of like
Bloomberg, or Comscore, or Garmin, or these kinds of companies that essentially sell high value data. And we had built this middleware, which was security, proxy, kiosk, metering, billing, sort of all this stuff. And they would plug their API, their data feed into the back of our proxy, and we would advertise it out to the end user and help them turn their data stream into a business.
Oh, interesting. Right.
So that was really the first time.
They're producing data, but the infrastructure to monetize, like the data is valuable, but like it's hard to build infrastructure to monetize that.
Because like if you're Garmin, you're like building maps.
Yeah, you build products.
You're responsible to sort of push your data and give a context in your own product.
And these companies do that well with the Garmin Nav device or the Bloomberg Terminal.
But our thinking at the time was, well, what if you unplug the data stream out of your own product
and offered it raw to providers or to other customers?
Like what kind of magic could they work with that same data?
Super interesting.
Okay.
So what took you from, so that sort of like brought you into the world of data and then what took you from there to starting Data Council?
Well, I think it was, I mean, partially it was the unsuccessful launch of that company and
I had to shut it down, you know, a couple of years after we started. But in the meantime,
I had moved to the Bay Area and sort of gotten to the startup community there, which was sort of the next level leveling up for
me personally. So even though I had to shut that company down, it was called Stratus Security,
I ended up, you know, getting sort of keyed into the data world. And by the time 2013 came around,
I had realized that there was this whole sort of strata of data engineering that was being ignored because everyone was talking about the sexy quanti,
you know, data science-y stuff that was kind of glittering and, you know, sexy analytics.
Well, back then, I don't, I mean, data engineering, you know, was still probably a fairly new term.
Yeah, it definitely was not a role at most companies, hardly any.
Maybe Facebook had the notion of a data engineer somewhere, you know, bumping around.
But most people in the community were not even really sort of familiar with using that term.
Yeah, super interesting.
Okay, so you noticed that there's this sort of theme emerging in the type of work that companies are doing in the data space.
And so you decided Data Council.
Yeah, so it started off as a meetup inside Spotify's office in New York City.
They wanted to attract more machine learning engineers to their projects.
And I was doing a consulting project with them.
And so we ended up spinning up this meetup because we saw this market opportunity.
We called it the Data Engineering Meetup.
And it did really well in New York.
Then we launched one in SF
and it did equally well there.
And by the time 2015 had rolled around,
we basically had not just sort of helped the world
define what a data engineer was,
but we had seen the data scientists come to the group
and the analysts come to the group
and the researchers come to the group.
And it was apparent that everyone wanted to learn
how to work together better with their peers,
the adjacent layers of whatever this emerging nascent data stack was going to be.
And so we found ourselves as a community kind of thrown right into that conversation.
And because we had so much surface area with different kinds of professionals across the data field,
Data Council was born out of that meetup, and we've been carrying the torch ever since.
Yeah, super interesting. Okay, one question, this is kind of a personal question, but you always
hear, you know, sort of the age old wisdom that you learn more from failure than you learn from
success, right? And so we're at Data Council. There, you know, are, you know, five or 600
people here, which is huge just coming out of COVID.
So, I mean, very successful.
But also you said you had to shut your other company down prior to that.
Do you think that's true?
Did you learn more from sort of shutting down that data company than maybe doing some successful
things?
Yeah, there's definitely tangible and intangible things that you pick up along the way.
And that's part of it.
This is just called experience, right?
And I mean, there's a bunch of things
that I'm tuned into now,
like Data Council is essentially my fourth company.
And it was only because of the previous experiences,
launching other companies,
whether they succeeded or failed,
maybe I still would have gotten similar experiences. So I don't know if it's that the failure breeds the wisdom or if
it's just the experience that breeds the wisdom or if it's the same thing. But yeah, like, you
know, one thing I'm really aware of that we brought into Data Council is this notion of founder market
fit and also the fact that the founder has to articulate the earliest
brand of the company. And I've been consciously infusing Data Council with that brand ever since
we started it. And I think, you know, it's becoming sort of bigger than me now because the
team is growing and the community is growing, but really it's kind of like Data Council is
Pete's conference, and it's the conference that reflects my values
as an engineer. I don't want an over-sponsored conference. I don't want bullshit talks.
I don't want white paper level content. I want to be surrounded by the best, smartest people.
And those are software engineers. And so I built a conference that I wanted for myself. And to
sort of stick to those values, even through growth is something
that's been a bit of a guiding principle for us. Yeah, for sure. Okay. Another personal question.
Have you sat in on some of the sessions? And I just know from being involved in conferences,
like from a leadership standpoint, you know, a couple of jobs ago, like you're running all over
the place, but I just knowing you and the conversations we've had, like you love, you
know, getting into the technical stuff. Have you sat in on some of the sessions?
A few, a few. It's a little difficult. You know, we have 60 different speakers this week
and four sessions going at one time, plus the office hours track. So there's a lot going on.
So unfortunately not too much, but we produce all the videos and upload them for free for the
community to YouTube. And so sometimes I consume them. I'm just like the rest of the folks that
might not be able to be here.
Yeah, yeah.
Very, very cool.
Okay.
One thing I'd love for you to give
our listeners some perspective on.
So Data Council has really helped shape
some of the, let's say, terminology
or definitions around roles and data, right?
Because if you go back to, you know, 2012, 2013,
data engineering is something that's happening,
but, you know, it hasn't been sort of codified like as a role or a specific term, at least as widely as it is now.
What are the things that you have seen that have been really positive steps and sort of those definitions across the industry, you know, roles, terminology?
And then what are some of the things that you think are,
like the industry is still trying to figure out?
Well, I think the Data Council community, just through the sheer innovation and power of engineering,
has really helped set forth
sort of what the main pieces of infrastructure
in a full data stack or a full data system can be.
So, you know, we have a few data quality companies that are in Data Council and bump around. We have
a few metadata companies, data catalog companies, ETL companies, you know, so there's various folks,
the metrics layers. I mean, I think you see the emergence of all of these categories generally being defined by people in our community or people with some familiarity or
adjacency to our community. So I think, you know, we do sort of help each other establish a common
vernacular and not just a vernacular, but a common understanding of sort of what the building blocks
are. What's been interesting to me is that I think we have these parallel stacks.
We have the data analytics ETL stack.
Then we have a machine learning stack
that sort of runs in parallel to that,
but they're actually mostly different pieces.
I'm starting to kind of wait.
I'm wondering when we'll start to see some consolidation
sort of across those two areas.
Like a feature store is kind of like a metric store. And so I think we're starting to see a couple of companies pop
up that actually sort of pitch those combined together. So I think we'll start to see maybe
some consolidation across these two layers of the stack at some point. Yeah, I think it's
interesting. We were talking about this recently in that in some modern companies, you really see the analytics workflow
almost becoming in some ways the front end of the ML workflow, right? Because if you get,
I mean, with some of the modern tooling, right, you actually can get a lot of that initial work
done, right? Which is super interesting. And that hasn't, to your point,
necessarily been fully productized, but like, it's interesting to see that happen within companies,
you know, where it's kind of like, oh, wow, actually, like there's less work to do than we
thought on the ML side, because sort of the analytics data engineering, like front end of
that, that really serves like the BI use cases is now happening in a way that, you know, sort of formats to like an ML workflow. Yeah, for sure. Which is super interesting. Okay. So
you also raised a fund, you know, and there are so many podcasts about investing and I know,
you know, very, very little about that. So I don't, I don't want to like get into that,
you know, because I don't know what I would say.
But what I am interested in is, so you have this really interesting perspective. So practitioner as an engineer, founder in the data space, and then sort of a builder of community
that sort of has driven a lot of definition around this. Okay. So that makes me so interested in what do you look for
in data technology as an investor, right? Like your thesis or whatever you want to call it. I
mean, you really have sort of a really interesting combination of assets there that give you a
perspective that I would think is pretty unique as an investor. Yeah, so for me, I mean, it's pretty simple
because I'm such an early stage investor.
And also, as I mentioned, I was a founder
and sort of have this zero to one sense.
And I guess, you know, it sort of dawned on me
a few years ago as I was thinking about all the things
that I do during my day and organizing,
at that time,
Data Councils, you know, we were running around the world. And yet every once in a while I got on a call with a founder from the community who would ask me for advice on their
startup or fundraising or something. And those were the calls in my day that I looked back on
and were definitely the high points of my day. And so when I realized that maybe Data Council
was just becoming a vehicle or a platform for me
to do more of that kind of work,
that really made me inspired to take this to the next level.
So I raised the Data Community Fund, as you said, in 2020.
We have some amazing investors, backers,
like Sequoia, Bain Foundation, AngelList,
many other folks in the B2B data space.
Oh man, that's amazing.
So we're very lucky to get that social proof from those kinds of folks.
And in terms of what do I look for, I really invest in team and TAM.
I'm a pre-seed, seed stage, very early stage investor.
And we don't necessarily have to be right in the same way that a series A or B investor is right.
We can sort of, you know, look at the founder's experience, see if they're a great engineer, if they have some key insight that they've learned through their experience, preferably usually at some previous company.
Sure.
That gives them some key angle and a reason that their startup or their software needs to exist. Sure. So a key insight is one thing that we look for. And then obviously like a really big TAM, a really big market for companies
and for their solution to potentially win the day.
We're quite simple in the way we approach things.
And we write checks for founders,
you know, at inception point,
first checks for very, very early stage ideas.
Very cool.
I mean, what a fun space to be in
because you get to play in the technology,
the vision, but that's also a very sort of,
I'm not saying later stage investors don't have personal relationships, but the dynamic of that
relationship with someone who, you know, has an idea and they're passionate about solving a problem,
I would think is pretty energizing. It's very exciting. And to be in a place where,
you know, many of the companies that I've invested in now, the reason we even got access to those rounds is because the founder said, oh yeah, like I first spoke about Apache
Hudi at Data Council in 2017. And that's why, you know, I've had good vibes from Data Council
and you've helped me by promoting the open source and, you know, our video from the conference has
racked up thousands of views on YouTube. And,
you know, you really helped sort of expose our open source project in the early days. And, you
know, this is why we have such fondness for Data Council as a platform. And that sort of carries on
into our investing relationship together as well. Yeah, for sure. I mean, again, I don't know a ton
about investing, but I would think if I was a VC and I looked at sort of the platform or deal
flow that you have from the community that you've built, I would probably be a little jealous
because you get to see these things as they're happening, which is, that's really great.
Okay. I'm going to completely flip the question and this may be a little bit unfair. And I know,
you know, temper this because I know you're an investor and, you know, you have lots of companies here, but just in terms of your personal interest as an engineer,
not where you would put your money as an investor, but if you were going to go work
as an engineer at a company, at a data company, what part of the stack would you go work in?
Just out of pure curiosity as, as an engineer, right? Like I'm going to write code to help
solve this problem, right?
Is it observability? Is it streaming?
Yeah, you sort of take me back because it's been a long time
since I've thought about doing any real engineering.
I'm very much an ex-engineer now.
But the thing that really made my eyes light up
as a young sort of engineering student
was when I learned how databases work
and SQL and the optimizations
across the data structures
and the indexing
and the query planning
and all those things.
So kind of always been
a little bit of a database junkie.
So, you know, I'd probably go work
with Kishore at StarTree
or something like that
on some, you know, newfangled optimizer,
or the guys at EraDB,
you know, working on some newer version
of some optimized data system.
I think that's probably where I would tend to migrate.
Yeah, for sure.
And, you know, it's interesting to hear you say that
because the database space is pretty tough, right?
I mean, like, there's so much interesting technology,
but if you think about the time it takes
to really build the technology itself at scale is very like difficult to achieve.
And then like bringing it to market, you know, is difficult.
So, but it actually just based on what you've done, it doesn't necessarily surprise me that you would sort of go for the jugular on the difficulty.
Okay.
Last question.
So we're live here at Data Council. Interesting new thing that you've learned or new person that you've met that you stuff sort of popping up in the perimeter of Data Council.
And, you know, we've never been a big Python community like the full-on data science community is.
Data engineers are not necessarily Python engineers.
But we're seeing like lots of cool open source stuff pop up. I mean, I think, you know,
30 or 40% of the startups that I announced were coming out of stealth on stage at Data Council
were probably Python related. So it's just an interesting data point. I don't know if it's
here or there, but something that I observed this week that's been interesting to me.
Yeah, I agree. The converging of sort of what have been disparate parts of
maybe not even technology, but like workflows
and sort of interactions is super interesting.
Very cool.
Well, I can say from experience being on site here,
Data Council has been amazing.
So to all of our listeners,
you definitely should register and come next year.
I've learned a ton.
I've met some unbelievable people who have built some unbelievable technology, tons of interesting
startups. So Pete, thank you for putting this together. I've personally benefited
and best of luck with your fund and investing. Yeah. Thanks for being here and for supporting
the conference. Really, really appreciate this opportunity and want to welcome everyone to join
us in Austin next year. What a fun conversation.
I think one of the big takeaways that I had from this conversation with Pete was that he really has a lot of experience and background
working as an engineer in the data space.
And that influences, I think, his empathy for data professionals.
And you see that both in the conference that he's running.
If you were here, you definitely saw that.
You see that in Data Council in general and the types of content and things that they
put out.
And then also, I did actually get to talk a little bit about investment, which was uncharted
territory, but super fun.
And it was amazing just to hear about Pete's empathy and sort of joy in working
with the individuals themselves. And as we've said many times in the show, it's really fun when
people are doing exciting things in the data space, but with a focus on the people behind
the technology. So also we need to give a big thank you to Pete and the whole team who put
the conference on and for allowing us to record on site here.
So thank you.
Several more good ones coming up from Data Council.
So stay tuned and we'll catch you on the next one.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by RudderStack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.