The Data Stack Show - 105: The Modern Data Stack Is Just Getting Started with Astasia Myers of Quiet Capital
Episode Date: September 21, 2022
Highlights from this week’s conversation include:
Astasia’s background and career journey (3:03)
How Astasia evaluates data companies (5:25)
Defining “modern data stack” (8:39)
The limit of the complexity of a solution (18:44)
How risky early-stage acquisition really is (26:15)
Flashing headlight advice for investing (30:17)
Signs you should do a product integration (33:38)
The next data infrastructure opportunities (36:19)
The likelihood of two data worlds merging (43:55)
How important open source is (49:14)
Data-centric ML (53:47)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
All right, Kostas, after recording four episodes this week, guess what happened?
I don't know if you can tell. I think I can. Yeah. I love your,
I love your voice. I think you are ready to go into, like, the after-midnight radio program.
Oh yes. Yeah. Turn this into a late show, yes, for lonely souls out there.
That's right. Yeah, let's do it. Actually, having a sultry voice is a great introduction to
this episode because it's our first investor interview on the show, which is super interesting.
So we have Astasia, who is now with Quiet Capital.
She was at Redpoint for a long time.
And she invests in data tooling and has made investments
in actually a lot of the companies founded by people
who have been on our show, which is super interesting.
Lots of fun connections there.
I think one of my biggest questions is going to be around the way that she thinks about evaluating data tooling.
Because if you think about an investor who focuses on data tools, every single day they're looking at new technology, trying to understand it.
And they have a very wide and deep view on the different ways that companies are trying
to solve particular data problems.
And so I know there's investment criteria on the business side, but they also look at
the technical side of the product.
And so I think it'd be helpful for me and hopefully our listeners to understand
the framework that they use to sort of evaluate that because they
just spend so much time doing it.
How about you?
Yeah.
Too many questions in my mind, to be honest, but I'd love to hear her
opinion about like the modern data stack.
What is it like, why it exists and how she thinks that it will evolve.
And of course, we shouldn't lose the opportunity of asking her about what's next and what's
hot out there. I mean, that's what investors are best at, right? So. Absolutely. Let's do it.
Let's do it. Astasia, welcome to the show.
We are so excited to chat with you.
Thanks so much for having me, Eric.
It's a pleasure to be here.
Okay.
Do you want to give us your background and tell us how you got into investing in data products?
Sure thing. So I have been an investor for nearly a decade now, really starting my career at Cisco
on the M&A and venture investing team. Did a whole bunch of really fun investing in storage
businesses back in the day, like in Kirksey City. I then transitioned to Redpoint Ventures to be on
the early stage team, was there for about four and a half years.
And then more recently joined Quiet Capital to be an enterprise partner leading the practice over here.
My background is a specialty in solutions that sell into technical audiences.
So of course, big data, machine learning, cybersecurity, dev tools and infrastructure.
I've been really humbled to partner with businesses
like Dremio and LaunchDarkly, Solo.io, Preset, Hex,
Supabase, Airbyte, some amazing founders
that you even had on the show yourself.
So it's been a fun ride.
And things are always evolving in data,
and it's brighter than ever before.
Awesome.
And you're the first investor on the show, actually, as a guest, which is super exciting.
Couldn't think of anyone better.
And I know that personally, I've read a lot of your work, sort of outlining new data technologies
and your thinking.
So you've been very helpful to me personally.
So thank you for that.
I'd love to start out. So you, as part of your job, you evaluate data tooling all the time, right? I mean, maybe every day. And you have such a wide perspective on the market because of that, because you get to see so many new types of technologies, and I think specifically the different approaches that companies are taking
to solving similar problems with data, you know, sort of creating new opportunities with data.
I'd love for you to share with our listeners sort of the evaluation framework you use,
because a lot of our listeners have to look for tooling to solve their own data problems a lot, but
they haven't invested as much time as you sort of surveying the market, trying to understand
data tooling.
And so how do you evaluate companies and the way that they're approaching solving particular
data problems?
That's such a great question, Eric.
And I really think it's important to clarify for the listeners, the way that we think about a data tool could actually be different than how we think about an investment in a data company.
When we think about it in terms of a company, we're thinking about the team, the technology, the market space.
When we're thinking about a tool, it could have a company behind it,
but it may not necessarily have to be. It could be a great open source project that provides
a lot of value. With a tool, there may not be a monetization opportunity for it,
even if it is fantastic. So I just want to make sure that we clarify the difference
for the audience today. In terms of how we think
about a useful data tool, we think about if there's any existing offerings in this space,
an open source project or a commercial offering that is failing at some aspect of their core
functionality or capability and the magnitude in which they're
not providing the value they should be to users. And so that is the first framing. What exists
today? What are the issues? The second thing that we really like to dig into is the criticality of
this pain. Is the gap something that could be easily solved with people and you can throw
bodies at it? Is it something that would be best served by a software offering? And is it mission
critical to the business that it is solved? We also look at the core IP. If there are any
patents against the technology that suggests it has differentiation that is not easily replicated by alternatives or, you know, open source projects that can offer it for free, have wonderful adoption, but not be monetized. And my favorite thing to dig into with the product itself is how easy it is to use
and implement for teams. I often find that there's really sophisticated data solutions that are out
there, but they're too complex to get up and running to demonstrate value. So when I do
diligence calls on technology, I literally ask them, how long does it take to implement? How long does it take to show value? What is the magnitude of the value you're experiencing? And do you think it will be enduring? It's always best if the tool can actually fit into a macro trend. You know, later we'll be talking about the role of the modern data stack and the movement to cloud. It's not a must-have, but it's a nice-to-have.
Super interesting.
That's super helpful.
Let's actually just jump straight into the modern data stack
because this is something you've written about a ton.
And I would love for you to just,
can you define the modern data stack in your own words for our listeners?
Because there are lots of sort of definitions and like a million architectural diagrams
out there that have like different flavors of this.
But you've done some really influential thinking on this.
Can you define it for us in your own words?
Sure thing.
Yeah, there's so many different definitions.
I feel like there's the agnostic research analyst definition. There could
be a vendor definition that likes to highlight certain components based on what they're offering.
From my perspective, the modern data stack is an analytic stack that has the foundation
around a cloud data warehouse. And what really separates a modern data stack from a legacy data stack
is that it is hosted in the cloud. It requires less technical configuration by the user to
demonstrate value. It also often promotes end-user accessibility, so data democratization,
and can cut costs, shortening the amount of implementation time and downtime
because it is hosted, and then actually scale out as the data volumes grow over time.
And so we often think about there are four core components of the modern data stack.
There is the ingestion layer with offerings like Fivetran and Airbyte.
There is the core of the cloud native data warehouse, either Snowflake,
BigQuery, or Redshift. There is transformation, where dbt is often used, and then there is the BI service, either Preset or Looker. And it's really this
reimagining, and the big trend that emerged was moving from ETL to ELT, and that was catalyzed by
people wanting to ingest more data from sources like Salesforce, Zendesk, Stri [...] add value, like operational analytics
with reverse ETL solutions like Census and Hightouch where you take data from the cloud
data warehouse and not just use it for an analytics use case, but also operational processes,
moving it into Salesforce or Zendesk or MailChimp.
And it's also cool to see that now we are looping in machine learning engineers
and data scientists into the modern data stack by enabling businesses
to leverage that data to build internal or external models.
And when I say external, like a production-grade model that services a customer versus an internal
model that could be something around forecasting.
And so I think this is just the beginning of the modern data stack, but it's really
cool to see, over the past two and a half to three years, the foundation really
be set in place.
Super interesting.
Two quick follow-up questions, or I guess a follow-up question about two additional components.
So do you see sort of data observability or orchestration as sort of key components
of the stack, or are those sort of augmenting that core,
you know, sort of those four key parts
that you outlined?
Totally, yeah.
They're very useful components of the stack as well.
I think the traditional core
data stack is those four components.
The reason that data orchestration solutions
like Astronomer, Prefect,
you know, even open source Airflow
become more important is the coordination
of actions on data transformation over time.
Data observability has a few different components to me.
It could be pre-production observability around pipelines and layer of ration data itself to prevent schema changes that could have downstream negative impacts and breakage.
Or it could actually be looking at data distributions on the data warehouse to see if there's any data drift over time.
I think one of the reasons that data observability has become such an important and growing segment
is because dashboards have become widespread throughout an organization. We often find that there are multiple BI solutions within one business.
The ratio of dashboard creators to dashboard viewers is one to 100.
And so when you have all this information, which is distributed to make smart decision-making,
you want to make sure the data is correct for the viewers and that
you're making decisions that can drive the business forward productively.
And so you don't want to have a dashboard without the data.
You don't want to be in a team meeting and get called out for, oh, I think this number
looks wrong.
I don't know if we can actually make a decision today. And so data observability companies really step in to help make sure that
the data is clean, correct, and the business can be more productive.
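The two kinds of observability checks described here, catching schema changes before they break things downstream and watching for drift in data distributions, can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the function names, the column names, and the idea of flagging a shift of more than three baseline standard deviations are all hypothetical, and nothing here is meant to represent how any vendor mentioned in this episode implements it.

```python
from statistics import mean, stdev


def schema_changes(expected: dict, observed: dict) -> dict:
    """Compare an expected column -> type mapping with the observed one."""
    shared = expected.keys() & observed.keys()
    return {
        "missing": sorted(set(expected) - set(observed)),  # columns that disappeared
        "added": sorted(set(observed) - set(expected)),    # columns that are new
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


def mean_shift(baseline: list, current: list) -> float:
    """Return how many baseline standard deviations the mean has moved."""
    spread = stdev(baseline)
    gap = abs(mean(current) - mean(baseline))
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


# Hypothetical example: a pipeline change dropped one column and retyped another.
expected = {"id": "int", "amount": "float", "created_at": "timestamp"}
observed = {"id": "int", "amount": "str", "region": "str"}
report = schema_changes(expected, observed)

# Hypothetical example: daily order values jumped well outside the usual range.
drift = mean_shift([10, 12, 11, 13, 9], [20, 22, 21, 23, 19])
is_drifting = drift > 3  # assumed threshold, purely illustrative
```

A check like the first one would run before a change ships, while the second would run on a schedule against the warehouse, so that a dashboard viewer is not the first person to notice a wrong number.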
So, Astasia, a follow-up question about the modern data stack.
So modern data stack as a term has been around for a while,
right? And like, okay, things in tech change really, really fast. So based on what you
have experienced so far, have you seen some kind of evolution in the modern data stack?
Like something notable that has changed since, I don't know, first it was introduced as a
term until like today, like something that was added, something that was removed, or like our understanding has changed or some tools have matured, right?
So how have you seen the modern data stack evolve in all this time?
So, cool new segments have emerged. As I mentioned before, one of the newer ones is operational analytics and reverse ETL, the ability to push data from the data warehouse into third [...] customer billing, and then you may want to push it into Salesforce to try to do better account qualification or email marketing campaigns with MailChimp.
So that's been really cool to see emerge. The second category that's been pretty neat
is the movement from batch workloads
to real-time and streaming
instead of using an ingest layer
with longer time horizons
for collecting and ingesting data
that could be on the order of an hour or a day.
You're actually seeing teams use streaming systems like Kafka or, you know, Redpanda for that layer
so that the data can be fed faster so that the dashboards can be updated more quickly.
What's really cool about that is we're seeing new warehouses come to market that are supposed to support real-time analytics more effectively than the known incumbents like Snowflake and the cloud data warehouses from the large cloud service providers.
And so those would be like Pinot and Druid and ClickHouse. And so I think there's a push in the
market to get data faster to the end users to make decisions, and it is particularly prominent in operational teams.
We saw lots of blog posts coming out of Uber and Lyft over the years about
the criticality of the data, how it needs to be identified and visualized within like 10
to 30 minutes for teams to make decisions, and it seems to be more popular now than ever before to move from batch to streaming.
So that's been pretty cool to see too.
That's cool.
But my feeling is that the modern data stack, as, I mean, time flows, it gets more and more, not complicated necessarily, but we see more categories added
to it and obviously more vendors for each category.
So now you have data engineering teams or IT teams, I don't know, whoever is responsible
in the company to go and figure these things out and buy all these things.
And they have all these choices in front of them, right?
So what is like, let's say, and probably as an investor,
you have like a better intuition about that.
Like what's the limit that the market has in terms of like a complexity of a solution, right?
Like where are we going to reach a point where, okay, the market will be
like, okay, guys, that's just too much.
You know, like it's not easy, like from an organizational standpoint to
maintain all this infrastructure or even like navigate this infrastructure.
Totally.
Totally.
Yeah.
It's a great point. Historically there were businesses like Informatica that offered a [...] down different components of the data stack into everything
from warehouse, transformation,
integration, metric
stores, observability,
reverse ETL,
data orchestration. So yes, I totally
hear you. There has been a
fragmentation over time.
I can imagine that
as things evolve, there will be a reconsolidation
because of this exact take on the procurement side of the house: why do we need so many vendors
that we're managing? Can't we just have one throat to choke for a lot of what we're doing? A good friend of mine just ran a survey with, I think, over 500 data
buyers and was asking about 15 different categories of data spend and how it was going to
evolve over the next five years. And when I saw that huge list, which even includes synthetic
data, which we can throw in there, right?
I was just thinking like, God, it would be really hard to be a data leader these days.
There's so many options to choose from across every category.
Do I really need all these tools?
As I said, I set the foundation of like the four core components that we see, like, do
I have to get all the other eight that this survey had?
We're seeing early indications that teams want to buy products that are integrated and serve many different functions.
We see that with RudderStack, which takes a multi-pronged approach of ingest, streaming, and reverse ETL.
You can see that with businesses in the data observability space that are not
just doing data validation and linear regressions and looking at the data
drift and quality inside a warehouse, but are going to move up the stack
to go into data catalogs.
You can also see it even with dbt, which started as a transformation layer and now offers a metric server. Think about the metric server as
democratized LookML, so you can define a metric one time and serve it up to
many different SaaS products.
So there's consistency across them.
You can make smart decisions. So I personally think that we're going to start to see consolidation
at the layers above the warehouse because picking off, hey, I spend $10,000 a year on reverse ETL.
I spend $10,000 a year for synthetic data.
It doesn't make sense to be billed by a whole bunch of different vendors for that.
You probably just want one contract to do the rest.
And then you get a discount for having more products that you're using from that one vendor.
And so I do expect consolidation going forward.
And even like we're starting to see acquisitions now, right?
The people at Airbyte just did a really smart acquisition of an open source team.
You know, I can imagine over the, especially with the macro environment,
I would expect over the next year to 18 months
to see a lot of acquisitions emerge from incumbents and from, you know, higher growth
data companies to accelerate the roadmap and broaden the suite of offers.
That's awesome because you give me a reason to ask like the next question that I had in
my mind.
So when we're talking about consolidation,
most people think of big companies like Google or Microsoft
going and acquiring and the dream of many founders out there coming true.
But do you feel that this round of consolidation
is going to be driven mainly by these big companies
acquiring smaller companies?
Or we're going to see more of mergers happening
between smaller companies, right?
Because, okay, you mentioned Airbyte.
Airbyte is a startup, right?
They've been around not for that long.
You don't usually see acquisitions happening that early
in the stages of a company.
So what's your feeling there?
What should we expect?
One type or the other more?
Yeah, it's a great question.
I think it's going to be a mix.
I'd probably say that 75% will be tech and talent
by large incumbents, either cloud service providers
or publicly traded companies.
This is a great opportunity for them,
especially with the changes in the macro
to go hire great people at a discounted rate.
Another 25% will probably be other later stage startups.
I mean, over the past few months,
we've seen acquisitions: Airbyte
purchasing Grouparoo,
Hightouch acquiring Workbase,
Snyk acquiring TopCoat Data.
So I think that some of these
later stage startups
are being thoughtful of,
hey, these are great people.
We're aligned on vision.
They'd be more successful internally.
Let's do this now.
You have to remember,
we had this era
over the past 18 months
that has just recently changed
of large capital raises, sometimes 50 or 100 million in capital,
for these data infrastructure businesses because people are so excited about this
massive wave of adoption, the growth of the data volumes, and the precedent set by Snowflake of
being the largest enterprise IPO of all time. And so these startups raised a whole bunch of money and they have the
balance sheets now to go make smart acquisition decisions.
Yeah.
Makes total sense.
And okay.
Acquisitions, and taking two different companies and putting them together to work and align visions and cultures and products and all that stuff, are super, super hard, right?
Like we've seen many times, even like in like big corporations acquiring like other companies and then the products just die at some point, right?
Yeah, yeah. And I would imagine, also as a person
who has gone through
the process of building a company
from an early stage,
and I'd like to hear that from you
because you are investing
in early stage companies.
How easy and how risky is it
for an early stage company
to acquire another company
and try to align and
manage the products and the companies?
What are your thoughts there?
Yeah, that's a great question.
You know, I used to do M&As for Cisco, so one of the most
acquisitive tech companies of all time.
And so from that experience, I can kind of speak to what I can imagine would be the
pros and cons of doing acquisitions at a growth stage startup.
I think it's important to know there's different types of acquisitions, right?
So there's the traditional acqui-hire.
That's usually eight to 15 people, usually very early in revenue generation, limited number of customer contracts,
maybe not even patents that you need to work through a diligence process.
The next stage is businesses that are revenue generating with contracts with customers.
Sometimes they can be multi-year contracts.
These would probably be between $1 and $10 million of revenue. And then there's the $10 to $30 million of revenue, which is really about having a second product line in the business.
I can imagine that most of the near-term acquisitions
are going to be tech and talent. There are a lot of challenges in how to manage customer relationships
if you're spinning down a business. It's a huge headache for growth founders and executive teams
to manage. And it could be six months to do an acquisition that's revenue generating,
another quarter to do the integration of the team and the tech, and then another year plus
to manage the customer relationships. Something else that these teams will be
considering is the tech stacks that they've been built on. Like, are they even using the same backend systems?
Is the software written in the same language?
So it can be much harder for these early growth companies to acquire businesses that are revenue
generating, just given the commitment to the customers and the consolidation of the tech stack.
So I wouldn't expect those larger acquisitions at this time.
Tech and talent is easier.
Those deals are usually done through a direct relationship, with the acquirer's
founders going out and prospecting great technical teams that they hope to know
or already have a relationship with,
aligning on mission and selling the value prop of finding a happy home,
being more productive inside a larger business to continue to drive their vision,
as well as the more de-risked upside opportunity for them personally.
One more question.
And I know that Eric wants to ask something.
You know me so well.
So, okay.
Usually, like what we hear like from like investors as advice for, let's say, founders
that are on a growth stage and an earlier stage is that focus is
like the most important thing. Like you have to focus on your execution and be like really
focused on what you're doing. And I would assume that like going through an acquisition
or a merger or whatever we want to call it, is necessarily going to divert the focus, right? So there is some associated
risk around that. And especially because you are both an investor and you have
a lot of experience in M&A: for these younger and not that experienced founders that
might have the opportunity at the growth phase to go and acquire a company, what
advice would you give?
What would you tell them and what would you tell them to be careful of?
Totally.
Flashing headlight advice is take this very seriously.
It's very hard.
Right.
Acquisitions are usually longer than initially expected, as I said,
six plus months. And that doesn't include post-acquisition integration. And if there's
a product that is being sold, but it's on a different tech stack, that can take a lot of time and make or break
the ROI on the acquisition.
I mean, one of the reasons Cisco is so famous for their acquisitions is because the
integration team kicked ass and made them productive.
And they would often allow these teams to run as a standalone business unit so that they didn't
impede growth.
For growth founders, there should be a framework to think about the acquisition.
One is, is it truly an acqui-hire?
How many months and how much money would it cost
for you to go find these people from the market?
And can you get great talent very quickly
to augment the team,
to drive towards that singular mission
faster than before.
You know, we used to joke it'd be considered advanced HR, people that
didn't fit in the structure of your current compensation packages, or you
didn't think you could recruit before outside of an acqui-hire.
Another way of thinking about it is, is the product that they built going to very easily fit
into our tech stack and accelerate our product development by X number of quarters so that we
can get more customers, raise ACVs, and drive to a bigger business faster? And the third is the business financial use case: if there are customers on the platform,
what is the size and magnitude of that, and how could it immediately affect top line if you integrated it
or let it run as a standalone business? So I can imagine that for each founder, it's going to be a
different motivation of what they need in that moment.
But each of the three different versions is going to be challenging for
various reasons.
Astasia, one question on the flavor of acquisition that is technology focused,
right? So you talked about acqui-hires, that makes sense, right?
Like maybe something that the team
that's being acquired has built is a contiguous problem,
but you're really sort of applying like a brain trust,
you know, to sort of your own vision, right?
But when there is a really strong technical,
you know, sort of formal product reason for the acquisition.
How do you think about if the product is actually a good fit, right? Because you're introducing sort
of, you know, you're heavily augmenting your existing product roadmap. You're bringing in
different functionality, user flows. I mean, the issues are multifaceted, right?
Like if you're actually going to try to do a product integration.
And so how do you, and in your experience, like what are the things that are signals
that this is a good idea to do this?
Mm-hmm.
The first would be market validation with existing customers or prospective customers? Does this naturally fit
into your vision of what our product could become over time? What is the criticality
of this technology and how much would you be willing to pay if we added it? Just what a product manager does day-to-day, validation of the tech and what they should be building for the future.
The second aspect is really understanding the entire technical stack that the product is built on.
If everything is built on BigQuery and you're a Snowflake vendor, you're built on Snowflake, that can be
very hard. If it's built in a completely different language, that can be very hard. So making sure
that you have an understanding of the core tech stack and how easy it would be to integrate it,
because once again, integration is what makes these successful. If it would be too
challenging, then it's probably not a good fit for the acquirer. And then the third thing that's useful
to validate is not as much on the tech side but I would highly say that like chemistry with the
team if you're expecting it to be run as an individual product is crucial.
I've seen acquisitions fall apart
because there wasn't strategic alignment
from both leaders
about what the product should become
as part of the acquired company.
So it's a great point in time.
We can add value for the next year.
Fantastic. But really,
I'm coming over as an executive. What should the product look like in the next three?
All right. That's some great, great points and stuff that I wish I knew a couple of years ago,
to be honest. But it's never too late to learn about that stuff. So thank you. And I think that
this is information that's going to be like very valuable,
like for the people
who listen out there on the show.
So Astasia,
like we talked about
like the modern data stack
as it is today, right?
And like how it evolves.
As an investor,
you're always looking
for the next thing, right?
Like you need to be
ahead of the curve, let's say.
It's part of your job.
Can you share with us, like what do you see as the next opportunities there
when it comes to data infrastructure?
Things that you are excited about?
Yeah.
It's always cool digging into new trends and what's emerging.
As I noted at the beginning,
I'm a really early stage investor
focused on pre-seed, seed, and series A.
So many of the great offerings and vendors
that we've talked about so far
are way beyond me and off to the races.
And so I'm always thinking about like,
okay, we have the pillars of the modern data stack.
What's next?
And so as I mentioned earlier,
I think one macro trend that's exciting
is the move from batch to streaming.
While batch processing constitutes
the majority of data workloads today,
we are seeing an increasing proportion of teams
wanting real-time data to support operational use cases.
And so, as I said, the ingest layer is changing from
Airbyte and Fivetran to thinking about Kafka, Redpanda, Meroxa,
Decodable, going to more real-time databases like ClickHouse, Pinot, Druid.
That's been really cool to see.
And what's super neat about real-time,
it's not just in the analytics team stack.
We're starting to see it emerge in machine learning as well
with continuous deployment and distributed serving of ML models.
Think about when you go on Netflix's, you know, app. You're watching a TV show
that's about murder mysteries, and then it very quickly learns your interest. And so when you
complete that show, it automatically shows you more murder mystery TV shows. And so that's been really cool
to see it move into ML
and the role of continuous learning.
There's some early stage startups
like Claypot that's trying to
facilitate those workloads.
I would say that another trend
that's been pretty cool
and really interesting is
there's been a fragmentation.
We were talking about portfolio and consolidated businesses that offer many products.
Something that we're seeing on the flip side is the fragmentation around open table formats and query engines. So an open table format layer is like Apache Iceberg
and Apache Hudi and Delta Lake.
And this kind of reduces data gravity.
It allows data to be moved across different environments.
You also see new query engines emerge,
which is pretty cool.
So on the analytics side we had Spark, and, you know, even Trino,
we were talking about that earlier, but now we're starting
to see the emergence of in-memory analytics with DuckDB. On the data, excuse me, on the ML side, we're starting to see
Ray and Dask that are trying to be Python-native.
So that's kind of cool to see like all these table formats, then also fragmentation of
the querying there.
The next layer that I'm trying to think about now is like the ML semantic
layer. People use pandas for data prep, but then often when they're building a
real model or trying to push it to production, they use PySpark. There's a
really cool business out there called Ponder that's built Modin, which helps
with distributed pandas to make it easier to do data prep at scale
without having to graduate. But, gosh, I think it'd be really cool if we had like a semantic
layer for ML that can go all the way from data prep to production, so teams aren't rewriting
their code and can actually push these models to production faster. I know I'm just going on here,
but I feel like there's some really cool stuff going on.
I'm still excited.
I know the modern data stack is very well defined, but I'm still excited.
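One way to picture the "write it once, run it from data prep to production" idea is to define each feature a single time and reuse that definition for one record at serving time and for a whole dataset at training time. The sketch below is a hypothetical, plain-Python illustration of that principle. The `Feature` class, the field names, and the functions are all assumptions made for the example, and it does not show how Modin, pandas, or PySpark work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Feature:
    """A named feature with a single definition shared by every code path."""
    name: str
    compute: Callable[[dict], float]


# Hypothetical feature definitions; the record fields are illustrative only.
FEATURES = [
    Feature("watch_ratio", lambda r: r["minutes_watched"] / r["runtime_minutes"]),
    Feature("is_mystery", lambda r: 1.0 if r["genre"] == "murder_mystery" else 0.0),
]


def featurize(record: dict) -> Dict[str, float]:
    """Serving path: turn one incoming record into model inputs."""
    return {f.name: f.compute(record) for f in FEATURES}


def featurize_batch(records: Iterable[dict]) -> List[Dict[str, float]]:
    """Training path: apply the exact same definitions to a whole dataset."""
    return [featurize(r) for r in records]
```

Because both paths call the same `compute` functions, there is no second implementation to rewrite or keep in sync when a model moves to production.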
Yeah, yeah, absolutely.
Absolutely.
And actually like one of the most interesting things with like the modern data stack is
that it's very cool like each year to see how it changes, like because it does.
Like you see all these new things that are like, as we say, like it seems like it becomes more and more complex.
And as we said, that sample, and we are going to have consolidation and like
all that stuff, but the fact that this complexity exists is it also like
translates into innovation, like many things and many great teams out there
that are building like some amazing technology.
And I think that the velocity that these new ideas are
delivered, it's incredible. It's really, really fast-paced. It's very interesting to see how from
one year to the other, things change in this space. And that's amazing. And I think you listed
some very, very interesting technologies out there,
like from Iceberg to DuckDB.
And people are doing some great stuff.
All right.
That's great.
Okay.
I have a question that has to do a little bit more with the market.
And usually, like, okay, when we are talking about data infrastructure in general,
like, it's not something that is new. Like, we've been building databases since we have computers,
right? But until quite recently, innovation was heavily driven by the large enterprise, right?
There were like the Banks of America of the world, like the very big corporations.
And of course, like the high-tech giants like Netflix and Twitter, that they had to work with a lot of data.
And a lot of, like many of these technologies were created like for their scale, let's say.
And they were driving the innovation there, right?
Another thing that I find very interesting with the modern data stack is that if someone like sees how it is defined from a market perspective, it seems like a solution that
fits the large enterprise, but it also fits very well to smaller companies,
right? Like you don't have to be at the scale of Bank of America to implement the
modern data stack in your company and get value out of it, right?
And obviously you don't have to spend the same amount of money.
Totally.
So it still feels though that there is some kind of disconnect there.
There is like what is happening right now, like in the enterprise in terms of like
innovation and like evolution of the platforms that started with Hadoop in
the early 2000s until today, and then you also have like the modern data stack
that's pretty much like developed like in parallel.
Do you see these two worlds,
let's say, in the data infrastructure
merge at some point?
Or do you feel that they are going to continue evolving
some kind of in parallel?
And if they are going to merge,
by the way, just to give a hint here, I believe that they are going to merge.
But when do you think that this is going to happen and what is missing there?
Yeah, I love your reference about, you know, large enterprises having a lot of innovation.
You know, the Yahoo data team, incredible.
The LinkedIn data team, incredible.
That was a particular era.
Then we went to Google and Twitter and Facebook
and what they were doing at scale.
We had a whole bunch of great open source projects
and founders come up in those communities.
And then the next evolution was, gosh,
are you a Lyft and Uber?
Like you built some incredible stuff to support the data volumes.
I feel like a lot of the great founders that have emerged in data over the past two and a half years have come out of the recently IPO'd businesses that were data-centric. Something that is very interesting to me is, as a founder, it's usually more opportune
to go to mid-market companies as early stage design partners so you could have in-depth
conversations consistently and have a faster time to product development and monetization.
You've worked selling into enterprises.
Gosh, everyone wants JP Morgan and Amex and even Cisco, Walmart as a customer.
Those are going to be some bigger contracts.
They're going to be absolutely amazing to transform the business.
But gosh, that could be a very long process. That could be nine months to a year. You could
be interfacing with multiple different teams, trying to get buy-in and feedback and alignment.
The procurement and redlining process, if you ever go to contract, could even be
three plus months. So if you're an early stage founder, you want to get high quality feedback as quickly as
possible from people that are willing to pay you money.
And so that's usually mid-market customers.
So we often guide our founders to focus on the mid-market and build a product that's valuable and then earn the right to go to
enterprises as they flesh out their security and their compliance and identify enterprise
needs because the sales cycles are going to be longer.
You can kind of see that in great data businesses like Fivetran and Segment, right? They went into the mid-market, got into the enterprise, and started achieving larger ACVs, six figures, seven figures.
But it is a process.
And if they're going to be successful, it's not an if, it's a when. And for most of the data companies we've been speaking with today,
you know, outside of Snowflake,
which is now publicly traded,
you know, they're earlier in their journey.
And so if you're five plus years old,
you're probably starting to move
into the enterprise and see some traction.
But I wouldn't recommend
for a Series A stage data company
to be having those conversations
because it's going to be a very long process for them.
And it's better to demonstrate repeatable sales
and get product feedback
than to go try to strike out
and get an IBM or equivalent business as a customer.
Yeah, 100%.
Okay, that was some great, great feedback on that.
I'm sure that like many founders are like thinking of that stuff, to be honest.
There's always this pressure of like, okay, when are we going to get our
first enterprise customer and I think my feeling is that, okay, you can, you
can, you can go for a long time without having to worry about enterprise.
And I think you can see that also with companies like Snowflake, right?
Snowflake managed to get to the IPO.
Yeah, they had enterprise customers, but they were not an enterprise company, right?
They were driving a lot of like their growth
from the mid-market for a very, very long time. So you can get like a lot of
mileage, let's say, by just like focusing on that. Anyway, I know that we are getting closer to the
time here. So just like one last question from me and then like, I'll give it to Eric.
What about open source?
Open source has been like a traditionally, let's say, big component of the go-to-market motions around data products.
How important do you think open source is, and is it going like to remain important
in the future for building a company that builds data infrastructure?
Costas, thank you so much for the question.
I love this question because I get this question all the time.
I work with really early stage people who are ideating and thinking about their business.
And they're like, gosh, do I need to be open source?
We're going to go build a data and ML tool.
And the answer is no, you do not need to be open source.
You need to look at your subcategory and what the precedent in the category has been.
You know, I would probably not recommend these days going and building a query engine that's not open source.
And, you know, there's often a precedent in a lot of databases
that you should be open source,
but even there, Snowflake is not and you do not have to be.
So I would look at your individual segment
and see what historically has been the go-to-market motion
of whether it's open source or commercial.
Another thing that's really important to think about is
what is actually the
open source value for your customer? Is it because they always want to see the code base? Are they
afraid that if you go under, you know, it's such a critical layer in the stack that they'll
have to do migrations, which would be very painful? Is it some other rationale? And then also, what is the value of open source to
the business? If it is, hey, top of funnel, pipeline generation, that's one conversation,
that can be a free trial. Is it, hey, we need awareness and particularly with software
engineers and 70% of the software engineering stack is open source.
We have to go open source.
That's different.
The last thing I would have people think about is the buying and adoption pattern for their product.
If you are considering open source, ideally there's a single-player mode for the
open source. I, as a data professional or software engineer, take the open source,
implement it, get it up and running, and add value. And I theoretically should have the
credit card swiping capability to purchase as an individual unit.
And then over time, you build in team and corporate value to expand the contract size.
If your adoption pattern is multi-pronged, you have a purchaser, you have a stakeholder,
you have a user, sometimes you don't necessarily need open
source there because the user doesn't have the ability to pay you. And he or
she's going to go to a boss, and they're going to have to have a salesperson come in and pitch
and demonstrate value, do an ROI calculator, move the contract forward.
And in those examples, sometimes even if you're open source, it does not generate
pipeline for you. It creates brand and awareness and helps you, but you're still doing top-down sales.
And so theoretically, you didn't need to be open source to begin with.
It's a nice-to-have versus a must-have.
So as I was saying, I don't think you need to be open source.
It's always a great question.
I've had a lot of back and forth on Twitter about this, but I really think you need to
identify the segment,
purchasing, and the intention for both the business and the user of its value.
Thank you so much.
That was great to hear from you about open source. It's a very controversial matter.
So Eric, your questions.
All right. Well, we're close to time here. So Astasia,
one more question for you. And this may be unfair because I know you love all of your
investments equally, but if you were going to say, okay, I'm not going to be an investor anymore,
and I'm going to go found or start a company in the data space,
which problem area would you focus on? Like which, like on a personal level, which area sort of
interests you the most that you would want to sort of sink your teeth into like operationally
as part of, you know, whatever founding team, go-to-market team, however you want to look at it.
Yeah, that's a great question.
I'm pretty pumped about the movement in ML to data-centric ML from model-centric.
Model-centric being, hey, we spent a lot of time building customized models
and trying to move forward the architectures of them, graph-based,
you know, large neural nets.
Data-centric is this movement where, hey, you can find a lot of great models online,
download them off of GitHub, do the rebalancing and the training.
What's really going to affect the outcome is the data collection, labeling, and quality, and having insights into that will be
critical for the model's value. And so for me, I think that's a really exciting space. It's kind of
the analog of the data observability space applied to machine learning. And so there's some really cool businesses operating there
like Unlocks and Galileo.
And so I think that's a really cool movement because it's a macro trend.
Once again, like the new approach of the cloud with analytics, it's a macro trend.
It's in the data path.
It could be a daily use tool and it directly affects the end user's performance
in all the building and then in the role itself.
So yeah, we agree.
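To make the data-centric idea concrete, here is a minimal sketch of the kind of dataset checks it implies: measuring label balance and flagging identical inputs that carry conflicting labels. The dataset and the function names are hypothetical, and this is not a description of how any product mentioned in the conversation works.

```python
from collections import Counter, defaultdict

# Hypothetical labeled examples: (text, label). Purely illustrative data.
DATASET = [
    ("great show", "positive"),
    ("great show", "negative"),  # same input as above, conflicting label
    ("boring plot", "negative"),
    ("loved it", "positive"),
    ("loved it", "positive"),
]


def label_distribution(rows):
    """Share of each label, to spot class imbalance before training."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def conflicting_labels(rows):
    """Inputs that appear with more than one label, a common labeling error."""
    labels_by_input = defaultdict(set)
    for text, label in rows:
        labels_by_input[text].add(label)
    return {t: sorted(ls) for t, ls in labels_by_input.items() if len(ls) > 1}
```

Checks like these run on the data rather than the model, which is the shift in emphasis being described: the model architecture stays fixed while the dataset is inspected and corrected.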
Can I add something, Eric?
Sure.
So if I judge by her reactions when we were going through the next steps of modern data stack and what
excites her.
I think that she would start the company around the semantic layer for ML.
Like she was the most excited when she was talking about that.
So that's my prediction.
Love it.
Well, Astasia, this has been so wonderful.
Thank you for giving us a little bit of your time.
We learned so much.
We'd love to have you back on the show sometime.
Awesome. You guys rock. Thank you so much for
having me. I love all things data. I love talking about what's happened, what's coming,
all the cool episodes and startups. So it was an absolute pleasure. Thanks so much, guys.
I might be stealing one of your takeaways, but I really liked Astasia's perspective on open source.
In some ways, some of the answers were really simple.
If you're trying to give people visibility into how your product is built and how it
works, there are a lot of ways to do that other than just literally having a repo that
everyone can see on GitHub, right?
Because there's a lot that goes into sort of having a successful open source project
in terms of the community surrounding it, contribution, all that sort of stuff.
And it was just refreshing to hear her say,
there are a lot of ways you can give the same or a similar experience to users
without having to, you know, sort of make it a part of your product strategy and a true
open source effort, which was fascinating and really, really interesting to hear.
Yeah, absolutely.
I think she gave like some amazing, like, actual advice on many different topics.
And obviously like one of them was about open source and how important it is to invest in open source.
But I also, what I keep from the conversation that we had is that we're at the stage right now where you can start like a data infrastructure company and go after like the mid-market.
Like 10 years ago, you pretty much had to go after the enterprise to build a company around data.
You don't necessarily have to do that anymore.
And I think that's something that gives like a lot of freedom to entrepreneurs out there or like people who are considering doing something like that.
So I definitely keep
that from our conversation with her. That was like amazing to hear from an investor.
I agree. All right. Well, thank you for joining the show. Thank you for dealing with my raspy,
sultry voice. And I will make sure that I can talk normally again before the next show.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app
to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by RudderStack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com.