The Data Stack Show - 105: The Modern Data Stack Is Just Getting Started with Astasia Myers of Quiet Capital

Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. All right, Kostas, after recording four episodes this week, guess what happened? I don't know if you can tell. I think I can. Yeah. I love your, I love your voice. I think you are ready to go into like the after midnight radio program.

Starting point is 00:00:42 Oh yes. Yeah. I turn this into a late show. Yes, yes. For lonely souls out there. That's right. Yeah, let's do it. Actually, having like a sultry voice is a great introduction to this episode because it's our first investor interview on the show, which is super interesting. So we have Astasia, who is now with Quiet Capital. She was at Redpoint for a long time. And she invests in data tooling and has made investments in actually a lot of the companies founded by people who have been on our show, which is super interesting.

Starting point is 00:01:24 Lots of fun connections there. I think one of my biggest questions is going to be around the way that she thinks about evaluating data tooling. Because if you think about an investor who focuses on data tools, every single day they're looking at new technology, trying to understand it. And they have a very wide and deep view on the different ways that companies are trying to solve particular data problems. And so I know there's investment criteria on the business side, but they also look at the technical side of the product. And so I think it'd be helpful for me and hopefully our listeners to understand

Starting point is 00:02:05 the framework that they use to sort of evaluate that because they just spend so much time doing it. How about you? Yeah. Too many questions in my mind, to be honest, but I'd love to hear her opinion about like the modern data stack. What is it, like, why it exists and how she thinks that it will evolve. And of course, we shouldn't lose the opportunity of like asking her about what's next and what's

Starting point is 00:02:35 popped out there. I mean, that's what investors are best at, right? So. Absolutely. Let's do it. Let's do it. Astasia, welcome to the show. We are so excited to chat with you. Thanks so much for having me, Eric. It's a pleasure to be here. Okay. Do you want to give us your background and tell us how you got into investing in data products? Sure thing. So I have been an investor for nearly a decade now, really starting my career at Cisco

Starting point is 00:03:09 on the M&A and venture investing team. Did a whole bunch of really fun investing in storage businesses back in the day, like in Kirksey City. I then transitioned to Redpoint Ventures to be on the early stage team, was there for about four and a half years. And then more recently joined Quiet Capital to be an enterprise partner leading the practice over here. My background is a specialty in solutions that sell into technical audiences. So of course, big data, machine learning, cybersecurity, dev tools and infrastructure. I've been really humbled to partner with businesses like Dremio and LaunchDarkly, Solo.io, Preset, Hex,

Starting point is 00:03:50 Supabase, Airbyte, some amazing founders that you even had on the show yourself. So it's been a fun ride. And things are always evolving in data, and it's brighter than ever before. Awesome. And you're the first investor on the show, actually, as a guest, which is super exciting. Couldn't think of anyone better.

Starting point is 00:04:11 And I know that personally, I've read a lot of your work, sort of outlining new data technologies and your thinking. So you've been very helpful to me personally. So thank you for that. I'd love to start out. So you, as part of your job, you evaluate data tooling all the time, right? I mean, maybe every day. And you have such a wide perspective on the market because of that, because you get to see so many new types of technologies, and I think specifically the different approaches that companies are taking to solving similar problems with data, you know, sort of creating new opportunities with data. I'd love for you to share with our listeners sort of the evaluation framework you use, because a lot of our listeners have to look for tooling to solve their own data problems a lot, but

Starting point is 00:05:05 they haven't invested as much time as you sort of surveying the market, trying to understand data tooling. And so how do you evaluate companies and the way that they're approaching solving particular data problems? That's such a great question, Eric. And I really think it's important to clarify for the listeners, the way that we think about a data tool could actually be different than how we think about an investment in a data company. When we think about it in terms of a company, we're thinking about the team, the technology, the market space. When we're thinking about a tool, it could have a company behind it,

Starting point is 00:05:46 but it may not necessarily have to be. It could be a great open source project that provides a lot of value. With a tool, there may not be a monetization opportunity for it, even if it is fantastic. So I just want to make sure that we clarify the difference for the audience today. In terms of how we think about a useful data tool, we think about if there's any existing offerings in this space, an open source project or a commercial offering that is failing at some aspect of their core functionality or capability and the magnitude in which they're not providing the value they should be to users. And so that is the first framing. What exists

Starting point is 00:06:33 today? What are the issues? The second thing that we really like to dig into is the criticality of this pain. Is the gap something that could be easily solved with people and you can throw bodies at it? Is it something that would be best served by a software offering? And is it mission critical to the business that it is solved? We also look at the core IP. If there are any patents against the technology that suggests it has differentiation that is not easily replicated by alternatives or, you know, open source projects that can offer it for free, have wonderful adoption, but not be monetized. And my favorite thing to dig into with the product itself is how easy it is to use and implement for teams. I often find that there's really sophisticated data solutions that are out there, but they're too complex to get up and running to demonstrate value. So when I do diligence calls on technology, I literally ask them, how long does it take to implement? How long does it take to show value? What is the magnitude of the value you're experiencing? And do you think it will be enduring? It's always best if the tool can actually fit into a macro trend. You know, later we'll be talking about the role of the modern data stack and the movement to cloud. But it's not a must-have, but it's a nice to have.

Starting point is 00:08:09 Super interesting. That's super helpful. Let's actually just jump straight into the modern data stack because this is something you've written about a ton. And I would love for you to just, can you define the modern data stack in your own words for our listeners? Because there are lots of sort of definitions and like a million architectural diagrams out there that have like different flavors of this.

Starting point is 00:08:32 But you've done some really influential thinking on this. Can you define it for us in your own words? Sure thing. Yeah, there's so many different definitions. I feel like there's the agnostic research analyst definition. There could be a vendor definition that likes to highlight certain components based on what they're offering. From my perspective, the modern data stack is an analytic stack that has the foundation around a cloud data warehouse. And what really separates a modern data stack from a legacy data stack

Starting point is 00:09:07 is that it is hosted in the cloud. It requires less technical configuration by the user to demonstrate value. It also often promotes end-user accessibility, so data democratization, and can cut costs, shortening the amount of implementation time and downtime because it is hosted, and then actually scale out as the data volumes grow over time. And so we often think about there are four core components of the modern data stack. There is the ingestion layer with offerings like Fivetran and Airbyte. There is the core of the cloud native data warehouse, either Snowflake, BigQuery, or Redshift. There is transformation, where dbt is often used. And then there is the BI service, either Preset or Looker. And it's really this

Starting point is 00:10:11 reimagining. And the big trend that emerged was moving from ETL to ELT, and that was catalyzed by people wanting to ingest more data from sources like Salesforce, Zendesk, Stri add value, like operational analytics with reverse ETL solutions like Census and Hightouch, where you take data from the cloud data warehouse and not just use it for an analytics use case, but also operational processes, moving it into Salesforce or Zendesk or Mailchimp. And it's also cool to see that now we are looping machine learning engineers and data scientists into the modern data stack by enabling businesses to leverage that data to build internal or external models.

Starting point is 00:11:24 And when I say external, like a production-grade model that services a customer versus an internal model that could be something around forecasting. And so I think this is just the beginning of the modern data stack, but it's really cool to see, over the past two and a half to three years, the foundation really be set in place. Super interesting. Two quick follow-up questions, or I guess a follow-up question about two additional components.

Starting point is 00:11:50 So do you see sort of data observability or orchestration as sort of key components of the stack, or are those sort of augmenting that core, you know, sort of those four key parts that you outlined? Totally, yeah. They're very useful components in the stack as well. I think the traditional core data stack is those four components.

Starting point is 00:12:21 The reason that data orchestration solutions like Astronomer, Prefect, you know, even open source Airflow become more important is the coordination of actions on data transformation over time. Data observability has a few different components to me. It could be pre-production observability around pipelines and the data itself to prevent schema changes that could have downstream negative impacts and breakage. Or it could actually be looking at data distributions on the data warehouse to see if there's any data drift over time.
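
Editor's sketch: the two kinds of observability check described here, a pre-production schema comparison and a warehouse-side drift check, might look roughly like the Python below. This is a minimal illustration and not any vendor's method. The function names, the column-to-type dictionaries, and the three-standard-deviation threshold are all assumptions made for the example, and the drift test is a simple mean-shift heuristic rather than a full statistical test.

```python
from statistics import mean, stdev


def schema_changes(expected, observed):
    """Compare two {column: type} mappings (hypothetical format).

    Returns the columns that were dropped, added, or changed type.
    """
    shared = set(expected) & set(observed)
    return {
        "dropped": sorted(set(expected) - set(observed)),
        "added": sorted(set(observed) - set(expected)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


def mean_shift(baseline, current):
    """Distance between the two means, in baseline standard deviations."""
    spread = stdev(baseline)
    if spread == 0:
        # A constant baseline: any change at all counts as an infinite shift.
        return 0.0 if mean(current) == mean(baseline) else float("inf")
    return abs(mean(current) - mean(baseline)) / spread


def has_drifted(baseline, current, threshold=3.0):
    """Flag drift when the mean moved more than `threshold` deviations.

    The threshold of 3.0 is an arbitrary illustrative default.
    """
    return mean_shift(baseline, current) > threshold


# Example: an upstream change drops one column and retypes another.
expected = {"id": "int", "amount": "float", "email": "str"}
observed = {"id": "int", "amount": "str", "signup_date": "date"}
changes = schema_changes(expected, observed)

# Example: daily order values before and after a suspected upstream bug.
baseline = [10, 11, 9, 10, 12, 10, 11, 9]
drifted = has_drifted(baseline, [20, 21, 19, 22])
stable = has_drifted(baseline, [10, 11, 10, 9])
```

In practice the schema check would run before a pipeline change ships, while the drift check would run on a schedule against warehouse tables.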

Starting point is 00:13:08 I think one of the reasons that data observability has become such an important and growing segment is because dashboards have become widespread throughout an organization. We often find that there are multiple BI solutions within one business. The ratio of dashboard creators to dashboard viewers is one to 100. And so when you have all this information, which is distributed to make smart decision-making, you want to make sure the data is correct for the viewers and that you're making decisions that can drive the business forward productively. And so you don't want to have a dashboard without the data. You don't want to be in a team meeting and get called out for, oh, I think this number

Starting point is 00:14:00 looks wrong. I don't know if we can actually make a decision today. And so data observability companies really step in to help make sure that the data is clean, correct, and the business can be more productive. So, Astasia, like a follow-up question about the modern data stack. So modern data stack as a term has been around for a while, right? And like, okay, things in tech change really, really fast. So based on what you have experienced so far, have you seen some kind of evolution in the modern data stack? Like something notable that has changed since, I don't know, first it was introduced as a

Starting point is 00:14:42 term until like today, like something that was added, something that was removed, or like our understanding has changed or some tools have matured, right? So how have you seen the modern data stack evolve in all this time? So cool new segments that have emerged. As I mentioned before, one of the newer ones is operational analytics and reverse ETL, the ability to push data from the data warehouse into third customer billing, and then you may want to push it into Salesforce to try to do better account qualification or email marketing campaigns with Mailchimp. So that's been really cool to see emerge. The second category that's been pretty neat is the movement from batch workloads to real-time and streaming instead of using an ingest layer with longer time horizons for collecting and ingesting data

Starting point is 00:16:02 that could be on the order of an hour or a day. You're actually seeing teams use streaming systems like Kafka or, you know, Redpanda for that layer so that the data can be fed faster so that the dashboards can be updated more quickly. What's really cool about that is we're seeing new warehouses come to market that are supposed to support real-time analytics more effectively than the known incumbents like Snowflake and the cloud data warehouses from the large cloud service providers. And so those would be like Pinot and Druid and ClickHouse. And so I think there's a push in the market to get data faster to the end users to make decisions. It is particularly prominent in operational teams. We kind of saw lots of blog posts coming out of Uber and Lyft over the years that the criticality of the data needs to be identified and visualized within like 10

Starting point is 00:17:20 to 30 minutes for teams to make decisions, and it seems to be more popular now than ever before to move from batch to streaming. So that's been pretty cool to see too. That's cool. But my feeling is that the modern data stack, like us, I mean, time flows, but like it gets more and more, not complicated necessarily, but we see more categories added to it and obviously more vendors for each category. So now you have data engineering teams or IT teams, I don't know, whoever is responsible in the company to go and figure these things out and buy all these things. And they have all these choices in front of them, right?

Starting point is 00:18:06 So what is like, let's say, and probably as an investor, you have like a better intuition about that. Like what's the limit that the market has in terms of like a complexity of a solution, right? Like where are we going to reach a point where, okay, the market will be like, okay, guys, that's just too much. You know, like it's not easy, like from an organizational standpoint to maintain all this infrastructure or even like navigate this infrastructure. Totally.

Starting point is 00:18:40 Totally. Yeah. It's a, it's a great point talking about like historically there were businesses like Informatica that offered a down different components of the data stack into everything from warehouse transformation, integration, metric stores, observability, reverse ETL, data orchestration. So yes, I totally

Starting point is 00:19:17 hear you. There has been a fragmentation over time. I can imagine that as things evolve, there will be a reconsolidation because of this exact take on the procurement side of the house. Why do we need so many vendors that we're managing? Can't we just have one throat to choke for a lot of what we're doing? A good friend of mine just ran a survey with, I think, over 500 data buyers and was asking about 15 different categories of data spend and how it was going to evolve over the next five years. And when I saw that huge list, which even includes synthetic

Starting point is 00:20:02 data, which we can throw in there, right? I was just thinking like, God, it would be really hard to be a data leader these days. There's so many options to choose from across every category. Do I really need all these tools? As I said, I set the foundation of like the four core components that we see, like, do I have to get all the other eight that this survey had? We're seeing early indications that teams want to buy products that are integrated and serve many different functions. We see that with RudderStack, which takes a multi-pronged approach of ingest, streaming, and reverse ETL.

Starting point is 00:20:45 You can see that with businesses in the data observability space that are not just doing data validation and linear regressions and looking at the data drift and quality inside a warehouse, but are going to move up the stack to go into data catalogs. You can also see it even with dbt, which started as a transformation layer and now offers a metric server. Think about the metric server as democratized LookML, so you can define a metric one time and serve it up to many different SaaS products. So there's consistency across them.
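
Editor's sketch: the "define a metric one time and serve it up to many products" idea can be illustrated with a small Python example. This is not dbt's or LookML's actual API. The `Metric` class, the `to_sql` helper, and the `orders` table with its `amount`, `status`, `order_month`, and `account_id` columns are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A single, shared definition of a business metric (hypothetical)."""
    name: str
    table: str
    expression: str      # SQL aggregate, e.g. "SUM(amount)"
    filters: tuple = ()  # SQL predicates applied to every use of the metric


def to_sql(metric, group_by=None):
    """Render the one definition into a query for a particular consumer."""
    select = f"{metric.expression} AS {metric.name}"
    if group_by:
        select = f"{group_by}, {select}"
    sql = f"SELECT {select} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# Defined once...
REVENUE = Metric(
    name="revenue",
    table="orders",
    expression="SUM(amount)",
    filters=("status = 'paid'",),
)

# ...and reused by different downstream tools with different groupings.
dashboard_query = to_sql(REVENUE, group_by="order_month")
crm_sync_query = to_sql(REVENUE, group_by="account_id")
```

Because both queries come from the same definition, the dashboard and the CRM sync cannot disagree about which orders count as revenue.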

Starting point is 00:21:25 You can make smart decisions. So I personally think that we're going to start to see consolidation at the layers above the warehouse because picking off, hey, I spend $10,000 a year on reverse ETL. I spend $10,000 a year for synthetic data. It doesn't make sense to build a whole bunch of different vendors for that. You probably just want one contract to do the rest. And then you get a discount for having more products that you're using from that one vendor. And so I do expect consolidation going forward. And even like we're starting to see acquisitions now, right?

Starting point is 00:22:10 Airbyte just did a really smart acquisition of an open source team. You know, I can imagine over the, especially with the macro environment, I would expect over the next year to 18 months to see a lot of acquisitions emerge from incumbents and from, you know, higher growth data companies to accelerate the roadmap and broaden the suite of offerings. That's awesome because you give me a reason to ask like the next question that I had in my mind. So when we're talking about consolidation,

Starting point is 00:22:48 most people think of big companies like Google or Microsoft going and acquiring and the dream of many founders out there coming true. But do you feel that this round of consolidation is going to be driven mainly by these big companies acquiring smaller companies? Or we're going to see more of mergers happening between smaller companies, right? Because, okay, you mentioned Airbyte.

Starting point is 00:23:22 Airbyte is a startup, right? They've been around not for that long. You don't usually see acquisitions happening that early in the stages of a company. So what's your feeling there? What should we expect? One type or the other more? Yeah, it's a great question.

Starting point is 00:23:42 I think it's going to be a mix. I'd probably say that 75% will be tech and talent by large incumbents, either cloud service providers or publicly traded companies. This is a great opportunity for them, especially with the changes in the macro, to go hire great people at a discounted rate. Another 25% will probably be other later stage startups.

Starting point is 00:24:11 I mean, over the past few months, we've seen acquisitions of Airbyte purchasing Grouparoo, Hightouch acquiring Workbase, Snyk acquiring TopCoat Data. So I think that some of these later stage startups are being thoughtful of,

Starting point is 00:24:30 hey, these are great people. We're aligned on vision. They'd be more successful internally. Let's do this now. You have to remember, we had this era over the past 18 months that has just recently changed

Starting point is 00:24:42 of large capital raises, sometimes 50 to 100 million in capital for these data infrastructure businesses, because people are so excited about this massive wave of adoption, the growth in data volumes, and the precedent set by Snowflake of being the largest enterprise IPO of all time. And so these startups raised a whole bunch of money and they have the balance sheets now to go make smart acquisition decisions. Yeah. Makes total sense. And okay.

Starting point is 00:25:18 Acquisitions and taking like two different companies and put them together to work and align like visions and cultures and products and all that stuff like super, super hard, right? Like we've seen many times, even like in like big corporations acquiring like other companies and then the products just die at some point, right? Yeah, yeah. And I would imagine, also as a person who has gone through the process of building a company from an early stage, and I'd like to hear that from you because you are investing

Starting point is 00:25:54 in early stage companies. How easy and how risky it is for an early stage company to acquire another company and try to align and manage the products and the companies? Like who, what's your, what are your thoughts there? Yeah, that's a great question.

Starting point is 00:26:15 You know, I used to do M&As for Cisco, so one of the most acquisitive tech companies of all time. And so from that experience, I can kind of speak to what I can imagine would be the pros and cons of doing acquisitions at a growth stage startup. I think it's important to know there's different types of acquisitions, right? So there's the traditional acqui-hire. That's usually eight to 15 people, usually very early in revenue generation, limited number of customer contracts, maybe not even patents that you need to work through a diligence process.

Starting point is 00:26:54 The next stage is businesses that are revenue generating with contracts with customers. Sometimes they can be multi-year contracts. These would probably be between $1 and $10 million of revenue. And then there's the $10 to $30 million of revenue, which is really about having a second product line in the business. I can imagine that most of the near-term acquisitions is going to be tech and talent. There's a lot of challenges of how to manage customer relationships if you're spinning down a business. It's a huge headache for growth founders and executive teams to manage. And it could be six months to do an acquisition that's revenue generating, another quarter to do the integration of the team and the tech, and then another year plus

Starting point is 00:27:52 to manage the customer relationships. Something else that these teams will be considering is the tech stacks that they've been built on. Like, are they even using the same backend systems? Is the software written in the same language? So it can be much harder for these early growth companies to acquire businesses that are revenue generating, just given the commitment to the customers and the consolidation of the tech stack. So I wouldn't expect those larger acquisitions at this time. Tech and talent is easier. Those deals are usually done through a direct relationship of the acquirers,

Starting point is 00:28:38 founders going out and prospecting great technical teams that they hope to know or already have a relationship with, aligning on mission and selling the value prop of finding a happy home, being more productive inside a larger business to continue to drive their vision, as well as the more de-risked upside opportunity for them personally. One more question. And I know that Eric wants to ask something. You know me so well.

Starting point is 00:29:11 So, okay. Usually, like what we hear from investors as advice for, let's say, founders that are on a growth stage and an earlier stage is that focus is like the most important thing. Like you have to focus on your execution and be like really focused on what you're doing. And I would assume that going through an acquisition or a merger or whatever we want to call it is necessarily going to divert the focus, right? So there is like some associated risk around that. And especially because you are both an investor and you have like a lot of experience in M&As, so for these younger and not that experienced founders that

Starting point is 00:30:00 might have like the opportunity at the growth phase to go and acquire like a company, what advice would you give? What would you tell them and what would you tell them to be careful of? Totally. Flashing headlight advice is take this very seriously. It's very hard. Right. Acquisitions are usually longer than initially expected, as I say,

Starting point is 00:30:28 six plus months. And that doesn't include post-acquisition integration. And if there's product that is being sold, but it's on a different tech stack, that can take a lot of time and make or break the ROI on the acquisition. I mean, one of the reasons Cisco is so famous with their acquisitions is because the integration team kicked ass and made them productive. And it would often allow these teams to run a standalone business unit so that they didn't impede growth. For growth founders, there should be a framework to think about the acquisition.

Starting point is 00:31:10 One is, is it truly an acqui-hire? How many months and how much money would it cost for you to go find these people from the market? And can you get great talent very quickly to augment the team, to drive towards that singular mission faster than before. You know, we used to joke it'd be considered advanced HR, people that

Starting point is 00:31:31 didn't fit in the structure of your current compensation packages, or you didn't think you could recruit before outside of an acqui-hire. Another way of thinking about it is, is the product that they built going to very easily fit into our tech stack and accelerate our product development by X number of quarters so that we can get more customers, raise ACVs, and drive to a bigger business faster? And the third is the business financial use case. If there are customers on the platform, what is the size and magnitude of that and how could it immediately affect top line if you integrated it or let it run as a standalone business? So I can imagine that for each founder, it's going to be a different motivation of what they need in that moment.

Starting point is 00:32:26 But each of the three different versions is going to be challenging for various reasons. Astasia, one question on the flavor of acquisition that is technology focused, right? So you talked about acqui-hires, that makes sense, right? Like maybe something that the team that's being acquired has built is a contiguous problem, but you're really sort of applying like a brain trust, you know, to sort of your own vision, right? But when there is a really strong technical,

Starting point is 00:33:02 you know, sort of formal product reason for the acquisition. How do you think about if the product is actually a good fit, right? Because you're introducing sort of, you know, you're heavily augmenting your existing product roadmap. You're bringing in different functionality, user flows. I mean, the issues are multifaceted, right? Like if you're actually going to try to do a product integration. And so how do you, and in your experience, like what are the things that are signals that this is a good idea to do this? Mm-hmm.

Starting point is 00:33:38 The first would be market validation with existing customers or prospective customers. Does this naturally fit into your vision of what our product could become over time? What is the criticality of this technology and how much would you be willing to pay if we added it? Just what a product manager does day-to-day, validation of the tech and what they should be building for the future. The second aspect is really understanding the entire technical stack that the product is built on. If everything is built on BigQuery and you're a Snowflake vendor, you're built on Snowflake, that can be very hard. If it's built in a completely different language, that can be very hard. So making sure that you have an understanding of the core tech stack and how easy it would be to integrate it, because once again, integration is what makes these successful. If it would be too

Starting point is 00:34:45 challenging, then it's probably not a good fit for the acquirer. And then the third thing that's useful to validate is not as much on the tech side, but I would highly say that like chemistry with the team, if you're expecting it to be run as an individual product, is crucial. I've seen acquisitions fall apart because there wasn't strategic alignment from both leaders about what the product should become as part of the acquired company.

Starting point is 00:35:20 So it's a great point in time. We can add value for the next year. Fantastic. But really, I'm coming over as an executive. What should the product look like in the next three? All right. That's some great, great points and stuff that I wish I knew a couple of years ago, to be honest. But it's never too late to learn about that stuff. So thank you. And I think that this is information that's going to be like very valuable,

Starting point is 00:35:45 like for the people who listen out there on the show. So Astasia, like we talked about like the modern data stack as it is today, right? And like how it evolves. As an investor,

Starting point is 00:35:57 you're always looking for the next thing, right? Like you need to be ahead of the curve, let's say. It's part of your job. Can you share with us, like what do you see as the next opportunities there when it comes to data infrastructure? Things that you are excited about?

Starting point is 00:36:17 Yeah. It's always cool digging into new trends and what's emerging. As I noted at the beginning, I'm a really early stage investor focused on pre-seed, seed, and series A. So many of the great offerings and vendors that we've talked about so far are way beyond me and off to the races.

Starting point is 00:36:38 And so I'm always thinking about like, okay, we have the pillars of the modern data stack. What's next? And so as I mentioned earlier, I think one macro trend that's exciting is the move from batch to streaming. While batch processing constitutes the majority of data workloads today,

Starting point is 00:36:55 we are seeing an increasing proportion of teams wanting real-time data to support operational use cases. And so, as I said, the ingest layer is changing from Airbyte and Fivetran to thinking about Kafka, Redpanda, Meroxa, Decodable, going to more real-time databases like ClickHouse, Pinot, Druid. That's been really cool to see. And what's super neat about real-time, it's not just in the analytics team stack.

Starting point is 00:37:32 We're starting to see it emerge in machine learning as well with continuous deployment and distributed serving of ML models. Think about it: you go on Netflix's, you know, app, you're watching a TV show that's about murder mysteries, and then it very quickly learns your interest. And so when you complete that show, it automatically shows you more murder mystery TV shows. And so that's been really cool to see it move into ML and the role of continuous learning. There's some early stage startups

Starting point is 00:38:13 like Claypot that's trying to facilitate those workloads. I would say that another trend that's been pretty cool and really interesting is there's been a fragmentation. We were talking about portfolio and consolidated businesses that offer many products. Something that we're seeing on the flip side is the fragmentation around open table formats and query engines. So an open table format layer is like Apache Iceberg

Starting point is 00:38:48 and Apache Hudi and Delta Lake. And this kind of reduces data gravity. It allows data to be moved across different environments. You also see new query engines emerge, which is pretty cool. So on the analytics side we had Spark, you know, even Trino, we were talking about that earlier, but now we're starting to see the emergence of, like, in-memory analytics with DuckDB. On the data, excuse me, on the ML side, we're starting to see

Starting point is 00:39:28 like Ray and Dask that are trying to be like Python native. So that's kind of cool to see like all these table formats, then also fragmentation of the querying there. The next layer that I'm trying to think about now is like the ML semantic layer. People use pandas for data prep, but then often when they're building a real model or trying to push it to production, they use PySpark. There's a really cool business out there called Ponder that's built Modin, which helps with distributed pandas to make it easier to do data prep at scale

Starting point is 00:40:08 without having to graduate. But like, gosh, I think it'd be really cool if we had like a semantic layer for ML that can go all the way from data prep to production, so teams aren't rewriting their code and they can actually push these models to production faster. I know I'm just going on here, but I feel like there's some really cool stuff going on. I'm still excited. I know the modern data stack is very well defined, but I'm still excited. Yeah, yeah, absolutely. Absolutely.

Starting point is 00:40:32 And actually like one of the most interesting things with like the modern data stack is that it's very cool like each year to see how it changes, like because it does. Like you see all these new things that are like, as we say, like it seems like it becomes more and more complex. And as we said, that sample, and we are going to have consolidation and like all that stuff, but the fact that this complexity exists is it also like translates into innovation, like many things and many great teams out there that are building like some amazing technology. And I think that the velocity that these new ideas are

Starting point is 00:41:07 delivered, it's incredible. It's really, really fast-paced. It's very interesting to see how from one year to the other, things change in this space. And that's amazing. And I think you listed some very, very interesting technologies out there, like from Iceberg to DuckDB. And people are doing some great stuff. All right. That's great. Okay.

Starting point is 00:41:36 I have a question that has to do a little bit more with the market. And usually, like, okay, when we are talking about data infrastructure in general, like, it's not something that is new. Like, we've been building databases since we have computers, right? But until quite recently, innovation was heavily driven by the large enterprise, right? There were like the Banks of America of the world, like the very big corporations. And of course, like the high-tech giants like Netflix and Twitter, that had to work with a lot of data. And a lot of, like many of these technologies were created like for their scale, let's say. And they were driving the innovation there, right?

Starting point is 00:42:32 Another thing that I find very interesting with the modern data stack is that if someone like sees how it is defined from a market perspective, it seems like a solution that fits the large enterprise, but also it fits very well like to smaller companies, right? Like you don't have to be at the scale of Bank of America to implement the modern data stack in your company and get value out of it, right? And obviously you don't have to spend the same amount of money. Totally. So it still feels though that there is some kind of disconnect there. There is like what is happening right now, like in the enterprise in terms of like

Starting point is 00:43:07 innovation and like evolution of the platforms that started with Hadoop in the early 2000s until today, and then you also have like the modern data stack that's pretty much like developed like in parallel. Do you see these two worlds, let's say, in the data infrastructure merge at some point? Or do you feel that they are going to continue evolving kind of in parallel?

Starting point is 00:43:39 And if they are going to merge, by the way, just to give a hint here, I believe that they are going to merge. But when do you think that this is going to happen and what is missing there? Yeah, I love your reference about, you know, large enterprises having a lot of innovation. You know, the Yahoo data team, incredible. The LinkedIn data team, incredible. That was a particular era. Then we went to Google and Twitter and Facebook

Starting point is 00:44:13 and what they were doing at scale. We had a whole bunch of great open source projects and founders come up in those communities. And then the next evolution was, gosh, are you a Lyft and Uber? Like you built some incredible stuff to support the data volumes. I feel like a lot of the great founders that have emerged in data over the past two and a half years have come out of the recently IPO'd businesses that were data-centric. Something that is very interesting to me is as a founder, it's usually more opportune to go to mid-market companies as early stage design partners so you could have in-depth

Starting point is 00:44:58 conversations consistently and be faster time to product development and monetization. You've worked selling into enterprises. Gosh, everyone wants JP Morgan and Amex and even Cisco, Walmart as a customer. Those are going to be some bigger contracts. They're going to be absolutely amazing to transform the business. But gosh, that could be a very long process. That could be nine months to a year. You could be interfacing with multiple different teams, trying to get buy-in and feedback and alignment. The procurement and redlining process, if you ever go to contract, could even be

Starting point is 00:45:41 three plus months. So if you're an early stage founder, you want to get high quality feedback as quickly as possible from people that are willing to pay you money. And so that's usually mid-market customers. So we often guide our founders to focus on the mid-market and build a product that's valuable and then earn the right to go to enterprises as they flesh out their security and their compliance and identify enterprise needs because the sales cycles are going to be longer. You can kind of see that in great data businesses like Fivetran and Segment, right, they went into the mid-market, got into the enterprise and started achieving larger ACVs, six figures, seven figures. But it is a process.

Starting point is 00:46:57 And if they're going to be successful, it's not an if, it's a when. And for most of the data companies we've been speaking with today, you know, outside of Snowflake, which is now publicly traded, you know, they're earlier in their journey. And so if you're five plus years old, you're probably starting to move into the enterprise and see some traction. But I wouldn't recommend

Starting point is 00:47:19 for a Series A stage data company to be having those conversations because it's going to be a very long process for them. And it's better to demonstrate repeatable sales and get product feedback than to go try to strike out and get an IBM or equivalent business as a customer. Yeah, 100%.

Starting point is 00:47:44 Okay, that was some great, great feedback on that. I'm sure that like many founders are like thinking of that stuff, to be honest. There's always this pressure of like, okay, when are we going to get our first enterprise customer? And I think my feeling is that, okay, you can go for a long time without having to worry about enterprise. And I think you can see that also with companies like Snowflake, right? Snowflake managed to get to the IPO. Yeah, they had enterprise customers, but they were not an enterprise company, right?

Starting point is 00:48:21 They were driving a lot of like their growth from the mid-market for a very, very long time. So you can get like a lot of mileage, let's say, by just like focusing on that. Anyway, I know that we are getting closer to the time here. So just like one last question from me and then like, I'll give it to Eric. What about open source? Open source has been like a traditionally, let's say, big component of the go-to-market motions around data products. How important do you think open source is and is it going like to remain important in the future for building a company that builds data infrastructure?

Starting point is 00:49:09 Costas, thank you so much for the question. I love this question because I get this question all the time. I work with really early stage people who are ideating and thinking about needing their business. And they're like, gosh, do I need to be open source? We're going to go build a data and ML tool. And the answer is no, you do not need to be open source. You need to look at your subcategory and what the precedent in the category has been. You know, I'd probably not recommend these days going and building a query engine that's not open source.

Starting point is 00:49:41 But, and, you know, there's often a precedent in a lot of databases that you should be open source, but even there, Snowflake is not and you do not have to be. So I would look at your individual segment and see what historically has been the go-to-market motion of whether it's open source or commercial. Another thing that's really important to think about is what is actually the

Starting point is 00:50:06 open source value for your customer? Is it because they always want to see the code base? Are they afraid that if you go under, you know, it's such a critical layer in the stack that they'll have to do migrations, which would be very painful? Is it some other rationale? And then also, what is the value of open source to the business? If it is, hey, top of funnel, pipeline generation, that's one conversation, that can be a free trial. Is it, hey, we need awareness and particularly with software engineers and 70% of the software engineering stack is open source. We have to go open source. That's different.

Starting point is 00:50:51 The last thing I would have people think about is the buying and adoption pattern for their product. If you are considering open source, ideally there's a single player mode for the open source. I, as a data professional or software engineer, take the open source, implement it, get it up and running, and add value. And I theoretically should have the credit card swiping capability to purchase as an individual unit. And then over time, you build in team and corporate value to expand the contract size. If your adoption pattern is multi-pronged, you have a purchaser, you have a stakeholder, you have a user, sometimes you don't necessarily need open

Starting point is 00:51:49 source there because the user doesn't have the ability to pay you. And he or she's going to go to a boss and they're going to have to have a sales person come in and pitch and demonstrate value, do an ROI calculator, move the contract forward. And in those examples, sometimes even if you're open source, it does not generate pipeline for you. It creates brand and awareness and helps you, but you're still doing top-down sales. And so theoretically, you didn't need to be open source to begin with. It's a nice-to-have versus a must-have. So as I was saying, I don't think you need to be open source.

Starting point is 00:52:36 It's always a great question. I've had a lot of back and forth on Twitter about this, but I really think you need to identify the segment, purchasing, and the intention for both the business and the user of its value. Thank you so much. That was great to hear from you about open source. It's a very controversial matter. So Eric, your questions. All right. Well, we're close to time here. So Stasia,

Starting point is 00:53:07 one more question for you. And this may be unfair because I know you love all of your investments equally, but if you were going to say, okay, I'm not going to be an investor anymore, and I'm going to go found or start a company in the data space, which problem area would you focus on? Like which, like on a personal level, which area sort of interests you the most that you would want to sort of sink your teeth into like operationally as part of, you know, whatever founding team, go-to-market team, however you want to look at it. Yeah, that's a great question. I'm pretty pumped about the movement in ML to data-centric ML from model-centric. Model-centric being, hey, we spent a lot of time building customized models

Starting point is 00:54:01 and trying to move forward the architectures of them, graph-based, you know, large neural nets. Data-centric is this movement where, hey, you can find a lot of great models online, download them off of GitHub, do the rebalancing and the training. What's really going to affect the outcome is the data collection, labeling, and quality, and having insights into that will be critical for the model's value. And so for me, I think that's a really exciting space. It's kind of the analog of the data observability space applied to machine learning. And so there's some really cool businesses operating there like Unbox and Galileo.

Starting point is 00:54:51 And so I think that's a really cool movement because it's a macro trend. Once again, like the new approach of the cloud with analytics, it's a macro trend. It's in the data path. It could be a daily use tool and it directly affects the end user's performance in model building and then in the role itself. So yeah, we agree. Can I add something, Eric? Sure.

Starting point is 00:55:18 So if I judge by her reactions when we were going through the next steps of modern data stack and what excites her. I think that she would start the company around the semantic layer for ML. Like she was the most excited when she was talking about that. So that's my prediction. Love it. Well, Astasia, this has been so wonderful. Thank you for giving us a little bit of your time.

Starting point is 00:55:43 We learned so much. We'd love to have you back on the show sometime. Awesome. You guys rock. Thank you so much for having me. I love all things data. I love talking about what's happened, what's coming, all the cool episodes and startups. So it was an absolute pleasure. Thanks so much, guys. I might be stealing one of your takeaways, but I really liked Astasia's perspective on open source. In some ways, some of the answers were really simple. If you're trying to give people visibility into how your product is built and how it works, there are a lot of ways to do that other than just literally having a repo that

Starting point is 00:56:24 everyone can see on GitHub, right? Because there's a lot that goes into sort of having a successful open source project in terms of the community surrounding it, contribution, all that sort of stuff. And it was just refreshing to hear her say, there are a lot of ways you can give the same or a similar experience to users without having to, you know, sort of make a part of your product strategy a true open source effort, which was fascinating and really, really interesting to hear. Yeah, absolutely.

Starting point is 00:57:01 I think she gave like some amazing, like actually advice in many different topics. And obviously like one of them was about open source and how much, like, it is important to invest in open source. But I also, what I keep from the conversation that we had is that we're at the stage right now where you can start like a data infrastructure company and go after like the mid-market. You don't, like 10 years ago, like you pretty much had to go after like the enterprise view on like company or app data. You don't necessarily have to do that anymore. And I think that's something that gives like a lot of freedom to entrepreneurs out there or like people who are considering doing something like that. So I definitely keep that from our conversation with her. That was like amazing to hear from an investor.

Starting point is 00:57:51 I agree. All right. Well, thank you for joining the show. Thank you for dealing with my raspy, sultry voice. And I will make sure that I can talk normally again before the next show. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com.

Starting point is 00:58:22 The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
