The Data Stack Show - 101: The Future of Machine Learning with Willem Pienaar of Tecton and Tristan Zajonc of Continual
Episode Date: August 24, 2022

Highlights from this week's conversation include:
- When is it right to use ML? (5:22)
- ML business models (10:21)
- Significant changes in delivering ML (19:07)
- Why ML is different (25:19)
- SQL becoming more important (34:39)
- Graduating from SQL-based to real-time (37:22)
- Space for a new role (45:11)
- State-of-the-art models (49:03)
- The most exciting thing in the ML space (53:59)
- Open source in ML (56:39)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.

Welcome to the Data Stack Show live stream where
we are going to talk all about machine learning with two brilliant minds, Tristan from Continual.ai
and Willem from Tecton. Kostas, what do you want to ask these two super smart people about ML? Oh, they have both been on the show before. And it's been a while, I think, since we recorded those episodes with them. So I'm very curious to see what has changed.
Yep.
So it's time for a catch up.
Let's see what has happened in the ML space since we talked with them.
And I'm pretty sure many things will come up. They're both very bright and very knowledgeable guys in the space, so I'm sure that we will have surprises.
So let's go and chat with them.
Let's do it.
Welcome to the Data Stack Show Live.
This is going to be such a fun conversation.
We've been excited about this for weeks, maybe even months now.
And we're going to talk all about ML, current state, future, and we have two of the best
minds that I know of here to talk about that.
So let's start out with some intros.
Tristan, you want to go first?
Hey, I'm Tristan.
Glad to be here, and thanks for the invite.
Great to be here with Willem,
whom I've admired for a long time.
So my name's Tristan.
I'm co-founder currently of a startup called Continual.
We're building an operational AI platform
for the modern data stack.
I've been working on ML infrastructure
for the last 10 years.
Did an early enterprise data science platform called Sense,
which was acquired by Cloudera,
sort of the big data company behind Hadoop.
Spent a good number of years there
building their machine learning platform,
which they call Cloudera Machine Learning,
and got to see all the pluses and minuses of that sort of generation of data infrastructure and machine learning infrastructure.
So yeah, really excited to be here to talk about the future of machine learning and machine
learning infrastructure.
Awesome.
Willem.
Yeah.
Hey, Eric and Kostas.
Yeah, it's great to be here.
Yeah.
So my background: almost a decade in the software and data space. A few years ago, I spent about four years leading the ML platform team at a company called Gojek in Singapore, where we really sunk our teeth into building a complete end-to-end ML stack for a bunch of different use cases. So we really learned a lot through that process. And one of the tools we built, out of many, was Feast, which is a feature store that ultimately
was open sourced and became a little bit popular.
And about two years ago, I moved over to Tecton, a company that focuses purely on feature stores
and has an enterprise offering, where I continue to invest my time on both the open source side and the enterprise side, building out the feature store technology and the whole category.
Love it.
And I cannot wait to talk later in the conversation about open source, especially as it relates to ML. I think it's a really fascinating topic.
Let's, I want to kick it off with a question
that I think a lot of our listeners have faced
in implementing ML at their companies.
And the context behind this is that
we've had tons of guests on the show
who talk about things like overuse of ML,
misapplication of ML, you know, sort of like, the data science team was throwing ML at any problem that moved and that created problems, et cetera.
Or situations where it's sort of the inverse, right? Where we spent a lot of time trying to solve this problem and ultimately realized it would have been way better to use a simpler approach. You've both seen this from inside companies and as vendors building tooling around ML infrastructure, seeing it live on the ground.
Would love your perspective on when is it right to use ML?
I know that's sort of a broad question, but what are the conditions? And we'd just love to hear, even for our listeners who have sophisticated ML functions running at their companies, what are some of the signals that something is a really good use case for ML?
So, Willem, do you want to start since Tristan did his intro first?
Yeah, I mean, I think for me, there are two classes to this.
There's use cases that are well-trodden paths that are already established within the space,
like in the market.
You can think of like recommendation systems
or fraud detection or churn predictions.
And there are more experimental,
more moonshotty projects.
And even at my time at Gojek,
we saw this a lot.
We saw teams that would conjure up totally new use cases that you could never have thought of and say ML is required here. So typically,
I'd say if you're thinking of introducing ML and you're attacking an existing
use case that is already established in the market, you probably can quantify the impact of
that before you even start.
You know how many users you have, you know what traffic you have.
You can probably get a back-of-the-napkin estimate of what the impact would be just based on the number of companies and teams that have already built those systems.
If you're entering a kind of moonshot-y space, then I think it is a little bit more dangerous,
especially if those spaces already have existing techniques. Let's say you're in banking or finance and you can use SQL, you can use R, or something else, or simpler techniques.
In those cases, I think
it's a little bit different.
I'd say try and quantify
the impact ahead of time
and steer clear of the moonshotty or new types of use cases because there are not that many major ML use cases that are being discovered every day.
A lot of the top ones are already out there.
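(An aside to make that "quantify the impact ahead of time" advice concrete: a minimal back-of-the-napkin sketch in Python. Every number in it is a hypothetical assumption for illustration, not a figure from the conversation.)

```python
# Hypothetical back-of-the-napkin estimate for a churn-prediction project.
# All numbers are illustrative assumptions.
monthly_active_users = 200_000
monthly_churn_rate = 0.05        # 5% of users churn each month
arpu = 12.0                      # average revenue per user per month

churners_per_month = monthly_active_users * monthly_churn_rate
# Assume an intervention (e.g. a retention voucher) saves 10% of the
# users the model correctly flags.
assumed_save_rate = 0.10
saved_revenue = churners_per_month * assumed_save_rate * arpu

print(f"~{churners_per_month:,.0f} churners/month; "
      f"~${saved_revenue:,.0f}/month recovered if 10% are saved")
```

If even an optimistic version of that arithmetic doesn't clear the cost of building and running the system, that's the signal to steer clear.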
Super helpful.
Yeah, no, I would agree with everything Will said there.
I mean, I would say on the last point,
I do think we're entering an era
where there may be some additional use cases
that people are going to discover
with these foundation models, these large language models that are being developed.
And so it does feel like that's coming, and there are some early examples of this.
Like if you look at, for instance, GitHub's Copilot,
like what's the revenue from that?
It's actually quite significant, very, very fast. But historically, that wasn't the case. There weren't a lot of examples of that over the last five years. There are maybe a few examples now, and it's hard to find those new use cases.
So definitely would agree with that. The other thing I would add here really is it's highly
contingent on how difficult it is, right? So if you're going to build a large language model from scratch, that's going to be incredibly difficult; it's unlikely you should do that. You're going to spend tens of millions of dollars to do it.
On the other hand, if you can use an API to experiment in minutes and see if you can get some interesting results or change your product in an interesting way, absolutely go do that. And so
I think that applies to sort of all ML use cases. As the difficulty
of doing the use cases goes down, there's an opportunity for
more users, more use cases to implement it or sort of the ROI to be positive
for that. And so I think both
Willem and I are working on systems that are
essentially trying to reduce the complexity of doing ML, and so that will make the ROI positive in more domains.
Guys, I have a question.
Do you think we can go through some of the most common use cases that you see out there that have traditionally been tackled through ML, and do it in a very pragmatic, let's say, way? Because most people, especially people who don't work in this area, when they hear about ML and AI, they think about self-driving cars, language models, automatically generated art, and all this very fancy stuff that's really at the state of the art right now in terms of what ML models can do. But machine learning is nothing new, right? It has been used for a very long time, and there are very concrete use cases with very concrete business value out there. I think there is value in just going through the most common use cases out there so people can relate to that, right?
And I will add on to that a little bit just to maybe add a
little bit of spice. I've kind of had this theory for a
while that you can boil business models down into sort of a durable set. So if you think about
something like e-commerce, when you think about ML in the context of a purchase flow, you probably
know a huge amount of what's already known about that. And it's just changing variables. And so
it's really interesting to think about how some of that stuff is probably just already known, like the work doesn't even have to be done. So anyways, just wanted to add that little
component onto it.

Yeah, maybe I can jump in. So I think, from at least my perspective, I'm a little biased because the types of customers we have do slant more towards line of business.
You know, it's like e-commerce or, you know, it's potentially banks or it's like ride-hailing
companies, those kinds of customers.
Less so, you know, like customers doing like language models or, you know, image or video.
And so we're not really focused on self-driving cars
and more like that kind of leading edge of the space.
So what we see a lot is the top two are definitely recommendation systems
and fraud detection systems, primarily because we're focused
so heavily on real-time, bad text on.
And my past experience at Gojek, that's also a focus area for us and so i'd say those are
the two ones and of course churn prediction and optimization um or two other big ones so
at gojek for example we you know predict churn for users we would identify a cluster of users
that are high risk and then we would send out vouchers to them and make sure that they're happy and they
have a higher retention.
And then, of course, pricing and personalization of your product towards a customer is also
another area that is a little bit domain specific, but also a very common use case that we see.
And it can go from batch to real time depending on the kind of customer.

Yeah, I mean, I sometimes think of it as three main categories. One is building fundamentally new products and services that are only possible because of the AI that's embedded inside. Self-driving cars are an example of that, Alexa is an example of that, Siri is an example of that. So these are products that could not exist
if you didn't have this underlying capability.
That I would say is the minority,
but could actually be the most transformative
over the next 10 years in terms of what is possible.
I think what we certainly see the most at Continual
and in my previous roles were really twofold.
One is improving existing products and services.
So for instance, personalization is just a no-brainer for an e-commerce store. Doing that at various parts of the customer journey, from search re-ranking to sending personalized emails to what's on the homepage, that's huge. And it makes a direct revenue impact. If you're a hyperscaler like Facebook, there's lots of micro-optimizations that you can do. Who exactly do you show? What image do you show? And
then what text should you show as part of that image? And what friends should you show? And so
all those little ML models are feeding into a product experience. Some of them may have
relatively small effects, but you're a big enough company to do them.
The third set of use cases, which we see
a lot at Continual currently, is all the ones around business operations. And so if you think
about a retailer or you think about a manufacturing company, they don't have a customer-facing product,
but they do have an immense business where there's opportunities to make predictions.
And typically, we see two main classes in this category. One is around
your customers. So things like churn, right? Lead scoring, upsell opportunities, more on a non-real-time basis. So everything to make predictions about your customers.
And the other one is around operational use cases, things like inventory forecasting,
supply chain optimization, logistics optimization, lots and lots of, if you're at a certain scale,
then doing those optimizations become important,
particularly when you're in a competitive industry with low margins.
And so that's sort of a final set of use cases.
And I see a lot of those right now.
So guys, that's awesome, first of all.
It's super helpful for me too. I'm always trying to enumerate all the different use cases around ML, and what you said makes total sense. So why don't we have, let's say, churn prediction as a service, or recommenders as a service, right? Why do companies instead need to go and invest in infrastructure for ML and build all these models on top of that infrastructure? What's the reason we haven't seen a market like this, and we lean towards a world where companies have to build their own models and maybe their own infrastructure?
So maybe we should take turns.
Tristan started first last time, but I'll take this one quickly this time.
I think there's an aspect of like, it is actually happening.
We are seeing vertical products for ML being built.
We are seeing AWS personalizes purpose-built for Rexes, right?
And there are fraud detection vendors out there.
And so there are off-the-shelf tools that you can use.
There are shortcomings to them, and there's risk to them because they're typically not completely end-to-end. And so you
have integration pains in some cases with those vendors, but they are finding success.
And they take on a lot of the work that those teams would have to do. But I think another point or aspect is IP. For ML, a lot of companies see the actual system,
the ML system as something that's important
and a competitive advantage to them.
And so they often don't want to outsource that
because if everybody can just use vendor,
then what's your competitive advantage with ML?
You're basically breaking even on that front.
And so they think, okay, we can just invest in this area
and leapfrog our competition.
Yeah, that's super interesting.
What do you think, Tristan?
What's your take on that?
So I do think that for product use cases, the IP issue is very real. But for the business operations use cases like churn, there is this question: well, why isn't it in a verticalized tool?
And I think it's the same reasons why, you know, BI tools still exist, horizontal
BI tools, the data is so diverse and
the questions that you're going to ask are so subtly different. So even when we have customers,
almost every customer that we talk to asks us about churn. And then the question is, well,
what do you mean by churn? Is it churn in the next 30 days? Is it at the end of the contract
duration? Is it a dollar-based churn measure where maybe you have a usage-based churn and
you could have an expansion and contraction? Maybe you have a premium plan and a basic plan and you're trying
to decide on whether the churn is between that. Maybe it's all of these. Maybe it's over different
time horizons. And your business wants to have all of these different predictions. And it's very
hard for a vertical tool to do that, both from an outcome perspective to defining all the outcomes
that you want. You'll do churn and you'll do these variations of churn. Then you'll want to do lifetime value.
Then you'll want to do,
like, are they a highly active user?
Then you'll want to do what product next they're going to buy.
And those predictions that you're going to want
are also going to leverage the same inputs, the signals.
Ultimately, your predictive model
is only going to be as good
as the data that's flowing into it.
And so then the question becomes,
okay, well, what data do you have?
And you're going to want to integrate
all this different data from all these different
sources.
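(To make the churn-variants point concrete, here is a minimal pandas sketch of two of the definitions Tristan lists. The table, columns, and dates are illustrative assumptions.)

```python
import pandas as pd

# Toy per-user snapshot; all columns are hypothetical.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "last_seen": pd.to_datetime(["2022-08-10", "2022-05-15", "2022-08-20"]),
    "mrr_now": [50.0, 0.0, 120.0],
    "mrr_90d_ago": [50.0, 80.0, 200.0],
})
as_of = pd.Timestamp("2022-08-24")

# Definition 1: activity churn over a 30-day horizon.
users["churned_30d"] = (as_of - users["last_seen"]).dt.days > 30

# Definition 2: dollar-based churn, where contraction counts fractionally.
users["revenue_churn"] = 1 - users["mrr_now"] / users["mrr_90d_ago"]

print(users[["user_id", "churned_30d", "revenue_churn"]])
```

Same business question, two different labels; a vertical tool has to pick one, which is the customization problem in miniature.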
So for instance, we have customers in the Shopify ecosystem where you'd
think, oh, everybody has standardized data.
Why can't I standardize LTV models and standardize churn models or something like that, just
standardized personalization models?
But they have other data, right?
They have, for instance, a few in-person stores, right?
Which are not part of the Shopify ecosystem. And so I think that's where you're seeing this sort of like
new modern data architecture where people are one, trying to integrate and aggregate data inside
these kind of cloud data warehouses. And then, you know, on that shared data, build a whole bunch of
shared use cases on top. And BI is obviously the first one: you buy a horizontal BI tool. I think ML is very similar, where you can build on and leverage that data that's inside your data warehouse.

Yeah, I think that's a very good point. And this impacts the infrastructure and tooling. It was worse originally, where you'd have ML tooling
that is super horizontal. And even if you go vertical, the vertical tool is also limited in some ways; it needs to be tailored to a specific use case. Like if you take just fraud detection, for example, that is a very broad category. There are different types of fraud, and it all depends on the data model of the company, whether it's credit card fraud or something else, KYC. So yes, it is improving as we go from horizontal to vertical, but there is definitely a problem of customization, and so it needs to be a lower-level abstraction than what is often produced out there.

One follow-up question on that. I'd love to dig in just a little bit more on the specifics there.
What specifically have you seen improve,
and how has that changed the process of delivering ML?
At what points in the lifecycle of the build
have the most significant changes happened?
Well, I'll start on this one.
I still, I mean, my honest take is I still think the MLOps ecosystem is horrifying.
You know, it reminds me a lot of like the Hadoop era.
So, you know, I spent a lot of my, you know,
five years of my life sort of in the Hadoop ecosystem,
you know, in sort of the 2015 to 2020 timeframe.
And it's incredibly powerful.
You can do amazing things with it, right?
Everybody's excited about it.
Everybody sort of, there's that energy behind it.
That same thing applies in ML and MLOps.
And that's all true.
And there's open source
and there's a vibrant ecosystem, all of that.
But then it sort of gets to this point where you're like,
wow, this is way too complicated.
And that happened with the big data ecosystem, right?
Nick Schrock, the CEO of Dagster,
has a saying where he says,
we went from like an era of big data to big complexity.
I'm like, I sort of feel like the same thing
has happened in MLOps.
Now, there are two things that I think are happening in MLOps which I am excited by.
One is I do think that there's a rise of
really next generation best of breed tools, right?
And so, you know, Tecton might be one around feature stores; you have Weights & Biases around experiment tracking. These are good tools that are definitely far better than anything in the past, if an alternative even existed.
You know, I'm also excited by, I think there are, you're seeing in the tech companies,
like sort of little next generation platforms coming out that are, you know, sort of have higher level of abstractions.
So, you know, Facebook just talked about their internal platform called Looper, which is sort of a declarative end-to-end real-time machine learning platform for product decisions.
They've radically, radically simplified the interface that engineers need to use to build predictive features into their products.
And so they can have hundreds of use cases now that are very rapidly implemented
and relatively easy to maintain.
And so at Continual, we're kind of trying to do similar things along those lines.
I still think, if I talk to any person who's doing MLOps, nobody says they love what they have. That's what most of my conversations with people who are in the trenches are like: we get it to work, but it's not totally awesome.

Yeah, I think there are kind of two paradigms in my mind: there's end-to-end and there's best of breed. Within end-to-end, you've got the horizontal platforms, like the OG Michelangelo, and I guess Kubeflow is horizontal.
And then we have the vertical ones that we just spoke about.
And out of all of those, I think they have different trade-offs, right?
Like the vertical one we said, it may need to be tailored towards the use case and it's
not as...
All the use cases are subtly different depending on the domain.
But then if you've got a best of breed, you've got a different problem where you've got an
end-to-end flow in which you're introducing a single component.
And then as a vendor, you can build the perfect component.
But then if the user still has to build the end-to-end system, that's hard.
And so what we see in the MLOps space is it's a death by a million cuts, right?
You have so many decisions you have to make yourself.
How do you do artifact tracking throughout the whole lifecycle and metadata management?
And how do you do experimentation?
Because you're not just plugging these
into like Lego pieces.
And that's extremely, extremely difficult.
And that's even more so in the best-of-breed world.
But I do see this like,
there's a divergence and convergence, right?
There's a divergence where folks go away
and build these tools.
And then there's a recognition of the tools
that are best of breed.
And then, you know, you can see all these blog posts coming out of,
oh, you know, dbt works with this product
and there's integrations between them.
And more and more they're getting glued together
in a way that makes sense and allows you to chain them
and removes all the decision friction and fatigue
that users have to experience today.
So yeah, we're in a kind of weird spot right now in the MLOps space, but hopefully we can power through this and get to a kind of consolidated modern ML stack.
Yeah, no, I totally agree with that.
I think there's this tension between these two.
And I think, you know,
there are a lot of startups that are doing these sort of
best of breed narrow products.
And then they're thinking about the integrations
and trying to swing those integrations. I think the hyperscalers are saying, oh, no, no, we have
these end-to-end platforms. But if you actually look at them, they're a bunch of not best of
breed individual things that you still have to glue all together. So it'd be one thing if they
were saying, okay, no, here's a template to do sort of continually improving recsys-type use cases, right? Where the models are maintained, the predictions are being made in real time, the features are maintained, right? The whole thing is being monitored. If they were saying,
okay, we make that easy for you, then you might say, okay, I'm going to go all in on an end-to-end
platform. I think the challenge right now is if you look at these platforms, they basically are a bunch of different components where it's left as an exercise to the reader to glue them together, and they all have these stack diagrams that look crazily complicated.
I'm definitely hopeful that there will be end-to-end approaches that make it very, very easy to implement use cases, but that don't expose all that complexity to users. I don't think end-to-end means, okay, we're one vendor that has 10 different products and you put them all together yourself. I think you have to ask what's the end that you're trying to achieve. Maybe you have to narrow yourself into a domain, right? Like personalization, or real-time machine learning, or sort of continual batch maintenance in your data warehouse for business use cases. And then, if you can narrow the scope, maybe you can find the right abstraction that makes the end deliverable easier to achieve successfully.
So I hear you talking about MLOps, and I'd like to hear from you what differentiates the operations around the infrastructure that ML has, compared to the rest of the infrastructure that the company has.
I mean, Ops is a very big topic.
We've been investing and coming up with new tools all the time.
And there are some amazing things that are happening when it comes to DevOps,
for example, and all the platforms and new tools out there.
Even in data, as you said, we started with big data and it became a big complexity to manage. And there's been a lot of improvement there too, for data engineers, to simplify operations and work more efficiently. So why is ML different? What do we need? What is missing?
Well, effectively you've got a, you know,
data driven decision system. And so there's inherent complexity
about making decisions in your company that have an impact on
your bottom line, with a system that is, you know, making those
decisions based on data, whether it's ML, or like a regression
model, or whatever it is. And so you can't do something like have a test oracle
that just says, okay, this thing is good to go,
ship it into production.
You never have 100% confidence.
And so you need ML-specific infrastructure
where there's experimentation systems or monitoring systems
that can track the outcomes and compare that to predictions
and make those things obvious to your end users.
And I think those areas are still a little bit nascent today,
the appreciation for that.
What we see a lot of companies do is they identify the P0s,
the critical things that they have to do to get a model into production
and get an API up and maybe get it serving traffic,
but they don't have the rest of the story around that,
the monitoring and experimentation.
And often this ties back to the problem of when to use ML and when to not use ML.
If you didn't quantify this ahead of time and you didn't perhaps start with a non-machine learning model that you could A-B test against your machine learning model, then you're almost
doomed to fail. But yeah, the summary is there are inherent complexities with ML
if you're basing decisions of your organization around data.
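(A minimal sketch of the comparison Willem is describing: score the model against a trivial non-ML baseline on realized outcomes before trusting it. The data below is simulated purely for illustration.)

```python
import random

random.seed(0)
# Simulated realized outcomes: did the user churn? (base rate ~30%)
outcomes = [random.random() < 0.3 for _ in range(10_000)]
# A noisy "model" score clipped to [0, 1], and a base-rate baseline.
model_scores = [min(1.0, max(0.0, 0.2 + 0.5 * o + random.gauss(0, 0.15)))
                for o in outcomes]
baseline_scores = [0.3] * len(outcomes)

def brier(preds, actuals):
    # Mean squared error of probabilistic predictions; lower is better.
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(actuals)

print("model    Brier score:", round(brier(model_scores, outcomes), 4))
print("baseline Brier score:", round(brier(baseline_scores, outcomes), 4))
```

If the model doesn't clearly beat the baseline here, and later in an A/B test, shipping it adds operational complexity for no quantified gain.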
Yeah, no, I agree with that.
I mean, I do think that right now there's two siloed stacks.
There's this sort of machine learning stack that honestly feels to me
more like it's coming out of that Hadoop era, sort of like the next gen of it.
Maybe it's a little bit more cloudy or something,
but it's kind of got that feel to it.
And then there's this more analytics-oriented stack, which is very much centered on SQL and the data warehouse.
And then there's, you know, a whole ecosystem around that from job orchestrators to data
quality and monitoring tools.
There's a whole ecosystem of vendors, huge vendors, around data observability and monitoring that, as far as I can tell, haven't at all looked at the ML monitoring and observability use cases.
I do think that there will be convergence
of these stacks.
I think we will converge onto these
hyperscale data platforms. That's where
the data is going to primarily live.
I do think that there only needs to be one job orchestration system. You don't need two job orchestration systems, one for ML and one for the rest of your data engineering, at least if you're going to build all these things yourself.
I think it's interesting: is there a convergence of monitoring? Because the use cases around ML monitoring are very different, and the traditional data monitoring companies are not building the features that an ML monitoring team would want. It does feel like they're kind of separate, and there are things that you would want over in one area that you might also want in the other area. So I think it'll be interesting to see how they
converge. There are unique challenges, though, I think, to Willem's point, making real-time
decisions, right, where you have real-time features, you have historical training of models, and non-deterministic outcomes.
You don't even know the outcome of what you're doing until you deploy it into production potentially, right?
What's the product impact?
You might actually, the machine learning metric that you can measure during training might
not actually be the business metric that you care about.
So you have to run an A-B test and kind of do this sort of roll out of A-B tests.
Thinking about all that is very sophisticated. On the scale of data products that you can build, I do think the machine learning products are the most challenging type of products, because they really require you to think about all of these concerns, and then the continual, ongoing life cycle of those concerns. It's not a one-and-done type situation, typically.
Yeah, the point that you raised about the analytics stack and the ML stack is also a very valid one. I mean,
it's clear to me that there is a
yearning for simplicity
architecturally within companies.
And so that's part of the
appeal of the modern data stack is that you can just
shove everything into your BigQuery or
Snowflake or other data warehouse or lakehouse
and centralize everything
around that, right? Ingestion,
transformation, reporting, etc.
I think the challenge with ML is, of course,
you're making real-time decisions in a lot of cases.
And so there's a kind of a philosophical gap there,
organizational friction where you've got
data warehouses built in a way where perhaps there's no staging/production split, but engineers demand that.
And so they are wary to use the data warehouse as this kind of like interface or source of truth for production.
But at the same time, you're seeing teams, you know, ship ETL pipelines with models in them for batch use cases, perhaps.
And so there is a bleed over between those two.
And I think long term, we'll see a consolidation there, just because there's a lot of pressure towards having a single system that you store your data in, not a bunch of data islands, and maybe one or two ways maximum that you want to transform your data. It's only if you really need to have, for example, an ETL system, perhaps streaming, or you need on-demand or real-time transformations, that you pull the data out. But people want to have a single place where they do something in a single way.
And I think a big part of that's education as well.
If you've got a workforce that's not being taken from traditional roles into analytics or data, and now you're also bolting on ML into that,
there's a lot of retooling and reskilling that's happening and you don't want to overwhelm your workforce.
Yeah, makes total sense.
Do you feel like this convergence has started already?
Definitely, yes.
Are there some examples with the technology out there that demonstrate this convergence happening?
Yeah, I mean, we're just seeing companies like Snowflake and BigQuery growing in adoption.
And teams rightly starting with tools like dbt for machine learning even.
And then as they need fresher predictions or low-latency predictions, introducing more real-time elements to it.
Yeah, no, absolutely, there's a convergence towards these hyperscale data platforms that have SQL at the core: Snowflake, BigQuery, and to some extent Databricks, even if Databricks has maybe a different heritage. But if you look at where they're going directionally from a technology perspective, it's much more into tables and query planners and Delta Lake and all that stuff, even if it's kind of under this lakehouse umbrella.
That seems to be the core foundation.
I think every company that we talk to
wants to consolidate data for all of their use cases,
ML and analytics and kind of the whole business
into one of these hyperscale data platforms. The challenge with respect to ML then becomes, well, okay,
what are these additional needs that ML has, particularly real-time ML? So, you know,
it's real-time feature generation that tends to lead to streaming. And so what's your streaming
story, right? There's real-time feature store storage or real-time feature serving that leads to, well, what's your key-value, sort of row-oriented store, right?
So these data platforms that have so much traction
are all built for analytical use cases.
And so there are technical limits right now
that haven't yet been really overcome.
So the obvious ones to me are the streaming one,
the real-time serving,
maybe, you know, vector, you know,
sort of like nearest neighbor vector search
for things like personalization, where you need to actually do sort of approximate
nearest neighbor lookups.
These are kind of core bits on a production ML infrastructure that you would typically
have.
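(A hedged sketch of the real-time serving bit: upstream jobs write the freshest feature values into a row-oriented key-value store, and the model service reads them at request time. Redis is just a stand-in here, and the key layout and feature names are assumptions.)

```python
import json
import redis  # assumes a reachable Redis instance; `pip install redis`

r = redis.Redis(host="localhost", port=6379)

def write_features(user_id: str, features: dict) -> None:
    # Called by the batch/streaming pipeline whenever features refresh.
    r.set(f"features:user:{user_id}", json.dumps(features))

def read_features(user_id: str) -> dict:
    # Called by the model service on the request path, at low latency.
    raw = r.get(f"features:user:{user_id}")
    return json.loads(raw) if raw else {}

write_features("u42", {"txn_count_1h": 3, "avg_order_value_7d": 21.5})
print(read_features("u42"))  # these values feed the model at prediction time
```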
I think there's a question then, are those going to be separate systems, right?
Are you going to have a streaming pipeline where you do that? Are you going to have a hot cache of data? Or are these core platforms so ambitious that they're going to try to absorb those or expose those capabilities inside of the core platform?
So Snowflake recently just announced that they have this hybrid HTAP or hybrid tables concept where you can do fast row lookups that potentially enable some additional real-time serving use cases that might be useful for ML.
They're looking at heavily investing in streaming that might
close the gap in terms of
real-time feature generation so that you can
consolidate and bring those workloads on
the platform. Databricks is a platform
that has some additional
flexibility where you can do some of those.
It doesn't have real-time serving.
It'll be interesting to see where
this all goes
and whether it ends up being kind of one core data platform
and infrastructure with a whole bunch of workflows on top
where the other vendors are building workflows
or whether there are core infrastructure bits
that we'll kind of need to still glue together
to do production machine learning.
So you mentioned something, Willem,
that's very interesting.
You mentioned that people start using dbt even for ML use cases.
And it makes me wonder, there was always this dichotomy between ML and
analytics use cases where we were saying, okay, the language of analytics
is SQL, for ML it's Python, right?
People don't, let's say, cross this boundary easily. Did you see something changing there? Do you see SQL becoming more important?
And why is this happening if it's happening?
Yeah, I think a big part of this is, well, there's multiple points here,
but one is the performance aspect.
Like if you write the code in SQL,
you just get the performance of this provider
out of the box, right?
You're getting really, really high performance queries.
And to Tristan's point about these vendors or these cloud providers extending their data warehouses, a lot of them support Python.
Okay, BigQuery only supports JavaScript, but I think if you look at Databricks and Snowflake, you can pull in Python libraries.
And so the days of saying you have to
extract the data and use Python outside
of it, I'm not sure if that's
a long-term
viable reason to say these are
distinct systems.
Increasingly, I think folks want to
have a single way to do things, and I think
if these platforms increasingly have capabilities that mimic what is available on the ETL side, there's really not a lot of justification to kind of externalize that.
You do want to reuse those platforms and keep your architecture as simple as possible.
Yeah, another point I think is that AI and ML is becoming increasingly data-centric.
So what matters really is the data that's feeding into your models.
And the pipelines that are happening after that are becoming increasingly commodified.
So there's an argument that what ML is, is basically: okay, what are the set of inputs, how do you model your inputs, your features, to your ML problem, and what are the set of outputs that you're trying to predict? In the end state, isn't that all you really need to provide? Maybe all the other stuff gets hidden from you.
So if you have a system where the data becomes much, much more important,
then all of your work ends up sort of focused on data transformation. And that's really where
these data platforms shine. Where they don't shine: if you're going to write your custom TensorFlow or PyTorch model and you need to train it on a GPU, pushing that down into the data warehouse makes no sense. Currently it doesn't make sense, and honestly it probably never will. But on the other hand, if all of that stuff is hidden from you, then your job ends up being a data management and data manipulation job.
And I think there's no question that SQL,
maybe with a little bit of UDFs here and there in Python,
is just such a more manageable way
to do your data transformation, data engineering work,
including feature engineering work.
And that's where tools like dbt come in, which put SQL at the core but now increasingly allow even little snippets of Python where necessary as escape hatches. You just get a much simpler to operate system, a much more performant system, and a much easier to govern and manage system, so your IT team wants it as well. And so who's not going to adopt that, right?
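(A small sketch of that "SQL at the core, Python UDFs as escape hatches" pattern. DuckDB is used only because it runs locally; in Snowflake or Databricks the registration mechanics differ, and the table and function names are illustrative assumptions.)

```python
import duckdb  # local stand-in for a cloud warehouse, assuming a recent version

def bucket_order_value(v: float) -> str:
    # The kind of small, awkward-in-SQL logic you'd push into a Python UDF.
    return "high" if v > 100 else "mid" if v > 20 else "low"

con = duckdb.connect()
# Register the Python function so SQL can call it as an escape hatch.
con.create_function("bucket_order_value", bucket_order_value)
con.execute(
    "CREATE TABLE orders AS "
    "SELECT * FROM (VALUES (1, 150.0), (2, 35.0), (3, 5.0)) t(user_id, order_value)"
)
# The bulk of the transformation stays in plain SQL.
print(con.execute(
    "SELECT user_id, bucket_order_value(order_value) AS bucket FROM orders"
).fetchall())
```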
But when we think about the sort of, let's say, graduation from the analytics side, in a centralized store that's SQL-based, to serving in real time: are you seeing the need for real-time flow out of that, right? So you build on the analytics stack and then you sort of graduate, say, into the need for the real-time use cases, as you prove out value, realize additional opportunities, et cetera. Number one, are you seeing that happen? And then two, it sounds like there's still actually a pretty gigantic gap technologically. Even if you have that foundation really tight in the centralized store, actually moving to serve that stuff in real time is non-trivial if you're just based on the centralized store.
Tristan, do you want to start, or?

Well, I mean, this is right up Willem's alley, but I think that there's a huge gap there currently. But let me let Willem talk to it.
Yeah, I mean, there is a gap there.
Well, there are a lot of challenges just because the infrastructure out there is so heterogeneous. But what we see is teams starting with the centralized stack, the data warehouse,
proving the value of a use case in batch if they can.
So that's like phase one.
Phase two is often you're shipping that data
into some kind of production environment,
a static copy of it, or a model or something derived from that.
And there's a freshness problem in that case,
but you can respond at low latency.
But often, in most cases,
you've got a product that's operationally running
in real time,
and you've got an event stream that's coupled to that,
and that's managed by engineers.
And so if you really want fresh
and a real-time system with fresh data
and models that can depend on that data,
that's kind of the value that a feature store provides.
It unifies these, like the offline and the online world.
There's a big technological gap, and that's part of the problem we're trying to solve with feature stores: how do you go from the offline, batch training world into the online, real-time world in a consistent way? Because the model needs to move between the two, but teams often have a handover there. So there's a technical challenge as well as an organizational challenge, right? You've got analysts creating features, perhaps, or data, and then data scientists improving those as features, training a model and shipping that into production, handing it over to engineers. And so there's a lot of back-and-forth between the two of them. How do you actually interface those? That's what we're hoping the tools can make easier for folks.
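(As a concrete illustration of that offline/online unification, a hedged sketch using Feast, the open-source feature store Willem mentioned earlier. It assumes a Feast repo in the working directory that already defines a "user_stats" feature view; the feature and entity names are assumptions, not anything from the conversation.)

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes an initialized Feast repo

# Offline: point-in-time-correct features joined onto training labels.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2022-08-01", "2022-08-02"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_stats:txn_count_7d", "user_stats:avg_order_value"],
).to_df()

# Online: the same feature definitions served at low latency to the model.
online_features = store.get_online_features(
    features=["user_stats:txn_count_7d", "user_stats:avg_order_value"],
    entity_rows=[{"user_id": 1001}],
).to_dict()
```

The point is that one definition drives both worlds, so the model sees consistent values in training and in production.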
Super interesting.
Yeah, no, I think this is one of the big unsolved problems.
That on one hand, we have this great data foundation,
but the real-time use cases, the real-time serving use cases, are just hard to do. There's just a glimmer that maybe they'll be possible to do on a single platform with things like hybrid tables, but it's so, so early there.
And then you end up with these two worlds. And as soon as you end up with the two worlds, you have to do this complicated dance between them where you're moving data from your batch environment to your online environment. And then maybe you want to actually move your online environment to your batch environment, so it's not clear which direction you always want to go.
Actually, people take different approaches
where they start with the online and log to the
offline, or you take the offline and
move it up to the online. And so all of a
sudden you've got a fair amount of complexity.
And then that's obviously what motivates
tooling around this feature stores
to be built.
Do you feel that this is more of a technology issue? Is there technology missing right now? Or is it an organizational issue? Because what I hear from Willem is that there is a choreography among many different people, and probably also departments, in trying to make this happen. And there are feedback loops that need to be there that probably include an even broader set of people in the organization, right? So what do you think is the main challenge right now that the industry needs to address?
So that's almost two questions, but I'll say that, yes, certainly, imagine you're a data scientist,
maybe more the data scientist that's the initiator
of a machine learning project in a company.
The number of teams that you need to interface with to get into production is high, right?
It's not just the team with the API that's going to integrate with you.
It's the data platform team,
and where you're going to run your training pipelines.
Maybe there's an ML platform team, and they've got something purpose-built, but that's unlikely.
Then maybe you want monitoring for your system.
So you need to speak to a team about monitoring,
like an SRE team or a DevOps team.
There's a security compliance team.
There's the operations team
that actually speak to the customer on the street.
And so there's so many stakeholders to manage.
A lot of data scientists become more
product managers and we've not made it easier for them as an industry to just get into production.
And so that's what vendors are trying to do with tools: provide a gateway, a portal into this solution that's being built, for each one of these groups, so that one person, or one group, isn't responsible for going and interfacing with everybody. Because the point Tristan was making
is, it's essentially a loop, right?
So there's a kind of like training, serving, prediction, data collection, logging, storage,
and then transformation loop that's end to end.
And so many teams are involved there.
And, you know, we're trying to just make that easier to address through tooling.
Yeah, my view is it's not a recipe for long-term success if you have a significant amount of coordination for each job to be done that you as an individual get assigned. And so if you have to talk to all these different teams and hold meetings and try to understand their systems, and have them understand your systems and your needs, it's just a recipe for
things going very, very slow. And I think there's basically two ways to solve that problem.
The one way is to have extremely well-defined interfaces between these different services
where you don't really need to talk to the other team to use it. So if there's a monitoring system
and you just use it and you don't even talk to them, they've exposed to you those interfaces.
And yeah, that's sort of the Amazon model, right? Every small team, and then clean interfaces, everything's API-first. That's kind of their innovation model. The other way is to try to find something where it is a little bit more end-to-end, where a single person can do more. But there's only a certain amount of complexity that an individual can hold in their mind, so you have to reduce the complexity very, very dramatically. And both of those are challenging, because as you think about the interfaces,
the abstractions are not always obvious, right?
We're very much evolving, right?
So we're coordinating in part
because we're trying to figure out,
hey, what does everybody need for ML,
for production ML?
And then likewise, end to end,
it's often hard to find the abstractions there
that don't box the user in more than they want. So it's
kind of a trade-off there. I think that's
how I see people navigating it.
A good example of some of the challenges here is
if you've got
an Android or an iOS app
and you're making some kind of prediction,
you want to track what action
the user takes based on some kind of
personalization, perhaps.
Often that requires that mobile
team to go and develop some kind of
custom logic as part of their
mobile application in order to collect
the data that ultimately goes back into your
experiment. And so there's
all these little subtle areas in which
you need to interface with teams to
just get end-to-end.
And so I think the abstractions are still being worked out; those edges are still being cut. And I think that's the key problem to be solved.
Yeah.
Do you also see space for a new role? Because Willem, you mentioned the data scientist turning into a kind of PM at the end, trying to manage all these relationships. Do you think that there is a need?
Yeah, I see three roles. There's the research scientist, you know, the person that's taking two years to write a paper and is using the data in the company or organization to do that.
Then I see the MLE, hands-on, goes end-to-end, you know, builds this thing and gets into prod and maybe even is on call for that. And then I see the DS that becomes the product manager, essentially.
The center point from which all the spokes of the star emerge towards all the stakeholders.
And he or she owns this use case.
I don't think there's really a need for a new role,
but there's essentially archetypes that we have seen out there in the wild.
Mm-hmm.
Yeah, the one thing I might differ a little bit is,
I think ultimately for AI and ML to become sort of widely adopted,
it needs to be put into the hands of more users,
and that includes product engineers.
I don't see any reason why a product engineer cannot build an ML-powered feature in the long term,
or why an analytics engineer, who maybe has more of a dbt SQL background, can't build a production ML model. In production, no handoff, right?
An in-production model that's continually maintained.
In my view, that should be possible for anybody who's building production systems, right?
From an analytics engineer to a data engineer, to a machine learning engineer, to a data
scientist, to a product engineer should be able to do that.
I do think the research engineer is sort of separate.
They're going to be in the weeds
doing things in a little bit of a separate
universe. And occasionally for very,
very critical systems, maybe that's
only the domain of the ML engineer.
But I generally think that
if you're building a product,
there's going to be more and more use cases and more
and more systems that enables somebody
who's more of a product engineer, right?
Personalization. It feels like a product engineer plus a data engineer
should be able to get that job done if the tools exist.
And then I think the ML engineer will also love it because they'll be able to do either
different work or deeper work or more work or have more impact if every single use case
doesn't require sort of this like deep, deep, you know, end-to-end experience of infrastructure and ML, which is what the ML engineer today needs.
I agree.
So as we see the industry kind of commoditize, the engineers have a much higher leverage.
And then as their problems get solved, the data science problems get solved, and we commoditize that whole layer.
Ultimately, the product engineers or the product folks even will be the ones building these ML solutions. So that's definitely,
I'd say, the group that will attack the long tail of use cases. I think if you look at like a,
you know, like a Reddit or maybe like a Facebook, maybe your key recommendation system or your key
ML use case, that'll always be custom built, maybe like Google search, that'll be custom built.
But, you know, there will be a long tail of use cases that the product teams can build
themselves using, you know, some kind of solution that's perhaps central around your data warehouse
and, you know, with abstractions that they're familiar with already.
Yeah, super exciting.
That makes a lot of sense. So, all right.
You've mentioned quite a few times this problem between the batch and the real-time, and the low-latency requirements that ML has. What are, let's say, the best patterns utilized right now to bridge this gap and for people to productize these models? How does it work, and what's the state of the art in the space?
Do you mean for a specific use case or for...
Oh, does the answer change based on the use case? That becomes even more interesting.
Well, I think the truth is at the foundational level, it's all about the data, right?
And so that's why we started at Tecton with feature stores and providing a way for you to craft data and, you know, features that will power your models.
I think downstream from that, it's really very specific to your use case, extremely specific.
So if you're building recommendation systems, it's very different, and as Tristan said earlier, subtly different, from fraud detection or churn.
But I think from a data transformation and organization perspective, feature stores, at least, or tools in that layer of the stack, even DBT, provide a lot of value.
They provide you ways to go 70, 80% of the way
by crafting the features that ultimately power your models.
And often you can experiment and see performance of models offline
before you even go live, right?
Depending on the use case, that may not be accurate always.
So you typically do want to go live yourself,
but it's very hard to answer that question
in terms of without diving into specific use cases.
And even then, it's so different from customer to customer.
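(A tiny sketch of that "see performance offline before you go live" step: train on older data, evaluate on the most recent slice, and only then consider going live and A/B testing. The data is synthetic, standing in for features a feature store would produce.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                      # synthetic feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 0).astype(int)

split = 4000                       # time-ordered split: past vs. most recent
model = LogisticRegression().fit(X[:split], y[:split])
auc = roc_auc_score(y[split:], model.predict_proba(X[split:])[:, 1])
print(f"offline AUC on held-out recent data: {auc:.3f}")
```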
You know, this is kind of going into the weeds, although I'm curious on Willem's viewpoint on this. If I look at the feature store landscape, there are two patterns that I see being adopted. One is what you could call an online-first approach.
And so if you look at how Facebook describes its feature store environment, or how YouTube describes theirs, these are massive scales where they're generating tons and tons of data; they don't have a lack of training data, for instance.
They tend to adopt a more online approach where you generate online features, and then you log those features out and kind of wait around to collect the training data based on the new feed. You deploy them first to online, then you log them, and then you train your models off of that.
I think there's a different approach which takes more, which kind of puts more emphasis
on the ability to backfill data and sort of generate features and then kind of generate training data going back in time.
And that introduces a fair amount of complexity, which tools like, you know, Tecton and Feast,
I think, you know, and traditional feature stores solve, which have this backfill.
It has a different sort of architecture and a different set of trade-offs.
It does probably get to, again, your use case.
But I do think that's something I'm watching very closely: how those two different approaches unfold architecturally. They have big impacts on your architecture.
Neither seems to be a silver bullet. In the case of log-and-wait, if you don't have high traffic and a large volume of users, it takes a long time to collect the training data.
So if you ship a new feature, you need to log that.
Maybe it takes you two weeks.
If you don't have a lot of traffic, maybe it takes you two months.
And so your iteration speed can be slow.
If you're Google, maybe it takes you minutes, right?
And then on the other side,
there's architectural complexity to
the traditional feature store architecture, because you have the offline and online worlds.
What I'm excited about is technologies like Snowflake
and others where, you know, they have real-time ingestion and hybrid tables.
And you have stream-centric platforms that are being developed that could potentially consolidate these two worlds.
But even today, companies like Tekton also have logging architectures, right? If you're interfacing with an API for feature values, for example,
let's say you're calling some API
to get some data about a customer or transaction,
like a credit card company or something.
The only way to deal with that is to log it out
for training purposes later.
You can't use that ahead of time.
You can't query them in bulk offline.
And so even today, Tekton is kind of like in a hybrid state.
But I think over time, the log-and-wait architecture does have a lot of appeal.
Super interesting.
Well, we're close to time here and I want to leave plenty of time for questions.
So please write your questions in.
I'll start out with one here, which both of you have talked a little bit about this,
maybe in general,
and you're both building really cool tooling
in the ML space,
but one of our listeners wanted to know,
what are some of the,
maybe just pick one for the sake of time,
sort of the most exciting thing that you're seeing
in the ML
space specifically?
I mean, you don't have to name a tool if you don't want to, but as builders of these tools,
what excites you most that you're seeing?
Well, I can start on this one.
I mean, two things.
So obviously what we're building today, which is sort of a declarative approach to operational
AI.
I just think there's a tremendous need for higher-level abstractions for
production machine learning, and that's what we're trying to do at Continual. And
I get super excited when I read about, okay, you know, Facebook has this thing
called Looper; you can read the paper there. A really exciting example of an end-to-end declarative
platform for real-time machine learning. Apple has something called Overton, which is really exciting for more natural language processing use cases. Stitch
Fix just had a great blog on their system. So for me, I view it as:
there was a generation one, and maybe Uber's Michelangelo is the canonical OG example, where you stitch together
all the different components, which totally made sense. What's the next step?
I think we're starting to see that coming out of both startups and
out of the hyperscalers, who are kind of onto the next thing.
The second thing, which I think you can't ignore, is foundation models, large language models,
you know, the things that OpenAI is doing.
I am very bullish on this being sort of a new chapter, and it's very
unclear what it's going to
unlock, but you're already starting to see some commercial successes
for certain use cases, and I think it's moving forward at a tremendous speed. It
not only is going to unlock a whole new set of use cases, but there's actually a whole new set of
tooling concerns that you're going to have to address too. So it's unclear what the developer tooling ecosystem,
what the data management tooling ecosystem
is going to look like
for these extremely large language models.
And so I'm really excited by both the use cases
and the tooling for these large language models.
Yeah, I want to kind of echo the point
that Tristan made around Looper.
It's an extremely exciting direction
that I think, you know,
if you look at what has
happened over the last couple of years, large tech companies have innovated and the market has
commoditized those technologies or approaches. And what we see from Facebook or Meta is this
platform that's declarative and very focused on the product engineer. I'm super excited about that and the simple abstractions
that address ML use
cases. So I think it's really
about the persona that's being addressed here.
And so it seems like we're moving on
to the product teams a little
bit more.
So I think, yeah, that's the key thing that
I'm excited about.
Very cool. All right, well, we're going to try and sneak one
more in here.
And it's about open source.
So, you know, open source and software in general
is a really interesting topic.
But this is a really interesting question specifically to ML.
Is open source even more important in ML?
Because ML can sort of have this flavor of ambiguity around it
for people who aren't necessarily close to it.
How important is open source in ML?
I think while this industry is a little bit wild-west-y,
it's more important,
especially because we spoke about the abstractions.
If the abstraction is not perfect, then you're stuck if you're not using an open source tool, right?
We see this a lot.
For example, in Feast, if you want to use a different database as your backend, how do you do that with a vendor that doesn't support it?
You need to wait.
With Feast, you can kind of plug in your own backend store.
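As a sketch of what that pluggability looks like in practice, swapping the online store in Feast is a change to its feature_store.yaml configuration rather than a wait on a vendor; the project name and connection details below are hypothetical.

```yaml
project: my_project        # hypothetical project name
registry: data/registry.db
provider: local
online_store:
  type: redis              # swap in a different backend here
  connection_string: "localhost:6379"
```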
Long term, the jury is still out on whether it is necessary to have open source
as the delivery mechanism for the functionality.
There are certainly a lot of companies, especially if you look at the
modern data stack, that have proven that you can solve a whole class of problems with a cloud-based solution.
So for, as we said earlier, the long tail of use cases, the jury is still out.
Yeah, I think we're going to see a similar transition to what happened in the data sphere. Especially for the infrastructure of ML, where there are stateful services that you need to manage, it's going to move toward people
wanting fully managed services that they can just use to get their job
done.
And once those services become good enough, once people trust the abstractions,
the SLAs, the company itself, it's going to be just so obvious that, hey,
just go use these vendors.
I think you see that with Weights & Biases in experiment tracking.
It's just like, yeah,
just go use Weights & Biases.
I mean, pay 50 bucks a month.
The value you're getting from that is amazing
if you're looking for a way
to track experiments.
And it's not an open source product,
but it's very much targeting
the ML developer crowd
that you'd think would be
kind of the most open source friendly.
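To illustrate why that trade feels easy, here is roughly what experiment tracking with Weights & Biases looks like. The project name, config values, and metrics are hypothetical, and the training loop is stubbed out.

```python
import wandb

# Start a tracked run; config values are recorded alongside the metrics.
run = wandb.init(project="churn-model", config={"lr": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)  # stand-in for a real training step
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```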
You know, I'm still hopeful
that open source remains
a huge part of ML,
like, you know, in terms of
both open publishing, open libraries, like the core libraries behind ML algorithms.
And I think those will stay open source for longer.
Although, while I would have said these algorithms would stay open source forever, even now it's become a little bit unclear whether the open approach (maybe Hugging Face is the best
example of this) is going to win, versus the hyperscalers releasing
models that are proprietary but just so amazing that you kind of
hold your nose and use them. And then there'll be enough competition in the marketplace that
you're not going to get held hostage, so you'll feel like, hey, no big deal,
I can always switch between Google or Microsoft or OpenAI for these large-scale models.
So we'll see.
Yeah, super interesting.
All right, well, we are at the buzzer.
This has been such a helpful conversation.
Tristan, Willem, thank you so much
for giving us some of your time.
Super helpful for us and for our listeners.
Thanks for having me.
Thanks for having me as well.
It's been my pleasure.
Costas, I appreciated so much
the honest take that both Tristan
and Willem had on,
I'll say maybe the gap
between the promise of ML
and the reality of MLOps on the ground for people doing the work today.
And, you know, both of them had very strong feelings that it's a pretty gnarly space still.
And there's still a lot of things that are really hard to do.
And, you know,
that was just really refreshing,
especially to hear given that
they're both founders of ML products.
And, you know, I think at one point,
Tristan called, you know,
part of the ecosystem horrifying.
You know, so I just appreciated that take.
And I think that's very helpful,
not only for our listeners,
you know, but for us just to realize, you know, there's, there's tons and tons
of promise out there and companies like Continual and Tekton doing really cool
stuff, but you know, we're still in really early innings.
Yeah.
I think there was a wealth of updates on what's going on out there.
I think it's good that we heard that ops in general is still an unsolved
problem in ML, and I think it makes sense.
Like, you know, usually we come up with the technology first and then
figure out the operations around it. And obviously in ML, there are similarities with software engineering,
but there are also differences.
So we need probably different tooling or different methodologies.
I don't know.
But ML needs to mature enough to get to the point where you can say,
okay, operations is what we should care about.
And there are plenty of indications that this is happening right now.
That's what I hear from the guys.
I also took away some things about what is needed out there, beyond just feature
stores, the products that these guys are building.
But there's a broader need: even database systems need to come up with more innovation in order to solve some of the ML problems, right?
Like what we were talking about: how to serve the models, and the features
for the models, and how new hybrid database systems like Snowflake
can help with that, all that stuff.
It's still early, and there's a lot of innovation that needs to happen, and from some angles
that's pretty basic innovation too, very deep in the infrastructure that we are using, even at the database and storage level.
So it's a very exciting space.
Like I would be more than happy to,
I don't know, be one of these folks out there
who build products and companies in this space.
So anyone who thinks about it,
I think they should go and give it a try.
Absolutely.
And Costas will build an ML startup
name generator,
AI driven, of course,
to help support you in your mission.
I don't like
projects.
Alright. Thanks for joining us
for another live stream
and we'll let you know when the next one's coming out.
Catch you on the next show. We hope you enjoyed this episode of the Data Stack Show. Be sure to
subscribe on your favorite podcast app to get notified about new episodes every week. We'd
also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's
E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.