The Data Stack Show - 101: The Future of Machine Learning with Willem Pienaar of Tecton and Tristan Zajonc of Continual
Episode Date: August 24, 2022

Highlights from this week's conversation include:
- When is it right to use ML? (5:22)
- ML business models (10:21)
- Significant changes in delivering ML (19:07)
- Why ML is different (25:19)
- SQL becoming more important (34:39)
- Graduating from SQL-based to real-time (37:22)
- Space for a new role (45:11)
- State-of-the-art models (49:03)
- The most exciting thing in the ML space (53:59)
- Open source in ML (56:39)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.

Welcome to the Data Stack Show live stream where
we are going to talk all about machine learning with two brilliant minds, Tristan from Continual.ai
and Willem from Tecton. Kostas, what do you want to ask these two super smart people about ML? Oh, they have both been on the show before. And it's been a while, I think, since we recorded those episodes with them. So I'm very curious to see what has changed.
Yep.
So it's time for a catch up.
Let's see what has happened in the ML space since we talked with them.
And I'm pretty sure many things will come up. They're both very bright and very knowledgeable guys in the space, so I'm sure that we will have surprises.
So let's go and chat with them.
Let's do it.
Welcome to the Data Stack Show Live.
This is going to be such a fun conversation.
We've been excited about this for weeks, maybe even months now.
And we're going to talk all about ML, current state, future, and we have two of the best
minds that I know of here to talk about that.
So let's start out with some intros.
Tristan, you want to go first?
Hey, I'm Tristan.
Glad to be here, and thanks for the invite.
Great to be here with Willem,
whom I've admired for a long time.
So my name's Tristan.
I'm co-founder currently of a startup called Continual.
We're building an operational AI platform
for the modern data stack.
I've been working on ML infrastructure
for the last 10 years.
Did an early enterprise data science platform called Sense,
which was acquired by Cloudera,
sort of the big data company behind Hadoop.
Spent a good number of years there
building their machine learning platform,
which they call Cloudera Machine Learning,
and got to see all the pluses and minuses of that sort of generation of data infrastructure and machine learning infrastructure.
So yeah, really excited to be here to talk about the future of machine learning and machine
learning infrastructure.
Awesome.
Willem.
Yeah.
Hey, Eric and Kostas.
Yeah, it's great to be here.
Yeah.
So my background: almost a decade in the software and data space. A few years ago, I spent about four years leading the ML platform team at a company called Gojek in Singapore, where we really sunk our teeth into building a complete end-to-end ML stack for a bunch of different use cases. So we really learned a lot through that process. And one of the tools we built, out of many, was Feast, which is a feature store that ultimately
was open sourced and became a little bit popular.
And about two years ago, I moved over to Tecton, a company that focuses purely on feature stores
and has an enterprise offering, where I continue to invest my time on both the open source side and the enterprise side, building out the feature store technology and the whole category.
Love it.
And I cannot wait to talk later in the conversation about open source, especially as it relates to ML. I think it's a really fascinating topic.
Let's, I want to kick it off with a question
that I think a lot of our listeners have faced
in implementing ML at their companies.
And the context behind this is that
we've had tons of guests on the show
who talk about things like overuse of ML,
misapplication of ML, you know, sort of like, the data science team was throwing ML at any problem that moved and that created problems, et cetera.
Or situations where it's sort of the inverse, right? Where we spent a lot of time trying to solve this problem and ultimately realized it would have been way better to use a simpler approach. You've both seen this from inside companies and as vendors building tooling around ML infrastructure, seeing it live on the ground.
Would love your perspective on when is it right to use ML?
I know that's sort of a broad question, but what are the conditions? And we'd just love to hear, even for our listeners who have sophisticated ML functions running at their companies, what are some of the signals that something is a really good use case for ML?
So, Willem, do you want to start since Tristan did his intro first?
Yeah, I mean, I think for me, there are two classes to this.
There's use cases that are well-trodden paths that are already established within the space,
like in the market.
You can think of like recommendation systems
or fraud detection or churn predictions.
And there are more experimental,
more moonshotty projects.
And even at my time at Gojek,
we saw this a lot.
We saw teams that would conjure up totally new use cases that you could never have thought of and say ML is required here. So typically,
I'd say if you're thinking of introducing ML and you're attacking an existing
use case that is already established in the market, you probably can quantify the impact of
that before you even start.
You know how many users you have, you know what traffic you have.
You can probably get a back-of-the-napkin estimate of what the impact would be just based on the number of companies and teams that have already built those systems.
If you're entering a kind of moonshot-y space, then I think it is a little bit more dangerous,
especially if those spaces already have existing techniques. Let's say you're in banking or finance and you can use SQL, you can use R, or something else, or simpler techniques.
In those cases, I think
it's a little bit different.
I'd say try and quantify
the impact ahead of time
and steer clear of the moonshotty or new types of use cases because there are not that many major ML use cases that are being discovered every day.
A lot of the top ones are already out there.
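(An aside to make that "quantify the impact ahead of time" advice concrete: a minimal back-of-the-napkin sketch in Python. Every number in it is a hypothetical assumption for illustration, not a figure from the conversation.)

```python
# Hypothetical back-of-the-napkin estimate for a churn-prediction project.
# All numbers are illustrative assumptions.
monthly_active_users = 200_000
monthly_churn_rate = 0.05        # 5% of users churn each month
arpu = 12.0                      # average revenue per user per month

churners_per_month = monthly_active_users * monthly_churn_rate
# Assume an intervention (e.g. a retention voucher) saves 10% of the
# users the model correctly flags.
assumed_save_rate = 0.10
saved_revenue = churners_per_month * assumed_save_rate * arpu

print(f"~{churners_per_month:,.0f} churners/month; "
      f"~${saved_revenue:,.0f}/month recovered if 10% are saved")
```

If even an optimistic version of that arithmetic doesn't clear the cost of building and running the system, that's the signal to steer clear.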
Super helpful.
Yeah, no, I would agree with everything Will said there.
I mean, I would say on the last point,
I do think we're entering an era
where there may be some additional use cases
that people are going to discover
with these foundation models, these large language models that are being developed.
And so it does feel like that's coming, and there are some early examples of this.
Like if you look at, for instance, GitHub's Copilot,
like what's the revenue from that?
It's actually quite significant, very, very fast. But historically, that wasn't the case. There weren't a lot of examples of that over the last five years. There are maybe a few examples now, and it's hard to find those new use cases.
So definitely would agree with that. The other thing I would add here really is it's highly
contingent on how difficult it is, right? So if you're going to build a large language model from scratch, that's going to be incredibly difficult; it's unlikely you should do that. You're going to spend tens of millions of dollars to do it.
On the other hand, if you can use an API to experiment in minutes and see if you can get some interesting results or change your product in an interesting way, absolutely go do that. And so
I think that applies to sort of all ML use cases. As the difficulty
of doing the use cases goes down, there's an opportunity for
more users, more use cases to implement it or sort of the ROI to be positive
for that. And so I think both
Willem and I are working on systems that are
essentially trying to reduce the complexity of doing ML, and so that will make the ROI positive in more domains.
Guys, I have a question.
Do you think we can go through some of the most common use cases that you see out there that have traditionally been tackled through ML, and do it in a very pragmatic, let's say, way? Because most people, especially people who don't work in this area, when they hear about ML and AI, they think about self-driving cars, language models, automatically generated art, and all this very fancy stuff that's really at the state of the art right now in terms of what ML models can do. But machine learning is nothing new, right? It has been used for a very long time, and there are very concrete use cases with very concrete business value out there. I think there is value in just going through the most common use cases out there so people can relate to that, right?
And I will add on to that a little bit just to maybe add a
little bit of spice. I've kind of had this theory for a
while that you can boil business models down into sort of a durable set. So if you think about
something like e-commerce, when you think about ML in the context of a purchase flow, you probably
know a huge amount of what's already known about that. And it's just changing variables. And so
it's really interesting to think about how some of that stuff is probably just already known, like the work doesn't even have to be done. So anyways, just wanted to add that little
component onto it.

Yeah, maybe I can jump in. So I think, from at least my perspective, I'm a little biased because the types of customers we have do slant more towards line of business.
You know, it's like e-commerce or, you know, it's potentially banks or it's like ride-hailing
companies, those kinds of customers.
Less so, you know, like customers doing like language models or, you know, image or video.
And so we're not really focused on self-driving cars
and more like that kind of leading edge of the space.
So what we see a lot is the top two are definitely recommendation systems
and fraud detection systems, primarily because we're focused
so heavily on real-time, bad text on.
And my past experience at Gojek, that's also a focus area for us and so i'd say those are
the two ones and of course churn prediction and optimization um or two other big ones so
at gojek for example we you know predict churn for users we would identify a cluster of users
that are high risk and then we would send out vouchers to them and make sure that they're happy and they
have a higher retention.
And then, of course, pricing and personalization of your product towards a customer is also
another area that is a little bit domain specific, but also a very common use case that we see.
And it can go from batch to real time depending on the kind of customer.

Yeah, I mean, I sometimes think of it as three main categories. One is building fundamentally new products and services that are only possible because of the AI that's embedded inside. Self-driving cars are an example of that, Alexa is an example of that, Siri is an example of that. So these are products that could not exist
if you didn't have this underlying capability.
That I would say is the minority,
but could actually be the most transformative
over the next 10 years in terms of what is possible.
I think what we certainly see the most at Continual
and in my previous roles were really twofold.
One is improving existing products and services.
So for instance, personalization is just a no-brainer for an e-commerce store. Doing that at various parts of the customer journey, from search re-ranking to sending personalized emails to what's on the homepage, that's huge. And it makes a direct revenue impact. If you're a hyperscaler like Facebook, there's lots of micro-optimizations that you can do. Who exactly do you show? What image do you show? And
then what text should you show as part of that image? And what friends should you show? And so
all those little ML models are feeding into a product experience. Some of them may have
relatively small effects, but you're a big enough company to do them.
The third set of use cases, which we see
a lot at Continual currently, is all the ones around business operations. And so if you think
about a retailer or you think about a manufacturing company, they don't have a customer-facing product,
but they do have an immense business where there's opportunities to make predictions.
And typically, we see two main classes in this category. One is around
your customers. So things like churn, right? Lead scoring, upsell opportunities, more on a non-real-time basis. So everything to make predictions about your customers.
And the other one is around operational use cases, things like inventory forecasting,
supply chain optimization, logistics optimization, lots and lots of, if you're at a certain scale,
then doing those optimizations become important,
particularly when you're in a competitive industry with low margins.
And so that's sort of a final set of use cases.
And I see a lot of those right now.
So guys, that's awesome, first of all.
It's super helpful for me too. I'm always trying to enumerate all the different use cases around ML, and what you said makes total sense. So why don't we have, let's say, churn prediction as a service, or recommenders as a service, right? Why do companies instead need to go and invest in infrastructure for ML and build all these models on top of that infrastructure? What's the reason we haven't seen a market like this, and we lean towards a world where companies have to build their own models and maybe their own infrastructure?
So maybe we should take turns.
Tristan started first last time, but I'll take this one quickly this time.
I think there's an aspect of like, it is actually happening.
We are seeing vertical products for ML being built.
We are seeing AWS personalizes purpose-built for Rexes, right?
And there are fraud detection vendors out there.
And so there are off-the-shelf tools that you can use.
There are shortcomings to them, and there's risk to them because they're typically not completely end-to-end. And so you
have integration pains in some cases with those vendors, but they are finding success.
And they take on a lot of the work that those teams would have to do. But I think another point or aspect is IP. For ML, a lot of companies see the actual system,
the ML system as something that's important
and a competitive advantage to them.
And so they often don't want to outsource that
because if everybody can just use vendor,
then what's your competitive advantage with ML?
You're basically breaking even on that front.
And so they think, okay, we can just invest in this area
and leapfrog our competition.
Yeah, that's super interesting.
What do you think, Tristan?
What's your take on that?
So I do think that for product use cases, the IP issue is very real. But for the business operations use cases like churn, there is this question: well, why isn't it in a verticalized tool?
And I think it's the same reasons why, you know, BI tools still exist, horizontal
BI tools, the data is so diverse and
the questions that you're going to ask are so subtly different. So even when we have customers,
almost every customer that we talk to asks us about churn. And then the question is, well,
what do you mean by churn? Is it churn in the next 30 days? Is it at the end of the contract
duration? Is it a dollar-based churn measure where maybe you have a usage-based churn and
you could have an expansion and contraction? Maybe you have a premium plan and a basic plan and you're trying
to decide on whether the churn is between that. Maybe it's all of these. Maybe it's over different
time horizons. And your business wants to have all of these different predictions. And it's very
hard for a vertical tool to do that, both from an outcome perspective to defining all the outcomes
that you want. You'll do churn and you'll do these variations of churn. Then you'll want to do lifetime value.
Then you'll want to do,
like, are they a highly active user?
Then you'll want to do what product next they're going to buy.
And those predictions that you're going to want
are also going to leverage the same inputs, the signals.
Ultimately, your predictive model
is only going to be as good
as the data that's flowing into it.
And so then the question becomes,
okay, well, what data do you have?
And you're going to want to integrate
all this different data from all these different
sources.
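(To make the churn-variants point concrete, here is a minimal pandas sketch of two of the definitions Tristan lists. The table, columns, and dates are illustrative assumptions.)

```python
import pandas as pd

# Toy per-user snapshot; all columns are hypothetical.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "last_seen": pd.to_datetime(["2022-08-10", "2022-05-15", "2022-08-20"]),
    "mrr_now": [50.0, 0.0, 120.0],
    "mrr_90d_ago": [50.0, 80.0, 200.0],
})
as_of = pd.Timestamp("2022-08-24")

# Definition 1: activity churn over a 30-day horizon.
users["churned_30d"] = (as_of - users["last_seen"]).dt.days > 30

# Definition 2: dollar-based churn, where contraction counts fractionally.
users["revenue_churn"] = 1 - users["mrr_now"] / users["mrr_90d_ago"]

print(users[["user_id", "churned_30d", "revenue_churn"]])
```

Same business question, two different labels; a vertical tool has to pick one, which is the customization problem in miniature.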
So for instance, we have customers in the Shopify ecosystem where you'd
think, oh, everybody has standardized data.
Why can't I standardize LTV models and standardize churn models or something like that, just
standardized personalization models?
But they have other data, right?
They have, for instance, a few in-person stores, right?
Which are not part of the Shopify ecosystem. And so I think that's where you're seeing this sort of like
new modern data architecture where people are one, trying to integrate and aggregate data inside
these kind of cloud data warehouses. And then, you know, on that shared data, build a whole bunch of
shared use cases on top. And BI is obviously the first one: you buy a horizontal BI tool. I think ML is very similar, where you can build on and leverage that data that's inside your data warehouse.

Yeah, I think that's a very good point. And this impacts the infrastructure and tooling. It was worse originally, where you'd have ML tooling
that is super horizontal. And even if you go vertical, the vertical tool is also limited in some ways; it needs to be tailored to a specific use case. Like if you take just fraud detection, for example, that is a very broad category. There are different types of fraud, and it all depends on the data model of the company, whether it's credit card fraud or something else, KYC. So yes, it is improving as we go from horizontal to vertical, but there is definitely a problem of customization, and so it needs to be a lower-level abstraction than what is often produced out there.

One follow-up question on that. I'd love to dig in just a little bit more on the specifics there.
What specifically have you seen improve,
and how has that changed the process of delivering ML?
At what points in the lifecycle of the build
have the most significant changes happened?
Well, I'll start on this one.
I still, I mean, my honest take is I still think the MLOps ecosystem is horrifying.
You know, it reminds me a lot of like the Hadoop era.
So, you know, I spent a lot of my, you know,
five years of my life sort of in the Hadoop ecosystem,
you know, in sort of the 2015 to 2020 timeframe.
And it's incredibly powerful.
You can do amazing things with it, right?
Everybody's excited about it.
Everybody sort of, there's that energy behind it.
That same thing applies in ML and MLOps.
And that's all true.
And there's open source
and there's a vibrant ecosystem, all of that.
But then it sort of gets to this point where you're like,
wow, this is way too complicated.
And that happened with the big data ecosystem, right?
Nick Schrock, the CEO of Dagster,
has a saying where he says,
we went from like an era of big data to big complexity.
I'm like, I sort of feel like the same thing
has happened in MLOps.
Now, there are two things that I think are happening in MLOps which I am excited by.
One is I do think that there's a rise of
really next generation best of breed tools, right?
And so, you know, Tecton might be one around feature stores; you have Weights & Biases around experiment tracking. These are good tools that are definitely far better than anything in the past, if an alternative even existed.
You know, I'm also excited by, I think there are, you're seeing in the tech companies,
like sort of little next generation platforms coming out that are, you know, sort of have higher level of abstractions.
So, you know, Facebook just talked about their internal platform called Looper, which is sort of a declarative end-to-end real-time machine learning platform for product decisions.
They've radically, radically simplified the interface that engineers need to use to build predictive features into their products.
And so they can have hundreds of use cases now that are very rapidly implemented
and relatively easy to maintain.
And so at Continual, we're kind of trying to do similar things along those lines.
I still think, if I talk to any person who's doing MLOps, nobody says they love what they have. That's what most of my conversations with people who are in the trenches are like: we get it to work, but it's not totally awesome.

Yeah, I think there are kind of two paradigms in my mind: there's end-to-end and there's best of breed. Within end-to-end, you've got the horizontal platforms, like the OG Michelangelo, and I guess Kubeflow is horizontal.
And then we have the vertical ones that we just spoke about.
And out of all of those, I think they have different trade-offs, right?
Like the vertical one we said, it may need to be tailored towards the use case and it's
not as...
All the use cases are subtly different depending on the domain.
But then if you've got a best of breed, you've got a different problem where you've got an
end-to-end flow in which you're introducing a single component.
And then as a vendor, you can build the perfect component.
But then if the user still has to build the end-to-end system, that's hard.
And so what we see in the MLOps space is it's a death by a million cuts, right?
You have so many decisions you have to make yourself.
How do you do artifact tracking throughout the whole lifecycle and metadata management?
And how do you do experimentation?
Because you're not just plugging these
into like Lego pieces.
And that's extremely, extremely difficult.
And that's even more so in the best-of-breed world.
But I do see this like,
there's a divergence and convergence, right?
There's a divergence where folks go away
and build these tools.
And then there's a recognition of the tools
that are best of breed.
And then, you know, you can see all these blog posts coming out of,
oh, you know, dbt works with this product
and there's integrations between them.
And more and more they're getting glued together
in a way that makes sense and allows you to chain them
and removes all the decision friction and fatigue
that users have to experience today.
So yeah, we're in a kind of weird spot right now in the MLOps space, but hopefully we can power through this and get to a kind of consolidated modern ML stack.
Yeah, no, I totally agree with that.
I think there's this tension between these two.
And I think, you know,
there are a lot of startups that are doing these sort of
best of breed narrow products.
And then they're thinking about the integrations
and trying to swing those integrations. I think the hyperscalers are saying, oh, no, no, we have
these end-to-end platforms. But if you actually look at them, they're a bunch of not best of
breed individual things that you still have to glue all together. So it'd be one thing if they
were saying, okay, no, here's a template to do sort of continually improving recsys-type use cases, right? Where the models are maintained, the predictions are being made in real time, the features are maintained, right? The whole thing is being monitored. If they were saying,
okay, we make that easy for you, then you might say, okay, I'm going to go all in on an end-to-end
platform. I think the challenge right now is if you look at these platforms, they basically are a bunch of different components where it's left as an exercise to the reader to glue them together, and they all have these stack diagrams that look crazily complicated.
I'm definitely hopeful that there will be end-to-end approaches that make it very, very easy to implement use cases, but that don't expose all that complexity to users. I don't think end-to-end means, okay, we're one vendor that has 10 different products and you put them all together yourself. I think you have to ask what's the end that you're trying to achieve. Maybe you have to narrow yourself into a domain, right? Like personalization, or real-time machine learning, or sort of continual batch maintenance in your data warehouse for business use cases. And then, if you can narrow the scope, maybe you can find the right abstraction that makes the end deliverable easier to achieve successfully.
So I hear you talking about MLOps, and I'd like to hear from you what differentiates the operations around the infrastructure that ML has, compared to the rest of the infrastructure that the company has.
I mean, Ops is a very big topic.
We've been investing and coming up with new tools all the time.
And there are some amazing things that are happening when it comes to DevOps,
for example, and all the platforms and new tools out there.
Even in data, as you said, we started with big data and it became a big complexity to manage. And there's been a lot of improvement there too, for data engineers, to simplify operations and work more efficiently. So why is ML different? What do we need? What is missing?
Well, effectively you've got a, you know,
data driven decision system. And so there's inherent complexity
about making decisions in your company that have an impact on
your bottom line, with a system that is, you know, making those
decisions based on data, whether it's ML, or like a regression
model, or whatever it is. And so you can't do something like have a test oracle
that just says, okay, this thing is good to go,
ship it into production.
You never have 100% confidence.
And so you need ML-specific infrastructure
where there's experimentation systems or monitoring systems
that can track the outcomes and compare that to predictions
and make those things obvious to your end users.
And I think those areas are still a little bit nascent today,
the appreciation for that.
What we see a lot of companies do is they identify the P0s,
the critical things that they have to do to get a model into production
and get an API up and maybe get it serving traffic,
but they don't have the rest of the story around that,
the monitoring and experimentation.
And often this ties back to the problem of when to use ML and when to not use ML.
If you didn't quantify this ahead of time and you didn't perhaps start with a non-machine learning model that you could A-B test against your machine learning model, then you're almost
doomed to fail. But yeah, the summary is there are inherent complexities with ML
if you're basing decisions of your organization around data.
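(A minimal sketch of the comparison Willem is describing: score the model against a trivial non-ML baseline on realized outcomes before trusting it. The data below is simulated purely for illustration.)

```python
import random

random.seed(0)
# Simulated realized outcomes: did the user churn? (base rate ~30%)
outcomes = [random.random() < 0.3 for _ in range(10_000)]
# A noisy "model" score clipped to [0, 1], and a base-rate baseline.
model_scores = [min(1.0, max(0.0, 0.2 + 0.5 * o + random.gauss(0, 0.15)))
                for o in outcomes]
baseline_scores = [0.3] * len(outcomes)

def brier(preds, actuals):
    # Mean squared error of probabilistic predictions; lower is better.
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(actuals)

print("model    Brier score:", round(brier(model_scores, outcomes), 4))
print("baseline Brier score:", round(brier(baseline_scores, outcomes), 4))
```

If the model doesn't clearly beat the baseline here, and later in an A/B test, shipping it adds operational complexity for no quantified gain.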
Yeah, no, I agree with that.
I mean, I do think that right now there's two siloed stacks.
There's this sort of machine learning stack that honestly feels to me
more like it's coming out of that Hadoop era, sort of like the next gen of it.
Maybe it's a little bit more cloudy or something,
but it's kind of got that feel to it.
And then there's this more analytics-oriented stack, which is very much centered on SQL and the data warehouse.
And then there's, you know, a whole ecosystem around that from job orchestrators to data
quality and monitoring tools.
There's a whole ecosystem of vendors, huge vendors, around data observability and monitoring that, as far as I can tell, haven't at all looked at the ML monitoring and observability use cases.
I do think that there will be convergence
of these stacks.
I think we will converge onto these
hyperscale data platforms. That's where
the data is going to primarily live.
I do think that there only needs to be one job orchestration system. You don't need two job orchestration systems, one for ML and one for the rest of your data engineering, at least if you're going to build all these things yourself.
I think it's interesting: is there a convergence of monitoring? Because the use cases around ML monitoring are very different, and the traditional data monitoring companies are not building the features that an ML monitoring team would want. It does feel like they're kind of separate, and there are things that you would want over in one area that you might also want in the other area. So I think it'll be interesting to see how they
converge. There are unique challenges, though, I think, to Willem's point, making real-time
decisions, right, where you have real-time features, you have historical training of models, and non-deterministic outcomes.
You don't even know the outcome of what you're doing until you deploy it into production potentially, right?
What's the product impact?
You might actually, the machine learning metric that you can measure during training might
not actually be the business metric that you care about.
So you have to run an A-B test and kind of do this sort of roll out of A-B tests.
Thinking about all that is very sophisticated. On the scale of data products that you can build, I do think the machine learning products are the most challenging type of products, because they really require you to think about all of these concerns, and then the continual, ongoing life cycle of those concerns. It's not a one-and-done type situation, typically.
Yeah, the point that you raised about the analytics stack and the ML stack is also a very valid one. I mean,
it's clear to me that there is a
yearning for simplicity
architecturally within companies.
And so that's part of the
appeal of the modern data stack is that you can just
shove everything into your BigQuery or
Snowflake or other data warehouse or lakehouse
and centralize everything
around that, right? Ingestion,
transformation, reporting, etc.
I think the challenge with ML is, of course,
you're making real-time decisions in a lot of cases.
And so there's a kind of a philosophical gap there,
organizational friction where you've got
data warehouses built in a way where perhaps there's no staging/production split, but engineers demand that.
And so they are wary to use the data warehouse as this kind of like interface or source of truth for production.
But at the same time, you're seeing teams, you know, ship ETL pipelines with models in them for batch use cases, perhaps.
And so there is a bleed over between those two.
And I think long term, we'll see a consolidation there, just because there's a lot of pressure towards having a single system that you store your data in, not a bunch of data islands, and maybe one or two ways maximum that you want to transform your data. It's only if you really need to have, for example, an ETL system, perhaps streaming, or you need on-demand or real-time transformations, that you pull the data out. But people want to have a single place where they do something in a single way.
And I think a big part of that's education as well.
If you've got a workforce that's not being taken from traditional roles into analytics or data, and now you're also bolting on ML into that,
there's a lot of retooling and reskilling that's happening and you don't want to overwhelm your workforce.
Yeah, makes total sense.
Do you feel like this convergence has started already?
Definitely, yes.
Are there some examples with the technology out there that demonstrate this convergence happening?
Yeah, I mean, we're just seeing companies like Snowflake and BigQuery growing in adoption.
And teams rightly starting with tools like dbt for machine learning even.
And then as they need fresher predictions or low-latency predictions, introducing more real-time elements to it.
Yeah, no, absolutely, there's a convergence towards these hyperscale data platforms that have SQL at the core: Snowflake, BigQuery, and to some extent Databricks, even if Databricks has maybe a different heritage. But if you look at where they're going directionally from a technology perspective, it's much more into tables and query planners and Delta Lake and all that stuff, even if it's kind of under this lakehouse umbrella.
That seems to be the core foundation.
I think every company that we talk to
wants to consolidate data for all of their use cases,
ML and analytics and kind of the whole business
into one of these hyperscale data platforms. The challenge with respect to ML then becomes, well, okay,
what are these additional needs that ML has, particularly real-time ML? So, you know,
it's real-time feature generation that tends to lead to streaming. And so what's your streaming
story, right? There's real-time feature store storage or real-time feature serving that leads to, well, what's your key-value, sort of row-oriented store, right?
So these data platforms that have so much traction
are all built for analytical use cases.
And so there are technical limits right now
that haven't yet been really overcome.
So the obvious ones to me are the streaming one,
the real-time serving,
maybe, you know, vector, you know,
sort of like nearest neighbor vector search
for things like personalization, where you need to actually do sort of approximate
nearest neighbor lookups.
These are kind of core bits on a production ML infrastructure that you would typically
have.
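(A hedged sketch of the real-time serving bit: upstream jobs write the freshest feature values into a row-oriented key-value store, and the model service reads them at request time. Redis is just a stand-in here, and the key layout and feature names are assumptions.)

```python
import json
import redis  # assumes a reachable Redis instance; `pip install redis`

r = redis.Redis(host="localhost", port=6379)

def write_features(user_id: str, features: dict) -> None:
    # Called by the batch/streaming pipeline whenever features refresh.
    r.set(f"features:user:{user_id}", json.dumps(features))

def read_features(user_id: str) -> dict:
    # Called by the model service on the request path, at low latency.
    raw = r.get(f"features:user:{user_id}")
    return json.loads(raw) if raw else {}

write_features("u42", {"txn_count_1h": 3, "avg_order_value_7d": 21.5})
print(read_features("u42"))  # these values feed the model at prediction time
```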
I think there's a question then, are those going to be separate systems, right?
Are you going to have a streaming pipeline where you do that? Are you going to have a hot cache of data? Or are these core platforms so ambitious that they're going to try to absorb those or expose those capabilities inside of the core platform?
So Snowflake recently just announced that they have this hybrid HTAP or hybrid tables concept where you can do fast row lookups that potentially enable some additional real-time serving use cases that might be useful for ML.
They're looking at heavily investing in streaming that might
close the gap in terms of
real-time feature generation so that you can
consolidate and bring those workloads on
the platform. Databricks is a platform
that has some additional
flexibility where you can do some of those.
It doesn't have real-time serving.
It'll be interesting to see where
this all goes
and whether it ends up being kind of one core data platform
and infrastructure with a whole bunch of workflows on top
where the other vendors are building workflows
or whether there are core infrastructure bits
that we'll kind of need to still glue together
to do production machine learning.
So you mentioned something, Willem,
that's very interesting.
You mentioned that people start using dbt even for ML use cases.
And it makes me wonder, there was always this dichotomy between ML and
analytics use cases where we were saying, okay, the language of analytics
is SQL, for ML it's Python, right?
People don't, let's say, cross this boundary easily. Did you see something changing there? Do you see SQL becoming more important?
And why is this happening if it's happening?
Yeah, I think a big part of this is, well, there's multiple points here,
but one is the performance aspect.
Like if you write the code in SQL,
you just get the performance of this provider
out of the box, right?
You're getting really, really high performance queries.
And to Tristan's point about these vendors or these cloud providers extending their data warehouses, a lot of them support Python.
Okay, BigQuery only supports JavaScript, but I think if you look at Databricks and Snowflake, you can pull in Python libraries.
And so the days of saying you have to
extract the data and use Python outside
of it, I'm not sure if that's
a long-term
viable reason to say these are
distinct systems.
Increasingly, I think folks want to
have a single way to do things, and I think
if these platforms increasingly have capabilities that mimic what is available on the ETL side, there's really not a lot of justification to kind of externalize that.
You do want to reuse those platforms and keep your architecture as simple as possible.
Yeah, another point I think is that AI and ML is becoming increasingly data-centric.
So what matters really is the data that's feeding into your models.
And the pipelines that are happening after that are becoming increasingly commodified.
So there's an argument that what ML is, is basically: okay, what are the set of inputs, how do you model your inputs, your features, to your ML problem, and what are the set of outputs that you're trying to predict? In the end state, isn't that all you really need to provide? Maybe all the other stuff gets hidden from you.
So if you have a system where the data becomes much, much more important,
then all of your work ends up sort of focused on data transformation. And that's really where
these data platforms shine. Where they don't shine: if you're going to write your custom TensorFlow or PyTorch model and you need to train it on a GPU, pushing that down into the data warehouse makes no sense. Currently it doesn't make sense, and honestly it probably never will. But on the other hand, if all of that stuff is hidden from you, then your job ends up being a data management and data manipulation job.
And I think there's no question that SQL,
maybe with a little bit of UDFs here and there in Python,
is just such a more manageable way
to do your data transformation, data engineering work,
including feature engineering work.
And that's where tools like dbt come in, which put SQL at the core but now increasingly allow even little snippets of Python where necessary as escape hatches. You just get a much simpler to operate system, a much more performant system, and a much easier to govern and manage system, so your IT team wants it as well. And so who's not going to adopt that, right?
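(A small sketch of that "SQL at the core, Python UDFs as escape hatches" pattern. DuckDB is used only because it runs locally; in Snowflake or Databricks the registration mechanics differ, and the table and function names are illustrative assumptions.)

```python
import duckdb  # local stand-in for a cloud warehouse, assuming a recent version

def bucket_order_value(v: float) -> str:
    # The kind of small, awkward-in-SQL logic you'd push into a Python UDF.
    return "high" if v > 100 else "mid" if v > 20 else "low"

con = duckdb.connect()
# Register the Python function so SQL can call it as an escape hatch.
con.create_function("bucket_order_value", bucket_order_value)
con.execute(
    "CREATE TABLE orders AS "
    "SELECT * FROM (VALUES (1, 150.0), (2, 35.0), (3, 5.0)) t(user_id, order_value)"
)
# The bulk of the transformation stays in plain SQL.
print(con.execute(
    "SELECT user_id, bucket_order_value(order_value) AS bucket FROM orders"
).fetchall())
```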
But when we think about the sort of, let's say, graduation from the analytics side, in a centralized store that's SQL-based, to serving in real time: are you seeing the need for real-time flow out of that, right? So you build on the analytics stack and then you sort of graduate, say, into the need for the real-time use cases, as you prove out value, realize additional opportunities, et cetera. Number one, are you seeing that happen? And then two, it sounds like there's still actually a pretty gigantic gap technologically. Even if you have that foundation really tight in the centralized store, actually moving to serve that stuff in real time is non-trivial if you're just based on the centralized store.
Tristan, do you want to start, or?

Well, I mean, this is right up Willem's alley, but I think that there's a huge gap there currently. But let me let Willem talk to it.
Yeah, I mean, there is a gap there.
Well, there are a lot of challenges just because the infrastructure out there is so heterogeneous. But what we see is teams starting with the centralized stack, the data warehouse,
proving the value of a use case in batch if they can.
So that's like phase one.
Phase two is often you're shipping that data
into some kind of production environment,
a static copy of it, or a model or something derived from that.
And there's a freshness problem in that case,
but you can respond at low latency.
But often, in most cases,
you've got a product that's operationally running
in real time,
and you've got an event stream that's coupled to that,
and that's managed by engineers.
And so if you really want fresh
and a real-time system with fresh data
and models that can depend on that data,
that's kind of the value that a feature store provides.
It unifies these, like the offline and the online world.
There's a big technological gap, and that's part of the problem we're trying to solve with feature stores: how do you go from the offline, batch training world into the online, real-time world in a consistent way? Because the model needs to move between the two, but teams often have a handover there. So there's a technical challenge as well as an organizational challenge, right? You've got analysts creating features, perhaps, or data, and then data scientists improving those as features, training a model and shipping that into production, handing it over to engineers. And so there's a lot of back-and-forth between the two of them. How do you actually interface those? That's what we're hoping the tools can make easier for folks.
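(As a concrete illustration of that offline/online unification, a hedged sketch using Feast, the open-source feature store Willem mentioned earlier. It assumes a Feast repo in the working directory that already defines a "user_stats" feature view; the feature and entity names are assumptions, not anything from the conversation.)

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes an initialized Feast repo

# Offline: point-in-time-correct features joined onto training labels.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2022-08-01", "2022-08-02"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_stats:txn_count_7d", "user_stats:avg_order_value"],
).to_df()

# Online: the same feature definitions served at low latency to the model.
online_features = store.get_online_features(
    features=["user_stats:txn_count_7d", "user_stats:avg_order_value"],
    entity_rows=[{"user_id": 1001}],
).to_dict()
```

The point is that one definition drives both worlds, so the model sees consistent values in training and in production.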
Super interesting.
Yeah, no, I think this is one of the big unsolved problems.
That on one hand, we have this great data foundation,
but the real-time use cases, the real-time serving use cases, are just hard to do. There's just a glimmer that maybe they'll be possible to do on a single platform with things like hybrid tables, but it's so, so early there.
And then you end up with these two worlds. And as soon as you end up with the two worlds, you have to do this complicated dance between them where you're moving data from your batch environment to your online environment. And then maybe you want to actually move your online environment to your batch environment, so it's not clear which direction you always want to go.
Actually, people take different approaches
where they start with the online and log to the
offline, or you take the offline and
move it up to the online. And so all of a
sudden you've got a fair amount of complexity.
And then that's obviously what motivates
tooling around this feature stores
to be built.
Do you feel that this is more of a technology issue? Is there technology missing right now? Or is it an organizational issue? Because what I hear from Willem is that there is a choreography among many different people, and probably also departments, in trying to make this happen. And there are feedback loops that need to be there that probably include an even broader set of people in the organization, right? So what do you think is the main challenge right now that the industry needs to address?
So that's almost two questions, but I'll say that, yes, certainly, imagine you're a data scientist,
maybe more the data scientist that's the initiator
of a machine learning project in a company.
The number of teams that you need to interface with to get into production is high, right?
It's not just the team with the API that's going to integrate with you.
It's the data platform team,
and where you're going to run your training pipelines.
Maybe there's an ML platform team, and they've got something purpose-built, but that's unlikely.
Then maybe you want monitoring for your system.
So you need to speak to a team about monitoring,
like an SRE team or a DevOps team.
There's a security compliance team.
There's the operations team
that actually speak to the customer on the street.
And so there's so many stakeholders to manage.
A lot of data scientists become more
product managers and we've not made it easier for them as an industry to just get into production.
And so that's what vendors are trying to do with tools: provide a gateway, a portal into this solution that's being built, for each one of these groups, so that one person, or one group, isn't responsible for going and interfacing with everybody. Because the point Tristan was making
is, it's essentially a loop, right?
So there's a kind of like training, serving, prediction, data collection, logging, storage,
and then transformation loop that's end to end.
And so many teams are involved there.
And, you know, we're trying to just make that easier to address through tooling.
Yeah, my view is it's not a recipe for long-term success if you have a significant amount of coordination for each job to be done that you as an individual get assigned. And so if you have to talk to all these different teams and hold meetings and try to understand their systems, and have them understand your systems and your needs, it's just a recipe for
things going very, very slow. And I think there's basically two ways to solve that problem.
The one way is to have extremely well-defined interfaces between these different services
where you don't really need to talk to the other team to use it. So if there's a monitoring system
and you just use it and you don't even talk to them, they've exposed to you those interfaces.
And yeah, that's sort of the Amazon model, right? Every small team, and then clean interfaces, everything's API-first. That's kind of their innovation model. The other way is to try to find something where it is a little bit more end-to-end, where a single person can do more. But there's only a certain amount of complexity that an individual can hold in their mind, so you have to reduce the complexity very, very dramatically. And both of those are challenging, because as you think about the interfaces,
the abstractions are not always obvious, right?
We're very much evolving, right?
So we're coordinating in part
because we're trying to figure out,
hey, what does everybody need for ML,
for production ML?
And then likewise, end to end,
it's often hard to find the abstractions there
that don't box the user in more than they want. So it's
kind of a trade-off there. I think that's
how I see people navigating it.
A good example of some of the challenges here is
if you've got
an Android or an iOS app
and you're making some kind of prediction,
you want to track what action
the user takes based on some kind of
personalization, perhaps.
Often that requires that mobile
team to go and develop some kind of
custom logic as part of their
mobile application in order to collect
the data that ultimately goes back into your
experiment. And so there's
all these little subtle areas in which
you need to interface with teams to
just get end-to-end.
And so I think the abstractions are still being worked out; those edges are still being cut. And I think that's the key problem to be solved.
Yeah.
Do you also see space for a new role? Because Willem, you mentioned the data scientist turning into a kind of PM at the end, trying to manage all these relationships. Do you think that there is a need?
Yeah, I see three roles. There's the research scientist, you know, the person that's taking two years to write a paper and is using the data in the company or organization to do that.
Then I see the MLE, hands-on, goes end-to-end, you know, builds this thing and gets into prod and maybe even is on call for that. And then I see the DS that becomes the product manager, essentially.
The center point from which all the spokes of the star emerge towards all the stakeholders.
And he or she owns this use case.
I don't think there's really a need for a new role,
but there's essentially archetypes that we have seen out there in the wild.
Mm-hmm.
Yeah, the one thing I might differ a little bit is,
I think ultimately for AI and ML to become sort of widely adopted,
it needs to be put into the hands of more users,
and that includes product engineers.
I don't see any reason why a product engineer cannot build an ML-powered feature in the long term,
or why an analytics engineer, who maybe has more of a dbt SQL background, can't build a production ML model. In production, no handoff, right?
An in-production model that's continually maintained.
In my view, that should be possible for anybody who's building production systems, right?
From an analytics engineer to a data engineer, to a machine learning engineer, to a data
scientist, to a product engineer should be able to do that.
I do think the research engineer is sort of separate.
They're going to be in the weeds
doing things in a little bit of a separate
universe. And occasionally for very,
very critical systems, maybe that's
only the domain of the ML engineer.
But I generally think that
if you're building a product,
there's going to be more and more use cases and more
and more systems that enables somebody
who's more of a product engineer, right?
Personalization. It feels like a product engineer plus a data engineer
should be able to get that job done if the tools exist.
And then I think the ML engineer will also love it because they'll be able to do either
different work or deeper work or more work or have more impact if every single use case
doesn't require sort of this like deep, deep, you know, end-to-end experience of infrastructure and ML, which is what the ML engineer today needs.
I agree.
So as we see the industry kind of commoditize, the engineers have a much higher leverage.
And then as their problems get solved, the data science problems get solved, and we commoditize that whole layer.
Ultimately, the product engineers or the product folks even will be the ones building these ML solutions. So that's definitely,
I'd say, the group that will attack the long tail of use cases. I think if you look at like a,
you know, like a Reddit or maybe like a Facebook, maybe your key recommendation system or your key
ML use case, that'll always be custom built, maybe like Google search, that'll be custom built.
But, you know, there will be a long tail of use cases that the product teams can build
themselves using, you know, some kind of solution that's perhaps central around your data warehouse
and, you know, with abstractions that they're familiar with already.
Yeah, super exciting.
That makes a lot of sense. So, all right.
You've mentioned quite a few times this problem between the batch and the real-time, and the low-latency requirements that ML has. What are, let's say, the best patterns utilized right now to bridge this gap and for people to productize these models? How does it work, and what's the state of the art in the space?
Do you mean for a specific use case or for...
Oh, does the answer change based on the use case? That becomes even more interesting.
Well, I think the truth is at the foundational level, it's all about the data, right?
And so that's why we started at Tecton with feature stores and providing a way for you to craft data and, you know, features that will power your models.
I think downstream from that, it's really very specific to your use case, extremely specific.
So if you're building recommendation systems, it's very different, and as Tristan said earlier, subtly different, from fraud detection or churn.
But I think from a data transformation and organization perspective, feature stores, at least, or tools in that layer of the stack, even DBT, provide a lot of value.
They provide you ways to go 70, 80% of the way
by crafting the features that ultimately power your models.
And often you can experiment and see performance of models offline
before you even go live, right?
Depending on the use case, that may not be accurate always.
So you typically do want to go live yourself,
but it's very hard to answer that question
in terms of without diving into specific use cases.
And even then, it's so different from customer to customer.
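(A tiny sketch of that "see performance offline before you go live" step: train on older data, evaluate on the most recent slice, and only then consider going live and A/B testing. The data is synthetic, standing in for features a feature store would produce.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                      # synthetic feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 0).astype(int)

split = 4000                       # time-ordered split: past vs. most recent
model = LogisticRegression().fit(X[:split], y[:split])
auc = roc_auc_score(y[split:], model.predict_proba(X[split:])[:, 1])
print(f"offline AUC on held-out recent data: {auc:.3f}")
```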
You know, this is kind of going into the weeds, although I'm curious on Willem's viewpoint on this. If I look at the feature store landscape, there are two patterns that I see being adopted. One is what you could call an online-first approach.
And so if you look at how Facebook describes its feature store environment, or how YouTube describes theirs, these are massive scales where they're generating tons and tons of data; they don't have a lack of training data, for instance.
They tend to adopt a more online approach where you generate online features, and then you log those features out and kind of wait around to collect the training data based on the new feed. You deploy them first to online, then you log them, and then you train your models off of that.
I think there's a different approach which takes more, which kind of puts more emphasis
on the ability to backfill data and sort of generate features and then kind of generate training data going back in time.
And that introduces a fair amount of complexity, which tools like, you know, Tecton and Feast,
I think, you know, and traditional feature stores solve, which have this backfill.
It has a different sort of architecture and a different set of trade-offs.
It does probably get to, again, your use case.
But I do think that's something I'm watching very closely: how those two different approaches unfold architecturally. They have big impacts on your architecture.
Neither seems to be a silver bullet. In the case of log-and-wait, if you don't have high traffic and a large volume of users, it takes a long time to collect the training data.
So if you ship a new feature, you need to log that.
Maybe it takes you two weeks.
If you don't have a lot of traffic, maybe it takes you two months.
And so your iteration speed can be slow.
If you're Google, maybe it takes you minutes, right?
And then on the other side,
there's architectural complexity to
the traditional feature store architecture, because you have the offline and online worlds.
What I'm excited about is technologies like Snowflake
and others where, you know, they have real-time ingestion and hybrid tables.
And you have stream-centric platforms that are being developed that could potentially consolidate these two worlds.
But even today, companies like Tekton also have logging architectures, right? If you're interfacing with an API for feature values, for example,
let's say you're calling some API
to get some data about a customer or transaction,
like a credit card company or something.
The only way to deal with that is to log it out
for training purposes later.
You can't use that ahead of time.
You can't query them in bulk offline.
And so even today, Tekton is kind of like in a hybrid state.
But I think over time, the log-and-wait architecture does have a lot of appeal.
Super interesting.
Well, we're close to time here and I want to leave plenty of time for questions.
So please write your questions in.
I'll start out with one here, which both of you have talked a little bit about this,
maybe in general,
and you're both building really cool tooling
in the ML space,
but one of our listeners wanted to know,
what are some of the,
maybe just pick one for the sake of time,
sort of the most exciting thing that you're seeing
in the ML
space specifically?
I mean, you don't have to name a tool if you don't want to, but as builders of these tools,
what excites you most that you're seeing?
Well, I can start on this one.
I mean, two things.
So obviously what we're building today, which is sort of a declarative approach to operational
AI.
I just think there's a tremendous need for higher-level abstractions for
production machine learning, and that's what we're trying to do at Continual. And
I get super excited when I read about, okay, you know, Facebook has this thing
called Looper; you can read the paper there. A really exciting example of an end-to-end declarative
platform for real-time machine learning. Apple has something called Overton, which is really exciting for more natural language processing use cases. Stitch
Fix just had a great blog on their system. So for me, I view it as:
there was a generation one, and maybe Uber's Michelangelo is the canonical OG example, where you stitch together
all the different components, which totally made sense. What's the next step?
I think we're starting to see that coming out of both startups and
out of the hyperscalers, who are kind of onto the next thing.
The second thing, which I think you can't ignore, is foundation models, large language models,
you know, the things that OpenAI is doing.
I am very bullish on this being sort of a new chapter, and it's very
unclear what it's going to
unlock, but you're already starting to see some commercial successes
for certain use cases, and I think it's moving forward at a tremendous speed. It
not only is going to unlock a whole new set of use cases, but there's actually a whole new set of
tooling concerns that you're going to have to address too. So it's unclear what the developer tooling ecosystem,
what the data management tooling ecosystem
is going to look like
for these extremely large language models.
And so I'm really excited by both the use cases
and the tooling for these large language models.
Yeah, I want to kind of echo the point
that Tristan made around Looper.
It's an extremely exciting direction
that I think, you know,
if you look at what has
happened over the last couple of years, large tech companies have innovated and the market has
commoditized those technologies or approaches. And what we see from Facebook or Meta is this
platform that's declarative and very focused on the product engineer. I'm super excited about that and the simple abstractions
that address ML use
cases. So I think it's really
about the persona that's being addressed here.
And so it seems like we're moving on
to the product teams a little
bit more.
So I think, yeah, that's the key thing that
I'm excited about.
Very cool. All right, well, we're going to try and sneak one
more in here.
And it's about open source.
So, you know, open source and software in general
is a really interesting topic.
But this is a really interesting question specifically to ML.
Is open source even more important in ML?
Because ML can sort of have this flavor of ambiguity around it
for people who aren't necessarily close to it.
How important is open source in ML?
I think while this industry is a little bit wild-west-y,
it's more important,
especially because we spoke about the abstractions.
If the abstraction is not perfect, then you're stuck if you're not using an open source tool, right?
We see this a lot.
For example, in Feast, if you want to use a different database as your backend, how do you do that with a vendor that doesn't support it?
You need to wait.
With Feast, you can kind of plug in your own backend store.
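As a sketch of what that pluggability looks like in practice, swapping the online store in Feast is a change to its feature_store.yaml configuration rather than a wait on a vendor; the project name and connection details below are hypothetical.

```yaml
project: my_project        # hypothetical project name
registry: data/registry.db
provider: local
online_store:
  type: redis              # swap in a different backend here
  connection_string: "localhost:6379"
```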
Long term, the jury is still out on whether it is necessary to have open source
as the delivery mechanism for the functionality.
There are certainly a lot of companies, especially if you look at the
modern data stack, that have proven that you can solve a whole class of problems with a cloud-based solution.
So for, as we said earlier, the long tail of use cases, the jury is still out.
Yeah, I think we're going to see a similar transition to what happened in the data sphere. Especially for the infrastructure of ML, where there are stateful services that you need to manage, it's going to move toward people
wanting fully managed services that they can just use to get their job
done.
And once those services become good enough, once people trust the abstractions,
the SLAs, the company itself, it's going to be just so obvious that, hey,
just go use these vendors.
I think you see that with Weights & Biases in experiment tracking.
It's just like, yeah,
just go use Weights & Biases.
I mean, pay 50 bucks a month.
The value you're getting from that is amazing
if you're looking for a way
to track experiments.
And it's not an open source product,
but it's very much targeting
the ML developer crowd
that you'd think would be
kind of the most open source friendly.
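To illustrate why that trade feels easy, here is roughly what experiment tracking with Weights & Biases looks like. The project name, config values, and metrics are hypothetical, and the training loop is stubbed out.

```python
import wandb

# Start a tracked run; config values are recorded alongside the metrics.
run = wandb.init(project="churn-model", config={"lr": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)  # stand-in for a real training step
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```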
You know, I'm still hopeful
that open source remains
a huge part of ML,
like, you know, in terms of
both open publishing, open libraries, like the core libraries behind ML algorithms.
And I think those will stay open source for longer.
Although, while I would have said these algorithms would stay open source forever, even now it's become a little bit unclear whether the open approach (maybe Hugging Face is the best
example of this) is going to win, versus the hyperscalers releasing
models that are proprietary but just so amazing that you kind of
hold your nose and use them. And then there'll be enough competition in the marketplace that
you're not going to get held hostage, so you'll feel like, hey, no big deal,
I can always switch between Google or Microsoft or OpenAI for these large-scale models.
So we'll see.
Yeah, super interesting.
All right, well, we are at the buzzer.
This has been such a helpful conversation.
Tristan, Willem, thank you so much
for giving us some of your time.
Super helpful for us and for our listeners.
Thanks for having me.
Thanks for having me as well.
It's been my pleasure.
Costas, I appreciated so much
the honest take that both Tristan
and Willem had on,
I'll say maybe the gap
between the promise of ML
and the reality of MLOps on the ground for people doing the work today.
And, you know, both of them had very strong feelings that it's a pretty gnarly space still.
And there's still a lot of things that are really hard to do.
And, you know,
that was just really refreshing,
especially to hear given that
they're both founders of ML products.
And, you know, I think at one point,
Tristan called, you know,
part of the ecosystem horrifying.
You know, so I just appreciated that take.
And I think that's very helpful,
not only for our listeners,
you know, but for us just to realize, you know, there's, there's tons and tons
of promise out there and companies like Continual and Tekton doing really cool
stuff, but you know, we're still in really early innings.
Yeah.
I think there was a wealth of updates on what's going on out there.
I think it's good that we heard that ops in general is still an unsolved
problem in ML, and I think it makes sense.
Like, you know, usually we come up with the technology first and then
figure out the operations around it. And obviously in ML, there are similarities with software engineering,
but there are also differences.
So we need probably different tooling or different methodologies.
I don't know.
But ML needs to mature enough to get to the point where you can say,
okay, operations is what we should care about.
And there are plenty of indications that this is happening right now.
That's what I hear from the guys.
I also took away some things about what is needed out there, beyond just feature
stores, the products that these guys are building.
But there's a broader need: even database systems need to come up with more innovation in order to solve some of the ML problems, right?
Like what we were talking about: how to serve the models, and the features
for the models, and how new hybrid database systems like Snowflake
can help with that, all that stuff.
It's still early, and there's a lot of innovation that needs to happen, and from some angles
that's pretty basic innovation too, very deep in the infrastructure that we are using, even at the database and storage level.
So it's a very exciting space.
Like I would be more than happy to,
I don't know, be one of these folks out there
who build products and companies in this space.
So anyone who thinks about it,
I think they should go and give it a try.
Absolutely.
And Costas will build an ML startup
name generator,
AI driven, of course,
to help support you in your mission.
I don't like
projects.
Alright. Thanks for joining us
for another live stream
and we'll let you know when the next one's coming out.
Catch you on the next show. We hope you enjoyed this episode of the Data Stack Show. Be sure to
subscribe on your favorite podcast app to get notified about new episodes every week. We'd
also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's
E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.