The Data Stack Show - 25: MLOps and Feature Stores with Willem Pienaar from Tecton

Episode Date: February 17, 2021

On this week’s episode of The Data Stack Show, Kostas is joined by Willem Pienaar, tech lead at Tecton, to discuss machine learning, features, and feature stores.

Highlights from this week’s episode include:

- Willem Pienaar’s background in South Africa and Southeast Asia, and from Gojek to Tecton (1:58)
- Tecton was founded by the builders of Uber’s Michelangelo (6:37)
- Defining features and their life cycles (10:05)
- Comparing a feature store to a database (16:40)
- Data architecture in a feature store (26:16)
- How feature stores evolve as a company expands (30:12)
- Main touchpoints between the feature store and the data infrastructure (37:59)
- How Tecton manages productizing complex architectures (41:44)
- How Feast and Tecton work together (45:12)
- Tecton impressing VCs and preparing for a competitive, emerging market (48:14)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show, where we talk with data engineers, data teams, data scientists, and the teams and people consuming data products. I'm Eric Dodds. And I'm Kostas Pardalis. Join us each week as we explore the world of data and meet the people shaping it. Welcome to another episode of the Data Stack Show. I'm Kostas and today I have the pleasure to host Willem Pienaar, Tech Lead at Tecton, in another episode where we will be discussing feature stores, MLOps and open source. Willem is working in one of the
Starting point is 00:00:40 hottest startups right now around feature stores and he's also the maintainer of probably one of the best open source feature store solutions out there. So we will have the opportunity to chat with him and dive into what feature stores are, why we are building them, why they are using them, what MLOps is about, and how open source is important in this new wave of technology that is supporting machine learning. Unfortunately, today Eric is not going to join us, but we will have a good time discussing with Willem and many things to learn from him. So let's dive in.
Starting point is 00:01:17 Welcome everyone to another episode of the Data Stack Show. Today I have a very special guest. His name is Willem Pienaar. We are going to be discussing a quite recent development in the space of data in general, which is feature stores, but also everything around data and machine learning. And I'm really excited to have this conversation with him. Welcome, Willem. How are you? Thanks, Kostas. I'm great.
Starting point is 00:01:48 Thanks for having me on the show. Yeah, yeah, of course. So would you like to start by giving us a quick introduction and a little bit of a background story about you? Sure. So I can give you a quick background. I'm South African, born and raised, grew up there, studied mechanical and electronic engineering. I built a company while I was a student, a networking company, and then sold that. After that, in South Africa, I worked in control systems, engineering, industrial
Starting point is 00:02:19 automation. I did that for a few years and eventually immigrated to Thailand, where I worked kind of as a software engineer slash where we, you know, built remote sensors and a lot of like streaming data from power plants in the jungle to central control systems and things like that. So I've been in and around kind of the engineering space, the data space, kind of vertical solutions for a while. And after working there for a few years, I moved to Singapore where I joined at the time kind of like a company that had been deemed a rocket ship that just crossed 1 billion in valuation. It's an Indonesian company called Gojek. Oh, wow. So they're currently, it's currently a $10 billion company.
Starting point is 00:03:23 At the time, they were mostly focused on ride hailing as their core product, but they are today a multi-application or a multi-product platform. So they do food deliveries, digital payments, lifestyle services. So you can get like somebody to come and fix your car, or you can pay and get like airtime, like data for your phone, and things like that. So there's like every single need that you have in the day, like if you need a motorcycle taxi, a car, you know, delivery, groceries. They've got like 17 different services. So I joined that team, and our directive at the start was basically to get ML into production because they were sitting on
Starting point is 00:04:06 mountains of data, just like Uber and Lyft and all of these companies, but they weren't leveraging that at all. And they had a bunch of data scientists that they'd hired, but those folks couldn't get into production because the engineers building the products weren't incentivized to really help them. So I was the engineering lead that helped them kind of build the initial systems, actual ML systems. So not products so much. And our team was kind of, it started off as kind of like a embedded in the data science team.
Starting point is 00:04:39 Eventually we became a platform team and then we ended up building a lot of data products and data tooling. So at that point, that was about two years into being at Gojek. Our team focused on the end-to-end ML lifecycle. We were only about 10 to 15 folks at the time. Data scientists were like 50 to 60. And so we kind of pivoted towards building tools that a large number of teams could use. APIs, UIs, services. It was the most scalable approach that we could find. And some of the things that we worked on were like the feature stores and model serving
Starting point is 00:05:16 and training and schedulers and versioning and processing of data and experimentation systems. It was all kind of in our purview and all ML focused. Yeah. So after that, I joined Tecton. I was at Gojek for about four years and I just recently joined Tecton. I mean, it was just a match made in heaven.
Starting point is 00:05:38 Tecton is a company that's focused purely on feature stores. And that was kind of my specialty at Gojek. And, you know, I also led the team that built Feast, which is the feature store that we built with Google at Gojek. And so at Tecton, our focus is primarily to build a world-class feature store. And we have two products that we're kind of building out there
Starting point is 00:06:00 and we're trying to build towards a unified vision for them. So that's kind of a short story in long form. Well, that's quite a journey, both geographically speaking and also in terms of your career. So that's amazing. Cool. Can you share a little bit more about Tecton? I mean, you said that Tecton is focusing mainly on building a feature store. Can you tell us a little bit more about the company and the product itself? And then we are going to get into more detail about both feature stores in general and also the Tecton product itself. Yeah, that's a good question. So Tecton was founded by the original folks that built the
Starting point is 00:06:42 Michelangelo platform at Uber. And I think most people in the data space have heard of that. So that was kind of a seminal internal proprietary platform that was built at Uber. And it was sold as something that democratized machine learning. That's a very overused term, but it was widely used within Uber to kind of productionize both data and models. And what a lot of people told us is that that system is actually used for a lot of EDA and iteration and development. It's not just for productionization, but it's a very famous system.
Starting point is 00:07:15 So they left there. So it's Mike, who was the PM on the project. Kevin, who is well-known as an engineering leader. And they founded Tecton. And I think they started in stealth in 2019. So at the start they've been secretly building a feature store startup. And they've grown the team prior to me
Starting point is 00:07:36 joining to about 23 people. I think I was the 24th or the 25th person to join. So they've got a very advanced, I'd almost go as far as to say the leading feature store right now that is at least publicly available, whether open source or proprietary or paid. And it's a complete end-to-end feature store, and it addresses both, like, enterprise and kind of like small startups. It's not fully open to the public right now. So you need to obviously sign up and pay
Starting point is 00:08:08 and go through the normal sales channels. But it's something that we want to get in everybody's hands in the future. But there are some specific differences between the Tecton and Feast products, but we can get into that a bit later. Yeah, absolutely. Quick question before we move forward.
Starting point is 00:08:25 You mentioned about being the most advanced feature store right now in the market. I mean, my background is mainly in data engineering, to be honest. I'm not a person who has worked in ML. So I know about feature stores, but I haven't used them extensively myself. So I did a bit of research
Starting point is 00:08:43 and I tried to see what is available out there. And what I've seen and noticed is that there are many technologies that are coming from very big corporations. Like you mentioned, for example, Michelangelo, and I found like what Airbnb is doing. I think they have, how do they call it, like Zipline, I think.
Starting point is 00:09:02 Yes, Zipline is the feature store and Bighead is the ML platform. Yeah, so it looks like every big corporation has pretty much come up with their own architecture. But you don't, at least I didn't manage to find that many open source solutions. Are there open source solutions outside of Feast? So there's Hopsworks.
Starting point is 00:09:23 I'm not sure if it's S or not, but they're one. They came out more or less the same time as us. They were kind of a Hadoop-focused one. They had like proprietary underlying technologies, like file systems and things. There are smaller ones. I think there's one called Butterfree that I recently saw that seems a little bit nascent, but they're coming out.
Starting point is 00:09:45 And I believe that some of the proprietary feature stores will be open sourced this year as well. At least there have been some rumors. That's good. That's exciting. Cool. Okay. So let's move forward and get into a little bit more technical detail. And let's start with, first of all, what is a feature? I mean, we're talking about feature stores.
Starting point is 00:10:09 Yeah. I mean, the simplest answer to that is, I mean, it could be an advanced answer, but the simple answer is it's an input to a model. So it's literally a data point that is used in a model to make a prediction. And how is it, I mean, how is it different compared to, let's say, the typical data types that we have in the database? What's the difference there? Or is it pretty much the same thing, just packaged in a different way? I think it's more an abstract label that is assigned to specific data, because it's just about what context the data is being used in. You can take raw event data and then feed it into a model and it can be considered a feature. So when it's fed into the model and it has some kind of influence on the outcome that the model is, you know, producing, that's when it becomes a feature. But in terms of the data types that
Starting point is 00:11:01 you're feeding to the model, it's almost always integers or floats or binary values. If you're feeding strings or, let's say, bytes, often the model then has the capabilities to interpret those types. But it's normally primitive types that you're feeding into a model. And in most cases, these features are only valuable, though not in all cases,
Starting point is 00:11:28 once you've aggregated them to some degree. So if you look at like the amount of purchases that a user has made, or some kind of value that allows the model to make a stronger inference on a user or a customer or whatever the entity is that you care about, that you're making the prediction about. So typically a feature is aggregated data,
Starting point is 00:11:52 but it can also be raw data. But the most common case is to have, I guess, some kind of aggregation, right? Yes. Okay. That's interesting. And can you give us a little bit of background around the lifecycle of a feature? As you said, it can be from raw data,
Starting point is 00:12:10 aggregation. How do we come up with a feature? How do we start from the raw data that we get and how we end up with a feature that we can store on a feature store and use it on our models online or offline for training? I'll give you the non-feature store flow first because the feature stores are all different.
Starting point is 00:12:27 The non-feature store flow is the user exports some historical data from a warehouse or some lake that the company's organized for them. So they sample the data and then they, you know, take like 10,000 or 100,000 rows and then they just process that data as a Pandas data frame or something. And then they train a model on that,
Starting point is 00:12:45 and they look at the model's performance. And then typically they would ship that model into production somehow and then get an engineering team to kind of rewrite those transformations on the real-time event stream or transactional data that's available in production. And as the systems are transacting, that data is fed to the model and they can make predictions.
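The ad-hoc flow just described, sampling offline, engineering a feature in pandas, then re-implementing the same logic in production, can be sketched in Python. The dataset, the column names, and the duplicated aggregation below are all made up for illustration; the point is that the same transformation ends up written twice, once for training and once for serving:

```python
import pandas as pd

# --- offline: what the data scientist does in a notebook ---
# (hypothetical sample exported from a warehouse)
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount":  [10.0, 25.0, 5.0, 7.5, 12.5],
})

# the "feature": total purchase amount per user, engineered in pandas
train_features = events.groupby("user_id")["amount"].sum().rename("total_spend")
# ...a model would be trained on train_features plus labels here...

# --- online: what an engineering team re-implements, often in another
# language, against the live transactional store. Any drift between the
# two implementations silently skews the model's inputs. ---
def total_spend_online(rows: list, user_id: int) -> float:
    return sum(r["amount"] for r in rows if r["user_id"] == user_id)

prod_rows = events.to_dict("records")
assert total_spend_online(prod_rows, 1) == train_features.loc[1]
```

Here the two implementations happen to agree; in practice, keeping them in agreement by hand is exactly the failure mode feature stores target.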
Starting point is 00:13:09 And if you looked at that flow, you could also productionize that flow by, you know, the training part that the data scientist did at the start could be extended to have more data and it could be automated through airflow or some pipelining system, but that's kind of the high level flow. And so the feature in that story is the transformation that's made on the raw data and it is fed into the model during training. Often you will log a list of features as strings, column names with the model binary and you can then reference those
Starting point is 00:13:46 same features in production, because all of your models will probably have different lists of features that they're, you know, kind of referencing. And so the lifecycle continues in production, and then somehow you need to tie the data sources that you have in production with the list of features that's saved with that model binary. So your model serving infrastructure needs to know how to select the right columns and data points in production and feed that to the model. Otherwise, if the wrong features are being fed to the model, you're going to have a skew and it's just going to be an inaccurate prediction. So that's a typical flow, and as for how the feature stores fit into this, I'm not sure if we want to get into the feature stores,
Starting point is 00:14:28 but the lifecycle is extended to, it's kind of split in that the feature store provides two interfaces, one at the training time and one at the serving time. And it prevents you from, or it removes the need to kind of re-engineer features and it gives you a kind of unified interface to the same data, same features. We can get into that in a bit, but just the final part
Starting point is 00:14:49 on the lifecycle, I guess the final place where you would look at the lifecycle of the feature, because you've made that prediction, is you would have an experimentation system that tracks the outcome of the prediction. And if the outcome is good, then you could go back and say, these features are actually predictive. And if the outcome is bad, then you can say, well, maybe these features are the problem, or maybe the model type is the problem. Maybe there's some intrinsic problem with the kind of way that we frame the problem domain. But yeah, so you'd want to have the model itself and all the logic that you have around it and the features as part of the collection of artifacts that are associated with an outcome in an experiment.
Starting point is 00:15:32 And by experiment, I mean like, let's say if you've got a website, you could maybe be testing two models and those models might be recommending specific products. So you can measure based on user behavior, which model is doing the best and the features are the primary influence there. That's super interesting. Actually, I find it fascinating. Like it's a completely different
Starting point is 00:15:52 type of complexity when you are serving models compared to a software product and how you serve it. You have operations again, you have a lifecycle again, and you have similarities, but at the same time, the tools that you need and the methodologies, and that's the feeling that I'm getting from you, they are different. And I'm really happy
Starting point is 00:16:11 that I have you here today to learn more about that. Okay, we chatted about like what the feature is and we touched a little bit also about feature stores. Let's get a little bit more into like the feature store itself.
Starting point is 00:16:23 You mentioned something about providing two different interfaces, one for the training part and one when the model is online. What is a feature store in the end, and how is it different from a database or a data store in general where we store data? And what are the components there? Yeah, this is something I've kind of thought about a lot. And the best way I can explain it is that the feature store is an opinionated data system that allows you to operationalize data for machine learning. So it's a data system meant for machine learning, and it has some unique properties based on the requirements that machine learning models have. So by the way, this definition is not universal
Starting point is 00:17:07 because all feature stores are basically different and people have different opinions of what a feature store should be. But there are some characteristics that make up most feature stores. So the one that I think is extremely important is that a feature store provides a kind of unified, consistent interface for you in the offline and the online worlds. So with models, on part of the lifecycle, you're training the model, and then the next side, you are serving that model in production. That production could be an online serving,
Starting point is 00:17:38 or it could also be a batch scoring where you're doing a large batch of data that you want to make predictions on. But an important failure mode that we often see in production systems where they don't have a feature store is there has to be a re-engineering of features in both environments because typically there are different teams
Starting point is 00:17:57 working in different environments. You'll have data scientists working with Python on the offline side, and then you have Golang and Java on the production side with engineers. And so they end up re-engineering a lot of these features and that causes drift and problems with models.
Starting point is 00:18:11 So the feature store provides a single interface between your model and the data. And so it literally is an API or SDK that allows you to pull data, and it serves the data to your model and it ensures the quality of the data to that model. And that fundamentally removes this kind of data drift, concept drift problem. It depends on the architecture of the feature store, of course.
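As a rough sketch of that single-interface idea, here is a toy, entirely hypothetical feature store in Python, with one materialization path feeding both an offline (training) view and an online (serving) view. Real systems such as Feast or Tecton expose much richer APIs; this only illustrates the shape of the interface:

```python
class MiniFeatureStore:
    """Toy sketch: one definition, two retrieval paths (offline + online)."""

    def __init__(self):
        self._offline = {}  # feature -> {entity_id: [(timestamp, value), ...]}
        self._online = {}   # feature -> {entity_id: latest value}

    def materialize(self, feature, entity_id, timestamp, value):
        # One write path keeps both views consistent by construction.
        self._offline.setdefault(feature, {}).setdefault(entity_id, []).append(
            (timestamp, value)
        )
        self._online.setdefault(feature, {})[entity_id] = value

    def get_historical(self, feature, entity_id, as_of):
        """Training path: the value as it was at `as_of` (no future leakage)."""
        rows = [(t, v) for t, v in self._offline[feature][entity_id] if t <= as_of]
        return max(rows)[1] if rows else None

    def get_online(self, feature, entity_id):
        """Serving path: latest value, produced by the same definition."""
        return self._online[feature][entity_id]

store = MiniFeatureStore()
store.materialize("trip_count", entity_id=42, timestamp=1, value=3)
store.materialize("trip_count", entity_id=42, timestamp=2, value=5)
assert store.get_historical("trip_count", 42, as_of=1) == 3
assert store.get_online("trip_count", 42) == 5
```

Because training and serving both read through the same store, there is no second hand-written implementation of the feature to drift out of sync.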
Starting point is 00:18:37 Another problem that feature stores solve is feature reuse. So it allows you to define, in those two contexts, between the kind of offline and online world, sorry, the streaming and batch world, consistent definitions of features. So you can define a transformation once and other teams can see that definition and they can consume your features.
Starting point is 00:18:59 They can fork the transformation and then reapply that and create new features. So it allows for collaboration. It allows for reuse. That's actually one of the biggest problems we had at Gojek was teams were just copying and pasting each other's code if they knew about it. But often they were just re-engineering the same features over and over
Starting point is 00:19:19 so they were recreating the same transformations. Now, this aspect is not necessarily unique to a feature store, but it's something it's very uniquely positioned to do because it really sits at the center of your machine learning. It's essentially the foundation of your machine learning architecture. So the feature store provides that consistent view. It also provides an abstraction between the model and your data infrastructure. So this is also something that we had massive problems with at Gojek, where teams would build training pipelines
Starting point is 00:19:49 and then they would write SQL queries that are basically running before model training. And in production, they would have access to Redis and a lot of connectivity and boilerplate code. So feature stores decouple the process of creating and materializing features from the consumption of that, which in turn makes your models highly portable. So there's no direct coupling or assumption that certain boilerplate code will be packaged with your model. And so I think those are the kind of key things
Starting point is 00:20:25 that make a feature store unique. It's this kind of consistent view between both environments. It also provides online serving capabilities. So it gives you low latency access to features in production. Also, often the kind of more advanced feature stores
Starting point is 00:20:39 provide point-in-time guarantees. So it ensures that when you are training a model, the view that the model sees on historical data is accurate, and that it represents the same view that the model will see in an online case. This isn't always easy to do, because you need to do a lot of kind of fuzzy as-of joins with data in order to ensure that you don't accidentally leak future data to models. So to drill a little bit into that, it's very easy, as a data scientist, when you're doing a join of like 20 or so tables to produce a training data set, to just accidentally give the model some future data. Like maybe it's an aggregation that's
Starting point is 00:21:22 over a day. And you think that data that's stored on today's timestamp means that it was from the previous day, but actually it's from the coming day. Now your model can see into the future when you're training it, but when it actually gets deployed into production, you can't get that data. And so it's just wildly inaccurate. So those are like subtle little things
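The point-in-time correctness being described can be illustrated with pandas' `merge_asof`, which joins each label row only to the latest feature value known at or before the prediction time. The tables below are invented for the illustration:

```python
import pandas as pd

# Label events: the moments at which predictions are made or evaluated.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_time": pd.to_datetime(["2021-01-02", "2021-01-05"]),
})

# Feature values, with the time at which each value became known.
features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "feature_time": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-06"]),
    "purchases_7d": [2, 5, 9],
})

# For each label row, take the latest feature value at or before event_time.
# A naive join on user_id alone could pull in the 2021-01-06 value, i.e.
# data from the future relative to both prediction times.
train = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
)
assert list(train["purchases_7d"]) == [2, 5]  # the future value 9 never leaks
```

A model trained on the leaked value would look great in backtests and then degrade in production, which is exactly the trap described above.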
Starting point is 00:21:45 that trip up a lot of teams when they productionize models and that a feature store helps with. That's very interesting. I'll go back and ask about the feature again, just because I tried to make it more clear to myself, to be honest. So if I understand correctly,
Starting point is 00:21:59 if you want to think about the feature in an abstract way, because initially, to be honest, like when I was thinking about features and reading about it, I was thinking that at the end, there is a database somewhere where you have like some data stored there,
Starting point is 00:22:12 which is the result of doing a pre-aggregation, right? But the more we talk together, I tend to think that the feature at the end is something much more complex than that. And it has to encapsulate like more information than just the output of a transformation. So is it accurate to say that like at the end, the feature is a piece of code that actually executes like the aggregation or defines the aggregation or the type of processing that you want to do on the data, together with source, because the data needs to come from somewhere, and this cannot be arbitrary.
Starting point is 00:22:45 It has to be well-defined as part of the feature. The model, of course, that we associate it with at the end, and also the time, right? Because something that we observe today, even if we are talking about the same data source, or we use the same aggregation, it doesn't mean that it's going to be the same again tomorrow, or that it was the same yesterday. Does what I'm saying make sense? To some degree, but I would challenge you on some of that. So are you saying the feature is the definition of all those things? Yes. It's not clear to me how the model is associated to the feature here or connected.
Starting point is 00:23:17 Because normally a model has a dependency on a range of features, but the feature has no awareness of models that consume it. Okay. Yeah, I was thinking more about the model as being the entity that's going to consume the feature. So in this sense, it makes sense to associate with it. But yeah, I get your point now. The feature can live there and you can reuse the feature also with different models, if
Starting point is 00:23:41 I understand correctly. Yeah. So if you disconnect the model there, you've got your input source data, and then you've got the transformation. Those are actually the only, that's all you need to produce a specific feature. I don't think time would be in the mix there because, yes, over time things would change, but if you change the transformation or the source data, then that is the input artifact that is changing, if you have a deterministic function that produces a specific feature.
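One hypothetical way to make that deterministic relationship concrete is to derive a feature version by hashing the source identifier together with the transformation's code, so that changing either one yields a new version. This is only an illustration of the idea, not how any particular feature store implements versioning:

```python
import hashlib
import inspect

def feature_version(source_uri: str, transform) -> str:
    """Hash the source identifier plus the transformation's source code:
    changing either produces a new version of the feature."""
    payload = source_uri + inspect.getsource(transform)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

def total_spend(rows):      # v1 of the transformation
    return sum(r["amount"] for r in rows)

def total_spend_v2(rows):   # tweaked transformation -> new version
    return round(sum(r["amount"] for r in rows), 2)

v1 = feature_version("warehouse://orders", total_spend)
v2 = feature_version("warehouse://orders", total_spend_v2)
v1_other_source = feature_version("warehouse://orders_eu", total_spend)

assert v1 != v2              # new transformation => new version
assert v1 != v1_other_source # new source data => new version
assert v1 == feature_version("warehouse://orders", total_spend)  # deterministic
```

The `warehouse://` URIs and helper names are made up; the design point is only that the version is a pure function of (source, transformation), matching the deterministic framing above.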
Starting point is 00:24:12 So if the input data changes or if the transformation changes, it's a new feature or it's a new version of the same feature. And feature stores also help you with tracking that. So if you have a feature store that allows for tracking of versions, then if one of those two things change, then it will be a new version of the feature. And interestingly, then when you consume that feature, if your model has a dependency on an old feature, you'll consume the old data and the old transformation. And if you consume from the new version,
Starting point is 00:24:41 it'll be the new transformation or the new data. Also, I mean, there is an aspect of, it does depend on how you partition your data. The time element does come in there. So if you're just doing a refresh of the data, every week or month will be different, right? There's a seasonality effect in data. So what we typically do is we just consider those to be the same feature, but different models. So it depends. You can be really pedantic about the versioning there, but for refreshing models, it's typically not that serious.
Starting point is 00:25:20 As long as you have the right validation on your source data and you can make sure that the effects of seasonality are not too wild. Sorry, digressing a bit here, but yeah, I'm with you. Okay, okay. Thank you so much. Now it's much more clear about the feature. Sorry, I really find this conversation that we're having an amazing opportunity for me to learn more about this stuff.
Starting point is 00:25:43 So I might ask some silly questions. I know that there might be some people out there that might be much more advanced and work in this space. But yeah, I'm selfish. Okay, thank you. All right, so moving a little bit forward, staying on feature stores, I'd still like to understand a little bit more
Starting point is 00:26:01 how a feature store is architected. What are the components? If you see it from a software engineering perspective, let's say I would like to start building a feature store. What kind of architecture should I expect to see there and what are the main components of it? The traditional feature stores have an offline store. This is a place where you are going to materialize data.
Starting point is 00:26:26 So essentially, you're going to take data from some source. You're going to use, and this is another component the feature store has, some kind of compute layer, some transformation system like Spark, you know, Airflow. It could even be like an ELT stack, like a warehouse. And then you're going to produce data and then you're going to store it in the offline store. That store is used by the feature store, and often you have an API, that's your feature store API, that you query. It'll then hit the offline store with a query, produce a training dataset, and export that for you
Starting point is 00:26:58 to train your model on. The feature stores also have an online store, and so it'll have typically an online API, which you will hit with a query in production. And that will be backed by, let's say, a Dynamo, a Redis, some kind of low latency store key value in almost all cases. And that store is also populated by these jobs that transform the data. The more advanced feature stores have some operational components as well.
Starting point is 00:27:28 So if you talk about Tecton, Feast also has some of these capabilities, but not as advanced as Tecton. It plugs into monitoring systems. It also has feature transformation, on-demand feature transformation services. You can do something like not just pre-compute features to be served, you can also do a transformation on the fly. So sometimes you have, like let's say you've got a driver making a booking on a ride-hailing app. You only have their location
Starting point is 00:27:58 when they're making the booking and you only have the location of the customer when they're making the booking. So you can't pre-compute that. But you still need to produce features that are dependent on those input variables. So Tecton has this ability to do on-the-fly feature computation,
Starting point is 00:28:13 and you can actually define those transformations ahead of time, but they execute at runtime. So integration with monitoring systems, on-the-fly computation, pre-computed computation, offline store, online store. I'd say those are the primary components. And then you have the computations are either batch jobs or they're streaming jobs. So if you're doing transformations on streams, they're long-lived.
Starting point is 00:28:39 And if you're doing batch, then they're just running on some schedule like every day or every hour or something like that. I'd say those are the canonical components of a feature store. But if you were listening to what I was saying earlier about what makes a feature store unique, and if you look at what Feast has implemented, I'd say the only thing that really needs to be there is the online store and an ability to create training data from your offline data. So that's kind of the essential complexity. That's great. Actually, I was checking Feast at some point.
Starting point is 00:29:13 And if I'm not mistaken, like in Feast, for example, you don't have like transformations there, right? Is that correct? Yeah. And that made me think, together with an article I was reading at some point where there was some kind of critique around feature stores. And actually what they were saying is that feature stores are great, but feature stores are also something that needs to evolve as, let's say, machine learning inside the
Starting point is 00:29:37 organization evolves, right? Like if you start today to try and experiment and come up with some models and all that stuff, probably getting a full feature store is going to be overkill. So you mentioned two things that you said are the basic requirements for a feature store, which are the offline training
Starting point is 00:29:55 and the online service of the data. What is the evolution as the company grows and as the company starts becoming more and more serious around the ML and the data science teams that they have, how do you see also the feature stores evolving in there? That's a great question. So this is something we've been thinking about a lot as well. Could a single data scientist use a feature store? Can a two, three-man team deploy and run a feature store for a single use case?
Starting point is 00:30:27 We haven't found a use case for a single data scientist, but we believe that it's possible for small teams. Like, let's say there's a company, they've got one team, this team has to build one model and get it into production, and they need a system that gives them kind of a structured way to get data into production without engineers being involved. They would deploy a feature store and they would kind of just use that themselves. When more teams start to depend on or want to use feature stores, like they're going to get more ML models into production that require features, or when that
Starting point is 00:31:04 team iterates on the same ML system, but with different iterations of the same model. So like the type of model is the same, the problem it's solving, but they've got different variants and each model needs to be tracked with a list of different features. Then it makes sense to kind of double down
Starting point is 00:31:21 on the feature store and, I guess, you'd either need a more advanced feature store, depending on whether you're using Feast or a proprietary solution, as opposed to building something yourself. But at some point, you can't just have a Redis and maybe some Airflow scripts that are pushing data into production.
Starting point is 00:31:41 You need to have something that's providing you versioning, providing you tracking of features, battle-tested APIs and things like that. But you can emerge and evolve from a solutions team that's solving one problem to having that feature store owned by a platform team. That's, I guess, the next step. So it's a central engineering team
Starting point is 00:32:05 that manages the feature store. They do things like provide access control. They make sure that data gets garbage collected in stores. They make sure that SLOs and SLAs are being met, that the performance guarantees are being met, that if jobs are failing, that they're going to be the ones fixing that. Then you've essentially separated two worlds, right?
Starting point is 00:32:22 On the one side, you have data engineers and data scientists that originally were creating data, like features, and they were taking their own models into production, doing it end to end. But eventually, it becomes two worlds. One is data engineers or data scientists creating features, features that may or may not be used by them. It could be for other teams. And often what I've seen at kind of large companies is that analysts are being asked to do this. So they ask analysts to write SQL,
Starting point is 00:32:55 BigQuery SQL, Snowflake and all that stuff, because analysts are really good at that. It's efficient. And you create this wealth of transformations on the one side, and then the feature store is just this layer that productionizes and operationalizes that data. And then on the other side, you have this catalog of features where you as a user can just pick the ones that you want based on metadata that's stored on those features, train your model, iterate on that until you're happy, and then productionize that. But you probably are not going to engineer any features. You might just reuse existing
Starting point is 00:33:29 features. So I think that's kind of the final point at which you are at the end of your evolution. Then it's mostly about security and access control and scalability and enterprise functionality. And that's kind of what Tecton is currently very good at. So Feast is something that is mostly deployed by teams that are more advanced than the single solutions team. It's almost in all cases a platform team, but it's not an enterprise feature store like Tecton. It's very fascinating.
Starting point is 00:34:00 Yeah, absolutely, absolutely. It's very interesting to hear about that. So feature stores are something quite new, right? It's a new concept in terms of technology. You touched on many different parts of it, and I assume that there's also different maturity across these parts today. What parts and components of a feature store do you see where there's a lot of space for improvement right now, and where do you think that direction is going, both from your experience in your previous
Starting point is 00:34:30 company, where you also built Feast, but also at Tecton? Because from what I understand, Tecton is also interacting with a different, more enterprise type of company, which usually also has a little bit different requirements. That's a very good question. I think the tricky one here to solve is who you're addressing. The biggest problem with the feature store today is that it solves many problems, because it's uniquely positioned to solve those problems. And so it becomes this platform that, you know,
Starting point is 00:35:01 it's kind of a Frankenstein monster. So I think feature stores will evolve in different directions and they will be more focused over time. So I think you'll see kind of a split between feature stores that are more focused on the solution teams and the kind of smaller teams, and then you'll see ones that are focused on the platforms and enterprises, and their needs are different. So I think the basic problems are already somewhat solved. If you look at Spark transformations or dbt, it's not perfect,
Starting point is 00:35:36 but there are solutions for creating features. And the kind of focus right now is not so much how do you create features, how do you compute them, how do you store them, and how do you serve them. It's how do you do everything around that: the kind of discovery and reuse, access, how do you do things like the lineage between features, dependencies, how do you track how models are performing that use features,
Starting point is 00:36:02 how do you integrate with adjacent monitoring systems and data validation and quality systems? Those are kind of the enterprise needs. And then if you look at like a lower scale kind of solution team focus, it's a little bit more on how do you make it easier to get started with feature stores? How do you make it easy to integrate
Starting point is 00:36:20 into existing workflows? How do you make it less kind of overwhelming for teams? And I think all of the feature stores today are still kind of tough to get started with. So I bet that if you went to Feast, you didn't install and run it. You probably just read the docs, because it's not just a pip install, right?
Starting point is 00:36:38 You have to spin up infrastructure, you need a use case, and you need to do quite a lot to go end to end with it. So it depends on who you're kind of targeting, the kind of smaller teams, larger teams, platform teams. But I think the V1 problems are solved. The V2 problems are different for those two. And those are the ones I kind of mentioned earlier.
Starting point is 00:37:00 Yeah, that's great. It's very, very interesting to hear about the enterprise, where it looks like a lot of the value in these organizations is always around governance and all these things that have been addressed, or that we are trying to address, also in different spaces, but how do they apply specifically in the case of a feature store? It's super interesting to see the same story,
Starting point is 00:37:25 but narrated from the side of a feature store. So one last, let's say, slightly more technical question before we move on and discuss a little bit more about Tecton. How do feature stores in general integrate with the rest of the data infrastructure that the company has? You mentioned that setting up a feature store
Starting point is 00:37:45 is not like a simple process usually because there's a lot of different components of data infrastructure that you have to deploy there. What are the main touch points with the rest of the data infrastructure that a feature store has today? Main touch points are you have data sources, either batch or streaming,
Starting point is 00:38:04 and you have some kind of job runner or compute engine, so like Cloud Dataflow, Kinesis, Spark, something that can run a process that can take data from that source, pull it in, and do some transformations. There's an ETL system, essentially, that then loads that into one or more stores. So in the old Feast architecture, you'd pull data from the source and you'd push to a stream. And from that stream, it would get sunk into online and offline stores. But in the new Feast architecture and then the Tecton architecture, what happens is you pull from, let's say, a batch source.
Starting point is 00:38:40 It could be a warehouse like Redshift. It can be a bucket. And you can pull from streams like Kafka or something like PubSub and do transformations and then just push to a single online store and a single offline store. So there's the compute layer. There's the two sources. There's the storage engines. The storage engines may be existing infrastructure. So feature stores, at least the good ones, reuse existing infrastructure and they don't create new data islands.
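A rough sketch of that flow — pull raw events from a batch source, compute a feature, and write it to both the online store (latest value per key) and the offline store (full history) — using toy in-memory stores; none of the names here are a real feature store's API:

```python
from collections import Counter

def materialize(raw_events, online_store, offline_store, run_time):
    """One hypothetical materialization run: aggregate raw events into a
    feature (trip count per driver) and write to both stores."""
    trip_counts = Counter(e["driver_id"] for e in raw_events)
    for driver_id, count in trip_counts.items():
        # Online store keeps only the latest value per key, for low-latency lookups.
        online_store[driver_id] = {"trip_count": count}
        # Offline store appends to the full history, for point-in-time training joins.
        offline_store.setdefault(driver_id, []).append((run_time, count))

online, offline = {}, {}
raw = [{"driver_id": "d1"}, {"driver_id": "d1"}, {"driver_id": "d2"}]
materialize(raw, online, offline, run_time=1000)
```

In a real deployment the dicts would be existing infrastructure (a Redis or DynamoDB for online, a warehouse or bucket for offline), which is the "no new data islands" point made above.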
Starting point is 00:39:08 And then there's also integration with operational systems: you know, if you've got a Grafana and a Prometheus, or you've got some kind of logging system like Stackdriver or Kibana or the ELK stack, feature stores integrate with those. And because they're production systems, right, literally business decisions are being made on the fly with this data, so it's critical to have operational excellence on them. You need the logs, you need the metrics, you need alerts. So they integrate with all those systems,
Starting point is 00:39:42 like a PagerDuty or Sentry and all of these kind of monitoring and metric systems. And then, of course, the kind of critical integration is into the model serving layer. So the feature servers, the model server and the feature server speak to each other. So the models will call out to get features. And this also happens during training. So if there's a pipeline training model, then that also calls out to get features. And this also happens during training. So if there's a pipeline training model, then that also calls out to the feature store.
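The serving-time handshake described here — the model server calling out to the feature store for an entity's latest feature vector before scoring — might look like the following toy sketch. A dict stands in for the online store and a linear model for the real model; all names are hypothetical:

```python
# In-memory stand-in for an online store keyed by entity ID.
online_store = {"driver_1": {"trip_count": 42.0, "avg_rating": 4.8}}

# The feature vector order must match what the model was trained on.
FEATURE_ORDER = ["trip_count", "avg_rating"]

def get_online_features(entity_id):
    """What a feature server would answer at inference time: the latest
    feature values for one entity, with a default for missing values."""
    feats = online_store.get(entity_id, {})
    return [feats.get(name, 0.0) for name in FEATURE_ORDER]

def predict(model_weights, entity_id):
    """The model server side: fetch features, then score a toy linear model."""
    x = get_online_features(entity_id)
    return sum(w * v for w, v in zip(model_weights, x))

score = predict([0.01, 0.5], "driver_1")
```

During training, the same retrieval happens against the offline store instead, which is why keeping the two stores consistent is a core feature store job.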
Starting point is 00:40:09 And depending on your feature store, it'll either be deployed to Kubernetes or it'll be deployed to kind of like a managed environment. But I'd say most of them actually require Kubernetes these days to run. And if your feature store allows you to train locally, like in your notebook, that's also another integration touch point. And then recently there's been, I don't know if you know of Lyft's Amundsen? No, I haven't heard of it. Wait, is it Lyft or is it another company?
Starting point is 00:40:30 But it's a metadata, it's a discovery system, kind of like a data discovery, metadata tracking system that you deploy in your organization. And it basically pulls or collects information about all the systems that have data across your org.
Starting point is 00:41:02 So that's something that has been becoming quite popular. Data Hub is another one. And they've recently integrated with Feast as well. So the integrations between those systems and feature stores are also important. Super interesting. I see there are many touch points there. So it requires from what I understand,
Starting point is 00:41:22 like quite a lot of effort to set it up, and also probably have complex operations around that, which takes me to my next question. Let's chat a little bit more about Tecton, and more specifically about what it means to productize such complex architectures, especially on the cloud. So how did you manage to do that with Tecton? Can you tell us a little bit more about this? Well, I'd love to give you the finer details, but I've just joined the team two months ago, so I wasn't really involved with most of those small things,
Starting point is 00:41:51 but I can tell you at a high level how we operate. There's multiple aspects to it. Tekton today runs as a managed service where we have architected the system in such a way that we can run a single Tecton control plane, basically the brain of operations, and we have a
Starting point is 00:42:14 separate data plane, and this data plane can be deployed into a customer's cloud environment. Essentially, what this provides is a way for us to horizontally scale out the amount of customers that we can support and provide them data locality, like their data doesn't have to leave their environments. So we have a large engineering team that's heavily focused on ensuring the reliability and stability and performance, as well as just the functionality that's available in that system, both from a kind of control and operational standpoint as well as execution standpoint.
Starting point is 00:42:48 So how do you do computations for the customer and how can you make that efficient and how can you save them money and how can you give them earlier alerts and warnings and how do you integrate into the stores that they're already using? Then on the other side, we've got product teams. I'm a little bit closer to the product side. So we have a lot of conversations on
Starting point is 00:43:08 what is the most intuitive way for users to define features? How do you allow them to specify the configuration that tells the feature store how to operate? Because in the data space, it's unlike engineering in that you're not reining in the chaos, right? You're not reducing complexity. There's an innate complexity to data. And the more features you create, the more uncertainty and complexity and entropy is introduced into the system. So you kind of want to give them as much structure as you can, while at the same time giving them freedom to, you know, operate. Like you can't just say, you can do an average or a min-max, right? You have to allow them to write any kind of
Starting point is 00:43:56 transformations, bring their own code if they want to, bring their own dependencies, but at the same time, prevent them from, you know, taking down a production system and accidentally bringing in some sleep function or something. Yeah. So on the product side, we're heavily focused on understanding how the users think and what to provide to them. And the great thing about this is that we have two worlds here. We have Feast, the open source side, and we have Tecton, where we have different customers and different users.
Starting point is 00:44:28 So, and then finally, it's just, yeah, I mean, we have amazing founders who have, you know, seen a lot of great implementations of feature stores, like Uber's Michelangelo, and at other companies. And they're very well connected. And we have great investors as well with Sequoia and Andreessen Horowitz
Starting point is 00:44:45 that really guide us in our venture. Yeah, yeah, absolutely. That's very important. So how do Feast and Tecton work together? What's the vision there, both from your side and also from Tecton's side? Because you joined the company, you will be working on a product, Tecton, and at the same time, I assume you
Starting point is 00:45:08 are going to continue maintaining Feast. So what's the story behind this? Well, that's a great question. I think that when we started, we started independently. And then at some point, we just realized we're trying to solve the same problem, and we'll probably be better doing this together. And we have these two great products, so for us it's just about figuring out how to build the best feature store. And we believe
Starting point is 00:45:33 that, you know, there will be large overlap between these two, but that Feast and Tecton will kind of gravitate towards solving problems for different groups of users, where Feast will be a little bit more for teams that just want to get started quickly and solve specific problems. They're more at the kind of nascent stage. But if you go to a large bank or corporate, companies or teams that require high scale or multi-tenancy or advanced access control, then you're more likely to go towards Tecton. So for us, we're still trying to kind of converge
Starting point is 00:46:09 these two visions. So we're working very closely, I'm very close to both the Feast and Tecton sides, and we're unifying these visions. But I think over the next three to six months, it'll become much clearer exactly what we are and what we have decided. That's as much as I can answer right now. I hope
Starting point is 00:46:25 that was satisfying enough for you, but... No, no, no, that's good. That's good. I totally understand. How's your experience working on an open source project, by the way? It's extremely rewarding, and it's also kind of draining at some points. So you often don't really have closed-loop feedback. You only see the tip of the iceberg of users. Like 2% of users will make an issue or give you feedback, but that'll often be negative. So you really have to kind of have conviction that what you're doing is right.
Starting point is 00:46:58 Luckily, I had to run Feast internally at Gojek for like three years or two years at least. So it was very rewarding to work with our customers internally and just get them to use it and make them happy and see how impactful the software is. And so you don't need to have conviction that an open source project is successful. We just kind of put it out there and do it out in the open. And if people like it, they like it. If they don't, they don't.
Starting point is 00:47:21 But it turns out they kind of do. And so for the most part, it's been very rewarding, but it can be a lot of work, so it's best if you're paid to do it. Yeah, I totally understand how it feels. So we are almost close to our time, and we have many things to discuss, to be honest. I mean, it's very fascinating, this whole space with feature stores. But one last question. Tecton recently raised a quite impressive round from some very impressive VCs here in Silicon Valley. You mentioned some of them already. Can you tell us a little bit about what this means? I mean, both for the company itself, like what excites you about what's going to happen in the next couple of months? And also what it means for feature stores in general, right?
Starting point is 00:48:10 And this market, let's say, that is emerging. Yeah, so the market is going to get a lot more competitive. We've already seen Amazon release their feature store. Not sure if you had a look at that. We believe that, you know, all the other cloud providers will also bring them out. And so raising that round is kind of a vote of confidence from our investors that, you know, they believe that we are one of the stronger players in this space. And I think that Tecton is probably, you know, it's the right environment and the right people to build an industry-dominating or the most successful feature store, which is part of the reason why I joined this team.
Starting point is 00:48:52 What you can expect to see going forward, I think in the short term, is a lot easier access, a lot more transparency in terms of our APIs and the functionality that we provide. And we'll be going towards users a lot more. So previously, I mean, from a technical standpoint, we're going to be a lot more open towards integrating
Starting point is 00:49:19 into existing infrastructure and reusing existing infrastructure, instead of providing a managed service with specific types of infrastructure. So, I think, we've collected a lot of feedback, we've got a lot of great customers that have been working really closely with us, and so there are a lot of things that will be landing in the next couple of months. But I think the thing that I'm the most excited about is getting more eyes on the product itself and opening up what we've been working on. Yeah, and also seeing how it's going to work together with Feast, because from what I understand from what you said earlier,
Starting point is 00:49:57 there are also things that are going to happen there. Thank you so much. It was a great conversation. I really enjoyed it. I learned a lot and I really appreciate that. And yeah, I hope we meet again in a couple of months and see how things are going and learn more about this. Definitely. I'm going to take you up on that offer. Thank you so much for your time today. Thank you everyone for joining us in another episode of the Data Stack Show. I hope you enjoyed today's episode with Willem as much as I did.
Starting point is 00:50:27 When we started recording this episode, I had many questions and, I would say, even some doubts about the importance of feature stores. I know many, many more things about them right now, and I truly understand why they are important. Willem did an amazing job explaining that to us. And I'm really looking forward to having another recording with him in the near future. Willem has many things to share with us about this exciting new world of MLOps. Thank you so much again for listening to our show
Starting point is 00:50:59 and see you on the next episode.
