The Data Stack Show - 134: Unpacking the AI Revolution and the Technology Behind A Feature-First Future with H.O. Maycotte of FeatureBase
Episode Date: April 12, 2023

Highlights from this week's conversation include:

- The journey of H.O. into data and becoming the CEO of FeatureBase (2:37)
- Characteristics of the super evolution in technology (6:36)
- ChatGPT as the missionary of AI (9:45)
- The tension between authenticity and technology (13:12)
- What is FeatureBase? (17:53)
- Comparing FeatureBase to feature stores (25:58)
- Workload capacities and possibilities in FeatureBase (33:20)
- The importance of developer experience on a platform (38:23)
- Exciting developments for FeatureBase in the future (47:13)
- Final thoughts and takeaways (53:52)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Hey, exciting news. We're teaming up with Ananth
from the Data Engineering Weekly newsletter
for the State of Data Engineering Survey.
It's a five-minute survey, and we need your help.
As the data world continues to rapidly evolve,
we're interested in your insights
around data team priorities, team dynamics, data stacks,
how you approach projects like identity resolution,
and even data team roles. Your answers will help us build a report that will get featured in Data
Engineering Weekly, and after it launches, we'll have a special Data Stack Show episode discussing
the results. Plus, we'll send you some Data Stack Show swag for participating. Visit
rudderstack.com slash survey to participate. That's rudderstack.com slash survey.
Welcome back to the Data Stack Show. Kostas, today we're chatting with H.O. Maycotte, and what a story
he has to tell. I am not even going to get into it because I'm so excited for our listeners to
hear about his upbringing in a
hill town without technology in rural Mexico. So there's your little teaser. I want to ask H.O.,
of course, there's a lot to talk about with FeatureBase, his company, which is a fascinating
technology, but he's also really into the future and really into AI. And so I want to hear about his vision for the future and for AI. And then did that
influence or how did it influence him founding FeatureBase? And then, of course, I want to hear
the technical details, and I'm sure you have technical questions. So you've used FeatureBase.
What are you going to ask about?
There are plenty of questions. FeatureBase is very interesting because it's outside of, okay,
being like, let's say, a database for features.
It has some very interesting use cases.
Like I primarily, to be honest, like got interested in it
because I felt that it's like a great technology to use
for the use cases around CDPs.
And working with event data and creating audiences and all that stuff that you know pretty well from both RudderStack and also working in marketing.
But it's much more than that.
They do a great job in exposing the technology behind it,
which is always very interesting for me.
So they get into a lot of detail of why they've decided to use it
and work with bitmaps and how they use them and all that stuff,
which might be a little bit intimidating for someone
who just wants to try the product,
but for someone who's curious and understands how it works, it's amazing.
And I think we will have a lot of opportunities to talk about building a business around a
very new technology, what it takes, and what the journey looks like.
Because there is a journey behind FeatureBase.
It's not like a product that was launched six months ago.
And I think HO is the right person to talk about that stuff.
I agree. Well, let's dig in.
Let's do it.
HO, welcome to the Data Stack Show.
Wow, we have a ton to talk about.
The future AI, FeatureBase.
So why don't we start where we always do. Give us your background and kind of what led you to starting FeatureBase.
Yeah. Thank you all for having me here. My name is H.O. Maycotte. I'm the CEO of FeatureBase.
I was born and raised in Mexico in a hill town that had very little access to media and
technology. So it was a place where
like your wildest imagination could run free without paradigm. And, you know, I always from
a very early age was obsessed with the idea that humans were just on the edge of being superhuman
and that, you know, society and life, not necessarily humanity, but life in general
was sort of on the edge of this super evolution. And so as that took me through school and my career,
I continue to pull on that thread, right?
I think especially what's happened in the last six to eight weeks
is really fascinating.
You know, I've lived my life for this moment,
like some call it the fourth industrial revolution,
but without a doubt, like AI is officially here.
I think ChatGPT is like the missionary for AI.
It's not exactly how I thought
it was going to manifest with like funny images and marketing copy, but for good or for bad,
it's here. And I do think that, you know, what's happening will allow life to evolve to its next
order. I'm fascinated by it. And I think that, you know, we as humans need to sort of unite
to make sure that we're helping machines help humans and not, you know, helping
machines and humans help machines, because if we're not careful, you know, a lot of our innovators
are focused very much on technology for technology sake. And so we got to remember, we got to keep
that in mind. So, you know, so every day I love building on the future. I consider myself an
ambassador of the future, but also trying to make sure that, you know, at least the role I play and
that my company plays
is trying to keep that balance and remember that we're doing this to help us and life and not the
other way around. Yeah. So much to dig in there. Let's go into the hill town in Mexico though.
You said from an early age, you were enamored by this concept of there sort of being, you know, humanity almost going through like a stepwise change.
But you also said you didn't have access to a ton of media.
Where did the seeds of those ideas come from?
It's a really good question.
I think, you know, I just couldn't handle the idea as a little kid that like, you know, you broke your back.
You were paralyzed.
Like, why can't you fix those nerves? It's just matter. You should be able to fix that, right? Or why, if we have a heart condition, does the rest of the body fail with it? So those thoughts have always sort of obsessed me. Maybe because I didn't have media, I didn't realize it wasn't possible.
I always thought it was possible.
And I think as I've gotten older, I continue to try to challenge why are these things not
possible?
And without a doubt, I think we're going to solve them all now.
This is a whole other subject.
So we can get together for another episode.
I think as long as we don't have a social revolution interrupting all of our advances,
I think we're going to see a wild amount of innovation coming at us.
We as humans, like I said in my purpose statement,
we have to help machines help us.
I think we're just going to have to learn to adapt to very rapid change.
Just the impacts that social media have had on kids and on society,
it's 15 years old.
It just came at us so fast.
All of these things are coming at us so fast. And, you know, the first two industrial revolutions were like
a hundred years each. The last one was 50 years. You know, this one's going to be like 20. So
anyways, it's going to be, it's going to be fascinating. And I think maybe that, that
idealistic, you know, up in the mountains, flowers on the hills background, kept me from the paradigms that block my thinking.
I love it. Yeah. The mind of a child and sort of, you know, imagining what adults think is
impossible. It seems like you've carried that through, you know, your life and career,
which is truly inspiring. One quick question on
the super evolution. So you mentioned a couple of things that, you know, were like medical and
nature, you know, so sort of biotechnology, but what are sort of the big, you know, I certainly
think that's, there's going to be a huge breakthrough in biotechnology for, you know,
sort of medical things, but what are other characteristics of this super evolution
that you see? Yeah, I think a lot of people are asking the questions right now, like,
you know, what is it going to take to sort of bridge the gap between the AGI, the, you know,
the general intelligence that we feel and sort of dream is coming along, right? When does the
machine wake up and all of the narrow AI and the simple AI that we're doing today? And I always
try to reframe that question that I actually think it's a much broader
question, right?
It's like AI is on a journey and it will get there again, unless we have social unrest
that stops technology.
We will be there sooner than we think.
But I think the other end of the spectrum is really what's happening on the synthetic
biology side, right?
We're trying to make little robots and we're so impressed when it can, you know, jump from one box to the next box.
But like, you know, imagine if you could reprogram a cat, like the cat is a million times more
advanced than that robot.
You know, I always make the joke that like, if I could have a swarm of programmable chinchillas,
right, I could create a lawn business and have these chinchillas come like mow your
lawn and fertilize for free.
And, you know, they'd have a blast doing it, and it would be entertainment at the same time. But like, that is probably closer to us in capabilities than, you know, creating an army of 800 little robots that come out and mow our lawn or do our landscaping. And so there's some fascinating things. There's a friend
of mine who started a company called Colossal, and their big audacious goal is to bring back the woolly mammoth. He partnered with George Church, the Harvard geneticist and CRISPR pioneer, as co-founder. And they are 100%, and notice I say they, they are 100% convinced that they're bringing back the woolly mammoth in about five years. It's fascinating. They have hundreds of samples, and they've been able to put together the genomes. And for us that are technologists, what blows my
mind is that they're starting on one end with the Asian elephant genome. And they've sort of mapped
out by putting together all these fuzzy genomes, what the woolly mammoth genome looked like.
And there's only 60 genes that are different, right? So they flip the bits per se on these 60 genes.
And all of a sudden, it has like four times the mass.
It has long hair.
It has different hemoglobin flowing through its body.
Imagine if we had gotten around and built a robotic Asian elephant, and now we wanted
to build a woolly mammoth robotic elephant.
It would take us millions of lines of code and years of effort and engineering.
To think that we can go from the Asian elephant to the mammoth with 60 genes is just mind-blowing.
So whatever programming language nature uses, it's several orders more efficient than the ones that
we're using. So if we can figure out how to harness that, that might be a faster path to
some of the things that we're trying to solve. So in any event, I'm still holding out for which path is going to be faster, but I do
think we need to consider both paths as we think about the future.
Fascinating.
I could dig into that for hours, but let's steer it a little bit more towards data and
technology and dig into AI, right?
So ChatGPT, huge topic of conversation.
I love your description of ChatGPT as the missionary for AI. I agree with that. We've had various discussions
about AI on the show, you know, throughout our entire time, which has been really interesting.
And I think ChatGPT is kind of the first thing that we've seen or that
we've discussed on the show that has sort of like mass practical, like appeal and utility, you know,
whereas a lot of the previous iterations of AI sort of lived behind a shroud, or you consumed it,
you know, with some level of obfuscation as like an end user, or people just didn't understand it,
right? But,
you know, anyone can go into ChatGPT, ask it a question, and get an answer. It's like,
wow, okay, this is real. But you didn't necessarily expect that, you know, me as a
marketer, I'm going to use that to like get my new product pages, you know, for RudderStack,
you know, all fancy and then, you know, generate my blog images for my blog posts.
What did you think was going to happen?
If we rewind five years and you said,
okay, the first missionaries are going to be like X, Y, and Z
in terms of the manifestations of AI.
Yeah, I mean, I think if I had a robot right now,
they could lift and stack boxes.
I'd sell the heck out of them to Amazon.
I think we thought AI and some of this robotic stuff
was going to be taking
our jobs from the bottom, right?
Like, so just one insight I have with what's happening now is it looks like we're going
to go after the middle, right?
Like the legal profession, for example, has always sort of feared technology because it
couldn't quite do their job, right?
For the first time, it sounds like this evolution, revolution on search, which we call ChatGPT, which we call these large language models, you know, can finally do that work really well.
In fact, it's probably really well suited for it.
And so, you know, one of the interesting manifestations is it's sort of going for our jobs in the middle.
But I think more than anything, like, it relates to us, it relates with us, we relate to it.
And so we're convinced that it's real. And whether it's real or not, I don't really know. Like, you know, if
we think it's real, you know, there's a lot of questions about content credibility and bias and all of those things that get baked into it. But like, we as humans are kind of suckers, right? So as long as we think it's real, which I think we do, I think it's going to have really profound implications, right?
I think, A, we all believe in AI now.
My little cousin, my grandmother, everybody talks about large language models.
And I think it is going to go after some jobs and change the way we operate.
I think what I'm kind of obsessed with is how do we democratize that?
It's really like three or four really big companies that have access to all of this. But for everybody else, AI sucks, right?
Like data still sucks. So AI really sucks. Like how do we make this so that we can augment
ourselves, right? Every email I have, every text message I have, every time I buy a piece of
clothing, like I should be able to have my own version of it that's helping me every day.
It's my own personal co-pilot. So we have a lot of opportunity over the next few years to think through the different paths that we've taken to take all of this, but certainly the wheels are
in motion now and it makes me so happy. Yeah. I want to go a little sidebar here.
You mentioned, is it real? Can you
describe the tension there from your perspective? Because I think that's an interesting,
and I'll tell you, I'll give you context for the question coming from my brain is that,
you know, I think about it, I tend to think about it from the technology side, right? And so it's,
I mean, this is a real model, like taking real input and using
real training to produce like an output, right? But there are a lot of people, you know,
the education space is a good example, right? Like, is an essay that's produced by ChatGPT that's really good, is that real? You know, is that kind of what you're getting at?
Yeah. I mean, I think maybe even a little more existentially, like, what is real? I helped start a non-profit media organization called The Texas Tribune a long time ago, you know, and one of the premises behind starting that organization is that democracy requires an unbiased source telling you the facts. But even as factual as we tried to be, there was probably some bias in the way we wrote it and the way we reported it. You know, the media tends to lean left in general. But, you know, if it was on national TV 30 years ago, it was real, right? For the most part, we all believed it, whether it was real or not. It got harder with newspapers and print publications. And, you know, I think we hit an inflection point when the internet got created: critical thinking went straight down, and the proliferation of any version of reality that you wanted went straight up, right? And so now what ChatGPT and these large language models have done is just, like, obfuscated, turned all of that into a black box. So, you know, fact checking was already hard and critical thinking was already hard. Like, this stuff is really convincing. Like, you know, ChatGPT, tell me how I'm going to go lift a 10,000-pound weight. Well, it'll tell me, and it'll be very convincing and excited and enthusiastic about how I'm going to
go about doing that. So, you know, that reality, I don't know. But it's going to be a lot harder to fact check. It's going to be a lot harder to figure out what's credible and what's not credible. But to some degree it doesn't matter. Like, we've been believing, you know, what media, the internet, and now these models are feeding us, and, for good or for bad, reality is going to be further distorted.
Yeah. How do you balance that? Sorry, I'm going to continue down the sidebar.
How do you balance that as an ambassador for the future, right? Because if you think,
and thank you also for answering the question by beginning with, even more existentially.
I love that.
So how do you manage, because that's somewhat of an existential crisis where it's like,
I'm an ambassador for the future where these things are coming to fruition.
However, you also acknowledge that the distortion of reality raises serious questions, you know, for society.
Yeah, it's a good question, maybe with a quick sidebar on sort of some insights I've had on myself recently. I'm definitely an optimist. Everybody tends to ask me how I'm doing on a
scale one to 10. And it's always a 10. And people are how could it be a 10? Like your
house is on fire. I'm like, but I'm a 10. You know, so I have this weird ability to not feel to some degree.
So, you know, don't ask my wife and children how having a non-empathetic husband goes. But on a scale of one to ten? Good question.
But I tend to think that like, you know, it is what it is, right?
This is evolution.
This is life.
I think, you know, Darwin will kick in there somewhere. But there are for sure going to be some tremendously negative consequences that come from this distortion of reality, and those that have the power to create the content, whether they're doing it consciously or subconsciously, have some consequences on their hands, right? We've seen this with the social networks. Without going into the details, you know, these social networks have had a profound impact on my personal family. And so I think we're just sort of at the beginning of seeing the consequences of all of this, but it's evolution, right? We will evolve as a result. But, you know, go read the internet in Russia right now. Go read the internet in Mexico right now. And they're going to have a very different version about the exact same current events.
So, yeah.
Yeah.
Well, always appreciate an optimist and someone,
I mean, in many ways,
you're accepting the reality of the inevitable,
you know, which I really appreciate.
Okay, let's talk about how all of that
is packaged into, you know, or what pieces are packaged into you starting FeatureBase.
You're a serial entrepreneur.
FeatureBase is your latest company.
Can you give us a quick overview of, you know, sort of FeatureBase, like what it is, what it's used for, and then circle back and tell us, like, why did you start it?
Was it in response to some of those,
you know, sort of fundamental beliefs? Yeah, I love that order. And so, so yeah,
at its core, FeatureBase is just a really fast analytical database. It's all in memory, and it was inspired by the feature extraction and engineering process. And so we figured out that
like, most data was originally stored in records.
And then people started storing it in columns
to be able to analyze it.
But every column they created was yet another copy.
And so we sort of moved into this,
like let's move and copy data in order to analyze it.
And when we had this penicillin moment,
which we'll get to in just a moment
and created FeatureBase,
we realized that if you stored data
at the value, at the feature,
you know, machines could process that information much more efficiently,
that it was a way of empathizing with the way machines wanted to process data
and not the way that humans process data.
And so FeatureBase is far more computationally effective and efficient
than doing it the way that the human construct has led us to do it
with traditional analytical formats. And so we've invested about $30 million into the technology.
As I like to say, it's kind of like the particle physics of data. I think our IO underneath the
hood probably belongs in the Guinness Book of World Records every single time we go after one of these bigger and bigger workloads.
And so taming that has taken a lot of effort.
And we're absolutely maniacally obsessed
sort of on getting the developer experience
so that people can use it.
I make the bad joke that it's like a flux capacitor.
So unless you have a DeLorean,
time travel is going to be pretty difficult.
But we're making huge strides right now
to make it adoptable and usable.
And further, I'm starting to make some moves
to perhaps build a service around it
that makes it a lot more than just a database.
Like infrastructure is really hard.
And we sell this amazing engine.
And if I went to you and said,
hey, you've got a car,
what if I could give you an engine
that went 10 times as fast with 10 times less fuel? would you like it? You'd probably tell me yes. But if the
next day I show up at Kostas's front door and I say, Hey, here's your engine. Good luck. Like
it's not going to, it's not going to go very well. Right. So, so I'm trying to think through like,
you know, how do we bolt on a steering wheel and some wheels and then further, like, you know, should it have a driver, right? Like, you know, I've got data in
Snowflake and I want to run this model, right? That would be really nice, right? So I'm currently
trying to go from like the unbelievably efficient analytical engine, you know, to what can that
power to actually deliver a full experience to the end user. So we're sort of in those throes now.
Yeah, absolutely. And you mentioned the penicillin moment.
You may have mentioned it, but can you reiterate it if you did?
Yeah. So I'll give you two penicillin moments. It was like penicillin squared. So the last company I had was called Umbel. Umbel was a CDP for sports media and entertainment companies. So we had about 10% of the world's sports teams as clients. And the origin of that company was a
little bit less commercial. And it was I've always been obsessed
with trying to democratize access of these things to
consumers. So when we invented the technology that powers
FeatureBase, we were trying to sort of think through like,
what would a digital genome look like? How could we represent a
human and all of their attributes, And those could be behavioral, medical,
otherwise, right? So this led to the format that's underneath the hood for FeatureBase:
it's features, right? It's like the presence of an attribute or a behavior, and underneath,
that gets stored as a sorted bitmap. So we were like, hey, let's noodle on that. And you could take,
you know, my genome and your genome and find the
pattern. And those aggregates could tell you how we would behave without knowing HO or Eric, right?
So it was pretty powerful for analyzing audiences and consumers and behaviors. Very quickly,
we realized that we really couldn't find a way to empower the consumer directly. There wasn't a
business model to say like, hey, let's go help you take back your data and we'll go help you monetize it. So we just leveled up a step to sports media and entertainment
companies who had these amazing fan bases. And we got quickly enamored with the idea that every
customer would bring like hundreds of millions of people into the platform, right? Like a TV network,
a sports league, like it was hundreds of millions of consumers every time. And we love that,
you know, because it worked really well inside of this data that we had. But before we fully got our
technology ironed out, we were trying to use everything out there that we could,
things like Elasticsearch. We were trying to force it to do these filters, aggregation,
sorts, rank sorts. And at that scale was just very difficult.
And so like our most important queries were going from like sub second to second to minutes. I think
at one point our 40-node Elasticsearch cluster was returning that query in about 20 seconds. So, you know, we were starting to cache things, and I was just obsessed with the idea, like, no, no caching. Everything has to be real time.
It was probably a stupid obsession at the time because the end user didn't care, but
I was obsessed with it.
And so that's where this idea of the digital genome and the ultimate format that ensued
came in.
We had two engineers that had been doing quantitative stock market trading their entire career.
And they just said, hey, H.O., we've got this wild idea. Every time we prepared data for the models that we use to trade
stocks, we'd essentially go select all the features that we wanted and we'd store the
data's features, which were essentially decision-ready data. It was a one or a zero.
It was there, or it wasn't. And then the model would use these features as input.
What happens if we just convert all of the data that's getting created into features?
And what if instead of putting it in a database for sort of human-centric data,
what if we create a format specifically for features and store it in that native sort of
form? And so I gave them six months and they came back with a two-node cluster of the technology.
And within weeks, we decommissioned the 40-node Elasticsearch cluster, and the new cluster could do our segmentation and aggregation queries in single-digit milliseconds. And so that was the penicillin moment. It was like, wow, we just defied physics in a way that's so simple. But at the time, it could just do really high-cardinality workloads.
Everything was Boolean, so yes or no.
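What the two engineers describe — every attribute becoming a yes/no feature with a bitmap of record IDs behind it — can be sketched in a few lines. This is a toy illustration with invented feature names and data, not FeatureBase's actual code; a Python set stands in for the sorted bitmap:

```python
# Toy sketch of the "store the features, not the rows" idea: each
# boolean feature keeps a set of the record IDs that have it, so
# segmentation becomes set algebra instead of a row scan or a cache.

from collections import defaultdict

index = defaultdict(set)  # feature name -> record IDs having it

def ingest(record_id, features):
    """Record the presence of each feature for this record."""
    for f in features:
        index[f].add(record_id)

ingest(1, {"country:US", "bought:jersey", "watched:finals"})
ingest(2, {"country:MX", "bought:jersey"})
ingest(3, {"country:US", "watched:finals"})

# "US fans who watched the finals" is a single intersection
segment = index["country:US"] & index["watched:finals"]
print(sorted(segment))  # -> [1, 3]
```

Because each feature is just a bitmap of IDs, a high-cardinality segmentation query reduces to intersections and unions, which is why no pre-aggregation or caching layer is needed.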
So over time, I eventually convinced my board
to let me spin that out about four and a half,
five years ago into its own company.
Like I said, we put about $30 million into it.
And so we started teaching it things like integers.
How would you store integers in a binary representation?
So we found this
white paper called bit-sliced indexing. And so you could store a 64-bit integer in seven,
and it would still have the performance of the underlying bitmaps, right? So you could do range
queries on it and all of those kinds of things. And so eventually we taught it floating point,
and we used a bitmap compression technique called Roaring to be able to handle dense, mixed-density, ultra-high-cardinality data. We modified it and made it a 64-bit version of Roaring. I think we were the first to do that. And then we stuck it
in a B+ tree so it could behave more like a regular database. And fast forward to today,
we've got what mostly looks like an analytical database, but is very different
underneath the hood.
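The bit-sliced indexing idea H.O. mentions can be sketched roughly as follows: one bitmap per bit position of an integer column, with range predicates answered by bitwise operations over the slices instead of scanning row values. This is a simplified illustration using Python ints as bitsets; the 8-bit width and the data are invented, and it is not FeatureBase's implementation:

```python
# Minimal sketch of a bit-sliced index (BSI): each bit position of an
# integer column gets its own bitmap, and a range predicate is answered
# with bitwise AND/OR over those slices.

BITS = 8  # slice count for this toy column (a real index might use 64)

def build_slices(values):
    """Return one bitmap (a Python int used as a bitset) per bit position."""
    slices = [0] * BITS
    for row, v in enumerate(values):
        for b in range(BITS):
            if (v >> b) & 1:
                slices[b] |= 1 << row   # set this row's bit in slice b
    return slices

def rows_greater_than(slices, n_rows, c):
    """Row IDs whose value is > c, via a bitwise comparison scan."""
    universe = (1 << n_rows) - 1
    gt, eq = 0, universe
    for b in reversed(range(BITS)):     # walk slices MSB -> LSB
        if (c >> b) & 1:
            eq &= slices[b]             # must also have this bit set
        else:
            gt |= eq & slices[b]        # a higher bit set => greater
            eq &= universe & ~slices[b]
    return [r for r in range(n_rows) if (gt >> r) & 1]

ages = [23, 41, 17, 64, 30]
slices = build_slices(ages)
print(rows_greater_than(slices, len(ages), 29))  # -> [1, 3, 4]
```

Compression schemes like Roaring then make each of these per-bit bitmaps cheap to store and intersect at scale, which is what keeps range queries at bitmap speed.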
Amazing.
All right.
Well, that is actually the perfect time.
Kostas, I have tons more questions, but please, I know that there are so many technical questions
that cropped up based on HO's description.
So take it away.
Yeah, yeah.
So HO, let's start by talking a little bit
about feature stores, right?
Yeah.
Like the term feature store has been around
like for a while now.
It's probably, like, past the peak of the hype cycle,
let's say, right?
But when I was trying to understand
what a feature store is,
I was confused, to be honest. Yeah.
It wasn't, and I guess it still isn't, in most of its implementations, a
single data store, right?
It's pretty much a whole architecture that tries to support both online use cases, or
let's say real-time use cases, and also batch use
cases in terms of processing the data.
Because it makes sense, right?
You have your historical data, obviously it's going to be batch processing, right?
And then you also have the data that is coming and you want as soon as possible to create
features and feed them to the model.
So it never felt to me like we are talking about a database system.
Yes, they were using various components like from Snowflake to Hive to Databricks to everything.
But I think the most interesting part of the feature stores was Redis.
There was always a Redis there that was storing the features
and serving the features, right?
So at least that's my understanding
of the feature store.
So what's
your experience with feature stores,
and also how do you compare them
to what FeatureBase is, right?
Yeah.
And so I think, you know, this is a lesson for a lot
of technologists and entrepreneurs. I'm not going to lie. When we invented this,
it solved our problems so well at Umbel that we didn't have to do a whole lot more to it
to really solve the high-cardinality segmentation use case that we built it for. But I've always
had this blind faith that it can solve a whole lot
more than just that. And so we've been a bit of a proverbial solution chasing a problem for a while.
And that's always a hard place to be. And it takes a lot of blind faith and it takes a lot of optimism
and grit and all of those things that make us crazy as founders. So we've gone through a journey
of like, what are we? And trying to meet
the market where it is, trying to meet the chasms where they are. And so as we were exploring a
category change about four years ago, three, four years ago, we were looking at sort of the
underlying process of turning data into features and what we were doing underneath the hood,
right? There's one hot encoding, there's, you know, a variety of things that you're doing.
And so, you know, can you call it a one hot database?
You know, what do you call it?
And at the end of the day, we were storing features.
So we're like, hey, it's a feature store, like a data store.
But instead of data, it's features, right?
So we meant like a storage system for features.
We didn't mean like a model lifecycle management, right?
That does versioning and lineage and all the other stuff that the modern sort of quote
unquote feature stores do.
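For readers who haven't seen it, the one-hot encoding mentioned above turns a categorical value into a vector of zeros and ones; stored column-wise, those flags become per-category bitmaps, which is presumably what a "one-hot database" would keep natively. A minimal sketch, with a made-up color vocabulary:

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as one 0/1 flag per category."""
    return [1 if v == value else 0 for v in vocabulary]

vocabulary = ["red", "green", "blue"]
print(one_hot("green", vocabulary))  # [0, 1, 0]

# Column-wise view: each category's flags across all rows form a bitmap
# (here a set of row IDs), the representation a bitmap index keeps natively.
rows = ["red", "blue", "red", "green"]
bitmaps = {v: {i for i, c in enumerate(rows) if c == v} for v in vocabulary}
print(bitmaps["red"])  # {0, 2}
```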
And so we'd been working on this launch and we relaunched as a feature store and literally
within weeks of sort of changing our category, then the Michelangelo project sort of spun
out and you had Feast and Tecton and, you know, they redefined it ultimately, and not even
redefined, because we hadn't quite defined it yet, except for ourselves.
They really defined ultimately what feature stores were going to become.
And their versions of feature stores,
which I think still align with the current definition,
are more of, in my mind, a model lifecycle management system.
They're not a storage system for features.
They're really helping you manage the creation and management of features, offline and online
features.
And most of them have at least three databases underneath the hood, right?
So you've got a variety of databases that are coming together to solve that problem,
which is a very different problem than the one we set out to solve.
Our problem is that data at scale is very difficult
and that you have to copy and move it and that everything is batch, right?
Like, and yeah, if I go process my features in batch and stick them in Redis,
yeah, I can serve them really fast.
But what if I could just compute those features on the fly?
What if I didn't have to pre-process those features?
What if my transformations, aggregations, and joins were happening in real time?
What if those were in the model instead of in my pipelines and in my batch jobs?
It would be so much easier to track lineage and versioning.
So that was what we were trying to solve with our feature store, but it became a difficult
sales process because our top of funnel was full.
Everybody was interested in feature stores and we'd show up and we'd be like, well, we
have a feature storage system. And they're like, well, we want to put this model in production. How are
you going to help us? And we're like, we can't. So sadly, we had to sort of pivot out of it.
I do think at some point it's going to get redefined again. I feel like the category
sort of slowed in interest, but I think features are an unbelievably important part of the future.
I mean, it's the way machines think, not the way humans think. Like, we want filing cabinets, let's keep our filing
cabinets for the humans. But like machines love features, models love features, you know, CPUs,
GPUs, they love features. So I do think features and a feature first future is going to dominate,
you know, the way that we scale data. Yeah, makes total sense. And how would you compare FeatureBase to vector databases?
What's the difference?
It's a really good question.
So we have floating point support in FeatureBase now,
but we don't have native floating point.
And by that, I mean the same technique that we use for integers,
we're now applying to floating point as well; we're in development on it.
So being able to store a 64-bit float in, we'll see exactly where it ends up, but let's
call it 10 bits. So we're pretty excited about that, because our core engine should be able to
serve full feature vectors at a fraction of the cost and at much more scale than the ones that are currently out on the market.
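The integer technique being extended here isn't spelled out in the conversation, but bitmap engines commonly use bit-sliced indexing, one bitmap per bit position of the value, so a column of n-bit integers needs only n bitmaps. The toy version below is offered only as a plausible illustration of that idea, not as FeatureBase's actual encoding:

```python
def bit_slices(values, bits):
    """Build one bitmap (a Python int) per bit position: bit `row` is set in
    slice j exactly when values[row] has bit j set."""
    slices = [0] * bits
    for row, v in enumerate(values):
        for j in range(bits):
            if (v >> j) & 1:
                slices[j] |= 1 << row
    return slices

def reconstruct(slices, row):
    """Rebuild one row's integer value from the slices."""
    return sum(((s >> row) & 1) << j for j, s in enumerate(slices))

ages = [23, 41, 7, 30]            # a 6-bit column needs only 6 bitmaps
slices = bit_slices(ages, bits=6)
print([reconstruct(slices, r) for r in range(len(ages))])  # [23, 41, 7, 30]
```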
I also believe that so much of AI today is batch. So much of AI is based on records, training a model, and developing a score, but we avoid the analytical queries like looking at an index, looking at a population
when we're outputting those scores
because the queries are really expensive.
So I think there's an entirely new paradigm
when an analytical database can serve
both a last mile transactional workload
and the core analytical workload
and do the feature vectors all at the same time.
And so we have all of that in preview internally,
but we're very cognizant that being able to store
a full feature vector efficiently is a pretty killer feature.
And so we're very quickly in development on it right now.
That's super cool.
So, okay,
it's an important difference,
but the main difference is in the data types, right?
What kind of data one system or the other can handle?
So with the vector systems,
you have float representations primarily, right?
While right now with FeatureBase,
you are working primarily with integers.
And under the hood,
what you have there is a bitmap,
which is a series of zeros and ones, right?
Exactly.
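To make the segmentation idea concrete: if each attribute value is a bitmap over record IDs, an audience query reduces to bitwise AND/OR. Python's arbitrary-precision integers stand in here for the compressed bitmaps (e.g. roaring bitmaps) a real engine would use, and the attributes are invented:

```python
# Segmentation over bitmaps: each attribute value is a bitmap over record IDs,
# and an audience query is just bitwise AND/OR over those bitmaps.

def bitmap(record_ids):
    """Build an integer bitmap with bit i set for each record ID i."""
    b = 0
    for i in record_ids:
        b |= 1 << i
    return b

def members(b):
    """Recover the set of record IDs from a bitmap."""
    return {i for i in range(b.bit_length()) if (b >> i) & 1}

# Records 0..7; two attributes, each stored as a bitmap.
clicked_ad = bitmap([0, 2, 3, 5, 7])
in_region_us = bitmap([1, 2, 3, 6, 7])

# "Users who clicked the ad AND are in the US" is a single bitwise AND.
segment = clicked_ad & in_region_us
print(members(segment))  # {2, 3, 7}
```

The appeal is that the AND is one machine-word-parallel operation per 64 records, which is why this representation handles very high cardinality so cheaply.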
So what kind of workloads can someone run today with FeatureBase?
What's, let's say, the best scenario for someone to go and try FeatureBase and have a whoa moment by using it? Yeah, I mean, I think it's for good or for bad,
and I'm happy being transparent with all of my flaws
and all of our challenges.
Like for good or for bad,
FeatureBase has been a database of last resort
for very large workloads, right?
So when thousands of servers
have not been able to solve the job,
our customers have been willing to invest
the months that it takes
to wrap their head around the data model. You've experienced a little bit of this. It is a distributed system.
We've got high availability features. You can do replication factors. You know, all of those
things are pretty important for the type of workloads that we serve. But for the most part,
we're serving very high ingest workloads that need rapid segmentation and filtering on that data. So that is really
the bread and butter for FeatureBase. Now, we've very quickly been able to wrap a lot of other
workloads around it, but until it's easier to adopt, which it's becoming, those have been the
workloads that we're serving. And I'll give you one example, a company called Tremor Video.
When we first started working with them, they had about a thousand-node Hadoop and Druid cluster that they served from. They were storing somewhere around a million events
in that cluster. And then they would run predictions on the consumers and the devices
that were feeding data into this cluster. And it would take them about 24 hours to generate
those predictions. So we came in and we were able to reduce the thousand
servers to 11, and we could do those same predictions in about a third of a second.
So a couple of things that are important to mention here is it was a thousand to 11 servers,
so saving millions a year in compute. Now, don't believe everything H.O. says. The 11 servers were a
lot bigger than the thousand servers that they were using previously, but they did the
calculation and it's about a 70% reduction in cost. But I think what's more important is now
those workloads, let's just call it a second instead of a third of a second. That query was
happening in a second. So instead of taking a day, it would take a second. So you could run
84,000 queries in a day on that same compute cluster that they had.
So just absolutely changed the way they ran business.
They were now completely real-time in a space.
This is the advertising space.
And they've now scaled that up to about a trillion events a day
and tracking about 20 billion devices globally.
And literally, there is just nothing else that can do that.
And I love those.
Those are great.
Those are like trophies you can put on the wall.
But like, that's not the everyday problem, right?
That's not the problem that the masses have.
And to build a really big company, we've got to find problems that more of the masses have.
So that's why we've been maniacally obsessed on, you know, developer experience, which
we realize is the key to that mass adoption.
Yeah, yeah, 100%.
I have a couple of questions there.
So from what I understand,
let's say a CDP scenario is pretty much ideal, right?
For marketers, let's say,
who want to be able to segment
and create audiences and all the standard things
that, let's say, someone is doing with a
CDP, you can do that at scale and with extreme low latency by using FeatureBase, right?
Exactly, exactly, exactly.
And we typically break it up into three buckets, right?
Like consumer experience, which includes personalization, segmentation, recommendation, all the things
that you just talked about that are natural to a CDP. Anomaly detection is highly faceted as well, right? So it's something that
has to happen really quickly and the feature stores and the approaches today pre-process
the data, right? If you're a credit card processor, you have to decide if it's fraudulent
or not in like 50 milliseconds. And they can do that, but you know why? Because the fraud vector
they're using to make that decision was pre-computed. And it was probably being served out of Redis. But it
might have been pre-computed 24 hours ahead. We see these companies pre-processing in days.
So that's not okay. We've got to process that in that moment based on the totality of all of the
data. So there's another really great opportunity for this real-time workload.
And then lastly, I'll say a lot of the stuff happening
in AI is really interesting,
especially around computer vision.
Things like labels, once they get tokenized
and transformed out of their sort of raw formats,
end up being highly categorical, right?
They look a lot like consumer behaviors
and consumer insights, right?
So at the end of the day,
there's really unstructured search,
which gets turned into structure. And then there's structured search, which is faceted search,
right? So at the end of the day, it kind of all leads to the same place. So, you know,
I am optimistic that this is going to serve a variety of important workloads as we keep
innovating. 100%. Yeah, I totally agree with that. Yeah, it's
super, super interesting.
Let's go and
talk a little bit more about the
developer experience now.
You've been building
the product for a while now. You have
customers. You've seen
what it takes to take
a new piece of technology
and try to adopt it. It's not easy,
right? And it seems
that more and more people start to believe
that it's not just the
user experience, it's also the developer
experience, which is pretty important.
Making sure that you can help
developers succeed in whatever they do
is an important
aspect of succeeding or not
with bringing a product to the market.
So,
walk us through a few things from this process,
what you have experienced.
Yeah.
I mean,
I think being in love with your own technology is a huge problem,
you know,
and empathizing with the end user is all that matters.
Right.
But what others think of you,
your product, and your brand is what really matters. Obviously, that's like 101.
I think it's important to explain a little bit about our journey. So we definitely were
originally an open source project under the name Pilosa. So that was the original name of
the project. And by all measures, we were wildly successful. Investors sort of flocked to us and
saw all the stars going up. And as soon as we took on investor money, the investors said,
well, this is so important that you've got to turn your sales process into an enterprise sales
process. Huge mistake, number one, because if you look at the curve of innovation and the chasm
that we all have to cross to get on the other side,
like analytics was in a prime spot at that point. Today, I would say analytics is way off on the
other side. So at the time, product market fit was great. And we decided to start selling this
from an enterprise perspective. We ended up going about as high in the organization as we could.
And these sales cycles were long. They were very large contracts, like half a million, million dollar a year contracts. And we would sell before people
would adopt it. So the developer experience seemingly didn't matter. And let's hang on
that word seemingly, because we'd go sign a contract and then the teams were introduced
to FeatureBase and they were like, hey, we just signed this contract, go figure out how to use it.
And so they kind of had no choice and it was a painful process, but we had an army of
customer success people and deployment people, and we would go help them get it implemented.
About a year ago, I just looked at my board and I said, this is crazy. We can go build a big
business, maybe a $100 million business, and I'm talking about revenue, but I don't want to build a $100 million business. I want to build a billion,
multi-billion dollar business. And there is no way we're going to get there.
We have to recaptivate the hearts and minds of the developers. And so a year ago,
we fired all of marketing, all of sales, and we went PLG. And at the same time,
we decided we were going to take a year to bring certain things to market. We're almost done with SQL.
I know you've been using the product a little bit over the last couple of months.
We have a lot more along those lines.
We pushed out a whole new iteration of documentation yesterday.
And so it's been a mad rush for the last year to remove five APIs.
We had two ingest APIs.
Everything's now gotten standardized around
SQL and SQL was difficult for us to get our minds around before because we're like,
there's a much better language for bitwise operations or a bitmap oriented format,
but like it didn't matter, right? That was great in our own heads. So we've had to have a strong
dose of reality over the last year as we've worked on this developer sort of adoption.
And I think we have another six to 12 months to go before we can say, hey, this is now adoptable.
And in the meantime, I'm working on plans to acquire a few companies that are going to
eliminate a lot of those challenges too, right? Like why should someone have to buy or install
yet another database to go run models on their data, like data sitting in Snowflake or Databricks or Redshift.
You should just be able to tell me where your data is and what model you want to run.
So a lot of what we're working on now on the roadmap sort of addresses that, making it
even easier.
So we've got cloud and cloud consumption out to market.
We've got SQL almost out to market.
One of my favorite areas is user-defined functions.
So being able to register functions in the database
and run them actually in the database,
as opposed to having to move and copy data to the models.
And then serverless is another big piece
that we're finishing right now,
you know, to bring costs and efficiency even further down.
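FeatureBase's actual UDF API isn't shown in the conversation, so the register-and-run pattern below is purely a hypothetical sketch of the idea being described: registering a Python function in the database and invoking it as data arrives, instead of shipping data out to a pipeline. All names here are invented:

```python
# Hypothetical sketch of "register a function in the database and run it
# there." The registry and function names are invented for illustration and
# do not reflect FeatureBase's real UDF interface.

registry = {}

def register_udf(name):
    """Decorator: store a Python function under a name, as the database might."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register_udf("score")
def score(row):
    # Toy "model" that runs next to the data instead of in an external pipeline.
    return row["txn_count"] * 2 + row["total_spend"] / 10

def run_on_arrival(name, row):
    """Invoke a registered UDF as a new row arrives (the run-in-engine step)."""
    return registry[name](row)

print(run_on_arrival("score", {"txn_count": 3, "total_spend": 40.0}))  # 10.0
```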
That's a pretty busy roadmap.
That sounds like fun. Sorry, go ahead.
No, I was going to say,
it is, and it's been a year's worth of work,
and we're really excited
to start to see these things
coming to fruition now,
but they can never come fast enough.
Yeah, yeah, 100%.
What was one of the most,
let's say, surprising learnings
that you went through in this transition of doing this
top-down kind of sales in the enterprise, leaving that behind, and trying
to go after developers instead?
What surprised you?
Yeah, I mean, I think everything is about market timing and product market fit.
And just because you have it at one point doesn't mean that you have it at the other.
Like we had it four years ago, but when we switched to enterprise, like, you know, we
learned that motion, but it was inefficient.
And then by the time we came back, like, you know, who really cares about very fast analytics,
right?
Like, I mean, it matters, but it's not the problem that's
on everybody's minds today, right? And, you know, everybody's trying to figure out the machine
learning, you know, pipeline and paradigm and, and further yet, now we have large language models.
Are they going to eat everything? Like they probably could, like we should be doing our
analytics and our machine learning, you know, in, in a singular way. And so, you know, I just keep
coming back to the idea that AI sucks,
right? Like for the average company and the average person, it's amazing on TV and in the
movies and, you know, with all this ChatGPT stuff, but like practical AI is very difficult
and very distant. And so I definitely, you know, I'm going to continue to move as fast and hard as
I can to make that experience
really easy.
Like I said earlier, we're like an engine.
I'm going to buy some wheels and a steering wheel, and I am going to offer an Uber-like
service so that you can get from A to B.
And I might not be able to go serve these trillion a day workloads as efficiently, but
I'm going to serve the broad masses needs more efficiently.
So that's a long-winded way of saying I've worked hard to make this database more adoptable.
We have more work to do, but I don't think it's enough.
People don't want and need yet another database.
What the market needs is solutions to real tangible problems every day.
Yeah, yeah, 100%.
I totally agree with you.
And by the way, I have to say that it's very impressive
and I don't know what other word to use,
but talking with someone who has gone through the process
of building a company,
reached the point where you have product market fit,
you go after the enterprise
and decide to
leave that behind and
in a way, let's say, rebuild the
company from scratch,
that takes a huge
amount of courage to do.
I mean, that's super, super impressive.
I have to say
to share that with you,
because I know from my personal experience
and also by working in startups,
that doing that is extremely hard.
It's super, super hard.
Like, if you think it's a leap of faith
to start a company from zero to one,
taking a company from a hundred back to zero to go back to one,
that's wow. That says a lot about the person. So thank you for sharing that with us. Well, yeah, of course. And I think, you know,
at least in my case, I have pretty blind faith in features. Like I really do believe that if we're
going to have the machines doing our work for us, like we need to think like the machines,
not like humans, and we're still stuck thinking like the humans. So, you know,
I have this blind faith that features will power the future, right? And that everything's going to
be feature first. And so we haven't quite found the exact right approach to it, but we're going to,
or we're going to die trying. Yeah. Yeah. And it sounds like you are the right person to do that. Well, thanks.
I might have you call my board and tell them that.
But yeah, thank you.
All right.
So one last question from me,
then I'll give the microphone back to Eric.
So share with us something exciting about FeatureBase
that is coming up in the next couple of weeks or months
or something that we should keep in our mind
and make sure that we go and check when it comes out?
Yeah, I think the most exciting thing
that we're working on right now
is user-defined functions, UDFs.
I think we're not the only ones working on it.
SingleStore is doing a really amazing job.
We'll see how it ultimately manifests.
We've got all the WASM stuff happening as well.
But I do very much believe,
like I've got, you know,
in my passion for features,
you know, my personal obsessions underneath that
is to eliminate copying and moving of data, right?
I believe that models and data are going to collapse.
If we've learned nothing more
from these large language models, like the data and the model are becoming pretty much the
same thing. And so I think that the only way to really scale the future is to really think about
this as like the working memory of AI. You made this point, you asked this question earlier that
we didn't quite get to, but like, you know, we as humans don't go back and analyze everything we've ever done.
Rewatch the videos of everything that we've ever done.
Read transcripts of everything we've done.
Like you and I've had quite a few interactions.
I didn't go reread all of those.
We wouldn't have time to do that, but we have just enough knowledge about our prior interactions
that we can bring to what's happening in this moment and make decisions, right?
So I think the only way we're going to be able to scale the future is to think about it in that same way, right?
Like the working memory of AI, right?
Like being able to recall just what you need from the historical context with what's happening at this moment to be able to make decisions.
And to do that, we're going to have to bring models to the data, right?
We need to stop copying and moving data to the models.
And so one way or the other, it's going to happen.
I hope we're one of the pioneers of it
because models love eating features.
We're a feature storage system.
So like bringing models to the feature storage system
in my brain makes a lot of sense.
But one way or the other,
I'm excited to see that both in our own product and in the
world. I think it's going to make the world a lot more secure. I think it's going to shift
innovation to creating value, not to the data engineering that's involved with all of the
machine learning pipelines and the lineage and the versioning. And when we can start to network
the output of these models, I think that it's a wonderful future. So the very beginnings of this for us are simple.
It's just models written in Python. You know, with SQL, you go register it in the database and
you can run it as data arrives. You can run it on a cron job. You can run it as you do your query,
but it runs in the same compute engine. And further, it's going to run on the same serverless
compute engine that we've built, right? So you can isolate the model from the query
execution piece. And anyways, we're pretty excited about that. And we hope the world agrees that it's
going to be a good new capability. Yeah, yeah. And hopefully, we'll have the chance to talk
more about that when it is released. So I'm inviting you already. Well, thank you. Thank you.
Well, we have a principal engineer, Matt Jaffe, who's been leading those efforts. And he is
brilliant, far more technical, super articulate.
I think you would love hearing Matt Jaffe dive into how features are not only computationally far more effective and efficient than storing raw data, but also how, now that we've got those serverless capabilities, I say this all the time, I think we're going to cut the cost of analytical workloads by at least 99%. So whoever's making money right now on these workloads, they should tremble
because the world is about to shift quickly, right?
Like we're going to move to like computing faster and more regularly.
We've got to figure out how to make these models continuously trainable.
Like, that's where we need to start to shift our energy.
A hundred percent.
A hundred percent.
All right, Eric, the microphone is back to you.
Yes.
As always happens, we can keep going and keep going.
But we do have to respect our producer while he's gone.
Okay, H.O., this is more of a personal question
because you are highly optimistic.
You seem extremely high-functioning.
You understand technology on a deep level,
but you also think existentially, as evidenced by the earlier part of our conversation.
Is there anything on a personal level in terms of productivity or how do you operate in your
day-to-day? And is there anything that you could share with our listeners that's been
particularly helpful? Because it seems like you have a lot of ideas flowing through the old gray matter up there.
Yeah, it's a really great question.
And I wish I had a spectacular answer for you.
So I'm going to try to get a good answer.
I have a general on my team.
His name is Kord Campbell.
He was one of the first employees at Splunk.
And then he started a company called Loggly, if you'll remember.
He's been in search for 20 years.
So as he sees all this like large language model stuff, he's like, oh, yeah, that's just
like the next evolution of search.
I'm like, yeah, but it's worth like 30 billion now.
So it's a little more than just the next evolution of search.
But Kord's brilliant.
And Kord's helping us, not monetize, but think about how to democratize these technologies
more so that we can use them every
day as a co-pilot. And so as we work on this next iteration that I was telling you about, like how
do we create sort of the Uber that goes from data to model, we want consumership at the forefront of
it, right? Like this isn't about a company, it's about individuals. And maybe they belong to a
company, maybe they belong to many companies.
But we want the individual to have a free tier where they can index their email.
They can index their text.
They can index all of their files.
They can index all of these things.
And he does this every day.
So he's got a technology called Mita that he's constantly indexing everything into.
So if you were having a conversation right now, all the things you were saying, he'd be feeding to Mita. It crawls URLs, it eats PDFs. And so as he's working, he's asking
it questions, but the biggest piece of it is he gives it feedback. So if Mita comes back with a
fact or some opinion that's not right, he just gives it that feedback loop. And so I think
prompt engineering, prompt feedback, and being able to apply it to our daily
life is going to be really critical. So Kord is doing what you asked me about, what I should be doing.
And I'm hoping Kord is going to help us productize this so that we can all, including me,
be doing this every day, right? It's just, we would be so much more productive if we just
had that augmentation. I love it. I love it. Well, H.O., this has been
an absolutely fascinating conversation. Thank you so much for giving us some time,
and we would love to have you back because we only scratched the surface.
Well, thank you all so much. I love what you do.
Wow, Costas, what an episode with H.O. Maycotte. I mean, his story was amazing, but FeatureBase seems like quite a technology.
I think my biggest takeaway was actually his optimism.
And I thought it was interesting.
He had a lot of
exposure to technology, and that influenced his view of what was possible.
And he's really carried that through.
And you heard multiple times throughout the episode, he was so insistent that we shouldn't
cache anything.
He just has this persistence about we shouldn't have to face these limitations.
I think feature-based is a really interesting
manifestation of those characteristics of him
because he really has overcome some amazing things
with a pretty wild piece of technology.
Yeah, and, okay, I have to say something here,
which I found amazingly fascinating.
It has to do with the person, the
human being, H.O. It might sound like he's a very stubborn person, right? And that's
needed to go and get something that hasn't been created before to the point
where it is adopted and people use it.
But at the same time, he's, I don't know, probably the only person that has
demonstrated an extreme level of flexibility. And what I mean by that is the story of how
they started the company: they went to the enterprise, they had product market fit, and
then they decided that we want to build something even bigger, and that required
pretty much going back to zero and starting again. That's wow. From a founder perspective, being
able to do that and take this amount of risk requires, of course, being very stubborn with your vision, but obviously also a lot of
flexibility at the same time. And I think this whole episode and this whole conversation
is a testament to how important the vision and the belief of the
humans behind the technologies are for the success of the technology. Of course, we talked
a lot about technical things,
but this, I think, comes
next. It's more important
to understand
these qualities and how important
they are, and then
documentation is out there.
We can just go and read it.
So yeah, that's what I'm keeping from this episode
and some of the reasons that I would encourage everyone
to go and listen again.
Absolutely.
Yeah.
We also talked about the future.
We did.
Which was pretty wild.
And he has some pretty exciting predictions
about a super evolution that's coming upon us quickly.
Yeah, we also talked about biology
and he's an ambassador of the future, right?
He's an ambassador of the future.
So yes, definitely check it out
if you're interested at all in the next super evolution,
bitmap features and super fast database technology
and just generally a really optimistic
and engaging, brilliant person.
We will catch you on the next one.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com. Thank you.