Software Huddle - Operational Data Warehouse with Nikhil Benesch
Episode Date: April 30, 2024
Today's episode is with Nikhil Benesch, who's the co-founder and CTO at Materialize, an operational data warehouse. Materialize gets you the best of both worlds, combining the capabilities of your data warehouse with the immediacy of streaming. This fusion allows businesses to operate with data in real time. We discussed the data infrastructure side of it, how they built it, how they think about billing, how they think about cloud primitives, and what they wish they had.
Transcript
but data warehouses have traditionally been something that you run overnight and it spits
out a report for you to look at in the morning. Maybe it's part of your monthly business review.
It's not something that's on the critical path for your day-to-day business operations,
but that's where Materialize is different, because it happens in real time, because the core of it is
the streaming engine. You can put Materialize into your core business day-to-day operations,
pull all that data in real time, join it together,
do the analysis live, and then take action on that data as part of your day-to-day business operations.
Which person influenced you the most in your career?
CTO of Cockroach, Peter Mattis, was my boss for a little bit.
And he is an absolute machine when it comes to writing code and debugging problems.
If you could master one skill that you don't have right now, what would it be?
Hey, folks, Alex here.
Today's episode is with Nikhil Benesch, who's the co-founder and CTO at Materialize.
They're an operational data warehouse.
Really interesting stuff they're doing where if you have these expensive queries on your
operational data, whether that's Postgres, MySQL, Kafka, things like that, they'll ingest that in and sort of pre-calculate that for you.
So if you've got these big joins, big aggregations, big computationally heavy things, materialize all that for you using SQL.
You can query it using basically Postgres compatible interface and show that to your end users in a way that's not doing these big expensive queries on your databases.
So lots of interesting stuff there.
I love getting in the data infrastructure stuff of it, how they built it, how they think about billing,
how they think about cloud primitives and what they wish they had.
I thought Nikhil was very interesting on all these points.
As always, if you have any questions, comments, if you have any guests you want on the show,
feel free to reach out to me or to Sean.
And with that, let's get to the show. Nikhil, welcome to the show.
Thanks very much for having me. Excited to be here.
Absolutely. So you are the co-founder and CTO at Materialize, which is a streaming database.
I want to get all into that sort of stuff. But before we start,
maybe just tell us a little bit about you and your background.
Yeah. So I first got exposed to the streaming data flow world when I was in college doing a little computer science research as an undergraduate. I worked on a streaming data flow system called Noria, a joint project between MIT and Harvard. And one of the people that kept coming up in that work was this guy named Frank McSherry, who had done a bunch of foundational data flow work on a streaming dataflow system called Naiad,
which came out of Microsoft Research. This was back in, I want to say, 2012,
when that work happened. And I was looking into streaming Dataflow stuff as part of Noria in 2016
or so. And Frank had done some really seriously cool stuff, demonstrated crazy performance
with the Dataflow system he had built. But then I set
that aside when I graduated and went to work for a company called Cockroach Labs, building the
open source CockroachDB, based on Google Spanner, horizontally scalable SQL. And I ended up at
Cockroach because I had spent every software internship and job up to that point, poorly
re-implementing joins inside of applications,
because there was always some DBA or infrastructure team that was terrified of expensive queries being
run on the SQL database. And I was really excited about the prospect of building a SQL database into
the world that could scale joins where you didn't need to worry about that kind of performance issue
because you could just throw more computers at your database and have it scale effortlessly.
And while I was at Cockroach, I met one of the other co-founders of Materialize, Arjun Narayan.
And he actually encountered Frank McSherry while he was doing a PhD on a totally different subject, the thing Frank did before streaming Dataflow systems. And Arjun had become convinced as a result of Frank's later work, because he started following Frank after that initial exposure, he'd become convinced that incremental view maintenance via Dataflow systems was the answer to pretty much every problem in software engineering. And of course, that's not quite true. But there is something essential that
that captures that so many problems are the result of your database has to go do way more work than
it needs to do. Every time a query shows up, because probably it just computed the answer to
that query a few milliseconds ago, if it's powering the homepage of your application,
why is it pretending like
it's never seen that query before and computing everything from scratch? So that seed of an idea
really took hold for me. And when Arjun told me that he and Frank were considering starting a
company based on commercializing that tech that Frank had built to bring that idea into the world,
I just, I knew I had to join. So that's how I ended up here at Materialize.
That was five years ago now.
So it's been a fun ride.
Okay, cool.
And I'm glad you mentioned Noria
because I was going to ask how,
like I read a little bit about timely data flow
and differential data flow.
And I was wondering how Noria fit into that.
So that'll be great.
I want to get into that.
I also think it's interesting
that your career is basically just like
taking these cool research papers
and then implementing them as, like, you know, companies, both at Cockroach, with Spanner, and now, uh, you know, Noria
and dataflow at Materialize. So that's pretty cool. For sure. Maybe, I guess, just to get a start, like,
what is Materialize? I see streaming database. Like, what do we mean by streaming database? So,
funnily enough, we've actually tried to move away from the term streaming
database. And the reason we've moved away from it is because it describes how Materialize works
rather than what Materialize is. So the term we're using now to describe what Materialize is,
is operational data warehouse.
place where you bring together the disparate sources of data in your business. So you might have five different application databases
for five different microservices, and then your sales team is on Salesforce, your marketing team
is on HubSpot, and seven other teams have seven other SaaS tools. And Materialize, like a data
warehouse, is where you can bring all those tools together. But data warehouses have traditionally
been something that you run overnight,
and it spits out a report for you to look at in the morning.
Maybe it's part of your monthly business review.
It's not something that's on the critical path
for your day-to-day business operations.
But that's where Materialize is different.
Because it happens in real time,
because the core of it is the streaming engine,
you can put Materialize into your core business day to day operations,
pull all that data in real time, join it together, do the analysis live, and then take action on that
data as part of your day-to-day business operations. Gotcha. So it's, it's not Redshift or
Snowflake, which is gonna be like slower, sort of lower-concurrency type stuff, but it's going to be,
you know, higher concurrency, like a lot of transactions going against it, but still
combining a lot of things and maybe, you know, working against a good amount of data for that.
Do an aggregation that way. For sure. And we actually simplify our lives in that the transactions
occur outside of Materialize. You have upstream Postgres or MySQL databases that are doing all
the hard work of concurrency control to put those transactions in a serializable order,
enforce all the constraints. And then as soon as those transactions commit,
those systems forward the information to Materialize. So we read off the binlogs of
those databases, get all of those transactions in near real time, and then update all of the
materialized views to reflect the operations that have taken place almost immediately.
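For illustration, a minimal sketch of what that wiring looks like in Materialize's SQL. The host, database, publication, and table names here are placeholders rather than details from the episode, and the exact option syntax should be checked against Materialize's current docs:

    -- Tell Materialize how to reach the upstream Postgres database.
    CREATE SECRET pg_password AS '...';
    CREATE CONNECTION pg TO POSTGRES (
        HOST 'db.example.com',
        DATABASE 'app',
        USER 'materialize',
        PASSWORD SECRET pg_password
    );

    -- Ingest every table published under a logical-replication publication.
    CREATE SOURCE app_db
        FROM POSTGRES CONNECTION pg (PUBLICATION 'mz_source')
        FOR ALL TABLES;

    -- A view Materialize keeps up to date as the replication stream flows in
    -- (assumes an orders table exists upstream).
    CREATE MATERIALIZED VIEW order_totals AS
        SELECT user_id, count(*) AS order_count, sum(amount) AS total_spend
        FROM orders
        GROUP BY user_id;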
Gotcha. All right. And then can you give me just, like, a few examples,
you know, actual use cases, that, like, people, like, this is the problem they had, and this is how
they're doing it with Materialize? Yeah, so I got to go on stage with one of our first
customers, a company called Ramp. They provide credit card services to a bunch of startups now.
And they started using Materialize last year to do fraud detection. They had a fraud detection
workload that was running on a traditional data warehouse. And the reason they ended up in this
situation is because to do fraud detection well, they needed to bring together a bunch of different information about users and transactions. And then
they had a bunch of analysts who were able to very efficiently write SQL queries to look for
patterns in the data to find things that looked like anomalies, that looked like suspected fraud,
and produce a report with all of the suspected fraudulent transactions. But they could only run
that every 30 minutes.
And the logic got so complex over time that it started taking even longer than 30 minutes for
that report to run. And when you're doing fraud detection, 30 minutes is a long time. That's 30
minutes that somebody is out there swiping that credit card, spending money that's not theirs.
And they were able to port that logic to Materialize without a whole lot of work because it's the same SQL under the hood, give or take minor syntactic differences.
And on Materialize, that logic is up to date within seconds.
So that same report is now available within just moments of the transaction occurring.
Yep, gotcha.
Okay.
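As a purely hypothetical sketch of the shape of that kind of workload (the tables and rules below are invented for illustration, not Ramp's actual logic), the fraud report becomes a standing view instead of a 30-minute batch job:

    CREATE MATERIALIZED VIEW suspected_fraud AS
        SELECT t.id, t.user_id, t.merchant_id, t.amount
        FROM transactions t
        JOIN user_profiles u ON u.user_id = t.user_id
        JOIN merchant_risk m ON m.merchant_id = t.merchant_id
        -- Flag transactions far outside the user's normal spend
        -- or against a high-risk merchant.
        WHERE t.amount > 10 * u.avg_transaction_amount
           OR m.risk_score > 0.9;

The SELECT itself is ordinary analytical SQL; the difference is that its results are maintained continuously as new transactions arrive.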
And I see, like, that sounds really cool. And I see, like, in the docs,
there are both, like, queries, where I can run queries against Materialize directly, and there's also sinks,
which is, I can be producing stuff, it looks like, into... Yeah. Okay. Are most people using queries? Are
most people using sinks? Is that, like, you know, 50-50? Or, like, what does that sort of balance look
like? It's a pretty healthy blend. And what we'll see is that a number of our customers
actually use both depending on the use case.
And sinks are nice if you want a durable record
of everything that has changed.
And for auditability, people often want the whole log
so they can look back 30 days and say,
what did Materialize say was wrong with this transaction
at this point in time?
And if you're willing to pay the operational overhead of running Kafka and storing all of that data,
you get some nice features out of sinks.
But querying Materialize directly is really nice when you just want to wire up your application directly to Materialize
and you don't want the overhead of running Kafka or another system.
Yeah, gotcha. Okay, sounds good.
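A rough sketch of the two consumption patterns, with placeholder names (the Kafka connection options are abbreviated here; real deployments also need broker security settings):

    -- Option 1: query Materialize directly over its Postgres-compatible interface.
    SELECT * FROM suspected_fraud WHERE user_id = 42;

    -- Option 2: emit every change to the view into Kafka for a durable audit log.
    CREATE CONNECTION kafka_conn TO KAFKA (BROKER 'broker:9092');
    CREATE SINK fraud_sink
        FROM suspected_fraud
        INTO KAFKA CONNECTION kafka_conn (TOPIC 'suspected-fraud')
        FORMAT JSON ENVELOPE DEBEZIUM;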
And then, in terms of, like, the types of queries and things, like, are these usually, like,
big aggregation-type queries, you know, where it's looking at a bunch of rows and doing counts and sums
or things like that? Or is it even just, like, you know, just filtered stuff, but, like, just, you know,
straightforward filters and joins, but materialized, um, rather than doing all the compute at query time?
It's funny. This is one of the hardest things to explain about Materialize. Are the queries that
you run on Materialize analytical or are they transactional? And for me, they err on the side
of analytical with an important caveat. And I say they're analytical because they can be multi-way joins, like 15
different relations, extremely wide tables, what you think of as traditionally analytical queries,
and then you slap them with an aggregation at the end, maybe some heavy filters,
and you get a pretty nice tight result set that had to look at tens or hundreds of gigabytes of
data. But where Materialize is different from standard analytical
tools is that it's not great for exploratory analysis. If you are asking a query for the
first time, Materialize is going to be much slower at giving you the answer than a traditional data
warehouse would be. They're hyper-optimized for grinding over terabytes of data as fast as
possible. Materialize isn't optimized for that.
It wants you to declare your intent and say, hey, this one analytical query,
super important that I always have this one up to date.
And then you don't mind that maybe it takes 30 minutes for Materialize to get
to the first answer for that query.
Because the point is, once it's gotten the first answer,
it's going to live update with every future answer every time the data changes from that point forward.
Gotcha.
Okay.
And so I go to materialize and I create a materialized view and it's going to be just
sort of building that in the background for a little while.
But then once I have that, you know, looking up an individual row is just like a individual
row lookup in a table.
Exactly.
And if you build an index on that view, Materialize will keep the
results of that view in memory. And then a point lookup is basically just a HashMap lookup. So it's
super fast, like tens of milliseconds. And most of that is the latency of your request getting to the
AWS data center and back. Yeah. Okay, cool. So what's happening, like, underneath the hood there?
You mentioned, like, all these different data sources that are coming in. Like, what's going on in Materialize then?
Yeah, so the first piece is the sources: Postgres, MySQL. We have a webhook source now, which is a nice Swiss army knife for disparate upstream
sources, and then Kafka sources. So the storage layer takes the data from upstream systems and
writes it to Materialize's custom persistence layer, which is a layer on top of S3. So it's
super cheap, slow, but super cheap. And that at that point, it's just like the raw data,
like not materialized into view. It's just the raw data. Okay.
Okay.
Exactly.
Untransformed, written down in the order in which it came.
And we have support for upsert semantics so that if you have a key,
Materialize will eventually compact away the old values of that key so that you don't have runaway storage on these topics.
But also it's on S3.
So for a lot of users, they're never going to have
so much data in a given topic that the cost is going to be prohibitive just to write it all down.
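A sketch of a Kafka source with those upsert semantics, with the topic name and formats as placeholders:

    CREATE SOURCE user_profiles
        FROM KAFKA CONNECTION kafka_conn (TOPIC 'user-profiles')
        KEY FORMAT TEXT
        VALUE FORMAT JSON
        ENVELOPE UPSERT;
    -- With ENVELOPE UPSERT, older values for a given key are eventually
    -- compacted away, so storage for the topic doesn't grow without bound.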
Then on top of that, we have the compute layer. And that's really the heart of Materialize.
And at the core of that compute layer is Frank's research. That's timely and differential data
flow, which is the system that was derived from Naiad, that original paper I read way back when.
And that is a stream processor built in Rust that, before Materialize, existed as a standalone library that people could download and write programs against.
But to use Differential Dataflow, you had to write a Rust binary, compile it, link it, and figure out how to... It didn't have any bells and whistles around it to make it easy to use in
real production applications, but rock solid, super performant, incredible computer science
research inside of that. And around that, we have built a SQL engine. So you show up at Materialize with a SQL query, we transform that
into a data flow plan that gets rendered into a differential data flow program that's then
executed with this Rust library at its core. And these data flows are very simple conceptually,
you read the data out of S3, apply the data flow, and then spit the processed data out the other end.
And the other end is potentially a materialized view if you just want to write it back to S3.
An indexed view if you're interested in keeping that data in memory so you can efficiently do point lookups into it.
Or you can chain a sink off of a materialized view and write what's in S3 inside of Materialize's world
back out to Kafka or another external system. Gotcha. Okay. Okay. So let me unpack that a bit.
So you said that, that Rust portion, that's, that's something that was created outside
of Materialize and y'all are using, is that right? Okay. Interesting. And what's the timeline on that?
'Cause Materialize is five years old. I guess, like, how old is this? Like, was this around five years ago,
that, like, Rust binary thing? Yeah. So I don't know exactly when Frank started on
timely and differential, but they had existed for at least a year or two when I started working
with them back in 2016. And I want to say it was about 2014, 2015
that Frank started working on Timely and Differential.
So they were one of the early Rust projects.
If you go back on the Rust community,
you'll find Frank asking questions like,
hey, my program's not compiling anymore.
And they're like, oh yeah, sorry.
We made a breaking syntax change to Rust.
Okay. Wow, that's amazing.
Okay, so you do that.
And then is it every time I create either a materialized view or an index, that's what
will create a unique data flow program to be submitted to that binary?
Exactly.
Exactly.
Yeah, those are specifically the two things that create data flows in Materialize.
Okay, sounds good.
And then the outputs of those go to...
So if I have a materialized view that's not indexed, that's going to be on S3. So now if I'm doing queries against that,
we're looking at, like, maybe, like, a couple hundred milliseconds, probably, for a lookup for that.
Is that right? It's, it's often even worse than that, because we don't optimize our S3 format
for point lookups. So we have to load potentially way more data than people expect in order to answer those
sorts of queries. And we have plans for fixing this eventually. But for the moment, that's out
of spec for Materialize; that's a "you're holding it wrong" moment. And if you want those fast lookups,
we tell people: create an index. So it's all sitting there in memory. Yep. Yep. Okay, sounds good.
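Concretely, the "create an index" advice looks something like this, continuing the placeholder names from earlier:

    -- Keep the view's results in memory on the cluster, arranged by user_id.
    CREATE INDEX order_totals_by_user ON order_totals (user_id);

    -- This point lookup is now roughly a hash-map probe rather than an S3 scan.
    SELECT * FROM order_totals WHERE user_id = 42;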
So talk to me a little bit about sources, because you mentioned, you mentioned Kafka. And then also, there's, like, Postgres, MySQL, and all that. When I'm looking at the docs, there's, like, you know, a dedicated Postgres connector/exporter type thing. And then for other ones, there's, like, Debezium, which is basically, you know, getting into Kafka that way. Yeah, what's, like, what's the story? Like, if I'm using Postgres, should I be using the direct one? Should I be using Debezium and Kafka? It's twofold, really. One is folks who don't already have Kafka and Debezium set up found it
operationally challenging to have to go introduce Kafka and Debezium into their organization.
And one of the big value propositions of Materialize is that it makes streaming easy.
And we like to say that it makes streaming so easy that you don't even have to think about
the word streaming. And that means that Kafka is a hard thing to suggest to
people because Kafka is often exactly what these organizations that are not streaming first are
trying to avoid. But that said, we pair really well with Kafka. So for organizations that do
have Kafka and Debezium already, it's still very much a first-class integration.
It's just we don't want to force that choice on people if we can help it.
And then the other thing is consistency.
We want to be able to reflect upstream transaction boundaries inside of Materialize.
If you write to two different tables in one transaction, Postgres or MySQL will atomically
update the database so that you can never see the update to one table and not the update to the other table.
And Debezium does not export that transactional metadata in a way that's easy to consume.
It's an option you have to enable, and we couldn't find an efficient way to ingest that information.
Whereas if we read from the Postgres binlog directly, or if we read from the MySQL binlog directly, that transactional information is right there in the binlog format. And we can make sure to only show you atomic updates in
Materialize as well. Gotcha. Can you tell me a little bit about consistency in Materialize?
Like, you have a section on this, which I think is really interesting. I think it's, like, a hard,
or just, like, a weird problem to think about. If I have, you know, five different upstream
data sources,
what does consistency mean within Materialize, given those are coming in at different times, or who knows? Like, what does that sort of look like? How do you think about consistency?
That's a phenomenal question. And I have a somewhat unsatisfying answer. So the answer
that exists today is consistency largely starts at the moment the data arrives in Materialize. So if you have five
different upstream systems, we don't know how to relate a transaction in one system to a transaction
in another system. They could have happened in either order. We just don't know. Maybe there
are timestamps, but you can't trust the timestamps because there can be little skews and clocks.
So we don't try. Right now, we write down the time of ingestion for every transaction. And whatever order that happens to be in Materialize is the order that we declare to be correct.
But what makes this still a valuable property to provide is that we commit to that order inside of Materialize.
So if you have 10 downstream views, all querying different subsets of your five upstream sources, you're always going to see consistent data across
those views. And if you start a transaction in Materialize and then read from any number of those
views, you're going to see exactly the same data from all the upstreams reflected in all of those
views. So you at least get a consistent snapshot within Materialize.
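In practice that snapshot guarantee means a read-only transaction sees every view at the same logical time; a sketch, with placeholder view names:

    BEGIN;
    SELECT count(*) FROM orders_by_region;
    SELECT count(*) FROM orders_by_status;
    -- Both reads reflect exactly the same set of ingested upstream transactions.
    COMMIT;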
Something we're launching this quarter is a feature called real-time recency, which gives you consistency between Materialize and basically one upstream system at a time.
And the specific guarantee is, if you see a write succeed to Postgres and you have real-time recency enabled in Materialize, you're guaranteed that your next read to Materialize
will include that write that you saw.
So you get a read-your-writes consistency property.
Are you like enabling Materialize as a replica of Postgres
or like how is that sort of working?
Yeah, so we worked with Kyle Kingsbury from Jepsen on this.
He's the one who came up with the term real-time recency
and suggested this approach.
And it's a phenomenally simple approach once you hear it. What Materialize does in this case is
when a query arrives, it reaches out to the upstream system, so Postgres, Kafka, MySQL,
and asks for the latest offset. For Postgres, that offset is the LSN of the last transaction
to commit. Similarly for MySQL. For Kafka, it's just the latest offset
in the topic. And then in Materialize, we wait for that LSN to flow into Materialize, get committed
to our storage layer, and then flow through the compute layer. And that guarantees that, because
as a user you saw the write commit to Postgres and then you asked Materialize a question, we go ask
Postgres, hey, what's the latest transaction you've seen? And there's an ordering guarantee there.
We know that Postgres will tell us an LSN that includes the LSN of the transaction that just
committed. And now you're guaranteed that Materialize is going to show you data that
reflects that transaction. Gotcha. Okay. Wow, that's super sick. What'd you say the sort of
timeline on that was? We're going to be launching this in preview this quarter. But the big open question for us is what is the performance? Because this
introduces this synchronous request to Postgres on every query, and you're forced to wait out
the data flow propagation delay. And our current estimates are this is a second or two. And for a
lot of people, they're coming to Materialize for responsive reads, and
waiting a second or two for every query to succeed is not going to work for their use case.
Gotcha. Is that something that you like enable on a per query basis? Or will it be, hey, if you set
up this materialized view, it's also enabled with real time recency. And therefore, every query to
this materialized view has that enabled? We're thinking on a per-session basis.
So you connect to materialize,
you set a session variable saying,
I want to opt into real-time recency,
and then every query you do
gets these real-time recency guarantees.
Yeah, okay.
Okay, sounds good.
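Since the feature was still pre-launch at the time of recording, the exact syntax may have changed, but the session-level opt-in described here would look roughly like this (the variable name is illustrative):

    -- Opt this session into real-time recency: subsequent reads first wait for
    -- the latest upstream LSN/offset to propagate through Materialize.
    SET real_time_recency = true;
    SELECT * FROM order_totals WHERE user_id = 42;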
Tell me a little bit about S3.
I see so many data infrastructure stuff using S3.
For sure.
Just like, like what's, what's the sort of reasoning there of, of, you know, instead
of local disk and that sort of replication using S3 as, as sort of that foundational
layer.
Economics is the fundamental answer.
It is so cheap to store data on S3.
If you throw it on S3, you kind of don't have to think about it.
You can put a terabyte on S3, and for most organizations' budgets, that doesn't register at all.
And that is not true of local disks.
That's not true of EBS.
A terabyte of EBS is something that starts to register for people.
Now, Materialize recently learned, this is in preview right now, to spill data flow state to disk. Up until a couple weeks ago, Materialize had to
store all intermediate state and indexes for data flows in memory. And disks are expensive. Memory
is wildly expensive. So that was not economical for some use cases. But we've built our own memory allocator for Linux that enables paging out to
disk. So those data flows now actually do make use of the locally attached SSDs on the machines
that they're running on. And that brings the cost down and brings stability up because you don't hit
this wall when you run out of memory; you start spilling to disk instead. So we actually do make use of local disks now. It's just not
something we rely on for durability. The durability comes from S3, and the local disks are there to
provide performance. Gotcha. Okay. And this, this is probably a stupid question, because I don't
understand the internals, but you said, like, to enable that spilling to disk, you wrote your own memory allocator to do that.
Was that easier than just tweaking the Dataflow thing to write to disk itself?
Or I guess, how did you sort of approach that sort of problem?
It was.
The way that the Dataflow engine works, the way Timely and Differential have been built,
they're really smart about how
they write to memory. That's maybe the key thing that Frank and his co-researchers at Naiad did to
make Naiad and then Timely and Differential perform so well. It basically always accesses
memory sequentially and is very careful to avoid random accesses into memory because that clears the cache and those memory accesses become much more expensive.
And that's exactly the same trick that you want to play when you are reading from disk and writing to disk.
You want as much of that to be as sequential as possible. And by building this as a new memory allocator,
nothing about the data flow engine had to change,
and its access patterns were already nearly optimal
for accessing disk.
So that just ended up being the right place to slot this in.
The initial thought was actually to do something even simpler
and just enable swap.
Just tell Linux, feel free to swap out whatever you like. But for various operational reasons,
it ended up being easier to have a little more control. And the memory allocator gives us enough
control without having had to write a totally separate implementation of these core data flow
operators to be disk backed instead of memory back.
Gotcha. Okay. And then, okay, so let's say I sign up for materialize, I, you know, I get like a,
you know, a large size compute or something like that, submit a few materialized views with these
sources. Do I have like one box that's that like my own that's running this? Or is it multi tenant?
Is it split across a number of boxes? Like what does that sort of architecture look like?
Yeah, it's deliberately abstracted away from you
as a end user.
We want you to not think about this
to the extent possible.
So let me first describe the end user experience
and then I'll dig into how it actually works
on the backend.
But the way it's meant to feel as an end user
is that you get access to a Materialize region. And we're upfront about this being a specific data center of a
specific cloud provider. So we only run in AWS at the moment, but you can choose between something
on the West Coast, something on the East Coast and something in the EU. And once you're in that
region, it's meant to feel like you have this unlimited pool of Materialize compute and storage resources available to you. But you do have to explicitly indicate when you
want to bring a use case to Materialize. And we call that a cluster. So you'll type create cluster,
and then you have to pick a size for the cluster today. Right now it's t-shirt sizing, small,
medium, large, extra-large. So it mirrors EC2 instance sizes. The difference is, when you get to 2x-large and
3x large and 4x large, we will invisibly start scaling across multiple machines.
We don't tell you where that point is because we want the flexibility to dial around the
inflection point on our side without users noticing. Maybe we have a bunch of small
machines or maybe we have one huge machine. It shouldn't matter to you as a user. But this
cluster has one really important property, which is data flows running on it. They crash together,
but they also share state together. So their lifetimes are all bound together. If this
cluster runs out of memory, all the data flows that were being maintained by that cluster
evaporate.
But the reason you might want to put everything on one cluster instead of sharding it across clusters is if there's intermediate state that's usable by multiple data flows,
co-locating them all together on the same cluster lets them share that state.
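The cluster workflow described here is itself just SQL; a sketch with placeholder names, using the t-shirt sizes mentioned above:

    CREATE CLUSTER fraud (SIZE = 'large');

    -- Objects created in the cluster share its lifetime and its in-memory state.
    CREATE MATERIALIZED VIEW big_spenders
        IN CLUSTER fraud AS
        SELECT user_id, sum(amount) AS total
        FROM transactions
        GROUP BY user_id
        HAVING sum(amount) > 100000;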
Okay. And when you say intermediate state, that would be like actually me
creating a materialized view and then two or three other things reading from that material,
like also using that materialized view? Or what's that look like?
Yeah, so that's explicit intermediate state that you as the user have control over. And it can make
a lot of sense to build things like that. But there's also invisible intermediate state that
materialize will create for you automatically in order to make the data flow efficient.
And the simplest example of this
is a join. In order to efficiently maintain a join, you need indexes for both the left and the
right sides of that join. So when a new left record shows up, you need to see if it pairs
with anything on the right. And if that meant scanning over the entire right collection,
way too slow. But if you have an index there, that's a really efficient look up,
and it's an efficiently maintained join. So that's invisible to you as a user. But if you
had multiple data flows that were doing that same join, and you created those indexes on a cluster,
and then created multiple downstream indexes, all of them would be able to reuse those intermediate indexes in their computations.
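A sketch of that sharing, with placeholder tables: the indexes on the join keys are built once on the cluster, and dataflows in the same cluster that join on those keys can reuse them instead of building their own copies.

    -- Arrangements (indexes) on the join keys, built once on this cluster.
    CREATE INDEX orders_by_user ON orders (user_id);
    CREATE INDEX users_by_id ON users (id);

    -- When these views are indexed or materialized on the same cluster, their
    -- join dataflows can share the arrangements above.
    CREATE VIEW orders_enriched AS
        SELECT o.id, o.amount, u.email
        FROM orders o JOIN users u ON u.id = o.user_id;
    CREATE VIEW spend_by_email AS
        SELECT u.email, sum(o.amount) AS total
        FROM orders o JOIN users u ON u.id = o.user_id
        GROUP BY u.email;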
Yeah, it's interesting, because you want to abstract a lot of that away, but also you want to expose enough so that they can, like, optimize and understand that, like, you, instead of having two different clusters that both have that join and now are both
maintaining that intermediate state. I guess, like, how do you, how do you all think about
educating your users with that? Is that you know, do you do a lot of that with the docs and things
or do you just sit down with them and talk about that? Or like what does that sort of look like?
Just sharing enough of the internals that they can get efficient usage out of it. It's probably the biggest challenge in
Materialize's adoption right now is teaching people how to use this system because it is so
complex and there's such specific domain expertise that you need in order to understand why Materialize
is using memory, how you can get it to use less memory. And it's
very similar to a problem that the industry has had for decades with optimizing database queries.
You have to think about the optimizations differently, but at a meta level, the problem
is very much the same. You need a DBA who deeply understands the optimizer and can look at the
explain plan for a query and help you understand whether it's
making efficient use of indexes, whether there's a specific new index that you should build,
whether you maybe need to rewrite that query a little bit to help the optimizer along.
So we do what we can with docs, but there's a big hands-on component today. We have a great
field engineering team that helps onboard new customers, sit down with them, understand their use case. And sometimes if necessary, go line by line through a SQL query and talk about
what's expensive and how to rewrite that SQL query to be more efficient in Materialize.
Yeah. Interesting. And are there, are there things like explain plans for
these data flows or something like that to get a sense of like what's happening or like,
exactly. Yeah. Okay.
That's the big interface for understanding what the data flow looks like.
You can type explain, and there's four different variants.
There's the raw SQL plan, there's the optimized plan, and there's a low-level plan.
And users are not meant to understand all of these themselves.
We have documentation for the ones that are user
accessible, but then some of the other ones are there for when we need to really go deep with a
particular user on a particularly problematic query. We're also trying to build up visualizations
in our console. So when you sign up for Materialize, the first thing you see is this web
interface, and it'll drop you into a web-based SQL shell. So you can start playing around with
the SQL interface because that's still the primary way that people interact with Materialize.
But once you've gotten data flows up and running, the console becomes even more powerful because we can visualize that data flow for you.
And the graph is a much more natural, intuitive representation than the textual explain plan.
So we're investing in that as fast as we can,
because that really just brings the data flows
to life for people.
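For reference, the text side of this looks roughly like the following; the variant names follow Materialize's EXPLAIN documentation and may evolve:

    -- Progressively lower-level views of how one query becomes a dataflow.
    EXPLAIN RAW PLAN FOR
        SELECT user_id, count(*) FROM orders GROUP BY user_id;
    EXPLAIN OPTIMIZED PLAN FOR
        SELECT user_id, count(*) FROM orders GROUP BY user_id;
    EXPLAIN PHYSICAL PLAN FOR
        SELECT user_id, count(*) FROM orders GROUP BY user_id;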
Yep, very cool.
You mentioned, you know,
pulling in a lot of different sources
and you mentioned, you know,
not only your Postgres database,
but things like HubSpot or other SaaS tools.
Like how does that,
how do they get from HubSpot
or something like that as a source into Materialize?
You have two options today.
One is painful, which is you write a custom microservice that pulls the data out and puts it in Kafka.
This actually works fine if what is upstream of Materialize is already a custom microservice.
It's less palatable if you just want to pull Salesforce data into Materialize.
Why are you investing in
the tech to build that connector? But the interim solution we have right now that works quite well
is webhooks. For a lot of these upstream tools, they can emit webhooks. They have some native
"plug in a URL and we will send events to that URL" feature. And then in Materialize, you see those raw
events, and they're in a pretty raw form. SQL is actually a pretty good tool for manipulating those webhook events, so you just, you write some
cleaning materialized views, and then you get the, the events in a pretty usable format.
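A sketch of that pattern; the source name and JSON fields are invented for illustration, and depending on the version you may also need to name a cluster for the source to run in:

    -- Exposes an HTTPS endpoint that the upstream tool can POST events to.
    CREATE SOURCE segment_events
        FROM WEBHOOK BODY FORMAT JSON;

    -- A "cleaning" view that pulls the interesting fields out of the raw JSON body.
    CREATE VIEW page_views AS
        SELECT
            body ->> 'userId' AS user_id,
            body ->> 'event' AS event,
            (body ->> 'timestamp')::timestamp AS occurred_at
        FROM segment_events
        WHERE body ->> 'type' = 'track';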
But also launching this quarter, and we're really excited about it, is a Fivetran destination,
so we'll be able to plug into the universe of Fivetran connectors,
and they have pretty much everything under the sun,
and then you can natively have Materialize on the other end.
And if you're willing to pay for one of their higher tiers,
you can get the sync frequency down to a minute,
which feels pretty real-time,
but it does leave a little bit of the value of Materialize on the table
because we can get updates every second and be just fine.
And the reason Fivetran doesn't do this is because if you're updating traditional data
warehouses, a Snowflake or a Redshift, every minute, that's already starting to stress them
out and they don't have a ton of pressure to get that faster, but they are working on turning the
internals of Fivetran into a streaming system so that they can ultimately deliver on syncs that
are faster than every minute.
And we're really looking forward to that future.
Cool.
In terms of like most popular sources,
like do you see a lot of sort of SaaS things like that?
Or is it a lot of, you know, operational databases,
Postgres, MySQL, and Kafka stuff
that like is generated internally from a company?
Or like, how do you see that breakdown?
Yeah, so this is anecdotal,
but the rough sense is most people show up at Materialize
with a specific piece of data
that they want to bring into Materialize for one use case.
And that's usually something in a Postgres or MySQL database
or something in Kafka.
And there's a critical need to operate on
that data faster. And that's what pushes them to reach for a tool like Materialize. And once they
get used to Materialize, once they see how powerful Materialize is, they start dreaming about use
cases that they hadn't considered. And those use cases often look like, well, I already have this
critical data in Materialize, and I have some other data in my CRM that I, if I could just bring that into materialize,
that would feel really easy. It'd be a really quick way to build an application.
And that's when they start reaching for bringing their SaaS tools into Materialize.
Segment data comes up a lot for us because there's usually just a firehose of data at
every organization that's running through a CDP. And that's very easy to plug into Materialize now with the webhook
source. Okay. Okay. Very nice. In terms of like that webhook source, do you charge like per webhook
or is that just part of the compute that they're paying for as part of their cluster? Like what
does that look like? Yeah. So we bill for two things today. It'll be three things ultimately. One is the storage cost, just literally our S3 bill passed on to you.
I was doing the calculations, and it was like, each t-shirt size consumes a
certain number of credits per hour. And that reflects the underlying infrastructure cost
plus our margin, because that's where we provide people with value. And we trust that, with the webhook
sources, we cover our costs basically by charging for the storage.
And we want you to actually transform that data.
And that's where you actually pay for Materialize.
One day we'll charge for networking as well.
But right now, networking is included.
Like bandwidth across?
Egress bandwidth.
Yep.
The inbound is free because of AWS, and the outbound is quite expensive, as it is with all the major cloud providers. Yep. Interesting. And so is
that, is that outbound, that's, like, when people are actually querying their data and getting it back,
or using it, when you, when they're putting it into a sink? Or is it even the networking as part of your data flow type stuff?
I don't think we would ever charge
for the inter-AZ fees
that we accumulate on our side.
But that's not a commitment.
Don't hold me to that.
Yeah, sure.
And in terms of egress,
we have not quite decided
what makes the most sense.
The most extreme version is every byte transferred out of Materialize, whether in response to
a select query or whether as part of a Kafka sink or even as part of, say, a Kafka source
because there's a little metadata chatter that goes back even as a Kafka consumer.
Would we bill for all of that potentially?
Or would we take a more Snowflake-style approach, and it's
just the bulk data unload? So Kafka sinks, in our case, that we would bill for, and maybe we'd
abstract it away. It's not literally bytes over the wire, but how many messages sent over your
Kafka sink. But we've never answered these questions. And right now it's negligible enough that we're
just giving it to users for free and we'll figure it out at some point in the future.
Yeah, it's interesting just talking to, like, data infrastructure founders, and just how much, sort of, network fees come up.
Yeah, it's hard because like customers don't love paying for them, but it's like it's got to get paid for some way, you know.
But somebody has to pay for it, at least as long as the major cloud providers are charging for it.
So, yeah, exactly.
What do you, what do you think about network egress fees? Like, do you think it's mostly a racket? Do you think it's like, oh,
there's, you know, there's a real cost there, and we need to, like, incent people to economize on that?
Like, what, yeah, what are your thoughts there? I don't have an informed take. I am confused about
the situation where the tier two players are giving bandwidth away for free and the tier
one players are charging what seems like an exorbitant rate. And that's where my understanding
stops. I don't know enough about the internal dynamics on these things to understand what's
going on. I do think it's suspicious that all three major cloud providers charge approximately
the same amount for bandwidth, which some people
think is exorbitant, which makes me feel like there really are truly costs here. And it would be
extremely difficult for these cloud providers to stop charging for it. Because I want to believe
that GCP would start giving away bandwidth for free if they could afford to do it to start eating at Azure and AWS's business,
but they're not. So my sense is there's some real costs there, but I wouldn't bet money on it.
It could be something else. Yep. Yep. When you talk about tier two,
are you talking, like, Cloudflare and their, like, zero egress type stuff? Is that what you mean?
Yeah. Yeah. That's maybe an unfair framing. I'm just thinking in terms of market share. Yeah, exactly. One person who had worked at AWS explained it to me, saying, um,
the hard part is like, you know, there there's inbound and outbound and Cloudflare has like
such an overbuilt network in terms of inbound for like DDoS attacks that they basically just
have essentially free outbound. So then for for them they can sort of give it away for
free. But, um, I don't know, I don't know enough about this stuff either, so I just try and pick it up
from everyone. Yeah, I mean, that, that sounds plausible to me, for sure. But yeah, I'm not a
networking expert. So, yeah, we talked about S3 a little bit. What about S3 Express One Zone? Is that
something interesting and useful to you? Eh, not quite there yet? Like, what do you, what are you thinking about that? It made waves at Materialize when it
was announced. I think everyone took a pause that afternoon to try to understand what it was and
follow the discourse on, on Twitter. And, uh, we're friends with the, the WarpStream guys, Richie and
Ryan. So we were chatting with them about what it meant for, for WarpStream, because Materialize's
And if the chronology of these businesses had been different, there's a real chance that
Materialize would have been built on top of WarpStream instead of building our own persist
layer. And there's something really interesting there in terms of the latency that you get. And if we could move write-ahead
logs off of slow S3 to fast S3, the latency for Materialize would come way down. And the way this
will actually present for us, it doesn't bite people much right now with Materialize, because
you don't really notice the S3 latency because you build indexes on the stuff you care about.
And then it's all in memory. But real-time recency is what exposes the latency of writing to S3
because we don't want to show you any data that we have not committed to S3. And when the commit
to S3 is on the critical path, the way it is for a real-time recency query, you have to wait out
the data coming from Postgres to materialize, then the S3 commit, which is often the slowest
part, and then the data flow propagation.
So if we can minimize that real-time recency, instead of being a second and a half penalty,
is maybe a 100 millisecond, 200 millisecond penalty.
And now you're in business for a huge variety of use cases, and you don't have to give up
consistency, which is really cool.
The economics are challenging.
That's what I've heard. We have not run the numbers in detail ourselves, but what I've heard
from Richie and some of the other folks who have started evaluating it is it's expensive. It's a
privilege to use S3 Express One Zone. And I don't know if that's fundamental, if that's going to come down over time the way S3 costs came down over time.
And the other question I have is they called it S3 Express One Zone.
It is a mouthful.
And is that because S3 Express Multi-Zone is coming in the near future?
And that would save us some engineering effort because we wouldn't trust writing to just one AZ for our durability. We'd have to write to two AZs. And now we're writing to S3 twice and it's more to manage. We'd love to offload that problem to Amazon. So that's the thing we're waiting to see. Just get a little more information about how the economics work out, what the future plans are for S3 Express before we do too much of an investment in it. Okay, cool. Yeah, I'm excited to see how that changes things, especially I think that that multi-zone thing will be,
you know, where you just have to write once
and not worry about that.
Yeah, that's going to be interesting.
One last thing, just sort of technically and stuff.
So you all have Postgres wire compatibility
in terms of querying.
Is that a case of actually re-implementing Postgres
and all that?
Or are you sort of using the
Postgres as a query planner and engine at the front end and then putting it into your indexes
and S3 at the end? It is a full re-implementation of Postgres. And we adopted this approach,
this philosophical approach from Cockroach. So it's funny. Cockroach is a Go re-implementation of Postgres and Materialize is a Rust re-implementation of Postgres. And we actually did consider the question in the early days of, should we take Postgres and fork it? Because on the dark days, somebody comes with a bug in Materialize and there's some little edge case and you discover some behavior in Postgres.
You're like, oh my God, why is 'allballs' a valid timestamp in Postgres?
Where did that go?
And you go read the source and it's allegedly some military expression for midnight. And then
you go add that to Materialize. And when seven of those happen in a row, you start to wonder if you
made the wrong choice. But the flip side of that is Materialize really cares about computations
being deterministic. The correctness of the system falls apart if anything non-deterministic slips
into the data flow execution layer. And Postgres is riddled with non-determinism. That's not a slight,
it's very useful, but, like, the random function,
people use that all the time, and Materialize cannot support it, because we
can't do deterministic execution.
If you're randomly generating numbers in the middle of a, of a data flow.
So we'd be finding a totally
different class of bugs that may have been intractable if we'd forked Postgres. But we'll
never know for sure. We'll never know that counterfactual. But yeah, we did end up biting
off re-implementing all of Postgres' parsing and planning in Rust. Okay. Wow. Yeah, that's a big
Okay, let's talk a little bit about sort of just, like, business
aspects, pricing, billing, things like that, especially building a data infrastructure.
I guess like how you talked a little bit about your billing model. How did you sort of think
about that? And how much work went into thinking, how do we bill? How many factors do we have for
billing and just that whole process? So our CEO, Arjun, went deep on pricing and looked at a bunch of other cloud databases to understand their pricing models, talked to some of the people who had built out those pricing models in the early days.
And then, as far as I can tell, he went into a hut, had a think for a couple of weeks, and then came out and said, this is the answer. This is how we're going to bill for Materialize. And it's worked pretty well, with one asterisk, which
it's an interesting philosophical question, which is we bill cost plus. So we take our costs,
take a margin and then pass those along. And this creates a weird incentive for the engineering team
because every time we optimize a query better, that is money that
a customer is no longer paying us, because they can go size down that cluster, and now they're using
less Materialize. And what you have to believe is that making things more efficient means more
people will be able to use Materialize, you'll make it up in volume. And we do believe that. But
there have been some optimizations, like spilling to disk, where we've
realized, oh my goodness, we have just fundamentally changed the economics of Materialize because of
this pricing model. So it's not perfect. But what I do like about it is that it
forces users to use Materialize in a cost-efficient way.
And anytime you do value-based pricing,
there's always these edge cases where somebody shows up and it's like,
Oh, the thing I need this product for is dirt cheap.
And it ends up being wildly expensive for the infrastructure provider.
And then there's this game of cat and mouse where they have to put limits in place.
But maybe those limits disrupt other valid use cases.
And maybe you have hard conversations with people about like, hey, you're just not going to be a customer of ours because our pricing model doesn't support you.
So I think it's weird no matter what you do.
And when we've talked to people, everyone has this experience in one way or another. It's just
hard to get right. Yep. Yep. What's the relationship like with, you know, you said you're currently all
on AWS. Like, is there a close relationship with them in terms of like optimization and things like
that? Or is it like, you know, you're mostly on your own until you like, and just keep figuring
this out and that sort of thing as a provider on top of AWS?
It's close and getting closer.
As we expand our AWS footprint,
we're that much more meaningful to their direction.
And we've been chatting more and more
with their product teams about what EC2 instance types
would be interesting to us in the future,
especially two quarters ago, we were not as certain that types would be interesting to us in the future, especially two quarters ago,
we were not as certain that we would be able to roll out
spill-to-disk as quickly as we could.
And we were really memory hungry.
And by default, AWS really only has an eight-to-one
gigabytes-of-memory-to-CPU ratio.
Those are the memory optimized instances.
And they have these crazy instance types
that offer way more
memory than that, but they're really expensive. They're not part of their main EC2 offering.
And it's still interesting to consider if they were to roll out a even more memory optimized
instance with a 12 to one ratio or 16 to one ratio, would that be better for materialize?
So it's been really nice to be able to chat with
their product teams to understand just how far off market are we with our requests? Do you have
a bunch of other customers who are asking for instances that look like this? Redis is another
AWS customer that's worked closely with them on designing some instance types. And we're actually
moving over to one of the Redis Labs promoted instance types.
It's this I4I instance class.
And they have like 30 gigabytes of disk
for every gigabyte of memory,
which is five times more
than the other AWS instance types with disk.
And that's going to be really valuable for us
as we get better and better at spilling to disk. Nice. Do you expose, like say I choose a large cluster, do you expose sort of infrastructure
metrics to customers like CPU usage or memory pressure or anything like that? Or like what's
that sort of look like as a user? Those are exactly the two that we expose.
And it's your responsibility as a user to see how much CPU and memory your data flows are using.
And we do our best to present that to you,
and the console every day gets better at showing that to you and identifying,
hey, this data flow is expensive, and it's because this index inside of that data flow
is using 30% of the memory on this cluster.
We've thought about adding more data flow specific metrics, because CPU usage
for a while was just 100% all the time. And for us, we're making perfect use of the CPU, we're
doing as much work as possible. And for users, that was very stressful. So we thought about,
is there a different metric we should expose, which is like amount of useful work this data flow is doing each second.
And if that starts getting close to 100%, now you have a problem.
And what we ended up doing instead was changing how timely data flow worked to use less CPU.
So we made the CPU usage metric meaningful instead of introducing a different metric.
Gotcha. And if I'm sort of running out of, if I'm, you know, having CPU or memory issues,
what's going to happen?
Are my materialized views just going to lag?
And then do I like increase,
can I just increase a cluster in place
or what does that look like?
Alter cluster, set size equals medium, large,
whatever's bigger than what you're currently running on.
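That resize is a one-liner; a sketch, with the cluster name and size as placeholders:

    -- Materialize brings up compute at the new size and rehydrates its state from S3.
    ALTER CLUSTER fraud SET (SIZE = 'xlarge');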
Okay.
Because I imagine the compute's, like, pretty stateless, right? You just, um, yeah, since everything's on S3. Yeah. And
it can take a fairly long time to bring a new size of a cluster online, especially if you're
running a large cluster and you need to go to extra-large. You're
maybe talking 20, 30 minutes for that to come online. We call that rehydration,
because you've got to go read all that state from S3, reperform the computation on all of it,
and get back to where you were. But we offer tools that are getting increasingly good at building automation around this so that you can turn on the new size, wait for it to come online,
and then turn off the old size. So, so right now it's kind of all or nothing:
either your data flow is keeping up and the lag is one second or less than one second, or you've
hit the wall and you have fallen behind and you can no longer query Materialize. And that's not a
great user experience, because you go from no performance problems to immediate outage. And
spill-to-disk is changing this
fundamentally. Because as you start spilling to disk, you start going slower and slower and slower.
And as the disk builds up, you go even slower, and you start lagging behind even more. But you
have a warning. Yeah. Yeah. And while you're lagging, it's not like you can't see the
data, you're just getting increasingly stale data. So maybe your data is five minutes out of date now.
But that, for your use case, may be fine.
Not great, but it may be tolerable.
And that gives you time to turn on the new cluster
at the new size and let it catch up
without it being a total outage.
Very cool. Okay.
Last thing on billing.
So it looks like a mostly managed model.
Do you have...
Man, I can't remember what Sam Lambert called it, but it's like a cloud side model or something like that where people can run materialize in an account that they own or anything like that. Have you thought about anything like that? Or is it just like, you know, that hassle is not worth it? Like, you know, the managed version works for us. Like, how do you think?
Like a BYOC, bring your own cloud model. Yeah, exactly. Yep. Yep. Where it's still like in AWS, but it's maybe an account they
control and you're just sort of hooking into it or yeah. We've definitely thought about it and
we've definitely gotten a lot of requests for it as well. I'm personally a bit of a BYOC skeptic
because I don't think it changes the security posture that much. Empirically, yes, organizations
seem to be much more comfortable
if they can run your scripts in their AWS account. But when you actually zoom out and look at it,
you end up with a control plane that has admin API credentials into the customer's VPC that can
do essentially anything. So it provides some minor security and privacy benefits, but I'm not
convinced that they're all that robust. And then you have a
huge debuggability problem as the cloud provider. We get CPU and memory profiles live streamed from
everything that's running in production. And in some cases, we've had seg faults happen in
production. We can go run that process under GDB if we have to, and pull out every piece of debugging information
that we could ever want.
And it's completely untenable to do that
in a BYOC model.
The customer, if you're lucky,
will get on a screen share with you
and type some commands on their machine while you watch,
but you're never going to get
a live interactive debugging session with GDB.
And that's really been an unlock for our velocity.
We get to move a lot faster
controlling all the infrastructure.
Do you think people will still be asking for BYOC a lot
in five or 10 years?
Or is that just like,
hey, we're in this transition period
where people don't understand what's mine,
what's yours, managed, unmanaged,
things like that.
What do you see that sort of future looking like?
I used to have a stronger hot take on this that I've softened in recent months.
So I think if you can pull off the fully managed service model, like Snowflake did, like Materialize is, that works really well.
Because nobody actually wants to be in the business of keeping this stuff up.
They just want to trust your security and privacy controls.
It's really hard to bootstrap those controls.
We've invested a lot at Materialize in security and in privacy and getting our SOC 2 certification
and performing regular audits and access control and all sorts of internal infrastructure to make
sure that anytime an engineer looks at production infrastructure, we write down every command that
they ran, they have to get approval from another engineer, all that kind of stuff. And that's hard to do as a small company.
So I can imagine that BYOC sticks around as a way of bootstrapping new infrastructure
businesses into existence because they have to get started somehow.
You have to get your first customer somehow.
And it's very capital intensive to get to the level of SOC 2 certification and
controls that you need to win enterprise deals.
So I don't know.
I don't know how that dynamic is going to play out over the next decade,
but I'll be very curious to see.
Have you heard of Nuon?
I think I'm pronouncing it right.
I haven't.
What's that?
They are BYOC as a service,
a recently
launched startup, and they provide a framework for companies
to build BYOC products. And I think there's something really
interesting there, because they provide this Terraform provider skeleton and log exporter that's all integrated with their platform. So you plug a Terraform provider into Nuon, and then they handle a bunch of the orchestration around operationalizing the BYOC deployment of that: showing you what customers are doing, when they're lagging behind, and you can force-update them.
Like maybe that's the way forward
that mitigates the pain
of building out the BYOC infrastructure.
Yeah, that's interesting.
I feel like there's a company,
you know, five years ago that was helping with just sort of, like, on-prem installs of these things and doing upgrades and things like that. And I can't remember, I want to say, like, Replicated, but I don't think that's it. Something that was helping with that. But now, yeah, cloud version, BYOC version, that's pretty interesting.
Yeah, so Nuon is exactly that for BYOC. Okay, that's interesting. On that same note, you mentioned the difficulties of
building a data infrastructure startup as a small team, getting that momentum going. Are there any
sort of either cloud primitives or companies or things that you wish existed, either from AWS or maybe something else like Nuon, that would have just
made any of
this process easier? I'm going to cheat and throw an answer your way that came from one of your
other Software Huddle interviews, which is FoundationDB as a service. There we go. Okay.
I don't know why this doesn't exist. I want it desperately. What we actually do is use
CockroachDB. So we're Cockroach Cloud customers. But it's funny, the way we use
Cockroach is we created a table that's essentially key comma value. And we do inserts, updates, and deletes into that table as point reads and writes. And we basically use none of the SQL features of
CockroachDB. We use it as a hosted implementation of Raft, where we can get support and get someone on the line
if something goes wrong.
But we would love it if Cockroach offered a simpler API, so we could get a key-value store directly.
Or if AWS wanted to run etcd
or FoundationDB as a service.
Yep, yep.
Is Dynamo like too expensive, too slow?
Like what's the...
Too slow.
We looked at it
because that's AWS's answer.
But the commit latencies are,
I want to say the P99 is at least 100 milliseconds.
And that's just way too slow for us.
The Cockroach P99 commit latency
is single digit milliseconds.
I want to say six or seven milliseconds
and it's often closer to two or three.
And that's what we need in order to update our metadata
without it showing up
in the user experience.
Gotcha. Gotcha.
That's interesting.
FoundationDB, I've heard from a few people
just like, yeah,
I want to know more about that
because it's showing up in different places.
It's kind of a cool thing.
So yeah, cool.
That's great.
All right.
I want to close off with just like
some common questions
that we ask everyone.
So semi-rapid fire here. But here you go. First one,
what wastes the most time in your day? Slack.
Slack. Okay. Yes. The interrupt-driven nature of Slack.
Yep, exactly. Do you turn it off and have deep work, or do you just always have it available? I imagine, like, as CTO, you have to.
I have moved my entire sleep schedule around to get three hours of focus time, about 11 PM to 2 AM
Eastern. And that's when I'm able to do deep focus work that dodges the people on the West
Coast and dodges the people in Europe. So that works pretty well.
That's amazing. Yep. Okay. If you could invest in one company, not the company you work for, not a public company, which you actually could invest in, but, you know, a non-public company, who would it be?
WarpStream, without a doubt.
What tool or technology can you not live without?
Polar Signals has become that for us. They do continuous CPU profiling. We started working with them a year ago. You drop in a Kubernetes DaemonSet, it runs an agent on every node, and it extracts CPU profiles from your entire production infrastructure.
Wow. Okay. Very cool. Which person influenced you the most in your career?
CTO of Cockroach, Peter Mattis, was my boss for a little bit. And he is an absolute machine when it comes to writing code and debugging problems.
He will just jump to definition in the Linux kernel if that's what it takes to find a bug.
And to this day, I'm so inspired by the number of problems Peter was able to track down with that approach and the number of things that Peter was able to code while also being CTO of Cockroach.
Yep, yep.
How much time do you get to code now in your role as CTO at Materialize?
Very little.
I'd say five hours a week on a good week.
And that's coming out of like magic time, nights and weekends.
That's not coming out of my core working hours.
Yep. And is that on sort of core Materialize stuff? Or how do you choose what to work on when you do get those five hours?
It gets harder and harder, because I want to work on stuff that's important but not urgent. And finding those projects, they also have to be increasingly small, because I have such limited time. In order for me to feel like I'm delivering value, I need to be able to, over four weeks and 20 hours of work, do something meaningful. But there have been some wins like that recently that have been satisfying. And that's always what I'm on the lookout for: where is there a tiny little thing that's not on the critical path, but will be really meaningful if I'm able to knock it out? Pick it up.
Yeah, yeah.
And there's still just like nothing like sitting down and accomplishing something
in code. It's like clear if it works or doesn't work. It's not like people stuff or marketing
or whatever else. It's like, you know, the clarity of it.
For sure. I'm an engineer at my core and there's just something so creative about the process of
software engineering. Like that was my art form growing up. I can't draw, I can't sing, I can't paint.
And I can code and I can make things that way.
And that's always been really fulfilling for me.
Yeah, I agree.
Cool.
If you could master one skill
that you don't have right now, what would it be?
I wish I had any musical talents.
Completely unrelated to the job,
but that is something I am just very much lacking
That's like when you said no music, no art, that's sort of my thing. If I had one skill, it would be, like, design. To be able to do just some amount of design. You know, it helps in other areas too. But yeah, something creative, outside of software.
All right, cool. Last question. With AI in the news and all that,
what's your probability
that AI equals doom
for the human race?
Are you worried about that?
I'm not worried about it.
And I'm happy to be wrong
on this point, but...
I wouldn't be too happy
to be wrong about that.
Many times throughout human history, we have examples of people predicting that the
next thing was going to bring doom to humanity. And so far, they haven't been right. Of course, there's a bias in this, right? Because the moment that they're right, humanity ceases to exist and it's all over for us. But they said the Industrial Revolution was going to change everything.
It didn't.
We're still here going strong.
And I feel the same way about AI.
I just, I'm not convinced
that it's gotten anywhere close
to the point where it's going
to supplant humanity.
I think we're going to find some way
to peacefully and happily coexist
with ChatGPT,
at least in my lifetime.
Yeah, yeah, very cool. Well, Nikhil, thanks for coming on. Like, this was awesome. I'm jealous of, you know, the very cool things you've done, both, you know, at Cockroach sort of implementing Spanner and now at Materialize implementing Dataflow and Noria, all this stuff. So awesome what you're doing. Thanks for sharing your time with me.
Thanks for having me. I'm jealous you get to spend your days doing this talking to
a lot of really, really cool folks.
So, yeah.
Yep, yep.
I should have said, sorry,
where can people find out more about you,
about Materialize, if they want to learn more?
Yeah, so go to materialize.com.
We've got an active Slack community.
There's a big banner.
Jump in there.
You can reach out to the engineering team in there.
We monitor it pretty closely,
and we'd love to chat with anyone
who's interested about Materialize
or the underlying tech. Timely and differential dataflow, we still get a steady stream of questions on that, and we're happy to dig into the technical details there with anyone curious.
Cool. Great. Nikhil, thanks for coming on.
Thanks for having me on.