Software Huddle - Multi-tenancy with Khawaja Shams
Episode Date: April 23, 2024. Today's episode is with Khawaja Shams. Khawaja is the CEO and co-founder of Momento, which is a serverless cache. He used to lead the DynamoDB team at AWS and now he's doing Momento. We talk about... a lot of different things, including multi-tenancy and cellular architecture and what it's like to build on AWS and sell infrastructure products to end customers and just a lot of other really good stuff. We hope you enjoy this episode. 01:12 Introduction 03:38 Multi-tenancy 08:13 S3 and Tigris 15:09 Aurora 19:11 Momento 31:21 Cellular Architecture 41:16 Most people are doing cross-AZ wrong 52:23 Elasticsearch 01:03:08 Rapid Fire
Transcript
Discussion (0)
I am one of your biggest fanboys, the author of the DynamoDB book.
I chased you down at reInvent.
It's like, hey, Alex, it's me.
What multi-tenancy is and why it's so important?
The ability to share underlying resources so that you're not doing repetitive or duplicative work.
How much do you know about Elasticsearch?
What would you like to know?
If you could master one skill you don't have right now, what would it be?
So many.
Hey, folks, this is Alex.
And today's episode is with Khawaja Shams.
Khawaja is the CEO and co-founder of Momento, which is a serverless cache.
And he's a friend of mine.
I just love chatting with him because, you know, if I ever have a question about distributed systems or AWS, he's a really good resource on that.
He used to lead the DynamoDB team at AWS and now is doing Momento.
And we just talk about a lot of different things in here, including multi-tenancy and cellular architecture and what it's like to build on AWS and sell infrastructure products to end customers and just a lot of really good stuff here.
So I love this episode.
Hope you do too.
As always, if you have any questions, comments,
guests you want on the show, anything like that,
feel free to reach out to me or to Sean.
And with that, let's get to the show.
Khawaja, welcome to the show.
Thank you, Alex, for having me.
Always a pleasure talking to you.
Yeah, absolutely.
So you are the CEO and co-founder at Momento.
But I'd say more importantly,
you're a friend that I met two years ago, I think.
And you're just like the person I go to when I have deep systems questions, distributed systems, or how AWS works, or all these sorts of things.
So I'm excited to sort of get you on here and take some of our conversations that we have or some of the dumb questions I have and just make them public.
Because I think a lot of people should know this stuff.
But with that background, maybe just tell us a little bit about you and what you're doing.
Yeah, I mean, it's humbling to hear you describe me
like that because I am one of your biggest fanboys,
the author of the DynamoDB book.
I chased you down at reInvent.
It's like, hey, Alex, it's me.
So I appreciate your kind words.
I've been following you for a lot longer.
So my name is Khawaja.
I started my career building cameras on board the Mars rovers, got tired of hardware because
I'm impatient and I like to see things happen right away.
That instant gratification really drives me.
So I started doing image processing for all the Mars rover images that were coming back,
got impatient again with all the data centers and dealing with ordering hardware and whatnot, and became one of
the earliest AWS production customers. This was back in the 2008, 2009 timeframe, back when all
of AWS had one solutions architect. I got to work very deeply with a bunch of AWS teams and have just
been part of this journey of cloud adoption
since the very beginning and inspired by it.
2013, I joined Amazon.
I ran DynamoDB, and then I ran all the video services for AWS Cloud.
Gotcha. Yep. And what are you up to now?
Today, I run a small startup called Momento.
We build a serverless caching product. We offer key value
store data structures and low latency PubSub. The underlying hypothesis just goes back to the
instant gratification. I think people should build things and infrastructure should just get out of
the way. That's what inspired me about AWS and cloud. That's why I left NASA to join AWS.
And I really think if people can make developers
more productive, they can really change the world.
So that's what we try to do at Momento,
specifically narrowing in on caching
and low latency messaging.
Gotcha, absolutely.
So one thing that you've brought up a number of times
when we've chatted, I've seen you write about it
and all this all over the place, is this idea of multi-tenancy and just like how powerful that is in systems and infrastructure.
So maybe just give folks an overview of what multi-tenancy is and why it's so important.
Yeah, multi-tenancy is the ability to share underlying resources so that you're not doing repetitive or duplicative work.
And, you know, just to be clear, where I go for my inspiration on multi-tenancy is a set of Marc Brooker's blogs. Marc Brooker, Andy Warfield.
They've just done incredible work just explaining why multi-tenancy works really, really well
at the infrastructure level.
But it's all around us in life as well.
So, you know, starting with, like, when you provision
an EC2 instance, you're sharing the data centers, you're sharing the rack, you're not getting an
entire rack. So whether you like it or not, multi-tenancy is in your AWS. If you're not even
on AWS, you have your own data center, well, you're probably not producing your own power.
So multi-tenancy is there. If you are drinking water that you're
not collecting from rain, that's coming from a multi-tenanted source. Same thing with the internet
inside of your house. So when you look all around, multi-tenancy surrounds us and it is the way that
people make consumption of resources more efficient, which results in better availability, better performance,
and just an overall better consumer experience.
That's multi-tenancy at a high level.
Yeah.
And I can sort of like get that.
And I think most people would get that and be like, hey, yeah, I don't want to go, you
know, grow all my own food or gather my own water or that sort of thing.
But I think like where it would sort of break down or surprise people or like engineers and things like that
is you mentioned EC2 and that makes sense too.
I don't want to run my own data center.
So it makes sense that someone else does.
I have my own EC2 instance.
But I think then you like at DynamoDB,
at Momento are taking that a step further
where it's like, hey, you're not even renting your own instance.
You're like getting a shared service where there's a bunch of instances
and all customers are sharing all this stuff
in this multi-tenant system.
Yeah, in the early days, it was kind of great
that you didn't need to rent your own data center space
to get access to one more instance.
It was available to you instantly, right?
That's why they were called instances.
So the EC2 instances were available to you instantly
without having to build things.
Now, that was fundamentally a much more efficient way
to consume resources.
But then a few years down the line, what we realized is that you can make
things even more efficient. So when you run a Lambda, right? So even with EC2, what people
were doing is they would provision an instance, but even that instance may run idle most of the
time. And if you are sharing capacity at the instance level, it leaves room for further
efficiency. So some instances might be running
really heavy on CPU. Some might be running really heavy on memory. Some might be running really,
really heavy on networking. And some might just be idle most of the time. So if you zoom in,
and if you increase the granularity at which the multi-tenancy is occurring, lots of optimizations come your way.
So as an example, when you provision,
when you start to use S3,
you're not provisioning a bunch of machines
and drives for your S3 bucket.
You don't start by saying,
how many S3 capacity units do you need for your bucket?
You literally create a bucket and you start writing. And that
allows AWS and any, you know, whoever the storage providers are, it allows them to manage
utilization of that capacity at a much, much, you know, finer grain. Because now they can make sure
that, you know, if some drives are more consumed than others, they can kind of spread the load, and you don't have, you know, provisioned S3 hard drives that are just sitting there idle across millions of different customer accounts. So that's where, you know, you zoom in and you start doing capacity management at a granularity that is finer than an instance. And this is why it's really, really hard to beat S3 for really low-cost, incredibly durable storage.
Yeah. Speaking of, did you see that there is someone trying to beat S3? There's, like, a new S3 competitor, man. Tigris, I believe. It came out just, like, last week. Have you seen this?
Yeah, I mean, Tigris is great, by the way.
And there's been lots and lots of people trying to reproduce and replicate S3 and also building
on top of S3, right?
So from the very, very early days, you know, there were things like the Swift object store.
And then we got, you know, Eucalyptus was trying to do things beyond EC2.
And then, you know, OpenStack had a whole bunch of capabilities.
MinIO showed up.
So there's a whole lot of ways that people have tried to reproduce that infrastructure.
I think what Tigris is doing, and I think where they're right to optimize, is building on top of the shoulders of giants.
You don't have to reinvent S3 to actually add a whole lot of value
to the S3 customers.
You can rely on that multi-tenanted,
incredibly efficient, incredibly durable,
incredibly available fleet,
and then just enable customers
to utilize that fleet even better
for their specific workloads.
Yeah, yeah.
Okay, so going back to what you're saying with S3,
when I think of like, you know,
this new sort of multi-tenanted era, what we're talking
about here is if I consider something like RDS, and let's say I have an RDS database, and for
whatever reason, I really need to, I have access to Werner Vogels, and I say, show me exactly where my RDS server is. Theoretically, he could find that, and he could take you to a data center somewhere and be like, okay, your data is on this specific server rack right here. Right? Like, he could sort of, there's like some control plane stuff that's outside of that, but generally he'd be like, here's your, this is your RDS database, technically. Whereas, like, even if you had access to, you know, Bezos and Andy Jassy and Werner Vogels, and you said, show me where my S3 bucket is,
They couldn't do that because it's like, you know,
it's split across all these different machines that are in different data centers,
like all sorts of things.
And like, you know, they could show you a server
and say like, hey, this one has some chunk of your data on it
along with a hundred other customers' data on it
and stuff like that.
But it's not like a specific instance that you sort of have and is yours in some sense, even if it's managed for you. It's like truly just all over the place. Is that a useful way to think about it? Or is that wrong in so many ways?
Yeah, I mean, I think all over the place is a good way to think about it. And that's actually where your blast radius is a lot lower. Like, all over the place is a feature, not a bug, right?
Your resources on S3 are peanut-buttered across millions of drives.
So Andy Warfield at the Fast Conference recently
gave this astounding statistic.
Tens of thousands of AWS accounts exist
where their buckets are spread over a million drives
or on millions of drives.
Let me repeat that.
Tens of thousands of customers have buckets spanning millions of drives.
Like that's bananas.
Yeah.
When a particular machine fails, it's not like all your data goes away with it.
And so that peanut buttering, why is that important?
Because it allows you to absorb burst.
Bursty writing, bursty reading, bursty network, right?
All of that.
Like you want to create a million small objects like that.
You're not waiting for new instances to pop up and be allocated to you.
They're just there because they're just going to get peanut-buttered across all of S3's data plane.
And that is very, very powerful.
So it comes down to how quickly can you kind of absorb the burst.
And in a single-tenanted environment, the only way you can absorb the burst is by over-provisioning
and ridiculous amounts of over-provisioning.
In a multi-tenanted world, you're over-provisioning a little bit, but you're over-provisioning at a fleet-wide level.
And every small customer is kind of helping you smooth out your peak to average ratio.
And driving down the peak to average ratio helps increase the utilization, which helps you decrease your costs and also optimize the availability and performance for your overall fleet.
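To make the peak-to-average point concrete, here is a minimal, hypothetical simulation (not drawn from AWS or Momento internals) showing how pooling many independent, bursty tenants flattens the combined peak-to-average ratio; the workload shape and numbers are invented purely for illustration.

```python
# Hypothetical sketch: aggregating many independent, bursty workloads flattens
# the combined peak-to-average ratio, the statistical effect behind
# multi-tenant capacity pooling described above.
import random

def peak_to_average(series):
    return max(series) / (sum(series) / len(series))

random.seed(42)
minutes = 1_000

def bursty_workload():
    # Mostly idle, with occasional large spikes: a stand-in for one tenant.
    return [random.choice([1, 1, 1, 1, 2, 3, 50]) for _ in range(minutes)]

single = bursty_workload()
pooled_10 = [sum(vals) for vals in zip(*(bursty_workload() for _ in range(10)))]
pooled_1000 = [sum(vals) for vals in zip(*(bursty_workload() for _ in range(1000)))]

print(f"1 tenant:     peak/avg ~ {peak_to_average(single):.1f}")
print(f"10 tenants:   peak/avg ~ {peak_to_average(pooled_10):.1f}")
print(f"1000 tenants: peak/avg ~ {peak_to_average(pooled_1000):.1f}")
```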
Yeah. Yeah. And that peak to average.
So I think I saw Andy Warfield do that S3 talk,
and he sort of showed that on a graph.
Like, hey, here's an individual workload
and what the peak to average, huge bands.
Now here's if you put 10 of them together, smaller bands.
And here's if you put 1,000 of them together,
and it's like almost a flat line, right?
Like there's not that much difference between that
because once you aggregate those out,
they really smooth it out quite a bit. So, I mean, I like multi-tenant. Like, I like Dynamo, right? And so I'm familiar with multi-tenancy: S3, Momento is that same way. But why don't we have more multi-tenant systems? Like, I feel like we still have a ton of sort of data systems that aren't multi-tenant. And is there sort of just, I don't want to say databases, but certain patterns or things like that, that are harder to do in a multi-tenant way? Whereas, like, you know, the splittability of S3 and Dynamo and Momento caching makes it easier to multi-tenant. Or, like, why don't we have more multi-tenant systems?
Multi-tenancy is hard. It's really difficult to get it right. You have to avoid noisy neighbors, you have to have the right throttles, you have to manage utilization. And then it's really, really hard as a business to be multi-tenanted, because, you know, when you're selling to the world, when you're selling efficiency to the world, multi-tenanted systems can, you know, cut down your resource consumption by 90%. And it is really, really hard to build a service that takes, you know, four times longer to get to a billion dollars in ARR. The early services like RDS and ElastiCache, it's easier for them to get to very, very large revenue numbers. Because nobody's doing this willfully, right? But if you have a lot more idle capacity that people are buying, then guess what? They're buying 10 times more capacity than they need. So from a business perspective, it is really, really hard to justify more multi-tenanted services.
It's not impossible, but it is harder. From an engineering perspective, it is much, much more difficult to build a multi-tenanted system to deal with all of the
isolation. But over time, everything is going to go towards multi-tenancy. It's just a matter of
who's able to take the leap first and kind of get there. And we see transitions happening in this. So, you know, RDS, single-tenanted, sure.
Aurora, the storage system underneath the covers,
it's completely multi-tenanted.
So sometimes it takes longer to, you know,
to get to a point where the competitive landscape
is, you know, competitive enough
that you have to go down that path of more efficiency.
But it is a little bit harder to kind of bootstrap and kickstart a multi-tenanted environment.
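As a rough illustration of the "right throttles" mentioned above, here is a minimal per-tenant token-bucket sketch; the class, rates, and limits are hypothetical and not how any particular service actually implements noisy-neighbor protection.

```python
# Hypothetical per-tenant token bucket: one way a multi-tenant service can keep
# a noisy neighbor from consuming the shared fleet's capacity. Names and limits
# are illustrative only.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec          # steady-state requests per second
        self.capacity = burst             # how much burst we tolerate
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # caller should reject or retry later

# One bucket per tenant, created lazily on first request.
buckets: dict[str, TokenBucket] = {}

def admit(tenant_id: str) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate_per_sec=100, burst=200))
    return bucket.allow()
```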
Gotcha.
How, I don't know, how much do you know about and how much can you talk about
Aurora's storage system?
Like what's going on underneath there as sort of like how it's working in a multi-tenant way
or just like how it's working generally?
Yeah.
I mean, one of the beautiful things about Aurora is that it decouples the compute from the, from the storage.
So you can have multi-AZ replication without having, you know, Aurora nodes in multiple
AZs. That itself is really efficient. So if you look at RDS, the only way you're going to get
multi-AZ replication is by having two RDS instances, one in each AZ.
That's because it's just those two instances that are, you know, doing your replication for you.
And by the way, if one of those goes down, well, that's 50% of your fleet.
Gotcha.
And sorry, so you're saying Aurora can have multi-AZ storage without Aurora nodes in multiple AZs, without multiple Aurora compute nodes,
but they do have multiple storage nodes in there.
But it's like they can, yeah, they have storage nodes all over.
Right. No customer-specific compute nodes, right?
Without the customer-specific compute nodes,
because they decouple the storage and the compute engine.
Now, that storage system is probably one of the best pieces of technology that's been built inside of AWS.
But it is fundamentally very, very powerful because guess what?
The compute and database is nice to do queries and so forth,
but I think people care a lot more about, hey, please don't lose my data
and please don't lose the transactional correctness of my data in a database.
And decoupling that and making that
really crucial part multi-tenanted is incredible because now if you want to bring up a new host,
well, if the storage system already has the data, then you don't have to wait for that node to
warm up before it's available. You don't have to wait for it to bring in an entire terabyte
of data. So, you know, like I think over time everything is going to go towards multi-tenancy
anyway. And over time you might get into a situation where, you know, EC2 and, you know,
Fargate and things like that, the start times will start to decrease as well.
So as the instant start times can decrease,
then you get a lot more efficiencies
regardless in terms of being able to optimize.
But that's just better multi-tenancy
at the EC2 level as well.
You still need to decouple the storage side, I think,
and make that part multi-tenanted on everything else.
Gotcha. Okay. So if I understand it correctly, like when I think of Dynamo, it's like,
it's multi-tenant up and down the stack. There's like no aspect of that. When you provision something in Dynamo, there's like nothing that is yours. Like the request router is multi-tenant,
the metadata store, the transaction coordinator, the storage nodes, all that is multi-tenant.
Whereas with Aurora,
storage layer is multi-tenant, but then compute on top is going to be customer specific, whether that's your main writer node, whether that's your read replicas, those are going to be
sort of customer specific nodes. That's right. And there's multi-tenancy,
by the way, up and down. EBS is a great example of multi-tenancy.
People don't even realize they think they're getting their own drives,
but behind the scenes, there's a whole lot of optimizations that are happening.
They're not going and provisioning individual nodes to give you that 10 gig drive.
And you can imagine optimizations where people provision a lot of EBS instances that are like 10 gigs
and they use up less than a gigabyte there, right?
So can you imagine the amount of efficiencies
that might be available for somebody
to take advantage of?
Not saying any of those optimizations
have been done or not,
but like it's a very natural optimization, right?
To not allocate the unprovisioned, or the provisioned-but-not-used, blocks in a block storage system.
So yeah, things go down this path one way or another.
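Here is a toy sketch of that thin-provisioning idea, assuming nothing about how EBS is actually implemented: the customer sees a fixed-size volume, but backing storage is only consumed for blocks that have actually been written.

```python
# Illustrative only, not how EBS works internally: a thinly provisioned volume.
class ThinVolume:
    BLOCK_SIZE = 4096

    def __init__(self, size_bytes: int):
        self.size_bytes = size_bytes        # what the customer provisioned
        self.blocks: dict[int, bytes] = {}  # only written blocks use real space

    def write_block(self, index: int, data: bytes) -> None:
        assert index * self.BLOCK_SIZE < self.size_bytes
        self.blocks[index] = data

    def read_block(self, index: int) -> bytes:
        # Unwritten blocks read back as zeroes without occupying storage.
        return self.blocks.get(index, b"\x00" * self.BLOCK_SIZE)

    def backing_bytes_used(self) -> int:
        return len(self.blocks) * self.BLOCK_SIZE

vol = ThinVolume(size_bytes=10 * 1024**3)   # a "10 GiB" volume
vol.write_block(0, b"hello".ljust(ThinVolume.BLOCK_SIZE, b"\x00"))
print(vol.backing_bytes_used())             # only 4096 bytes actually backed
```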
Yeah, interesting.
Okay, so this might be a good time to talk about Momento, how that fits into multi-tenancy.
So you said Momento is a cache, also like low-latency pub/sub messaging.
Maybe start with the cache aspect.
Like, what aspects of a cache... I think people probably mostly think Redis, Memcached. Like, how close are you to those sorts of things? And then, like, how does multi-tenancy fit into this?
Yeah. So our job at Momento is to reduce the number of boxes in your architectural diagram.
And we do end up, you know, swapping out a lot of memcache and Redis.
And what happens in the Redis and memcache scenarios is you don't have one block for your cache.
You have, you know, a cluster.
You have shards.
You have replicas, and you have,
you know, the number of replicas that you need based on your environment and which AZs they're
in and all of that. Like that's a very, very leaky abstraction that leaks all the way to the SDKs
that are now aware of the end-to-end server-side topology. So when you auto-scale and you go up, well, guess what?
Your SDK has to become aware of that.
So we start with a multi-tenanted storage fleet that is basically,
instead of allocating the capacity that a customer is asking you to allocate,
we do it on demand and we peanut butter that capacity across the fleet. Then on the
request side, we have an API gateway that basically does the authentication, does the TLS
termination, and then routes the data to the right layer. Now, this capacity is also completely
multi-tenanted, but it has a couple of other features. It's
got a web server built in and it has fine-grained access control, which allows customers to
connect their mobile devices or their web browsers directly to Momento, bypassing a whole bunch of
API layers that you would otherwise need to provide. So if you wanted to build a PubSub
system with millions of subscribers,
you just talk to the Momento web servers
and boom, you're covered.
It also has a whole bunch of like fan out capabilities,
like it can, you know, hash hotkeys and so forth.
And all that capacity is sitting in a warm pool
on behalf of customers.
So if any one of those customers has a spike,
you can kind of absorb those spikes based on the capacity that is available on the storage side, as well as
on the request routing or API gateway side. Okay. Okay. Interesting. And I think as part of this,
one thing you've mentioned offhand before, I haven't dug in far enough on it, is like the
benefits of having a deeply integrated control plane and data plane.
Maybe talk about what you mean by those quickly
and then why it's useful for them to be so deeply integrated
or how you have a deeply integrated Momento, what that means.
Yeah.
So in the early days of AWS,
Jeff Barr used to have this awesome slide that used to talk about
you have two options when you're provisioning infrastructure.
You can kind of under-provision, and then you have angry customers, right? Because you have this usage pattern that looks like this. Or you can over-provision, and then you have an angry CFO, right?
And what happens when you have a spike is you might have an angry both.
Like you might anger customers and
your CFO, because you had all this capacity that was provisioned this whole time, but it wasn't there when you needed it, right? So the integrated, you know, data plane is a set of capacity that's made available on the storage side, as well as the ability to extract that storage
at a very, very high TPS.
But the control plane comes into play
to make sure that the fleet-wide health
is managed appropriately.
It's the thing that is making sure
that you have enough capacity across the fleet.
You have enough capacity for every single cache
that is out there.
And it is the thing that informs our API gateway layer
with the latest topology changes for any given cache. This, you know, tight integration between
the control plane and the and the data plane allows us to make our clients much, much simpler,
because now they don't have to worry about the server side topology. And we don't have to worry
about different clients having different ideas of what the state of the world looks like.
So we can make changes really, really fast behind the scenes
because we have a tight control over the messaging
to the API gateway layers.
And we can be assured that data is getting routed immediately
with the latest topology in mind.
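A minimal sketch of what a tightly integrated control plane and data plane could look like, with invented names and no claim about Momento's actual implementation: the control plane pushes topology changes to the gateways, so client SDKs never learn the server-side layout.

```python
# Hypothetical sketch (invented names, not Momento's implementation): the
# control plane owns fleet-wide topology and fans changes out to every API
# gateway, so client SDKs stay simple and never see the server-side layout.
import zlib

class ApiGateway:
    def __init__(self) -> None:
        self.routes: dict[str, list[str]] = {}   # cache name -> storage nodes

    def apply_topology(self, cache: str, nodes: list[str]) -> None:
        # Called by the control plane whenever capacity moves or scales.
        self.routes[cache] = nodes

    def route(self, cache: str, key: str) -> str:
        nodes = self.routes[cache]
        # Stable hash so every gateway picks the same node for a given key.
        return nodes[zlib.crc32(key.encode()) % len(nodes)]

class ControlPlane:
    def __init__(self, gateways: list[ApiGateway]) -> None:
        self.gateways = gateways

    def rebalance(self, cache: str, new_nodes: list[str]) -> None:
        # A fleet-wide decision made centrally, then pushed out immediately.
        for gw in self.gateways:
            gw.apply_topology(cache, new_nodes)

gw = ApiGateway()
ControlPlane([gw]).rebalance("orders-cache", ["node-a", "node-b", "node-c"])
print(gw.route("orders-cache", "user#42"))   # the client never performed this step
```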
Yep, yep.
I like that. I think, you know, man, I'm just picking something, but, like, the fact that that control plane and data plane is so tightly integrated, I would compare it to, let's think like RDS, right?
Where there's an RDS control plane where I go,
maybe I go to the console or I use CloudFormation or something like that.
And I say, provision me a RDS instance.
And there's sort of a control plane that manages that and spins it up.
And then the data plane is the instance itself, which I'm doing operations against.
But then if I start hitting that too hard, you know, it's not going to like know how to help fix that in any way.
Right.
If I say like, if I'm like overloading my database, maybe I'm hitting a certain key too much or a certain record too much or something like that.
There's not much I can do.
I have to go to the control plane and say, hey, go do this action.
Whereas if I think of DynamoDB and you look at adaptive capacity and things that they're doing where they realize there's a specific partition where that item's just getting hammered. A couple items are getting hammered and they can sort of split those partitions into different ones because there's
like that connective tissue between the data plane and the control plane. And that data plane,
it's like serving these requests, but it's also saying to the control plane, hey, like,
we're hitting some issues here. It's like these keys that are getting these issues. And now the
control plane is able to, like, adapt to that sort of thing. Am I understanding that correctly, or am I way off base here?
No, I think you are, and the way I like to frame it is, there are shallow control planes that are mostly operating at a compute level, right? They're like, okay, does this have enough compute, does this have enough storage? That's it. And they don't have any idea of how you're distributing the data across your multiple shards. That's not up to those shallow control planes. They're blissfully unaware of whether you have hot keys. They're unaware of what your shard utilization is. Like, if one shard is busier than another because of storage, they don't care. That's not the shallow control plane's problem. Now, a deep control plane is
operating at the resource level. So for S3, you know, I've got to make sure that all buckets, all customers, have enough capacity, right? It doesn't matter how many shards are in their S3 bucket. That's the S3 team's problem. Same thing for Dynamo. You can have one of your DynamoDB partitions get really ballooned up, but that's not your problem, because Dynamo will automatically, you know, split for space, and boom, you got that split. So to me, it's more about deep control planes that are aware of the primitive on which they're offering a service, as opposed to a shallow control plane that's built for compute and storage only.
Yeah. When we're talking about, like, open source data infrastructure, databases,
things like that, like, listen, I like open source. I think it's useful, but I think it also
limits, I guess, like how deep that integration can be. Like, do you sort of agree? I just think
of like the switch from original Dynamo from AWS, from amazon.com, right? Where each team was sort
of running its own Dynamo
and they had to like simplify it operationally
in some ways to make that work.
And then like DynamoDB itself,
where they're like, hey, we're running it for you.
It's not gonna be open source.
It's not gonna be provided to Amazon.com teams.
And because of that,
we can have like this deeper integration between that.
Like, do you think that's a,
is it gonna be like doable for an
open source database to have that sort of deep integration between data plane and control plane
or is that something that can really only happen when you have, you know, a more proprietary database? Does that make sense?
Yeah, I think it's inevitable. I think people often focus on the data plane side of open source, right? And if you look at all these open source companies,
sure, I can give you my data plane and then you can figure out how to run it
or they vend a managed control plane service.
And that's usually proprietary because a lot of the value is not the
actual data plane. It's the management of the data plane. It's the on calls that are
keeping the fleet healthy. It's like when you buy a car, if I made your car free,
it actually doesn't help you as much because a lot of your costs, unless you're
buying really fancy cars, but if you're like an average American, most of your cost is
actually operational.
It's the gas that you're putting into that thing, right?
And the insurance that you're paying for and the parking spot that you're paying for if
you're living in Boston or something like that, right?
Like the operational cost is where the meat is.
Like, so sure, like you can have the data plane, but it's the control plane that ends up making the bigger difference.
I don't think they are mutually exclusive.
Like you could open source control planes, but that's generally quite rare.
Yeah. And it's kind of weird, because it's like, hey, if I'm going to go run my own database, it's probably going to be in more of a single-tenant manner. To take advantage of the integrated control plane and data plane, you probably need multiple tenants where they're sharing capacity in some sense. And then you could be over-provisioned a little bit, but, like you're saying, spread it around and different things. So it's probably just, I don't want to say weird incentives, it's just weird. Yeah, I just think it's harder to make a multi-tenant system open source. Like, do you know any examples of multi-tenant systems that are open source? They probably exist, but I can't think of one off the top of my head.
You can do multi-tenancy on your own, right?
So Momento, for what it's worth, we're big believers in cellular architectures.
And we have this really interesting combination that gives the best of both worlds to customers where we can do private multi-tenancy, where large customers are large enough that they have enough workflows within their organization
that they can kind of absorb the load
between their own workloads.
And they don't need to, you know,
compromise any security or data mixing,
you know, by sharing a cell with different customers.
And you'd be surprised, like, you know, even
as soon as customers are running dozens of instances for their ElastiCache plus their
web servers, like, a private multi-tenanted cell can actually bring meaningful efficiencies into
their ecosystem. And if you really live by the cellular architecture,
like all you're trying to do
is improve your resource utilization.
And multi-tenancy is just a means
to improve resource utilization.
And if you do multi-tenancy at a smaller scale
with just that customer's account,
that's still meaningful efficiency gains for them.
Yep, yep.
You mentioned cellular architecture,
which is something I hear you and just every AWS person talk about all the
time,
but I still feel like it's like under talked about outside of AWS.
So like,
what is cellular architecture?
Yeah.
So cellular architecture got evangelized inside of AWS as a means to reduce
blast radius for services.
So we have very large services at AWS,
and you don't ever want to have a regional failure
for one of those tier zero services.
And the idea was that instead of having one big regional deployment,
you would have lots of small ones,
and then you would shuffle shard the customers between them
so that you, you know, reduce the blast radius. If any one of those cells goes down, it's not like you have a regional outage for every single AWS customer. AWS regions have gotten quite large now, right? So regional failures have catastrophic consequences on our economy, for that matter, right?
Yeah, yeah.
So that's how it started, but then there's a lot of other benefits around, you know, scale, because a given cell is not going to be, you know, meaningfully large. So then it becomes a unit of deployment as well. And the person leading it at Amazon was Peter Vosshall, who was the first distinguished engineer at Amazon, and he really led the way in making sure that all the services, especially the new ones that were coming out, have been cellularized. And we went as far as saying the internal company goals were that a service ought to be able to whip up a new cell in less than four hours. So go all in. Like, you can only do cellular architecture if you're really good at infrastructure as code. Like, your entire cell has to be, you know, ready to go, where you can just click something, deploy it, and boom, four hours later, all the limits and everything is there and ready to go.
And once you get to that part,
then you have the ability to whip up cells that are dedicated to a specific customer.
A cell is typically also encapsulated in its own AWS account.
So even if that AWS account goes away, it's just limited to that one cell.
So it's about blast radius reduction.
It's about scale units and just isolation in general.
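A small, hypothetical sketch of the shuffle-sharding idea described above: each customer is deterministically assigned a small subset of cells, so a single bad cell touches only a fraction of customers and two customers rarely share their full set. The cell counts and names are made up for the example.

```python
# Illustrative shuffle sharding: a stable, per-customer subset of cells.
import hashlib
import random

CELLS = [f"cell-{i}" for i in range(8)]
CELLS_PER_CUSTOMER = 2

def cells_for(customer_id: str) -> list[str]:
    # Seed a PRNG from the customer id so the assignment is deterministic.
    seed = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16)
    return random.Random(seed).sample(CELLS, CELLS_PER_CUSTOMER)

print(cells_for("customer-alex"))     # stable pair of cells for this customer
print(cells_for("customer-khawaja"))  # very likely a different pair
```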
Okay.
So then what you're saying, like in US East one, S3 is not just one giant system, but it's a lot of cells.
Each one, like each one replicates sort of the entire system, but those cells are essentially completely independent from each other.
I don't know if S3 has gone all in on cellularization or not, but I can tell you that's the intent, that that's supposed to be. Dynamo in a region has lots and lots of deployments.
Okay, so let's talk about Dynamo, if you can talk about that one. Like, how many cells are we talking? Are we talking, like, tens? Are we talking, like, tens of thousands? Can you give me, like, the order of magnitude? And, you know, I know this doesn't have to be exact, but is it closer to 10 or 10,000 or a million? Like, how many cells are we sort of talking about, just to give people a sense?
No, I think, you know, it really varies on each service, but I think inside of Amazon regions, I think, like, five cells is pretty good for a given region, because that reduces your blast radius down from 100 down to like 20, right? And then, like, getting that next 5x boost is, like, 25 cells. And, like, remember, there's so many AWS regions, right? So not every region... like, some regions are going to be small enough to be your smallest cell. And, you know, that's the order I would imagine now in Momento.
Gotcha. So it's not like you're adding a cell every month or something like that. Like, adding a new cell is a pretty rare situation. And it's still, like, yeah, it's not like cells are cattle, not pets. It's not like a cell is cattle or something like that.
No, you can't treat a cell like cattle either, because that's the other thing, right? Like, because then it's like, well, you can't sacrifice these cells either. You know, going from one region, like, to two cells is the hardest part.
Once you get there, then you can create cells.
And then there's a whole lot of design choices
that are still left available to the developers.
How do you distribute data between different cells?
Or how do you decide who goes in which cell?
Some services will say, okay, I, I'll put, you know, each customer can have each
resource, like if you let's say, in a hypothetical case of a service that had a bucket, you can say,
like, okay, a bucket is in a cell. So you can have one customer be available in every single one of
your cells, because, you know, you deploy a bucket to a different cell. Or you can say, I'm going to
isolate a customer to their cell and all of
their buckets are going to be in that particular cell. So there's all kinds of trade-offs. And then
how do you do the routing? How do you actually make sure that the global mapping is set up
appropriately and whatnot? So it's a- Yeah. On that sort of question,
let's talk about Dynamo a little bit. Let's say US East 1.
Let's just say they have five cells.
I'm not sure.
But where does that split happen?
If I think of a Dynamo request, like it hits the load balancer, then request router, and then down.
Are the load balancers cellularized?
Are those across all the cells?
Are request routers across all this?
When do I have to look up this request came in from Alex and I need to figure out which cell he belongs to.
Where does that look up happen?
So here's the thing.
A little decoder that you can use: services that give you a unique DNS entry for your endpoint are the ones that have the easiest time, like, routing you to an entirely contained cell.
Services that are a little more complex, they don't have to cellularize at the entire service level.
So you can cellularize at the storage node level in Dynamo.
You can cellularize at the request router level.
You can cellularize at the control plane, you know, cache or auto admin level as well.
So it's completely up to the teams in terms of how they cellularize it. So, like, in the case of MediaConvert, for instance, you get, you know, an endpoint, and that endpoint determines which cell you're going to. A lot of, like... so all the media services that got built at Elemental, for instance, for AWS, they were all cellularized, and what we tried to do was to hand the customer a specific endpoint, which allows us to move their stuff around as well, to a certain extent.
But for the other services, the cellularization may not be happening at the entire service
level.
It might be happening at the component level.
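As a toy illustration of that "decoder": a cellularized service can hand each resource an endpoint that already encodes its cell, so routing a request needs no lookup at request time. The domain name and placement rule here are invented.

```python
# Toy illustration only; the domain and placement rule are invented.
import zlib

def cell_endpoint(resource_id: str, region: str, n_cells: int = 5) -> str:
    cell = zlib.crc32(resource_id.encode()) % n_cells   # stand-in for real placement
    return f"{resource_id}.cell-{cell}.{region}.example-service.com"

# Contrast with a shared endpoint (dynamodb.us-east-1.amazonaws.com style),
# where the cell has to be resolved inside the service's own routing layer.
print(cell_endpoint("job-1234", "us-east-1"))
```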
Yeah.
Oh, that's interesting, because I've been sort of using that as, like, an intuitive sense for people as to whether they're using a multi-tenant or a single-tenant piece of infrastructure from AWS. Like, do you have a unique DNS name you're hitting rather than a separate... I always say, like, Dynamo, you're hitting dynamodb.us-east-1.amazonaws.com and it's a multi-tenant service, same thing with S3. Whereas if you go provision RDS or Aurora, they give you, like, you know, some unique identifier dot aurora dot whatever. Or an NLB, for that matter, right? Like, when you provision an NLB, you get your own little endpoint that they can use to change to a different cell, different, you know, instance that might be routing your capacity as well. Yeah, but anyway, it sounds like my intuition is wrong. Like, you might have a service that gives you sort of a unique DNS entry, but that doesn't mean it's necessarily a single-tenant service that you're hitting, because it could be, you know, that DNS entry is routing to a cell rather than routing to, you know, your sort of RDS compute instance or something like that.
Is that right?
Yeah. I mean, at that point it's like, you know,
which control plane is managing those?
What are the deployment units? Again, for cellular architecture, it just comes down to reducing the blast radius and isolation.
Right.
So every service, you know,
is going to go about their own ways to achieve those objectives.
Yeah.
I feel like, I don't know if this is true or just intuition, but I feel like we've had fewer region-wide service disruptions in AWS over the last, let's say, five years.
Do you think that is true?
And if so, is that a consequence of cellular architecture?
Or maybe, hey, maybe a cell is going to be having trouble.
But it's rare now, it seems like, to have a full region-wide thing.
And like, is that related to cellular stuff?
Or is that just related to, you know, you keep doing COEs for 15 years and things are going to get just pretty hardened and things like that?
I don't know if you can pin it to a single thing.
Every day that AWS gets without an outage
is a day that should be celebrated
because the sheer scale at which AWS is operating at
and the sheer number of mission-critical workloads
that run on AWS, it's just really, really impressive.
And the 15 years of operational excellence, the COE,
the dive deep, the ownership of all the operators that go in and proactively mitigate issues before
they become problems that take down the world. There's a whole lot that goes into making AWS
what it is today. And I can't pin it on how it happens, but it is incredibly magical.
The level of availability that AWS has been able to pull off over the
years.
And there have been outages, but, you know, they've gotten better with every single one of them.
Yeah,
for sure.
Okay.
So we talked about cellular a little bit. You've also mentioned to me before that most people are doing cross-AZ wrong. I haven't gotten into this with you, but what do you mean by that?
Yeah, doing cross-AZ blindly is actually a very dangerous practice. So specifically, AZs don't go down that often. So you have to start with why: why do you want to go multi-AZ? And if the answer is because an AZ might go down, that doesn't happen as often.
What does happen more often is cross-AZ packet losses that get elevated or cross-AZ disconnects that happen.
They don't happen much, but they happen a lot more often than an entire AZ going down.
And then you start thinking, OK, what are the consequences of this design that I chose to go multi-AZ?
Well, two cents a gigabyte, that gets very expensive very fast.
But you add a millisecond or so of latency going across AZs, so now your performance is getting interesting. Then you start looking at performance at the tail. Because of those packet losses, you start to have availability issues as well, or, like, really bad, you know, TCP, which retransmits at 200 milliseconds. Like, things start to get really bad, and they will happen a lot more often across AZs. The more devices you're going across, the more, you know, hops you have, and the more likely you are to have one of those be
struggling. So you got to work backwards on the problem. If the problem that you're working
backwards from is multi-AZ, like resiliency, there are other patterns that you can use.
For example, I like to promote this notion of, you know, AZ specific swim lanes.
So a swim lane that is entirely contained in the AZ where the database, the cache,
and the web server are living, you know, entirely in one AZ. And, you know, sure, at some point you need to replicate the data across, but at least your cache and your web server can be living in the same AZ so that you can absorb some of those outages further.
Now, as soon as you do that, your performance, your availability and your cost have a meaningful improvement as well.
And it's much better than unnecessary hops across AZs,
which have their consequences.
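A minimal sketch of an AZ-specific swim lane, with placeholder endpoints: the web tier prefers the replica in its own AZ and only crosses AZs when the local one is missing or unhealthy.

```python
# Sketch of an AZ-specific "swim lane"; the endpoints are placeholders.
import os

REPLICAS = {
    "us-east-1a": "cache.us-east-1a.internal",
    "us-east-1b": "cache.us-east-1b.internal",
    "us-east-1c": "cache.us-east-1c.internal",
}

def pick_cache_endpoint(local_az: str, healthy: set[str]) -> str:
    if local_az in REPLICAS and local_az in healthy:
        return REPLICAS[local_az]          # no cross-AZ hop, no per-GB charge
    for az, endpoint in REPLICAS.items():  # fall back across AZs only on failure
        if az in healthy:
            return endpoint
    raise RuntimeError("no healthy cache replica in any AZ")

local_az = os.environ.get("AZ", "us-east-1a")   # e.g. read from instance metadata
print(pick_cache_endpoint(local_az, healthy={"us-east-1a", "us-east-1b"}))
```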
Yeah, okay.
Well, that brings up,
that brings me to my favorite question,
which I ask about a million people.
Whenever they bring up cross AZ costs,
what's going on there?
Is that a legit valid cost?
Is it sort of,
I don't want to say like rent seeking from AWS,
but is it like,
you know, that they are able to take advantage of that?
Like what's going on with cross AZ costs in your opinion?
So cross-AZ is actually quite an expensive endeavor. What we see at two cents per gig might look really, really expensive.
But what we have to appreciate is that AWS is absorbing the peak to average ratios there. And I can be sitting idle,
not sending any data across the AZs, and then I can start a multi gigabit per second workload and
push that through. And the rate that Amazon is giving me is not the sustained rate, right? It's giving me on-demand pricing. Now, what I wish I could do
is pay for a pipe. And, you know, in some cases, like if I don't have a very high peak to average
ratio, I would love to just, you know, buy a direct connect between AZs and just pay Amazon
on those. But I'm telling you, for almost every customer, that direct pipe will cost you more
than the two cents per gig. So it's easy to, you know, it looks really, really expensive,
but the underlying infrastructure that is required to give you that elasticity and that burst
is actually quite expensive. And the level of innovation that AWS has to do to make that as
seamless as possible is also quite expensive.
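Back-of-envelope arithmetic only, using the roughly two-cents-per-gigabyte figure discussed here (an assumed rate, subject to actual pricing): the on-demand model charges for bytes actually moved rather than for a pipe sized to the peak.

```python
# Rough cross-AZ cost arithmetic with an assumed $0.02/GB total rate.
GB_PER_SEC_AT_1_GBPS = 1 / 8          # 1 Gbps is 0.125 GB/s
CROSS_AZ_RATE = 0.02                  # $/GB total (assumed; 1c out + 1c in)

def monthly_cross_az_cost(avg_gbps: float) -> float:
    gb_per_month = avg_gbps * GB_PER_SEC_AT_1_GBPS * 3600 * 24 * 30
    return gb_per_month * CROSS_AZ_RATE

# A spiky workload that peaks at 5 Gbps but averages 0.2 Gbps pays for what it
# actually moved, not for a 5 Gbps pipe sitting mostly idle.
print(f"avg 0.2 Gbps: ${monthly_cross_az_cost(0.2):,.0f}/month")
print(f"avg 5.0 Gbps: ${monthly_cross_az_cost(5.0):,.0f}/month")
```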
Yeah. Yeah. I think that makes sense. I guess one other place I hear people complain about it is
like if they're running their own Kafka clusters or if they, maybe if they're even a provider of
Kafka clusters for someone else and they say, well, if you look at MSK, Amazon's managed Kafka,
managed streaming for Kafka or whatever that is, they don't charge for cross-AZ stuff. So it's not cost-competitive for us if we want to run it on EC2 compared to that.
Like, is that valid?
I mean, is that sort of priced into what's going on with MSK or what's going on there?
It's priced in and MSK reduces its peak to average ratio by being multi-tenanted too.
There you go.
As a service, you know, as a customer, you end up having a much lower peak-to-average ratio that you're, you know, throwing onto the network team, right? So it comes down to, maybe I'm a multi-tenancy parrot, but, you know, it's the same thing for S3, same thing for ElastiCache. If you look at the ElastiCache AWS account, their peak-to-average ratio is going to be a lot lower than if you pick a random high-utilization ElastiCache customer that's going like this. So the AWS services are better customers of the AWS networking stack than a single-tenanted workflow.
So that's kind of baked into it.
And then, yeah, it's baked into the pricing.
Like, ElastiCache is a 104% premium on your EC2 instance. Of course, that's covering some of the, you know, the network capacity that you would otherwise pay for. Furthermore, if you look at things like ElastiCache, you know, ElastiCache will absorb... like, it'll replicate across the AZs for you for free, but your gets, you still have to pay for. So you still have to pay for your half of the gets. You still pay one cent a gig for any reads that you send across AZs. And a lot of customers don't know to try to create that
AZ-specific swim lane that you can use to actually meaningfully reduce your AWS spend
if you do everything else the same, but just have your web server route to the local AZ
and save a bunch of money. Yeah. On that same sort of note, especially what I mentioned with
Kafka earlier, one thing I like to ask data infra founders is you are both a huge user of the cloud, but then also a competitor of the cloud because they have a competitive product.
How is that relationship?
Are the clouds pretty good partners on that stuff?
Is it tense?
What does that sort of look like, your relationship there?
We're certainly competitors, but cloud is also an enabler, right?
So many of the startups just wouldn't exist without the cloud providers.
You look at, let's go back 18 years.
How many tech infrastructure startups existed?
It's not just because there's more money and more innovation is happening.
The level of experimentation that can happen is meaningfully higher as well. Now,
of course, they have some advantages that we don't have, right? And, you know, and you rely on them
to continue to play fair. But at the same time, I have to remain appreciative of the fact that
they've created this ecosystem that I can use to innovate
and add value to my customers and make money for my shareholders.
Yep. Yep. Absolutely. Sort of on that same note, I've been thinking like, have we been
seeing enough improvements from the clouds in the last couple of years? Because I feel like
from '06, '08 to '14 or '18, maybe, it was just like gangbusters all the time.
I guess, have we seen enough pricing improvements?
We don't see many price decreases anymore.
Should we be seeing more given just the advances in storage and network
and CPU, all sorts of stuff?
Should we be seeing improvements there?
The EC2 prices might look the same,
but the capacity is getting better, right?
So CPUs are much nicer.
The network is much nicer.
Not just on the bandwidth side,
but just the packets per second that can be handled,
the latency.
So there's a whole lot that's coming in for free.
Now, that said, AWS and the cloud providers are able to test more price elasticity
because now they're no longer fighting to become the incumbent.
They are the incumbent.
So that allows them to charge premiums that they couldn't have charged 15 years ago.
All right.
So, but that also creates an opportunity for startups to show up in and compete
by making efficient use of the infrastructure.
And it's a nice symbiotic kind of ecosystem in that regards.
Yeah.
Yeah.
What about just in terms of like,
are we seeing enough hardware improvements?
And so, today's February 26th. Last week on Hacker News, there was an article called "SSDs are fast, except in the cloud." And it's talking about how, like, NVMes can be doing like 10 to 13 gigs per second of sort of read throughput, whereas, like, in AWS and Azure, you're getting maybe two or three gigs per second. I guess, is there more to that? I don't know enough about hardware to know if that's, like, a valid claim
or if something else,
like I wouldn't think that sort of the clouds
would just be holding it back
or not improving intentionally.
Like maybe there's not enough demand for it.
Maybe it's too expensive.
I guess like, I don't know.
Do you have any thoughts on like,
is the hardware improving as fast as you would expect?
Customers are always going to want things faster and cheaper.
That's the two axioms from Jeff Bezos, right?
Those are undeniable truths.
They'll always happen.
The good news is that there are multiple cloud providers that are vying for your business.
So if a technology existed and that was in high demand,
they would be trying to one-up each other and make it available faster.
But there's a whole lot of things that go into the equation.
You have to have enough scale.
You have to be able to get enough capacity.
You have to make it available to everybody.
And right now, the whole world is in a capacity crunch
because compute consumption and storage consumption
is just going through the roof with the AI revolution.
So I don't think any cloud has an incentive
not to make NVMes or faster drives available to you.
Now, there's been a bunch of really nice novel innovations
that have happened.
Like all the Nitro and the Graviton stuff,
like your memory is encrypted by default on those instances.
Your drives are encrypted by default.
So, you know, there's a whole lot of other things that are happening on the cloud providers.
And yes, the hypervisor gets in the way.
And yes, there's some things in the way.
But like, at the end of the day, if the capacity exists and if the innovation exists, like it will become available in the cloud for the masses.
How much do you know about Elasticsearch? What would you like to know?
I just spoke with someone from Elasticsearch and it was super useful.
They explained a bunch of stuff to me. But I wonder why we can't have
a more cloud-native Elasticsearch.
It kind of reminds me of Dynamo,
pre-DynamoDB in some ways.
Like just think about like it's kind of like they're going to shard your data and spread across these different nodes.
But like shard management is pretty manual and it's pretty hard to like increase and decrease.
Usually you have to do it by a factor, and it's just kind of a scary operation. And you're mostly on your own doing it, right? You're aiming for a shard size of, like, tens of gigabytes, 10 to 50 gigs generally. That seems Dynamo-ish in some ways, right? You also have, like, storage nodes that are doing everything, right? Like, a storage node handles requests and serves as, like, the request coordinator reaching out to all the other storage nodes, but it's also storing the data itself. And, like, hey, maybe some separation between, like, doing the scatter-gather.
Yeah, yeah, exactly.
Maybe doing, like, the request router and just separating that out a little bit more.
It's also doing, like, replication that is synchronous. Like, you scale up reads by having more read replicas, and that replication to those read replicas is mostly synchronous, unless you have, like, some lagging nodes and it'll maybe cut them off. But generally it's synchronous replication, even though you're buffering updates, because you're only flushing to disk every second or something like that.
It just seems
like
almost like a Dynamo-like system, but for
Elasticsearch where Dynamo wants to
route every request to a single partition.
And what if you made a Dynamo that's like, well,
we have to scatter-gather every request instead,
but we get all these other sort of benefits.
I don't know. It just seems like there's an opportunity for a better Elasticsearch there.
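For reference, a toy version of the scatter-gather pattern being described here: a coordinator fans the query out to every shard and merges the top results. Real engines (Elasticsearch, or a hypothetical "Dynamo for search") do this with far more care; the data and scoring below are invented.

```python
# Toy scatter-gather: fan a search out to every shard, then merge the top hits.
import heapq

# Each shard holds (doc_id, score) pairs for its slice of the index.
SHARDS = [
    [("doc-1", 0.9), ("doc-4", 0.2)],
    [("doc-2", 0.7), ("doc-5", 0.6)],
    [("doc-3", 0.8)],
]

def search_shard(shard, query):
    # Stand-in for per-shard scoring; here every stored doc "matches".
    return shard

def scatter_gather(query: str, top_k: int = 3):
    partials = []
    for shard in SHARDS:                      # in practice: parallel RPCs
        partials.extend(search_shard(shard, query))
    return heapq.nlargest(top_k, partials, key=lambda hit: hit[1])

print(scatter_gather("anything"))   # [('doc-1', 0.9), ('doc-3', 0.8), ('doc-2', 0.7)]
```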
So I used to write a bunch of Lucene code back when I worked at NASA.
And I'm very passionate about this particular space.
You're right that a Dynamo of Elasticsearch would be just revolutionary.
It is a pretty hard problem, but Dynamo solves hard problems.
And the fundamental differentiator here is that it's single-tenanted nodes, with a leaky abstraction that the clients have to deal with, customers as well as their SDKs.
And that is what causes the availability issue.
That is what causes the capacity crunches.
Now imagine a multi-tenanted Elasticsearch where the capacity management was done on your behalf
and the customers were kind of sharing each other's spikes, whether it's on compute storage or
network. That would be a pretty cool system. And I think it is inevitable that Dynamo will build it.
This is something I wanted to build in Dynamo for a very long time. And I think there's been other things to be done,
but I think it's inevitable.
I think this is not a decade away, you know,
but I think a Dynamo style Elasticsearch would be nice.
But what I would hope for is a built-in full text index
inside of DynamoDB. That, I think, would be magical, because then you rely on Dynamo for your core storage, and then you have these indexes, which happen to be Dynamo's biggest Achilles' heel right now. That would be the most magical system that I could imagine.
Yep. I mean, Mongo is basically doing that, right? Like, adding in these other index-y type things on top of that, you know, sharded, mostly key-based storage. But they're struggling with the same thing that Elasticsearch is struggling with. It's a hard problem, unless you get into that multi-tenanted environment with better capacity planning. Like, 99% of the Elasticsearch problems are capacity management.
If you get the capacity management right and you have enough resources,
you'll have a fundamentally better system.
Yep.
I know.
It is.
I don't know.
I'm just surprised there hasn't been sort of more traction on that.
Like there are a few like sort of managed SaaS search providers,
but they're just not quite...
I don't know.
None of them are in the realm of where Elastic is at.
Yep.
There's Algolia.
There's OpenSearch.
Amazon has a product called Kendra
for AI-enabled searches and things like that.
But no, the Dynamo of Elasticsearch would be huge.
I've been a huge fan of the ELK stack, by the way, for a very long time. So when we launched DynamoDB Streams, one of the key integrations was with the ELK stack.
So we made that as part of our launch because we thought the marriage of Dynamo and full-text
indexing is quite nice, but I do want it to extend beyond just a zero ETL thing. I want it to be
built into the database.
Yeah. Yeah. That reminds me, you brought up OpenSearch, and I know you've been
vocal about the use of sort of serverless applied to different AWS databases and things like that.
How multi-tenant are these? I would say all the, like, branded serverless databases that AWS has,
they seem to be like sort of similar architecturally because they have the same
flavor in terms of pricing model and just like how they sort of scale up, scale down,
scale zero, different things like that. I guess like how multi-tenant are those? Are they multi
tenant at any layer? Are they, I guess from what you can tell or from what's been publicly available about them, like what's going on there? So multi-tenancy is a really good, not definitive, but a really good
leading indicator for whether something is fake serverless or not. And, you know, if you figure
out multi-tenancy appropriately, you don't have to tack on serverless as a marketing term. You
will actually have a serverless service. If you're just using it for marketing then it's probably a single tenanted service i have yet to
see a true serverless service that is not multi-tenanted sqs multi-tenanted s3 multi-tenanted, S3 multi-tenanted, you know, Dynamo multi-tenanted.
Dynamo, when it wants to be, yeah.
Yeah.
And so what's going on with those serverless... you don't have to talk about ElastiCache Serverless, but maybe talk about OpenSearch or any of them. Like, are they mostly single-tenanted and just auto-scaling up and down? Or are they using Aurora storage, but still, the compute is sort of hard to scale?
I mean, you go down to the documentation of OpenSearch and Aurora Serverless, and they will tell you what an Aurora or a Neptune capacity unit, what instance it maps to. You can literally find official documentation that maps the specific units to instance types,
where I'm like, okay, now you're just calling an instance something different. A rose by any other name would still smell just as sweet, and an instance with a different name, like a capacity unit, is still a lot of operational pain.
Yeah, yeah, interesting. I guess, I don't know how much is out there about the new Aurora Limitless database. Did you see that at reInvent? Do you have any sense of what's going on there?
...easier for them to pull this off. And, you know, I would check out Caspian, which, I think it was Peter's keynote where he talked about that.
Like, you know, systems like that make me really excited that the world is moving towards
a multi-tenanted, you know, that we as a society are moving towards a more multi-tenanted
infrastructure.
Yep.
Yep.
That also reminds me, speaking of reInvent: S3 Express One Zone. I've asked a few people, and you've written some great stuff on it. What are your thoughts there? Like, where's a good fit? Or is it just, you know, an early entrant that needs to change? Like, what do you think about S3 Express One Zone?
Oh, I am so impressed by the S3 team,
that they started with a clean sheet of paper. They did not respect anything that was already set in stone, right?
New authentication protocol.
You know, single AZ, by the way, not multi-AZ, right?
It's actually ridiculously fast.
So it is an incredible product.
I am so proud of that team for pulling that off. It's the best AWS offering for an actual serverless cache, because you can write stuff and grab it out and you don't have to worry about anything. An official AWS one. Now, the downsides of it are very simple. If you have an object that you're writing a lot, you have to pay for that object, you know, for at least an hour. So if you have something like a counter, where you're incrementing it over and over and over again, you'll go bankrupt pretty quickly. But there's no capacity units or anything like that. So if you've got objects where the lifecycle is on the order of an hour, it is really, really good. And, you know, the price is actually not bad either. So I hope more people use it. And if you're creating that AZ-specific swim lane concept, the single-AZ S3 is quite beautiful.
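A quick back-of-the-envelope on that rapidly-rewritten-object downside, taking the "billed for at least an hour" point above at face value. The object size and rewrite rate are arbitrary, and none of this is actual S3 Express One Zone pricing, so check the AWS pricing page before leaning on it.

```python
# Assumption from the conversation above: every write to an object is billed
# as if that object were stored for at least an hour. These are not official
# S3 Express One Zone billing rules; the numbers are purely illustrative.

OBJECT_SIZE_GB = 0.1        # a hypothetical 100 MB object
REWRITES_PER_HOUR = 3600    # rewritten once per second
MIN_BILLED_HOURS = 1        # the claimed per-object minimum

# What you actually keep around over that hour: one live copy of the object.
actual_gb_hours = OBJECT_SIZE_GB * 1

# What the claimed minimum would bill you for: every rewrite accrues at
# least an hour of storage for its own copy.
billed_gb_hours = OBJECT_SIZE_GB * REWRITES_PER_HOUR * MIN_BILLED_HOURS

print(f"actual GB-hours per hour: {actual_gb_hours:.1f}")
print(f"billed GB-hours per hour: {billed_gb_hours:.1f}")
print(f"inflation factor:         {billed_gb_hours / actual_gb_hours:.0f}x")
# 3600x for something rewritten every second, versus roughly 1x for objects
# whose lifecycle is on the order of an hour, which is the sweet spot above.
```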
Yeah, that's interesting.
A few people, both the WarpStream guys and Nikhil from Materialize, they were both saying, hey, it's interesting. I think the two big hangups for them are just pricing, a little bit: it's still a little too expensive. And then they don't feel great about just being one AZ. They're saying, hey, in reality, we're going to write it to two AZs just to make sure we have it. And now, you know, that doubles your costs on that as well.
Oh, I don't think they're writing to two AZs. I think if they're telling you it's one zone, it's probably one zone.
No, no, no. They're saying... the WarpStream and Materialize people are saying, if we would use this, we would want to write it to two AZs to make sure, if one AZ goes down. And then, because of that, now we need to double our costs, and it's just more stuff we're doing, dealing with those failures and things like that. So because of that, yeah, I think they're excited about it, but still.
Things like WarpStream, I would keep, like, you know, they're very tail-heavy. So I would keep the tail stuff in a cache, like a Momento or local caches, and then use S3 for the backend. Because for that kind of workload, memory is fine. And it's not that expensive. Because you've got months of data that you're storing in S3, but your most recent data is what needs to be the hottest. You can keep it in memory.
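For illustration, here is a rough Python sketch of that read path. Everything in it is a stand-in: the DictCache and DictObjectStore classes and the segments/<offset> key layout are invented, not WarpStream's or Momento's actual APIs. It just shows the shape of keeping the hot tail in a cache while cold reads fall through to object storage.

```python
import time

class DictCache:
    """Stand-in for a real cache client; any get/set-with-TTL cache would do."""
    def __init__(self):
        self._data = {}
    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.time() + ttl_seconds)
    def get(self, key):
        hit = self._data.get(key)
        if hit is None or hit[1] < time.time():
            return None
        return hit[0]

class DictObjectStore:
    """Stand-in for an S3-like object store."""
    def __init__(self):
        self._objects = {}
    def put(self, key, value):
        self._objects[key] = value
    def get(self, key):
        return self._objects.get(key)

class TailHeavyLog:
    """Hot tail lives in the cache; the full history lives in object storage."""
    def __init__(self, cache, object_store, hot_window_seconds=300):
        self.cache = cache
        self.object_store = object_store
        self.hot_window_seconds = hot_window_seconds

    def append(self, offset, record):
        # The object store is the source of truth for the whole log...
        self.object_store.put(f"segments/{offset}", record)
        # ...and the most recent records also go into the cache, since that is
        # what consumers are overwhelmingly likely to read next.
        self.cache.set(str(offset), record, ttl_seconds=self.hot_window_seconds)

    def read(self, offset):
        # Tail reads are served from memory and never touch the object store.
        cached = self.cache.get(str(offset))
        if cached is not None:
            return cached
        # Cold reads (months-old data) fall through to the object store.
        return self.object_store.get(f"segments/{offset}")

if __name__ == "__main__":
    log = TailHeavyLog(DictCache(), DictObjectStore())
    log.append(0, b"first record")
    print(log.read(0))  # served from the cache while it is still hot
```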
Yep. Yep. Okay.
All right. Cool. We're running out of time.
So I got to close with...
We have six sort of rapid-fire questions that we ask everyone, so we'll go through these. If you could master one skill you don't have right now, what would it be?
So many. So, I would like to really get good at one thing, instead of trying to do too many things at once. So I would like to master one thing, whatever it is. Like, if I can master one thing, I would be very happy. I try to be too broad. And I would like to master one skill. I don't know what it is yet. And it's not pickable.
You're multi-tenanting your skills inside of you, right?
That's right.
That's an interesting one.
I haven't heard that one.
So that's good. What wastes the most time in your day?
I think driving is the biggest waste of time, because I can't, you know, do multiple things at once. And I never realized, until I started working from home, how much time we actually waste in driving. It has made me appreciate everything from getting my food delivered to my groceries delivered. If I don't have to drive, I can just walk everywhere and, you know, get healthier. So I try to do as much as I can on a walk and avoid driving as much as possible.
I cannot force myself to pay delivery fees. Like, it's just, something inside of me is just like, I can't do it. Like, I will order takeout and I will go pick it up every time, because I just can't do it. That's the only thing.
I made a mathematical model, man. Like, you should just try to allocate a very modest hourly fee to your time and then decide if the DoorDash fee is worth it or not.
The other thing is, I love podcasts. So being in the car and just listening to a podcast... like, I work from home all the time, so I don't get that much podcast time.
So then it's like, oh, I'll go drive for 15,
20 minutes and get some podcasting in.
But if you walked, if you walked instead
while listening to those podcasts,
you know, it'll cut down your healthcare costs
down the line and you'll recuperate
all of the delivery fees.
It's better for the environment too.
That's true.
That's true.
I don't drive very much,
but I can't, I just can't do delivery.
All right.
Next one.
If you could invest in one company, and it's not Momento.
It's not a public company.
I want to know like a private company.
If you could invest in any one private company, what would it be?
Yeah.
I'm personally very excited about WarpStream. I think they've got a good product. I heard about it, you know, before the announcement came out,
and I got very excited because they are solving
that AZ-specific swim lane problem.
The AZ pain.
And just the operational side, they're making it so much simpler. Like, those Kafka brokers now, they're stateless, right? You can just scale them up, scale them down. All the storage is in S3. You don't worry about that.
Like, that's pretty interesting. Yeah. So I like that answer. You're the second person to give that answer in the last week. So I think they've got something going on. What tool or technology can you not live without?
My phone. I spent a lot of time on my phone, and that's not
just to watch like YouTube videos or podcasts, but when I travel,
I try to not lose a second of productivity. And that's why, that's the other reason why I don't
like to drive. Like if I'm on a train or on a plane, I am constantly on and it keeps me connected.
It is, it's a double-edged sword for sure. But having the phone, and I've recently learned that it's the thing that you spend the most time with if you're in tech.
So it's worthwhile to have something with the best battery and so forth available to you as well.
Yeah, for sure.
Which person influenced you the most in your career?
I took a lot of lessons and learnings from
one of my old bosses, Raju Gulabani. He was never easy on me and always told it to me like it is.
But Raju used to run all databases and AI at AWS. And he's a very tough manager to have. And I really enjoy having, you know, tough managers.
And he's kind of helped me understand
how to do product definition,
how to think more aggressively
about the competitive landscape
and how to, you know, focus on the product side.
And he's still very, very generous with his time
and, you know, mentors me,
even though he's, you know,
not even working full-time anymore. So I've always been appreciative, but there's a lot of people that have
influenced me and basically provide free mentorship that I haven't really earned yet. You're one of them. And, you know, it's always nice to have people that are just there to help you out.
Yep, yep, cool.
I feel like you need a hard,
a pretty strict person in charge of databases, right?
You wanna make sure that those are working well.
You don't want something like that.
So that's, yeah, that's a great one.
All right, last one.
AI, of course, very popular over the last year
and a half or so.
What is your probability that AI equals doom
for the human race?
Zero.
Zero.
Nice.
I like it.
Optimism.
Yeah.
Technology can be doom for humans too, but AI, like, it's going to create more jobs. It's going to make us so much more efficient. It's going to accelerate innovation. Like, we're so far away from the doom that it's just not worth worrying about.
Yeah. Okay.
So you're not bombing data centers
or anything like that right now?
No, absolutely not.
Yeah, so, I agree with you.
I'm an optimist.
I think it's pretty exciting what's going on.
So Khawaja, this has been great. I always love talking to you, and it's good to get some of this recorded so other people can hear it as well. If people want to find out more about you, more about Momento, where should they look?
Gomomento.com, and follow Khawaja on the Twitters as well.
We'll put both of those in the show notes. But yeah, thanks for coming on. It was great chatting with you.
Thank you so much for having me.