Software Huddle - Multi-tenancy with Khawaja Shams
Episode Date: April 23, 2024. Today's episode is with Khawaja Shams. Khawaja is the CEO and co-founder of Momento, which is a serverless cache. He used to lead the DynamoDB team at AWS and now he's doing Momento. We talk about... a lot of different things, including multi-tenancy and cellular architecture and what it's like to build on AWS and sell infrastructure products to end customers and just a lot of other really good stuff. We hope you enjoy this episode. 01:12 Introduction 03:38 Multi-tenancy 08:13 S3 and Tigris 15:09 Aurora 19:11 Momento 31:21 Cellular Architecture 41:16 Most people are doing cross-AZ wrong 52:23 Elasticsearch 01:03:08 Rapid Fire
Transcript
Discussion (0)
I am one of your biggest fanboys, the author of the DynamoDB book.
I chased you down at reInvent.
It's like, hey, Alex, it's me.
What multi-tenancy is and why it's so important?
The ability to share underlying resources so that you're not doing repetitive or duplicative work.
How much do you know about Elasticsearch?
What would you like to know?
If you could master one skill you don't have right now, what would it be?
So many.
Hey, folks, this is Alex.
And today's episode is with Khawaja Shams.
Khawaja is the CEO and co-founder of Momento, which is a serverless cache.
And he's a friend of mine.
I just love chatting with him because, you know, if I ever have a question about distributed systems or AWS, he's a really good resource on that.
He used to lead the DynamoDB team at AWS and now is doing Momento.
And we just talk about a lot of different things in here, including multi-tenancy and cellular architecture and what it's like to build on AWS and sell infrastructure products to end customers and just a lot of really good stuff here.
So I love this episode.
Hope you do too.
As always, if you have any questions, comments,
guests you want on the show, anything like that,
feel free to reach out to me or to Sean.
And with that, let's get to the show.
Khawaja, welcome to the show.
Thank you, Alex, for having me.
Always a pleasure talking to you.
Yeah, absolutely.
So you are the CEO and co-founder at Momento.
But I'd say more importantly,
you're a friend that I met two years ago, I think.
And you're just like the person I go to when I have deep systems questions, distributed systems, or how AWS works, or all these sorts of things.
So I'm excited to sort of get you on here and take some of our conversations that we have or some of the dumb questions I have and just make them public.
Because I think a lot of people should know this stuff.
But with that background, maybe just tell us a little bit about you and what you're doing.
Yeah, I mean, it's humbling to hear you describe me
like that because I am one of your biggest fanboys,
the author of the DynamoDB book.
I chased you down at reInvent.
It's like, hey, Alex, it's me.
So I appreciate your kind words.
I've been following you for a lot longer.
So my name is Khawaja.
I started my career building cameras on board the Mars rovers, got tired of hardware because
I'm impatient and I like to see things happen right away.
That instant gratification really drives me.
So I started doing image processing for all the Mars rover images that were coming back,
got impatient again with all the data centers and dealing with ordering hardware and whatnot, and became one of
the earliest AWS production customers. This was back in the 2008, 2009 timeframe, back when all
of AWS had one solutions architect. I got to work very deeply with a bunch of AWS teams and have just
been part of this journey of cloud adoption
since the very beginning and inspired by it.
2013, I joined Amazon.
I ran DynamoDB, and then I ran all the video services for AWS Cloud.
Gotcha. Yep. And what are you up to now?
Today, I run a small startup called Momento.
We build a serverless caching product. We offer key value
store data structures and low latency PubSub. The underlying hypothesis just goes back to the
instant gratification. I think people should build things and infrastructure should just get out of
the way. That's what inspired me about AWS and cloud. That's why I left NASA to join AWS.
And I really think if people can make developers
more productive, they can really change the world.
So that's what we try to do at Momento,
specifically narrowing in on caching
and low latency messaging.
Gotcha, absolutely.
So one thing that you've brought up a number of times
when we've chatted, I've seen you write about it
and all this all over the place, is this idea of multi-tenancy and just like how powerful that is in systems and infrastructure.
So maybe just give folks an overview of what multi-tenancy is and why it's so important.
Yeah, multi-tenancy is the ability to share underlying resources so that you're not doing repetitive or duplicative work.
And, you know, just to be clear, where I go for my inspiration on multi-tenancy is a set of Marc Brooker's blogs. Marc Brooker, Andy Warfield.
They've just done incredible work just explaining why multi-tenancy works really, really well
at the infrastructure level.
But it's all around us in life as well.
So, you know, starting with, like, when you provision
an EC2 instance, you're sharing the data centers, you're sharing the rack, you're not getting an
entire rack. So whether you like it or not, multi-tenancy is in your AWS. If you're not even
on AWS, you have your own data center, well, you're probably not producing your own power.
So multi-tenancy is there. If you are drinking water that you're
not collecting from rain, that's coming from a multi-tenanted source. Same thing with the internet
inside of your house. So when you look all around, multi-tenancy surrounds us and it is the way that
people make consumption of resources more efficient, which results in better availability, better performance,
and just an overall better consumer experience.
That's multi-tenancy at a high level.
Yeah.
And I can sort of like get that.
And I think most people would get that and be like, hey, yeah, I don't want to go, you
know, grow all my own food or gather my own water or that sort of thing.
But I think like where it would sort of break down or surprise people or like engineers and things like that
is you mentioned EC2 and that makes sense too.
I don't want to run my own data center.
So it makes sense that someone else does.
I have my own EC2 instance.
But I think then you like at DynamoDB,
at Momento are taking that a step further
where it's like, hey, you're not even renting your own instance.
You're like getting a shared service where there's a bunch of instances
and all customers are sharing all this stuff
in this multi-tenant system.
Yeah, in the early days, it was kind of great
that you didn't need to rent your own data center space
to get access to one more instance.
It was available to you instantly, right?
That's why they were called instances.
So the EC2 instances were available to you instantly
without having to build things.
Now, that was fundamentally a much more efficient way
to consume resources.
But then a few years down the line, what we realized is that you can make
things even more efficient. So when you run a Lambda, right? So even with EC2, what people
were doing is they would provision an instance, but even that instance may run idle most of the
time. And if you are sharing capacity at the instance level, it leaves room for further
efficiency. So some instances might be running
really heavy on CPU. Some might be running really heavy on memory. Some might be running really,
really heavy on networking. And some might just be idle most of the time. So if you zoom in,
and if you increase the granularity at which the multi-tenancy is occurring, lots of optimizations come your way.
So as an example, when you provision,
when you start to use S3,
you're not provisioning a bunch of machines
and drives for your S3 bucket.
You don't start by saying,
how many S3 capacity units do you need for your bucket?
You literally create a bucket and you start writing. And that
allows AWS and any, you know, whoever the storage providers are, it allows them to manage
utilization of that capacity at a much, much, you know, finer grain. Because now they can make sure
that, you know, if some drives are more consumed than others, they can kind of spread the load, and you don't have, you know, provisioned S3 hard drives that are just sitting there idle across millions of different customer accounts. So that's where, you know, you zoom in and you start doing capacity management at a granularity that is finer than an instance. And this is why it's really, really hard to beat S3 for really low-cost, incredibly durable storage.
Yeah. Speaking of, did you see that there is someone trying to beat S3? There's, like, a new S3 competitor, man. Tigris, I believe. It came out just, like, last week. Have you seen this?
Yeah, I mean, Tigris is great, by the way.
And there's been lots and lots of people trying to reproduce and replicate S3 and also building
on top of S3, right?
So from the very, very early days, you know, there were things like the Swift object store.
And then we got, you know, Eucalyptus was trying to do things beyond EC2.
And then, you know, OpenStack had a whole bunch of capabilities.
MinIO showed up.
So there's a whole lot of ways that people have tried to reproduce that infrastructure.
I think what Tigris is doing, and I think where they're right to optimize, is building on top of the shoulders of giants.
You don't have to reinvent S3 to actually add a whole lot of value
to the S3 customers.
You can rely on that multi-tenanted,
incredibly efficient, incredibly durable,
incredibly available fleet,
and then just enable customers
to utilize that fleet even better
for their specific workloads.
Yeah, yeah.
Okay, so going back to what you're saying with S3,
when I think of like, you know,
this new sort of multi-tenanted era, what we're talking
about here is if I consider something like RDS, and let's say I have an RDS database, and for
whatever reason, I really need to, I have access to Werner Vogels, and I say, show me exactly where my RDS server is. Theoretically, he could find that, and he could take you to a data center somewhere and be like, okay, your data is on this specific server rack right here. Right? Like, he could sort of, there's like some control plane stuff that's outside of that, but generally he'd be like, here's your, this is your RDS database, technically. Whereas, like, even if you had access to, you know, Bezos and Andy Jassy and Werner Vogels, and you said, show me where my S3 bucket is,
They couldn't do that because it's like, you know,
it's split across all these different machines that are in different data centers,
like all sorts of things.
And like, you know, they could show you a server
and say like, hey, this one has some chunk of your data on it
along with a hundred other customers' data on it
and stuff like that.
But it's not like a specific instance that you sort of have and is yours in some sense, even if it's managed for you. It's like truly just all over the place. Is that a useful way to think about it? Or is that wrong in so many ways?
Yeah, I mean, I think all over the place is a good way to think about it. And that's actually where your blast radius is a lot lower. Like, all over the place is a feature, not a bug, right?
Your resources on S3 are peanut-buttered across millions of drives.
So Andy Warfield at the Fast Conference recently
gave this astounding statistic.
Tens of thousands of AWS accounts exist
where their buckets are spread over a million drives
or on millions of drives.
Let me repeat that.
Tens of thousands of customers have buckets spanning millions of drives.
Like that's bananas.
Yeah.
When a particular machine fails, it's not like all your data goes away with it.
And so that peanut buttering, why is that important?
Because it allows you to absorb burst.
Bursty writing, bursty reading, bursty network, right?
All of that.
Like you want to create a million small objects like that.
You're not waiting for new instances to pop up and be allocated to you.
They're just there because they're just going to get peanut-buttered across all of S3's data plane.
And that is very, very powerful.
So it comes down to how quickly can you kind of absorb the burst.
And in a single-tenanted environment, the only way you can absorb the burst is by over-provisioning
and ridiculous amounts of over-provisioning.
In a multi-tenanted world, you're over-provisioning a little bit, but you're over-provisioning at a fleet-wide level.
And every small customer is kind of helping you smooth out your peak to average ratio.
And driving down the peak to average ratio helps increase the utilization, which helps you decrease your costs and also optimize the availability and performance for your overall fleet.
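To make the peak-to-average point concrete, here is a minimal, hypothetical simulation (not drawn from AWS or Momento internals) showing how pooling many independent, bursty tenants flattens the combined peak-to-average ratio; the workload shape and numbers are invented purely for illustration.

```python
# Hypothetical sketch: aggregating many independent, bursty workloads flattens
# the combined peak-to-average ratio, the statistical effect behind
# multi-tenant capacity pooling described above.
import random

def peak_to_average(series):
    return max(series) / (sum(series) / len(series))

random.seed(42)
minutes = 1_000

def bursty_workload():
    # Mostly idle, with occasional large spikes: a stand-in for one tenant.
    return [random.choice([1, 1, 1, 1, 2, 3, 50]) for _ in range(minutes)]

single = bursty_workload()
pooled_10 = [sum(vals) for vals in zip(*(bursty_workload() for _ in range(10)))]
pooled_1000 = [sum(vals) for vals in zip(*(bursty_workload() for _ in range(1000)))]

print(f"1 tenant:     peak/avg ~ {peak_to_average(single):.1f}")
print(f"10 tenants:   peak/avg ~ {peak_to_average(pooled_10):.1f}")
print(f"1000 tenants: peak/avg ~ {peak_to_average(pooled_1000):.1f}")
```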
Yeah. Yeah. And that peak to average.
So I think I saw Andy Warfield do that S3 talk,
and he sort of showed that on a graph.
Like, hey, here's an individual workload
and what the peak to average, huge bands.
Now here's if you put 10 of them together, smaller bands.
And here's if you put 1,000 of them together,
and it's like almost a flat line, right?
Like there's not that much difference between that
because once you aggregate those out,
they really smooth it out quite a bit. So, I mean, I like multi-tenant. Like, I like Dynamo, right? And so I'm familiar with multi-tenancy: S3, Momento is that same way. But why don't we have more multi-tenant systems? Like, I feel like we still have a ton of sort of data systems that aren't multi-tenant. And is there sort of just, I don't want to say databases, but certain patterns or things like that, that are harder to do in a multi-tenant way? Whereas, like, you know, the splittability of S3 and Dynamo and Momento caching makes it easier to multi-tenant. Or, like, why don't we have more multi-tenant systems?
Multi-tenancy is hard. It's really difficult to get it right. You have to avoid noisy neighbors, you have to have the right throttles, you have to manage utilization. And then it's really, really hard as a business to be multi-tenanted, because, you know, when you're selling to the world, when you're selling efficiency to the world, multi-tenanted systems can, you know, cut down your resource consumption by 90%. And it is really, really hard to build a service that takes, you know, four times longer to get to a billion dollars in ARR. The early services like RDS and ElastiCache, it's easier for them to get to very, very large revenue numbers. Because nobody's doing this willfully, right? But if you have a lot more idle capacity that people are buying, then guess what? They're buying 10 times more capacity than they need. So from a business perspective, it is really, really hard to justify more multi-tenanted services.
It's not impossible, but it is harder. From an engineering perspective, it is much, much more difficult to build a multi-tenanted system to deal with all of the
isolation. But over time, everything is going to go towards multi-tenancy. It's just a matter of
who's able to take the leap first and kind of get there. And we see transitions happening in this. So, you know, RDS, single-tenanted, sure.
Aurora, the storage system underneath the covers,
it's completely multi-tenanted.
So sometimes it takes longer to, you know,
to get to a point where the competitive landscape
is, you know, competitive enough
that you have to go down that path of more efficiency.
But it is a little bit harder to kind of bootstrap and kickstart a multi-tenanted environment.
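As a rough illustration of the "right throttles" mentioned above, here is a minimal per-tenant token-bucket sketch; the class, rates, and limits are hypothetical and not how any particular service actually implements noisy-neighbor protection.

```python
# Hypothetical per-tenant token bucket: one way a multi-tenant service can keep
# a noisy neighbor from consuming the shared fleet's capacity. Names and limits
# are illustrative only.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec          # steady-state requests per second
        self.capacity = burst             # how much burst we tolerate
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # caller should reject or retry later

# One bucket per tenant, created lazily on first request.
buckets: dict[str, TokenBucket] = {}

def admit(tenant_id: str) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate_per_sec=100, burst=200))
    return bucket.allow()
```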
Gotcha.
How, I don't know, how much do you know about and how much can you talk about
Aurora's storage system?
Like what's going on underneath there as sort of like how it's working in a multi-tenant way
or just like how it's working generally?
Yeah.
I mean, one of the beautiful things about Aurora is that it decouples the compute from the, from the storage.
So you can have multi-AZ replication without having, you know, Aurora nodes in multiple
AZs. That itself is really efficient. So if you look at RDS, the only way you're going to get
multi-AZ replication is by having two RDS instances, one in each AZ.
That's because it's just those two instances that are, you know, doing your replication for you.
And by the way, if one of those goes down, well, that's 50% of your fleet.
Gotcha.
And sorry, so you're saying Aurora can have multi-AZ storage without Aurora nodes in multiple AZs, without multiple Aurora compute nodes,
but they do have multiple storage nodes in there.
But it's like they can, yeah, they have storage nodes all over.
Right. No customer-specific compute nodes, right?
Without the customer-specific compute nodes,
because they decouple the storage and the compute engine.
Now, that storage system is probably one of the best pieces of technology that's been built inside of AWS.
But it is fundamentally very, very powerful because guess what?
The compute and database is nice to do queries and so forth,
but I think people care a lot more about, hey, please don't lose my data
and please don't lose the transactional correctness of my data in a database.
And decoupling that and making that
really crucial part multi-tenanted is incredible because now if you want to bring up a new host,
well, if the storage system already has the data, then you don't have to wait for that node to
warm up before it's available. You don't have to wait for it to bring in an entire terabyte
of data. So, you know, like I think over time everything is going to go towards multi-tenancy
anyway. And over time you might get into a situation where, you know, EC2 and, you know,
Fargate and things like that, the start times will start to decrease as well.
So as the instant start times can decrease,
then you get a lot more efficiencies
regardless in terms of being able to optimize.
But that's just better multi-tenancy
at the EC2 level as well.
You still need to decouple the storage side, I think,
and make that part multi-tenanted on everything else.
Gotcha. Okay. So if I understand it correctly, like when I think of Dynamo, it's like,
it's multi-tenant up and down the stack. There's like no aspect of that. When you provision something in Dynamo, there's like nothing that is yours. Like the request router is multi-tenant,
the metadata store, the transaction coordinator, the storage nodes, all that is multi-tenant.
Whereas with Aurora,
storage layer is multi-tenant, but then compute on top is going to be customer specific, whether that's your main writer node, whether that's your read replicas, those are going to be
sort of customer specific nodes. That's right. And there's multi-tenancy,
by the way, up and down. EBS is a great example of multi-tenancy.
People don't even realize they think they're getting their own drives,
but behind the scenes, there's a whole lot of optimizations that are happening.
They're not going and provisioning individual nodes to give you that 10 gig drive.
And you can imagine optimizations where people provision a lot of EBS instances that are like 10 gigs
and they use up less than a gigabyte there, right?
So can you imagine the amount of efficiencies
that might be available for somebody
to take advantage of?
Not saying any of those optimizations
have been done or not,
but like it's a very natural optimization, right?
To not allocate the unprovisioned, or the provisioned-but-not-used, blocks in a block storage system.
So yeah, things go down this path one way or another.
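Here is a toy sketch of that thin-provisioning idea, assuming nothing about how EBS is actually implemented: the customer sees a fixed-size volume, but backing storage is only consumed for blocks that have actually been written.

```python
# Illustrative only, not how EBS works internally: a thinly provisioned volume.
class ThinVolume:
    BLOCK_SIZE = 4096

    def __init__(self, size_bytes: int):
        self.size_bytes = size_bytes        # what the customer provisioned
        self.blocks: dict[int, bytes] = {}  # only written blocks use real space

    def write_block(self, index: int, data: bytes) -> None:
        assert index * self.BLOCK_SIZE < self.size_bytes
        self.blocks[index] = data

    def read_block(self, index: int) -> bytes:
        # Unwritten blocks read back as zeroes without occupying storage.
        return self.blocks.get(index, b"\x00" * self.BLOCK_SIZE)

    def backing_bytes_used(self) -> int:
        return len(self.blocks) * self.BLOCK_SIZE

vol = ThinVolume(size_bytes=10 * 1024**3)   # a "10 GiB" volume
vol.write_block(0, b"hello".ljust(ThinVolume.BLOCK_SIZE, b"\x00"))
print(vol.backing_bytes_used())             # only 4096 bytes actually backed
```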
Yeah, interesting.
Okay, so this might be a good time to talk about Momento, how that fits into multi-tenancy.
So you said Momento is a cache, also like low-latency pub/sub messaging.
Maybe start with the cache aspect.
Like, what aspects of a cache... I think people probably mostly think Redis, Memcached. Like, how close are you to those sorts of things? And then, like, how does multi-tenancy fit into this?
Yeah. So our job at Momento is to reduce the number of boxes in your architectural diagram.
And we do end up, you know, swapping out a lot of memcache and Redis.
And what happens in the Redis and memcache scenarios is you don't have one block for your cache.
You have, you know, a cluster.
You have shards.
You have replicas, and you have,
you know, the number of replicas that you need based on your environment and which AZs they're
in and all of that. Like that's a very, very leaky abstraction that leaks all the way to the SDKs
that are now aware of the end-to-end server-side topology. So when you auto-scale and you go up, well, guess what?
Your SDK has to become aware of that.
So we start with a multi-tenanted storage fleet that is basically,
instead of allocating the capacity that a customer is asking you to allocate,
we do it on demand and we peanut butter that capacity across the fleet. Then on the
request side, we have an API gateway that basically does the authentication, does the TLS
termination, and then routes the data to the right layer. Now, this capacity is also completely
multi-tenanted, but it has a couple of other features. It's
got a web server built in and it has fine-grained access control, which allows customers to
connect their mobile devices or their web browsers directly to Momento, bypassing a whole bunch of
API layers that you would otherwise need to provide. So if you wanted to build a PubSub
system with millions of subscribers,
you just talk to the Momento web servers
and boom, you're covered.
It also has a whole bunch of like fan out capabilities,
like it can, you know, hash hotkeys and so forth.
And all that capacity is sitting in a warm pool
on behalf of customers.
So if any one of those customers has a spike,
you can kind of absorb those spikes based on the capacity that is available on the storage side, as well as
on the request routing or API gateway side. Okay. Okay. Interesting. And I think as part of this,
one thing you've mentioned offhand before, I haven't dug in far enough on it, is like the
benefits of having a deeply integrated control plane and data plane.
Maybe talk about what you mean by those quickly
and then why it's useful for them to be so deeply integrated
or how you have a deeply integrated Momento, what that means.
Yeah.
So in the early days of AWS,
Jeff Barr used to have this awesome slide that used to talk about
you have two options when you're provisioning infrastructure.
You can kind of under-provision, and then you have angry customers, right? Because you have this usage pattern that looks like this. Or you can over-provision, and then you have an angry CFO, right?
And what happens when you have a spike is you might have an angry both.
Like you might anger customers and
your CFO, because you had all this capacity that was provisioned this whole time, but it wasn't there when you needed it, right? So the integrated, you know, data plane is a set of capacity that's made available on the storage side, as well as the ability to extract that storage
at a very, very high TPS.
But the control plane comes into play
to make sure that the fleet-wide health
is managed appropriately.
It's the thing that is making sure
that you have enough capacity across the fleet.
You have enough capacity for every single cache
that is out there.
And it is the thing that informs our API gateway layer
with the latest topology changes for any given cache. This, you know, tight integration between
the control plane and the and the data plane allows us to make our clients much, much simpler,
because now they don't have to worry about the server side topology. And we don't have to worry
about different clients having different ideas of what the state of the world looks like.
So we can make changes really, really fast behind the scenes
because we have a tight control over the messaging
to the API gateway layers.
And we can be assured that data is getting routed immediately
with the latest topology in mind.
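A minimal sketch of what a tightly integrated control plane and data plane could look like, with invented names and no claim about Momento's actual implementation: the control plane pushes topology changes to the gateways, so client SDKs never learn the server-side layout.

```python
# Hypothetical sketch (invented names, not Momento's implementation): the
# control plane owns fleet-wide topology and fans changes out to every API
# gateway, so client SDKs stay simple and never see the server-side layout.
import zlib

class ApiGateway:
    def __init__(self) -> None:
        self.routes: dict[str, list[str]] = {}   # cache name -> storage nodes

    def apply_topology(self, cache: str, nodes: list[str]) -> None:
        # Called by the control plane whenever capacity moves or scales.
        self.routes[cache] = nodes

    def route(self, cache: str, key: str) -> str:
        nodes = self.routes[cache]
        # Stable hash so every gateway picks the same node for a given key.
        return nodes[zlib.crc32(key.encode()) % len(nodes)]

class ControlPlane:
    def __init__(self, gateways: list[ApiGateway]) -> None:
        self.gateways = gateways

    def rebalance(self, cache: str, new_nodes: list[str]) -> None:
        # A fleet-wide decision made centrally, then pushed out immediately.
        for gw in self.gateways:
            gw.apply_topology(cache, new_nodes)

gw = ApiGateway()
ControlPlane([gw]).rebalance("orders-cache", ["node-a", "node-b", "node-c"])
print(gw.route("orders-cache", "user#42"))   # the client never performed this step
```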
Yep, yep.
I like that. I think, you know, man, I'm just picking something, but, like, the fact that that control plane and data plane is so tightly integrated, I would compare it to, let's think like RDS, right?
Where there's an RDS control plane where I go,
maybe I go to the console or I use CloudFormation or something like that.
And I say, provision me a RDS instance.
And there's sort of a control plane that manages that and spins it up.
And then the data plane is the instance itself, which I'm doing operations against.
But then if I start hitting that too hard, you know, it's not going to like know how to help fix that in any way.
Right.
If I say like, if I'm like overloading my database, maybe I'm hitting a certain key too much or a certain record too much or something like that.
There's not much I can do.
I have to go to the control plane and say, hey, go do this action.
Whereas if I think of DynamoDB and you look at adaptive capacity and things that they're doing where they realize there's a specific partition where that item's just getting hammered. A couple items are getting hammered and they can sort of split those partitions into different ones because there's
like that connective tissue between the data plane and the control plane. And that data plane,
it's like serving these requests, but it's also saying to the control plane, hey, like,
we're hitting some issues here. It's like these keys that are getting these issues. And now the
control plane is able to, like, adapt to that sort of thing. Am I understanding that correctly, or am I way off base here?
No, I think you are, and the way I like to frame it is, there are shallow control planes that are mostly operating at a compute level, right? They're like, okay, does this have enough compute, does this have enough storage? That's it. And they don't have any idea of how you're distributing the data across your multiple shards. That's not up to those shallow control planes. They're blissfully unaware of whether you have hot keys. They're unaware of what your shard utilization is. Like, if one shard is busier than another because of storage, they don't care. That's not the shallow control plane's problem. Now, a deep control plane is
operating at the resource level. So for S3, you know, I've got to make sure that all buckets, all customers, have enough capacity, right? It doesn't matter how many shards are in their S3 bucket. That's the S3 team's problem. Same thing for Dynamo. You can have one of your DynamoDB partitions get really ballooned up, but that's not your problem, because Dynamo will automatically, you know, split for space, and boom, you got that split. So to me, it's more about deep control planes that are aware of the primitive on which they're offering a service, as opposed to a shallow control plane that's built for compute and storage only.
Yeah. When we're talking about, like, open source data infrastructure, databases,
things like that, like, listen, I like open source. I think it's useful, but I think it also
limits, I guess, like how deep that integration can be. Like, do you sort of agree? I just think
of like the switch from original Dynamo from AWS, from amazon.com, right? Where each team was sort
of running its own Dynamo
and they had to like simplify it operationally
in some ways to make that work.
And then like DynamoDB itself,
where they're like, hey, we're running it for you.
It's not gonna be open source.
It's not gonna be provided to Amazon.com teams.
And because of that,
we can have like this deeper integration between that.
Like, do you think that's a,
is it gonna be like doable for an
open source database to have that sort of deep integration between data plane and control plane
or is that something that can really only happen when you have, you know, a more proprietary database? Does that make sense?
Yeah, I think it's inevitable. I think people often focus on the data plane side of open source, right? And if you look at all these open source companies,
sure, I can give you my data plane and then you can figure out how to run it
or they vend a managed control plane service.
And that's usually proprietary because a lot of the value is not the
actual data plane. It's the management of the data plane. It's the on calls that are
keeping the fleet healthy. It's like when you buy a car, if I made your car free,
it actually doesn't help you as much because a lot of your costs, unless you're
buying really fancy cars, but if you're like an average American, most of your cost is
actually operational.
It's the gas that you're putting into that thing, right?
And the insurance that you're paying for and the parking spot that you're paying for if
you're living in Boston or something like that, right?
Like the operational cost is where the meat is.
Like, so sure, like you can have the data plane, but it's the control plane that ends up making the bigger difference.
I don't think they are mutually exclusive.
Like you could open source control planes, but that's generally quite rare.
Yeah. And it's kind of weird, because it's like, hey, if I'm going to go run my own database, it's probably going to be in more of a single-tenant manner. To take advantage of the integrated control plane and data plane, you probably need multiple tenants where they're sharing capacity in some sense. And then you could be over-provisioned a little bit, but, like you're saying, spread it around and different things. So it's probably just, I don't want to say weird incentives, it's just weird. Yeah, I just think it's harder to make a multi-tenant system open source. Like, do you know any examples of multi-tenant systems that are open source? They probably exist, but I can't think of one off the top of my head.
You can do multi-tenancy on your own, right?
So Momento, for what it's worth, we're big believers in cellular architectures.
And we have this really interesting combination that gives the best of both worlds to customers where we can do private multi-tenancy, where large customers are large enough that they have enough workflows within their organization
that they can kind of absorb the load
between their own workloads.
And they don't need to, you know,
compromise any security or data mixing,
you know, by sharing a cell with different customers.
And you'd be surprised, like, you know, even
as soon as customers are running dozens of instances for their ElastiCache plus their
web servers, like, a private multi-tenanted cell can actually bring meaningful efficiencies into
their ecosystem. And if you really live by the cellular architecture,
like all you're trying to do
is improve your resource utilization.
And multi-tenancy is just a means
to improve resource utilization.
And if you do multi-tenancy at a smaller scale
with just that customer's account,
that's still meaningful efficiency gains for them.
Yep, yep.
You mentioned cellular architecture,
which is something I hear you and just every AWS person talk about all the
time,
but I still feel like it's like under talked about outside of AWS.
So like,
what is cellular architecture?
Yeah.
So cellular architecture got evangelized inside of AWS as a means to reduce
blast radius for services.
So we have very large services at AWS,
and you don't ever want to have a regional failure
for one of those tier zero services.
And the idea was that instead of having one big regional deployment,
you would have lots of small ones,
and then you would shuffle shard the customers between them
so that you, you know, reduce the blast radius. If any one of those cells goes down, it's not like you have a regional outage for every single AWS customer. AWS regions have gotten quite large now, right? So regional failures have catastrophic consequences on our economy, for that matter, right?
Yeah, yeah.
So that's how it started, but then there's a lot of other benefits around, you know, scale, because a given cell is not going to be, you know, meaningfully large. So then it becomes a unit of deployment as well. And the person leading it at Amazon was Peter Vosshall, who was the first distinguished engineer at Amazon, and he really led the way in making sure that all the services, especially the new ones that were coming out, have been cellularized. And we went as far as saying the internal company goals were that a service ought to be able to whip up a new cell in less than four hours. So go all in. Like, you can only do cellular architecture if you're really good at infrastructure as code. Like, your entire cell has to be, you know, ready to go, where you can just click something, deploy it, and boom, four hours later, all the limits and everything is there and ready to go.
And once you get to that part,
then you have the ability to whip up cells that are dedicated to a specific customer.
A cell is typically also encapsulated in its own AWS account.
So even if that AWS account goes away, it's just limited to that one cell.
So it's about blast radius reduction.
It's about scale units and just isolation in general.
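A small, hypothetical sketch of the shuffle-sharding idea described above: each customer is deterministically assigned a small subset of cells, so a single bad cell touches only a fraction of customers and two customers rarely share their full set. The cell counts and names are made up for the example.

```python
# Illustrative shuffle sharding: a stable, per-customer subset of cells.
import hashlib
import random

CELLS = [f"cell-{i}" for i in range(8)]
CELLS_PER_CUSTOMER = 2

def cells_for(customer_id: str) -> list[str]:
    # Seed a PRNG from the customer id so the assignment is deterministic.
    seed = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16)
    return random.Random(seed).sample(CELLS, CELLS_PER_CUSTOMER)

print(cells_for("customer-alex"))     # stable pair of cells for this customer
print(cells_for("customer-khawaja"))  # very likely a different pair
```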
Okay.
So then what you're saying, like in US East one, S3 is not just one giant system, but it's a lot of cells.
Each one, like each one replicates sort of the entire system, but those cells are essentially completely independent from each other.
I don't know if S3 has gone all in on cellularization or not, but I can tell you that's the intent, that that's supposed to be. Dynamo in a region has lots and lots of deployments.
Okay, so let's talk about Dynamo, if you can talk about that one. Like, how many cells are we talking? Are we talking, like, tens? Are we talking, like, tens of thousands? Can you give me, like, the order of magnitude? And, you know, I know this doesn't have to be exact, but is it closer to 10 or 10,000 or a million? Like, how many cells are we sort of talking about, just to give people a sense?
No, I think, you know, it really varies on each service, but I think inside of Amazon regions, I think, like, five cells is pretty good for a given region, because that reduces your blast radius down from 100 down to like 20, right? And then, like, getting that next 5x boost is, like, 25 cells. And, like, remember, there's so many AWS regions, right? So not every region... like, some regions are going to be small enough to be your smallest cell. And, you know, that's the order I would imagine now in Momento.
Gotcha. So it's not like you're adding a cell every month or something like that. Like, adding a new cell is a pretty rare situation. And it's still, like, yeah, it's not like cells are cattle, not pets. It's not like a cell is cattle or something like that.
No, you can't treat a cell like cattle either, because that's the other thing, right? Like, because then it's like, well, you can't sacrifice these cells either. You know, going from one region, like, to two cells is the hardest part.
Once you get there, then you can create cells.
And then there's a whole lot of design choices
that are still left available to the developers.
How do you distribute data between different cells?
Or how do you decide who goes in which cell?
Some services will say, okay, I, I'll put, you know, each customer can have each
resource, like if you let's say, in a hypothetical case of a service that had a bucket, you can say,
like, okay, a bucket is in a cell. So you can have one customer be available in every single one of
your cells, because, you know, you deploy a bucket to a different cell. Or you can say, I'm going to
isolate a customer to their cell and all of
their buckets are going to be in that particular cell. So there's all kinds of trade-offs. And then
how do you do the routing? How do you actually make sure that the global mapping is set up
appropriately and whatnot? So it's a- Yeah. On that sort of question,
let's talk about Dynamo a little bit. Let's say US East 1.
Let's just say they have five cells.
I'm not sure.
But where does that split happen?
If I think of a Dynamo request, like it hits the load balancer, then request router, and then down.
Are the load balancers cellularized?
Are those across all the cells?
Are request routers across all this?
When do I have to look up this request came in from Alex and I need to figure out which cell he belongs to.
Where does that look up happen?
So here's the thing.
A little decoder that you can use: services that give you a unique DNS entry for your endpoint are the ones that have the easiest time, like, routing you to an entirely contained cell.
Services that are a little more complex, they don't have to cellularize at the entire service level.
So you can cellularize at the storage node level in Dynamo.
You can cellularize at the request router level.
You can cellularize at the control plane, you know, cache or auto admin level as well.
So it's completely up to the teams in terms of how they cellularize it. So, like, in the case of MediaConvert, for instance, you get, you know, an endpoint, and that endpoint determines which cell you're going to. A lot of, like... so all the media services that got built at Elemental, for instance, for AWS, they were all cellularized, and what we tried to do was to hand the customer a specific endpoint, which allows us to move their stuff around as well, to a certain extent.
But for the other services, the cellularization may not be happening at the entire service
level.
It might be happening at the component level.
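As a toy illustration of that "decoder": a cellularized service can hand each resource an endpoint that already encodes its cell, so routing a request needs no lookup at request time. The domain name and placement rule here are invented.

```python
# Toy illustration only; the domain and placement rule are invented.
import zlib

def cell_endpoint(resource_id: str, region: str, n_cells: int = 5) -> str:
    cell = zlib.crc32(resource_id.encode()) % n_cells   # stand-in for real placement
    return f"{resource_id}.cell-{cell}.{region}.example-service.com"

# Contrast with a shared endpoint (dynamodb.us-east-1.amazonaws.com style),
# where the cell has to be resolved inside the service's own routing layer.
print(cell_endpoint("job-1234", "us-east-1"))
```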
Yeah.
Oh, that's interesting, because I've been sort of using that as, like, an intuitive sense for people as to whether they're using a multi-tenant or a single-tenant piece of infrastructure from AWS. Like, do you have a unique DNS name you're hitting rather than a separate... I always say, like, Dynamo, you're hitting dynamodb.us-east-1.amazonaws.com and it's a multi-tenant service, same thing with S3. Whereas if you go provision RDS or Aurora, they give you, like, you know, some unique identifier dot aurora dot whatever. Or an NLB, for that matter, right? Like, when you provision an NLB, you get your own little endpoint that they can use to change to a different cell, different, you know, instance that might be routing your capacity as well. Yeah, but anyway, it sounds like my intuition is wrong. Like, you might have a service that gives you sort of a unique DNS entry, but that doesn't mean it's necessarily a single-tenant service that you're hitting, because it could be, you know, that DNS entry is routing to a cell rather than routing to, you know, your sort of RDS compute instance or something like that.
Is that right?
Yeah. I mean, at that point it's like, you know,
which control plane is managing those?
What are the deployment units? Again, for cellular architecture, it just comes down to reducing the blast radius and isolation.
Right.
So every service, you know,
is going to go about their own ways to achieve those objectives.
Yeah.
I feel like, I don't know if this is true or just intuition, but I feel like we've had fewer region-wide service disruptions in AWS over the last, let's say, five years.
Do you think that is true?
And if so, is that a consequence of cellular architecture?
Or maybe, hey, maybe a cell is going to be having trouble.
But it's rare now, it seems like, to have a full region-wide thing.
And like, is that related to cellular stuff?
Or is that just related to, you know, you keep doing COEs for 15 years and things are going to get just pretty hardened and things like that?
I don't know if you can pin it to a single thing.
Every day that AWS gets without an outage
is a day that should be celebrated
because the sheer scale at which AWS is operating at
and the sheer number of mission-critical workloads
that run on AWS, it's just really, really impressive.
And the 15 years of operational excellence, the COE,
the dive deep, the ownership of all the operators that go in and proactively mitigate issues before
they become problems that take down the world. There's a whole lot that goes into making AWS
what it is today. And I can't pin it on how it happens, but it is incredibly magical.
The level of availability that AWS has been able to pull off over the
years.
And there have been outages, but, you know, they've gotten better with every single one of them.
Yeah,
for sure.
Okay.
So we talked about cellular a little bit. You've also mentioned to me before that most people are doing cross-AZ wrong. I haven't gotten into this with you, but what do you mean by that?
Yeah, doing cross-AZ blindly is actually a very dangerous practice. So specifically, AZs don't go down that often. So you have to start with why: why do you want to go multi-AZ? And if the answer is because an AZ might go down, that doesn't happen as often.
What does happen more often is cross-AZ packet losses that get elevated or cross-AZ disconnects that happen.
They don't happen much, but they happen a lot more often than an entire AZ going down.
And then you start thinking, OK, what are the consequences of this design that I chose to go multi-AZ?
Well, two cents a gigabyte, that gets very expensive very fast.
But you add a millisecond or so of latency going across AZs, so now your performance is getting interesting. Then you start looking at performance at the tail. Because of those packet losses, you start to have availability issues as well, or, like, really bad, you know, TCP, which retransmits at 200 milliseconds. Like, things start to get really bad, and they will happen a lot more often across AZs. The more devices you're going across, the more, you know, hops you have, and the more likely you are to have one of those be
struggling. So you got to work backwards on the problem. If the problem that you're working
backwards from is multi-AZ, like resiliency, there are other patterns that you can use.
For example, I like to promote this notion of, you know, AZ specific swim lanes.
So a swim lane that is entirely contained in the AZ where the database, the cache,
and the web server are living, you know, entirely in one AZ. And, you know, sure, at some point you need to replicate the data across, but at least your cache and your web server can be living in the same AZ so that you can absorb some of those outages further.
Now, as soon as you do that, your performance, your availability and your cost have a meaningful improvement as well.
And it's much better than unnecessary hops across AZs,
which have their consequences.
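A minimal sketch of an AZ-specific swim lane, with placeholder endpoints: the web tier prefers the replica in its own AZ and only crosses AZs when the local one is missing or unhealthy.

```python
# Sketch of an AZ-specific "swim lane"; the endpoints are placeholders.
import os

REPLICAS = {
    "us-east-1a": "cache.us-east-1a.internal",
    "us-east-1b": "cache.us-east-1b.internal",
    "us-east-1c": "cache.us-east-1c.internal",
}

def pick_cache_endpoint(local_az: str, healthy: set[str]) -> str:
    if local_az in REPLICAS and local_az in healthy:
        return REPLICAS[local_az]          # no cross-AZ hop, no per-GB charge
    for az, endpoint in REPLICAS.items():  # fall back across AZs only on failure
        if az in healthy:
            return endpoint
    raise RuntimeError("no healthy cache replica in any AZ")

local_az = os.environ.get("AZ", "us-east-1a")   # e.g. read from instance metadata
print(pick_cache_endpoint(local_az, healthy={"us-east-1a", "us-east-1b"}))
```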
Yeah, okay.
Well, that brings up,
that brings me to my favorite question,
which I ask about a million people.
Whenever they bring up cross AZ costs,
what's going on there?
Is that a legit valid cost?
Is it sort of,
I don't want to say like rent seeking from AWS,
but is it like,
you know, that they are able to take advantage of that?
Like what's going on with cross AZ costs in your opinion?
So cross-AZ is actually quite an expensive endeavor. What we see at two cents per gig might look really, really expensive.
But what we have to appreciate is that AWS is absorbing the peak to average ratios there. And I can be sitting idle,
not sending any data across the AZs, and then I can start a multi gigabit per second workload and
push that through. And the rate that Amazon is giving me is not the sustained rate, right? It's giving me on-demand pricing. Now, what I wish I could do
is pay for a pipe. And, you know, in some cases, like if I don't have a very high peak to average
ratio, I would love to just, you know, buy a direct connect between AZs and just pay Amazon
on those. But I'm telling you, for almost every customer, that direct pipe will cost you more
than the two cents per gig. So it's easy to, you know, it looks really, really expensive,
but the underlying infrastructure that is required to give you that elasticity and that burst
is actually quite expensive. And the level of innovation that AWS has to do to make that as
seamless as possible is also quite expensive.
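Back-of-envelope arithmetic only, using the roughly two-cents-per-gigabyte figure discussed here (an assumed rate, subject to actual pricing): the on-demand model charges for bytes actually moved rather than for a pipe sized to the peak.

```python
# Rough cross-AZ cost arithmetic with an assumed $0.02/GB total rate.
GB_PER_SEC_AT_1_GBPS = 1 / 8          # 1 Gbps is 0.125 GB/s
CROSS_AZ_RATE = 0.02                  # $/GB total (assumed; 1c out + 1c in)

def monthly_cross_az_cost(avg_gbps: float) -> float:
    gb_per_month = avg_gbps * GB_PER_SEC_AT_1_GBPS * 3600 * 24 * 30
    return gb_per_month * CROSS_AZ_RATE

# A spiky workload that peaks at 5 Gbps but averages 0.2 Gbps pays for what it
# actually moved, not for a 5 Gbps pipe sitting mostly idle.
print(f"avg 0.2 Gbps: ${monthly_cross_az_cost(0.2):,.0f}/month")
print(f"avg 5.0 Gbps: ${monthly_cross_az_cost(5.0):,.0f}/month")
```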
Yeah. Yeah. I think that makes sense. I guess one other place I hear people complain about it is
like if they're running their own Kafka clusters or if they, maybe if they're even a provider of
Kafka clusters for someone else and they say, well, if you look at MSK, Amazon's managed Kafka,
managed streaming for Kafka or whatever that is, they don't charge for cross-AZ stuff. So it's not cost-competitive for us if we want to run it on EC2 compared to that.
Like, is that valid?
I mean, is that sort of priced into what's going on with MSK or what's going on there?
It's priced in and MSK reduces its peak to average ratio by being multi-tenanted too.
There you go.
As a service, you know, as a customer, you end up having a much lower peak-to-average ratio that you're, you know, throwing onto the network team, right? So it comes down to, maybe I'm a multi-tenancy parrot, but, you know, it's the same thing for S3, same thing for ElastiCache. If you look at the ElastiCache AWS account, their peak-to-average ratio is going to be a lot lower than if you pick a random high-utilization ElastiCache customer that's going like this. So the AWS services are better customers of the AWS networking stack than a single-tenanted workflow.
So that's kind of baked into it.
And then, yeah, it's baked into the pricing.
Like, ElastiCache is a 104% premium on your EC2 instance. Of course, that's covering some of the, you know, the network capacity that you would otherwise pay for. Furthermore, if you look at things like ElastiCache, you know, ElastiCache will absorb... like, it'll replicate across the AZs for you for free, but your gets, you still have to pay for. So you still have to pay for your half of the gets. You still pay one cent a gig for any reads that you send across AZs. And a lot of customers don't know to try to create that
AZ-specific swim lane that you can use to actually meaningfully reduce your AWS spend
if you do everything else the same, but just have your web server route to the local AZ
and save a bunch of money. Yeah. On that same sort of note, especially what I mentioned with
Kafka earlier, one thing I like to ask data infra founders is you are both a huge user of the cloud, but then also a competitor of the cloud because they have a competitive product.
How is that relationship?
Are the clouds pretty good partners on that stuff?
Is it tense?
What does that sort of look like, your relationship there?
We're certainly competitors, but cloud is also an enabler, right?
So many of the startups just wouldn't exist without the cloud providers.
You look at, let's go back 18 years.
How many tech infrastructure startups existed?
It's not just because there's more money and more innovation is happening.
The level of experimentation that can happen is meaningfully higher as well. Now,
of course, they have some advantages that we don't have, right? And, you know, and you rely on them
to continue to play fair. But at the same time, I have to remain appreciative of the fact that
they've created this ecosystem that I can use to innovate
and add value to my customers and make money for my shareholders.
Yep. Yep. Absolutely. Sort of on that same note, I've been thinking like, have we been
seeing enough improvements from the clouds in the last couple of years? Because I feel like
from '06, '08 to '14 or '18, maybe, it was just like gangbusters all the time.
I guess, have we seen enough pricing improvements?
We don't see many price decreases anymore.
Should we be seeing more given just the advances in storage and network
and CPU, all sorts of stuff?
Should we be seeing improvements there?
The EC2 prices might look the same,
but the capacity is getting better, right?
So CPUs are much nicer.
The network is much nicer.
Not just on the bandwidth side,
but just the packets per second that can be handled,
the latency.
So there's a whole lot that's coming in for free.
Now, that said, AWS and the cloud providers are able to test more price elasticity
because now they're no longer fighting to become the incumbent.
They are the incumbent.
So that allows them to charge premiums that they couldn't have charged 15 years ago.
All right.
So, but that also creates an opportunity for startups to show up in and compete
by making efficient use of the infrastructure.
And it's a nice symbiotic kind of ecosystem in that regards.
Yeah.
Yeah.
What about just in terms of like,
are we seeing enough hardware improvements?
And so, today's February 26th. Last week on Hacker News, there was an article called "SSDs are fast, except in the cloud." And it's talking about how, like, NVMes can be doing like 10 to 13 gigs per second of sort of read throughput, whereas, like, in AWS and Azure, you're getting maybe two or three gigs per second. I guess, is there more to that? I don't know enough about hardware to know if that's, like, a valid claim
or if something else,
like I wouldn't think that sort of the clouds
would just be holding it back
or not improving intentionally.
Like maybe there's not enough demand for it.
Maybe it's too expensive.
I guess like, I don't know.
Do you have any thoughts on like,
is the hardware improving as fast as you would expect?
Customers are always going to want things faster and cheaper.
That's the two axioms from Jeff Bezos, right?
Those are undeniable truths.
They'll always happen.
The good news is that there are multiple cloud providers that are vying for your business.
So if a technology existed and that was in high demand,
they would be trying to one-up each other and make it available faster.
But there's a whole lot of things that go into the equation.
You have to have enough scale.
You have to be able to get enough capacity.
You have to make it available to everybody.
And right now, the whole world is in a capacity crunch
because compute consumption and storage consumption
is just going through the roof with the AI revolution.
So I don't think any cloud has an incentive
not to make NVMes or faster drives available to you.
Now, there's been a bunch of really nice novel innovations
that have happened.
Like all the Nitro and the Graviton stuff,
like your memory is encrypted by default on those instances.
Your drives are encrypted by default.
So, you know, there's a whole lot of other things that are happening on the cloud providers.
And yes, the hypervisor gets in the way.
And yes, there's some things in the way.
But like, at the end of the day, if the capacity exists and if the innovation exists, like it will become available in the cloud for the masses.
How much do you know about Elasticsearch? What would you like to know?
I just spoke with someone from Elasticsearch and it was super useful.
They explained a bunch of stuff to me. But I wonder why we can't have
a more cloud-native Elasticsearch.
It kind of reminds me of Dynamo,
pre-DynamoDB in some ways.
Like just think about like it's kind of like they're going to shard your data and spread across these different nodes.
But like shard management is pretty manual and it's pretty hard to like increase and decrease.
Usually you have to do it by a factor, and it's just kind of a scary operation. And you're mostly on your own doing it, right? You're aiming for a shard size of, like, tens of gigabytes, 10 to 50 gigs generally. That seems Dynamo-ish in some ways, right? You also have, like, storage nodes that are doing everything, right? Like, a storage node handles requests and serves as, like, the request coordinator reaching out to all the other storage nodes, but it's also storing the data itself. And, like, hey, maybe some separation between, like, doing the scatter-gather.
Yeah, yeah, exactly.
Maybe doing, like, the request router and just separating that out a little bit more.
It's also doing, like, replication that is synchronous. Like, you scale up reads by having more read replicas, and that replication to those read replicas is mostly synchronous, unless you have, like, some lagging nodes and it'll maybe cut them off. But generally it's synchronous replication, even though you're buffering updates, because you're only flushing to disk every second or something like that.
It just seems
like
almost like a Dynamo-like system, but for
Elasticsearch where Dynamo wants to
route every request to a single partition.
And what if you made a Dynamo that's like, well,
we have to scatter-gather every request instead,
but we get all these other sort of benefits.
I don't know. It just seems like there's an opportunity for a better Elasticsearch there.
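For reference, a toy version of the scatter-gather pattern being described here: a coordinator fans the query out to every shard and merges the top results. Real engines (Elasticsearch, or a hypothetical "Dynamo for search") do this with far more care; the data and scoring below are invented.

```python
# Toy scatter-gather: fan a search out to every shard, then merge the top hits.
import heapq

# Each shard holds (doc_id, score) pairs for its slice of the index.
SHARDS = [
    [("doc-1", 0.9), ("doc-4", 0.2)],
    [("doc-2", 0.7), ("doc-5", 0.6)],
    [("doc-3", 0.8)],
]

def search_shard(shard, query):
    # Stand-in for per-shard scoring; here every stored doc "matches".
    return shard

def scatter_gather(query: str, top_k: int = 3):
    partials = []
    for shard in SHARDS:                      # in practice: parallel RPCs
        partials.extend(search_shard(shard, query))
    return heapq.nlargest(top_k, partials, key=lambda hit: hit[1])

print(scatter_gather("anything"))   # [('doc-1', 0.9), ('doc-3', 0.8), ('doc-2', 0.7)]
```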
So I used to write a bunch of Lucene code back when I worked at NASA.
And I'm very passionate about this particular space.
You're right that a Dynamo of Elasticsearch would be just revolutionary.
It is a pretty hard problem, but Dynamo solves hard problems.
And the fundamental differentiator here is that it's single-tenanted nodes, with a leaky abstraction that the clients have to deal with, customers as well as their SDKs.
And that is what causes the availability issue.
That is what causes the capacity crunches.
Now imagine a multi-tenanted Elasticsearch where the capacity management was done on your behalf
and the customers were kind of sharing each other's spikes, whether it's on compute storage or
network. That would be a pretty cool system. And I think it is inevitable that Dynamo will build it.
This is something I wanted to build in Dynamo for a very long time. And I think there's been other things to be done,
but I think it's inevitable.
I think this is not a decade away, you know,
but I think a Dynamo style Elasticsearch would be nice.
But what I would hope for is a built-in full text index
inside of DynamoDB. That, I think, would be magical, because then you rely on Dynamo for your core storage, and then you have these indexes, which happen to be Dynamo's biggest Achilles' heel right now. That would be the most magical system that I could imagine.
Yep. I mean, Mongo is basically doing that, right? Like, adding in these other index-y type things on top of that, you know, sharded, mostly key-based storage. But they're struggling with the same thing that Elasticsearch is struggling with. It's a hard problem, unless you get into that multi-tenanted environment with better capacity planning. Like, 99% of the Elasticsearch problems are capacity management.
If you get the capacity management right and you have enough resources,
you'll have a fundamentally better system.
Yep.
I know.
It is.
I don't know.
I'm just surprised there hasn't been sort of more traction on that.
Like there are a few like sort of managed SaaS search providers,
but they're just not quite...
I don't know.
None of them are in the realm of where Elastic is at.
Yep.
There's Algolia.
There's OpenSearch.
Amazon has a product called Kendra
for AI-enabled searches and things like that.
But no, the Dynamo of Elasticsearch would be huge.
I've been a huge fan of the ELK stack, by the way, for a very long time. So when we launched DynamoDB Streams, one of the key integrations was with the ELK stack.
So we made that as part of our launch because we thought the marriage of Dynamo and full-text
indexing is quite nice, but I do want it to extend beyond just a zero ETL thing. I want it to be
built into the database.
Yeah. Yeah. That reminds me, you brought up OpenSearch, and I know you've been
vocal about the use of sort of serverless applied to different AWS databases and things like that.
How multi-tenant are these? I would say all the, like, branded serverless databases that AWS has,
they seem to be like sort of similar architecturally because they have the same
flavor in terms of pricing model and just like how they sort of scale up, scale down,
scale zero, different things like that. I guess like how multi-tenant are those? Are they multi
tenant at any layer? Are they, I guess from what you can tell or from what's been publicly available about them, like what's going on there? So multi-tenancy is a really good, not definitive, but a really good
leading indicator for whether something is fake serverless or not. And, you know, if you figure
out multi-tenancy appropriately, you don't have to tack on serverless as a marketing term. You
will actually have a serverless service. If you're just using it for marketing then it's probably a single tenanted service i have yet to
see a true serverless service that is not multi-tenanted sqs multi-tenanted s3 multi-tenanted, S3 multi-tenanted, you know, Dynamo multi-tenanted.
Dynamo, when it wants to be, yeah.
Yeah.
And so what's going on with those serverless... you don't have to talk about ElastiCache Serverless, but maybe talk about OpenSearch or any of them. Like, are they mostly single-tenanted and just auto-scaling up and down? Or are they using Aurora storage, but still, the compute is sort of hard to scale?
I mean, you go down to the documentation of OpenSearch and Aurora Serverless, and they will tell you what an Aurora or a Neptune capacity unit, what instance it maps to. You can literally find official documentation that maps the specific units to instance types,
where I'm like, okay, now you're just calling an instance something different. A rose by any other name would still smell just as sweet, and an instance with a different name, like a capacity unit, is still a lot of operational pain.
Yeah, yeah, interesting. I guess, I don't know how much is out there about the new Aurora Limitless database. Did you see that at reInvent? Do you have any sense of what's going on there?
...easier for them to pull this off. And, you know, I would check out Caspian, which, I think it was Peter's keynote where he talked about that.
Like, you know, systems like that make me really excited that the world is moving towards
a multi-tenanted, you know, that we as a society are moving towards a more multi-tenanted
infrastructure.
Yep.
Yep.
That also reminds me, speaking of reInvent: S3 Express One Zone. I've asked a few people, and you've written some great stuff on it. What are your thoughts there? Like, where's a good fit? Or is it just, you know, an early entrant that needs to change? Like, what do you think about S3 Express One Zone?
Oh, I am so impressed by the S3 team,
that they started with a clean sheet of paper. They did not respect anything that was already set in stone, right?
New authentication protocol.
You know, single AZ, by the way, not multi-AZ, right?
It's actually ridiculously fast.
So it is an incredible product.
I am so proud of that team for pulling that off. It's the best AWS offering for an actual serverless cache, because you can write stuff and grab it out and you don't have to worry about anything. An official AWS one. Now, the downsides of it are very simple. If you have an object that you're writing a lot, you have to pay for that object, you know, for at least an hour. So if you have something like a counter, where you're incrementing it over and over and over again, you'll go bankrupt pretty quickly. But there's no capacity units or anything like that. So if you've got objects where the lifecycle is on the order of an hour, it is really, really good. And, you know, the price is actually not bad either. So I hope more people use it. And if you're creating that AZ-specific swim lane concept, the single-AZ S3 is quite beautiful.
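A quick back-of-the-envelope on that rapidly-rewritten-object downside, taking the "billed for at least an hour" point above at face value. The object size and rewrite rate are arbitrary, and none of this is actual S3 Express One Zone pricing, so check the AWS pricing page before leaning on it.

```python
# Assumption from the conversation above: every write to an object is billed
# as if that object were stored for at least an hour. These are not official
# S3 Express One Zone billing rules; the numbers are purely illustrative.

OBJECT_SIZE_GB = 0.1        # a hypothetical 100 MB object
REWRITES_PER_HOUR = 3600    # rewritten once per second
MIN_BILLED_HOURS = 1        # the claimed per-object minimum

# What you actually keep around over that hour: one live copy of the object.
actual_gb_hours = OBJECT_SIZE_GB * 1

# What the claimed minimum would bill you for: every rewrite accrues at
# least an hour of storage for its own copy.
billed_gb_hours = OBJECT_SIZE_GB * REWRITES_PER_HOUR * MIN_BILLED_HOURS

print(f"actual GB-hours per hour: {actual_gb_hours:.1f}")
print(f"billed GB-hours per hour: {billed_gb_hours:.1f}")
print(f"inflation factor:         {billed_gb_hours / actual_gb_hours:.0f}x")
# 3600x for something rewritten every second, versus roughly 1x for objects
# whose lifecycle is on the order of an hour, which is the sweet spot above.
```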
Yeah, that's interesting.
A few people, both the WarpStream guys and Nikhil from Materialize, they were both saying, hey, it's interesting. I think the two big hangups for them are just pricing, a little bit: it's still a little too expensive. And then they don't feel great about just being one AZ. They're saying, hey, in reality, we're going to write it to two AZs just to make sure we have it. And now, you know, that doubles your costs on that as well.
Oh, I don't think they're writing to two AZs. I think if they're telling you it's one zone, it's probably one zone.
No, no, no. They're saying... the WarpStream and Materialize people are saying, if we would use this, we would want to write it to two AZs to make sure, if one AZ goes down. And then, because of that, now we need to double our costs, and it's just more stuff we're doing, dealing with those failures and things like that. So because of that, yeah, I think they're excited about it, but still.
Things like WarpStream, I would keep, like, you know, they're very tail-heavy. So I would keep the tail stuff in a cache, like a Momento or local caches, and then use S3 for the backend. Because for that kind of workload, memory is fine. And it's not that expensive. Because you've got months of data that you're storing in S3, but your most recent data is what needs to be the hottest. You can keep it in memory.
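For illustration, here is a rough Python sketch of that read path. Everything in it is a stand-in: the DictCache and DictObjectStore classes and the segments/<offset> key layout are invented, not WarpStream's or Momento's actual APIs. It just shows the shape of keeping the hot tail in a cache while cold reads fall through to object storage.

```python
import time

class DictCache:
    """Stand-in for a real cache client; any get/set-with-TTL cache would do."""
    def __init__(self):
        self._data = {}
    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.time() + ttl_seconds)
    def get(self, key):
        hit = self._data.get(key)
        if hit is None or hit[1] < time.time():
            return None
        return hit[0]

class DictObjectStore:
    """Stand-in for an S3-like object store."""
    def __init__(self):
        self._objects = {}
    def put(self, key, value):
        self._objects[key] = value
    def get(self, key):
        return self._objects.get(key)

class TailHeavyLog:
    """Hot tail lives in the cache; the full history lives in object storage."""
    def __init__(self, cache, object_store, hot_window_seconds=300):
        self.cache = cache
        self.object_store = object_store
        self.hot_window_seconds = hot_window_seconds

    def append(self, offset, record):
        # The object store is the source of truth for the whole log...
        self.object_store.put(f"segments/{offset}", record)
        # ...and the most recent records also go into the cache, since that is
        # what consumers are overwhelmingly likely to read next.
        self.cache.set(str(offset), record, ttl_seconds=self.hot_window_seconds)

    def read(self, offset):
        # Tail reads are served from memory and never touch the object store.
        cached = self.cache.get(str(offset))
        if cached is not None:
            return cached
        # Cold reads (months-old data) fall through to the object store.
        return self.object_store.get(f"segments/{offset}")

if __name__ == "__main__":
    log = TailHeavyLog(DictCache(), DictObjectStore())
    log.append(0, b"first record")
    print(log.read(0))  # served from the cache while it is still hot
```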
Yep. Yep. Okay.
All right. Cool. We're running out of time.
So I got to close with...
We have six sort of rapid-fire questions that we ask everyone, so we'll go through these. If you could master one skill you don't have right now, what would it be?
So many. So, I would like to really get good at one thing, instead of trying to do too many things at once. So I would like to master one thing, whatever it is. Like, if I can master one thing, I would be very happy. I try to be too broad. And I would like to master one skill. I don't know what it is yet. And it's not pickable.
You're multi-tenanting your skills inside of you, right?
That's right.
That's an interesting one.
I haven't heard that one.
So that's good. What wastes the most time in your day?
I think driving is the biggest waste of time, because I can't, you know, do multiple things at once. And I never realized, until I started working from home, how much time we actually waste in driving. It has made me appreciate everything from getting my food delivered to my groceries delivered. If I don't have to drive, I can just walk everywhere and, you know, get healthier. So I try to do as much as I can on a walk and avoid driving as much as possible.
I cannot force myself to pay delivery fees. Like, it's just, something inside of me is just like, I can't do it. Like, I will order takeout and I will go pick it up every time, because I just can't do it. That's the only thing.
I made a mathematical model, man. Like, you should just try to allocate a very modest hourly fee to your time and then decide if the DoorDash fee is worth it or not.
The other thing is, I love podcasts. So being in the car and just listening to a podcast... like, I work from home all the time, so I don't get that much podcast time.
So then it's like, oh, I'll go drive for 15,
20 minutes and get some podcasting in.
But if you walked, if you walked instead
while listening to those podcasts,
you know, it'll cut down your healthcare costs
down the line and you'll recuperate
all of the delivery fees.
It's better for the environment too.
That's true.
That's true.
I don't drive very much,
but I can't, I just can't do delivery.
All right.
Next one.
If you could invest in one company, and it's not Momento.
It's not a public company.
I want to know like a private company.
If you could invest in any one private company, what would it be?
Yeah.
I'm personally very excited about WarpStream. I think they've got a good product. I heard about it, you know, before the announcement came out,
and I got very excited because they are solving
that AZ-specific swim lane problem.
The AZ pain.
And just the operational side, they're making it so much simpler. Like, those Kafka brokers now, they're stateless, right? You can just scale them up, scale them down. All the storage is in S3. You don't worry about that.
Like, that's pretty interesting. Yeah. So I like that answer. You're the second person to give that answer in the last week. So I think they've got something going on. What tool or technology can you not live without?
My phone. I spent a lot of time on my phone, and that's not
just to watch like YouTube videos or podcasts, but when I travel,
I try to not lose a second of productivity. And that's why, that's the other reason why I don't
like to drive. Like if I'm on a train or on a plane, I am constantly on and it keeps me connected.
It is, it's a double-edged sword for sure. But having the phone, and I've recently learned that it's the thing that you spend the most time with if you're in tech.
So it's worthwhile to have something with the best battery and so forth available to you as well.
Yeah, for sure.
Which person influenced you the most in your career?
I took a lot of lessons and learnings from
one of my old bosses, Raju Gulabani. He was never easy on me and always told it to me like it is.
But Raju used to run all databases and AI at AWS. And he's a very tough manager to have. And I really enjoy having, you know, tough managers.
And he's kind of helped me understand
how to do product definition,
how to think more aggressively
about the competitive landscape
and how to, you know, focus on the product side.
And he's still very, very generous with his time
and, you know, mentors me,
even though he's, you know,
not even working full-time anymore. So I've always been appreciative, but there's a lot of people that have
influenced me and basically provide free mentorship that I haven't really earned yet. You're one of them. And, you know, it's always nice to have people that are just there to help you out.
Yep, yep, cool.
I feel like you need a hard,
a pretty strict person in charge of databases, right?
You wanna make sure that those are working well.
You don't want something like that.
So that's, yeah, that's a great one.
All right, last one.
AI, of course, very popular over the last year
and a half or so.
What is your probability that AI equals doom
for the human race?
Zero.
Zero.
Nice.
I like it.
Optimism.
Yeah.
Technology can be doom for humans too, but AI, like, it's going to create more jobs. It's going to make us so much more efficient. It's going to accelerate innovation. Like, we're so far away from the doom that it's just not worth worrying about.
Yeah. Okay.
So you're not bombing data centers
or anything like that right now?
No, absolutely not.
Yeah, so, I agree with you.
I'm an optimist.
I think it's pretty exciting what's going on.
So Khawaja, this has been great. I always love talking to you, and it's good to get some of this recorded so other people can hear it as well. If people want to find out more about you, more about Momento, where should they look?
Gomomento.com, and follow Khawaja on the Twitters as well.
We'll put both of those in the show notes. But yeah, thanks for coming on. It was great chatting with you.
Thank you so much for having me.