Screaming in the Cloud - Episode 50: If You Lose Data, Your Company is Having a Very Bad Day

Episode Date: February 27, 2019

If you use MongoDB, then you may be feeling ecstatic right now. Why? Amazon Web Services (AWS) just released DocumentDB with MongoDB compatibility. Users who switch from MongoDB to DocumentDB can expect improved speed, scalability, and availability. Today, we're talking to Shawn Bice, vice president of non-relational databases at AWS, and Rahul Pathak, general manager of big data, data lakes, and blockchain at AWS. They share AWS' overall database strategy and how to choose the best tool for what you want to build.

Some of the highlights of the show include:

- Database categories: relational, key value, document, graph, in memory, ledger, and time series
- AWS database strategy: offer the most popular and best APIs in each category to sustain functionality, performance, and scale
- Many database tools are available; pick based on use case and access pattern
- Product recommendations feature highly connected data - who do you know who bought what, and when?
- Analytics architecture: use S3 as the data lake, put data in via open data formats, and run multiple analyses using your preferred tools at the same time on the same data
- AWS offers Quantum Ledger Database (QLDB) and Managed Blockchain to address distinct use cases and the need for blockchain
- Authenticity of data is a concern with traditional databases; consider a database tool or service that does not allow data to be changed
- Lake Formation lets customers set up, build, and secure data lakes in less time
- DocumentDB: made as simple as possible to improve the customer experience
- AWS culture: awareness and recognition that it takes many people to conceive, build, launch, and grow a product - acknowledge every participant, including customers

Links: Amazon DocumentDB, MongoDB, Amazon RDS, React, Aurora, re:Invent, DynamoDB, Amazon Neptune, Amazon ElastiCache, Amazon Quantum Ledger Database, Amazon Timestream, Amazon S3, Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Managed Blockchain, Amazon EC2, AWS Lake Formation, Perl, CHAOSSEARCH

Transcript
Starting point is 00:00:00 Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This episode of Screaming in the Cloud has been sponsored by Chaos Search. Chaos Search is a cloud-native SaaS offering that extends the power of Elasticsearch's API
Starting point is 00:00:34 on top of your data that already lives in Amazon's S3. Chaos Search essentially turns your data in S3 into a warm Elasticsearch cluster, which finally gives you the ability to search, query, and visualize months or years worth of log and event data without the onerous cost of running a hot elk cluster for legacy data retention. Don't move your data out of S3. Just connect the Chaos Search platform to your S3 buckets, and in minutes the data is indexed into a highly compressed data format and written back into your S3 buckets, so it keeps the data under your control. You can then use tools like Kibana on top of that to search and visualize your data all on S3, querying across terabytes of data within seconds. Reduce the size of your hot elk clusters and waterfall your data to Chaos
Starting point is 00:01:25 Search to get access to an unlimited amount of log and event data. Access more data, run fewer servers, spend less money. Chaos Search. To learn more, visit chaossearch.io and sign up for a trial. Thanks to Chaos Search for their support of this episode. Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined today by Sean Bice, AWS's VP of Non-Relational Databases, and Rahul Pathak, GM for Big Data, Data Lakes, and Blockchain, which is a whole bunch of words that don't intuitively go together, but I imagine there's a common thread in there. Welcome to the show, folks. Thanks. Thank you. Corey, thanks for having us. So let's start at the very beginning. When I take a look across the entire ecosystem of non-relational databases, we'll even restrict
Starting point is 00:02:13 this to AWS, I see that there's a bunch of different options. And as of the time of this recording, DocumentDB came out roughly, let's call it 12 hours ago-ish. So now there's another entrance in the column of what do you wind up picking for a non-relational database? So at the very beginning, what is AWS's overall database strategy? Because right now it just feels like there are so many different paths you can go down. It almost starts to feel like front-end and React and all the different JavaScript frameworks you can pick between. It's analysis paralysis to a point. How does AWS contextualize this? Yeah, it's a great question.
Starting point is 00:02:51 As you know, our strategy is in partnership and driven with our customers. And frankly, when we sit down with customers and talk about their data strategy, there's one of two things that almost always comes up. One, customers have, odds are they probably have plenty of relational applications on premise. Why? Because relational databases have been around since the 70s. And when they start thinking about the cloud, because they want to free up their resources from that operational burden, they'll start thinking about, hey, I need to lift and shift and move those relational apps into something like Aurora with Postgres or MySQL. Or they could take a commercial workload like SQL Server or Oracle
Starting point is 00:03:29 and move it into RDS. So that's a motion that's in play. But I think the question you're really getting to is the second part that we hear from customers, which is, hey, if I'm building a new application, what tool should I pick? And frankly, the way we think about it is these new apps or these new modern apps, if you think of like the biggest ride share app or a media service or something like Snap, just think of these big scale apps. Most developers that are building these super large scale apps, they do what they do best. They basically break the app into smaller parts and then they pick the right tool for the right job.
Starting point is 00:04:04 So if you just sort of use that as a backdrop and in your mind, you're like, okay, apps today could have millions of users, they could be geographically distributed anywhere, and everybody wants everything to be to run even faster. Well, that puts even more pressure to make sure that you that you're you're you really are truly picking the right tool for the right job. so you don't overburden a single database. So with that, the one easy way to conceptualize this is years ago when you thought about a database category, it was just relational. But today there's about six categories that developers think about.
Starting point is 00:04:37 Relational is a category. Key value is a category. Like DynamoDB is a great product inside of the key value category. Document is a category and just yesterday we introduced Amazon DocumentDB. Graph is a category where we have Neptune. And then of course, you have in memory as a category with ElastiCache, Redis, Memcache. We introduced a brand new category at reInvent with the Quantum Ledger database. And then time series is a category. So if you think of those categories of data, our database strategy is super simple. We want to have the most popular
Starting point is 00:05:11 and best APIs in each of those categories so a developer never has to trade off on functionality, performance, or scale. Let me caveat this conversation with anyone who's ever seen my terrifying development practices understands why I generally tend to work with stateless things that you can reconstruct without destroying a company. With a database, you generally don't have that luxury. If you lose data, your company's having a bad day. And as a result, based upon my own limitations and experience, I'm not particularly up to speed on the nuances of database design, database selection, and the rest. Right now, I think the big question is when a developer is starting out with a new project, something they want to build, one of the largest questions that looms in their mind is, what tool do I use to do the job?
Starting point is 00:05:59 As someone starts looking through the increasingly lengthy list of database offerings, what are the considerations that shape that choice? Yeah, great question. And the answer is actually really straightforward. It has to do with the use case and access pattern. So if you and I, let's say, we're building an online commerce application, you know, think of a shopping cart. We don't know if we're going to have 100 users or 100 million. Think of something like Black Friday, where something could go on sale. And you can have millions of customers all of a sudden needing to shop and make a purchase. Well, you know, for if we were going to build a shopping cart, you just think of that shopping cart access pattern where you know, you're just quickly adding things to it, you haven't done a
Starting point is 00:06:39 transaction yet. Those are simple puts and gets and key value is an awesome solution for an access pattern of simple puts and gets because it's very fast. It's very efficient. You and I don't have to model anything. And it can scale for as many users as we can, meaning it's Thursday and not Black Friday. We have 10 users. And on Black Friday, a big thing goes on sale. We've got a million people shopping. So key value can handle that access pattern great. Now imagine in that shopping experience, you know, when you're buying something, you see a product recommendation. Well, product recommendations are really about highly connected data, like who bought what and when, so that if you and I wanted to make a product recommendation, graph database would be an excellent solution for it. So for example, for example, imagine the scenario where somebody's shopping for something, and instead of just saying, hey, here's what others bought in a sports category, what if it was a little more personalized and said, here's some of the items your friends bought, or people you know in a certain category.
Starting point is 00:07:40 Graph Database can help with that in a really, really big way. So that's kind of what developers do. They break these apps into smaller parts. They think of the access pattern and use case, and then they pick the right tool for the right job. So as you wind up going through a project of building out something that requires a data store, one of the most common stories we'll see around this has to do with someone building out some form of analytics architecture.
Starting point is 00:08:01 How does that wind up manifesting in your universe? Yeah, it's a great question. And when it comes to analytics, what we recommend to customers is just think about S3, which is our core storage service as your data lake. And we recommend that customers put data into their data lake in S3 in open data formats, so CSV, JSON, or query-optimized formats like Parquet or ORC. And that open data gives them portability. They can take that data wherever they want. They can use it with whatever technology they want. And then we've engineered our analytic services so that they can all run directly against open data in S3. So if you want to run the latest in Spark, you can run that through EMR. If you just want to run SQL on
Starting point is 00:08:39 your data in S3, you can use Athena. If you're doing data warehousing at scale, Redshift is a great choice. And that also works with data in S3. So what you get is the ability to run multiple types of analyses using a preferred tool at the same time on the same data without interfering with the other party that's running it. So you sort of maximize your flexibility and portability while getting all of the scale and durability benefits of S3. One of the increasing challenges is, to some extent, it feels like you are this technical generation's pearl in that famously the pearl programming language had the motto of there's more than one way to do it.
Starting point is 00:09:14 And increasingly, it feels like you are heading down the road of no matter what you pick for any solution, the easiest thing in the world for someone to do is come along and say, oh, you made the wrong decision. You should instead use X, Y, and Z. And even when they're right, that's never a particularly helpful statement. But people love arguing about things either for semantic reasons or for purposes of religion. That brings us to blockchain. As far as blockchain goes, from my perspective, as someone who stays as far away from it as humanly possible, there's been a lot of hype around it. There have been a tremendous
Starting point is 00:09:50 number of use cases that make varying degrees of sense. But largely, it's been the value of blockchain, for many of us, has been drowned out by the hype. And to be very direct, some of the worst people in the world are actively pushing some of these things to the point where it's almost a punchline more than anything else. That was my perspective a couple of months ago. And then at reInvent, there were a few blockchain announcements, two to my understanding, QLDB, Quantum Ledger Database, and managed blockchain. At which point, my immediate response personally was, well, crap, because now I have to take this seriously and I can't just dismissively hand wave it. Can you explain to me, please, what is the actual blockchain use case and what is it that's driving customer needs here? Absolutely. And it's important when we talk about blockchain in the AWS context to really separate out the cryptocurrency world because that's not what we're focused on.
Starting point is 00:10:47 So when we spend a lot of time with customers trying to understand what these use cases were, what we learned was really there were a couple of use cases that were at play. And typically, one of them was where there was a centralized entity that customers trusted. So it could be, think of it as a major manufacturer, and they have a satellite of suppliers. So everybody in that ecosystem trusts that manufacturer, and they're fine with that manufacturer maintaining a centralized record. But they wanted an immutable ledger, so they wanted to be able to trace every element that flew through. But they were fine having the manufacturer control that central record of what happened. And so that scenario where the centralized trust is what we built QLDB for. And QLDB is actually based on a ledger technology that we've had for a while at Amazon. But the intent is to provide an immutable, cryptographically verifiable, immutable record of
Starting point is 00:11:36 what's happened. And it's typically owned by a centralized entity. Others can connect to it and verify the transaction history. And the centralization there is key. So for that immutability, but no need for distributed trust, QLDB is the database of choice. And it frees customers from building those audit trails and relational databases where DBAs could modify things, or from using the blockchain frameworks like Fabric or Ethereum, which have a bunch of additional complexity related to the distributed trust and smart contracts that aren't really needed for this immutable record case. And the second type of use case we found was where there were perhaps more of a group of peers that were engaged in commerce or transactions, where they didn't want any single party to
Starting point is 00:12:20 completely control the record of what took place. And so in this distributed world, they wanted the immutability of a ledger, but they wanted multiple participants to agree and validate what would go into that ledger. And that's where the blockchain frameworks, not cryptocurrency technologies, but the frameworks like Hyperledger Fabric and Ethereum, which allow multiple parties to agree on what truth is, and then write out that truth to multiple copies of the data that's owned by each of the participants. A great example of this is if you think about online advertising networks. So what you've got is an exchange. You've got multiple parties bidding on ad slots. You've got publishers
Starting point is 00:12:55 that display ads. And what they would like with that exchange is to have a record of, hey, there was an auction, 50 people bid, one ad won. This was the ad that was served on this site. But they don't want to actually own that infrastructure. They trust the exchange. So the exchange would use QLDB to maintain a record of what happened for each auction. But there's also a scenario where you get multiple exchanges that are sharing information because they're routing traffic to each other. And they don't actually want to give all of their data to any one exchange.
Starting point is 00:13:23 So they want to use the blockchain for that scenario. And there they can use Amazon Managed Blockchain to have distributed data, but within themselves they might use QLDB to have just an immutable record. When you start talking about immutable records, transaction ledgers, and the rest, because of my own prejudices and where I come from and regulated backgrounds, my immediate thought is compliance. Is this something that you could use, for example, to fulfill Sarbanes-Oxley worm requirements, write once, read many? Or is
Starting point is 00:13:51 this the sort of thing where at least today you show it to an auditor and they stare at you and now you have more questions that you've just, you've effectively opened Pandora's box of explaining complex concepts to people who are generally hoping to check a box. How does that manifest in the regulated world today? So I'm not deeply familiar with Sarbanes-Oxley or Worm, but what we have found is there is a lot of interest in both ledgers and blockchain, secure LDB and managed blockchain for the audit and compliance use case. And the reason is that you can independently verify that what was written has not changed. And that's sort of the central building block of audit and compliance.
Starting point is 00:14:31 And so we see scenarios like in Guardian Life, which is an insurance company that has multiple providers, customers, and payers, the ability to say that, yes, every single party that looks at this can agree that what was written here is what was originally written here, and it hasn't changed since it was written. That's really powerful for the audit and compliance use case. It'll be fascinating to see how this winds up manifesting in a few years. But I get the sense that people on the, I guess, bleeding edge of the compliance story are going to sort of pave that road for the rest of us. As a general best practice, if you wind up having to bring a mathematician into an audit to
Starting point is 00:15:06 validate what you're saying, it's not going to be an easy conversation. So it's always interesting to watch people forge that road ahead a little bit. One of the interesting stories when I first saw the announcement of QLDB was, again, because of my own biases, when I don't understand something, the easiest thing in the world to do is make fun of it. And I made a joke to someone in passing who worked at AWS that, well, if this one doesn't work out, it might very well be the first service that they wind up turning off and deprecating. And the very serious answer that I got in return was, we're using this internally. And if you turn this off, there aren't too many services left that will work. It lines up to my understanding
Starting point is 00:15:45 becoming a foundational part of building higher level services. And it solves a need that at significant scale when you're dealing with distributed systems don't have easy answers for that. Is that an accurate assessment? Is that effectively someone shining me on? At this point, I don't know enough about the space to opine intelligently on it. Yeah, so the technologies behind QLDB are absolutely critical to how AWS and Amazon run a lot of our key internal technologies. And with distributed systems having a high throughput way to understand what the state of the system is, and the ability to replicate that state from point A to point B, so you can use it for different things is crucial. Yeah, you could think of it like, think of all the activity that goes through like the EC2
Starting point is 00:16:29 control plane. Just imagine how many events are coming through there. And, you know, if you and I were operating in a world like that, if we had to go to each and every place across the environment to see what kind of events were happening, that would be quite difficult. You could imagine us saying, gosh, man, I wish there was a way to have sort of a ledger of all the transactions that were occurring because it could help us troubleshoot and operate the environment better. And that's kind of the essence of where QLDB started this notion of a ledger many, many, many years ago. But you have to have this really big, big scale thing to drive a demand like that. And then the
Starting point is 00:17:12 interesting thing to your question, as we've been on this journey together with QLDB, we'd start sitting down with customers and they'd say things like, gosh, you know, I wish you had an immutable database. They weren't really asking for a ledger database. I wish, do you guys have something that's immutable and cryptographically verifiable? Because similar for them, it's like, hey, there's certain set of transactions that are happening in my environment. I wish there was a way to simply record that somewhere, know that it can't be changed and cryptographically verified, you know, if auditing occurred. And that's kind of when we had this moment of, boy, we've got the essence of technology that's supporting some of
Starting point is 00:17:50 the largest services in AWS. We have this new requirement coming in from customers. So maybe we can put those two things together. And that's kind of the ingredients that led to QLDB. It seems like it's a fascinating foundational technology. I still am having trouble bringing this into, I guess, mental focus for me, where I'm going to build something that is user-facing that rides as a relatively thin layer on top of this. I don't have that problem, for example, with DynamoDB or RDS. It's, I'm building a shopping cart and here's what you do. It's more challenging to think of I'm using an Instagram equivalent or something like that. Oh, yes, that's backed by a ledger. That's almost assuredly an imagination failure on my part. Yeah, it's funny. It's not a concept that you just kind of hear
Starting point is 00:18:36 once and you're like, oh, ha ha, like I now know what a ledger database is. It takes a little bit, but here's what I found. Imagine, take a DMV scenario. So you've registered a car at some point, I'm assuming, right? Yes, I still have scars from it. You still have scars from it. So you ever see these commercials sometimes where you'll see a company say, before you buy this car, there's been five registered owners for it. You see those kind of things. And I've always wondered, how do you know that there has been five registered owners?
Starting point is 00:19:03 Is that data something you cooked up or is it real i i don't know the authenticity of it so let's use dmv so you got all these people coming in and registering cars and that's going to get recorded in some database somewhere but in a traditional database one of the troubles is is if you had admin access to that database you could change that that data however you wanted. And if auditing was turned on or off, depending on or so, you could manipulate that data and it could be really difficult for somebody to know that that change was made. And on the flip side, if people are saying, hey, let's turn auditing on, if you, you know,
Starting point is 00:19:39 auditing done the right way can sometimes slow databases down, you know, because it's kind of an afterthought. But the reality here is the DMV, imagine as a government agency, you're saying, hey, when somebody comes in and registers a vehicle, that's a transaction. So there's a VIN, there's an identity for the vehicle, and you, for example, as a registered owner, let's write that once into a ledger, and then that's it. Once it's written, it cannot be changed. So let's say you trade the car in and somebody else buys that car., and then that's it. Once it's written, it cannot be changed. So let's say you trade the car in and somebody else buys that car. Effectively, that's the change event. There's a new registered owner for that vehicle. So then that would just be the next transaction about that VIN
Starting point is 00:20:16 number. So you can imagine as time goes on, each time that car is sold, there's a record of it, and it's just stored in this ledger database. And because that database has the property of the data is immutable, can't be changed. And it's cryptographically verifiable. If anybody came back and said, hey, is it really true that that car has had five registered owners? The DMV would have a very easy way to demonstrate that. Not only is that a terrific example of applying this in a way that someone with my limited understanding can grasp it. But it's also, I think, one of the first ledger explanations I've ever heard that wasn't condescending. It's one of the biggest challenges you see in many cases when you have a new and
Starting point is 00:20:57 exciting technology that gets launched is you ask someone to explain it to you and suddenly it winds up being, well, actually, it's very simple. And people start the most condescending explanation. And about three sentences in, I don't even care what this technology is, but I hate this person. I'm continually amazed by the fact that AWS is able to explain these complex concepts in a number of different ways in such a way that I don't feel like a moron for having asked the question in the first place. So first, thank you for that. I think that's something that a lot of people can wind up learning a fair bit from. And I know it's something I try and I struggle with
Starting point is 00:21:33 myself. One other service was announced at reInvent that I want a little help contextualizing while I have you here. Yeah, surprise. This entire podcast is a sham. It's all for my own education because opening support tickets just seems too pedestrian. But Lake Formation was announced, and that is one of those interesting services from a few different perspectives. First, it's an awesome name. It's evocative of largely what it does. But at the beginning, what does Lake Formation do? So Lake Formation is designed to be a way to allow customers to build and secure data lakes really in days versus what might have taken them months in the past. And the reason setting up data lakes can be challenging is that not only do you have to figure out how to lay out your data, where it lives, how to get it into your data lake,
Starting point is 00:22:18 but actually protecting your data is a huge part of making data lakes broadly available because you don't want to have everyone in the organization to have access to everything. You want to be able to define access policies that live with data so that customers can use any service they want to query it, but query it in a way that's controlled and governed. And then the third piece is just data hygiene. How do you make sure that you're not dumping vast amounts of data that don't make any sense into your data lake and you need some way to organize and curate and manage all of those things. And so one of the things you talked about earlier in the podcast was sort of this range of services can be confusing. And what we
Starting point is 00:22:51 wanted to do with Lake Formation was to provide a very prescriptive, repeatable way for customers to set these things up easily. So the key components of Lake Formation are one, blueprints that make it really easy to set up your initial data lake. Two, a centralized security mechanism that let you define a data access policy. So Corey can see these tables and these columns. Rahul can see those tables, but not these ones. And that stays with your data definition. And so then whether you use Athena or Redshift or EMR to get at that data, Lake Formation will make sure that you're only ever able to see what you've been allowed to see, regardless of the service that you choose. And the same for me. And because you have that central point of control, someone else can also then verify that, yes, what we intended to happen is actually what happened. And then the third piece is a data
Starting point is 00:23:38 deduplication and cleanup activity that's driven by ML. So customers can say, look, this is what my data should look like. These two things are actually related, and that'll train a model that can then go through and clean all their data sets up. And that comes out of technology that we've been using to dedupe our addresses and catalogs at amazon.com for a long time. Hearing it described that way sends me back in time where various previous engagements and jobs where I wound up effectively having to build the foundation of a data lake. And my approach then was, all right,
Starting point is 00:24:08 I'm going to throw everything into an S3 bucket and part two, we'll figure this one out later. And step three, we have a data lake. And just hearing you describe this and all the things I didn't do and didn't conceive of when I was building that out tells me, congratulations, Corey, you built a data swamp. And that is in many, where a lot of
Starting point is 00:24:26 people tend to wind up getting stuck. Is Lake Formation envisioned as something that is best suited for Greenfield data lake projects, or is it something that can be applied to an existing corpus of data? Great question. It's really designed for both. So the intent was to make it easy for people operating on Greenfield, but to the extent that you already have a data lake or, you know, a former data swamp, Lake Formation can absolutely crawl through that and discover what you have, catalog it, and then give you a starting point from which to then curate and clean up. So it's really designed for both. Do you find that there's any either relationship and or confusion in the name of Lake Formation to Cloud Formation? We haven't come across really any significant confusion.
Starting point is 00:25:06 You know, I think there might be times when customers are using both of them that it might get a little tricky to keep track of which thing they're forming. But for the most part, it's been pretty clear. I think customers understand it's tied to their data lakes. At launch, did lake formation have cloud formation support? Because if it didn't, and then you have to launch that later in time, that is going to be one of the most confusing headlines to read out loud. So at preview, it does not, but it will at GA. Oh, it's still in preview. I did not. Wow. This is the problem with having a firehose of
Starting point is 00:25:33 release announcements. It's very easy to lose sight of what's available that I can use today versus what's in developer preview versus yes, announcing this service that we've been running that you've never heard about for the last five years. It's always interesting to wind up seeing how this stuff plays out. We've long since passed the point where I think any one person can have an exhaustive list of everything that AWS runs stuffed into their own head. I still wind up getting faked out from time to time on services that don't actually exist. Which brings us to the announcement of roughly a day ago now of Amazon DocumentDB. First, its formal name is Amazon DocumentDB with MongoDB compatibility,
Starting point is 00:26:14 which sets a new record for the largest number of syllables in a formal AWS product name. So first, let me congratulate you on that. Now it's taken the crown from AWS Systems Manager, Session Manager, or Parameter Store, which tie a couple syllables less. One challenge as I look at that is, first, I know almost nothing about it, but we're about to fix that. But my first instinct on seeing the name is everyone wound up chiming in of, so what do you think of the name? Well, my honest answer is, yeah, their biggest competitor is named Mongo. So you can name it pretty much whatever you want and get a buy on it. I don't have too much to say. But the concern I have, and I'm wondering about here, is I've always abbreviated DynamoDB as
Starting point is 00:26:54 DDB. Is there now a namespace collision where it's about to become confusing to people as far as which database they're talking about? You know, I think you're asking a pretty reasonable question. You know, I was just thinking, as you were talking about that, and all the talks that I've done at reInvent, and I'm often the one speaking about our family of databases, I really haven't seen too many name collisions, per se, and I'll tell you why. It's because once you get an understanding of those categories,
Starting point is 00:27:23 you know, relational is a category, I'm just talking about data categories, documents a those categories, relational is a category. I'm just talking about data categories. Document's a category. Key value is a category. Graph, time series, ledger, so on and so forth. Once people understand those categories, they kind of have a light bulb moment. In fact, just yesterday I was in San Francisco with a customer from reInvent. He's like, gosh, we've never thought of data that way.
Starting point is 00:27:42 So they really start thinking about categories of data first, and then the API inside of that category. So in that context, there's not a lot of overlap because Dynamo is a flagship key value store. And DocDB or DocumentDB, as we referred to it yesterday, kind of fits nicely into that document category. What I find fascinating about it, just from early returns of people who have looked at this and played with it to some extent, is that it tends to offer, and please correct me if I'm wrong on this, a very approachable new user experience when you're just getting started with something like this. I'm told the documentation is terrific. There are a bunch of use cases. It's a lot less go poke around in a bunch of various forums across the internet and try and piece together a half-baked understanding of it.
Starting point is 00:28:29 The onboarding has had significant time and attention paid to it by all reports. First, not having played with it myself yet, is that accurate? And secondly, if it is, was that something that was a driving consideration pre-launch? Yeah. So we're always trying to improve the customer experience. Pretty much any Amazonian you talk to is going to tell you that more than once. But it's true. And developer experience starts with documentation. And it's kind of, you know, sometimes you might think, hey, it's just about the API or making it approachable. But most developers that I think Rahul would probably say the same thing, they really appreciate a low bar to entry. They
Starting point is 00:29:09 appreciate it when they can get up and running with very little cost, very little friction, and simplicity typically wins, at least on that first day one experience, so to speak. I think every team here that's trying to provide any customer experience is always trying to lower that bar and make it as easy as possible to get up and running. So in the context of DocumentDB, yeah, we definitely wanted to make that as simple as possible. Yeah, and one of my favorite memories from launching Athena, which is a serverless SQL on S3 at reInvent in 2016, was that 10 minutes after Andy had announced it in his keynote, someone had tweeted that they were using it in production
Starting point is 00:29:48 to analyze their CloudTrail logs. So that was a big win. That's a great example. It's always nice to see a service launch that doesn't feel like you're getting onto a carnival ride. And there's the bar with a cartoon character saying you must be at least this smart to ride. It winds up being appreciated
Starting point is 00:30:04 when regardless of the power and capability of a service, the onboarding isn't one of those trial by fire runnings of a gauntlet. Yeah, well, you know, to that point, if you take DocumentDB, like a lot of customers have actually been using Mongo in AWS today for quite some time. And you'll see that manifest by way of self-managed Mongo and EC2 or running on a Mongo service that's in AWS. But in the end, most of these customers came back with the same thing. And they say things like, hey, I really like the Mongo API. I like the flexibility of a document model. But boy, it'd be great if what I'm struggling with is rather just making it run in a very efficient, performant, highly available way.
Starting point is 00:30:48 Could you help us with that? And our mental model there is, okay, we're going to remove all that operational burden from you. We kind of have to make that bar to entry super simple. So from your point of view, it's just an API that you connect to so that you do the dev and then we do the ops. And that's kind of a simple mental model that kind of goes, that would sort of reinforce your example of somebody in 10 minutes getting into production because they're not having to deal with the ops, they just dev. I come from an ops background myself. So my conception of what makes things easy and
Starting point is 00:31:17 understandable is diametrically opposed to what someone with a development background tends to see. We're seeing this melding of the two as the world continues to evolve. And increasingly, we're seeing the divide break down where it's no longer an operations person just looks like a crappy developer or a developer looks like a ops person with no sense of responsibility. We're seeing a sense of those two things melding as part of the DevOps or whatever you want to call it movement, don't at me. And there's an increasing awareness that there are people on both sides of that historical divide
Starting point is 00:31:50 that need to be able to use a new product without having to go through a 18-step process to get things set up. So anytime there's a launch that makes something accessible and easy to use, I'm fully in support of it. I've never found that making things difficult to get started with has paid dividends. So it's clear from what I've seen so far, there's been significant effort put into that across the board from AWS. Some of the launches recently have just been night and day difference compared to some of the early services. It turns out things don't get worse with time. Who knew? I want to thank you both for taking the time to speak with me today.
Starting point is 00:32:25 It's incredibly gratifying to be able to talk to some of the people who are behind the services that get built out. It's easy to lose sight sometimes of the fact that when a service gets announced, you spend 18 months, two years on building a service with a service team. People wind up doing a bunch of work, blood, sweat, and tears. They finally send it to final reviews. Documentation gets done. It gets, in many cases, a ridiculous name slapped onto it. And then it gets launched. There's a blog post that someone writes and people thank the person that wrote the blog post. That's great. But first, that person didn't build the product. But there's a lot of people behind the scenes who build these things and get them out the door.
Starting point is 00:33:05 So I'm curious just from the perspective of having just spent time building and launching a number of services over the past few years, how is that seen internally? Is there a sense that you're seeing from the service teams and product teams that build these that their work is unheralded? Do they understand the level of appreciation the community has for the incredible amount of work that goes into these things? Do they understand the level of appreciation the community has for the incredible amount of work that goes into these things? I always tend to look at this even beyond the pure engineering effort. The product managers, the marketing people, the folks who work on pricing, there's an awful lot of moving parts to launch anything at this sense of scale. And
Starting point is 00:33:39 just, oh, it's a few engineers sitting in a room. I'm sure they can feed them all with two pizzas. It doesn't generally work that way. There's a lot of moving parts at this level of complexity. Yeah, maybe both of us can share. I thought I'd be brief in that the one thing that I see, like if we walked out of here and just walked around the hallways, you're going to bump into people that are just naturally focused on customers. That is not a set of words we just toss around lightly. I mean, people here, you know, we could walk into a meeting next door,
Starting point is 00:34:09 and if we were talking about any product that we were wanting to do, it is always working backwards from customers. So you could be a marketer, you could be a seller, you could be an engineer, you could be in PR, you could be in any function you're in this company, no matter what building you and I could walk into. Any problem we're talking about is always working backwards from the customer. So if you sort of use that as a mental model, there's just full engagement with what customers are doing across the board in every discipline. And the nice thing there is you don't end up in a situation where one function is the customer interface, and then there's everybody else.
Starting point is 00:34:45 So you end up with a room, as you pointed out, it's not just one single function that gets a product done. It's a whole team effort, but everybody on that team is as curious, interested, and committed to improving that customer experience. And that's why I think a lot of people do appreciate all that goes into what gets built. Yeah, I'd echo that. I think there's a lot of people do appreciate all that goes into what gets built. Yeah, I'd echo that.
Starting point is 00:35:05 I think there's a lot of awareness and recognition that it really does take a small army to conceive of, build, launch, and then successfully grow a product. And we try and acknowledge every participant in that. What's always been amazing to me is talking to some of the people on the back end who have built some of these things who are generally explicitly not customer-facing. But this attitude of solving customer problems tends to permeate the entire arena. It's easy to look at leadership principles, for example, and discount them as marketing or a sales pitch. And I have to admit, I did when I first started
Starting point is 00:35:36 getting fairly into the AWS ecosystem. Yes, every company has a mission statement. Terrific, great. But then you start talking to people and you see it manifest itself in ways that were not intuitively obvious at first. It really does tend to lend itself to a cohesive sense of culture. And there are certain truisms, regardless of what AWS group I'm talking to, things that bleed through. It's nice to see, I guess, the fruits of some of that as you build
Starting point is 00:36:02 things out. It's not the sort of thing where you can just pick apart piecemeal and drop onto some other company and expect to have the same results. It's something that I think was built in from the beginning here. And I don't think I've seen anything remotely like it in any other company I've ever spoken with. It's neat to see. Thank you. No, thank you both for your time. I appreciate that. I appreciate this. It's been a hectic few days for you folks with the launch of this, and I don't get the sense there's a whole lot of sitting around and resting at AWS. Now it's always on to the next thing, on to improving iteratively. Thanks very much.
Starting point is 00:36:34 This was fun. Thank you. Yes. Sean Bice, VP of Non-Relational Databases. Raul Patek, GM for Big Data, Data Legs, and Blockchain. I'm Corey Quinn. This is Screaming in the Cloud. This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com
Starting point is 00:36:52 or wherever fine snark is sold.
