Screaming in the Cloud - Episode 23: Most Likely to be Misunderstood: The Myth of Cloud Agnosticism

Episode Date: August 10, 2018

It is easy to pick apart the general premise of Cloud agnosticism being a myth. What about reasonable use cases? Well, generally, when you have a workload that you want to put on multiple Cloud providers, it is a bad idea. It’s difficult to build and maintain. Providers change, some more than others. The ability to work with them becomes more complex. Yet, Cloud providers rarely disappoint you enough to make you hurry and go to another provider. Today, we’re talking to Jay Gordon, Cloud developer advocate for MongoDB, about databases, distribution of databases, and multi-Cloud strategies. MongoDB is a good option for people who want to build applications quicker and faster without doing a lot of infrastructure work.

Some of the highlights of the show include:

- It is easier now to consider distributed data reliable and available than not
- People spend time buying an option that doesn’t work, at the cost of feature velocity
- If a Cloud provider goes down, is it the end of the world?
- Cloud offers greater flexibility, but no matter what, there should be a secondary option when a critical path comes to a breaking point
- A hand-off from one provider to another is more likely to cause an outage than a multi-region, single-provider failure
- The explosion of Cloud agnostic tooling: the more we create tools that do the same thing regardless of provider, the more agnosticism there will be from implementers
- It is workload-dependent, and data gravity dictates choices; bandwidth isn’t free
- Certain services are only available on one Cloud due to licensing, but tools can help with migration
- Major service providers handle persistent parts of architecture, and other companies offer database services and tools for those providers
- Cost may or may not be a factor in why businesses stay with one Cloud instead of going multi-Cloud
- How much RPO and RTO play into a multi-Cloud decision
- Selecting a database or data store when building; consider security and encryption

Links:

- Jay Gordon on Twitter
- MongoDB
- The Myth of Cloud Agnosticism
- Heresy in the Church of Docker
- Kubernetes
- Amazon Secrets Manager
- JSON
- DigitalOcean

Transcript
Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. This week's episode of Screaming in the Cloud is generously sponsored by DigitalOcean. I would argue that every cloud platform out there biases for different things. Some bias for having every feature you could possibly want offered as a managed service at
varying degrees of maturity. Others bias for, hey, we heard there's some money to be made in the cloud space. Can you give us some of it? DigitalOcean biases for neither. To me, they optimize for simplicity. I polled some friends of mine who are avid DigitalOcean supporters about why they're using it for various things, and they all said more or less the same thing. Other offerings have a bunch of shenanigans around root access and IP addresses. DigitalOcean makes it all simple. In 60 seconds, you have root access to a Linux box with an IP. That's a direct quote, albeit with profanity about other providers taken out. DigitalOcean also offers fixed price offerings. You always know what you're going to wind up paying this month,
so you don't wind up having a minor heart issue when the bill comes in. Their services are also understandable without spending three months going to cloud school. You don't have to worry about going very deep to understand what you're doing. It's click button or make an API call and you receive a cloud resource. They also include very understandable monitoring and alerting. And lastly, they're not exactly what I would call small time. Over 150,000 businesses are using them today. So go ahead and give them a try. Visit do.co slash screaming, and they'll give you a free $100 credit to try it out.
That's do.co slash screaming. Thanks again to DigitalOcean for their support of Screaming in the Cloud. Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by Jay Gordon, who's a cloud developer advocate at MongoDB. Welcome to the show, Jay. Hey, Corey. It's good to talk to you. Good to talk to you, too.
Starting point is 00:02:20 It's always good to see you at conferences and have conversations. And of course, on Twitter, you have a verified account, which is how we know that you're important. So anytime you're willing to spend time talking with me, I'm thrilled to give you a platform for it. I don't know how important I am as much as I filled out a form and it worked out. But, you know, it's kind of cool to have a verified account. However, certain people find it as a really, really easy way to say, oh, yeah, whatever, blue check, and pass you along. Oh, I hear you. I filled out a form once, and now people think I know what I'm talking about with cloud computing. It's amazing how the world tends to unfold that way.
Starting point is 00:02:58 Yes. So on the subject of seeing you at conferences, one of the things I really do feel lucky being part of a DevOps, cloud, whatever community, is there's so many cool people to spend time talking with. It's one of those lucky things that I can say I've been able to meet so many awesome people, and Corey, you're one of them. Thank you for having me on today. Thank you for the flattery. It'll get you everywhere. So what should we talk about today? Well, one of the things I want to talk about, obviously, because I work at MongoDB, is I like talking about databases. And distribution of databases, I think, is a really interesting kind of subject because we're kind of in a situation now where it is far easier to consider distributed data to be something that's reliable and available than distributed data not being reliable and available.
Starting point is 00:03:53 So one of the things that I've been thinking about is multi-cloud strategies. And I know you've planned on doing a talk soon about it and about cloud agnosticism. So I thought it would be a really cool subject to talk about today. Absolutely. In fact, when you brought that subject up to me earlier, you weren't aware that I was the person that was giving that talk. It was, what is this? And now, of course, you're taking a much kinder tone than the acts you were bringing to Grind originally. But please, hit me where it hurts.
Starting point is 00:04:22 The talk has been given at a couple of conferences so far, and I'm going to give it a few more. And it's called The Myth of Cloud Agnosticism. It started off as a blog post that I put up on Reactive Ops' blog, and I'll throw a link to that in the show notes. And it's probably the talk that I've given that is the most likely to be misunderstood so far. And I say that having given a talk called Heresy in the Church of Docker that more or less slaughtered a technology everyone loves. The challenge is that it's easy to pick apart
Starting point is 00:04:56 the general premise of cloud agnosticism being a myth if you come at it from a perspective of, well, what about these following list of very reasonable use cases? I agree with that. The point of the talk is that in the general case, when you have a workload that you decide that you want to put on multiple cloud providers, it is usually a crappy idea in that you want to be able to push a button and deploy this application to GCP, press the button again, now it's on Azure, press it again, it's on AWS, press it again,
Starting point is 00:05:36 it's on TED's taxidermy and cloud hosting. And you wind up in a scenario where trying to build that first off is not exactly trivial because these providers all implement things very differently. But also in trying to maintain that as you continue to build because providers change some more than others. The ability to continue working with these things becomes more and more complex as your application gains in complexity. And it's a non-trivial amount of effort. If you take a step back and look at what the stated objective of this is, it's, well, what
Starting point is 00:06:16 if our primary cloud provider disappoints us and we have to move in a hurry? Okay, that's a fair question. But if you take a look at the competitive landscape, companies don't do that very often. When they do, it makes headline news on whatever provider they're moving to. They talk about it in keynote stages. They give big press releases about it. It's not something that companies tend to do on an ongoing basis. Instead, people wind up spending a lot of time buying the option in a way that doesn't really work at the cost of feature velocity, because you wind up giving up a lot of the built-in primitives and advanced services that you can get from any of these providers, as long as you focus on that. And that's the theme of the talk. And it gets a lot more nuanced than
Starting point is 00:07:04 that. But once people hear that, they start to nod and say, yeah, I see what you're going for. Or they start coming at me with the knives of, well, actually. Well, what if you're pager duty, where you need to be more available than any given provider? Okay, that's valid. What if people's lives depend on this? Well, yeah, building in multi-provider redundancy is awesome. But if you see all of GCP or Azure or AWS or DigitalOcean or TED's taxidermy going down, except for that last one, you're going to see that's the day the internet is on fire.
Starting point is 00:07:39 And for many use cases, that's not the end of the world for most companies. Well, it really actually depends though, because you have to really look at it this way. If the viability of my business is completely based on the fact that a service that I provide is online, and at any time I have any real major outages that I not only actively lose money, but lose velocity of my product itself and the understanding people have of it. I like to kind of think of it like this. When we built out network infrastructure for servers and systems and data centers in, say, maybe the late 90s and some of the early 2000s, and even now we do this. We always look to ensure that all
Starting point is 00:08:26 critical paths have some sort of redundancy when it comes to going in and out of the data center network, correct? Absolutely. So I've kind of taken some of this thought process. And while I do believe that, you know, you get the more flexibility in the cloud world to be able to say, you know what, it's time to pivot. It's time to pivot fast. There's still work involved. But I still believe that no matter what, there should be a secondary option when a critical path could come to a breaking point for several reasons. One of them, you know, obviously is network outage or systems outage or something. We saw AWS accidentally fat finger some DNS a few years ago, and look what that did to us all.
Starting point is 00:09:10 You know what I mean? Absolutely. But to stretch your analogy possibly to the breaking point, in a large environment I was exposed to for a while, they ran studies where they wound up having redundant routers. And invariably, they saw far more failures from that redundancy based on heartbeat failures, where they wound up with a split brain scenario and had to effectively take the entire cluster down more than they saw router failures
Starting point is 00:09:38 themselves. The same model, to some extent, starts to apply when you look at this from a multi-provider perspective, in that it is far likelier that you're going to wind up causing an outage in the handoff between one provider to another than you're going to see a multi-region single provider failure that disrupts your site to the point of it going down. I'm not saying it's impossible, and there are edge cases around this, but I do think that for these small startups that are just dipping their toes into the water and experimenting with cloud architecture, one of the things that they optimize for should not be running on any provider under the sun at the push of a button. Sure. I think the one thing, though, that we saw kind of happen in the last maybe three years is the explosion of cloud agnostic tooling.
Starting point is 00:10:28 You know, at first we saw Kubernetes become this thing that we knew, well, you know, you can run it on Google's cloud. But then you saw it have more and more availability on different kind of cloud networks to the point where, you know, all three major clouds have some sort of Kubernetes engine implementation because they know that without providing an existing kind of Kubernetes cluster to spin up on a whim, there is a larger kind of barrier into them getting those workloads onto those clouds. So I've kind of looked at it like this, is that the more we've unified and created tools that do the same thing regardless of the provider, I think the more we're going to be able to see that there's more agnosticism that comes from implementers. And when I say implementers, building, say, a Kubernetes solution that is going to easily be put on any cloud based on what your business is
Starting point is 00:11:27 doing it that day. If Amazon has major issues and all you really need to do is modify DNS to get things going in a new direction, is that really a terrible strategy to take? And this is where it becomes incredibly workload dependent. In some cases, you wind up in a scenario where data gravity dictates your choices. Ooh, we're going to go ahead and have all of our containers spin up in GCP instead of AWS. We'll save 20 cents an hour per container, but we're also going to wind up spending 20 grand moving the data they need to process over into that environment. And that's really a big, big point is that no matter what, bandwidth still isn't free. Exactly. Well well it is
Starting point is 00:12:05 only ingress but but moving your data out of one network into another tends not to be free and we we have to kind of go through that you know i i've kind of seen this before because i've worked with people on inbound migrations to like mongodb we have our own cloud service for for databases called atlas and one of the big things is getting, self-hosted databases onto the service by a migration process. And one of the reasons why I've seen some people do it is that they need to go from one particular provider to another, and we give them easier tools to do it than doing it manually. So we have a live migration tool. You just pop in a host name and it'll slurp the data from where you're, where you're hosting it already to where you want it to be. So that
Starting point is 00:12:50 could easily be, I've got it running on a standalone machine, say it, uh, I don't know, on AWS and I want to move it to Google cloud. You know, we've created tooling around that. And I think that it gives people options, the fact that you can also say, you know what, and this is another one of these things I've heard around multi-cloud, and you can tell me if you think this is a real valid reason, is the fact that certain services are just available on one particular cloud just based on licensing. And look at TensorFlow, I think, is the easiest way to think about it, Is that if you do go through a multi-cloud strategy, that you could easily push data from one particular cloud to another and be able to utilize a lot of these different services so that you can say, you know what? If I need to do a big ML model on something and I want to do it with tools that Google's provided, I can easily have my data within there and not have to go through the whole rigmarole of migration. Absolutely. But as a counterpoint,
Starting point is 00:13:50 you're also getting into one of the early called out edge cases of this, where the idea of having multiple cloud providers for a business is not necessarily a terrible one. My, I guess, campaigning against being cloud agnostic generally is restricted to the workload level. If you have a particular application or a particular workload that you're trying to get to work across multiple providers, that's often painful. But if you're talking about having the web services live in AWS and the machine learning piece that chews on that data living in GCP, that's a very viable model that I wouldn't argue with you on. So one of the things that I really like the idea about is, one, I like big major services kind of being the way you handle maybe more persistent parts of your architecture. I like the idea of using a database service
Starting point is 00:14:48 like what Amazon provides or what we provide with Atlas. The fact that you can just spin up databases and use native tooling around them for these providers because they've all got some sort of way to move data in and out, whether it's using AWS CLI or using some sort of native. So there are always ways that you can go and grab this data, do something with it, and easily move it to other providers. Because I like on the fly being able to say,
Starting point is 00:15:14 I want to just connect my front-end application to this database, and I don't want to have to worry about reconfiguring the database. Right. And that's really one of the challenges in the market right now with the way that AI and machine learning have evolved. If your data fits on a drive, you're probably not going to have success. The single biggest predictor right now of success in any form of machine learning is whether you have a large enough data set to operate on. So when you're into the multi-petabytes, now you're starting to have some serious opportunities. But you're right, there it becomes very difficult to relocate that data, at least any reasonable percentage of it.
Starting point is 00:15:56 Yeah, because you're still dealing with the same things that we've always dealt with in the past, and that's data transfer. You're limited by whatever your throughput is on your lines, and more than anything, your costs. I mean, I know we talked about it just a minute ago, but I still believe that one of the big reasons why probably people haven't selected multi-cloud is the upfront kind of thought and process around getting data over and what it would cost them to run two infrastructures simultaneously. And I think that that probably is a leading factor why companies are choosing to kind of stick with one cloud. To an extent, yes. But in other cases, when we're
Starting point is 00:16:39 talking large enterprise scale, where they're doing deals in the eight to nine figure annual ranges with these large cloud providers, it becomes very difficult, first off, to wind up making a compelling case for putting all of that in a single provider and not looking either irresponsible or like you're being paid off underneath the table. So that becomes one challenge of it. The other is that there's an idea that you can then wind up having some negotiating room by transitioning workloads as leverage during contract negotiations. That is often a bit of a red herring. If you take a look globally at any company's use of cloud providers, one thing you almost never see is the number getting smaller.
Starting point is 00:17:27 Invariably, people tend to expand their footprint. They don't tend to reduce it very often. Heck, I spent most of my time working on optimization of AWS bills. And a year later, I find that most of my clients are spending more than they were when I started. They're just doing it more efficiently. Things continue to grow. That is what companies aspire to do. This is not a bad thing. Instead, it turns very much into an arena of focusing on what it is exactly the company needs to do. And cost, surprisingly, is not as much of a driver behind corporate decision-making as people tend to believe it is. Companies are willing to spend money in order to expand into new markets, to advance new features, to grow revenue.
Starting point is 00:18:12 And it just comes down to a unit economics discussion. So here's a question for you then. How much do you think that, say, RTO and RPO play to people's decisions to maybe consider multi-cloud whereas a failure say on something that needs to be restored maybe because of the network you can't restore it to that local cloud you can do it onto another one i'm curious if you think that that specific case not necessarily disaster recovery but maybe just a portion of a disaster recovery. I'm just trying to think if people really look to replace what would be just go back, get the backups, restore them, as opposed to let's just spin up a new environment on a different cloud and work through that and restore based on what we already have. Right. Let's define terms for those who didn't grow up building DR plans for fun. Our PO is when, from the time an incident occurs, what is the maximum exposure of lost data? In
Starting point is 00:19:14 other words, if you're restoring from backup, how long has it been since your last backup? Our TO is from the time the site goes down or is impacted, how long will it be until you're back up and running and able to service customers at some baseline level? And it's a great question. The right answer around a lot of this is going to be extremely workload dependent. For high availability services where latency is critical, I'll go back to pager duty as a good example of that, the tolerance for failure is extremely low. You're not going to be able to talk around that with,
Starting point is 00:19:50 ah, we're just going to take a six-hour outage and it's fine. An example of this, the other direction, is a few years back, I was trying to buy something. I think it was a package of socks on Amazon.com. Because I lead an exciting life, that's what I buy. And it threw a 500 error, which was bizarre. It was one of those things you don't see very often. I tried it again through the error. And in a lot of DR plans, there would be an assumption built into this that therefore,
Starting point is 00:20:18 because I could not buy those socks at that moment. I either never bought those socks again, and I'm walking around barefoot one day out of seven, or I instead went to another provider and went to all the rigmaroles setting up an account and ordering the socks then. In practice, I waited an hour, the problem went away, and I bought my pair of socks, and I went on with my life. Now, the fact that that worked is based on two specific facts about that use case. One, that it's a purchase that I'm making intentionally. If this were a company that were serving ads, I'm not going to come back and view those ads later in time. That opportunity is lost. And two, this doesn't happen every third time I try to buy something on amazon.com. If it did, I'd probably be spending a lot more money at Target instead because there's a reputational damage of being the site that's always down where
Starting point is 00:21:10 people no longer want to trust you with their purchases. Interesting. I'm not sure if that answers your question about how this applies to DR concepts, but it does tend to lead to the problem of when one of these large providers is having problems, there are a lot of problems that are second and third order effects. If you see, for example, US East 1 go down, because that's what it does, it breaks. And there have been a number of failure cases over the years where suddenly other regions become overloaded, provisioning calls take longer and longer, because everyone is failing over. They're saturating links, they're hammering APIs that don't generally see that level of traffic within a couple orders of magnitude.
Starting point is 00:21:53 And to that end, people have to start planning for things like that when they're building out a DR plan. In many cases, a lot of the automated tools, when I point at an AWS account, will say, ah, here's a bunch of idle instances. Turn them off. Well, that's a DR site, and we kind of want to be able to fail over there within seconds. We're not going to have time to wait for a laden provisioning backplane. We want them up and running now.
Starting point is 00:22:18 It's the same type of principle of what are the disaster types you're planning for, and how do you intend to wind up handling the failover process? Not to mention the failback, which is a whole separate category of issue. And rest assured, regardless of what you plan for, you're missing something. The only way you find all of the corner cases and edge cases is to live forever. Yeah, and even when you get there, you'll probably miss out on a few of them. Oh, absolutely. So one of the things I also wanted to kind of talk about, because you had mentioned to me that you really haven't spent a lot of time around databases.
Starting point is 00:22:59 And I was curious if there were any kind of questions around the world of databases, especially distributed databases, that you'd like to hear a little bit more about. Oh, absolutely. I wound up playing around with databases a bit in my youth. I can set up my SQL. I can set up replication. I can pass it over to a professional when I see something I don't recognize, which is pretty much everything past that point. And then I wash my hands of it and move on with life. The challenge, of course, from my perspective, is when I'm building something new, what database or what data store do I wind up selecting for weird, arbitrary use cases? I mean, there are times where in some cases I'm dealing with small enough data volumes that a flat file in CSV format living in S3 is more than sufficient for what I do. There's also the argument to completely over-engineer something using a bunch of very late bleeding edge systems
Starting point is 00:23:50 that are effectively ACID-compliant global world-spanning databases. Google's Cloud Spanner and I believe Cosmos DB from Azure tend to qualify there, as well as the upcoming announced Aurora Master-Master multi-region. The challenge, though, is the consensus that emerges consistently is that whatever I'm deciding to use, I'm wrong for using it. And it's always challenging for me to figure out what is the right answer. One thing that has been earning me global condemnation for
Starting point is 00:24:22 is using Amazon's Secrets Manager, where the idea is it holds secure data encrypted using KMS and provides that to your applications for 40 cents per month per secret. Well, that just sounds like an expensive database. So couldn't I theoretically just store all of my transactions in that and call it good? And people look at me with a look of horror and start backing away slowly. Yeah, because you still have to look at databases as a transactional thing. Or I shouldn't say transactional. I guess the best way to kind of look at databases are active or operational pieces of infrastructure. And I guess that people are still kind of concerned on whether encryption level or being able to store your secrets at that level is good enough. I don't know if that's the real right term is good enough. But there's always been these secondary use cases for stores like, I think about Chef and I think about the encrypted data bag or think about other tools. Which sounds like a spectacular insult to throw at someone yes or like you know uh encrypted yaml so that you can
Starting point is 00:25:31 keep the secrets for you know your yaml based products uh or or you know now we get more modern with kms and vault that are all out there so all these subjects exist and i guess that or i should say all these products exist. And I guess that, or I should say all these products exist, and I guess you wouldn't be that far off by calling them databases. But some of the things that they tend to not do is have distribution data available, ways to do queries around that. You're basically just providing one particular use case. And that is, I need information, provide me that information,
Starting point is 00:26:08 decrypt the information that I want. In databases, you can do that as well. You can basically make a query that wanted some encrypted data to return it back to you unencrypted. That's possible. But I think that the one thing that's really the differentiator between just like something that's a key store
Starting point is 00:26:24 and something that's a key store and something that's a database is the ability to really or even for that matter for s3 is the ability to really run very complex queries against that and with mongodb you can run complex aggregations against a lot of that data and you don don't have to do that from, say, a client level. You can do it all on the server side because we have an aggregation framework that allows you to ask really complex questions without having to put the load, say, on the client side if you're bringing over that data in some sort of application that eventually gets presented to the user. So I think that ultimately, while those services, I see them as simple databases, they don't really have data structure methods. They have file systems, at least in the way of
Starting point is 00:27:13 S3 is concerned. I can't really speak to what particular data structures that KMS has or Vault, but the biggest thing that I can say that differentiates the two is the level of security encryption around what data store or I should say private key stores provide. And obviously data stores are more focused on performance and returning information to you. Oh, absolutely. I mean, it's the idea of using Secrets Manager as a MongoDB replacement is completely ludicrous. The reason I tend to go in that direction is because it exposes a half dozen different misconceptions that many people, including me, tend to have around data stores in their entirety. And being able to address those
Starting point is 00:27:58 in a somewhat reasoned way starts to shed light on how some of these tools can or should be used. I wound up building a bunch of the pipeline process that builds my newsletter every week using DynamoDB because it was there. I didn't have to manage any infrastructure myself. It was effectively given to me under the permanent free tier. And last time I checked, I was storing 150K in that at any given point in time. So realistically, I'm at a point where a flat file would probably have been sufficient were it not for a couple of edge
Starting point is 00:28:31 case S3 race conditions. So this wound up more or less being my first outing into the world of non-relational data stores. And my brain is full and I wish to be excused. So it winds up being something that isn't exactly intuitive to the way that I see the world. Well, the one thing that's great about MongoDB is, one, you can run it basically anywhere, and it's one of the big core tenants of the product itself, is the fact that we say that if you have a CPU,
Starting point is 00:29:00 you could probably run MongoDB on that particular system. So, I mean, it's there. And while it's not in, say, RDS or one of those services, Mongo is because of our licensing method. We had to go out and say, let's go ahead and create our own cloud. And let's put it on all three of the major clouds so that you have those options to go wherever it is you want it. And let's make it easy to get started. So just like, say, AWS and Dynamo, we put together a free tier for people also. So what we're also kind of trying to do
Starting point is 00:29:35 is give people great use cases and reasons. So we have a developer advocacy team that I'm on, and we kind of show people that databases make sense and MongoDB makes sense for applications because of the way data is stored. We look back at how databases are traditionally kind of thought about and they were thought of as tabular. And when I say tabular, I think of it like a bunch of tabs or just parts of an Excel spreadsheet. And so as data becomes more complex, using that Excel spreadsheet ended up becoming more kind of ridiculous and data became far more complex to manage. And we saw a revolution around JSON, and I don't want to call it a revolution, but we saw people finding JSON to be a much more reasonable way to manage data because it reads like a menu instead of going through pages and pages of typewriter text and re-connoitering
Starting point is 00:30:38 them and flattening them and then taking that data and processing it and then building SQL migration so that you can get data from one place to another, it just became very difficult. I'm still going to expose some of my ignorance here. Just from an old-school, hands-on hardware configuration type, I still find YAML more understandable and readable than I do JSON. There's probably something profoundly wrong with me. Well, you know, JSON makes sense, I guess, to people who've been working, one, with JavaScript for a long time.
Starting point is 00:31:07 I think the other thing that makes a lot of sense about JSON to me is that it's so heavily linted that most major services that allow you to implement JSON as part of what your application is or whether you're importing data, they'll just basically tell you, your JSON's broken, this isn't going to work. And that's the kind of thing that I really dig about JSON. It's just there's a lot of reliability in the way that you can format the files. The cool thing about Mongo is that you can nest arrays and really build rich, rich kind of documents so that you don't necessarily need to have thousands and thousands of documents on a specific subject. You can keep everything in one particular document, be able to go and grab all your information out of that JSON data, and then present it for the application that you want to use. Absolutely. I think the one thing that we can all agree on is that XML is terrible.
Starting point is 00:31:57 Yes. And that's the big thing that Mongo provides you. So you don't have to spend a whole bunch of time, speaking of XML, writing ODMs and managing objects with your database because everything in MongoDB is considered an object, so you can query everything. So the one thing that's nice is you don't really have to spend time in XML at all. And that, to me, is a definite uptick of why people would want to select one over the other as far as, say, relational databases as compared to, say, something like MongoDB. But I'm not here to sell people MongoDB. But I think it's a good option for people that want to build applications quicker, faster, and they don't necessarily want to do a lot of the infrastructure work. You know, there's Atlas, and it does a lot of those things
Starting point is 00:32:40 that people really trusted in the cloud lately with services. Okay. This has been helpful, and I definitely appreciate your taking the time to go through this with me. But I have one question that is probably going to ruin our friendship before we call it an episode. And that is quite simply, every time I refer to something, be it Mongo, be it almost anything else, as a database, I'm reminded of the tagline of that service,
Starting point is 00:33:06 which is, it's not a database, it's a fill in the blank, document store, data store, et cetera, et cetera. Why? What's behind that nomenclature war? I think a lot of it has to do with the fact that there are several companies that started kind of calling themselves NoSQL. And they all did it at the same time, but they presented data in a different way, whether it's, you know, wide column or in document and JSON. It just became one of those things where people really wanted to call these things different names. And I think it was because they were trying to differentiate themselves from just being referred to as a NoSQL database. MongoDB, we consider ourselves a document database, and that's because our primary goal is to store JSON-based documents in a database and have it easily retrieved.
Starting point is 00:33:59 It doesn't mean that you can't store binary data, but that binary data will be stored within JSON documents in an encrypted, or I should say an encoded method. So, you know, why there's these wars over what things are called, it's really been difficult to tell, because I think it's just about differentiating yourself from the rest of the crowd. That really has been the only thing that I can think of. Good to know that there's not some key distinction there that I've just been asleep at the wheel for 15 years and missed. I mean, there are key value stores, there are document stores, they're all kind of in the end kind of doing something not far off, which is providing you an answer to a key and a value. It's how big that key and value can be within the total document that really differentiates it between, say, what something in Dynamo is compared to something that's in MongoDB.
Starting point is 00:34:52 Which makes an awful lot of sense. Yeah, it's the richness, the richness of data, if you will. Thank you so much for taking time out of your day to speak with me. No problem. I always enjoy speaking with you, Corey. It's one of those great, great luxuries that people that work in technology get. It's once in a while, we'll find you and we'll get to talk to you. When they're really unlucky, I start talking at them and then all heck breaks loose. I don't know if it's all heck, but it's certainly entertaining nonetheless. Well, thank you. My name is Corey Quinn. This has been Jay Gordon from MongoDB,
Starting point is 00:35:24 and this is Screaming in the Cloud.
