Screaming in the Cloud - Episode 23: Most Likely to be Misunderstood: The Myth of Cloud Agnosticism
Episode Date: August 10, 2018

It is easy to pick apart the general premise of cloud agnosticism being a myth. What about reasonable use cases? Well, generally, when you have a workload that you want to put on multiple cloud providers, it is a bad idea. It's difficult to build and maintain. Providers change, some more than others. The ability to work with them becomes more complex. Yet, cloud providers rarely disappoint you enough to make you hurry and go to another provider. Today, we're talking to Jay Gordon, cloud developer advocate for MongoDB, about databases, distribution of databases, and multi-cloud strategies. MongoDB is a good option for people who want to build applications quicker and faster but not do a lot of infrastructural work.

Some of the highlights of the show include:

- It is easier now to consider distributed data to be reliable and available than not
- People spend time buying an option that doesn't work, at the cost of feature velocity
- If a cloud provider goes down, is it the end of the world?
- Cloud offers greater flexibility; but no matter what, there should be a secondary option when a critical path comes to a breaking point
- A hand-off from one provider to another is more likely to cause an outage than a multi-region, single-provider failure
- The explosion of cloud-agnostic tooling: the more we create tools that do the same thing regardless of provider, the more agnosticism there will be from implementers
- Workload-dependent cases where data gravity dictates choices; bandwidth isn't free
- Certain services are only available on one cloud due to licensing, but tools can help with migration
- Major service providers handle persistent parts of architecture, and other companies offer database services and tools for those providers
- Cost may or may not be a factor in why businesses stay with one cloud instead of multi-cloud
- How much RPO and RTO play into a multi-cloud decision
- Selecting a database or data store when building; consider security and encryption

Links:

- Jay Gordon on Twitter
- MongoDB
- The Myth of Cloud Agnosticism
- Heresy in the Church of Docker
- Kubernetes
- Amazon Secrets Manager
- JSON
- DigitalOcean
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, cloud economist Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This week's episode of Screaming in the Cloud is generously sponsored
by DigitalOcean. I would argue that every cloud platform out there biases for different things.
Some bias for having every feature you could possibly want offered as a managed service at
varying degrees of maturity. Others bias for, hey, we heard there's some money to be made in the cloud space. Can you give us some of it?
DigitalOcean biases for neither. To me, they optimize for simplicity. I polled some friends of mine who are avid DigitalOcean supporters about why they're using it for various things,
and they all said more or less the same thing: other offerings have a bunch of shenanigans with root access and IP addresses.
DigitalOcean makes it all simple.
In 60 seconds, you have root access to a Linux box with an IP.
That's a direct quote, albeit with profanity about other providers taken out.
DigitalOcean also offers fixed price offerings. You always know what you're going to wind up paying this month,
so you don't wind up having a minor heart issue when the bill comes in.
Their services are also understandable without spending three months going to cloud school.
You don't have to worry about going very deep to understand what you're doing.
It's click button or make an API call and you receive a cloud resource.
They also include very understandable monitoring and alerting.
And lastly, they're not
exactly what I would call small time. Over 150,000 businesses are using them today. So go ahead and
give them a try. Visit do.co slash screaming, and they'll give you a free $100 credit to try it out.
That's do.co slash screaming. Thanks again to DigitalOcean for their support of Screaming in the Cloud.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
I'm joined this week by Jay Gordon, who's a cloud developer advocate at MongoDB.
Welcome to the show, Jay.
Hey, Corey.
It's good to talk to you.
Good to talk to you, too.
It's always good to see you at conferences and have conversations.
And of course, on Twitter, you have a verified account, which is how we know that you're important. So anytime you're willing
to spend time talking with me, I'm thrilled to give you a platform for it. I don't know how
important I am as much as I filled out a form and it worked out. But, you know, it's kind of
cool to have a verified account. However, certain people find it a really, really easy way to say, oh, yeah, whatever, blue check, and pass you along.
Oh, I hear you.
I filled out a form once, and now people think I know what I'm talking about with cloud computing.
It's amazing how the world tends to unfold that way.
Yes.
So on the subject of seeing you at conferences, one of the things that makes me feel really lucky to be part of a DevOps, cloud, whatever community is that there are so many cool people to spend time talking with.
It's one of those lucky things that I can say I've been able to meet so many awesome people, and Corey, you're one of them.
Thank you for having me on today.
Thank you for the flattery. It'll get you everywhere. So what should we talk about today?
Well, one of the things I want to talk about, obviously, because I work at MongoDB,
is I like talking about databases. And distribution of databases, I think,
is a really interesting kind of subject because we're kind of in a situation now where it is far easier to consider distributed data to be something that's reliable and available than distributed data not being reliable and available.
So one of the things that I've been thinking about is multi-cloud strategies.
And I know you've planned on doing a talk soon about it and about cloud agnosticism.
So I thought it would be a really cool subject to talk about today.
Absolutely.
In fact, when you brought that subject up to me earlier, you weren't aware that I was the person that was giving that talk.
It was, what is this?
And now, of course, you're taking a much kinder tone than the axe you were bringing to grind originally.
But please, hit me where it hurts.
The talk has been given at a couple of conferences so
far, and I'm going to give it a few more. And it's called The Myth of Cloud Agnosticism.
It started off as a blog post that I put up on ReactiveOps' blog, and I'll throw a link to that
in the show notes. And it's probably the talk that I've given that is the most likely to be
misunderstood so far. And I say that having given a talk called
Heresy in the Church of Docker
that more or less slaughtered a technology everyone loves.
The challenge is that it's easy to pick apart
the general premise of cloud agnosticism being a myth
if you come at it from a perspective of,
well, what about these following list of very reasonable use cases? I agree with that. The point of the talk is that in the general case,
when you have a workload that you decide that you want to put on multiple cloud providers,
it is usually a crappy idea in that you want to be able to push a button
and deploy this application to GCP,
press the button again, now it's on Azure,
press it again, it's on AWS, press it again,
it's on TED's taxidermy and cloud hosting.
And you wind up in a scenario where
trying to build that first off is not exactly trivial because these providers all implement things very differently.
But also in trying to maintain that as you continue to build because providers change some more than others.
The ability to continue working with these things becomes more and more complex
as your application gains in complexity.
And it's a non-trivial amount of effort.
If you take a step back and look at what the stated objective of this is, it's, well, what
if our primary cloud provider disappoints us and we have to move in a hurry?
Okay, that's a fair question.
But if you take a look at the competitive landscape,
companies don't do that very often. When they do, it makes headline news for whatever provider they're moving to. They talk about it on keynote stages. They give big press releases about it.
It's not something that companies tend to do on an ongoing basis. Instead, people wind up spending a lot of time buying the option in a way that doesn't
really work at the cost of feature velocity, because you wind up giving up a lot of the
built-in primitives and advanced services that you can get from any of these providers,
as long as you focus on that. And that's the theme of the talk. And it gets a lot more nuanced than
that. But once people hear that, they start to nod and say, yeah, I see what you're going for.
Or they start coming at me with the knives of, well, actually.
Well, what if you're PagerDuty, where you need to be more available than any given provider?
Okay, that's valid.
What if people's lives depend on this?
Well, yeah, building in multi-provider redundancy is awesome.
But if you see all of GCP or Azure or AWS or DigitalOcean or TED's taxidermy going down,
except for that last one, that's going to be the day the internet is on fire.
And for many use cases, that's not the end of the world for most companies.
Well, it really actually depends, though, because you have to look at it this way. If the viability of my business is completely based on the fact that a service I provide is online, then any time I have a real major outage, I not only actively lose money, but I lose velocity of my product itself and the understanding people have of it. I like to kind of think of it like this.
When we built out network infrastructure for servers and systems and data centers in, say,
maybe the late 90s and some of the early 2000s (and even now we do this), we always looked to ensure that all
critical paths have some sort of redundancy when it comes to going in and out of the data center
network, correct? Absolutely. So I've kind of taken some of this thought process. And while I do
believe that, you know, you get more flexibility in the cloud world to be able to say, you know what, it's time to pivot.
It's time to pivot fast.
There's still work involved.
But I still believe that no matter what, there should be a secondary option when a critical path could come to a breaking point for several reasons.
One of them, you know, obviously is network outage or systems outage or something. We saw AWS accidentally fat finger some DNS a few years ago,
and look what that did to us all.
You know what I mean?
Absolutely.
But to stretch your analogy possibly to the breaking point,
in a large environment I was exposed to for a while,
they ran studies where they wound up having redundant routers.
And invariably, they saw far more
failures from that redundancy based on heartbeat failures, where they wound up with a split brain
scenario and had to effectively take the entire cluster down more than they saw router failures
themselves. The same model, to some extent, starts to apply when you look at this from a multi-provider perspective,
in that it is far likelier that you're going to wind up causing an outage in the handoff between
one provider to another than you're going to see a multi-region single provider failure that
disrupts your site to the point of it going down. I'm not saying it's impossible, and there are
edge cases around this, but I do think that for these small startups that are just dipping their toes into the water
and experimenting with cloud architecture, one of the things that they optimize for should not be
running on any provider under the sun at the push of a button. Sure. I think the one thing,
though, that we saw kind of happen in the last maybe three years is the explosion of cloud agnostic tooling.
You know, at first we saw Kubernetes become this thing that we knew, well, you know, you can run it on Google's cloud.
But then you saw it have more and more availability on different kinds of cloud networks to the point where, you know, all three major clouds have some sort of Kubernetes engine implementation
because they know that without providing an existing kind of Kubernetes cluster
to spin up on a whim,
there is a larger barrier to them getting those workloads onto those clouds.
So I've kind of looked at it like this,
is that the more we've unified and created tools that do the same thing regardless of the provider, the more agnosticism we're going to see coming from implementers. And when I say implementers, I mean people building, say, a Kubernetes solution that is going to easily be put on any cloud based on what your business is doing that day. If Amazon has major issues and all you really need to do is modify DNS to get things going in a new direction, is that really a terrible strategy to take?
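To put a concrete picture on the DNS flip Jay is describing, a failover between providers can be as small as a single record change. The sketch below uses boto3 against Route 53; the hosted zone ID and hostnames are hypothetical, and a real failover would also involve health checks, TTL planning, and warm capacity on the secondary provider.

```python
# Hypothetical sketch: repoint an application's CNAME from an AWS origin to a GCP origin.
import boto3

route53 = boto3.client("route53")

def fail_over_to_gcp():
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",  # hypothetical hosted zone
        ChangeBatch={
            "Comment": "Manual failover: AWS -> GCP",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "CNAME",
                    "TTL": 60,  # keep the TTL low so the flip propagates quickly
                    "ResourceRecords": [{"Value": "app.gcp.example.com"}],
                },
            }],
        },
    )

if __name__ == "__main__":
    fail_over_to_gcp()
```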
And this is where it becomes incredibly workload dependent. In some cases, you wind up in a scenario
where data gravity dictates your choices.
Ooh, we're going to go ahead and have all of our containers spin up in GCP instead of AWS.
We'll save 20 cents an hour per container, but we're also going to wind up spending 20 grand moving the data they need to process over into that environment.
And that's really a big, big point is that no matter what, bandwidth still isn't free.
Exactly. Well, it's only ingress that's free, but moving your data out of one network into another tends not to be free, and we have to kind of go through that. You know, I've kind of seen this before, because I've worked with people on inbound migrations to, like, MongoDB. We have our own cloud service for databases called Atlas, and one of the big things is getting self-hosted databases onto the service via a migration process.
And one of the reasons why I've seen some people do it is that they need to go from one particular provider to another, and we give them easier tools to do it than doing it manually.
So we have a live migration tool.
You just pop in a hostname and it'll slurp the data from where you're hosting it already to where you want it to be. So that could easily be: I've got it running on a standalone machine, say, I don't know, on AWS, and I want to move it to Google Cloud. You know, we've created tooling around that.
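What Jay is describing is Atlas's managed live migration service. As a rough sketch of the underlying idea, and not the actual tool, copying a collection from a self-hosted deployment on one provider into a cluster on another can look something like this with pymongo. The connection strings and names are hypothetical, and a real live migration also tails the oplog so writes made during the copy aren't lost.

```python
# Hypothetical sketch of a one-shot copy between two MongoDB deployments.
from pymongo import MongoClient

SOURCE_URI = "mongodb://source-host-on-aws:27017"           # hypothetical self-hosted source
TARGET_URI = "mongodb+srv://user:pass@cluster.mongodb.net"  # hypothetical target cluster

def copy_collection(db_name: str, coll_name: str, batch_size: int = 1000):
    source = MongoClient(SOURCE_URI)[db_name][coll_name]
    target = MongoClient(TARGET_URI)[db_name][coll_name]

    batch = []
    for doc in source.find():          # stream every document from the source
        batch.append(doc)
        if len(batch) >= batch_size:
            target.insert_many(batch)  # write in batches to the new provider
            batch = []
    if batch:
        target.insert_many(batch)

if __name__ == "__main__":
    copy_collection("appdb", "orders")
```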
And I think that it gives people options, the fact that you can also say,
you know what, and this is another one of these things I've heard around multi-cloud,
and you can tell me if you think this is a real valid reason, is the fact that certain services
are just available on one particular cloud, just based on licensing. And TensorFlow, I think, is the easiest way to think about it: if you do go through a multi-cloud strategy, you could easily push data from one particular cloud to another and be able to utilize a lot of these different services, so that you can say, you know what?
If I need to do a big ML model on something and I want to do it with tools that Google's provided, I can easily have my data within there and not have to go through the whole rigmarole of migration. Absolutely. But as a counterpoint,
you're also getting into one of the early called out edge cases of this, where the idea of having
multiple cloud providers for a business is not necessarily a terrible one. My, I guess, campaigning against being cloud agnostic generally is restricted
to the workload level. If you have a particular application or a particular workload that you're
trying to get to work across multiple providers, that's often painful. But if you're talking about
having the web services live in AWS and the machine learning piece that chews on that data living in GCP, that's a very
viable model that I wouldn't argue with you on.
So one of the things that I really like is, one, the idea of big major services kind of being the way you handle maybe the more persistent parts of your architecture. I like the idea of using a database service
like what Amazon provides or what we provide with Atlas.
The fact that you can just spin up databases
and use native tooling around them for these providers
because they've all got some sort of way to move data in and out,
whether it's using the AWS CLI or using some sort of native tooling.
So there are always ways that you can go and grab this data, do something with it,
and easily move it to other providers.
Because I like on the fly being able to say,
I want to just connect my front-end application to this database,
and I don't want to have to worry about reconfiguring the database.
Right. And that's really one of the challenges in the market right now with the way that AI and
machine learning have evolved. If your data fits on a drive, you're probably not going to have
success. The single biggest predictor right now of success in any form of machine learning is
whether you have a large enough data set to operate on. So when you're into the multi-petabytes, now you're starting to have some serious opportunities.
But you're right, there it becomes very difficult to relocate that data, at least any reasonable
percentage of it.
Yeah, because you're still dealing with the same things that we've always dealt with in
the past, and that's data transfer.
You're limited by whatever your throughput is on your
lines, and more than anything, your costs. I mean, I know we talked about it just a minute ago,
but I still believe that one of the big reasons why probably people haven't selected multi-cloud
is the upfront kind of thought and process around getting data over and what it would cost them to run two
infrastructures simultaneously. And I think that that probably is a leading factor why companies
are choosing to kind of stick with one cloud. To an extent, yes. But in other cases, when we're
talking large enterprise scale, where they're doing deals in the eight to nine figure annual ranges with these
large cloud providers, it becomes very difficult, first off, to wind up making a compelling case
for putting all of that in a single provider and not looking either irresponsible or like you're
being paid off underneath the table. So that becomes one challenge of it. The other is that
there's an idea that you can
then wind up having some negotiating room by transitioning workloads as leverage during
contract negotiations. That is often a bit of a red herring. If you take a look globally at any
company's use of cloud providers, one thing you almost never see is the number getting smaller.
Invariably, people tend to expand their footprint. They don't tend to reduce it very often.
Heck, I spent most of my time working on optimization of AWS bills. And a year later,
I find that most of my clients are spending more than they were when I started.
They're just doing it more efficiently.
Things continue to grow. That is what companies aspire to do. This is not a bad thing. Instead,
it turns very much into an arena of focusing on what it is exactly the company needs to do.
And cost, surprisingly, is not as much of a driver behind corporate decision-making as people tend to believe it is. Companies are
willing to spend money in order to expand into new markets, to advance new features, to grow revenue.
And it just comes down to a unit economics discussion. So here's a question for you then.
How much do you think that, say, RTO and RPO play into people's decisions to maybe consider multi-cloud? Where there's a failure, say, on something that needs to be restored, and maybe because of the network you can't restore it to that local cloud, but you can do it onto another one. I'm curious what you think about that specific case: not necessarily disaster recovery, but maybe just a portion of a disaster recovery plan.
I'm just trying to think if people really look to replace what would be "just go back, get the backups, restore them," as opposed to "let's just spin up a new environment on a different cloud and work through that and restore based on what we already have."
Right. Let's define terms for those who didn't grow up building DR plans for fun.
RPO is, from the time an incident occurs, what is the maximum exposure of lost data? In other words, if you're restoring from backup, how long has it been since your last backup? RTO is, from the time the site goes down or is impacted, how long will it be until you're back up and running
and able to service customers at some baseline level?
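To put hypothetical numbers on those two terms: if the last good backup ran at 6:00 and the outage hits at 11:30, anything written in those five and a half hours is your RPO exposure, and the time from 11:30 until customers are being served again is your RTO. A toy sketch with made-up timestamps:

```python
# Toy illustration of RPO vs. RTO with hypothetical timestamps.
from datetime import datetime

last_backup  = datetime(2018, 8, 10, 6, 0)    # most recent successful backup
incident     = datetime(2018, 8, 10, 11, 30)  # moment the outage begins
service_back = datetime(2018, 8, 10, 13, 30)  # moment customers are served again

rpo_exposure = incident - last_backup   # data written since the backup is at risk
rto_elapsed  = service_back - incident  # how long customers were impacted

print(f"RPO exposure: {rpo_exposure}")  # 5:30:00 -> up to 5.5 hours of lost data
print(f"RTO: {rto_elapsed}")            # 2:00:00 -> two hours to restore service
```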
And it's a great question.
The right answer around a lot of this is going to be extremely workload dependent.
For high-availability services where latency is critical, I'll go back to PagerDuty as
a good example of that, the tolerance for failure is extremely low.
You're not going to be able to talk around that with,
ah, we're just going to take a six-hour outage and it's fine.
An example of this, the other direction, is a few years back,
I was trying to buy something.
I think it was a package of socks on Amazon.com.
Because I lead an exciting life, that's what I buy.
And it threw a 500 error,
which was bizarre. It was one of those things you don't see very often. I tried it again and got the error again. And in a lot of DR plans, there would be an assumption built into this that, therefore, because I could not buy those socks at that moment, I either never bought those socks again, and I'm walking around barefoot one day out of seven, or I instead went to another provider and went through all the rigmarole of setting up an account and ordering the socks there.
In practice, I waited an hour, the problem went away, and I bought my pair of socks,
and I went on with my life. Now, the fact that that worked is based on two specific
facts about that use case.
One, that it's a purchase that I'm making intentionally. If this were a company that were serving ads, I'm not going to come back and view those ads later in time. That opportunity
is lost. And two, this doesn't happen every third time I try to buy something on amazon.com.
If it did, I'd probably be spending a lot more money at Target instead, because there's reputational damage in being the site that's always down, where
people no longer want to trust you with their purchases. Interesting. I'm not sure if that
answers your question about how this applies to DR concepts, but it does tend to lead to the problem
of when one of these large providers is having problems, there are a lot of problems
that are second and third order effects. If you see, for example, US East 1 go down, because that's
what it does, it breaks. And there have been a number of failure cases over the years where
suddenly other regions become overloaded, provisioning calls take longer and longer,
because everyone is failing over. They're saturating links, they're hammering
APIs that don't generally see that level of traffic within a couple orders of magnitude.
And to that end, people have to start planning for things
like that when they're building out a DR plan. In many cases,
a lot of the automated tools, when I point at an AWS account, will say,
ah, here's a bunch of idle instances.
Turn them off.
Well, that's a DR site, and we kind of want to be able to fail over there within seconds.
We're not going to have time to wait for a laden provisioning backplane.
We want them up and running now.
It's the same type of principle of what are the disaster types you're planning for, and how do you intend to wind up handling the failover process?
Not to mention the failback, which is a whole separate category of issue.
And rest assured, regardless of what you plan for, you're missing something.
The only way you find all of the corner cases and edge cases is to live forever.
Yeah, and even when you get there, you'll probably miss out on a few of them.
Oh, absolutely.
So one of the things I also wanted to kind of talk about,
because you had mentioned to me that you really haven't spent a lot of time around databases.
And I was curious if there were any kind of questions around the world of databases,
especially distributed databases, that you'd like to hear a little bit more about. Oh, absolutely.
I wound up playing around with databases a bit in my youth. I can set up MySQL. I can set up replication. I can pass it over to a professional when I see something I don't
recognize, which is pretty much everything past that point. And then I wash my hands of it and
move on with life. The challenge, of course, from my perspective, is when I'm building something new, what database or what data store do I wind up selecting for weird, arbitrary use cases?
I mean, there are times where in some cases I'm dealing with small enough data volumes that a flat file in CSV format living in S3 is more than sufficient for what I do. There's also the argument to completely over-engineer something
using a bunch of the very latest bleeding-edge systems
that are effectively ACID-compliant global world-spanning databases.
Google's Cloud Spanner and I believe Cosmos DB from Azure
tend to qualify there,
as well as the announced, upcoming Aurora master-master multi-region offering.
The challenge, though, is the consensus that emerges consistently
is that whatever I'm deciding to use, I'm wrong for using it.
And it's always challenging for me to figure out what is the right answer.
One thing that has been earning me global condemnation
is using Amazon's Secrets Manager, where the idea is it
holds secure data encrypted using KMS and provides that to your applications for 40 cents per month
per secret. Well, that just sounds like an expensive database. So couldn't I theoretically
just store all of my transactions in that and call it good? And people look at me with a look
of horror and start backing away slowly. Yeah, because you still have to look at databases as a transactional thing. Or I shouldn't say transactional; I guess the best way to look at databases is as active or operational pieces of infrastructure. And I guess that people are still kind of concerned about whether the encryption level, or being able to store your secrets at that level, is good enough. I don't know if "good enough" is really the right term. But there have always been these secondary use cases for stores. Like, I think about Chef and I think about the encrypted data bag, or think about other tools.
Which sounds like a spectacular insult to throw at someone. Yes, or, like, you know, encrypted YAML, so that you can keep the secrets for, you know, your YAML-based products. Or, you know, now we get more modern with KMS and Vault that are all out there. So all these subjects exist, or I should say all these products exist, and I guess you wouldn't be that far off by calling them databases.
But some of the things that they tend to not do is have distribution of data available, or ways to do queries around that. They're basically just providing one particular use case. And that is: I need information,
provide me that information,
decrypt the information that I want.
In databases, you can do that as well.
You can basically make a query that wants some encrypted data to be returned back to you unencrypted.
That's possible.
But I think that the one thing that's really the differentiator between something that's just a key store and something that's a database, or even S3 for that matter, is the ability to really run very complex queries against that data. And with MongoDB, you can run complex aggregations against a lot of that data, and you don't have to do that from, say, a client level. You can do it all on the server side, because we have an aggregation framework
that allows you to ask really complex questions without having to put the load,
say, on the client side if you're bringing over that data in some sort of application
that eventually gets presented to the user. So I think that ultimately, while those services, I see them as simple databases,
they don't really have data structure methods. They have file systems, at least as far as S3 is concerned. I can't really speak to what particular data structures KMS or Vault have, but the biggest thing that I can say differentiates the two is the level of security and encryption that those private key stores provide. And
obviously data stores are more focused on performance and returning information to you.
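To make the server-side aggregation point concrete, here is a minimal pymongo sketch; the deployment, collection, and field names are hypothetical.

```python
# Hypothetical sketch: ask the database, not the client, to summarize the data.
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]  # hypothetical deployment

# Total revenue per customer for shipped orders, computed entirely server-side
# by the aggregation framework rather than in application code.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]

for row in orders.aggregate(pipeline):
    print(row["_id"], row["revenue"])
```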
Oh, absolutely. I mean, the idea of using Secrets Manager as a MongoDB replacement is completely ludicrous. The reason I tend to go in that
direction is because it exposes a half dozen different misconceptions that many people,
including me, tend to have around data stores in their entirety. And being able to address those
in a somewhat reasoned way starts to shed light on how some of these tools can or should be used.
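For readers who haven't touched it, Secrets Manager's surface really is fetch-a-value-by-name, which is why treating it as a general-purpose database falls apart quickly. A minimal boto3 sketch, with a hypothetical secret name:

```python
# Hypothetical sketch: Secrets Manager is a key store, not a queryable database.
import json
import boto3

secrets = boto3.client("secretsmanager")

# One API call, one named secret, one opaque value back. There is no way to
# query across secrets, join them, or aggregate them, and each secret carries
# its own monthly storage charge.
response = secrets.get_secret_value(SecretId="prod/app/db-credentials")  # hypothetical name
credentials = json.loads(response["SecretString"])

print(credentials["username"])
```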
I wound up building a bunch of the pipeline process
that builds my newsletter every week using DynamoDB because it was there.
I didn't have to manage any infrastructure myself.
It was effectively given to me under the permanent free tier.
And last time I checked, I was storing 150K in that at any given point in time.
So realistically, I'm at a point
where a flat file would probably have been sufficient were it not for a couple of edge
case S3 race conditions. So this wound up more or less being my first outing into the world of
non-relational data stores. And my brain is full and I wish to be excused. So it winds up being
something that isn't exactly intuitive
to the way that I see the world.
Well, the one thing that's great about MongoDB is,
one, you can run it basically anywhere,
and it's one of the big core tenets of the product itself,
is the fact that we say that if you have a CPU,
you could probably run MongoDB on that particular system.
So, I mean, it's there.
And while it's not in, say, RDS or one of those services, because of our licensing method we had to go out and say, let's go ahead and create our own cloud. And let's put it on all
three of the major clouds so that you have those options to go wherever it is you want it. And let's make it easy to get started.
So just like, say, AWS and Dynamo,
we put together a free tier for people also.
So what we're also kind of trying to do
is give people great use cases and reasons.
So we have a developer advocacy team that I'm on,
and we kind of show people that databases make sense and MongoDB
makes sense for applications because of the way data is stored. We look back at how databases
are traditionally kind of thought about and they were thought of as tabular. And when I say tabular, I think of it like a bunch of tabs or just parts of an Excel spreadsheet.
And so as data becomes more complex, using that Excel spreadsheet ended up becoming more kind of ridiculous and data became far more complex to manage. And we saw a revolution around JSON, and I don't want to call it a revolution, but
we saw people finding JSON to be a much more reasonable way to manage data because
it reads like a menu. Instead of going through pages and pages of typewriter text, reconnoitering them and flattening them, and then taking that data and processing it, and then building SQL migrations so that you can get data from one place to another, it just became very difficult.
I'm still going to expose some of my ignorance here.
Just speaking as an old-school, hands-on hardware configuration type,
I still find YAML more understandable and readable than I do JSON.
There's probably something profoundly wrong with me.
Well, you know, JSON makes sense, I guess,
to people who've been working, one, with JavaScript for a long time.
I think the other thing that makes a lot of sense about JSON to me is that it's so heavily linted that most major services that allow you to use JSON, whether as part of your application or when you're importing data, will just basically tell you: your JSON's broken, this isn't going to work.
And that's the kind of thing that I really dig about JSON.
It's just there's a lot of reliability in the way that you can format the files.
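That strictness is easy to demonstrate: most parsers, like Python's standard json module in this small sketch, reject malformed JSON outright rather than guessing at the intent.

```python
# Malformed JSON fails loudly instead of being silently misread.
import json

good = '{"name": "Jay", "topics": ["databases", "multi-cloud"]}'
bad  = '{"name": "Jay", "topics": ["databases", "multi-cloud",]}'  # trailing comma

print(json.loads(good)["name"])  # Jay

try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print(f"your JSON's broken: {err}")
```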
The cool thing about Mongo is that you can nest arrays and really build rich, rich kind of documents
so that you don't necessarily need to have thousands and thousands of documents on a specific subject.
You can keep everything in one particular document, be able to go and grab all your
information out of that JSON data, and then present it for the application that you want to use.
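Here is a small illustration of the kind of rich, nested document Jay is describing, using pymongo; the database and field names are hypothetical.

```python
# Hypothetical sketch: one rich document instead of rows scattered across tables.
from pymongo import MongoClient

episodes = MongoClient("mongodb://localhost:27017")["podcast"]["episodes"]  # hypothetical

episodes.insert_one({
    "number": 23,
    "title": "The Myth of Cloud Agnosticism",
    "guest": {"name": "Jay Gordon", "company": "MongoDB"},  # nested sub-document
    "topics": ["multi-cloud", "databases", "RPO/RTO"],      # nested array
})

# Grab everything about the episode in a single query, no joins required.
print(episodes.find_one({"number": 23}))
```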
Absolutely. I think the one thing that we can all agree on is that XML is terrible.
Yes. And that's the big thing that Mongo provides you. So you don't have to spend a whole bunch of
time, speaking of XML, writing ODMs and managing objects with your database because everything in MongoDB is considered an object, so you can query everything.
So the one thing that's nice is you don't really have to spend time in XML at all.
And that, to me, is a definite point in favor when people are deciding between, say, relational databases and something like MongoDB.
But I'm not here to sell people MongoDB.
But I think it's a good option for people that want to build applications quicker, faster,
and they don't necessarily want to do a lot of the infrastructure work.
You know, there's Atlas, and it does a lot of those things
that people have really come to trust cloud services for lately.
Okay.
This has been helpful, and I definitely appreciate your taking the time to go through this with
me.
But I have one question that is probably going to ruin our friendship before we call it an
episode.
And that is quite simply, every time I refer to something, be it Mongo, be it almost anything
else, as a database, I'm reminded of the tagline of that service,
which is, it's not a database, it's a fill in the blank, document store, data store,
et cetera, et cetera. Why? What's behind that nomenclature war?
I think a lot of it has to do with the fact that there are several companies that started
kind of calling themselves NoSQL.
And they all did it at the same time, but they presented data in a different way,
whether it's, you know, wide column or in document and JSON.
It just became one of those things where people really wanted to call these things different names. And I think it was because they were trying to differentiate themselves from just being referred to as a NoSQL database.
MongoDB, we consider ourselves a document database, and that's because our primary goal is to store JSON-based documents in a database and have it easily retrieved.
It doesn't mean that you can't store binary data, but that binary data will be stored within JSON documents in an encrypted, or I should say an encoded method.
So, you know, why there's these wars over what things are called, it's really been difficult to tell, because I think it's just about differentiating yourself from the rest of the crowd.
That really has been the only thing that I can think of.
Good to know that there's not some key distinction there that I've just been asleep at the wheel for 15 years and missed.
I mean, there are key value stores, there are document stores, they're all kind of in the end
kind of doing something not far off, which is providing you an answer to a key and a value.
It's how big that key and value can be within the total document that really differentiates it
between, say, what something in Dynamo is compared to something that's in MongoDB.
Which makes an awful lot of sense.
Yeah, it's the richness, the richness of data, if you will.
Thank you so much for taking time out of your day to speak with me.
No problem. I always enjoy speaking with you, Corey. It's one of those
great, great luxuries that people that work in technology get: once in a while, we'll find you and we'll get
to talk to you. When they're really unlucky, I start talking at them and then all heck breaks
loose. I don't know if it's all heck, but it's certainly entertaining nonetheless.
Well, thank you. My name is Corey Quinn. This has been Jay Gordon from MongoDB,
and this is Screaming in the Cloud.