Screaming in the Cloud - Use Cases for Couchbase’s New Columnar Data Stores with Jeff Morris
Episode Date: November 27, 2023

Jeff Morris, VP of Product & Solutions Marketing at Couchbase, joins Corey on Screaming in the Cloud to discuss Couchbase's new columnar data store functionality, specific use cases for... columnar data stores, and why AI gets better when it communicates with a cleaner pool of data. Jeff shares how more responsive databases could allow businesses like Domino's and United Airlines to create hyper-personalized experiences for their customers. Jeff dives into the linked future of AI and data, and Corey learns about Couchbase's plans for the re:Invent conference. If you're attending re:Invent, you can visit Couchbase at booth 1095.

About Jeff

Jeff Morris is VP Product & Solutions Marketing at Couchbase (NASDAQ: BASE), a cloud database platform company that 30% of the Fortune 100 depend on.

Links Referenced:
Couchbase: https://www.couchbase.com/
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
This promoted guest episode of Screaming in the Cloud
is brought to us by our friends at Couchbase.
Also brought to us by Couchbase is today's victim,
for lack of a better term.
Jeff Morris is their VP of Product and Solutions Marketing.
Jeff, thank you for joining me.
Thanks for having me, Corey, even though I guess I paid for it.
Exactly. It's always great to say thank you when people give you things.
I learned this from a very early age, and the only people who didn't were rude children and turned into worse adults.
Exactly.
So you are effectively announcing something new today.
And I always get worried when a database company says that because sometimes it's a license that is going to upset people.
Sometimes it's dyed so deep in the wool of generative AI that we're now supporting vectors or whatnot.
Well, most of us don't know what that means.
Fortunately, I don't believe
that's what you're doing today. What have you got for us? So you're right as well. What I'm doing
is that we're announcing new stuff inside of Couchbase and helping Couchbase expand its market
footprint. But we're not really moving away from our sweet spot either, right? We like building
or being the database platform underneath applications. So push us on the operational side
of the operational versus analytic kind of database divide. But we are announcing a columnar
data store inside of the Couchbase platform so that we can build bigger, better, stronger
analytic functionality to feed the applications that we're supporting with our customers.
Now, I feel like I should ask a question around what a columnar data store is,
because my first encounter with the term was when I had a very early client for
AWS bill optimization when I was doing this independently. And I was asking them the
polite question of why do you have 283 billion objects in a single S3 bucket?
That is atypical and kind of terrifying.
And their answer was, oh, we built our own columnar data store on top of S3.
This might not have been the best approach.
It's like, I'm going to stop you there.
With no further information, I can almost guarantee you that it was not.
But what is a columnar data store?
Well, let's start with this: everybody loves more data, and everybody loves to count more things, right?
But a columnar data store allows you to expedite the kind of question that you ask of the data itself
by not having to look at every single row of the data while you go through it.
You can say, if you know you're only looking for data that's inside of California,
you just look at the column value of
find me everything in California, and then I'll pick all of those records to analyze.
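As a rough sketch of what Jeff is describing, here is a toy column-oriented layout in Python (the data and field names are made up for illustration, not Couchbase internals): a query that filters on state and sums amounts only ever touches those two columns.

```python
# Toy columnar layout: each column is its own list, aligned by row index.
# A row store would force us to read every field of every record; here the
# filter touches only the "state" column and the aggregation touches only
# the "amount" column.
columns = {
    "state":  ["CA", "TX", "CA", "NY", "CA"],
    "amount": [120,  80,   45,   200,  75],
    "name":   ["a", "b", "c", "d", "e"],  # never read by this query
}

def total_for_state(cols, state):
    # Step 1: scan one column to find matching row positions.
    hits = [i for i, s in enumerate(cols["state"]) if s == state]
    # Step 2: fetch only the needed column's values at those positions.
    return sum(cols["amount"][i] for i in hits)

print(total_for_state(columns, "CA"))  # 120 + 45 + 75 = 240
```

At billions of rows, skipping the columns a query never reads is where the speedup comes from.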
So it gives you a faster way to go through the data while you're trying to gather it up and
perform aggregations against it. It seems like it's one of those, well, that doesn't sound hard
type of things when you're thinking about it the way that I do in terms of a database being more
or less a medium-to-large-size Excel spreadsheet. But I have it on good authority from all the customer
environments I've worked with that, no, no, there are data stores that span even larger than that,
which is, you know, one of those sad realities of the world. And everything at scale begins to be
a heck of a lot harder. I've seen some of the value that this stuff offers, and I can definitely understand a few different workloads, in which case that's going to be super
handy. What are you targeting specifically? Or is this one of those areas where you're going to
learn from your customers? Well, we've had analytic functionality inside the platform. It just,
at the size and scale customers actually wanted to roam through the data, we weren't supporting
that that much. So we'll expand that particular footprint. It'll give us better integration capabilities with
external systems or better access to things in your bucket. But the use case problem is,
I think, going to be driven by what new modern application requirements are going to be.
You're going to need what we call hyper-personalization, because we tend to cater to B2C-style applications, things with a lot of account profiles built into
them. So you look at account profile and you're like, oh, well, Jeff likes blue, so sell him blue
stuff. And that's a great current level personalization. But with a new analytic engine
against this, you can maybe start aggregating all the inventory
information that you might have of all the blue stuff that you want to sell me and do that in
real time. So I'm getting better recommendations, better offers as I'm shopping on your site or
looking at my phone and, you know, looking for the next thing I want to buy. I'm sure there's
massive amounts of work that go into these hyper-personalization stories.
The problem is that the only time they really rise to our notice is when they fail hilariously.
Like, you just bought a TV. Would you like to buy another?
Now, statistically, you are likelier to buy a second TV right after you buy one.
But for someone who's just, well, I'm replacing my living room TV after 10 years, it feels ridiculous.
Or when you buy a whole bunch of nails and they don't suggest,
would you like to also perhaps buy a hammer? It was one of those areas where it just seems like
a human putting thought into this could make some sense, but I've seen some of this stuff that can
come out of systems like this and it can be incredible. I also personally tend to bias
towards use cases that are less, here's how to convince you to buy more things and start aiming in a bunch of other different directions where it starts meeting emerging
use cases or changing situations rapidly, more rapidly than a human can in some cases.
The world has, for better or worse, gotten an awful lot faster over the last few decades.
Yeah. And think of it in terms of how responsive can I be at any given moment? And so let's pick on one of the more recent interesting failures that has popped up.
I'm a Giants fan, San Francisco Giants fan, so I'll pick on the Dodgers.
The Dodgers, during the baseball playoffs: Clayton Kershaw, three-time Cy Young
Award winner and MVP, great, great pitcher, had a first-inning meltdown of colossal magnitude.
Gave up 11 runs in the first inning to the Diamondbacks.
Well, my customer Domino's Pizza could end up, well, let's shift the focus of our marketing.
The Dodgers are the best team in baseball this year in the National League.
Let's focus our attention there.
But with that meltdown, let's pivot to Arizona and focus on our market in
Phoenix. And they could do that within minutes or seconds even with the kinds of capabilities that
we're coming up with here so that they can make better offers to that new environment and also do
the decision intelligence behind it. Like, do I have enough dough to make a bigger offer in that
big market? Do I have enough drivers?
Or do I have to go and spin out and get one of the other food delivery folks, Uber Eats or
something like that to jump on board with me and partner up on this kind of system?
It's that responsiveness in real, real time, right? That's always been kind of the conundrum
between applications and analytics. You get an analytic insight, but it takes you
an hour or a day to incorporate that into what the application is doing. This is intended to
make all of that stuff go faster. And of course, when we start to talk about things in AI, right,
AI is going to expect real-time responsiveness as best you can make it.
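The gap Jeff describes, an analytic insight that takes an hour or a day to reach the application, is essentially batch analytics. A minimal sketch of the streaming alternative (illustrative only, not Couchbase's implementation): fold each event into a running aggregate so the answer is always current.

```python
# Batch analytics recompute from scratch on a schedule; a streaming
# aggregate folds each event in as it arrives, so the "insight" is
# always up to date. Toy running average of order values:
class RunningAverage:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # O(1) per event; no need to rescan history.
        self.count += 1
        self.total += value

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0

agg = RunningAverage()
for order_value in [20.0, 35.0, 20.0]:
    agg.add(order_value)
print(agg.mean)  # 25.0
```

The application can read the aggregate at any moment instead of waiting for the next batch run, which is the responsiveness being described.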
I figure we have to talk about AI.
That is a technology that has absolutely sprung to the absolute peak of the hype curve over the past year.
OpenAI released ChatGipity either late last year or early this year,
and suddenly every company seems to be falling all over itself to rebrand itself as an AI company,
where we've been working on this for decades, they say, right before they announced something that very clearly was crash developed
in six months. And every company is trying to drape themselves in the mantle of AI. And I don't
want to sound like I'm a doubter here. Unlike most fans, I see an awful lot of value here,
but I am curious to get your take on what do you think is real and what
do you think is not in the current hype environment? So, yeah, I love that. I think there's a number of
things that are, you know, are real is it's not going away. It is going to continue to evolve
and get better and better and better. One of my analyst friends came up with the notion that
generative AI, by its nature, is imprecise.
So it gives you similarity matches, and that's actually an improvement in many cases over the precision of a database.
Databases, a transaction either works or it doesn't.
It has failover or it doesn't.
It's ideally deterministic when you ask the same question a second time, assuming it's not time-bound.
Gives you the right answer.
Yeah, or at least the same answer.
The same answer.
And your Gen AI may not.
So that's part of the oddity of the hype.
But then it also helps me kind of feed our storyline of, if you're going to try and make
Gen AI closer and more accurate, you need a clean pool of data that you're dealing with.
Even though your previous design was probably such that you would use a relational database for transactions,
a document database for your user profiles.
You probably attach your website to a caching database because you needed speed and a lot of concurrency.
Well, now you've got three different databases there that you're operating. And if you're feeding data from each of those databases back to AI, one of them might be
wrong or one of them might confuse the AI.
Yet, how are you going to know?
The complexity level is going to become exponential.
So our premise is because we're a multi-model database that incorporates in-memory speed
and documents and search and transactions and the
like, if you start with a cleaner pool of data, you'll have less complexity that you're offering
to your AI system. And therefore, you can steer it into becoming more accurate in its response.
And then, of course, all the data that we're dealing with is on mobile, right? Data is created
there for, let's say, your account profile, and then it's also consumed there,
because that's what people are using as their application interface of choice. So you also want
to have mobile interactivity, synchronization, and local storage kinds of capabilities built in
there. Those are a couple of the principles that we're looking at:
JSON is going to be a great format for it, regardless of what happens. Complexity is kind of the enemy of AI, so you don't want to go there. And mobility is
going to be an absolute requirement. And then related to this particular announcement, large
scale aggregation is going to be a requirement to help feed the application. There's always going
to be some other bigger calculation that you're going to want to do relatively in real time and feed it back to your users or the AI system that's helping them out.
I think that that is a much more nuanced use case than a lot of the stuff that's grabbing
customer attention, where you effectively have the Chad Chippity story of it being an incredible
parrot. Where I have run into trouble with the generative story has been people putting the thing that the robot
that's magic and from the future
has come up with off the cuff
and just hurling that out into the universe
under their own name without any human review.
And that's fine sometimes, sure,
but it does get it hilariously wrong at some points.
And the idea of sending something out under my name
that has not been at least
reviewed by me, if not actually authored by me, is abhorrent. I mean, I review even the transactional
yes, you have successfully subscribed or sorry to see you go email confirmations on stuff because
there's an implicit hugs and puppies, love Corey, at the end of everything that goes out under my
name. But I've gotten a barrage of terrible sales emails and companies that are trying to put
the cart before the horse where either the support rep, quote unquote, that I'm speaking
to in the chat is an AI system or else needs immediate medical attention because there's
something going on that needs assistance.
Yeah, they just don't understand.
Right.
And most big enterprise stories that I've heard so far that have come to light have been around the form of, we get to fire most of our
customer service staff, an outcome that basically no one sensible wants. That is less compelling
than a lot of the individualized consumer use cases. I love asking it, here's a blog post I
wrote, give me 10 title options. And I'll usually take one of them. One of them will usually be not
half bad. Then I can modify it slightly. And you'll change four words in it. Yeah. Yeah, exactly.
That's a bit of a different use case. It's been interesting: even as we've all become familiar,
or at least become junior prompt engineers, right, the return is only going to be as good as
the information you feed the AI system. So you're going to want to
refine that kind of conversation. Now, we're not trying to end up replacing the content that gets
produced or the writing of all kinds of prose, other than we do have a code generator that
works inside of our environment called Capella IQ that talks to ChatGPT. But we try and put
guardrails on that too, right? We always make
sure that it's talking in terms of the context of Couchbase rather than where's Taylor Swift this
week, which I don't want it to answer because I don't want to spend the GPT money to answer that
question for you. And it might not know the right answer, but it might very well spit out something
that sounds plausible. Exactly. But I think the kinds of applications that we're steering ourselves toward can be helped along by the Gen AI systems. But I don't expect
all my customers are going to be writing automatic blog post generation kinds of applications.
I think what we're ultimately trying to do is facilitate interactions in a way that we haven't dreamt of yet, right? One of them
might be if I've opted into two loyalty programs, like my United account and my American Express
account. That feels very targeted at my lifestyle as well. So please continue. Exactly. And so what
I really want the system to do is for Amex to reward me when I hit 1K status on United while I'm on the flight. And
have the flight attendant come up and be like, hey, you did it. Either here's a free upgrade
from American Express, that would be hyper-personalization because you booked your
plane ticket with it, but they also happen to know or they cross-consumed information that
I've opted into. I've seen them congratulate people for hitting a million miles flown mid-flight,
but that's clearly something that they've been tracking
and happens a heck of a lot less frequently.
This is how you start scaling that experience.
Yes, but that happened because American Airlines was always watching,
because that was an American Airlines ad ages ago, right?
But the same principle holds true.
But I think there's going to be a lot more of these,
how much information am I actually allowing to be shared amongst not just the loyalty programs,
but the data sources that I've opted into. And my God, there's hundreds of them that I've
personally opted into, whether I like it or not, because everybody needs my email address,
kind of like what you were describing earlier. A point that, I think, largely agrees with yours
is that few things to me are more frustrating
than when I'm signing up, for example,
oh, I don't know, an AWS event.
Gee, can't imagine there's anything like that
going on this week.
And I have to fill out an entire form
that always asks me the same questions.
How big my company is,
whether we have multiple workloads on,
what industry we're in.
And no matter what I put into that, first, it never
remembers me for the next time, which is frustrating in its own right. But two, no matter what I put in
to fill the thing out, the email I get does not change as a result. At one point, I said, all
right, I'm picking randomly. I am a venture capitalist based in Sweden. And I got nothing
that is differentiated from the other normal stuff I get tied to my account
because I use a special email address for those things sometimes just to see what happens.
And no, if you're going to make me jump through the hoops to give you the data, at least use
it to make my experience better.
It feels like I'm asking for the moon here, but I shouldn't be.
Yes, immediately to make your experience better and say, you know, here's four companies in
Malmo that you ought to be talking to. And they happen to be here at the AWS event and you can go find them because their
booth is here, here, and here. That kind of immediate responsiveness could be facilitated.
And to our point, ought to be facilitated. It's exactly like that. That kind of thing is
use the data in real time. I was talking to somebody else today that was discussing that most data, right,
becomes stale and loses its value.
Like 50% of the data,
its value goes to zero after about a day.
And some of it is stale after about an hour.
So if you can end up closing that responsiveness gap
that we're describing,
and this is kind of what this columnar service inside of Capella is going to be like: react in real time, with real-time calculation, real-time lookup, and real-time discovery of how you might apply that new piece of information right now.
And then give it back to the consumer or the user right now.
So Couchbase takes a few different forms.
I should probably, at least for those who are not steeped in the world of exotic forms
of database, I always like making these conversations more accessible to folks who are not necessarily
up to speed.
Personally, I tend to misuse anything as a database if I can hold it just the wrong way.
The wrong way? I've caught that about you.
Yeah, anything's a database if you hold it wrong.
But you folks have a few different options.
You have a self-managed commercial offering.
You're an open source project.
So I can go ahead and run it on my own infrastructure however I want.
And you have Capella, which is Couchbase as a service.
And all of those are
useful and have their points, and I'm sure I'm missing at least one or two along the way.
But do you find that the columnar use case is going to disproportionately benefit folks using
Capella in ways that the self-hosted version would not be as useful for? Or is this functionality
already available in other expressions of Couchbase? It's not already available in other expressions, although there is analytic functionality in
the self-managed version of Couchbase. But as I mentioned earlier,
it's just not as scalable or as real-time as what we're describing. So yes,
it's going to benefit the database-as-a-service deployments of Couchbase available on your
favorite three clouds and still interoperable with environments that you might self-manage and self-host.
So there could be even use cases where our development team or your development team builds in AWS using the cloud-oriented features, but is still ultimately deploying and hosting and managing a self-managed environment.
You could still do all of that. So there's still a great interplay and interoperability amongst our different deployment
options. But the fun part, I think, about this is not only is it going to help the Capella user,
there's a lot of other things inside Couchbase that help address the developer's penchant for
trading zero cost for degrees of complexity that you're willing to
accept because you want everything to be free and open source. And Couchbase is my fifth open
source company in my background. So I'm well, well versed in the nuances of what open source
developers are seeking. But what makes Couchbase's origin story really cool too, though,
is that it's the peanut butter and chocolate marriage
of Membase, from the people behind Memcached,
and CouchDB, from CouchOne.
So I can't think of that many
projects and companies, maybe Red Hat,
that formed up by merging
two complementary open source projects.
So we took the scale-
You had OpenTelemetry, I think, that did that once.
You see occasional mergers, but it's very far from common.
It's very, very infrequent.
But what that made the Couchbase people end up doing is make a platform that will scale,
make a data design that you can auto-partition anywhere, anytime, and then build
independently scalable services on top of that. One for SQL++, the query language. Anyone who
knows SQL will be able to write something in Couchbase immediately. Then I've got this AI
automator, IQ, that makes it even easier. You just say, write me a SQL++ query that does
this, and it'll do that. But then we added full-text search. We added eventing so you
could stream data. We added the analytics capability originally, and now we're enhancing it
and use JSON as our kind of universal data format so that we can trade data with applications
really easily. So it's a cool design to start with.
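For readers who haven't met SQL++: it is SQL-shaped syntax applied to JSON, including nested fields. Here's a rough approximation in plain Python (the documents and the query text are invented for illustration; consult Couchbase's SQL++ documentation for real syntax):

```python
# SQL++ applies SQL-style queries to JSON documents, nested fields and
# all. A query shaped roughly like:
#   SELECT p.name FROM profiles AS p WHERE p.prefs.color = "blue"
# can be approximated over plain Python dicts:
profiles = [
    {"name": "Jeff",  "prefs": {"color": "blue"}},
    {"name": "Corey", "prefs": {"color": "grey"}},
    {"name": "Dana",  "prefs": {"color": "blue"}},
]

def select_names_where_color(docs, color):
    # WHERE p.prefs.color = $color ... SELECT p.name
    return [d["name"] for d in docs
            if d.get("prefs", {}).get("color") == color]

print(select_names_where_color(profiles, "blue"))  # ['Jeff', 'Dana']
```

The point Jeff makes is that anyone who can write the SQL version can write the SQL++ version, because the dotted-path access into nested JSON is the only new idea.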
And then in the cloud, we're steering towards things like making your entry point
and using our database as a service Capella really, really, really inexpensive,
so that you get that same robustness of functionality,
as well as the easy cost of entry that today's developers want.
And it's my analyst friends that keep telling me
the cloud is where the market's going to go. So we're steering ourselves towards that hockey puck
location. I frequently remark that the role of the DBA might not be vanishing, but it's definitely
changing, especially since the last time I counted, if you hold them and use as directed, AWS has
something on the order of 14 distinct managed database offerings. Some are general purpose, some are purpose-built.
And if this trend keeps up in a decade, the DBA role is going to be determining which of its 40
databases is going to be the right fit for a given workload. That seems to be the counter-approach to
a general purpose database that works across the board.
Clearly, you folks have opinions on this.
Where do you land?
Oh, so absolutely.
There's the product that is a suite of capabilities or that are individual capabilities.
And then there's ones that are, in my case, kind of multi-model and do lots of things at once.
I think historically you'll recognize this, because, let's pick on your phone, the same holds true: your phone used to be a watch, a Palm Pilot, a StarTAC telephone, your calendar application, and your day planner, all at the same time.
Well, it's not anymore.
Technology converges upon itself.
It's kind of a historical truism.
And the database technologies are going to end up doing that and
continue to do that even right now. The notion of using a purpose-built database
for each particular workload is a 10-year-old notion; maybe sometimes, in extreme cases, that is the
appropriate thing. But in more cases than not right now, if you need transactions when you need them,
that's fine. I can do that. You don't necessarily need Aurora or RDS or Postgres to do that.
But when you need search and geolocation, I support that too.
So you don't need Elastic.
And then when you need caching and everything, you don't need ElastiCache.
It's all built in.
So the multi-model notion of operate on the same pool of data, it's a lot less complex
for your developers.
They can code faster and better and more cleanly. Debugging is significantly easier. As I mentioned,
the SQL++ is our language. It's basically SQL syntax for JSON. We're a reference implementation
of this language, along with AsterixDB. And actually, the original author of that language also wrote
DynamoDB's PartiQL. So it's a common language that you wouldn't necessarily imagine, but
the ease of entry in all of this, I think is still going to be a driving goal for people.
The old people like me and you are running around worrying about, am I going to get a particular
really specific feature out of the full-text search environment?
Or the other one that I pick on now is, am I going to need a vector database too?
And the answer to me is no, right?
There's going to be the database vendors like ourselves and like Mongo's announced and a whole bunch of other NoSQL vendors.
We're going to support that.
It's going to be just another mode.
And you get better bang for your buck
when you've got more modes than a single one at a time.
The consensus opinion that's emerging
is very much across the board,
that Vector is a feature, not a database type.
Not a category. Yeah, me too.
And yeah, we're well on board with that notion as well.
And then, like I said earlier,
the JSON as a vehicle to give you all of that versatility is great, right?
You can have vector information inside a JSON document.
You could have time series information in the document.
You could have graph node locations and ID numbers in a JSON array.
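To make that concrete, here is one hypothetical JSON document carrying profile fields, a small time series, graph-style edges, and an embedding vector, with the kind of cosine-similarity check a vector feature would run (all field names are invented for illustration):

```python
import math

# One hypothetical multi-model document: profile fields, a time series,
# graph-style adjacency, and an embedding vector, all in plain JSON.
doc = {
    "id": "user::jeff",
    "favorite_color": "blue",
    "logins": [{"ts": "2023-11-01", "n": 3}, {"ts": "2023-11-02", "n": 5}],
    "follows": ["user::corey"],    # graph edges as a JSON array
    "embedding": [0.1, 0.9, 0.0],  # vector as a JSON array
}

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_vec = [0.0, 1.0, 0.0]
print(round(cosine(doc["embedding"], query_vec), 3))  # about 0.994
```

Nothing about the document format changes when you add the vector field; similarity search is just "another mode" over the same JSON, which is the feature-not-a-category argument.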
So you don't need index-free adjacency or some of the other cleverness
that some of my former employers have done. It really is all converging upon itself. And hopefully
everybody starts to realize that you can clean up and simplify your architectures as you look ahead
so that you do, if you're going to build AI-powered applications, feed it clean data, right? You're
going to be better off. So this episode is being recorded in advance, thankfully, but it's getting released the first
day of re:Invent. What are you folks doing at the show, for those who are either there and for some
reason listening to a podcast rather than going to get marketed to by a variety of different pitches
that all mention AI, or might even be watching from home and trying to figure out what to make of it?
Right. So, of course, we have a booth, and my notes don't have in front of me what our booth
number is, but you'll see it on the signs in the airport. So we'll have a presence there. We'll
have an executive briefing room available, so we can schedule time with anyone who wants to come
talk to us. We'll be showing not only the capabilities that we're offering here,
we'll show off Capella IQ, our coding assistant.
Okay, so yeah, we're on the AI hype band,
but we'll also be showing things like
our mobile sync capability where my phone and your phone
can synchronize data amongst themselves
without having to actually have
a live connection to the internet.
So long as we're on the same network locally
within the
Venetian's network, we have an app that we have people download from the App Store.
And then it's a color synchronization app or a picture synchronization app. So you tap it and
it changes on my screen and I tap it and it changes on your screen. And we'll have, I don't
know, as many people who are around standing there,
synchronizing what, maybe 50 phones at a time?
It's actually a pretty slick demonstration
of why you might want a database
that's not only in the cloud,
but operates around the cloud, operates on mobile,
and, you know, can connect and disconnect
from your networks.
It's a pretty neat scenario.
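Peer-to-peer sync of the sort the demo shows reduces to merging replicas when devices meet. A toy last-write-wins merge, one of several possible conflict strategies and not necessarily what Couchbase's mobile sync actually uses:

```python
# Each device keeps {key: (timestamp, value)}. Syncing merges the maps,
# keeping the newer write for each key: last-write-wins.
def merge(a, b):
    out = dict(a)
    for key, (ts, val) in b.items():
        if key not in out or ts > out[key][0]:
            out[key] = (ts, val)
    return out

phone_a = {"screen_color": (1, "red")}
phone_b = {"screen_color": (2, "blue")}  # tapped later

synced = merge(phone_a, phone_b)
print(synced["screen_color"])  # (2, 'blue'): the later tap wins
```

Merging in either order yields the same state for distinct timestamps, which is what lets disconnected devices converge once they can see each other on the booth's local network.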
So we'll be showing a bunch of cool technical stuff,
as well as talking about the things that we're discussing right now.
I will say you're putting an awful lot of faith in connectivity working at re:Invent,
be it Wi-Fi or the cellular network. I know both of those have bitten me in various ways over the
years, but I wish you the best on it. I think it's going to be an interesting show based upon
everything I've heard in the run-up to it.
I'm just glad it's here.
Now, this is the cool part about what I'm talking about, though.
The cool part about what I'm talking about is
we can set up our own wireless network in our booth.
And, you know, we still, you'd have to go to the App Store
to get this application.
But once there, I can have you switch over to my local network
and play around on it.
And I can sync the stuff right there and have confidence that in my local network that's in my booth, the system's working.
I think that's going to be ultimately our design there.
Because, oh my gosh, yes, I have a hundred stories about connectivity and someone blowing a demo because they're yanking on a cable behind the pulpit, right?
I always build in a...
And assuming there's no connectivity, how can I fake my demos?
I've only had to do it once, but you wind up planning in advance when you
start doing a talk to a large enough or influential enough audience where you want things to go right.
There's a delightful acceptance right now of recorded videos and demonstrations that people
sort of accept that way because of exactly all
this. And I'm sure we'll be showing that in our booth there too. Given the non-deterministic
nature of generative AI, I'm sort of surprised whenever someone hasn't mocked the demo in
advance just because, yeah, it gives you the right answer in the rehearsal, but every once in a while
it gets completely unglued. Yes, and we see it pretty regularly. So the emergence of clever and good prompt
engineering is going to be a big skill for people. And hopefully, you know, everybody's
going to figure out how to pass it along to their peers. Excellent. We'll put links to all this in
the show notes. And I look forward to seeing how well this works out for you. Best of luck at the
show. And thanks for speaking with me. I appreciate it.
Yeah, Corey, we appreciate the support and I think the show is going to be
very strong for us as well.
Again, thanks for having me here.
Always a pleasure.
Jeff Morris, VP of Product
and Solutions Marketing at Couchbase.
This episode has been brought to us
by our friends at Couchbase
and I'm cloud economist, Corey Quinn.
If you've enjoyed this podcast,
please leave a five-star review on your podcast platform of choice. Whereas if you've hated this
podcast, please leave a five-star review on your podcast platform of choice, along with an angry
comment. But if you want to remain happy, I wouldn't ask that podcast platform what database
they're using. No one likes the answer to that question.
Visit duckbillgroup.com to get started.