The Changelog: Software Development, Open Source - Erlang, CouchBase, Merging with Membase (Interview)
Episode Date: March 30, 2011
Wynn sat down with Chris Anderson from CouchBase to talk about CouchDB, the merger with Membase, Erlang, and bringing NoSQL to PHPers....
Transcript
Yo Chris!
What up Claire?
I gots a database problem that I gots to share
I hate my SQL, it's giving me tears
This alter table statement is gonna take years
No need to trip on a funky query
Use MapReduce and JavaScript with CouchDB
It's schemaless and replicates using JSON.
Non-relational databases turn me on.
From now on I'll use CouchDB.
Update.
All my sisters using CouchDB. Don't push me away!
We're also up on GitHub. Head to github.com slash explore. You'll find some trending repos, some featured repos from the blog, as well as our audio podcast.
And if you're on Twitter, follow ChangeLogShow and me, adamstac.
And I'm pengwynn, P-E-N-G-W-Y-N-N.
This episode is sponsored by GitHub Jobs.
Head to thechangelog.com slash jobs to get started.
If you'd like us to feature your job on this show, select advertise on the ChangeLog when posting your job, and we will take care of the rest.
The irony of a real radio station advertising on the fake radio. Southern California Public Radio, KPCC 89.3 on your FM dials. Looking for a Django/Python developer that would report into the senior UX designer and implement HTML, CSS, and the Python/Django templates. Experience with the full-stack MVC; MySQL a plus.
If you're in the Pasadena, California area, be sure and check out lg.gd slash 9s.
Fun episode this week.
Took a break from our regularly scheduled design programs.
Save your emails, guys.
We know two back-to-back design episodes set off the switchboards,
but we're back to the NoSQL.
Talked to Chris Anderson over at Couchbase
about the Membase CouchDB merger
and the full line of products that they have.
Let me ask you a question.
Did he have a brand-new theme song then?
You know, he sent it to me.
We will put that in the show notes.
Actually, if you can cut it into the intro, that would be awesome.
We'll try.
We'll do our best. If you guys haven't caught the ad hoc theme show, or theme song, that we recorded the first time, I guess at the NoSQL Smackdown at South by last year.
Chris was awesome in that little ditty. We had the video too, and he was bouncing around, so I don't think anybody got to see that though.
Fun times with Chris abound.
We talked about what it's like to work with Damien Katz on the Apache project.
And also his buddy, Jan Lehnardt, who, if you don't know Jan, then you're missing out.
Yeah, absolutely.
Fun episode.
Should we get to it?
Let's do it.
We're chatting today with Chris Anderson from Couchbase.
So Chris, why don't you introduce yourself and your role over at Couchbase?
Sure.
Yeah, I've been a longtime committer to the Apache CouchDB project and founded a CouchDB company with Damien Katz and Jan Lehnardt back in 2009.
And we were a merry band of engineers doing everything we could to make CouchDB awesome.
And then in the last few months, we were kind of faced with the choice between doing VC or merging with these guys over at Membase. And as we started to look closer and closer at them, it got to be obvious that it was just the right choice. So now we're Couchbase and we're bigger and stronger, and me and Damien and Jan get to focus on the things we're good at. So yeah, it's been a wild ride, and I guess we're just getting started.
Well, we'll get into Couchbase in just a moment.
So for the five or so people out there that aren't familiar with CouchDB, why don't you
give an overview of the Apache CouchDB project?
Sure.
So Apache CouchDB is a database that's accessed via web protocols.
So you just store JSON in it and get JSON back,
both in the form of what you stored,
and you can also build dynamic queries.
So I want all the blog posts in the last two days.
That sort of thing is easy to pull out.
It's got some other fun features.
The killer one that really we're not seeing anywhere else in the marketplace is the ability to keep two or more copies of it all synchronized. So the idea is kind of like Dropbox, but for your API, not for your files. So you've got two copies of it, and there's work going on on both ends, and you can synchronize them more or less effortlessly.
So I can talk technically more about
how that synchronization works.
But for most developers, it just works.
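As a rough illustration of "store JSON in it and get JSON back" over web protocols, the Python sketch below builds (but does not send) the two kinds of HTTP requests described above: one that stores a document and one that asks a view for recent blog posts. The server address, the `blog` database, the `posts` design document, and the `by_date` view are all hypothetical, and the view is assumed to emit each post's ISO 8601 timestamp as its key.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:5984"  # assumed address of a local CouchDB


def put_doc_request(db, doc_id, doc):
    """Build a PUT request that stores `doc` as JSON under `doc_id`."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{BASE}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def recent_posts_request(db, now, days=2):
    """Build a GET request against a hypothetical view keyed by timestamp."""
    since = (now - timedelta(days=days)).isoformat()
    # View keys are JSON values, so the startkey is JSON-encoded first.
    query = urlencode({"startkey": json.dumps(since)})
    return Request(f"{BASE}/{db}/_design/posts/_view/by_date?{query}")
```

Passing either request to `urllib.request.urlopen` would send it to a running server; the response body is JSON in both cases.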
So in reference to Couchbase, the new merger of Membase and CouchDB, how did that come about? And what's, I guess, the offering from Couchbase now?
Sure. Yes, I mentioned earlier that, you know, it's part of our wild ride as founders. So, you know, what happened was Damien and I had a chance meeting with James, who is our lead product architect guy now.
And we just started talking.
And what they'd been doing for the last year, it was kind of at the other end of the
spectrum from what we'd been doing as far as where they're putting their focus and
vice versa.
And then comparing notes even more, our plans for 2011 were to do what they'd already done, and vice versa: they wanted to build into their product, Membase, a lot of the features that CouchDB already has. So we started talking more and more, and we realized that at the end of the day doing this merger would accelerate both companies' roadmaps. And there's one thing that you can't buy, and that's time. So I really feel like, if it continues to go well on a technical front, integrating stuff, that we have jumped forward a year or maybe even two in terms of our roadmap and capabilities and viability as a company.
So on top of that, it's a huge relief for me because I went from being the CFO to being the president and having to manage all these teams and stuff, which is great fun, but I'd much rather focus on where the rubber meets the road as far as what developers are using.
How many CFOs do you know that actually have a GitHub account?
Well, there's a few of them, but that's what you do when there's three of you. Somebody's got to be the CFO, and I drew the short straw, I guess.
So, you know, the first link on the Couchbase website is "Why NoSQL?" So it's now 2011. Not only are you having to answer this question, you're featuring it prominently on your home page. Are you finding that you have to sell NoSQL just as much as you have to sell your products?
Oh, yeah. So it really depends on the audience that you're talking to.
The core audience of this show probably already knows what NoSQL is for,
and they're probably even over that hump where it seems like a threat to their tried-and-true relational databases.
I think most of the cutting-edge developer community sees that there's problems that are just a better fit for schemaless storage where you don't have to deal with migration, you know, migrating your schema all the time.
And then, of course, like once you go over that hump, then there's all kinds of other benefits like the synchronization that we offer or the ability to do scale out because of key value based architectures instead of the relational model.
So, you know, at the cutting edge, I feel like that story is told.
But, you know, the percentage of developers who even, you know, know what GitHub is, is
vanishingly small compared to the large mass of people out there.
And so they're going to come to us not because they heard about Couch, but because they heard, oh, there's something different,
and that little blurb may be the first explanation they get of NoSQL at all.
So what does the product line look like for Couchbase?
So what names survive, or is it just a new name totally with Couchbase?
Sure. That's a great question.
The answer is kind of it's easier to talk about it from the technical side
because what we're doing for integration is fairly obvious.
I mean, that's a big part of why the merger looked viable.
After it looked exciting, then, of course, we had to go look at it with a skeptical eye
and see is it going to be too hard to pull off.
But it's really technically kind of obvious what we've got to do.
So before you understand what our combined product is going to look like,
you've got to understand Membase, which, you know, the elevator pitch is,
it's basically a big memcached cluster that doesn't, you know,
forget everything when the power goes out.
So it handles the, you know, resizing the cluster,
and it handles if you want it to,
proxying, you know, each request to the particular cluster member,
or if you use smart clients,
then you can, you know, have slightly better efficiency.
But overall, you know, it'll do all the rebalancing
and make sure that as your data set grows,
you can maintain sub-millisecond query latencies
via the memcached-compatible API.
So that's Membase.
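To picture the difference between proxying and a "smart client", here is a small Python sketch in which the client itself decides which cluster member owns a key. It assumes a scheme where each key hashes to a fixed virtual bucket and a separate table maps buckets to nodes, so rebalancing only rewrites the table. The class name, the bucket count, and the round-robin assignment are illustrative assumptions, not Membase's actual client code.

```python
import zlib

NUM_VBUCKETS = 16  # illustrative; a real cluster would use many more


def vbucket_for(key: str) -> int:
    """Hash a key to a fixed virtual bucket (CRC32 is an assumption here)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


class SmartClient:
    """Hypothetical client that holds the bucket-to-node map itself."""

    def __init__(self, nodes):
        self.rebalance(nodes)

    def rebalance(self, nodes):
        """Reassign buckets round-robin; keys never change bucket."""
        self.vbucket_map = [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]

    def node_for(self, key):
        """Pick the node to contact directly, skipping any proxy hop."""
        return self.vbucket_map[vbucket_for(key)]
```

With this layout, growing the cluster from two nodes to three changes only `vbucket_map`; the key-to-bucket hash stays stable, which is what lets data move in whole buckets during a rebalance.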
And currently today, the back-end storage is handled by SQLite.
But they're not using the relational features of SQLite.
They're basically using it instead of raw files.
So the first step, pretty obvious to do, is just pull out SQLite and replace it with the CouchDB storage engine.
And so that's easier even than it sounds because most of Membase, you know, everything but the critical write path pretty much, is written in Erlang already.
And then there's the C-based, you know, memcached and SQLite portions.
But this is just going to be placing a bigger bet on Erlang,
and it also makes it really smooth to integrate Couch.
So first step is just getting Couch in there as a storage engine, and we're going to release that product as essentially something that provides a lot of value to existing Membase users
because you get, for one thing, slightly better I/O throughput to disk.
Couch is just more optimized for the kinds of access patterns that Membase was already doing.
And so that's just kind of like a really basic win, but maybe not worth all the technical risk of trying to do this integration on its own.
The other thing you get more or less for free
is the ability to query now your memcached,
your Membase cluster with the CouchDB-style MapReduce.
So that's always been a big thing that's been missing
from people's memcached experience, right?
You stick stuff in, you can get it out by the same key you put it in,
but as soon as you want to get more complex than that,
then you're either having to do a bunch of pointer following in your application
or you're having to write some custom layer that interacts with memcached.
So this will give people a straightforward ability to get real-time queries
on top of their memcached clusters or Membase clusters.
And that alone is enough to be pretty exciting.
But that's really just the first step.
You know, once you get out of the path of choosing a NoSQL option, and there's a whole lot of options out there, and we've covered a lot of them on the show, I think where Couch in the past has shined has been in the replication area. But you've got another kind of ace in the hole, as it were, coming up. Talk about your mobile.
Yeah. Well, so yeah, so mobile has been a focus of CouchOne before the merger. And
even though we've got more going on after the merger, I think that we're getting more momentum on mobile,
if only because I'm not dealing with HR and fundraising.
I'm working on mobile.
And for the most part, that's just been coordinating a lot of the code that we had around
and starting some QA and some release process on it and documenting it
and getting it out to the community.
So I'm really lucky that our engineers have already kicked a bunch of ass on getting stuff
to run on iOS.
It's not real simple.
We had to do a lot of low-level stuff to the Erlang VM, to CouchDB itself.
But the upshot is now we've got a CouchDB instance
that runs on your device.
We were surprised to find that it just has almost no impact on battery life.
So that was real lucky.
I mean, it's not surprising when you understand Erlang
and how Erlang is good at being idle.
But still, we were expecting to have to invest a lot there. Instead, we've still got to tackle the overall download size. So right now, it adds about 15 megabytes to your application, which we see being fine for enterprise applications and kind of more serious stuff. But if somebody just wants a little bit of synchronization, that's big enough to make them think twice. So our first goal is to shrink that.
So I guess on iOS, where every app's in its own little sandbox, that's pretty much additive to every app that you create. You can't install the Couchbase framework once and then share that, right?
Right. Yeah. And that's down to the Apple restrictions, which I think make perfect sense. I mean, they're sandboxing these apps. They don't want DLL hell. They don't want your underlying libraries swapping versions out from underneath your application.
It's our job to get that thing small enough to not have that negative impact.
I think 5 megabytes is a threshold where we can start to feel pretty strong about it.
Why iOS first?
Was it an install-base decision or lower barrier to entry as far as a technical problem?
Well, we've been running on Android for, I don't know, about nine months now.
And the response has been really strong.
We've got a couple of case studies in the pipeline of people who are using CouchDB on their Android device.
But it was actually kind of scary because Android affords you so much freedom.
You know, that whole, that question you're asking about, is it in each app or is it once on the device?
On Android, you can have CouchDB be a library.
And so there's this whole line of development going down about how to manage a centralized database that multiple applications are talking to.
It's really powerful and interesting and threatened to pull us into
the rabbit hole.
With iOS, with the different restrictions, Paul Graham said this once.
He said, when you're a startup, run up the steepest hill you can find.
Just do the hardest thing that you can see because probably most people aren't looking
with the same amount of detail as what you're looking at. So if you see something really challenging
and you can nail it first, then that gives you a really strong position.
So we think that a win on iOS is going to be easier to translate to other platforms than vice versa.
So one of the attractive features of Couch in the past has been these Couch apps, right? And it's
more of a move back
to client server where you've got your views and presentation logic actually running in your
database, so to speak. Is that the same sort of pattern that you follow with a mobile application?
You know, sort of. In the sense that Couch apps are the least amount of, you know, stuff you've
got to do aside from the thing that runs in your browser
that you're already good at. I mean, imagining, you know, like a jQuery developer, and they want,
they're used to having to deal with the Sinatra guy or the Node.js guy to do the server component
and provide them a JSON API for the front end. You know, CouchApp is just simplifying that stack
so that the jQuery developer doesn't need to deal with the middle tier anymore.
But if we were going to take that same philosophy and apply it to iOS,
then really the right approach is to be as transparent to your traditional iOS developer as possible,
be the least amount of additional stuff they have to learn.
So out of the gate, it's just CouchDB with the HTTP JSON API, but we see some APIs that
Apple's got that should allow us to have a pure Core Data interface.
So ideally, existing apps that already use Core Data, you just plug our library in and
get the synchronization for free.
That's the goal, and hopefully we can get there.
But even if there's roadblocks, we've still got something I think is pretty valuable.
So on your product page you've got one of these nice manager-friendly diagrams that just has the word CouchSync between Couchbase and the mobile app. Is that replication, or how's that working?
Yeah, that's replication. And so, you know, plain old replication is just so easy to do in Couch. You post some JSON via HTTP at the server and tell it the remote server that you want it synchronized with, and it does the rest of the work and does it as bandwidth-efficiently as possible.
And you can even tell it to continuously keep up to date, which even turns out to be a good fit for mobile networks. That long poll or continuous changes feed connection, I thought it was going to run counter to the way cell networks work, but they're already optimized for these kind of long-running, mostly quiet connections.
So that was nice.
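For readers who have not seen it, "post some JSON via HTTP at the server" looks roughly like the Python sketch below, which builds (without sending) a request to CouchDB's `_replicate` endpoint. The server address and the database names in the usage note are made up for illustration.

```python
import json
from urllib.request import Request


def replication_request(local_server, source, target, continuous=False):
    """Build a POST to the /_replicate endpoint (the request is not sent)."""
    body = {"source": source, "target": target}
    if continuous:
        # Ask the server to keep following the source's changes feed.
        body["continuous"] = True
    return Request(
        f"{local_server}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

For example, `replication_request("http://localhost:5984", "notes", "http://example.com/notes", continuous=True)` describes a push of a local `notes` database to a remote copy that stays up to date; swapping `source` and `target` would pull instead.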
So basic replication is a really good fit for mobile,
but there are some patterns
that we want to embody in CouchSync
to make things easier.
So for instance, on the membase side,
one of the big users is Zynga.
So you can imagine all the data in FarmVille,
and right now it's in a big cluster.
But if you want to take FarmVille
and make it offline capable,
then you'd need to have the ability
to get the data for,
you know, a single given user and put it in its own little database, you know, essentially so that the user can then replicate that back and forth for their backup slash offline. So, you know,
tools to make that stuff super easy. That's what's out on the horizon for us.
Let's talk about couch apps for a moment. Did you coin this term?
I guess so. I mean, it's kind of obvious. Pretty much every Couch term in the universe is taken
at this point. But yeah, the CouchApps script that's like sort of a developer toolkit
that you can find linked from Couchapp.org, implemented in Python.
Now, I originally wrote something in Ruby that did essentially the same thing
and just didn't have time to do the maintenance burden,
so I handed it off to Benoit and Jan, both CouchDB committers,
and they worked on it some in Ruby and then decided to port it to Python because that's where Benoit is.
That's where he feels most comfortable.
So now we've got this Python thing with all these practically enterprise-y features.
You can write eggs to plug into it.
I don't use any of that stuff, but it's good that it's there when you need it.
So that's a developer tool chain, but it's different from the idea.
The idea of a Couch App is just an app that is served out of CouchDB and to whatever the native client you have around is.
The most popular native client in the world right now, of course, is the HTML browser.
But on iOS, if it's, you know, just Objective-C and CouchDB, I'll call that a couch app.
So yeah, I think that the real fundamental idea is that if you are allowing your users to take a database offline onto their device,
you've kind of got to understand the security model of the fact that they've got a copy of all the data. And so the place where you apply your security policy is going to be on that inbound replication stream. It's not going to be by writing some middleware, Rails app or something that sits there and validates everything as it's going through.
One of the things that I noticed when
I got into development was that no matter how good you were on the front end, unless you were an uber front end ninja, to use the term, you pretty much had to deal with a server implementation of some sort.
And we were all kind of in tribes based on whichever server platform you chose, because you really couldn't afford to pick up more than one because it was such an overhead of knowing more than one platform.
But as apps like CouchDB and Node.js have taken off,
it seems like we've kind of got this JavaScript layer that all of us were familiar with.
As we start to do more with it, we're starting to kind of bleed or blur those lines in between our tribes.
Have you noticed anything like that?
Well, absolutely. I mean, especially, you know,
talking about the cutting edge developers who have the choice to use the tools they want,
you know, JavaScript seems to be really taking off. And I think the reason is, you know,
why switch all those contexts when JavaScript has, you know, most of the runtime benefits that
the other languages can give you. But on the other hand, you do have a bunch of developers
in the enterprise world who don't get to pick what they use.
However, that's even changing.
I mean, JavaScript in the browser has been common there for a long time.
So maybe we can leapfrog.
People can move straight from their VB.NET back ends to Couch apps. And we've heard stories of, you know, large internal, you know,
customer management systems and stuff being moved over to Couch
and getting, you know, much better.
Basically, less code means less to maintain.
And also, a lot of these guys have been seeing better performance
just because, you know, you don't have a Java stack trace 50 frames deep or whatever.
One of the things that intrigues me about Couch is not only does it collapse a lot of the middle layers,
which seem to be superfluous for a lot of the smaller-end apps, but also its built-in versioning for everything.
Not just your data, but also your GUI.
Yeah, I mean, it's important to distinguish: CouchDB's built-in versioning, as it were, is multi-version concurrency control.
So what that means is if, you know, we're both working against the same Couch server, and you load the document, and I load the document, and then you make a change and save it.
When I try to save it, Couch is going to reject my save as being out of date.
And that's just to prevent race conditions.
But it also means that readers can always proceed against a view query or against scanning the documents in a database without being blocked by writers.
Everyone has their own independent snapshot of the database.
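The rejected save described here can be sketched in a few lines of Python. This is a toy in-memory store, not CouchDB's implementation: real revisions are opaque strings rather than the integers used below, and the class and exception names are invented for the example.

```python
class Conflict(Exception):
    """Raised when a save is based on an out-of-date revision."""


class ToyStore:
    """In-memory stand-in for a document store with revision checks."""

    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def load(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # a copy, so each reader has its own snapshot

    def save(self, doc_id, body, rev=None):
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if rev != current_rev:
            # Someone else saved since this caller loaded the document.
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev
```

If two callers both `load` a document and the first one saves, the second caller's `save` raises `Conflict` because it still carries the old revision; it has to reload and reapply its change, which is how the race condition is avoided without blocking readers.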
So that all goes really deep into the technical design of Couch when you start to look at it. But the thing to be clear on is that by default, those old versions
do not get replicated around. So when you synchronize, it just sends, you know, the current
version. When you compact, which, you know, if you're not your own DBA, your DBA may compact
when you least expect it to clean up wasted space.
That'll also clean up the outdated versions.
That's not to say you can't do versioning in Couch. There's lots of applications that either have an entity document and then log additional documents that refer to that entity, so you can do patterns like that, or you can do patterns like actually keeping the full history as binary attachments on the old history. So there's a lot of patterns there to do, and if you Google "CouchDB simple document versioning" (I wrote a blog post about this a few months ago), it'll come up, and it'll kind of go through the pros and cons of all the patterns.
In an effort to keep it real what sort of applications are not suited for CouchDB?
You know, that's a good question.
I think that, you know, a worst-case scenario for, you know, how much storage and resources you're using up compared to the alternative is like a real-time message queue where you don't care about archiving it.
So something where you've got something that's fairly reliable but in memory.
So if you were going to do that workload in CouchDB, you'd have all the message history for that application stored on disk.
On the other hand, most real-time messaging applications
do have some sort of need to archive and query the messages.
I mean, maybe not most, but a fair proportion of them.
So I've seen Couch used for spam filters.
I've seen Couch used for chat rooms.
And it makes a good fit for that sort of stuff.
The other is ephemeral data,
so if you were just doing, like, a Digg-style upvote counter on a post,
maybe something else would be a better fit,
although we're addressing that.
I think there is some truth to be said
that right now the different NoSQLs
have all been kind of finding their niche
and getting entrenched there.
But really, everyone's going after some form of 80% solution.
So people are going to be adding each other's feature sets to the extent that it makes sense, technically.
What was involved with getting the Erlang runtime on iOS?
Did you guys have to deal with that?
Our engineer, Aaron Miller, gets most of the credit for that. He went through the Erlang VM. Erlang is implemented in C, and it uses dynamic linking for a whole lot of... It's basically built out of its own plug-in system at some level.
He went through and turned all that dynamic linking into static linking,
which was just like touching a bunch of code and having to know what to do.
And then there was a bunch of other strange little gotchas that you wouldn't expect.
But for instance, Erlang uses the syscall fork to create a subprocess to handle DNS lookups.
And that's just not going to fly on iOS.
You can't do fork.
So we had to do little subtle changes like that.
We also had to get SpiderMonkey running on the device
so we have JavaScript running in a background thread
because the built-in JavaScript on iOS, at least to my knowledge,
always blocks the main UI thread when it's running.
So you can't have the UI locked up just because a MapReduce is generating.
So we included SpiderMonkey in there,
which I think also had to have some technical changes.
But mostly it was just a matter of getting the build cleaned up and then going through and conforming to, you know, sort of Apple's view of the world.
Was SpiderMonkey a holdover from a previous design decision, or any consideration for V8?
Yeah, so we've done the SpiderMonkey/V8 shootout, and SpiderMonkey wins. And the reason why is
because V8 is optimized for
process launch time. You open a new
tab, it needs to be responsive right away.
SpiderMonkey has the
JIT compiler, which
as it's running,
especially with these map functions where you define the
function once and then run 100,000 documents
through it, the JIT will get it up to
faster than C in some places.
Coupled with that, SpiderMonkey seems to use a little less memory than V8.
The startup time being not that important to us, we find that SpiderMonkey is better
for at least on a big server install of Couch, you're going to get better throughput.
That being said, on iOS, if we could somehow use the built-in Nitro or whatever, I mean, the number one constraint there is I'd rather not have to download all of SpiderMonkey to the device, even if it's a little slower.
So we're working on figuring out solutions there.
So CouchDB is part of the Apache Foundation lineup.
What is the licensing rundown on everything Couchbase these days?
So Couchbase right now has Membase, which is, I think, Apache licensed.
And then Couchbase, which is our build of CouchDB that includes GeoCouch and some other little features and QA and stuff.
And that's Apache licensed as well.
As far as what the license is going to be on,
stuff way down in the future,
we're still figuring that out.
But the main consideration for me right now is I want to make sure that we're contributing to the Apache CouchDB community, not just code, but that Apache CouchDB is where the Erlang work that's appropriate, where that ends up.
We could have easily come out the gate and said,
okay, we're just going to fork CouchDB
and try and build up a community around that fork,
but I would much rather stay in the Apache CouchDB community.
So on your comparison page, you compare Couchbase versus Cassandra and MongoDB.
So we've had Riak on the show twice.
Any other NoSQL options out there that you could draw a distinction to?
You know, I think that it's real important that people understand that CouchDB's MapReduce is really different from all the others, and especially Hadoop.
So Hadoop is, as far as I'm concerned, the big winner right now for, you know, especially in the enterprise people, you know, doing something other than just using Oracle.
And so, you know, CouchDB MapReduce is incremental. And what that means
is that if you, you know, have 10 million documents in a database and you define a view,
then it takes some time to build that view the first time. But queries against that index are
almost instantaneous. And then on top of that, CouchDB automatically keeps the index up to date as efficiently as possible just by recomputing based on changes.
Whereas with Hadoop-style MapReduce, which is what you'll find in the other products, [...] HDFS and then define your query and run it on it and take the results of that query and maybe put them back into a database for real-time viewing.
So if you change 20% of those inputs, then it's usually better in the Hadoop context to just rebuild the whole thing, which is fine. I mean, Hadoop obviously seems very popular, but it's different from the kinds of MapReduce that would be useful to a company like Zynga
wanting to support FarmVille and having real-time results available as they stream in.
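The contrast between incremental and batch MapReduce can be shown with a toy index in Python. This sketch only models the "recompute based on changes" idea for the map step; it does not reproduce CouchDB's on-disk B-tree or its reduce handling, and the class, the `map_calls` counter, and the `posts_by_author` function are invented for illustration.

```python
class IncrementalView:
    """Toy view index that re-maps only the documents that changed."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc_id -> list of (key, value) rows
        self.map_calls = 0     # how many documents have been mapped so far

    def update(self, changed_docs):
        """Apply a batch of changes: {doc_id: doc, or None if deleted}."""
        for doc_id, doc in changed_docs.items():
            self.rows_by_doc.pop(doc_id, None)  # drop that document's stale rows
            if doc is not None:
                self.map_calls += 1
                self.rows_by_doc[doc_id] = list(self.map_fn(doc))

    def query(self):
        """Return all rows sorted by key, like reading the index in order."""
        return sorted(row for rows in self.rows_by_doc.values() for row in rows)


def posts_by_author(doc):
    """Example map function: emit one row per blog post, keyed by author."""
    if doc.get("type") == "post":
        yield (doc["author"], 1)
```

Building the view over ten documents costs ten map calls; changing one document afterward costs one more, not ten. A batch job in the style described for Hadoop would instead rerun the map over the whole input.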
So in the mobile context, you mentioned long-running connections.
What's available with CouchDB on the desktop or the server?
Sure. So we have a Couchbase desktop for OS X
that is a rev of CouchDBX,
a project that Jan had been working on for a long time.
It's finally cleaned out some of the annoyances and stuff
and really stripped it down to just being an icon
in your menu bar that has a Couchbase server running there,
and you can pop it open on port 5984 and create documents
and play around in Futon.
So I think that's important for supporting developers.
On the server, we also have a Couchbase server build for Linux and Windows,
and we see actually starting to get some interest
from the Windows side of the world.
But in the long run, everyone's asking us,
what about scale up?
What about scale out?
Because currently Apache CouchDB is designed for a single node.
The API is designed to scale up,
but the actual implementation doesn't contain that.
So that's what we're going for. I mean, you know, that's the point of this merger is that when we've
got our combined product, it's going to be the big, fast CouchDB that everyone always wished for.
So what becomes of Couch.io?
That's just an old domain name that I've still got laying around.
So, yeah, so we've got, you know, the history of the company was we founded it as the, you know, the business entity being Relax Incorporated, which is kind of like GitHub's Logical Awesome.
And then, yeah, we had this couch.io domain name, which was cute, but it had usability issues, and that just became obvious the more people that we talked to about it. So that's why we switched to CouchOne. And, you know, finally with the merger, Couchbase was kind of obvious coming out of CouchOne and Membase. And my cabbie in Austin last weekend, you know, could understand what I was saying right away. I said "Couchbase," and, you know, it wasn't like "couch what?", which happens when you say CouchDB. I was pretty happy about that.
We're not allowed to entertain the idea of changing the company's name ever again.
What about Couch in the cloud?
The Couch hosting that we have is expanding.
We've just recently been going through some upgrade pains, you know, as everything does.
But we've moved everyone's data onto EBS.
So we're getting lower latency and, you know, better throughput on those boxes.
Jason Smith is our guy in Thailand who handles most of the hosting.
And he's also working on, you know, rolling out the paid options for hosting. So it's really
going to be, you know, catering to professional users who are, you know, either storing mission
critical data in there or want to use it as a development point in the cloud.
So there's other services out there, Cloudant being one.
Are you guys supporters of that as far as paid commercial support,
or do you see them as a competitor long-term?
Well, long-term, what we see is the more CouchDB companies, the better.
And so we love it that Cloudant's there.
There's another company, I think they're still in stealth,
but they're actually working on a CouchApp marketplace.
So there's a fair amount of action going on in the CouchDB ecosystem and we think the more diversity the better.
Cloudant has BigCouch, which is sort of the, you know, it's a CouchDB that scales out,
and it's all written in Erlang and is fairly performant and high throughput,
and we think that's great to have out there, have people using it.
Their business model, excuse me,
Cloudant's business model, is a little more focused on, you know,
kind of these real-time search workloads.
So they've got a lot of customers who are consuming Twitter firehose or other feeds like that and doing semantic analysis and stuff on top of that data.
We're a lot more interested in the real-time, somebody clicked to buy a cabbage and now they have a cabbage, those kind of queries.
So, you know, we think that there's room easily for Cloudant and Couchbase and hopefully a
whole slew of other companies to come along.
So for the developer that's not doing just front-end, back-end, direct JavaScript to
Couch type application architecture.
Where are you seeing the growth and adoption?
In Python, Ruby, what sorts of communities are embracing Couch?
So right now we're going to focus on PHP first because the runtime already makes a lot of sense
with Couch's ability to crash and recover quickly.
The PHP runtime, every
single request is isolated. So if what you need to do is turn some JSON into some HTML,
you could do worse than to turn to PHP. But on the other hand, there's some work that
needs to be done there to make the clients really smart and strong. So we're plowing energy into the PHP drivers, also into Ruby and Python and Java and .NET.
So it's actually Jan who's heading up the effort to put our SDKs together for the various
platforms and picking which ones to do first.
And maybe we're picking to start with PHP because Jan's an old-time PHP guy. Don't tell anyone, but he's got a php.net email address.
Nice. So let's switch gears for a moment. When you're not hacking on Couch or CouchApps, what's really got you excited in the world of open source?
Oh, gosh. That's a good question. I've been so heads down, first of all on the merger, and now finally getting back to writing code. I think that the mobile stuff, iOS and Android, is still going to surprise us. People are making fun of that Color funding. They raised like $40-something million, which is maybe more money than seems reasonable, but their app seems kind of cool. I don't know about the financial side of it, but I think that this kind of finding-people-who-are-near-you-in-real-time stuff hasn't even started to change the world yet compared to how it's going to.
I'd like to see the new Couchbase Mobile be a module for Titanium Appcelerator or Titanium Mobile, so that you've got a CouchDB option on both iOS and Android one day.
Yeah, so we know of at least a few apps out there that are using Titanium and CouchDB together. I'm not sure if the code is open source or, you know, clean enough to turn into a module, but people are doing it, so it seems like it's a good fit. Yeah, and I'm doing what I can,
meeting with all these various HTML5 kind of UI
and widget component companies and jQuery Mobile
and delving into all that.
If people are out there and are kind of interested
in the intersection between front-end and mobile,
there's a seven-part series by this guy Todd Anderson
on jQuery Mobile and CouchDB.
And if you go through that seven-part series,
you'll come out the other end of it
probably better at that stuff than I am right now.
I mean, it's just got everything you need to know.
So a week from now, you could be an expert
iOS CouchDB developer or HTML5 mobile
CouchDB developer.
Todd Anderson, no relation?
Nope, no relation. Well, not that we know of yet.
So one last question. Who's your programming hero?
Oh, you know, that's kind of easy because I get to hang out with him on a fairly regular basis.
At the risk of being a fanboy, Damien's pretty awesome.
I mean, as far as knowing what not to do, he's always coming to me saying like,
Chris, are you sure you want to write that code?
You know, if you write that code, someone's going to have to maintain it.
And having somebody be that conscience, to not always add features, is really cool.
And then being able to see how stuff at the low level affects stuff at the high level.
One story that he tells about Erlang that is really true: I saw some performance benchmarks of, what was it, an image converter that someone had written in Erlang, which seems unlikely to be fast, but it was.
So an Erlang process is, you know, kind of similar to a Java object.
You can create 100,000 of them in a second, and they're all running concurrently, scheduled by the scheduler.
So each one of those has its own isolated stack and its own isolated heap.
And that means that when one gets swapped onto a core,
the whole thing gets swapped onto the core.
And maybe it doesn't all fit right there on the L1 cache,
but over the cache hierarchy, the active memory is all just localized
as opposed to threaded concurrent code,
which potentially has to jump randomly across memory all the time.
So you've got these little processes that get swapped in.
They burn through their workload, and then they get swapped out for another one.
And then on top of that, since they're isolated,
they can be garbage collected independently.
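[Editor's sketch: a toy Python model of the scheme described here, in which small processes with private heaps get swapped in, do a slice of work, and get swapped out. It is only an illustration. The class and function names are hypothetical, the doubling "work" is made up, and neither the Erlang VM nor Python's own memory manager is implemented this way.]

```python
from collections import deque


class ToyProcess:
    """Hypothetical stand-in for an Erlang process: private heap, no shared state."""

    def __init__(self, pid, items):
        self.pid = pid
        self.pending = deque(items)  # work this process still has to do
        self.heap = []               # private heap; nothing else references it

    def step(self, budget):
        """Process at most `budget` items, then yield back to the scheduler."""
        for _ in range(min(budget, len(self.pending))):
            self.heap.append(self.pending.popleft() * 2)
        return not self.pending      # True once all work is finished


def run_all(processes, budget=2):
    """Round-robin scheduler: swap a process in, let it work, swap it out."""
    results = {}
    ready = deque(processes)
    while ready:
        proc = ready.popleft()
        if proc.step(budget):
            results[proc.pid] = sum(proc.heap)
            proc.heap = None         # finished: discard the whole private heap at once
        else:
            ready.append(proc)       # not finished: back of the queue
    return results


print(run_all([ToyProcess(1, [1, 2, 3]), ToyProcess(2, [10])]))
```

[Because no process holds references into another's heap, cleaning up one process never requires looking at, or pausing, the others.]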
And that means you don't have any stop-the-world pauses
when the garbage collector is running. And if a process is done, you can just throw it out. You
don't even have to crawl its heap. So those things combined together, you know, this is the sort of
stuff that Damien explains to me and then I get all excited about. But I've seen Erlang apps where,
you know, you dial up the benchmark on it after it's sort of prototyped, and you look at it and you go, this isn't going to work.
We're like two orders of magnitude outside of spec here.
But that's only with 50% load applied.
So then you do a couple of optimizations and you're getting better
but it still doesn't look great. And then you go crank the load up all the way.
Actually get more than one box that's not the server box to apply load with, instead of
just, you know, AB running on one box or something, and really saturate it. And the thread scheduler, or the process
scheduler, can do these optimizations it can't do when it's less busy. And so you end up getting
kind of this better-than-linear wringing out of, you know,
the last bits of performance from the box. And any kind of language that can do that is,
I don't know, awesome. And it's even more fun when Damien comes along and tells you why that
happened.
You know, one of the things that struck me about Damien when I first discovered CouchDB
was just the story behind the project and how he basically punted on his corporate career that was just not satisfying him to follow an open source project, which he didn't even know what it was at the time.
Yeah. I mean, if people want to see that story, the best resource is the video InfoQ has posted from the talk he did at RubyFringe back in, I think, 2009.
Maybe it was 2008.
But yeah, back at the RubyFringe conference in Toronto.
Yeah, he got a standing ovation for that talk.
And I think he tears up in the middle.
So it's worth watching.
Definitely put that in the show notes.
One last question for you as a bonus.
So I had the opportunity to be on the NoSQL Smackdown with your buddy Jan. I think
you made an appearance in that one as well. What's it like working with Jan? Is he half as passionate
in his day-to-day job as he was on that panel?
Oh yeah, he definitely is. He's the guy who, you know, there will be a meeting and someone will say something and I'll be like,
I don't know about that, but not enough to actually speak up because I've got
whatever else on my mind. And he'll just jump right in and get to the bottom of whatever the issue is.
And so, you know, it takes you a minute to get used to that, but then you start to thank him for it.
It's important to have people who are really looking out for,
you know, especially looking out for end users and developers,
and making sure that it takes the least amount of clicks
to get to the download and all that.
Well, Chris, certainly appreciate the time
and taking the time out of a busy schedule after the merger here
to tell us about the new lineup and where you're headed.
Yeah, thanks, Wynn.
Glad to be here. And anybody who's getting started with Couch
and gets stuck or whatever, has questions,
the community really loves helping new people.
So even if you just tweet about your,
oh, I wrote this MapReduce at CouchDB,
you'll probably get some helpful replies.
Cool, thanks again