The Changelog: Software Development, Open Source - Erlang, CouchBase, Merging with Membase (Interview)

Starting point is 00:00:00 Yo Chris! What up Claire? I gots a database problem that I gots to share. I hate my SQL, it's giving me tears. This alter table statement is gonna take years. No need to trip on a funky query. Use MapReduce and JavaScript with CouchDB. It's schemaless and replicates using JSON.

Starting point is 00:00:27 Non-relational databases turn me on. From now on I'll use CouchDB. Update. All my sisters using CouchDB. Don't push me away! We're also up on GitHub. Head to github.com/explore. You'll find some trending repos, some featured repos from the blog, as well as our audio podcast. And if you're on Twitter, follow changelogshow and me, adamstac. And I'm pengwynn, P-E-N-G-W-Y-N-N. This episode is sponsored by GitHub Jobs. Head to thechangelog.com/jobs to get started.

Starting point is 00:01:17 If you'd like us to feature your job on this show, select advertise on the Changelog when posting your job, and we will take care of the rest. The irony of a real radio station advertising on the fake radio. Southern California Public Radio, KPCC 89.3 on your FM dials. Looking for a Django Python developer that would report into the senior UX designer

Starting point is 00:01:38 and implement HTML, CSS, and the Python Django templates. Experience with full-stack MVC and MySQL a plus. If you're in the Pasadena, California area, be sure and check out lg.gd/9s. Fun episode this week. Took a break from our regularly scheduled design programs. Save your emails, guys.

Starting point is 00:02:00 We know two back-to-back design episodes set off the switchboards, but we're back to NoSQL. Talked to Chris Anderson over at Couchbase about the Membase CouchDB merger and the full line of products that they have. Let me ask you a question. Did he have a brand-new theme song then? You know, he sent it to me.

Starting point is 00:02:20 We will put that in the show notes. Actually, if you can cut it into the intro, that would be awesome. We'll try. We'll do our best. If you guys haven't caught the ad hoc theme show or theme song that we recorded the first time, I guess the NoSQL Smackdown

Starting point is 00:02:36 at South by last year. Chris was awesome in that little ditty. We had the video too and he was bouncing around, so I don't think anybody got to see that though. Fun times with Chris abound. We talked about what it's like to work with Damien Katz on the Apache project. And also his buddy, Jan Lehnardt, who if you don't know Jan, then you're missing out.

Starting point is 00:02:55 Yeah, absolutely. Fun episode. Should we get to it? Let's do it. We're chatting today with Chris Anderson from Couchbase. So Chris, why don't you introduce yourself and your role over at Couchbase? Sure. Yeah, I've been a longtime committer to the Apache CouchDB project and founded a CouchDB company with Damien Katz and Jan Lehnardt back in 2009.

Starting point is 00:03:32 And we were a merry band of engineers doing everything we could to make CouchDB awesome. And then in the last few months, we were kind of faced with the choice between doing VC or, uh, merging with these guys over at Membase. And as we started to look closer and closer at them, um, it got to be, you know, obvious that, uh, it was just the right choice. So now we're Couchbase and we're bigger and stronger and, um, you know, me and Damien and Jan get to focus on the things we're good at. So yeah, it's been a wild ride, and I guess we're just getting started. Well, we'll get into Couchbase in just a moment. So for the five or so people out there that aren't familiar with CouchDB, why don't you

Starting point is 00:04:14 give an overview of the Apache CouchDB project? Sure. So Apache CouchDB is a database that's accessed via web protocols. So you just store JSON in it and get JSON back, both in the form of what you stored, and you can also build dynamic queries. So I want all the blog posts in the last two days. That sort of thing is easy to pull out.
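The "blog posts in the last two days" query can be pictured as a range scan over an index sorted by date. The following is a minimal in-memory sketch in Python, not CouchDB's actual API: the function names `build_view` and `query_since`, the `type` and `created_at` fields, and the example date are all assumptions made for illustration.

```python
import bisect
from datetime import datetime, timedelta

def build_view(docs):
    """Map step: emit (created_at, doc id) for every blog post, sorted by key."""
    rows = [(doc["created_at"], doc["_id"]) for doc in docs if doc.get("type") == "post"]
    rows.sort()
    return rows

def query_since(view, start_key):
    """Range query: ids of every row whose key is >= start_key."""
    first = bisect.bisect_left(view, (start_key, ""))
    return [doc_id for _, doc_id in view[first:]]

# Hypothetical data; ISO 8601 strings of equal width sort chronologically.
now = datetime(2011, 3, 20, 12, 0, 0)
docs = [
    {"_id": "old-post", "type": "post", "created_at": (now - timedelta(days=5)).isoformat()},
    {"_id": "new-post", "type": "post", "created_at": (now - timedelta(hours=3)).isoformat()},
    {"_id": "a-comment", "type": "comment", "created_at": now.isoformat()},
]
view = build_view(docs)
recent = query_since(view, (now - timedelta(days=2)).isoformat())
print(recent)
```

Because the index is sorted by key, the query is a binary search plus a slice rather than a scan of every document.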

Starting point is 00:04:38 It's got some other fun features. The killer one that really we're not seeing anywhere else in the marketplace is the ability to keep two or more copies of it all synchronized. So the idea, you know, kind of like Dropbox, but, um, you know, for your, uh, API, not for your files. Um, so, you know, you've got two copies of it and, uh, you know, there's work going on on both ends and you can synchronize them, uh, more or less effortlessly. So I can talk technically more about how that synchronization works.

Starting point is 00:05:14 But for most developers, it just works. So in reference to Couchbase, the new merger of Membase and CouchDB, how did that come about? And what's, I guess, the offering from Couchbase now? Sure. Yes, I mentioned earlier that, you know, it's part of our wild ride as founders. So, you know, what happened was Damien and I had a chance meeting with James, who is our, you know, lead product architect guy now.

Starting point is 00:05:46 And we just started talking. And what they'd been doing for the last year, it was kind of at the other end of the spectrum from what we'd been doing as far as where they're putting their focus and vice versa. And then comparing notes even more, our plans for 2011 were to do what they'd already done, and vice versa: they wanted to build into their product, Membase, a lot of the features that CouchDB already has. Um, so, you know, we started talking more and more, and we realized that at the end of the day doing this merger

Starting point is 00:06:16 would accelerate both companies' roadmaps, and there's one thing that you can't buy and that's time. So I really feel like, um, if it continues to go well on a technical front, integrating stuff, that we have jumped forward a year or maybe even two in terms of our roadmap and capabilities and viability as a company. So on top of that, it's a huge relief for me because I went from being the CFO to being the president and having to manage all these teams and stuff, which is great fun, but I'd much rather focus on where the rubber meets the road as far as what developers are using. How many CFOs do you know that actually have a GitHub account?

Starting point is 00:06:57 Well, there's a few of them, but that's what you do when there's three of you. Somebody's got to be the CFO, and, um, I drew the short straw, I guess. So, you know, the first link on the Couchbase website is "Why NoSQL?" So it's now 2011. Um, not only are you having to answer this question, uh, you're featuring it prominently on your home page. Are you finding that you have to sell NoSQL just as much as you have to sell your products? Oh, yeah. So it really depends on the audience that you're talking to. The core audience of this show probably already knows what NoSQL is for, and they're probably even over that hump where it seems like a threat to their tried-and-true relational databases.

Starting point is 00:07:40 I think most of the cutting-edge developer community sees that there's problems that are just a better fit for schemaless storage where you don't have to deal with migration, you know, migrating your schema all the time. And then, of course, like once you go over that hump, then there's all kinds of other benefits like the synchronization that we offer or the ability to do scale out because of key-value-based architectures instead of the relational model. So, you know, at the cutting edge, I feel like that story is told. But, you know, the percentage of developers who even, you know, know what GitHub is, is vanishingly small compared to the large mass of people out there. And so, yeah, they're going to come to us not because they heard about Couch, but because they heard, oh, there's something different, and that little blurb may be the first explanation they get of NoSQL at all. So what does the product line look like for Couchbase?

Starting point is 00:08:37 So what names survive, or is it just a new name totally with Couchbase? Sure. That's a great question. The answer is kind of it's easier to talk about it from the technical side because what we're doing for integration is fairly obvious. I mean, that's a big part of why the merger looked viable. After it looked exciting, then, of course, we had to go look at it with a skeptical eye and see is it going to be too hard to pull off. But it's really technically kind of obvious what we've got to do.

Starting point is 00:09:07 So before you understand what our combined product is going to look like, you've got to understand Membase, which, you know, the elevator pitch is, it's basically a big memcached cluster that doesn't, you know, forget everything when the power goes out. So it handles the, you know, resizing the cluster, and it handles, if you want it to, proxying, you know, each request to the particular cluster member, or if you use smart clients,

Starting point is 00:09:33 then you can, you know, have slightly better efficiency. But overall, you know, it'll do all the rebalancing and make sure that as your data set grows, you can maintain sub-millisecond query latencies via the memcached-compatible API. So that's Membase. And currently today, the back-end storage is handled by SQLite. But they're not using the relational features of SQLite.

Starting point is 00:10:03 They're basically using it instead of raw files. So the first step, pretty obvious to do, is just pull out SQLite and replace it with the CouchDB storage engine. And so that's easier even than it sounds because most of Membase, you know, everything but the critical write path pretty much is written in Erlang already. And then there's the C-based, you know, memcached and SQLite portions. But this is just going to be placing a bigger bet on Erlang, and it also makes it really smooth to integrate Couch. So first step is just getting Couch in there as a storage engine, and we're going to release that product as essentially something that provides a lot of value to existing Membase users

Starting point is 00:10:48 because you get, for the one thing, slightly better I/O throughput to disk. Couch is just more optimized for the kinds of access patterns that Membase was already doing. And so that's just kind of like a really basic win, but maybe not worth all the technical risk of trying to do this integration on its own. The other thing you get more or less for free is the ability to query now your memcached, your Membase cluster with the CouchDB-style MapReduce. So that's always been a big thing that's been missing from people's memcached experience, right?
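CouchDB-style MapReduce can be pictured as an index built by a map function and kept current as individual documents change, so a query reads the index instead of rescanning the data. The following is a minimal in-memory sketch in Python under that assumption. It is not how CouchDB or Membase implement views (CouchDB views are written in JavaScript and stored on disk), and the names `IncrementalView` and `score_by_player` and the document fields are hypothetical.

```python
from collections import defaultdict

class IncrementalView:
    """Toy view index: map each doc to (key, value) rows, reduce by summing per key."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}             # doc id -> rows that doc last emitted
        self.totals = defaultdict(int)    # key -> reduced (summed) value

    def update(self, doc):
        """Re-index one changed document without touching any other document."""
        for key, value in self.rows_by_doc.pop(doc["_id"], []):
            self.totals[key] -= value     # retract the doc's old contribution
        rows = list(self.map_fn(doc))
        for key, value in rows:
            self.totals[key] += value
        self.rows_by_doc[doc["_id"]] = rows

    def query(self, key):
        """Read the reduced value for a key straight from the index."""
        return self.totals[key]

def score_by_player(doc):
    # Map function: emit one (key, value) row per score document.
    if doc.get("type") == "score":
        yield doc["player"], doc["points"]

view = IncrementalView(score_by_player)
view.update({"_id": "s1", "type": "score", "player": "ann", "points": 10})
view.update({"_id": "s2", "type": "score", "player": "ann", "points": 5})
view.update({"_id": "s1", "type": "score", "player": "ann", "points": 20})  # s1 changed
print(view.query("ann"))
```

Only the rows belonging to the changed document are recomputed, which is what makes querying by something other than the original key cheap after the first build.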

Starting point is 00:11:23 You stick stuff in, you can get it out by the same key you put it in, but as soon as you want to get more complex than that, then you're either having to do a bunch of pointer following in your application or you're having to write some custom layer that interacts with memcached. So this will give people a straightforward ability to get real-time queries on top of their memcached clusters or Membase clusters. And that alone

Starting point is 00:11:50 is enough to be pretty exciting. But that's really just the first step. You know, once you get out of the path of choosing a NoSQL option, and there's a whole lot of options out there, and we've covered a lot of them on the show, I think where Couch in the past has shined has been in the replication area,

Starting point is 00:12:05 but you've got another kind of ace in the hole, as it were, coming up. Talk about your mobile. Yeah. Well, so yeah, so mobile has been a focus of CouchOne before the merger. And even though we've got more going on after the merger, I think that we're getting more momentum on mobile, if only because I'm not dealing with HR and fundraising. I'm working on mobile. And for the most part, that's just been coordinating a lot of the code that we had around and starting some QA and some release process on it and documenting it and getting it out to the community.

Starting point is 00:12:43 So I'm really lucky to our engineers for having already kicked a bunch of ass on getting stuff to run on iOS. It's not real simple. We had to do a lot of low-level stuff to the Erlang VM, to CouchDB itself. But the upshot is now we've got a CouchDB instance that runs on your device. We were surprised to find that it just has almost no impact on battery life. So that was real lucky.

Starting point is 00:13:15 I mean, it's not surprising when you understand Erlang and how Erlang is good at being idle. But still, we were expecting to have to invest a lot there. Instead, we've still got to tackle the overall download size. So right now, it adds about 15 megabytes to your application, which we see being fine for enterprise applications and kind of more serious stuff. But if somebody just wants a little bit of synchronization, that's big enough to make them think twice. So our first goal is to shrink that. So I guess on iOS, which every app's in its own little

Starting point is 00:13:46 sandbox, that's pretty much additive to every app that you create. You can't install the Couchbase framework once and then share that, right? Right. Yeah. And that's down to the Apple restrictions, which I think make perfect sense. I mean, they're sandboxing these apps. They don't want, you know, some, they don't want DLL hell. They don't want your underlying libraries swapping versions out from underneath your application. It's our job to get that thing small enough to not have that negative impact. I think 5 megabytes is a threshold where we can start to feel pretty strong about it. Why iOS first? Was it an install-base decision or lower barrier to entry as far as a technical problem?

Starting point is 00:14:24 Well, we've been running on Android for, I don't know, about nine months now. And the response has been really strong. We've got a couple of case studies in the pipeline of people who are using Android on the device, are using CouchDB on their Android device. But it was actually kind of scary because Android affords you so much freedom. You know, that whole, that question you're asking about, is it in each app or is it once on the device? On Android, you can have CouchDB be a library. And so there's this whole line of development going down about how to manage a centralized database that multiple applications are talking to.

Starting point is 00:15:00 It's really powerful and interesting and threatened to pull us into the rabbit hole. With iOS, with the different restrictions, Paul Graham said this once. He said, when you're a startup, run up the steepest hill you can find. Just do the hardest thing that you can see because probably most people aren't looking with the same amount of detail as what you're looking at. So if you see something really challenging and you can nail it first, then that gives you a really strong position. So we think that a win on iOS is going to be easier to translate to other platforms and vice versa.

Starting point is 00:15:38 So one of the attractive features of Couch in the past has been these Couch apps, right? And it's more of a move back to client server where you've got your views and presentation logic actually running in your database, so to speak. Is that the same sort of pattern that you follow with a mobile application? You know, sort of. In the sense that Couch apps are the least amount of, you know, stuff you've got to do aside from the thing that runs in your browser that you're already good at. I mean, imagining, you know, like a jQuery developer, and they want, they're used to having to deal with the Sinatra guy or the Node.js guy to do the server component

Starting point is 00:16:17 and provide them a JSON API for the front end. You know, CouchApp is just simplifying that stack so that the jQuery developer doesn't need to deal with the middle tier anymore. But if we were going to take that same philosophy and apply it to iOS, then really the right approach is to be as transparent to your traditional iOS developer as possible, be the least amount of additional stuff they have to learn. So out of the gate, it's just CouchDB with the HTTP JSON API, but we see some APIs that Apple's got that should allow us to have a pure Core Data interface. So ideally, existing apps that already use Core Data, you just plug our library in and

Starting point is 00:16:58 get the synchronization for free. That's the goal, and hopefully we can get there. But even if there's roadblocks, we've still got something I think is pretty valuable. So on your product page, you've got one of these nice manager-friendly diagrams that just has the word CouchSync between Couchbase and the mobile app. Is that replication, or how's that working? Yeah, that's replication. And so, you know, plain old replication is just so easy to do in Couch. You post some JSON via HTTP at the server and tell it the remote server that you want it synchronized with, and it does the rest of the work and does it as bandwidth-efficient as possible.
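The "post some JSON via HTTP" step can be sketched as follows. CouchDB exposes a `/_replicate` endpoint that accepts a JSON body with `source` and `target` (and optionally `continuous`); everything else here is an assumption for illustration: the helper name `build_replication_request`, the database name `notes`, and both server URLs are hypothetical. The sketch only builds the request and does not send it, so it runs without a server.

```python
import json
import urllib.request

def build_replication_request(local_server, source, target, continuous=False):
    """Build (but do not send) a POST to the local server's /_replicate endpoint."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True   # ask the server to keep the copies in sync
    return urllib.request.Request(
        url=local_server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_replication_request(
    "http://localhost:5984",                  # hypothetical local CouchDB
    source="notes",                           # hypothetical local database
    target="http://example.com:5984/notes",   # hypothetical remote database
    continuous=True,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req)
```

Swapping `source` and `target` describes replication in the other direction, so two such requests give two-way synchronization.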

Starting point is 00:17:36 And you can even tell it to continuously keep up to date, which that even turns out to be a good fit for mobile networks. That long poll or continuous changes feed connection is actually, I thought it was going to run counter to the way cell networks work, but they're already optimized for these kind of long-running, mostly quiet connections. So that was nice. So basic replication is a really good fit for mobile, but there are some patterns that we want to embody in CouchSync

Starting point is 00:18:07 to make things easier. So for instance, on the Membase side, one of the big users is Zynga. So you can imagine all the data in FarmVille, and right now it's in a big cluster. But if you want to take FarmVille and make it offline capable, then you'd need to have the ability

Starting point is 00:18:23 to get the data for, you know, a single given user and put it in its own little database, you know, essentially so that the user can then replicate that back and forth for their backup slash offline. So, you know, tools to make that stuff super easy. That's what's out on the horizon for us. Let's talk about Couch apps for a moment. Did you coin this term? I guess so. I mean, it's kind of obvious. Pretty much every Couch term in the universe is taken at this point. But yeah, the couchapp script, that's like sort of a developer toolkit that you can find linked from couchapp.org, implemented in Python. Now, I originally wrote something in Ruby that did essentially the same thing

Starting point is 00:19:10 and just didn't have time to do the maintenance burden, so I handed it off to Benoit and Jan, both CouchDB committers, and they worked on it some in Ruby and then decided to port it to Python because that's where Benoit is. That's where he feels most comfortable. So now we've got this Python thing with all these practically enterprise-y features. You can write eggs to plug into it. I don't use any of that stuff, but it's good that it's there when you need it. So that's a developer tool chain, but it's different from the idea.

Starting point is 00:19:54 The idea of a Couch App is just an app that is served out of CouchDB and to whatever the native client you have around is. The most popular native client in the world right now, of course, is the HTML browser. But on iOS, if it's, you know, just Objective-C and CouchDB, I'll call that a couch app. So yeah, I think that the real fundamental idea is that if you are allowing your users to take a database offline onto their device, you've kind of got to understand the security model of the fact that they've got a copy of all the data. And so the place where you apply your security policy is going to be on that inbound replication stream. It's not going to be by writing some middleware, Rails app or something that sits there and validates everything as it's going through. One of the things that I noticed when I got into development was that no matter how good you were on the front end, unless you were an uber front end ninja, to use the term, you pretty much had to deal with a server implementation of some sort. And we were all kind of in tribes based on whichever server platform you chose, because you really couldn't afford to pick up more than one because it was such an overhead of knowing more than one platform.

Starting point is 00:21:07 But as apps like CouchDB and Node.js have taken off, it seems like we've kind of got this JavaScript layer that all of us were familiar with. As we start to do more with it, we're starting to kind of bleed or blur those lines in between our tribes. Have you noticed anything like that? Well, absolutely. I mean, especially, you know, talking about the cutting edge developers who have the choice to use the tools they want, you know, JavaScript seems to be really taking off. And I think that's the reason, is, you know, why switch all those contexts when JavaScript has, you know, most of the runtime benefits that the other languages can give you. But on the other hand, you do have a bunch of developers

Starting point is 00:21:47 in the enterprise world who don't get to pick what they use. However, that's even changing. I mean, JavaScript in the browser has been common there for a long time. So maybe we can leapfrog. People can move straight from their VB.NET backends to Couch apps. And we've heard stories of, you know, large internal, you know, customer management systems and stuff being moved over to Couch and getting, you know, much better. Basically, less code means less to maintain.

Starting point is 00:22:19 And also, a lot of these guys have been seeing better performance just because, you know, you don't have a Java stack trace 50 frames deep or whatever. One of the things that intrigues me about Couch is not only does it collapse a lot of the middle layers, which seem to be superfluous for a lot of the smaller-end apps, but also it's built-in versioning for everything. Not just your data, but also your GUI. Yeah, I mean, it's got, so it's important to distinguish CouchDB's, you know, the built-in version, as it were, is multi-version concurrency control. So what that means is if, you know, we're both working against the same cloud server, and you load the document, and I load the document, and then you make a change and save it. When I try to save it, Couch is going to reject my save as being out of date. And that's just to prevent race conditions.
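The rejected save described here can be sketched as an optimistic concurrency check: every save names the revision it was based on, and the store refuses it if that revision is no longer current. The following is a toy in-memory model in Python, not CouchDB's implementation. CouchDB itself tracks a `_rev` field on each document and answers a stale update with an HTTP 409 Conflict; the class names `ToyStore` and `Conflict` and the integer revisions are assumptions for illustration.

```python
class Conflict(Exception):
    """Raised when a save is based on an out-of-date revision."""

class ToyStore:
    def __init__(self):
        self.docs = {}   # doc id -> (revision number, body)

    def load(self, doc_id):
        rev, body = self.docs[doc_id]
        return rev, dict(body)          # hand back a private copy

    def save(self, doc_id, body, based_on_rev=None):
        current_rev = self.docs[doc_id][0] if doc_id in self.docs else None
        if based_on_rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {based_on_rev}")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, dict(body))
        return new_rev

store = ToyStore()
store.save("post", {"title": "draft"})
your_rev, your_copy = store.load("post")
my_rev, my_copy = store.load("post")

your_copy["title"] = "your edit"
store.save("post", your_copy, based_on_rev=your_rev)   # succeeds, rev becomes 2

my_copy["title"] = "my edit"
try:
    store.save("post", my_copy, based_on_rev=my_rev)   # stale: based on rev 1
    rejected = False
except Conflict:
    rejected = True
print(rejected)
```

The losing writer is expected to reload the document, reapply the change, and save again, which is how the race is resolved without locks.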

Starting point is 00:23:13 But it also means that readers can always proceed against a view query or against scanning the documents in a database without being blocked by writers. Everyone has their own independent snapshot of the database. So that all goes really deep into the technical design of Couch when you start to look at it. But the thing to be clear on is that by default, those old versions do not get replicated around. So when you synchronize, it just sends, you know, the current version. When you compact, which, you know, if you're not your own DBA, your DBA may compact when you least expect it to clean up wasted space. That'll also clean up the outdated versions. That's not to say you can't do versioning in Couch. There's lots of applications that either have an entity document

Starting point is 00:23:55 and then log additional documents that refer to that entity. So you can do patterns like that, or you can do patterns like actually keeping the full history as binary attachments on the old history. So there's a lot of patterns there to do, and if you Google CouchDB simple document versioning,

Starting point is 00:24:17 I wrote a blog post about this a few months ago, it'll come up, and it'll kind of go through the pros and cons of all the patterns. In an effort to keep it real, what sort of applications are not suited for CouchDB? You know, that's a good question. I think that, you know, a worst-case scenario for what, you know, how much storage and resources you're using up compared to, you know, the alternative, like a real-time message queue where you don't care about archiving it.

Starting point is 00:24:49 So something where you've got something that's fairly reliable but in memory. So if you were going to do that workload in CouchDB, you'd have all the message history for that application stored on disk. On the other hand, most real-time messaging applications do have some sort of need to archive and query the messages. I mean, maybe not most, but a fair proportion of them. So I've seen Couch used for spam filters.

Starting point is 00:25:18 I've seen Couch used for chat rooms. And it makes a good fit for that sort of stuff. The other, ephemeral data, so if you were just doing like a Digg-style upvote counter on a post, maybe something else would be a better fit, although we're addressing that. I think there is some truth to be said that right now the different NoSQLs

Starting point is 00:25:40 have all been kind of finding their niche and getting entrenched there. But really, everyone's going after some form of 80% solution. So people are going to be adding each other's feature sets to the extent that it makes sense, technically. What was involved with getting the Erlang runtime on iOS? Did you guys have to deal with that? Our engineer, Aaron Miller, gets most of the credit for that. He went through the Erlang VM. Erlang is

Starting point is 00:26:11 implemented in C and it uses dynamic linking for a whole lot of... It's basically built out of its own plug-in system at some level. He went through and turned all that dynamic linking into static linking, which was just like touching a bunch of code and having to know what to do. And then there was a bunch of other strange little gotchas that you wouldn't expect. But for instance, Erlang uses the syscall fork to create a subprocess to handle DNS lookups. And that's just not going to fly on iOS.

Starting point is 00:26:50 You can't do fork. So we had to do little subtle changes like that. We also had to get SpiderMonkey running on the device so we have JavaScript running in a background thread because the built-in JavaScript on iOS, at least to my knowledge, always blocks the main UI thread when it's running. So you can't have the UI locked up just because a MapReduce is generating. So we included that SpiderMonkey in there,

Starting point is 00:27:16 which I think also had to have some technical changes. But mostly it was just a matter of getting the build cleaned up and then going through and conforming to, you know, sort of Apple's view of the world. Was SpiderMonkey a holdover from a previous design decision, or any, uh, consideration for V8? Yeah, so we've done the SpiderMonkey/V8 shootout, and SpiderMonkey wins. Um, and the reason why is because V8 is optimized for process launch time. You open a new tab, it needs to be responsive right away.

Starting point is 00:27:50 SpiderMonkey has the JIT compiler, which as it's running, especially with these map functions where you define the function once and then run 100,000 documents through it, the JIT will get it up to faster than C in some places. Coupled with that, SpiderMonkey seems to use a little less memory than V8.

Starting point is 00:28:10 The startup time being not that important to us, we find that SpiderMonkey is better for at least on a big server install of Couch, you're going to get better throughput. That being said, on iOS, if we could somehow use the built-in Nitro or whatever, I mean, the number one constraint there is I'd rather not have to download all of SpiderMonkey to the device, even if it's a little slower. So we're working on figuring out solutions there. So CouchDB is part of the Apache Foundation lineup. What is the licensing rundown on everything Couchbase these days? So Couchbase right now has Membase, which is, I think, Apache licensed. And then Couchbase, which is our build of CouchDB that includes GeoCouch and some other little features and QA and stuff. And that's Apache license as well.

Starting point is 00:29:06 As far as what the license is going to be on stuff way down in the future, we're still figuring that out. But the main consideration for me right now is I want to make sure that we're contributing to the Apache CouchDB community, not just code, but that Apache CouchDB is where the Erlang work that's appropriate,

Starting point is 00:29:31 where that ends up. We could have easily come out the gate and said, okay, we're just going to fork CouchDB and try and build up a community around that fork, but I would much rather stay in the Apache CouchDB community. So on your comparison page, you compare yourself, Couchbase, versus Cassandra and MongoDB. So we've had Riak on the show twice. Any other NoSQL options out there that you could draw a distinction to?

Starting point is 00:30:00 You know, I think that it's real important that people understand that CouchDB's MapReduce is really different from all the others, and especially Hadoop. So Hadoop is, as far as I'm concerned, the big winner right now for, you know, especially in the enterprise people, you know, doing something other than just using Oracle. And so, you know, CouchDB MapReduce is incremental. And what that means is that if you, you know, have 10 million documents in a database and you define a view, then it takes some time to build that view the first time. But queries against that index are almost instantaneous. And then on top of that, CouchDB automatically keeps the index up to date as efficiently as possible just by recomputing based on changes. Whereas Hadoop-style MapReduce, which is what you'll find in the other products forFS and then define your query and run it on it and take the results of that query and maybe put them back into a database for real-time viewing. So if you change 20% of those inputs, then it's usually better in the Hadoop context to just rebuild the whole thing, which is fine. I mean, Hadoop obviously seems very popular, but it's different from the kinds of MapReduce that would be useful to a company like Zynga

Starting point is 00:31:30 wanting to support FarmVille and having real-time results available as they stream in. So in the mobile context, you mentioned long-running connections. What's available with CouchDB on the desktop or the server? Sure. So we have a Couchbase desktop for OSX that is a rev of CouchDBX, a project that Jan had been working on for a long time. It's finally cleaned out some of the annoyances and stuff and really stripped it down to just being an icon in your menu bar that has a Couchbase server running there,

Starting point is 00:32:06 and you can pop it open on port 5984 and create documents and play around in Futon. So I think that's important for supporting developers. On the server, we also have a Couchbase server build for Linux and Windows, and we see actually starting to get some interest from the Windows side of the world. But in the long run, everyone's asking us, what about scale up?

Starting point is 00:32:35 What about scale out? Because currently Apache CouchDB is designed for a single node. The API is designed to scale up, but the actual implementation doesn't contain that. So that's what we're going for. I mean, you know, that's the point of this merger is that when we've got our combined product, it's going to be the big, fast CouchDB that everyone always wished for. So what becomes of Couch.io? That's just an old domain name that I've still got laying around.

Starting point is 00:33:16 So, yeah, so we've got, you know, the history of the company was we founded it as the, you know, the business entity being Relaxed Incorporated, which is kind of like GitHub's Logical Awesome. And then, yeah, we had this couch.io domain name, which was cute, but it had usability issues, and that, you know, just became obvious the more people that we talked to about it. So that's why we switched to CouchOne. And, you know, finally with the merger, Couchbase was kind of obvious coming out of CouchOne and Membase. And my cabbie in Austin last weekend, you know, could understand what I was saying right away. I said Couchbase and, you know, it wasn't like "Couch what?", which happens when you say CouchDB. I was pretty happy about that. We're not allowed to entertain the idea of changing the company's name ever again.

Starting point is 00:33:57 What about couch in the cloud? The couch hosting that we have is expanding. We've got, well, we've just recently been going through some upgrade pains, you know, as everything does. But we've moved everyone's data on to EBS. So we're getting faster latency and, you know, better throughput on those boxes. Jason is, Jason Smith is our guy in Thailand who handles most of the hosting. And he's also working on, you know, rolling out the paid options for hosting. So it's really

Starting point is 00:34:34 going to be, you know, catering to professional users who are, you know, either storing mission critical data in there or want to use it as a development point in the cloud. So there's other services out there, Cloudant being one. Are you guys supporters of that as far as paid commercial support, or do you see them as a competitor long-term? Well, long-term, what we see is the more CouchDB companies, the better. And so we love it that Cloudant's there. There's another company, I think they're still Stealth,

Starting point is 00:35:08 but they're actually working on a CouchApp marketplace. So there's a fair amount of action going on in the CouchDB ecosystem and we think the more diversity the better. Cloudant has BigCouch, which is sort of the, you know, it's a CouchDB that scales out, and it's all written in Erlang and is fairly performant and high throughput, and we think that's great to have out there, have people using it. It's a little, or at least their business model, excuse me, Cloudant's business model is a little more focused on, you know, kind of these real-time search workloads.

Starting point is 00:35:46 So they've got a lot of customers who are consuming Twitter firehose or other feeds like that and doing semantic analysis and stuff on top of that data. We're a lot more interested in the real-time, somebody clicked to buy a cabbage and now they have a cabbage, those kind of queries. So, you know, we think that there's room easily for Cloudant and Couchbase and hopefully a whole slew of other companies to come along. So for the developer that's not doing just front-end, back-end, direct JavaScript to Couch type application architecture. Where are you seeing the growth and adoption? In Python, Ruby, what sorts of communities are embracing Couch?

Starting point is 00:36:35 So we're going right now to focus on PHP first because the runtime already makes a lot of sense with Couch's ability to crash and recover quickly. The PHP runtime, every single request is isolated. So if what you need to do is turn some JSON into some HTML, you could do worse than to turn to PHP. But on the other hand, there's some work that needs to be done there to make the clients really smart and strong. So we're plowing energy into the PHP drivers, also into Ruby and Python and Java and .NET. So it's actually Jan who's heading up the effort to put our SDKs together for the various platforms and picking which ones to do first.
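The "turn some JSON into some HTML" step is small enough to sketch. The conversation is about PHP; the version below uses Python purely for illustration, and the function name `render_document` and the sample fields are hypothetical. Like a PHP request, it keeps no state between calls: one JSON document comes in, one HTML fragment goes out.

```python
import html
import json


def render_document(raw_json: str) -> str:
    """Render a flat JSON object as an HTML list, escaping every value."""
    doc = json.loads(raw_json)
    items = "".join(
        f"<li>{html.escape(str(key))}: {html.escape(str(value))}</li>"
        for key, value in sorted(doc.items())
    )
    return f"<ul>{items}</ul>"


# Hypothetical document, shaped like something a database might return.
page = render_document('{"item": "cabbage", "owner": "<wynn>"}')
```

Escaping each value with `html.escape` matters here because document contents are user data and should not be trusted as markup.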

Starting point is 00:37:20 And maybe we're picking to start with PHP because Jan's an old-time PHP guy. Don't tell anyone, but he's got a php.net email address. Nice. So let's switch gears for a moment. When you're not hacking on Couch or Couch apps, what's really got you excited in the world of open source? Oh, gosh. That's a good question. I've been so heads down. First of all on the merger, and now finally getting back to writing code. I think that the mobile stuff, iOS and Android, is still going to surprise us. People are making fun of that Color funding. They raised like $40-something million, which is maybe more money than seems reasonable, but their app seems kind of cool. I don't know. Maybe, I don't know about the financial side of it, but I think that this kind of finding people who are near you in real time stuff hasn't even started to change the world yet compared to how it's going to. I'd like to see the new Couchbase Mobile be a module for Appcelerator Titanium or Titanium Mobile so that you've got a CouchDB option on both iOS and Android one day. Yeah, so we know of at least a few

Starting point is 00:38:52 apps out there that are using Titanium and CouchDB together. I'm not sure if the code is open source or, you know, clean enough to turn into a module, but people are doing it, so it seems like it's a good fit. Yeah, and I'm doing what I can, meeting with all these various HTML5 kind of UI and widget component companies and jQuery Mobile and delving into all that. If people are out there and are kind of interested in the intersection between front-end and mobile, there's a seven-part series by this guy Todd Anderson

Starting point is 00:39:28 on jQuery Mobile and CouchDB. And if you go through that seven-part series, you'll come out the other end of it probably better at that stuff than I am right now. I mean, it's just got everything you need to know. So a week from now, you could be an expert iOS CouchDB developer or HTML5 mobile CouchDB developer. Todd Anderson, no relation? Nope, no relation. But well, not that we know of

Starting point is 00:39:54 yet. So one last question. Who's your programming hero? Oh, you know, that's kind of easy because I get to hang out with him on a fair basis. At the risk of being a fanboy, Damien's pretty awesome. I mean, as far as knowing what not to do, he's always coming to me saying like, Chris, are you sure you want to write that code? You know, if you write that code, someone's going to have to maintain it. And that's like having somebody be that conscience to not always add features is really cool. And then being able to see how stuff at the low level affects stuff at the high level.

Starting point is 00:40:34 One story that he tells about Erlang that is really true, I saw some performance benchmarks of some, what was it, an image converter that someone had written in Erlang, which seems unlikely to be fast, but it was. So each Erlang process, which is, you know, an Erlang process is kind of similar to a Java object. You can create 100,000 of them in a second, and you can, you know, and they're all running concurrently scheduled by the scheduler. So each one of those has its own isolated stack and its own isolated heap. And that means that when one gets swapped onto a core, the whole thing gets swapped onto the core. And maybe it doesn't all fit right there on the L1 cache, but over the cache hierarchy, the active memory is all just localized

Starting point is 00:41:21 as opposed to threaded concurrent code, which has to jump randomly across memory access all the time potentially. So you've got these little processes that get swapped in. They burn through their workload, and then they get swapped out for another one. And then on top of that, since they're isolated, they can be garbage collected independently. And that means you don't have any stop-the-world pauses when the garbage collector is running. And if a process is done, you can just throw it out. You

Starting point is 00:41:48 don't even have to crawl its heap. So those things combined together, you know, this is the sort of stuff that Damien explains to me and then I get all excited about. But I've seen Erlang apps where, you know, you dial up the benchmark on it after it's, you know, after it's sort of prototyped and you look at it and you go, this isn't going to work. We're like two orders of magnitude outside of spec here. But that's only with 50% load applied. So then you do a couple of optimizations and you're getting better but it still doesn't look great. And then you go crank the load up all the way. Actually get more than one box that's not the server box to apply load with instead of

Starting point is 00:42:29 just, you know, ab running on one box or something, really saturate it. And the thread or the process scheduler can do these optimizations it can't do when it's less busy. And so you end up getting kind of this better than linear wringing out, you know, the last bits of performance from the box, and just any kind of language that can do that is, I don't know, awesome. And it's even more fun when Damien comes along and tells you why that happened. You know, one of the things that struck me about Damien when I first discovered CouchDB was just the story behind the project and how he basically punted on his corporate career that was just not satisfying him to follow an open source project, which he didn't even know what it was at the time. Yeah. I mean, if people want to see that story, the best resource is, he did a... or InfoQ has the video posted from the talk he did at RubyFringe back in, I think, 2009.

Starting point is 00:43:27 Maybe it was 2008. But yeah, back at the RubyFringe conference in Toronto. Yeah, he got a standing ovation for that talk. And I think he tears up in the middle. So it's worth watching. Definitely put that in the show notes. One last question for you as a bonus. So I had the opportunity to be on the NoSQL Smackdown with your buddy Jan. I think

Starting point is 00:43:45 you made an appearance in that one as well. What's it like working with Jan? Is he half as passionate in his day-to-day job as he was on that panel? Oh yeah, he definitely is. He's the guy who, you know, there will be a meeting and, you know, someone will say something and I'll be like, I don't know about that, but, you know, not enough to actually speak up because I've got whatever else on my mind. And Jan'll just jump right in and get to the bottom of whatever the issue is.

Starting point is 00:44:14 And so, you know, it takes you a minute to get used to that, but then you start to thank him for it. That's, you know, it's important to have people who are really looking out for, you know, especially looking out for end users and developers and making sure that it takes the least amount of clicks to get to the download and all that.

Starting point is 00:44:34 Well, Chris, certainly appreciate the time and taking the time out of a busy schedule after the merger here to tell us about the new lineup and where you're headed. Yeah, thanks, Wynn. Glad to be here. And anybody who's getting started with Couch and gets stuck or whatever, has questions, the community really loves helping new people. So even if you just tweet about your,

Starting point is 00:44:58 oh, I wrote this MapReduce at CouchDB, you'll probably get some helpful replies. Cool, thanks again

Wynn sat down with Chris Anderson from CouchBase to talk about CouchDB, the merger with Membase, Erlang, and bringing NoSQL to PHPers....
