The Changelog: Software Development, Open Source - Riak, the New Erlang-based NoSQL Store (Interview)

Starting point is 00:00:00 Welcome to the ChangeLog, episode 0.1.4. I'm Adam Stacoviak. And I'm Wynn Netherland. This is the ChangeLog, bringing you what's fresh and new in the world of open source. We focus on the projects and people of open source. You can follow us at thechangelog.com or for a real-time view, tail.thechangelog.com. Yeah. You can also check us out on GitHub, too.

Starting point is 00:00:37 We hang out on the Explore page, so github.com forward slash explore. They've got some trending repos listed there as well as some featured repos from our blog, and also all the audio podcasts from 0.1 whatever, all the way up to this one here. So just go ahead and go to GitHub and check it out. 14 episodes we've been on the air. Can you believe it? That's kind of like a dynasty, isn't it? That's right. You know, like everybody else, it seems, we're on Twitter.

Starting point is 00:01:05 Changelogshow is our handle. And me personally, I tweet at pengwynn, P-E-N-G-W-Y-N-N. Yep, I'm adamstac pretty much everywhere. Twitter,

Starting point is 00:01:12 Facebook, FriendFeed, which was consumed by Facebook and everywhere else. We've got a great show today. We talked with Andy Gross from Basho, the company behind Riak,

Starting point is 00:01:22 another NoSQL database entry, and Sean Cribbs, a freelance Ruby developer that's written a cool new Ruby library for Riak that he scooped on the show when we recorded it. And we posted that to the changelog, and it went from, what's it at now? It was at like one watcher when we got the scoop, and I think it was up close to 100.

Starting point is 00:01:43 I don't believe it. I don't know. Like most things that get posted to the changelog. The changelog effect is what it is. Yeah. Ripple is the name of the library. It's now at 88 watchers. Oh, boy.

Starting point is 00:01:57 Pretty hot on the GitHub. The GitHub. Cool. So, Riak, another NoSQL database entry. Stop me if you've heard this one before. Erlang with JavaScript, JSON with a REST interface. Yeah, definitely. Sounds like. Sounds like all of them.

Starting point is 00:02:14 Sounds like CouchDB especially. But they tell us, with a masterless replication scheme where all you need to know is one node of the replication network, and you can get on and get data from any of the nodes, which is pretty cool. I was pondering this over the weekend, the irony that all of these NoSQL databases have JSON support in some regard, and yet in HTML5, what are we doing? We're putting SQL in the browser. Yeah, I know. Was that you that tweeted something like that? Yeah, I did. You know, what I really want is Mongo or Couch or some of these other NoSQL technologies

Starting point is 00:02:58 that have this rich JSON store that I can query built into my browser. Forget the SQL. You know, that's just ugly, writing SQL with escaped JavaScript. I really want a hash database where I could just stash my JSON objects and search them right there in the browser.

Starting point is 00:03:11 So go figure. We wait all this time to finally get a database into the browser, and it's nothing we want. That's right. That's right. So they also started this off by scratching their own itch, too. They actually started this off much like Mongo, where they were solving their own problem and created an enterprise version and then led the way with push-down to open source. You know, that happens a lot.

Starting point is 00:03:34 You know, with Wanstrath, when he was talking about GitHub and how GitHub was a side project and they were actually working on FamSpam full-time and then figured out that what was the side project turned out to be the bigger play. It seems like oftentimes you don't know what's going to be successful when you start it. We've got a great episode today. Should we get to it? Absolutely. All right. We're joined by Andy Gross from Basho, the company behind Riak, the cool new database document store, and Sean Cribbs, a freelance Ruby developer. Andy, why don't you introduce yourself, let the folks know a little bit about yourself and Basho. Hi, yeah, I'm Andy Gross. I'm the VP of Engineering at Basho Technologies. I've been working at Basho for about two years and it's the best gig I've ever had. We have an awesome team.

Starting point is 00:04:30 I work with people that I've worked with in the past and people we've assembled more recently. It's an awesome group that's growing by the day. We both produce the Riak open source project and offer support and

Starting point is 00:04:48 an enterprise version of the same for the market. So we're trying to just drive this NoSQL thing as hard as we can. For the folks that don't know, what about Riak and its entry into the NoSQL space? How did that come about and a little bit about the project before we introduce Sean? It's a really interesting story. We started off actually as a company that was writing web apps. And going into it, we said we were going to plan for success. And one thing that a number of us have been burned by at previous startups is tying ourselves

Starting point is 00:05:28 to a relational database and then having to change that architecture at the worst possible time, which is when you start to get popular. So one of the things we said going in, and it was awesome that we had buy-in from the other founders, too, was that we were going to write our own data persistence layer, which can be seen as relatively controversial. So initially, Riak was just an internal Basho product that we used to power our applications. And it wasn't until August of 2009 that we went open source. We said, you know, this NoSQL thing is really our core competency. We're a bunch of distributed systems guys. The apps worked out okay, but we thought NoSQL was a much bigger opportunity.

Starting point is 00:06:25 So we spent a couple months changing directions, getting Riak ready for open source. We released it in mid-August. I don't remember the exact date of the first release of Riak, but we're coming up on the six-month anniversary of its first open sourcing. So I think we've come amazingly far since then. But again, it was initially an internal tool, an internal data store that we implemented for the purposes of not having to be woken up at one in the morning. One anecdote that people like to tell at Basho is that one day we had two nodes die at about 11 in the evening. And the relevant people got on a call, and we knew the properties of our data store.

Starting point is 00:07:14 And we said, you know what? We can leave this till the morning. And we went to sleep and fixed it the next day, which was really – which really got our minds forward-looking towards, I think the future of this company is going to be in the NoSQL space. I've recently discovered it, and it's a very cool project. Before we get too deep into it, we should introduce Sean. And for the folks that are previous in the audience, I'm sure they know who you are. But, Sean, for the other folks out there, why don't you introduce yourself

Starting point is 00:07:44 and let us know how you came across Riak. Sure. Hi, I'm Sean Cribbs. I've been a freelance Ruby web developer since 2007. Before that, I worked for a small community college in Kansas City, also doing Ruby development there. And I got interested in Riak after I heard it go open source. One of my recent interests has been Erlang. And so I went to the Erlang Factory conference last year. I guess it was end of April. And I met Justin Sheehy for the first time. Justin is, I don't know if Andy mentioned, the CTO of Basho. And he was talking about Webmachine, which is another one of Basho's open source projects.

Starting point is 00:08:32 But it turned out that they released, as Andy said, Riak in August. I was like, hmm, so this is what they're using Webmachine for. And then I got the chance to go to NoSQL East and talk to the whole team a bit more about it. And I've been really interested in the NoSQL movement since I first heard about it because, as most Ruby web developers discover with ActiveRecord projects, there are just certain edge cases that make working with an SQL database really difficult: times when you want to have pieces of your data that are completely dependent on one another, and when your database gets really big. Those are the two things that

Starting point is 00:09:26 I find that people run into. And so, in addition to just being kind of tangentially interested in Riak for personal reasons, I started arranging a contract in December and got contracted in January to build a Ruby library so the client could move their Rails app off of MySQL onto Riak. That's interesting. The growing crowd of NoSQL databases is just getting bigger every day. So is this a fad? Is this a trend? Is this a replacement for traditional relational database architecture,

Starting point is 00:10:13 or is it just a complement to that architecture? I think currently it's going to have to be complementary for the time being. However, unlike other types of databases that have been fads or just ill-conceived, in my opinion, like XML databases, you've seen some of the seminal NoSQL projects arise out of necessity as internal projects that companies like, well, Basho,

Starting point is 00:10:43 but also Facebook and LinkedIn with Cassandra and Voldemort, respectively. And that indicates to me that it's solving a real problem that people have. I don't think anyone ever asked for an XML database. But when you see companies actually implementing their own data stores that are so fundamentally different, but so relevant to the problems that modern web apps face at scale, that's indicative of something that's not a fad and actually a market need. So a question from Twitter already. Jake Don asks, what makes Riak different from the other players in the space?

Starting point is 00:11:30 Yeah, I'll try to condense it. Riak is fundamentally distributed from the start. Our philosophy on how we proceeded in developing it was to get the distributed systems fundamentals down early. I've seen in other projects, you know, shortcuts with regard to proper distributed systems theory taken early on, and then people realize there's a need to, say, implement vector clocks, which are, you know, logical, non-physical timestamps that eliminate the need for all your servers to be perfectly time synced. And it's really hard to retrofit that stuff onto a data store where you've made compromises with regard to those things earlier.
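Editor's note: the vector clocks Andy mentions can be illustrated with a small sketch. This is a generic textbook version in Python, not Riak's implementation; the function names (`increment`, `descends`, `compare`, `merge`) and the dictionary representation are assumptions made for illustration. Each clock maps an actor to a counter, so two versions can be ordered, or recognized as concurrent, without comparing wall-clock times.

```python
def increment(clock, actor):
    """Return a copy of `clock` with `actor`'s counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two clocks."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a is newer"
    if b_covers_a:
        return "b is newer"
    # Neither side has seen all of the other's updates: a real conflict.
    return "concurrent"


def merge(a, b):
    """Combine two clocks by taking the highest counter seen for each actor."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


# Two clients update the same starting version independently.
base = increment({}, "client_x")
left = increment(base, "client_y")
right = increment(base, "client_z")
print(compare(left, base))
print(compare(left, right))
```

The point of the design is that `compare` never consults a physical timestamp, so servers with drifting clocks can still agree on which writes are ordered and which ones conflict.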

Starting point is 00:12:23 So Riak is fundamentally distributed. It works fine on a single node. It scales down excellently. When we were developing applications, every developer on their MacBook had the entire stack running, including four Riak nodes operating just as they would in production.

Starting point is 00:12:42 It also scales up to hundreds of nodes just as easily. So that's one of the primary key differentiators of Riak. We were very deliberate about getting the fundamentals right first. And more recently, now that we have that done, we've been tackling making it, you know, easy to use and not something that's perceived as a complicated, hard-to-use piece of software. And I think that was the right path to take because now I'm very confident in the core of Riak and we can start delivering features like things that Sean has been working on

Starting point is 00:13:34 and some of the more recent features in the latest releases of Riak like JavaScript-based MapReduce and some of the other things in the pipeline that I'm sure we'll get to later. Could you talk for a moment about the language breakdown? What's the core technology? I understand Erlang is at the center of this.

Starting point is 00:13:51 So what's the architecture? So Riak is mostly written in Erlang. I caught the Erlang bug while I was working at Apple, working on asynchronous, high-throughput, low-latency systems, trying to do that in Twisted Python and then in C++ and basically wanting to scream because of that. And Bob Ippolito, who I then went to work for at Mochi Media, had turned me on to Erlang, and it was just a dream come true for me to be able to work in that environment. It's extremely powerful, extremely proven, and has been used and proven to provide uptimes greater than any other measured language or product out there. So the core distributed system, the core Riak code is mostly Erlang. We do have extensions. The storage layer, we have

Starting point is 00:15:10 pluggable storage. Our preferred storage layer is Innostore, which is another Basho open source project that is an Erlang wrapper around Embedded InnoDB. And our JavaScript support is Mozilla's SpiderMonkey, which is written in C. And we use Erlang's foreign function interface capabilities to talk back and forth between those subsystems. But the core is in Erlang. So I guess you mentioned Erlang, JSON, REST.

Starting point is 00:15:50 Do you have people stop you right there? Do you have to draw some sort of distinction from Couch at that point? Yeah, I think Couch is a great project, and Couch really got out there early and really got people aware of NoSQL in general and just, you know, hey, there's a new way of doing this type of thing. I would say our primary difference with Couch is in what I was talking about

Starting point is 00:16:21 before, which is that Couch is really at its core a single-node system. They support replication, but Couch databases are single-node concepts. Riak,

Starting point is 00:16:39 even if you're running it on a single node, the lower-level abstractions are dealing with consistent hashing and virtual nodes. And when you add a second node, things don't change. You don't have to point your database at another node to replicate to. You just add a node and the data distributes itself in the background. Contrast this to sharding, which is, in my opinion, of all the things to bring forward from the relational era, one of the last things I'd choose to bring forward because sharding is fragile. When you're spreading your data around across many machines, you reduce the mean time between hardware failures. You're getting yourself into a trap where it's much more likely you're going to lose a bunch of data. But Riak, on the other hand, provides you with all the benefits of sharding without exposing either the operational pain of having to deal with a setup like that or its inherent fragility.

Starting point is 00:17:55 With Riak, you basically tell it, look, I want each piece of data to be replicated on this many nodes, and Riak takes care of it. When you add nodes, you add throughput and storage capacity in a roughly linear fashion and, just as importantly, without any additional operational pain like adding shards causes. So the concepts of master and slave really don't exist with Riak? No, not at all. It's fundamentally just distributed. No node is special in a Riak cluster. There's no central point of failure,

Starting point is 00:18:35 nor is there a central point of sort of need for operational attention. Every node is homogeneous. They're all the same. If they disappear, your app's not going to go down. If a shard disappears, whatever range of data that shard has is going to go with it, and you better have had a backup of it. With Riak, just the sort of core distributed functionality ensures that your data is going to remain available when you add or remove nodes from a cluster.

Starting point is 00:19:10 Yeah, that has interesting ramifications for peer-to-peer type applications. If you only need to know one node to connect to, it kind of changes things. Yeah, it's true. The calculation of where a piece of data should be written to or read from is a function that can be executed strictly based on local data to any individual node, and this is consistent hashing. This is the technology that Akamai really introduced into the web caching world, and we've applied it here to databases.
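Editor's note: a minimal Python sketch of the consistent hashing idea described here, with virtual nodes and a replica count. It is not Riak's actual ring code. The `Ring` class, the `preference_list` name, the use of SHA-1, and the default of 64 virtual nodes per node are all assumptions chosen for illustration. What it shows is that the owners of a bucket and key can be computed from the node list alone, with no central lookup.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a point on the ring (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=64):
        # Each physical node claims several points ("virtual nodes") on the ring.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [position for position, _ in self._points]
        self._node_count = len(set(nodes))

    def preference_list(self, bucket, key, n=3):
        """Return the first n distinct nodes clockwise from the key's position."""
        n = min(n, self._node_count)
        start = bisect.bisect_right(self._positions, ring_position(f"{bucket}/{key}"))
        owners = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = Ring(["node1", "node2", "node3", "node4"])
# Any process holding the same node list computes the same answer locally.
print(ring.preference_list("people", "sean"))
```

In this sketch, adding a node only takes over the keys whose nearest ring point now belongs to the new node, which is the property that lets data redistribute in the background instead of being resharded.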

Starting point is 00:19:42 Akamai never wanted to tackle the database problem, which is something that frustrated me while I worked there. But consistent hashing, the differences are subtle and there are nuances here. So sometimes I've explained consistent hashing as sort of dynamic optimal sharding or some combination of words like that. But what it really is, is handling the problem of ensuring that you have both replicas of data and spread of data, which is what sharding provides, at a much lower layer that isn't exposed to your application or to your operations personnel. You know, another powerful feature of Riak is the notion of link walking, and one of the better explanations I've seen is up on your blog, Sean, seancribbs.com, with two Bs. Why don't you give the folks an overview of link walking and why it's so powerful?

Starting point is 00:20:46 Right. So every object that you store in Riak has a bunch of metadata associated with it. And one of those pieces of metadata is the links. So there's an IETF working draft, I believe, or proposal about an HTTP header called Link. And basically, it kind of looks like what you might see on a Content-Type header or any of the other HTTP headers, except that it has a link to some other location, and then various attributes that are attached to it. So Riak lets you make one-way associations to other pieces of data. So let's say we're building like a social network,

Starting point is 00:21:35 a quintessential example, and my record in there has a link to Wynn and to Andy and to Adam. And then if I wanted to see who my friends are, say I tag that link with friend as the tag, I could just construct a URL and get at that URL, beginning with my user record and then the link spec, we call it, which is better described on my blog than I can do in person without a whiteboard.

Starting point is 00:22:08 But you would tell it, get my friends, and Riak will go out, find the user record, and then follow those links and return all of the people who are my friends, all the records like that. So that's just a fundamentally different way of handling, I guess, joins, what we call joins in the relational world, right? Right. Well, it's actually more useful to think of it like a graph database, which has nodes and edges. Or another analogy that I like to use is building data structures in C.
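Editor's note: the link walking Sean describes can be pictured with a toy in-memory model. This Python sketch does not use Riak's HTTP interface or its link spec syntax; the `Store`, `Record`, and `Link` names, the `walk` method, and the "people" bucket are hypothetical. It only shows the shape of the idea: links are one-way, tagged pointers stored with a record, and a walk starts at one record and follows the links carrying a given tag.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    bucket: str
    key: str
    tag: str


@dataclass
class Record:
    bucket: str
    key: str
    value: dict
    links: list = field(default_factory=list)


class Store:
    """Toy key/value store keyed by (bucket, key)."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        self._records[(record.bucket, record.key)] = record

    def get(self, bucket, key):
        return self._records.get((bucket, key))

    def walk(self, bucket, key, tag):
        """Start at one record and return the records its `tag` links point to."""
        start = self.get(bucket, key)
        if start is None:
            return []
        found = []
        for link in start.links:
            if link.tag == tag:
                target = self.get(link.bucket, link.key)
                if target is not None:
                    found.append(target)
        return found


store = Store()
for name in ("wynn", "andy", "adam"):
    store.put(Record("people", name, {"name": name}))
store.put(Record("people", "sean", {"name": "sean"}, links=[
    Link("people", "wynn", "friend"),
    Link("people", "andy", "friend"),
    Link("people", "adam", "friend"),
]))
print([friend.key for friend in store.walk("people", "sean", "friend")])
```

Because the links live on the starting record, following them is a direct lookup per link rather than a search, which is the pointer-like behavior the C analogy is getting at.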

Starting point is 00:22:44 With data structures in C, you build like a struct, which has some pointers, and those pointers point to other places in memory. This is a bit more analogous to the C way of building data structures in that it's just a pointer. And you can follow that pointer with very little cost. Another, I guess, fundamental difference between Riak and some of the other players is the way it handles MapReduce and the way that it expects a set of keys to be passed into the map function

Starting point is 00:23:15 before it's run. Talk a little bit about that, Andy, and why that architecture and what makes that different and powerful. Yeah, it makes it powerful, and it's very deliberate the way we chose that. Right now, we're not trying to compete with Hadoop in terms of being a MapReduce engine. MapReduce here, we're trying to expose as a query mechanism, basically, that you can have in the request loop of your application. And Sean really led into this with the fact that the way web applications are structured nowadays, you tend to start with a root object like a user and you can fan out from there to their friends, their blog posts, their comments, you know, whatever other sort of domain objects that you have.

Starting point is 00:24:08 And throughout the course of a web session, you'll know the keys ahead of time that you want to perform an operation on. And this is getting back a little bit to the consistent hashing stuff I was talking about before. Given a bucket and key in Riak, any node can determine what node that data lives on. So we can farm out and distribute the computation and basically move the computation to the data rather than having to move the data to where the function is executing. So it's quite efficient and therefore very suitable for an actual query mechanism as opposed to the Hadoop use case, which is typically offline log processing. I don't know of many apps with Hadoop somehow integrated into the actual real-time request cycle of their applications.
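Editor's note: a rough Python sketch of the key-driven MapReduce flow Andy outlines. It is not Riak's MapReduce API. The node names, the `owner`, `put`, and `map_reduce` helpers, and the modulo placement (a simple stand-in for consistent hashing) are assumptions for illustration. The caller supplies the bucket and key pairs up front, the inputs are grouped by the node that owns them, the map function runs against each node's local data, and only the small mapped results are brought together for the reduce step.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3"]  # hypothetical cluster members


def owner(bucket, key):
    """Compute the owning node from the bucket and key alone.

    Modulo placement is a stand-in here for real consistent hashing.
    """
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


def put(storage, bucket, key, value):
    """Store a value on whichever node owns its bucket and key."""
    storage[owner(bucket, key)][(bucket, key)] = value


def map_reduce(inputs, storage, map_fn, reduce_fn):
    """Run map_fn next to the data for each known key, then reduce the results."""
    by_node = defaultdict(list)
    for bucket, key in inputs:
        by_node[owner(bucket, key)].append((bucket, key))

    mapped = []
    for node, pairs in by_node.items():
        local_data = storage[node]  # each node only touches its own data
        for pair in pairs:
            if pair in local_data:
                mapped.extend(map_fn(local_data[pair]))
    return reduce_fn(mapped)


storage = {node: {} for node in NODES}
put(storage, "posts", "p1", {"comments": ["nice", "thanks"]})
put(storage, "posts", "p2", {"comments": ["first"]})

# Count comments across the posts whose keys we already know.
total = map_reduce(
    [("posts", "p1"), ("posts", "p2")],
    storage,
    map_fn=lambda post: [len(post["comments"])],
    reduce_fn=sum,
)
print(total)
```

Starting from known keys is what keeps this cheap enough for a request loop: there is no scan over the whole data set, only lookups on the nodes that hold the requested keys.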

Starting point is 00:25:11 I've been developing web applications, I guess, 10 years or so, maybe a little longer than that now. Something that strikes me, the queries are represented in JSON. Early on in web development, we had server technologies, and we found ourselves writing JavaScript to go against the DOM, but a lot of that JavaScript was dynamic and generated from server code. And so now we're passing queries to a lot of these data stores, Riak being one of them that uses JSON under the hood,

Starting point is 00:25:43 where we express things like MapReduce inside of JavaScript functions inside of a JSON object that's passed back. But, Sean, let me ask you, as a Rubyist, right now I'm sure that you're writing a lot of this JSON by hand as these libraries are just starting to evolve. But where do you see that going and are we going back to straddling two languages to do one task? Well, I think that most, at least for most Rubyists, if they're doing Rails or Merb or Sinatra apps, they're already familiar with JavaScript. And I think that there's a great respect from Rubyists toward JavaScript

Starting point is 00:26:23 and its capabilities. There is that cognitive disconnect, but I think that web developers are the type of people who work in many domains at once anyway. And having the MapReduce be in JavaScript, which is something familiar to most web developers, I think is more of an advantage than a drawback. I just wanted to add a little bit to that. And then this kind of, you know, struck me too. I

Starting point is 00:26:49 haven't been a web developer for most of my career. It was sort of, I was a web developer back in the days where JavaScript was sort of a hack to do browser detection and other things. And it's really turned into, you know, a nice sort of little language that is able to express these types of operations. And it also is – everybody kind of knows at least a little bit of JavaScript. So it neatly bypasses what can be a difficult choice of what dynamic language VM you choose to implement inside the core of your data store. You see Google App Engine, they're rolling out support for various languages. And I think they're leveraging the JVM and its support for compiling those languages down to Java to a large degree. But I think JavaScript is really a net win. It's easy to learn. It's simple. It doesn't have a lot of real rough

Starting point is 00:27:56 edges, especially when you're not talking about the DOM manipulation aspects of it. It's expressive, it's clear, it's concise, and I think it's absolutely the right choice for the Riak MapReduce feature that we're talking about. Well, Sean, you mentioned the cognitive disconnect there. I'll ask the question that Adam's dying to ask, and he wants to keep our streak alive about discussing Node.js on this show. So how does this play nice with Node.js, and would it be easier just to keep everything in JavaScript? Well, it would play absolutely nicely with Node.js.

Starting point is 00:28:36 In fact, if I remember right, there was a recent client for Riak written for Node.js using Node.js's built-in HTTP. And I think I saw that flying by in the GitHub feed the other day. But on the other hand, you know, CouchDB has had this concept of a couch app for a while, which is basically you just store a bunch of JavaScript and other files in CouchDB, and you can serve that out as an application because CouchDB has an HTTP server in it. Well, there's honestly no reason why you couldn't do that with Riak too. One of the advantages of the raw interface, which is what I've been writing my Ruby code against, is you can store any content type that you want in Riak.

Starting point is 00:29:28 So it basically acts like an HTTP server. And there's also already, in the client libraries that come in the main Riak distribution, a very basic sort of jQuery-ish client for Riak. Looks like we've got some questions rolling in on Twitter. Adam, you want to? The Twitter. The Twitter, as I call it.

Starting point is 00:29:51 Yeah, I've done this entire podcast without saying one word. I was trying to get to like minute 44, but we're like seven seconds away. Well, normally you jump in so late, I feel like we have to introduce you. This is my co-host, Adam. Yeah, hey, this is Adam. There's a few questions on Twitter, but I think I have a more pressing question.

Starting point is 00:30:08 And it's kind of funny we got so far into the podcast and really haven't talked about it. Sorry, Tweeps. We have this company called Basho that was formed. And you guys have this product, but it's open source. It's commercial. How did that story come about? When the company formed, how did you initially plan revenue and the formation, and was it all joined around this product?

Starting point is 00:30:30 Well, when the company formed, we were doing an entirely different business. We were actually writing applications that are relatively uninteresting in the context of this podcast, but we implemented Riak as a strictly internal project so we wouldn't have to deal with, you know, scaling issues later on. But it was always a dream of mine that one day we'd be able to, you know, release Riak as open source. And when NoSQL started to really gain steam, we sort of weighed our options and we ditched the app business

Starting point is 00:31:14 and we really just sort of leaned into it as far as the NoSQL stuff goes. And Basho, as a corporate entity, we are extremely devoted to open source. We have an enterprise product that provides things like wide area, multi-master replication, enhanced SNMP monitoring, web UI tools, things that are valuable to enterprises. But we really try to err on the side of putting as much stuff and as much value into the open source project as possible. Basho as a company, what I think is nice about Basho is that, as a customer of Basho, you get access to our enterprise features, you get input and voting rights

Starting point is 00:32:16 into, you know, our product roadmap. NoSQL being relatively new and no two deployments of NoSQL at scale being the same, you get basically direct access to the developer team in terms of getting your implementation right from the ground up. But primarily, we are an open source software company. We have an enterprise product, you know, obviously we have to make money, and we have very valuable enterprise features. But there are a few NoSQL or NoSQL-like databases that are closed source, and I just don't think that's viable. The way that Basho is going to succeed is by being a responsible and effective open source company and nurturing a community. And we've already reaped many rewards on that and, you know, couldn't be more thrilled with the attention that Riak has gotten

Starting point is 00:33:33 even only six months after its initial release. There's already a ton of community interest. I was going to ask you, what's your user base like? We have, you know? We have customers. I can talk about a couple of them, or I can talk about one of them at least. I'm just curious, numbers-wise, do you know what your community is like, actual usage, both enterprise and open source? Yeah, we have more enterprise customers than you could count on your hands.

Starting point is 00:34:10 I can say that. Yeah, and we... Just teasing you. And a really active and growing-every-day Riak users mailing list, and we try to be right out there on Twitter talking about stuff. We have a couple deployments that are a pretty big deal. Mochi Media uses us in a couple of really critical applications, and they get, you know, a ton of traffic. The one I'm most familiar with is all their session management is done through Riak,

Starting point is 00:34:54 and that's something that gets, you know, I don't know the exact numbers, so one I'm comfortable saying right now would be millions of hits a day, hundreds of requests a second. And that's on sort of the startup side. We've also gotten interest across the board. It's been surprising throughout this process how forward-looking some companies are that you'd think would be still stuck in the, you know, Java, Hibernate, Oracle realm of failure, and that are willing to actually embrace these technologies. So we're in some trials with some pretty big names that we hope to announce soon. But I really feel it's taking off, and this is not just a Basho thing.

Starting point is 00:35:52 I think this is great for the entire community. Uh, and it's, it's a young community. And I think that at this stage, you know, successes for any company are great for everyone. You know, successes customer-wise, successes funding-wise, I'm always happy to see any company in this area succeed because I think it's important, and I think it can really change the shape of how people build applications, whether it's web startups or big enterprises.

Starting point is 00:36:30 Right. So my question is, how does the product being open source allow for greater adoption of the enterprise version? Do they play together? How does the relationship between open source and enterprise play out? So, I mean, the enterprise version is basically open source Riak with some add-on applications like I talked about before. The way we've gotten a lot of the customers we have currently is that people download the open source version, do a shootout with us versus Couch or Mongo or Cassandra or whatever, and then approach

Starting point is 00:37:12 us saying, hey, we like you, let's try the enterprise bits now. I think without that open source component, we'd be at a real loss and a strategic disadvantage with respect to the whole market and the opportunities that are there. And we try to err very strongly on the side of if there's a question about whether a feature should be held back or open sourced, my argument is always let's open source it. So we spent a lot of work on – and this is largely Dave Smith's work. I would love to shout out every single Basho developer here because they're an amazing team, the best team I've worked with across all the companies I've been at. But we're growing so fast that I couldn't name them all probably right now. Um, but, uh, uh, Dave Smith, Dizzy Co on Twitter, uh, has done an amazing job of

Starting point is 00:38:15 taking embedded InnoDB, which, you know, has been proven. You know, if you think the LAMP stack has been successful, right, that's MySQL right there and, to a large part, InnoDB. So we've taken InnoDB and wrapped it up in Erlang, and we use that as, you know, pretty much our recommended store. There's a relatively simple API against which anybody can write a backend. People have written backends that store Riak data in Redis, which I think is really cool. I'm a big fan of Redis. You could store it in S3.
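To make the pluggable-backend idea concrete, here is a minimal sketch in Python of what a small storage contract could look like. Everything in it is an assumption for illustration: the real backend API is Erlang, and the names `StorageBackend`, `MemoryBackend`, `get`, `put`, `delete`, and `list_keys` are hypothetical, not taken from Riak.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple

# Assumed addressing scheme for this sketch: a (bucket, key) pair.
Key = Tuple[str, str]


class StorageBackend(ABC):
    """Hypothetical minimal contract a pluggable backend might satisfy."""

    @abstractmethod
    def get(self, key: Key) -> Optional[bytes]:
        """Return the stored bytes, or None if the key is absent."""

    @abstractmethod
    def put(self, key: Key, value: bytes) -> None:
        """Store or overwrite the value for the key."""

    @abstractmethod
    def delete(self, key: Key) -> None:
        """Remove the key if it exists."""

    @abstractmethod
    def list_keys(self) -> Iterator[Key]:
        """Iterate over every key currently stored."""


class MemoryBackend(StorageBackend):
    """In-memory implementation; a Redis or S3 version would fill in the same methods."""

    def __init__(self) -> None:
        self._data: Dict[Key, bytes] = {}

    def get(self, key: Key) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: Key, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: Key) -> None:
        self._data.pop(key, None)

    def list_keys(self) -> Iterator[Key]:
        # Copy the keys so callers can mutate the store while iterating.
        return iter(list(self._data))
```

The point of keeping the contract this small is that the distribution logic can stay in one place while each backend only answers "store this" and "fetch that".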

Starting point is 00:38:52 You have to implement fewer than five methods to be a fully functional Riak backend. So open source is, I can't overstate how central it is to both our, you know, vision and our success. Do you ever see, and we're stepping off the vein here, but do you ever see the enterprise version kind of going away and you guys just sort of take over support for large implementations and maybe consulting and training and stuff like that and go straight open source? Or is it always going to have this enterprise vein to the product? That's a good question.

Starting point is 00:39:38 You know, I'm not the biz dev guy. If you were to ask me my honest opinion on most of these questions, I'd default to open source. But we have to make money to keep putting great stuff into open source. The current plan is to continue with the enterprise features. However, and I sometimes refer to this as the Sleepycat model, Sleepycat being the ones who wrote Berkeley DB and were later acquired by Oracle, version one of an enterprise feature will be a holdback for a little while. And then when we write version two, we put version one out in open source. And if we can't come up with something more compelling in a year, then we've failed. So, you know, while we do have things that are held back, the goal is to gradually, you know, release those features back into the open source.

Starting point is 00:40:37 So, you know, we're committed to open source. Well, you have to make money. I mean, I know that you guys are committed to open source, but you do have to make some money. But we have a couple more questions from our Twitter audience, and the first question comes from bradford w, and he wants to know about search and how Riak is going to play in search. That's a great question. Search is a product that is in beta testing with a couple of customers; Collecta is the one that we've announced. When you see people talking about NoSQL, a phrase I've often seen thrown out is, oh, it would be awesome to throw a consistent hashing layer,

Starting point is 00:41:23 and I'm using finger quotes there, on top of this name-your-single-node-system-here, Tokyo Tyrant, Redis or whatever, and it will perform awesome. It's easy to underestimate how much work writing that consistent hashing layer is. And Riak, however, is basically

Starting point is 00:41:46 a consistent hashing layer. Since we have pluggable storage backends and a bunch of other pluggable hooks in terms of how data is partitioned, Riak Search is basically using Riak as a consistent hashing layer around what I can best describe as what Solr is on a single node. So the Riak Search product has the same properties as Riak in that you can add a node and the data, in the search case we're talking about search indices, gets spread across it. It has the same basic scaling properties as Riak, but provides a Solr-compatible interface to that data. So it's really just take Riak in its current use as a key-value store and

Starting point is 00:42:45 use that to solve a search problem. We're having great success with that. Riak Search is currently in limited beta. We're a growing company. It's in beta because we don't want to release it and not be

Starting point is 00:43:09 able to support people on it effectively. We are very conscious of what our capacity is in our support pipeline, but most of Riak Search will be in the open source version of Riak. The enterprise holdbacks for Riak Search are probably going to be focused around API compatibility with existing search products like Lucene and Solr and being able to import your Solr schemas into Riak. But that is another very exciting internal project that will come out soon. And I've seen people clamoring for it on Twitter, and it will be out, and it will be huge. Speaking of clamoring on Twitter, this is one of the most active TalkBack channels we've had on an episode. It must be the hour that we're recording it. It's a question for you, Sean.
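The "consistent hashing layer" Andy keeps coming back to can be illustrated with a short sketch. This is a generic textbook hash ring in Python, written under stated assumptions: the class name `HashRing`, the `vnodes_per_node` parameter, and the choice of SHA-1 over `node#i` labels are all illustrative and are not a description of how Riak itself partitions data.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes_per_node: int = 64) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name

    def add_node(self, node: str) -> None:
        # Each physical node claims several positions to even out the load.
        for i in range(self.vnodes_per_node):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for "add a node and the data gets spread across it" is that adding a node only reassigns the keys that now fall to the new node; every other key stays where it was. The hard parts Andy alludes to (replication, handoff, failure handling) are exactly what this sketch leaves out.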

Starting point is 00:44:07 You mentioned earlier the different content types in Riak. And the question from Alexander Sicular is, and he draws a comparison to MongoDB's BSON and its 4 megabyte max width: how does Riak handle binary, and are there any limitations on content size? Right now there's, and this is just from what I understand, a small limitation of the HTTP layer that's on top of the

Starting point is 00:44:35 datastore part of Riak, but there's a size limitation. However, there's no content type limitation. So Erlang has this concept of binaries, which are basically bit strings. And once it gets into Riak, it just says, okay, this is binary data. So I'm just going to ship it out into the cluster and replicate and do all those great things that Riak does.
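As a rough illustration of storing opaque binary data over HTTP with an explicit content type, the Python sketch below only builds the request and does not send it. The base URL `http://localhost:8098/riak`, the bucket and key names, and the helper `build_put` are assumptions made up for this example; check the documentation for the version you run before relying on any path.

```python
import urllib.parse
import urllib.request


def build_put(base_url: str, bucket: str, key: str,
              body: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT carrying raw bytes."""
    url = "{}/{}/{}".format(
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    # The server treats the body as opaque bytes; the Content-Type header
    # is what lets it hand the object back correctly later.
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder bytes standing in for a real image file.
png_bytes = b"\x89PNG\r\n\x1a\n"
req = build_put("http://localhost:8098/riak", "images", "logo.png",
                png_bytes, "image/png")
```

Passing `req` to `urllib.request.urlopen` would perform the upload against a running server; that step is left out so the sketch runs without one.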

Starting point is 00:45:06 So there's really no restriction. You just have to specify the content type when you do the PUT or POST request to Riak. So, I mean, there's no reason why you couldn't store an image or an audio file or a video and, you know, serve that out as part of your application or use it, you know, like a global file system. Yeah, it's really a minor limitation and just something we really have to get to. What I've actually been working on to address this is an abstraction between the web interface

Starting point is 00:45:41 and the backend storage layer that represents a stream of data such that we don't have to accept an entire body. Obviously, you're not going to fit a DVD in memory on your average computer, which is what a standard web server is going to want to do. It's not a huge amount of work to expose streaming storage abstractions such that you could

Starting point is 00:46:06 upload via HTTP and chunked encoding a very large binary and have us store it as an object. And it's something I expect we'll implement pretty soon now. Alexander also has a follow-up question for whomever wants to answer this one. Can you save MapReduce output somewhere, and can you update output with Delta from the last execution? I could take that one. You get the MapReduce output back as, you know, if you're doing it from the Erlang interface,

Starting point is 00:46:43 you get it back as Erlang terms. If you're doing it over HTTP, you get it back as JSON. And there's nothing stopping you from going and saving that back as another Riak object. And for any given MapReduce function, its input data, and that function's arguments, assuming the function doesn't change, we cache those results. So if you're dealing with stuff like time series data, you may run a relatively long-running query that aggregates, say, an hour's worth of data. But then if you want to do another incremental MapReduce over that for the next minute's data,

Starting point is 00:47:27 you're going to hit the cache for everything but that last minute. So we do, we do caching of, of MapReduce results, caching of MapReduce functions and have plans in the pipeline for making that cache even more intelligent and useful. Another thing that's related to this, and this is a common dig against NoSQL DBs, especially the key value-oriented ones, is you essentially need to know the key if you want to access the data. Now that we have JavaScript integration,

Starting point is 00:48:09 what we're likely to add in the very near future is something along the lines of, gasp, here's something from SQL land, a trigger or stored procedure that basically says, when you put an object into this bucket, run this JavaScript. And what that JavaScript function can do is essentially do like what CouchDB does with incrementally updating views. But it can also do, you know, making, you know, planning for success and having that be easy. People knock NoSQL key-value stores for lack of queryability, especially the distributed ones, because that becomes a harder problem. We definitely plan on adding some sort of secondary indexing capability through or implemented as some sort of post-commit hook that gets executed, a JavaScript function that gets executed every time you put an object into a bucket.
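The post-commit hook idea described above can be sketched in a few lines. This is a toy, in-memory Python illustration only: Andy is talking about JavaScript functions run by the database, whereas the `Bucket` class, `on_put`, and `make_field_index` below are hypothetical names invented for this example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set, Tuple

# A hook receives the key and the document that was just stored.
Hook = Callable[[str, dict], None]


class Bucket:
    """Toy bucket that runs registered hooks after every put."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, dict] = {}
        self._hooks: List[Hook] = []

    def on_put(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def put(self, key: str, doc: dict) -> None:
        self._objects[key] = doc
        # "Post-commit": hooks run only after the object has been stored.
        for hook in self._hooks:
            hook(key, doc)

    def get(self, key: str) -> dict:
        return self._objects[key]


def make_field_index(field: str) -> Tuple[DefaultDict[object, Set[str]], Hook]:
    """Return an index (field value -> keys) and the hook that maintains it."""
    index: DefaultDict[object, Set[str]] = defaultdict(set)

    def hook(key: str, doc: dict) -> None:
        # Remove any stale entry left by an earlier version of this object.
        for keys in index.values():
            keys.discard(key)
        if field in doc:
            index[doc[field]].add(key)

    return index, hook
```

Registering `make_field_index("city")` on a bucket keeps a city-to-keys lookup current on every write, which is the incrementally updated view Andy compares to CouchDB.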

Starting point is 00:49:20 So you can have views that index a JSON document. Obviously, you can get it by its key and its ID, but you can index it on a secondary property as well. I might also add that because you're not chained to the idea of an auto-incrementing ID, you're not chained to the idea of foreign keys, you have the freedom to pick useful keys, as well as, you know, if you really, really need that extra speed that Riak's not providing, you can create your own kind of pseudo index in another bucket. And this is one of the things I'm considering in my Ruby code too, is that, you know, well, if you want to find something

Starting point is 00:50:06 frequently by one aspect of that data, why not just store another object and link to the original from that object and use the key that has meaning. Yeah. And we've used that in our own applications, in Basho applications from our previous iteration and in toy applications that I've written, to great success. And on one hand, it might seem like, wow, that's a pain. You have to maintain your own indexes. But it also gives you a great deal of flexibility as well. So it's sort of two-sided. And while we do want to actually natively support retrieval based on a key other than an object's ID, there actually are some benefits and a lot of use cases where it makes sense to essentially build your own index custom suited for your application and do that work when you're storing the primary object, like you were saying, Sean.
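The do-it-yourself index Sean and Andy describe, where a second object is stored under a meaningful key and links back to the original, might look like the following Python sketch. The bucket names `users` and `users_by_email`, the `link` field, and both helper functions are assumptions made for illustration; plain dictionaries stand in for the database.

```python
from typing import Dict, Optional

# Stand-in for the database: bucket name -> key -> stored document.
Store = Dict[str, Dict[str, dict]]


def put_user(store: Store, user_id: str, user: dict) -> None:
    """Write the primary object, then the index object that links to it."""
    store.setdefault("users", {})[user_id] = user
    # The index object lives in its own bucket, keyed by the meaningful value.
    store.setdefault("users_by_email", {})[user["email"]] = {
        "link": ("users", user_id),
    }


def find_by_email(store: Store, email: str) -> Optional[dict]:
    """Look up the index object, then follow its link to the primary object."""
    entry = store.get("users_by_email", {}).get(email)
    if entry is None:
        return None
    bucket, key = entry["link"]
    return store[bucket][key]
```

The cost Andy mentions shows up here directly: the two writes are not atomic, and if a user's email changes, the application has to remove the old index object itself.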

Starting point is 00:51:10 So, Sean, you're the same Sean Cribbs of Radiant CMS fame, right? That is correct. So how do you see these document stores changing the CMS landscape? If there ever was a use case for deep, you know, schemaless stores, I would think CMS would be it. Yes, and actually that's part of what drew me to it. I had done, as a proof of concept, a conversion of Radiant's model layer over to MongoDB and really enjoyed the benefits of what it provided. But on the other hand, I was also thinking about, you know,

Starting point is 00:51:48 maybe moving this into kind of a multi-tenant type thing. And then I got into the idea of, oh, well, gosh, I'm going to have to build multiple databases to keep my customers or the multiple sites separate. And then it really just would have been a management nightmare.

Starting point is 00:52:10 So when I saw the idea of links, which is really what drew me back to Riak after I was using Mongo, I thought, well, this is completely natural. You know, I can go ahead and put the individual parts of each page into the page object. That makes a lot more sense than having it linked to another table. But then I can, you know, have my users go across all the different sites. You know, maybe you're an editor on this site and you own this site and you pay for this site or you're just a reviewer on this site.

Starting point is 00:52:50 So there's a lot of possibilities definitely for CMS simply because the problem of a content management system is one of semi-structured data and definitely sparsely populated semi-structured data. So we heard that you have a Ruby driver coming out. Do you want to mention that real quick? Sure. I'm releasing it tonight. It's going to be called Ripple. And I might interject a little bit of the story behind that name. And Andy's chuckling there. Actually, Riak is an Indonesian word meaning ripple, so when we were trying

Starting point is 00:53:29 to decide the name of it, I started off with Riak client, but it's very vanilla. Ripple, I think, is really nice because it also kind of describes the idea of how Riak works. So this will be released tonight.

Starting point is 00:53:45 It's going to be on Gemcutter. It's going to be a RubyGem. It'll be on GitHub, github.com slash seancribbs slash ripple. And I'm going to encourage people to fork it. So what it has in it is a very robust Ruby client driver that gives you all the basics of working with Riak, including things like knowing what types of HTTP responses Riak will return on different requests. And I try to take those into account. And so you get a pretty rich layer that includes the ability to manipulate buckets and insert and retrieve and delete and reload objects that you have in your application. And also, it's not entirely complete, but I'm going to go with the release early, release often on this. And there's also

Starting point is 00:54:45 a modeling layer, which has a lot of similarities to MongoMapper. And I have to give John Nunemaker props on MongoMapper. It's a great library. And I took a lot of inspiration from that. So it's actually probably the most interesting thing about the modeling layer is that it's Rails 3 only. So it uses the ActiveModel library to provide a lot of the more complicated things that you'd expect out of that type of library. And if you're new to MongoDB and didn't catch the episode, I believe it was 011, where we interviewed John about MongoMapper, you can get all the details there. But we're at the part of the show. If you guys have tuned in and hung around to the end of the show, you know that we're to the segment where we ask you what's on your open source radar.

Starting point is 00:55:40 So, Andy, you're up first. What gets you excited in the world of open source other than what's going on at Basho? Other than what's going on at Basho, Node.js is highly interesting to me. I mean, I think I'm not unique in being interested in that. I think you're going to see a lot more stuff based on Node.js. I'm excited about some of the work the community is doing on native Node.js support for Riak. And I've been, you know, I admittedly am not a Ruby person, actually. And I've just been really impressed by, ever since we've started to improve our support for Ruby, the level of attention and the level of interest and the great feedback that I've had with members of the Ruby community that I haven't known before. So, you know, in general terms, I'm just looking for, I'm very excited for,

Starting point is 00:56:48 you know, what's to come both in Sean's work and other people's work with regard to Ruby integration and working with Riak. I'm also really excited about projects like RabbitMQ and AMQP in general. I think Riak and Rabbit make great sense, and we've actually talked quite a bit with people about integrating them further, whether that's providing an AMQP interface to Riak or using Riak as a backend for persistent AMQP storage. And, you know, that's really it. I should be, you know, more attentive to projects that are going out right now, but I have such a full plate that, you know, just, you know, NoSQL stuff in general. And I'm looking forward, actually, having spent many years, you know, I enjoy Erlang, but professionally before that, I was tied to C++ and other languages that I'd like to forget and reuse those neurons for learning more Ruby stuff.

Starting point is 00:58:05 So it's been a real eye-opening experience, you know, having this role in this project and being able to engage with these various communities, in particular Ruby. How about you, Sean? Well, I've been doing a lot of JavaScript lately, jQuery front-end stuff. That's always on my mind. I like working with and building beautiful interfaces that work well.

Starting point is 00:58:34 So I'm always on the lookout for new jQuery plug-ins, and I was really pleased to see 1.4 came out just a couple weeks ago. Also on my mind a lot, probably because of who I've been working with, but there's a big resurgence of Lisp and Scheme variants lately, and particularly Clojure is the big juggernaut in this space right now, but also has some smaller friends like LFE, which is a Lisp-flavored Erlang, so you can write Lisp for your Erlang applications. And also Gambit Scheme has been looking pretty cool.

Starting point is 00:59:15 So I'm going to try to get into some Lisp this year in addition to doing more Erlang and JavaScript. Yeah, talking about languages in general, besides projects, I would really love to have the time to explore Clojure some more and explore Haskell some more. Every time I try to learn Haskell,

Starting point is 00:59:35 I end up feeling stupid, but I think if I actually gave it the old college try, I could really kind of wrap my head around it and be a better programmer as a result. It's so funny to hear you guys mention Node.js because that's like, I don't know, what, like nine shows in a row in? Right.

Starting point is 00:59:53 Something like that. But Andy, Sean, thank you so much for coming on the show. It's been great having you guys. Andy, it's great to see Basho step up and scratch their own itch and, you know, give back to the community, even at the measures you have. It's such an exciting time for the NoSQL ecosystem community. But how can people reach out to both of you guys via Twitter or email? I'll go first. I'm at argv0. A-R-G are my initials, and argv0 is C for the name of the program that is being executed.

Starting point is 01:00:31 I thought it was clever and meta back in 99, and it stuck. So you can reach me. Twitter is usually my primary medium these days, but if you want to catch me on email, I'm either andy at basho.com, or you can always reach me on the Riak users mailing list. And I'm seancribbs on Twitter, just S-E-A-N-C-R-I-B-B-S. That's an easy one. Yeah, that's the easy one. Very vanilla, right?

Starting point is 01:00:58 But it's memorable, so I like to use that. So clever. And also, Andy isn't on there as often, but I'm frequently on Freenode IRC. So I hang out in the Radiant CMS and the Erlang/OTP and also recently the Riak channel. So if you want to get me live, that's where to find me. Yeah, I mean, I try to be in the Riak channel as much as possible as well, so you can catch me there live also. Anyone out there, if you're listening, if you didn't catch how to spell Andy's Twitter handle,

Starting point is 01:01:32 just check the Changelog Show Twitter threads and you'll see some corresponding. And both of these guys are on the Changelog Show guest list. It's changelogshow forward slash guests. Yeah. On Twitter. Awesome. Well guys, thanks again for coming on the show.

Starting point is 01:01:51 It's been a pleasure having you and enjoy your evening. It's been great. Thank you very much, guys. Thank you for listening to this edition of the Changelog. Point your browser to tail.thechangelog.com to find out what's going on right now in open source. Also, be sure to head to github.com forward slash explore to catch up on trending and feature repos, as well as the latest episodes of the Changelog.


Adam and Wynn caught up with Andy Gross from Basho and Sean Cribbs, a freelance Ruby developer, to discuss Riak, the new Erlang-based NoSQL store and Ripple, Sean’s new Ruby wrapper for Riak....
