The Changelog: Software Development, Open Source - RethinkDB (Interview)

Episode Date: December 11, 2013

Slava Akhmechet, co-founder and CEO of RethinkDB, joined the show to talk with Andrew about RethinkDB - the open-source database for the realtime web....

Transcript
Starting point is 00:00:00 Welcome back everyone. This is The Changelog and I'm your host Adam Stacoviak. We're a member-supported blog, podcast, and weekly email covering what's fresh and what's new in open source. Check out the blog at thechangelog.com, our past shows at 5by5.tv slash changelog, and subscribe to the Changelog Weekly. It's our weekly email covering everything that hits our open source radar. You don't want to miss it. It ships on Saturdays.
Starting point is 00:00:32 Subscribe at thechangelog.com slash weekly. This show is hosted by Andrew Thorp. It's episode 114, and it's sponsored by DigitalOcean and Toptal. We'll tell you a bit more about Toptal later in the show, but they're awesome sponsors of ours. We absolutely love them. They connect startups, businesses, and organizations to a growing network of elite engineers all
Starting point is 00:00:54 around the world. Head to toptal.com slash developer. That's T-O-P-T-A-L dot com slash developer. And DigitalOcean. We love DigitalOcean. We're hosted on DigitalOcean. And we want you to be hosted on DigitalOcean. Today, get hosted on a blazing fast DigitalOcean SSD cloud server. You can easily create a brand new droplet with root access in 55 seconds.
Starting point is 00:01:18 Literally, in 55 seconds, you'll be at your prompt setting up your new machine. You get your choice of size, region, operating system, all through a simple and easy-to-use dashboard or via the command line if you want to. They've got an API. And as for our fans across the pond, DigitalOcean just announced their brand-new second Amsterdam data center. AMS2 just opened up on December 2nd and now offers expanded server capacity to Europe, as well as shared private networking, which was a feature we only had here in the States at their NYC2 data center.
Starting point is 00:01:50 We want you to try DigitalOcean today for free using our promo code. Try them out today. ChangelogSentMe is the promo code to use. You'll want to use that when you enter your billing information. There's a spot there asking for your promo code. Or if you miss it and you sign up, just email support. Let them know that Changelog sent you.
Starting point is 00:02:08 Use ChangelogSentMe as your promo code, and they'll hook you up. It's a $10 hosting credit you'll get. So we want you to enjoy DigitalOcean. Head to DigitalOcean.com today to get started. And now, on to the show. We're joined today by Slava Akhmechet to talk about RethinkDB.
Starting point is 00:02:25 Welcome to the show, Slava. Hi, Andrew. It's good to be here. Yeah, so RethinkDB is a... I love your tagline on the website. I do this often. I say, built with love. But RethinkDB is an open-source distributed database built with love. Why don't you give us a little introduction. First, who is Slava Akhmechet and what is RethinkDB?
Starting point is 00:02:52 Yeah, well, I was born in Ukraine and I moved to New York City when I was 13. I now live in California. You know, I did my undergrad in computer science. I worked for the financial industry for a while. And then I sort of didn't fit in, so I went to grad school. And, you know, we looked around and we saw that there are a lot of changes in how people access databases and sort of a lot of changes of how things get deployed, how applications get built.
Starting point is 00:03:19 So it was me and my co-founder, Michael. And I'll tell you more about these details as we get deeper in. And we thought we were going to start a project to take some of these ideas and some of these thoughts and sort of implement them into a product, an open source product that people could use. So I was in grad school at the time doing something totally different. We were doing computational neuroscience and supercomputers. And it sounds kind of fancy, but really it was just trying to figure out how to simulate big things with a lot of interconnections on IBM Blue Gene, which turns out to be really difficult.
Starting point is 00:03:56 So we were doing that and then started Rethink, moved to California. And we've just been working on this project ever since about 2009. Gotcha. So you guys moved out. Did you go through the Y Combinator? Is that right? Yes, we did. That was actually the catalyst for moving to California and then we never went back. Gotcha. Yeah, so it's a relatively new project.
Starting point is 00:04:22 I mean, NoSQL is, I wouldn't say new, but it started to really gain in popularity in the last couple of years. What is it that made you want to do your own thing? Were the current solutions not good enough? Were there no solutions that you were aware of to solve the problem? What really made you kind of rethink NoSQL? Well, there were a lot of factors going into it, but there's one thing that I think is a really big deal.
Starting point is 00:04:51 If you look at traditional databases and even NoSQL databases, they're databases that just happen to have a programmer interface, like an API. And we saw this trend, like if you look at programming languages, people understand that, you know, developers spend many, many hours a day building their programs. And these things don't just have to be, like, easy or pragmatic. They also have to be pleasant because pleasant programming languages win. So we thought that we're going to start a database that is a developer tool first and a database second.
Starting point is 00:05:27 And what that really means, I mean, there's a lot of details that go into it, but every time we design a feature or sort of make any kind of a decision, we first think of developers and what it feels like to develop in the system. And then after that, we think of all the implications in the database, to the database world and the operations world. And what comes out of that is what we think and a lot of our users think, a really, really pleasant database to develop in because many times people, when they build web applications, right, like backend is a huge, huge deal
Starting point is 00:05:59 and they spend many hours a day just working through a lot of these things. So it's stuff like a really pleasant administration UI that takes a lot of cues from many of the consumer projects or consumer products. Like why do consumers have to get better UIs than programmers? It's something that didn't sit well with us, so we thought we're going to make that part really good. It's things like a query language that's designed to be just a really unique, pleasant, and pragmatic query language. We wanted to do that.
Starting point is 00:06:33 So if you take this core premise that it's a developer tool first and a database second, a lot of very interesting things come out of it, and you get something that looks quite different and feels quite different from anything else out there. I'm not sure, does that make sense? Yeah, it does. I mean, actually, on some of your docs, you like to call it the best of both worlds. And when I first saw the RethinkDB interface, it reminded me a little bit of CouchDB, right? The same kind of idea.
Starting point is 00:07:06 So you say there's like the more developer-oriented products, which would include CouchDB, MongoDB, and things like that. And then there's the more ops-oriented solutions like Cassandra and Riak, which are a little bit more difficult to get started with, and they are designed for kind of a different purpose. Would it be appropriate to say that Rethink is more of like a DevOps solution?
Starting point is 00:07:27 It's like the mixture of the two? Yes, we always wanted to do that. So I think, actually, in a lot of the NoSQL projects, and really databases in general, this tension between developers and operations and how the team behind the project manages that tension is really what pretty much defines the project. So, for example, in the case of Cassandra and Riak to a large extent,
Starting point is 00:07:54 this tension between developers and operations and how they make decisions definitely falls closer to the operations side, far closer to the operations side. Because in Cassandra's case, it was really important to maintain write availability. So they designed a Dynamo-type system. And then if you're writing an application, you have to deal with conflicts and things like that. So just by design,
Starting point is 00:08:20 it makes writing applications a little bit more difficult and running a large system a little easier. And then MongoDB was kind of the opposite, where they made really pleasant decisions for designing applications. It was just JSON in, JSON out, really simple. You couldn't do joins, couldn't do many things. So it was just that simple system that people really, really loved. But then on the operations side, things got tougher because of failover and things like that that weren't as nice as Cassandra or Riak were.
Starting point is 00:08:53 So Rethink is just our own take on this tension between developers and ops. And we thought that a lot of these systems are very nuanced. So if you start looking at the details and looking at the nuances, we thought that we could design a much more balanced, much more pleasant experience. But the product is definitely developers first. We sort of look at what it's like to develop applications, what it feels like, you know, from landing on the page to downloading the product, doing the first five minutes and so on. And then we, of course, have to make sure that operations works, that it's good, that it's pleasant for people. But whenever there is a decision like a trade-off and we can't do the best of both worlds,
Starting point is 00:09:42 we usually fall closer towards the developers. Not always, but usually. Yeah, so one of the things that you tout is the query language, and I read very positive responses to ReQL. What was the decision behind ReQL? Give me some information about when you guys sat down to talk about your query language. You know, what do those talks sound like?
Starting point is 00:10:13 Because that's pretty low-level stuff to talk about. Yeah, so I'll sort of start with an anecdote. I don't know if you remember, there was an operating system a long time ago called BeOS. Do you remember that at all? This was like maybe in the 90s. It was a media operating system. Sounds vaguely, vaguely familiar. I was very young. Yeah. Well, so BeOS was an operating system with a really pleasant UI. And I think someone asked like the lead developer or an architect of BeOS, how did you guys get a UI that is so snappy? And the guy said, oh, it's easy. The UI guy was sitting in a cube very close to the kernel guy,
Starting point is 00:10:53 right next to the kernel guy. And that interaction just resulted in a snappy UI. So I think what happened with ReQL at Rethink is that I'm originally a programming language person. I absolutely love programming languages. I used to just build interpreters for fun for different languages and learn like every language I could get my hands on. And then when we started Rethink and we started building the team around it, part of what I did, and this was completely unconscious, but the people that joined also happened to be programming language people.
Starting point is 00:11:30 Not because I was looking for that or anything. It's just because people just tend to unconsciously sort of attract people and work with people that are similar to them. And then my co-founder, Mike, was a UI person, so he got people to join that were really interested in user interfaces. So a lot of us are programming language people and we thought, okay, we have to design an interface and it has to be really pleasant, it has to be easy to use, it has to be familiar to people. So just starting with these premises, we built a query language that's sort of like
Starting point is 00:12:09 the domain-specific language that integrates into whatever language you use. So if you're using Python, for example, the query language is just a library for Python. Or if you're using Ruby, it's just a library for Ruby. So some of these things were pretty easy. But once you get into like the esoteric parts of it and how lots of pieces fit in, a lot of the discussions get pretty contentious. People have different ideas, different opinions. So we've created almost like, I mean, to some degree, it's like the US judicial system, right? It's very adversarial. And this adversarial process, I think, results in something
Starting point is 00:12:46 quite good. Sometimes it's stressful. There's a lot of tension. Sometimes, you know, people don't agree. But I think at the end, it results in a really pleasant experience for people. Yeah, I mean, you're ultimately working toward the same goal, right? So if you guys have different opinions, you can, you know, be adults and sit down and talk about it. Oh yeah, yeah, absolutely. So the way the process works, it's actually completely open online. So if you go to GitHub and search for RethinkDB and look at the issue tracker... So when we started, we couldn't do that online. So we sat down in a room, and the first version of ReQL was just completely banged out in a room with five people sitting around. Right now, because the core of the language already exists and most of the changes are smaller,
Starting point is 00:13:36 all of the discussions are happening online and on GitHub. So if you look at the issue tracker and look at ReQL issues, you'll see exactly what the process looks like. And typically, we have a discussion process where anybody could participate. You know, it's anyone who's working on Rethink, or users, or really anybody at all. And we time box it, so it takes about, I believe it's a week, to settle on an issue. And then if we still can't settle, there is a tiebreaker. And it's just the person, you know, we think has a really good sense for programming languages. So we try to arrive at a consensus, and if we can't, that person breaks ties. And that's how the process works right now.
Starting point is 00:14:15 Gotcha. So yeah, it's open. We've had guests on the show, I think, like Chad Whitacre from Gittip, that would love to hear that the community and everyone kind of plays a part in the decisions that are made. That's a pretty cool thing. How often do you have to... Go ahead. So actually, the community playing a part in design discussions has been a huge deal for us.
Starting point is 00:14:39 I think it's incredibly important because what often happens, and actually, it's not just Rethink. I think it's open source in general. But what used to happen with, you know, commercial projects is people would release a feature and then they'd get the feedback afterwards. And you could do all sorts of stuff before, like you could do, you know, studies, and you can do betas and demos and things like that, but it's just not the same as having users, you know, jump in on a GitHub issue during the technical discussion and comment on what you're doing. And so far, I mean, I wouldn't say every single ReQL design decision benefited from this, but, like, the majority probably did. How often do you have to, you know, I feel like a year, two years ago, maybe a little longer than that, the question that I always read was, you know, NoSQL versus SQL, right?
Starting point is 00:15:31 What's the right solution? Should I use like a Mongo or should I use like a Postgres or, you know, what's the solution for my application? How often now, it seems like that question has shifted now to people kind of know what they want to use for their solution. And now it's like, it's gone back to if you're going to use SQL, like, is it MySQL or is it Postgres? If you're going to use NoSQL, is it, you know, which one? So how often do you have to answer or kind of defend the decision to whether to go with, you know, Postgres or Rethink kind of a thing?
Starting point is 00:16:08 and a lot of people that start using Rethink already have a very good idea of what's going on. So we very rarely have to talk about Rethink versus Postgres or Mongo versus Postgres or anything like that. I think most people pretty much know who use RethinkDB. But if you zoom out a little bit and look at programmers in the world in general, I think there is still a lot of education to do and a lot of work to do for people to understand the differences
Starting point is 00:16:37 between these two approaches and what fits when. Because people have, I mean, we've studied relational systems and taught relational systems to people for the past 40 years, and I don't think a change like that, a very fundamental change like that, can happen within a couple of years. I think it's going to take a while for, like, the programming world at large to really understand the difference. And I actually think, you know, even people building these things, like, we're learning every day how RethinkDB is and isn't useful to people. So even for the vendors and the people that are building these projects, it takes a while to understand what their project actually means and what it does for people and when it's a good idea and when it's not such a good idea. Yeah, I mean, I think, like, I would say most
Starting point is 00:17:25 people, myself included, tend to still just think relationally in terms of, you know, our system design. And so I would wonder, and I'd probably imagine, that a lot of people who are going to Rethink or going to Mongo are just kind of doing it at this point because it's like the new thing to do, and still trying to slam relational models into it and use it that way. And so I wonder, I mean, object-oriented in general kind of lends itself to relational ideas, so at what point, how many years will it take before we are able to actually kind of free our minds of that and think in different ways that really enable this mindset? Well, we sometimes talk about this.
Starting point is 00:18:12 So when people first built cars, they used to not be called cars. They used to be called horseless carriages. And NoSQL kind of reminds me of that because when you define a whole field by an absence of something, that means the field is pretty young. It's going to take a while for it to really settle. I think if you jump a little bit into the details, when people first start using Rethink in particular, they maybe start with preconceptions of relational design, but then they very quickly learn not to necessarily do that because the project just sort of guides them towards the thing that makes sense.
Starting point is 00:18:46 You know, these things are often not about what's possible because you could build anything in anything. It's more about what's easy and what is like the path of least resistance. So people learn pretty quickly on an individual basis the moment they start using Rethink. And I'm sure that's true about other NoSQL projects too. But the world at large, I think it will probably take another five to ten years for this to really become old news, where everyone just understands what everything is and what it means. So let me ask you then, just kind of for an answer, what makes NoSQL a good choice? And then more specifically, what makes Rethink a good choice once you've gotten to that point? So I think NoSQL as a field is still definitely young. But what makes NoSQL a good choice is two things. The first is that a lot of data that people work with now,
Starting point is 00:19:35 it's not relational in nature, at least not as relational as it used to be. It's much more hierarchical. And pragmatically what it means is if you just do a relational design, you're going to have a lot of missing columns. You just have 1,000 columns, and in most rows, most of them are null, and it's very unpleasant to work that way. And NoSQL makes that very pleasant. You don't have to worry about that very much.
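The point about missing columns can be made concrete with a small Python sketch. The field names and records below are invented for illustration and do not come from the interview; the sketch only contrasts a fixed-column layout, where absent values must be stored as nulls, with nested documents that store only what exists.

```python
# Illustrative data only: the columns and records are hypothetical.

# Relational-style layout: every row carries every column,
# so fields a record does not have must be stored as None (NULL).
columns = ["name", "email", "twitter", "github", "fax", "pager"]
rows = [
    ("Ann", "ann@example.com", None, "ann", None, None),
    ("Bob", None, "@bob", None, None, None),
]
null_cells = sum(cell is None for row in rows for cell in row)
total_cells = len(rows) * len(columns)

# Document-style layout: each record stores only the fields it has,
# and related fields can be nested hierarchically.
documents = [
    {"name": "Ann", "contact": {"email": "ann@example.com", "github": "ann"}},
    {"name": "Bob", "contact": {"twitter": "@bob"}},
]

print(f"{null_cells} of {total_cells} relational cells are empty")
print(documents[1]["contact"])
```

With only six columns the waste is small; the complaint in the interview is about what happens when the column count reaches the hundreds or thousands.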
Starting point is 00:20:09 That's the first thing that makes NoSQL easier for that kind of problem. The second thing is scale-out. So there was a big promise of that, and it's still, I think, quite debatable whether NoSQL makes things easier to scale out in practice today, but I think when the field matures, it's definitely going to be the case, because the thing is fundamentally more scalable than relational systems just because it does less. And when these systems mature, I think scale-out is going to be a no-brainer in NoSQL, but it's still going to be hard in SQL. So that's the field in general. As far as Rethink, we make it really, really, really easy to build applications
Starting point is 00:20:46 that have to deal with JSON. Specifically, if you want to do things other than sets and gets and basic aggregations in a single table, the moment you start doing cross-table stuff or cross-collection stuff, Rethink just makes that really easy. The programming language is
Starting point is 00:21:02 really easy. And then you build your app, and then we make deploying and scaling out just a very pleasant and easy experience. You could go to rethinkdb.com and watch the video and we sort of show like a one minute video of how easy it is
Starting point is 00:21:18 to scale things out. It's just a press of a button. So we make building applications and then scaling them out really simple. Now, I would point out that Rethink is still in beta, and we say it on the front page. We're getting very close to making it be a production release that people can start using in real production products, and a lot of people have already. But we've been very careful about making promises to people because these systems are hard.
Starting point is 00:21:47 They take a long time to design. They take a long time to iron out the bugs so they work well. So Rethink is new. And we certainly encourage everyone to try it and play with it and start building applications. But it's always a disclaimer that I kind of use before we start offering commercial versions of the product. Yeah. Being in beta, I mean, so, you know, just to kind of be transparent, you guys are – so there's a 13-minute video I watch, right? The first thing I did with Rethink, I was like, let me watch this video. 13-minute video, and you guys kind of explained Rethink, what it is.
Starting point is 00:22:24 You showed me sharding, replication, failover all in 13 minutes and I just think back to a couple years ago somebody trying to explain sharding to me and a couple years ago somebody trying to explain what their replication strategy
Starting point is 00:22:41 is to me, and it's just shocking to me that you guys can do all that in a 13-minute video. Well, so it's 13 minutes to demo the product, but it's about three years to make all of that possible.
Starting point is 00:22:54 Right. Exactly. Yeah, it's, it's, it's really cool. So let me ask you this. What is your,
Starting point is 00:23:03 you guys officially support, uh, I guess three languages, the best way to put it, right? Python, Ruby, and JavaScript. What's your favorite implementation and why? So I am personally a Python fan. But I think, and I love Python. I love the programming language. And I love the Python driver, RethinkDB driver. I use it a lot.
Starting point is 00:23:26 I also use JavaScript a lot, both because I like the language and I like the driver. I'm not a fan of Ruby myself. But if I had to be honest with myself and with everyone listening, I'd say that the Ruby driver for RethinkDB is probably best just because Ruby, with their blocks and in general how the language is designed and how easy it is to hack in and do anything you want, it's the most pliable if you want to build a domain-specific language. So the Python driver, for example, and the JavaScript drivers are great,
Starting point is 00:24:01 but Ruby, the language, makes some things easier. Specifically, I think blocks are the most important, and it's a little bit difficult to describe without, you know, just actually typing, so I can't do that verbally. But if you look at rethinkdb.com and see just a basic example of what it looks like in Ruby and Python and JavaScript, Ruby just is a little bit nicer. Yeah, well, I mean, it's just the idea of chaining in general. Like, JavaScript chaining is great, but there's some parts of it that, for anyone who's worked in JavaScript, it feels weird sometimes. And Ruby lends itself to that, I think, just in a real elegant way. Yeah, it's a great language for DSLs and
Starting point is 00:24:40 stuff like that. So, awesome. Well, the fact that... oh, sorry, go ahead. No, you got it. I was just going to say, the fact that blocks have a really nice syntax makes it easier, because in JavaScript you have to type, like, the word function, right? And that's a lot of typing to do, whereas in Ruby you just put brackets, and that makes things a lot easier. Yeah, definitely. Let's go ahead and pause for a minute and give a shout out to our sponsor, Toptal. Yes, let's give a shout out to our awesome sponsor, Toptal. They've been sponsoring the show for a bit now and they're going to sponsor I think one more month. But I've been working with their CTO, Brendan, and I mentioned before I wasn't quite sure what to expect from them when we first started working with them.
Starting point is 00:25:21 But I've got to say these guys are the real deal. They're engineers themselves from top to bottom. They built the company around engineers. They're not non-technical recruiters trying to pimp developers. They're a network of engineers from all around the world who work with some really awesome clients. And for those of you out there who are freelancing, or maybe you'd like to freelance, or maybe you're in a full-time position kind of doing one thing by day and you like to do another thing by night, let's say Node or something in JavaScript
Starting point is 00:25:50 or Ruby, just as an example, and you'd like to try kind of testing out freelancing, you gotta check out Toptal, because they're doing some really awesome stuff with companies like Airbnb, Artsy, IDEO, and many others. You can work remotely, on a beach, or anywhere in the world. No office required. To get started, head to Toptal.com slash developer and click join the best. Because they want to work with only the best senior engineers out there, they've got a well-thought-out four-stage screening process that begins with a personal call via Skype
Starting point is 00:26:22 to kind of get to know who you are and what you're up to and introduce you to Toptal and what their mission is and see if you're a fit. And from end to end, the process includes an English speaking test, a timed algorithm test, technical interviews with core Toptal engineers, and a test project. But once you've got through that screening process, the sky is the limit. And if you think you have what it takes, head to Toptal.com slash developer to get started. Tell them the Changelog sent you. Toptal.com slash developer. All right, so we were talking about which languages and Ruby versus Python, and we don't want to get too much into that right now. But what I do want to kind of get into is just a little bit more specific, deep dive into Rethink itself and, you know, less of the theory behind it.
Starting point is 00:27:09 And like, let's talk a little bit about how it works. So what do you guys recommend for the kind of the best way for somebody to get started working with Rethink? So we wanted the, the way we create, like getting started is almost like a game, right? So it's got to be really easy when you start out. And then as you start doing more advanced things, it should keep being easy and the learning curve shouldn't jump too much. So, you know, getting started is really easy. You can go to rethinkdb.com.
Starting point is 00:27:37 You can download it on Linux or OS X. And then there is a tutorial for pretty much, you know, Ruby, Python, JavaScript, but you could really use this with any programming language. The tutorial is just 10 seconds, and then if you like that, you can move on to a 10-minute tutorial and start inserting documents and querying and doing more advanced things. Gotcha. Let's talk a little bit about the querying. I think it's neat, the way that you guys do the chaining. And so, let's specifically talk in JavaScript. Every, I don't know, operation is essentially a chain of, you know, different calls. This is what we're talking about with ReQL, the query language, right?
Starting point is 00:28:17 So you would basically say, you know, r.database, and I guess that's probably optional if you're only dealing with one database, I'm not sure. But, you know, you would say r.database, and then you'd pass the name of your database into that function. Then you would say .table, pass the name of the table into that function, and then you would start talking about your operations and what you want to do, and then you end it with a run. Yes. So the query language is designed in a way where the data sort of flows left to right. So on the very left of your query, you specify where the data comes from.
Starting point is 00:28:50 Usually it's a table, right? So you say table, you know, users. And then after that, you say dot, and you can put any command you want. So for example, you want to filter users in a specific city. So you say dot filter, and then, you know, the city that you want. And then you can say dot again, and let's say you want to group things, so you say, you know, group by, da da da da. And then you can say dot again, and you can just do this indefinitely. So it's very similar to how you do chaining in jQuery, if people are familiar with that. It's also very similar to
Starting point is 00:29:22 how you do it on the Unix command line in Bash, right, where data just flows left to right and you can keep adding pipes and each pipe is just an operation on that data. So then once you actually execute it,
Starting point is 00:29:36 you just, once it hits run, it actually executes everything from before, right? Yeah, so the important subtlety here is as you write that query in JavaScript, all of that is on the client.
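The left-to-right chaining described here, where nothing executes until run is called, can be sketched in a few lines of Python. This is a toy under stated assumptions, not the RethinkDB driver: the `Query` class, its method names, and the in-memory `database` dictionary are all hypothetical and exist only to show a chain being recorded first and executed later.

```python
# Toy illustration only: NOT the RethinkDB driver. All names are hypothetical.

class Query:
    """Records a chain of operations; nothing executes until run()."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _extend(self, name, *args):
        # Each chained call returns a new Query with one more recorded step.
        return Query(self.steps + ((name, args),))

    def table(self, name):
        return self._extend("table", name)

    def filter(self, **equals):
        return self._extend("filter", equals)

    def count(self):
        return self._extend("count")

    def run(self, database):
        """Apply the recorded steps left to right to an in-memory database."""
        data = None
        for name, args in self.steps:
            if name == "table":
                data = list(database[args[0]])
            elif name == "filter":
                wanted = args[0]
                data = [row for row in data
                        if all(row.get(k) == v for k, v in wanted.items())]
            elif name == "count":
                data = len(data)
        return data


r = Query()

database = {
    "users": [
        {"name": "Ann", "city": "Oslo"},
        {"name": "Bob", "city": "Lima"},
        {"name": "Cy", "city": "Oslo"},
    ]
}

# Building the chain only records steps; no data has been touched yet.
query = r.table("users").filter(city="Oslo").count()
result = query.run(database)

print(query.steps)
print(result)
```

In this sketch `run` takes the data directly; in the system described in the interview, the corresponding call takes a connection and the work happens on the server.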
Starting point is 00:29:48 It's all written in JavaScript. And you say, you know, table, filter, group by, you can count things, you could do whatever you want. You know, you could do joins across tables. But all of that is still just a program in JavaScript. And then when you type .run and you give it the connection to the database, what happens is the client
Starting point is 00:30:07 takes that query, packages it into a binary format, into protocol buffers, actually Google protocol buffers, and that gets shipped over to the database server. And then RethinkDB clusters, basically the machine
Starting point is 00:30:21 on the other side, the server machine, takes that query, compiles it down to a distributed program, and sends it out to all the machine on the other side the server machine takes that query compiles it down to a distributed program and sends it out to all the nodes in the cluster it knows where everything is so you can send the query to any machine and it gets the data and then as a user you just get the result right so none of this gets executed in the client the client side is just a convenient way to write the query the whole thing runs on the server in the cluster.
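The flow described here (chain commands on the client, do nothing until run, then ship one serialized query to the server) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Query` and `FakeServer` are hypothetical names, not the RethinkDB driver, and JSON stands in for the protocol buffers wire format mentioned in the interview.

```python
import json


class Query:
    """Toy chainable query: each call records a step instead of executing it."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _then(self, name, *args):
        # Return a new Query so chains can be reused and extended safely.
        return Query(self.steps + ((name, args),))

    def table(self, name):
        return self._then("table", name)

    def filter(self, **equals):
        return self._then("filter", equals)

    def count(self):
        return self._then("count")

    def serialize(self):
        # Assumption: JSON as the wire format, purely for readability.
        return json.dumps([[name, list(args)] for name, args in self.steps])

    def run(self, conn):
        # Only here does anything leave the client.
        return conn.execute(self.serialize())


class FakeServer:
    """Hypothetical stand-in for the server: decodes the message and evaluates it."""

    def __init__(self, tables):
        self.tables = tables

    def execute(self, wire):
        data = None
        for name, args in json.loads(wire):
            if name == "table":
                data = list(self.tables[args[0]])
            elif name == "filter":
                wanted = args[0]
                data = [row for row in data
                        if all(row.get(k) == v for k, v in wanted.items())]
            elif name == "count":
                data = len(data)
            else:
                raise ValueError(f"unknown step: {name}")
        return data


conn = FakeServer({"users": [
    {"name": "ann", "city": "Boston"},
    {"name": "bo", "city": "Austin"},
    {"name": "cy", "city": "Boston"},
]})
query = Query().table("users").filter(city="Boston").count()  # nothing runs yet
result = query.run(conn)  # the whole chain is evaluated on the "server"
```

Because the client only ever sends a structured list of steps and never splices user input into query text, this style also hints at why the injection point raised later in the conversation comes up.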
Starting point is 00:30:46 One little thing I wanted to point out. I was looking at your FAQs, and at the top of the site, I see a little example on inserting into students, and it looks like you guys are kind of taking a shot at SQL with the SQL injection. Bobby drop tables. Yeah, Bobby drop tables. That's funny. But that does kind of give me a, I mean, is that just because, do you guys deal with just only this ReQL and this format?
Starting point is 00:31:13 Or can you actually write actual, not SQL, but something similar? So right now we only deal with this format. But if you look at, if you actually dive into the details of how the protocol is designed, there is no reason this has to be a DSL in Python, Ruby, JavaScript, or any other language. This could be a text language. We just haven't designed one yet. I think this is going to be important for people like business analysts
Starting point is 00:31:37 who later, you know, they have a running database and they want to analyze the data. And I don't think, well, I'm not sure, but, you know, I think it's nicer for people to be able to do it in a language closer to English rather than Python. So we're thinking about this a little bit, but yes, there's no language like that now. Right now it's just a DSL. And as you pointed out, an interesting property of that is you can't really get injection attacks in the way you can with SQL. So do you think that you'll ever, do you think it would be SQL that you would support or
Starting point is 00:32:08 would you write your own mapping or how would you, what kind of decisions? I don't think it would ever be SQL for a couple of reasons. I think SQL isn't very good for hierarchical data and people have tried extensions to it in particular, like Postgres has extensions to SQL to work with JSON. And, you know, it's okay. It's not nearly as nice as a language designed from scratch to work with hierarchical data. So I don't think it will ever be SQL. I just think it's going to be designed.
Starting point is 00:32:36 If we ever do this, it's going to be more for non-programmers, if that makes sense. SQL sort of has this interesting property where it was designed for non-programmers, right? And then programmers were kind of forced to use it, but it was really designed for business people. So we designed the first version of the language as DSLs for programmers. And then if we ever do a SQL-like language for business people,
Starting point is 00:32:59 it's not going to necessarily look like SQL. It's just going to be closer to a natural language. So you don't have to put like quotes and dots and things like that, which non-programmers probably don't understand. Right. So one thing that's interesting that kind of, I don't know, just to me personally, it jumped out was watching the tutorials on Rethink, the join part of the language. And I think that being a non-relational, I think that you don't see that a lot with NoSQL because they want to – I mean the word join kind of implies that these two different databases or – I'm sorry, two different tables are related in some way. And so that's some sort of relationship. hierarchical data, they oftentimes do have things that are, you know, that are relatable, or you want to, maybe they're not necessarily related to each other, but you want to compare with each
Starting point is 00:33:50 other, things like that. So what was the decision behind supporting a join like that? And why do you think other, you know, NoSQL solutions, and I don't know which ones do and don't support that, but you know, what do you think goes behind that? Well, it's actually really interesting, because when people talk about relational databases, I mean, when this thing was designed in like the 70s and 80s, the word relational really came from mathematical relations, which has almost nothing to do with relationships, but because the word sounds so similar, it has the same root.
Starting point is 00:34:19 People talk about relational databases in terms of relationships between data. And this was completely unintended, right? This was not the original intention at all. And with NoSQL, you just can't escape the fact that data has relationships. I mean, every hierarchical data, graph data, any data, it's all about encoding relationships, whether it's SQL databases or NoSQL databases. And to us, a join operation was really a no-brainer because if you look at what people do with
Starting point is 00:34:50 a database like MongoDB, for example, that doesn't have a join operation, what they'll do is they'll have a table where they'll often get the data out into the client and then loop through every record and then go to the database again. And you can, of course, you can get around that by storing documents inline, but you can only do that to a point because that's not necessarily very scalable. And we thought that, hey, Rethink has to support both
Starting point is 00:35:18 because it's just a matter of time until every NoSQL database supports a join operation. It was sort of a no-brainer to us, so we just went ahead and did it because we designed the architecture on day one to support commands that work across tables. And you could do this, so pretty much anything you could do in SQL,
Starting point is 00:35:39 you could do in Rethink, so you could do subqueries and things like that. If you're running a group, you know, if you're running a MapReduce command or something, you can put a join inside there, you could do subqueries inside there. It never made sense to us that a query should just be on a single table. We always thought it should be able to support dealing with relationships. Yeah, that's kind of a big decision though, right? I mean, do you guys have to kind of answer for that a lot? Or it seems like that would be a pretty big selling point of Rethink. Yes, it's a big selling point.
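The contrast drawn above, between looping on the client and going back to the database for every record versus asking the database for a join, can be made concrete with a small in-memory sketch. Everything here is hypothetical: `CountingStore` is an illustrative stand-in that merely counts simulated round trips, and is not how RethinkDB or MongoDB is implemented.

```python
class CountingStore:
    """Hypothetical in-memory store that counts simulated round trips."""

    def __init__(self, tables):
        self.tables = tables
        self.round_trips = 0

    def scan(self, table):
        self.round_trips += 1
        return list(self.tables[table])

    def get(self, table, key):
        self.round_trips += 1
        return next((row for row in self.tables[table] if row["id"] == key), None)

    def join(self, left, right, left_field):
        # One request: the pairing happens next to the data.
        self.round_trips += 1
        by_id = {row["id"]: row for row in self.tables[right]}
        return [(row, by_id[row[left_field]])
                for row in self.tables[left] if row[left_field] in by_id]


def client_side_join(store):
    """Fetch every post, then go back to the store once per post."""
    pairs = []
    for post in store.scan("posts"):
        author = store.get("users", post["author_id"])
        if author is not None:
            pairs.append((post, author))
    return pairs


tables = {
    "users": [{"id": 1, "name": "ann"}, {"id": 2, "name": "bo"}],
    "posts": [
        {"id": 10, "author_id": 1, "title": "first"},
        {"id": 11, "author_id": 2, "title": "second"},
        {"id": 12, "author_id": 1, "title": "third"},
    ],
}

loop_store = CountingStore(tables)
loop_pairs = client_side_join(loop_store)

join_store = CountingStore(tables)
join_pairs = join_store.join("posts", "users", "author_id")
```

Both approaches return the same pairs, but in this sketch the client-side loop costs one round trip per post plus the initial scan, while the join is a single request.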
Starting point is 00:36:09 So the downside to this, of course, is that a system like this is much, much harder to develop because there's a lot that goes on in the back end to make this work. And it almost makes the complexity like exponential, right? It's just so much harder to develop a system like this, so much harder to design an architecture and then every feature you have to think about how it fits in. So we have to pay for that in just development time. You know, every time we do something,
Starting point is 00:36:39 we have to make sure everything fits. But now that we understand that really well, it became a lot easier. I think early on we just had to pay a lot in development time. But we think about this in terms of just, you know, what's better for users, and we thought it's totally worth it. So talking about what's better for users, can you kind of give me, like, a practical application of Rethink, something, you know, a real world scenario where it would make sense? We originally designed it for web applications and
Starting point is 00:37:08 mobile applications, but we just find people use it in a lot of different places. People use it in municipalities to record police events. People use it in biotech to store gene sequence data. It just shows up all over the place. It was very
Starting point is 00:37:24 exciting and sort of makes me personally very happy to see that a lot of people like, just like what we've built and find it useful. But I still think that every time you're dealing with, so Rethink is really useful every time you're dealing with JSON. So, you know, stuff like log data, any kind of middleware where you're dealing with different APIs. Anytime you're doing things like product catalogs where you can't, you know, you have different products and they all have different structure. Just really anytime you're dealing with JSON or hierarchical data, Rethink is really useful. And I still think most of the time that's building things for the web.
Starting point is 00:38:03 Gotcha. Do you have any plans of releasing a Windows support for Rethink? I'd love to do this. I actually grew up on Windows. I think one of my first development environments was Visual Studio. So I'm still in love with that platform. I think it's just a matter of time until we do it. We don't have plans for this right now
Starting point is 00:38:24 because we don't want to increase the surface area of the project, right? Because the moment we port to Windows, we have to support it and everything gets a little bit harder. So sooner or later, we're going to do it. I don't have an ETA for this right now. Gotcha. So you talk about having to support it. And you mentioned earlier that you guys are still in beta. Although you're in beta, do you see people using this in production? Like anyone you know, or, you know, any companies that are, you know, big companies
Starting point is 00:38:49 or anything using this in production? Yeah, so one thing we quickly learned is people don't listen when you say it's in beta, right? Like, Gmail was in beta for a very long time, you know, up to a point where, like, the whole world was using it. So the same is true of Rethink. I can't speak to specific companies right now. We're definitely going to, you know, post it on the site and talk about it and do case studies and sort of showcase a lot of interesting use cases. But yeah, people definitely have been, you know, starting to build production software on Rethink, like, from day one, which really surprised us, because, you know, we expected people would be a little bit more careful. Right.
Starting point is 00:39:29 So like with traditional solutions, you have, you know, I mean, sharding and replication. Those aren't, I mean, those are common things, right? And pretty much every database solution has to handle that in some way. It's so easy with Rethink, though. But so part of that is, and I think a lot of, you know, developers and ops people, I think they like to kind of have fine control and fine tune.
Starting point is 00:39:50 But when I'm watching this video and I see, I don't know who it was that was doing the video, but when I see him shard one of the servers and replication was so easy. But is there fine tuning? Can somebody get in there and really tune, like, you know, I don't know, like to speed up queries, or can they do stuff like that in Rethink? Oh yeah, so there's a command line interface that allows you to really dive deep down into the details and take complete control over the system. We designed it with the idea that, well, there are a couple of ideas there. The first is we learned that when you automate too much,
Starting point is 00:40:25 it works, and it works, let's say, 95% of the time. But 5% of the time it breaks down. Well, that's great, but it's not very useful to people, right? Because they don't know what to do when there is an actual error. So we didn't want to automate too much, and we wanted to build it in a way where administrators could do, you know, could be very explicit about what they want. And we built that first, and that's available in the command line.
Starting point is 00:40:48 And then after that, we thought, you know, to get started with the system, that's got to be really easy. So we built tools on top of that that use the lower-level tools to automate all that, and that's what you see in the web UI. And it turned out to work really well, so, you know, 95% of the time, people just do not have to look at the deeper thing
Starting point is 00:41:08 because the high-level interface will work. But if you want to, you totally can. You just type rethinkdb admin on the command line, point it at the cluster, and you can administer and change pretty much anything you want. Awesome. Yeah, so you guys have a page, SQL to ReQL, and I think it's neat to see how these projects are vastly different, you know, just in general, but how easy it is
Starting point is 00:41:35 to kind of map terminology and stuff like that is pretty cool to see. I think it's going to be, you know, it really helps to enable people who've been in a traditional environment to kind of move into the next era of databases and really learn. It's not like learning from the bottom. You kind of have a foundation already. Yeah, it's actually amazing how similar they look but how different they feel when you actually start using the two things. Yeah. Let's talk a little bit about the business.
Starting point is 00:42:07 And Rethink is a, we talked a little bit. You guys were in Y Combinator and this is public information. You guys put this, I think it's on your website or somewhere, but you guys have raised funding, but at some point, you guys have monetization at some point and making money. So what's the goal look like for Rethink as a business? So we really want the product to be open source forever. It's sort of at the core of what we do. Every developer here really cares about that,
Starting point is 00:42:31 and we think it results in better software for people. So Rethink will always be open source. Well, always is a long time, but I really believe that. I can't see a world where it wouldn't be. Let's put it that way. But commercially, I mean, we wouldn't do anything very different from other companies like this. We plan to offer support versions, supported versions of RethinkDB, so support packages. And we found that what happens is developers pick up Rethink. They start building an application on it, and then they hand it off to operations people. And operations people usually want to make sure that if something goes wrong,
Starting point is 00:43:10 they can pick up the phone and call someone on the other end of the line. So that's the model for Rethink. We're going to offer support versions and announce them pretty soon. I can't talk about the details right now. And that's going to be the immediate monetization. And we have a lot of ideas on what to do after that, specifically with services and platforms as services. But I don't want to get into that too much. It's a little bit early for that. Yeah, that's fine.
Starting point is 00:43:36 So you guys, though, obviously are thinking about things like that. And part of what comes with that is you guys have started to really – well, I don't know if started is the right word, but you guys have gotten a lot of popularity. So when you first started working on this project and you guys kind of started the business and all that, there were other viable options to NoSQL and just databases in general. Were you expecting the kind of popularity that you guys have now, or has this kind of taken you by surprise as far as just your day-to-day goes? Oh, it's definitely taken us by surprise, at least with the very first release of RethinkDB as it is now. We worked – these systems take a while to build. It's not like it took three months and then we released it, we were working on it pretty much in isolation for about, I want to say, two and a half or three years
Starting point is 00:44:30 because it took a really long time to design the architecture, make everything work, and make the first sort of quantum of utility that we could release. And that's a really long time. Very few projects take that long. So when we released it, and people were just absolutely blown away by the UI and the query language, how easy it is to use and how pleasant and how all these things feel. I mean, that felt amazing.
Starting point is 00:44:56 We would never expect that kind of popularity early on. Because every time we'd make a decision, it sort of felt as the right thing at the time, but you never really know how people are going to perceive it or they're going to understand it. Is it going to be useful to people? And the fact that on balance, most of these decisions came out, I don't want to say right, but at least useful to a lot of people. I think that's definitely not something we expected to this degree. Yeah.
Starting point is 00:45:29 So when you guys started, who was the team? It was you and one other person, is that right? It was me and my co-founder, Michael, and we had a third co-founder, a guy named Leif from Stony Brook University, Leif Walsh. He has long, long flowing red hair. I still remember that. We're still, I mean, we're still friends. Leif now works at Tokutek,
Starting point is 00:45:49 which is not a NoSQL company, but also in the database world, in the database industry. And then right now we're a team of 11, but it started with just the three of us. Right. I love looking at the people on RethinkDB, and your title is Raising the Bar. What does that mean?
Starting point is 00:46:09 Well, I'm the CEO. Officially, I'm the CEO of the company. But if you look at what I do on a daily basis, it's really anything from just basic services, make sure the fridge is stocked and the engineers here have what they need to get their jobs done, all the way to feature design and architecture and project management and, you know, talking to people and things like that. But I think if you boil it down to one thing, it's about getting the product to be so good that people just can't ignore it.
Starting point is 00:46:41 It's got to be so pleasant and so helpful and so nice for people. And they have to find it so valuable that they just can't, you know, not talk about it, not pick it up, not download it, not find it useful. And that I think is the main thing that I do, or I'd like to think I do that, you know, the jury's still out, but that's how I think of my job. Awesome. So you guys got a bunch of contributors that you've kind of specifically noted, just probably because of the amount that they've given to the project. But it looks like you're also hiring. Is that accurate? Yes, that's right. We actually, so I can't talk about this too much, about the financing, but we're going to announce this pretty soon. And yes, we're hiring people all over the board. I can talk about that a
Starting point is 00:47:24 little bit. I don't know if the audience is interested in this kind of thing. But yes, we are hiring and we're looking to make the project hopefully even better than it is now. Awesome. If you're interested, just head over to their website and click on people and you can get some more information. Again, we won't belabor the point here. But yeah, I mean, Rethink, it's really cool to see you guys growing. I know I want to kind of just in general thank you for being so flexible with me.
Starting point is 00:47:51 This has been a crazy couple of weeks, but it's been really cool to see Rethink growing and the company and the product and the community around it. And to me, anytime that you can, I don't know, I look at you again, looking at your people page, and anytime you can kind of distinguish the core team from the notable contributors, and the contributors list is just as long, if not longer, than the core team, it means that you've got something, right? It means that the community is interested and it means that there's something here, and we just kind of hope that we can watch you guys succeed in the future with it. And, you know, it's definitely a really awesome product. During the show today, you said that
Starting point is 00:48:30 there was a bunch of things that are coming soon and announcements that are going to be made, and you didn't want to talk too much about them. So it sounds like there's going to be some news to follow, to kind of keep up with Rethink. So how can people do that? How can people keep up with you guys? Yeah, there's definitely a lot of energy. Actually, so it's interesting, you started the show with "built with love" and why that is, and, you know, we think of ourselves, the people that work for Rethink, we just think of ourselves as contributors that happen to get paid. And there are a lot of contributors that are, you know, just from the community. But so we tried to get rid of that divide. And anybody who contributes to RethinkDB, and even the users, is just sort of part of this group, part of the team,
Starting point is 00:49:12 and everybody cares about the project and what it means. So to answer your question, you could follow at rethinkdb.com slash blog. We always announce things. You could look at GitHub. Or you could follow us on Twitter, just @rethinkdb, and all the announcements happen there. So any one of these three channels, you could hop onto IRC and you'll know what's going on and you can follow some of the energy and some of the things that are happening.
Starting point is 00:49:39 Awesome. So for our listeners that are new, we ask the same three questions at the end of every show. So we'll go ahead and ask them. The first question for you, Slava, is for a call to arms, for the community to help out with RethinkDB, what would you like to see? So what we're trying to do right now, a big push, is making the experience more unique for people who use Django, people who use Ruby on Rails, and people who use Node.js. So we already have the three drivers in the languages, but we want to make it unique and a nicer experience for people building specific, you know, using specific web frameworks.
Starting point is 00:50:14 So for anybody who's a core contributor to Django or Rails or Node, we are hiring right now, and we're looking for people to contribute to the drivers and make RethinkDB just a better experience for those environments. Please shoot me an email, jobs at rethinkdb.com, and we'd love to talk about it. Other than that, download the product, play with it, send us your feedback. That's the most valuable thing.
Starting point is 00:50:38 Awesome. If you weren't doing this, whether it was working at Rethink or just programming in general, what would you be doing instead? I'd try to pick another problem in software that I think would make a big difference in the world. The thing I think that really excites me is 3D printing. It reminds me of Star Trek replicators, and I think it's going to be a huge deal. So if I weren't doing Rethink, I'd probably work on that. Nice.
Starting point is 00:51:05 You'd be doing Rethink printing. Yes. I'd have to be useful. I'm not sure I know very much about the field. Well, you could at least, maybe you wouldn't be working on it, but you would be playing with the prototypes. Yes.
Starting point is 00:51:16 We actually are building a 3D printer from a kit at Rethink. Awesome. And the last one is for a programmer hero. So somebody that's kind of influenced you up to this point in your career. I'd say, I mean, John Carmack comes to mind. I grew up with his games, and I'm just absolutely amazed with his ability to marry research and pragmatism, and getting people something amazing that they're just amazed by.
Starting point is 00:51:45 And he really inspired me just as a kid when I started programming, and he still does. Yeah, that's a pretty – I was going to say, I remember his name from Quake and stuff, but looking at it, that's a pretty crazy chain, Wolfenstein 3D, Doom, Quake, Rage. That's crazy that he's kind of been the lead on so many successful projects. Yeah, the guy is amazing. I mean, he should be an inspiration, I think, to the whole generation of programmers.
Starting point is 00:52:09 He probably is, right? Yeah. Awesome. Well, I want to say thanks again for joining us. And, you know, I was, once again, to reiterate, we kept tossing your day around to which day you would join us. And every time you came back with a no problem, that'll work great. And you've been really, really flexible. And I just want to say I appreciate that for joining with us.
Starting point is 00:52:29 Thank you, Andrew. I'm happy to be here. I'm always excited to talk about Rethink and talk about open source and technology in general. So it's no problem at all. I'm happy to be here. Awesome. I also wanted to give another shout-out to our sponsors,
Starting point is 00:52:43 DigitalOcean and TopTal, for supporting the show. Head to DigitalOcean.com to set up your cloud server today, and make sure you use our promo code CHANGELOGSENTME, that's CHANGELOGSENTME, all caps, to get a $10 hosting credit. And if you want to do freelance with companies like Airbnb, Artsy, or IDEO, head to TopTal.com slash developer, and click join the best to see if you have what it takes to join Toptal's network of elite engineers. Again, that URL is toptal.com slash developer. And that's it for this week. Thanks again to Slava for joining us and also thanks
Starting point is 00:53:16 to the listeners for tuning in and for your support. If you haven't yet, subscribe to the Changelog Weekly. It's our weekly email where we share everything that hits our open source radar. You can subscribe at thechangelog.com slash weekly. So for now, let's say goodbye. We'll see you next time.
