The Changelog: Software Development, Open Source - 10gen and MongoDB (Interview)

Starting point is 00:00:00 Hello and welcome to episode 007 of The Changelog. That's 007. I think you know who I am; my name is Adam Stacoviak, and you can check me out on Twitter at @adamstac. I've got my boy, Wynn, here with me. We had an awesome interview. It was a lot of fun. And I'm Wynn Netherland, for those that don't know. You can reach me on Twitter at pengwynn, P-E-N-G-W-Y-N-N.

Starting point is 00:00:36 And this week, we talked to Mike Dirolf from 10gen, the company behind MongoDB, which was a lot of fun. It's a very cool database application. Have you used it yet? Just on some stuff with you, that's pretty much it. But beyond just this, I think you might have some sort of fetish for James Bond and 007. And I also have a gem out there called Octopussy.

Starting point is 00:00:59 Octopussy, that's right. It's a GitHub gem that we've created. I like James Bond. I like the movies. The older ones, I think, are better than the more recent ones. But I couldn't resist doing the whole James Bond takeoff on 007, right? Anytime you have 007, you've got to represent. Our condolences to Steven Bristol, because I remember, I think it was in GoldenEye, 007 kills 006.

Starting point is 00:01:26 Poor Steven. So we're going to be on, I guess, a little hiatus here for Christmas over the next week? Yep. Got some travel planned to go see some family, and I'm sure you've got some things going on yourself. You're going to the Great White North, right? Yeah, well, yeah, you'd almost call it that. I'm heading into Canada first.

Starting point is 00:01:44 I'm flying in, dropping into Toronto tomorrow, actually. Tomorrow I take off, tomorrow morning. And by, I guess, around 2:30 in the afternoon, Eastern Standard Time, since we're in Central here, I'll be in Toronto. I'll be picking up my car. I'll be picking up my beautiful daughter, and we'll be driving down to see some family in Pennsylvania. We'll hang out there for a couple weeks, and then back up to T.O. and back down to Houston. Good deal. Well, we've got a great interview this week. I think it was a lot of fun talking to Mike.

Starting point is 00:02:13 And without further ado, let's get to it. Hi, we're talking today with Mike Dirolf from 10gen about MongoDB. Mike, what's your role at 10gen? And give us a little background on the MongoDB project. Yeah, so 10gen provides support and sponsors the development of MongoDB. And at 10gen, my primary focus is working on the Ruby and Python drivers for MongoDB. So that's a little bit of background on what I do there. As far as MongoDB itself, for those folks that may not know, what exactly is MongoDB?

Starting point is 00:02:52 Yeah, so MongoDB is an open-source, high-performance, schema-free, document-oriented database. So there are a lot of buzzwords there, but I think the point is that there's recently been this trend towards using non-relational databases. Some people are referring to it as the NoSQL movement. And I think the reasons for that are that there are some shortcomings in the traditional RDBMS, both in its ability to scale out horizontally and in terms of flexibility for developers working within the relational paradigm. And so we've seen a bunch of different types of products that are trying to address this in the non-relational space. So there are things like key-value stores, which have a pretty simplistic data model,

Starting point is 00:03:40 basically put and get on a single key. But that allows them to scale very well and very easily and also to offer pretty good performance. And I think with MongoDB, the goal is to sort of bridge the gap between those sorts of key-value stores, which have this simple data model, and something like an RDBMS, which has a much more complicated data model and is full of features. And so with MongoDB, we're sort of trying to maintain the scalability and performance of the key-value stores and add some functionality more like what you'd see out of a relational

Starting point is 00:04:14 database. You know, I discovered Mongo in early 2009. How old is the project? Yeah, so MongoDB actually comes out of this full-stack cloud computing platform that we were working on at 10gen. And so originally, when I joined the company a couple years ago, we were working on this cloud computing platform, sort of like Google App Engine, basically. There was an application server, a load balancer, and a database, which became MongoDB. And so that project was also open source,

Starting point is 00:04:51 and that was started at the end of 2007. Then, at the end of 2008, we decided to stop focusing on this full-stack platform and start focusing on a much narrower problem. And we decided that sort of the most interesting piece of technology we had built at the time was the database. So we split the database out from the rest of the project

Starting point is 00:05:43 and developed some drivers for all these different languages and released it as a standalone open source project. And the first release was in the beginning of February of this year. And so since then, we've seen a lot of traction. And yeah, so it looks like it was a good decision to make that move. But the project itself was started, like I said, in the end of 2007. So it's been around for about two years now and it's been used in production for almost as long as it's been around.

Starting point is 00:06:16 So it does have some time behind it. In my time and my exposure to the project, I'm amazed at how fast you guys turn out releases and especially bug fixes. How big is your team? So the team right now is actually growing. We do have open positions if people out there are interested. For most of this year, we've been a pretty small team of around four, and recently we've grown, so we're up to six full-time developers now and hiring. We've hired some additional people besides developers as well. But yeah, the team is growing rapidly, and

Starting point is 00:07:01 it's a great bunch of people, so it's been fun working here. What kind of insight do you have behind the Series A and Series B rounds that you guys have recently secured? Yeah, so I'm not a business person. I'm a developer, so I don't know how much my insight is worth, but I think it is interesting to see that the space is sort of growing up a little bit. We recently closed, as you mentioned, a Series B round, and a couple of other companies that are related in the space have raised a couple of rounds recently as well. I believe it was $3.4 million in November. That's huge. Yeah. So, you know, for open source to start to collect that kind of money towards focusing on high-performance products like the ones you guys are building, that's a pretty wild story. Yeah.

Starting point is 00:07:56 So, I mean, I think it's sort of a testament to where we see this space going. Like I said, we've seen some significant adoption over the past year, and I think that we're going to see even more over the next couple of months as people who maybe haven't heard of MongoDB start to learn about it

Starting point is 00:08:19 and get interested. And I think this is technology that can be applied to a vast array of projects out there. So hopefully we'll continue to see it pick up in terms of usage. I'm just browsing the production deployments

Starting point is 00:08:35 page, and it's been updated since I was last out there. I guess Disqus is the biggest name, maybe outside of EA. Any insight into how those guys are using Mongo? Yeah, so with Disqus, I'm actually not too sure how they're using it. I talked with those guys back in maybe June at a Python meetup, and at that time they were using it for a URL shortening service, I think. So at that time it wasn't

Starting point is 00:09:06 their main store, you know, where the comments are kept. But at the time I think they were talking about moving more stuff onto it. So I really am not sure how far along they are with that or what is actually running on it now. Some of the other big names on there: SourceForge is using it. They've been using it since May, and basically all of the project pages they serve are stored entirely in MongoDB now. GitHub is also on there. They're using it for some internal stuff right now and looking at expanding what they're using it for. And EA is using it for their Rupture site, which has high-score stuff and sort of a community around their games. So yeah, we've seen some high-profile sites pick it up recently as well.

Starting point is 00:10:00 You know, one thing that's, I guess, amazed me about all of the NoSQL databases (and I don't think we've named any of them yet; maybe we can discuss those in a moment, Couch and some of the others being, I guess, the major players) is that the common thread between them seems to be JavaScript as the internal scripting language. Can you speak to why you guys chose JavaScript and what it's meant? Yeah, so it's sort of funny in our case. We chose JavaScript, and it sort of fell out of this cloud computing platform that I was talking about earlier. So this cloud computing platform was multi-language, but the first language we supported was server-side JavaScript. And the reason for that is that at the time we felt that JavaScript

Starting point is 00:10:47 is a language that most web developers already know, at least to some degree. And it's also a pretty nice language, and it's pretty easy to get started with. So we thought it made sense there. And so as part of that, the database also spoke JavaScript. So then when we pulled out MongoDB as its own standalone project, there was already a bunch of useful features that

Starting point is 00:11:11 were built on JavaScript, like the database shell, for example. So we have this administrative shell that comes with the distribution, and that's all JavaScript. So you can explore your database, but you can do so programmatically. So it's sort of nice. And so we already had a lot of this stuff built, and we stuck with JavaScript. So right now there's an embedded SpiderMonkey interpreter in the database. We're thinking about possibly switching to V8.

Starting point is 00:11:42 But, yeah, I think JavaScript makes a lot of sense because, like I said, it sort of is this least common denominator for a lot of web developers. And it's a pretty nice language to work with. It's pretty easy to work with. Is there any support for JavaScript outside the shell? Yeah. So in addition to using it in the shell, like I said, there's an embedded JavaScript interpreter in the database. So there are a couple of ways that that gets used. You can do what's called an eval, where you actually send JavaScript code, arbitrary JavaScript code, that gets executed on the database server itself. So that can be useful for doing some more complex operations without a network round trip for each step of client-server interaction.

Starting point is 00:12:29 And there's also a where clause. So MongoDB has a nice query syntax with a bunch of interesting query operators, and it does have index support and all that sort of stuff. But if our query syntax doesn't quite express things the way you need to, you can use arbitrary JavaScript. So you can pass a where clause that will get evaluated against all of your documents

Starting point is 00:12:51 and decide which ones get returned. So I guess in both of those cases, that would be passing JavaScript from another language binding like Ruby or Python. Any support for, like, a Node.js type of setup where you would call Mongo directly from server-side JavaScript? Yeah, so there are some people who are working on a Node.js integration layer.

Starting point is 00:13:14 Actually, Eliot has pulled some of the internal V8 code out of the shell and made it into a standalone V8 driver. But it's a little bit tricky to integrate that with Node because Node.js expects everything to be asynchronous. So I think there are some people working on that. I'm not sure how far along that is.

Starting point is 00:13:37 But yeah, that's definitely an interesting way to go as well. And another server-side thing that depends on the JavaScript is MapReduce. So MongoDB has relatively recently added support for full map-reduce, and you express these map and reduce functions in JavaScript.
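As a rough picture of the map/reduce model, here is a minimal in-memory sketch in Python. It is not MongoDB's implementation: on the server, the map and reduce functions are written in JavaScript, while the `map_reduce`, `count_tags`, and `sum_counts` names and the `tags` field below are hypothetical. The shape is the same, though: map emits key/value pairs for each document, and reduce folds every value emitted for a key into a single result.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Document = Dict[str, Any]


def map_reduce(
    docs: Iterable[Document],
    mapper: Callable[[Document], Iterator[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Toy map/reduce: group emitted values by key, then reduce each group."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        # The mapper stands in for a JavaScript map function: it "emits"
        # zero or more (key, value) pairs per document.
        for key, value in mapper(doc):
            grouped[key].append(value)
    # The reducer folds all values emitted for one key into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}


def count_tags(doc: Document) -> Iterator[Tuple[str, int]]:
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_counts(key: str, values: List[int]) -> int:
    return sum(values)


posts = [
    {"title": "Hello", "tags": ["mongodb", "nosql"]},
    {"title": "Drivers", "tags": ["mongodb", "python"]},
    {"title": "Untagged"},
]
print(map_reduce(posts, count_tags, sum_counts))  # {'mongodb': 2, 'nosql': 1, 'python': 1}
```

One property the real feature relies on, which this toy does not exercise, is that reduce may be applied again to its own partial outputs (for example, when a key's values are reduced in batches), so its return value should have the same shape as the values it receives. Summing counts satisfies that.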

Starting point is 00:13:56 Right. Yeah, those are nice. Are those new in 1.1 or 1.2? They appeared sometime in the 1.1 cycle, probably 1.1.2 or so, but they are in 1.2 now, and 1.2 is the latest stable release, which came out last week. So I was in a conversation with Michael Bly on Twitter this afternoon about views in Mongo. I'm not sure if you saw that one. Any plans to store saved views in Mongo à la Couch's implementation?

Starting point is 00:14:28 Yeah, so that's an interesting point. So the way CouchDB works, you do queries in Couch through MapReduce views. And basically, in CouchDB, the MapReduce thing is custom index building, whereas in MongoDB, our MapReduce support is more for aggregation and that sort of thing. And it's real-time, right? Yeah, right. So in CouchDB, you specify a MapReduce function

Starting point is 00:14:55 to do your queries, pretty much. And so as you're inserting documents, that view is getting updated to maintain an index. Basically, it's a custom index. And so the equivalent thing in Mongo would be if we supported some sort of custom indexing. And I think that's probably on the roadmap. I don't know. There's a lot of things on the roadmap right now.
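The distinction between a view that is maintained as documents are written (the CouchDB model described here) and a query evaluated at read time can be sketched with a toy in-memory store. Everything below (`ViewedStore`, `define_view`, `put`, `query_view`, `find`, and the field names) is hypothetical and illustrates the idea only; it is not the API of either database.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Document = Dict[str, Any]
MapFn = Callable[[Document], Iterable[Tuple[Any, Any]]]


class ViewedStore:
    """Toy store whose views are custom indexes refreshed on every write."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self.view_fns: Dict[str, MapFn] = {}
        # view name -> emitted key -> list of (doc_id, emitted value)
        self.view_rows: Dict[str, Dict[Any, List[Tuple[str, Any]]]] = {}

    def define_view(self, name: str, map_fn: MapFn) -> None:
        self.view_fns[name] = map_fn
        self.view_rows[name] = defaultdict(list)
        for doc_id, doc in self.docs.items():  # index existing documents once
            self._index(name, doc_id, doc)

    def _index(self, name: str, doc_id: str, doc: Document) -> None:
        for key, value in self.view_fns[name](doc):
            self.view_rows[name][key].append((doc_id, value))

    def put(self, doc_id: str, doc: Document) -> None:
        # Write path: drop rows the previous version emitted, then re-map.
        for rows in self.view_rows.values():
            for key in list(rows):
                rows[key] = [row for row in rows[key] if row[0] != doc_id]
                if not rows[key]:
                    del rows[key]
        self.docs[doc_id] = doc
        for name in self.view_fns:
            self._index(name, doc_id, doc)

    def query_view(self, name: str, key: Any) -> List[Tuple[str, Any]]:
        # Read path for a view: just a lookup, since the work happened on write.
        return list(self.view_rows[name].get(key, []))

    def find(self, **criteria: Any) -> List[str]:
        # Dynamic query evaluated at read time; this toy does a full scan.
        return [doc_id for doc_id, doc in self.docs.items()
                if all(doc.get(k) == v for k, v in criteria.items())]


store = ViewedStore()
store.define_view("by_author", lambda d: [(d["author"], d["title"])])
store.put("p1", {"author": "mike", "title": "BSON"})
store.put("p2", {"author": "wynn", "title": "Octopussy"})
store.put("p1", {"author": "mike", "title": "BSON, revised"})
print(store.query_view("by_author", "mike"))  # [('p1', 'BSON, revised')]
print(store.find(author="wynn"))              # ['p2']
```

The trade-off is where the cost lands: every `put` re-runs each view's map function, while `query_view` is a plain lookup; `find` costs nothing on writes but has to examine documents whenever it runs.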

Starting point is 00:15:20 So one thing that we're pushing pretty heavily on is sharding. So we support auto-sharding now. It's in alpha. So the database supports full replication. That's stable. But the auto-sharding stuff is, you know, to allow for this sort of infinite horizontal scalability. That's in alpha right now.
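Auto-sharding spreads a collection across machines by partitioning it into ranges of a shard key. Assuming simple range partitioning, the routing decision can be sketched as a binary search over sorted split points. The `route` function, the split points, and the shard names below are hypothetical; the real system also has to split ranges as they grow and migrate them between shards, which this sketch leaves out.

```python
import bisect
from typing import Sequence


def route(key: str, split_points: Sequence[str], shards: Sequence[str]) -> str:
    """Return the shard that owns `key` under simple range partitioning.

    `split_points` must be sorted. Shard i owns keys in
    [split_points[i-1], split_points[i]), with the first and last ranges
    open-ended, so there is exactly one more shard than split points.
    """
    if len(shards) != len(split_points) + 1:
        raise ValueError("need exactly one more shard than split points")
    # bisect_right counts the split points <= key, which is the range index.
    return shards[bisect.bisect_right(split_points, key)]


splits = ["g", "p"]                      # hypothetical boundaries on a username key
shards = ["shard0", "shard1", "shard2"]  # hypothetical shard names
for user in ["adam", "mike", "wynn"]:
    print(user, "->", route(user, splits, shards))
```

Because neighboring keys land on the same shard, a query over a range of the shard key only needs to visit the shards whose ranges overlap it.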

Starting point is 00:15:40 So we're really pushing on getting that to be more stable. And there's a bunch of other things we're working on as well right now. Big things like concurrency, better support for concurrency, some durability stuff. So I'm not sure when we'd expect to see custom index building, but it's certainly a possibility at some point. And that may be a feature left to the ORM drivers out there, just to be able to take those MapReduce functions, compile them down, and save them, just for convenience's sake, so that the developer doesn't have to keep up with them.

Starting point is 00:16:14 Oh, well, you can save JavaScript to the server side and call it. So you can store JavaScript functions on the server side. I think the difference between that and something like CouchDB's views is that those views are updated on writes. So it's more like an index than a special type of query. So we'd really need to have something equivalent. We'd really need to support custom index building. And we found that in general, you can build indexes, you can specify compound indexes and indexes on embedded documents, and we have a pretty rich query language as well. And so queries in MongoDB are a little bit more traditional,

Starting point is 00:16:59 a little bit more like you're used to with an RDBMS. So they're dynamic queries. And like I said, you specify indexes manually. And I think we found that that resonates pretty well. So I don't think there's too, too much of a need for this sort of custom view thing, but it'll be a possibility further down the line, I think. You know, one of the interesting aspects of how you guys store data in Mongo is, I believe this is the correct pronunciation, Bison, B-S-O-N. Is that right? Yeah, so I've been saying it B-S-O-N, and around here we've been saying it B-S-O-N,

Starting point is 00:17:30 but I think that's probably open to interpretation. So B-S-O-N. So that's binary serialized object notation, is that right? Right. So B-S-O-N is, it stands more or less for binary JSON. So I'm not a linguist. I don't know if we're committing serious fouls there in terms of that abbreviation, but it stands for binary JSON. And so what BSON is, is this serialization format that we've defined and all of our drivers can

Starting point is 00:17:59 serialize to and from BSON. And it's pretty much a serialization of a superset of JSON. So it's JSON, plus we support some additional types, like separate types for floating-point numbers and for integers, plus a date type and a regex type, both of which are very useful if you're building a database. And JSON doesn't have anything like those.

Starting point is 00:18:26 So it's slightly a superset of JSON, but it's a binary encoding. So it's lightweight, and there's some stuff in there to make it fast and easy for the database to traverse. So what happens is that the driver takes a document and encodes it to this BSON format and sends it to the database. And the cool thing is that that's already a format that the database understands. So it pretty much just takes that data and writes it right to disk. And that's one thing that allows MongoDB to be so fast. And then the database understands that format. So it's able to reach inside and do operations on embedded documents and build indexes and all that sort of good stuff.
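To make "binary JSON" concrete, here is a minimal encoder for a handful of BSON element types, written against the published BSON specification. It is a teaching sketch, not PyMongo's `bson` module: the function names are hypothetical, and it covers only doubles, strings, embedded documents, arrays, booleans, UTC datetimes, null, and 32/64-bit integers (the regex type mentioned above is omitted). Every document starts with its total length in bytes, which is part of what lets the server skip past fields it does not need.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _cstring(s: str) -> bytes:
    data = s.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("BSON cstrings cannot contain NUL bytes")
    return data + b"\x00"


def _element(name: str, value: Any) -> bytes:
    """Encode one field as: type byte, NUL-terminated name, value bytes."""
    key = _cstring(name)
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        return b"\x12" + key + struct.pack("<q", value)      # int64
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)      # 64-bit double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # String: int32 byte length (including the trailing NUL), then the bytes.
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, datetime):
        if value.tzinfo is None:  # assumption for this sketch: naive means UTC
            value = value.replace(tzinfo=timezone.utc)
        millis = (value - _EPOCH) // timedelta(milliseconds=1)
        return b"\x09" + key + struct.pack("<q", millis)     # ms since the epoch
    if value is None:
        return b"\x0a" + key                                  # null carries no value bytes
    if isinstance(value, dict):
        return b"\x03" + key + encode(value)                  # embedded document
    if isinstance(value, (list, tuple)):
        # Arrays are encoded as documents keyed "0", "1", "2", ...
        return b"\x04" + key + encode({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode(doc: Dict[str, Any]) -> bytes:
    """Encode a dict as a BSON document: int32 total length, elements, NUL."""
    body = b"".join(_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The printed bytes match the `{"hello": "world"}` example in the BSON specification: a 22-byte (0x16) document holding a single string element.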

Starting point is 00:19:05 Have you actually built anything with Mongo, or is it primarily an internal project that you're working on? So the stuff that I've been building has been primarily internal stuff. But, yeah, I've been eating my own dog food a little bit, and it's pretty nice. I think that people... So like I said, there are two reasons I think people are sort of jumping into these non-relational databases. And one is the promise of scalability, which is a big one.

Starting point is 00:19:35 But the other is flexibility. And I think that working with these as a developer, and for the people listening out there, you should go ahead and go to the MongoDB site, download it, and go through the tutorial, because I think you'll find that in a lot of cases it can be a lot more flexible and fun to work with and easier to work with than a relational database. So there are more reasons to use them than just performance and scalability. You know, the flexibility also introduces, I wouldn't say problems,

Starting point is 00:20:04 but challenges. I've used Couch and Mongo, and from discussing it with colleagues, you really have to kind of rethink how you model the data in your application. Have you found the same? Yeah, so certainly you do. And I think that's both an advantage and a disadvantage. So one thing that's interesting about data in MongoDB is the notion of embedded documents.

Starting point is 00:20:29 So documents are what we call these objects that you're storing in the database, which are more or less JSON-like. So in Ruby, it's a hash. In Python, it's a dictionary. In JavaScript, it's an object. So it's not just a first-level thing, though. So in a relational database, if you were working on a blog, for example, you'd probably have a table for posts and a table for

Starting point is 00:20:56 comments. And when you wanted to get a post and its comments to display on a page, you'd do a join. And in something like MongoDB, where you can store embedded documents, one good way to represent that relationship would be to actually take those comment documents and actually embed them right within the post itself. And so that allows you to go ahead and get a post with all of its comments. And it's all coming from the same place, and it's all a single document. And so you're going to see significant performance increases by doing that versus doing a join. And in some cases, it can also be easier to work with, to use these embedded documents.
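The two blog layouts can be compared with plain Python structures standing in for tables and documents. The field names (`post_id`, `comments`, and so on) and the helper functions are hypothetical, and the "join" is a hand-rolled stand-in for what a relational database would do; the point is only that the embedded layout returns a post together with its comments in a single lookup.

```python
from typing import Any, Dict, List

# Relational-style layout: two "tables" linked by post_id.
posts_table: List[Dict[str, Any]] = [
    {"id": 1, "title": "Intro to MongoDB"},
]
comments_table: List[Dict[str, Any]] = [
    {"id": 10, "post_id": 1, "author": "wynn", "text": "Nice!"},
    {"id": 11, "post_id": 1, "author": "adam", "text": "+1"},
]


def post_with_comments_joined(post_id: int) -> Dict[str, Any]:
    """Emulate a join: fetch the post, then every comment pointing at it."""
    post = next(p for p in posts_table if p["id"] == post_id)
    comments = [
        {"author": c["author"], "text": c["text"]}
        for c in comments_table
        if c["post_id"] == post_id
    ]
    return {**post, "comments": comments}


# Document-style layout: the comments live inside the post document.
posts_collection: Dict[int, Dict[str, Any]] = {
    1: {
        "id": 1,
        "title": "Intro to MongoDB",
        "comments": [
            {"author": "wynn", "text": "Nice!"},
            {"author": "adam", "text": "+1"},
        ],
    },
}


def post_with_comments_embedded(post_id: int) -> Dict[str, Any]:
    """One lookup returns the post and its comments together."""
    return posts_collection[post_id]


print(post_with_comments_joined(1) == post_with_comments_embedded(1))  # True
```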

Starting point is 00:21:36 So it does create, I don't think problems is the right word, but there are certainly some things you have to think about, which is when does it make sense to embed documents versus referencing other documents in a different collection and doing more of a join type of thing. And there are certainly cases where each makes sense. So there are some different sets of things you need to think about in terms of designing your schema, or whatever you might call it. You know, early on when I was working with Mongo,

Starting point is 00:22:11 I found myself developing, I guess, wide schemas rather than deep schemas, based on whether or not I needed to return a particular type as a top-level object itself. But with MapReduce, you guys have kind of muddied the waters even more because now I get kind of the best of both worlds. Can you talk about how long it took to develop MapReduce and any challenges that you came across in developing that feature? Yeah, so it didn't take too long to have a basic implementation going. I think Eliot has been the one primarily working on the MapReduce stuff.

Starting point is 00:22:46 And it didn't take too long because we already had the JavaScript interpreter embedded, and we already had a mechanism for sending commands to the database and all that sort of stuff. So it was more, I think, coming up

Starting point is 00:23:02 with a model that we're going to use for MapReduce. And then there's been some work making sure that things perform well. So MapReduce as it is right now is more of an offline thing. You wouldn't be running a MapReduce job as a simple query that you're using to generate a response to a page instantaneously, in real time. The way it is right now, it would be more like: every couple of minutes, do a MapReduce job, generate some results, and then use those results to respond to later queries. So that's been the model that we're working with now. And so I think some of the

Starting point is 00:23:46 difficulties are getting MapReduce right in a sharded environment. One of the good things about MapReduce is that it's possible to do in a sharded environment, versus something like group, which is a little bit more difficult to do. And so getting that right is certainly a problem. And then performance has been something that we've been working on with that as well. Two of my favorite features of MongoDB regarding updates are upserts, which are really, really nice. You specify the key and then a hash of values,

Starting point is 00:24:17 and it does one fire-and-forget update or insert. And then the other are the modifier operations: $set, $inc, $push, $pushAll. How did those come about as features? Do you guys just develop to scratch your own itch, or how do features get, I guess, developed into the framework? Yeah, so upserts and the update modifiers. So I'll introduce those a little bit more

Starting point is 00:24:42 for people who might not be familiar with them, but MongoDB supports an update operation. And one option when you do an update is to do an upsert, which says if you can't find a document to update, then go ahead and create this new document instead. And like you said, that can be really nice for doing a fire and forget insert or update. And then the other thing that you mentioned are these atomic operators. So we support a bunch of different atomic operators for updates, like increment, set, append to an array, a bunch of different things. And those can be really nice too.

Starting point is 00:25:17 So for doing something like real-time analytics, if you have some document and you want to increment a counter, you can just send a single update operation. You don't need to go get the document, modify it, and save it back; you just send the increment. And so those are very useful as well and allow for some good performance benefits. And those have been around for a while. I mean, we've been adding more modifiers as time goes on.
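Upserts and update modifiers are easiest to see against an in-memory model. The sketch below applies `$set`, `$inc`, `$push`, and `$pushAll` to plain dictionaries and, when nothing matches and `upsert=True`, inserts a new document seeded from the query. The `update` and `apply_modifiers` functions and the exact-match query are hypothetical simplifications, not the server's implementation; against a real server you would send the same modifier document through a driver (in current PyMongo, for example, `update_one(filter, update, upsert=True)`), and the server applies it to the matched document atomically.

```python
import copy
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]
Modifiers = Dict[str, Dict[str, Any]]


def apply_modifiers(doc: Document, modifiers: Modifiers) -> None:
    """Apply a few MongoDB-style update modifiers to `doc` in place (toy version)."""
    for op, fields in modifiers.items():
        for field, value in fields.items():
            if op == "$set":
                doc[field] = value
            elif op == "$inc":
                doc[field] = doc.get(field, 0) + value   # a missing field counts as 0
            elif op == "$push":
                doc.setdefault(field, []).append(value)  # append one element
            elif op == "$pushAll":
                doc.setdefault(field, []).extend(value)  # append several elements
            else:
                raise ValueError(f"unsupported modifier: {op}")


def update(collection: List[Document], query: Document, modifiers: Modifiers,
           upsert: bool = False) -> Optional[Document]:
    """Modify the first document whose fields equal `query`; optionally upsert."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in query.items()):
            apply_modifiers(doc, modifiers)
            return doc
    if not upsert:
        return None  # no match and no upsert: nothing happens
    # Upsert: seed the new document from the query's fields, then apply modifiers.
    new_doc = copy.deepcopy(query)
    apply_modifiers(new_doc, modifiers)
    collection.append(new_doc)
    return new_doc


# Real-time analytics style counter: the caller never reads the document first.
pages: List[Document] = []
for _ in range(3):
    update(pages, {"url": "/mongodb"}, {"$inc": {"views": 1}}, upsert=True)
update(pages, {"url": "/mongodb"}, {"$push": {"referrers": "twitter"}})
print(pages)  # [{'url': '/mongodb', 'views': 3, 'referrers': ['twitter']}]
```

The first call finds nothing and inserts `{"url": "/mongodb", "views": 1}`; the next two calls match that document and only send increments, which is the fire-and-forget pattern described above.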

Starting point is 00:25:42 But those have been around for, I think, at least as long as I've been working on the project. So I'm not sure who came up with them or who to give credit to for them. But certainly MongoDB as a whole, the thought process behind it comes from the experiences that our founders have had with developing large infrastructure.

Starting point is 00:26:03 So our CEO, Dwight, was one of the co-founders of DoubleClick and worked on the ad-serving architecture there. And Eliot, who's our CTO, was a co-founder of ShopWiki and has done a ton of stuff there as well. So both of them have plenty of experience with developing large infrastructure. And so I think that part of MongoDB has been to sort of scratch what their issues were with developing that infrastructure. You know, one of the things that I really liked about using CouchDB was Futon, the built-in admin interface that it supports. What's the state of GUI tools for Mongo, and are you guys working on anything or just leaving it to the community?

Starting point is 00:26:46 Yeah, so that's a good question. I think that up until recently, we've sort of been hoping for somebody from the community to take charge of a project like that and head it up. So MongoDB does support some administrative tools like the shell, and we have a basic web console, which can be very useful for debugging. And when you run the database, that starts by default as well. But like you say, we don't have a nice sort of GUI tool that does all the things that you might want, let you inspect your database and add data and do all that sort of stuff. But I think our feeling now is that maybe we'll have to get a project like that started

Starting point is 00:27:31 and sort of put some momentum behind it and then hope that we get some community involvement that way. Because there's been a few projects from the community that have been pretty good attempts or pretty good steps in the right direction in terms of that, but I don't think there's anything that's been really solid and a really great UI. Especially once we get things like sharding out there, it'd be nice for an admin tool to support some of the sharding layouts

Starting point is 00:28:01 and that sort of stuff as well. I think it might end up being that we need to sort of put some momentum behind that and see where the community wants to take it afterwards. Would, I guess, a more RESTful interface on top of Mongo built into the server kind of facilitate that? I think it might. Part of the problem there is that if you're just using it over a REST layer, then you have to manage permissions and authentication and stuff that way as well. Like you said, there is a REST layer in the default Mongo server now,

Starting point is 00:28:46 but it's pretty simplistic, and I'm not sure it's quite ready for something like this to be built on top of it. And I think we think that going forward, the right model is to build a nice REST layer in one of the client languages, like Python or Ruby or PHP or whatever and talk to the database through underlying calls in the driver

Starting point is 00:29:09 and then implement the REST layer in one of these other languages rather than implementing it in C++. So I think that would probably be the model that we would recommend. And that might be a part of this admin project or the UI could just talk to one of the drivers directly. I think either way has its advantages and disadvantages. Mike, could you talk a minute about, I guess, the different languages that have bindings for MongoDB

Starting point is 00:29:39 and what sort of traction you're getting in each community? Yeah, sure. So I'm going to pull up the drivers page right now just to make sure that I don't miss any. But obviously we support Ruby and Python. That's what I work on for the most part. We have a PHP driver, a Perl driver, a Java driver, and C++. We also have a standalone C driver that was recently released, and that hasn't had too, too many eyes on it, so we're hoping to get some people from the community to start using that

Starting point is 00:30:16 and recommend directions to take with that. And we also have that JavaScript driver that I mentioned. So those are the ones that are sort of supported by 10gen. And all of those have seen a good amount of traction. I think Ruby has probably seen the most in terms of community interaction, but certainly PHP, Python, Java, and Ruby have all seen a ton of users and a ton of stuff. And actually, Perl has seen a good amount of usage as well. There are some people using the C++ driver, and hopefully we'll get some people using the C driver for things like web server extensions and that sort of stuff.

Starting point is 00:30:58 I have an Nginx module for MongoDB's GridFS that I wrote, and I'm hoping to port that to the C driver when I get a chance. And then we have a ton of community-supported drivers. So there's a C#/.NET driver, ColdFusion, Erlang, Factor, F#, Go, Groovy, PowerShell, and a couple of other ones as well. So there's been a lot of work from the community as well in terms of adding support for these different languages. Very cool.

Starting point is 00:31:30 Hey, I know it's been a while since I've actually chimed in here. Wynn's been mostly driving this thing. That's because I'm just an excited fanboy. That's true, that's true. Something I'm curious about: it seems like, you know, 10gen was developing this cloud computing platform, and then they spun it off into just being MongoDB-focused. As a company, though, just focusing on MongoDB,

Starting point is 00:31:53 how do you guys get the word out about new things that are happening with MongoDB, and how do you interact with the community? Yeah, so I think one way that has sort of dominated has been through Twitter. So a lot of the way that we sort of track what the community is talking about has been through Twitter searches for MongoDB, and that actually works very well. For those of you working on open source projects, that's a great way to get some feedback, because people are out there talking about it whether or not they're talking to you. So that's worked really well.

Starting point is 00:32:29 We also have a Google group that we use for doing support and that sort of stuff. So that gets a lot of traction. We have an IRC room on Freenode, #mongodb, and there tend to be people in there at all hours of the day and night. So for quick questions, that's a good way to go about getting them answered. But in terms of community, I mean, I think the keys have really been just paying attention to sort of these back channels, mainly Twitter, and then getting out there and talking about it. So we've also done, I think, a pretty good job of getting out to conferences.

Starting point is 00:33:10 And people like Wynn and others from the community have also done a good job of getting out there and talking about MongoDB at conferences and meetups and stuff. And I think that's been really good as well. I'm curious, though. I didn't hear github.com mentioned at all in that. Yeah, so all of the projects are hosted on GitHub, and that's been great, too. So that makes it really quite easy for people to contribute back to the projects.

Starting point is 00:33:39 So to contribute to any of the MongoDB projects, it's pretty much fork and pull request, and we'll take a look at your commit and merge it back into the mainline. And that's been really good as well. Do you get a lot of contributions that way, or has it been pretty much just you guys? No, we've seen a good amount of community contributions. I think contributions to the core server have probably been coming mainly from within 10gen.

Starting point is 00:34:10 There have certainly been some people who've done things like packaging and Debian scripts for the server, that sort of stuff, and contributed those. But there haven't been too, too many outside contributors really getting into the nitty-gritty of the server. But certainly on the drivers, we've had a ton of contributions from the community. It's been really great, actually. And not only on the drivers themselves, but also on additional tools built around them. So one example is in Ruby, there's this project called MongoMapper

Starting point is 00:34:43 that John Nunemaker started, and that's been really great. That's basically an object mapper that's built on top of the lower-level Ruby driver. And people seem to really like it. So with things like that, we've seen a ton of community development going on. Is there any equivalent to MongoMapper in the Python community? Yeah, so there's a couple, actually, that have been started. The big one that's been around for a while is MongoKit.

Starting point is 00:35:15 And these are listed, for those of you following along at home, if you go to the Python page, which is api.mongodb.org slash python, and you click on the tools link, there's a list of tools that have been built around the Python driver. And I think the big one up until now has been MongoKit, which is a similar type of thing, a framework that provides validations and that sort of stuff

Starting point is 00:35:40 on top of PyMongo, which is the Python driver. And another interesting one to look at was just announced in the past couple of weeks, and that's called Ming, and that was released by the SourceForge people, actually. So SourceForge was one of the really early adopters of MongoDB, and they developed this Python library as part of that, and so they've open sourced it. And I haven't gotten to play with it yet, but I've looked through the source and looked through the docs, and that looks really nice.

Starting point is 00:36:11 So it'll be interesting to see if people start to pick up on that going forward. You know, one of the questions that was posed to the Changelog for you, Mike, was: any plans for full-text search in MongoDB? Yeah, so there's a Jira ticket open. We use Jira for our bug tracking. There's a ticket open right now for full-text search.

Starting point is 00:36:36 And I think the status of that now is still sort of gathering ideas from the community and seeing exactly what the right model is going forward. I think, well, one thing to note is that in terms of basic full-text search, MongoDB has this built-in feature called multi-key indexing. So if you have an array and you create an index on that array, that index will actually be keyed on each element of the array. So for doing things like getting all documents

Starting point is 00:37:09 that have a certain tag or something like that, you can make those queries really fast, and that's really nice. You can do some basic full-text search like that. I think that's actually how Business Insider, which is a site that runs on MongoDB, does their search. But in terms of more general-purpose, advanced full-text search, my guess is that the model will be something along the lines of having some basic support built into MongoDB for fairly simplistic full-text search, and then making sure that integration with

Starting point is 00:37:41 tools like Sphinx or Lucene or whatever else is really nice and really easy. And like I said, there's a ticket open now where people are sort of going back and forth on what the right model is. But I imagine we'll see something like that. You mentioned earlier that you guys are hiring at MongoDB. What sort of skills would one need to join the team? Well, I think the best way, if people are interested,

Starting point is 00:38:08 I think the email address is jobs at 10gen.com. So if you're interested, you can send stuff that way. But I think the best, really the best way to impress us and to make an impact would be to look at

Starting point is 00:38:24 the code that's out there. Like I said, it's all on GitHub and it's easy to contribute to. Find a bug or a feature that you'd like to see, make a fix or implement the feature, and send us a pull request. And I think

Starting point is 00:38:40 that's probably the best way to show that you're actually interested, to find out if the job would work for you, and for us to see if you would work for the job, I guess. The open source job interview. I like it. Right. Yeah, it's perfect. This is the spot where we ask: what's on your open source radar? What open source projects out there, other than the one that you're working on, have got you most excited?

Starting point is 00:39:13 So I'm really interested in languages. Some of these new JVM languages, like Scala and Clojure, are interesting to me, and I tend to track their development. In terms of our space, there are a bunch of interesting projects going on in the NoSQL space. If you ask me, I think MongoDB is the most interesting.

Starting point is 00:39:39 But there are other projects too, like Cassandra, CouchDB, Redis, etc., that are all interesting and worth a look. But yeah, open source is moving fast, so there's only going to be more cool stuff in the future, I think. Well, it's been a wild ride in 2009. I think 2010 is going to bode well for MongoDB adoption, with other services I see cropping up, like MongoHQ and some others. So hopefully you guys will have continued success. Yeah, hopefully.

Starting point is 00:40:11 Well, that's been it. It's been a wild ride, and we thank you for joining us. Adam, you have any questions? No, just thanks for taking your time to have a good time with us on the show and answer some questions. I know that a lot of the stuff you talk about is going to benefit the open source community. And that's, uh,

Starting point is 00:40:26 that's the aim here. Yeah. Thanks guys. Uh, I think, I think it's great what you guys are doing with the show. So it was, uh,

Starting point is 00:40:32 quite an honor to come on and get to chat with you guys. Awesome. Thank you. And you know what, this is, I don't think we mentioned it since we're going to put it in the intro. This is episode 007. So that shows you how cool you are.

Starting point is 00:40:42 Oh yeah. Perfect. It's perfect. 007, baby. All right. Thanks, Mike. Yep. Thanks, guys.

Starting point is 00:40:54 Thank you for listening to this edition of The Changelog. Be sure to tune in weekly for what's fresh and new in open source. Also, visit thechangelog.com to follow along, subscribe to the feed and more. Thank you for listening.
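Around 00:36:36 in the interview, Mike describes multi-key indexing: an index on an array field gets one entry per array element, which makes tag lookups, and basic keyword search, fast. The sketch below is a toy, in-memory Python model of that idea under stated assumptions, not MongoDB's actual index implementation; the `MultikeyIndex` class, the `keywords` tokenizer, the `_keywords` field, and the sample documents are all hypothetical. With the real server, you would create an ordinary index on the array field (in PyMongo, `collection.create_index("tags")`), and MongoDB treats it as multi-key automatically once it sees array values.

```python
import re
from collections import defaultdict


class MultikeyIndex:
    """Toy model of a multi-key index (hypothetical): value -> document ids."""

    def __init__(self, field):
        self.field = field
        self.entries = defaultdict(set)  # indexed value -> set of document ids

    def add(self, doc_id, doc):
        value = doc.get(self.field)
        # An array contributes one index key per element; a scalar
        # (or missing) value contributes a single key.
        values = value if isinstance(value, list) else [value]
        for v in set(values):
            self.entries[v].add(doc_id)

    def find(self, key):
        # Equality lookup: ids of documents whose field contains `key`.
        return sorted(self.entries.get(key, set()))


def keywords(text):
    """Hypothetical tokenizer: lowercase alphanumeric words, de-duplicated."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


docs = {
    1: {"title": "Intro to MongoDB", "tags": ["mongodb", "nosql"]},
    2: {"title": "Redis vs MongoDB", "tags": ["redis", "nosql"]},
    3: {"title": "Scala on the JVM", "tags": ["scala", "jvm"]},
}

# Precompute a keyword array per document so it can be indexed like tags.
for doc in docs.values():
    doc["_keywords"] = keywords(doc["title"])

tag_index = MultikeyIndex("tags")
kw_index = MultikeyIndex("_keywords")
for doc_id, doc in docs.items():
    tag_index.add(doc_id, doc)
    kw_index.add(doc_id, doc)

print(tag_index.find("nosql"))   # [1, 2]
print(kw_index.find("mongodb"))  # [1, 2]
print(kw_index.find("jvm"))      # [3]
```

Storing a precomputed keyword array alongside each document is one way to get the "basic full-text search" Mike mentions. It has no stemming, ranking, or phrase matching, which is the gap that tools like Sphinx or Lucene are meant to fill.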


The Changelog: Software Development, Open Source - 10gen and MongoDB (Interview)

Mike Dirolf joined the show to talk about how MongoDB came about, design decisions, and the future of this cool NoSQL server....
