The Changelog: Software Development, Open Source - There and back again (Dgraph's tale) (Interview)

Episode Date: November 9, 2018

This week we talk with Manish Jain about Dgraph, graph databases, and licensing and re-licensing woes. Manish is the creator and founder of Dgraph and we talked through all the details. We covered what a graph database is, the uses of a graph database, and how and when to choose a graph database over a relational database. We also talked through the hard subject of licensing/re-licensing. In this case, Dgraph has had to change their license a few times to maintain their focus on adoption while respecting the core ideas around what open source really means to developers.

Transcript
Starting point is 00:00:00 Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. We move fast and fix things here at Changelog because of Rollbar. Check them out at rollbar.com and we're hosted on Linode servers. Head to linode.com slash changelog. This episode is brought to you by our friends at Rollbar. Check them out at rollbar.com slash changelog. Move fast and fix things like we do here at Changelog. Catch your errors before your users do with Rollbar. If you're not using Rollbar yet or you haven't tried it yet, they have a special offer for you. Go to Rollbar.com slash changelog. Sign up and integrate Rollbar to get $100 to donate to open source projects via Open Collective.
Starting point is 00:00:42 Once again, Rollbar.com slash changelog. Welcome back. You are listening to the Changelog, a podcast featuring the hackers, the leaders, and the innovators of software development. I'm Adam Stachowiak, Editor-in-Chief here at Changelog. Today, Jared and I are talking to Manish Jain about Dgraph, graph databases, licensing, and relicensing woes. Manish is the creator and founder of Dgraph. We talk through all the details, what a graph database is, the uses of a graph database,
Starting point is 00:01:23 and how and when to choose a graph database over a relational database. We also talked through the hard subject of licensing and relicensing. In this case, Dgraph has had to change their license a few times to maintain their focus on adoption while respecting the core ideas of what open source really means. So we have a two-pronged episode, two for the price of one today, and the price of one is zero or free, so really, really lucking out. We're here to talk first about Dgraph, which is the world's most advanced graph database, according to Dgraph.io. And then we also are going to talk about some licensing and some re-licensing woes.
Starting point is 00:02:12 Some of the stuff that open source developers and popular projects have to go through, but are kind of the difficult, weedy, like, how do we do this? How do we re-license if we change our mind? And Manish has done all that with Dgraph. It's gone through a few different iterations of licensing. And so he's here to tell us that story. So, Manish, thanks for coming on the Changelog. Thanks for having me, guys.
Starting point is 00:02:37 And we should probably give a shout out to Ping because this is an episode that started in Ping. If you've never heard of our Ping repo, it's on GitHub at thechangelog slash ping. Hop in there, give us your thoughts on what shows we should do. This one was actually opened up by an epic transcriber, Horst Rutter. Adam, you know Horst. Yes, he has been faithfully transcribing our, or not transcribing, but fixing our transcripts, improving our transcripts.
Starting point is 00:03:05 The unintelligibles are missing. They are missing when it's unintelligible. Horst has gone in there and corrected them. He has done a ton and we appreciate that. And he was interested in hearing about some of the decisions and some of the process of how you change your license from one to another. And then a follow-up to that was Vespertilian. Oh, that was bad. Vespertilian, a.k.a. Cameron,
Starting point is 00:03:33 which is probably the real name, who pointed us at Dgraph as a user of Dgraph and one who had watched the Commons Clause license and the Apache 2.0 and the AGPL and all of this over the last, I don't know, six to eight months happening over at Dgraph. He said that this would be a good project to kind of focus on that conversation. So thanks to those two for being a part of our community and thanks for suggesting this and getting us hooked up with Manish.
Starting point is 00:04:02 So with that out of the way, Manish, let's talk about Dgraph. Tell us about this project, where it came from, how long it's been around, what you're up to with it. Sure. Maybe I can start with my own journey a bit before I get into Dgraph. So I used to work at Google in Mountain View, California for six and a half years working in the web search infrastructure team where we were dealing with real-time distributed systems. In fact, we built an incremental indexing system and launched that in 2010, got an OC award for it. And basically what that did was to reduce the latency that it
Starting point is 00:04:38 takes for a web page to go from the first time we crawl it to the first time a user sees it on google.com from four days to a few hours. So that was the biggest Bigtable database installation at Google at the time, and it gave me a lot of freedom to work on real-time distributed systems. Now, back in 2010, after we launched this thing, I started looking around and seeing, hey, what else could I sink my teeth into? And it turns out Google had acquired Metaweb, which is the company which brought Knowledge Graph to Google. So the Knowledge Graph that we hear about these days came from Metaweb.
Starting point is 00:05:23 And I started a couple of projects there. One of the projects was to unite all structured data at Google. That was all the, what we call OneBoxes. So that would be weather and events and movie showtimes and flights, et cetera, and the Knowledge Graph into a single graph indexing and serving system. And that was a big challenge, obviously. We didn't have a graph serving system at Google. We had a web search index serving system, but not a graph one. And so along with a few other tech leads, one was in India, one was in San Francisco, and I was in Mountain View, we started this project to build something which would be able to do arbitrary depth
Starting point is 00:06:09 joins and would do traversals and do them in sub-second latency. In fact, we had a limit on how much latency it can have because if the system does not respond to a web search request internally, that search would just move on and would not surface anything interesting from the knowledge graph. So I was involved in that. And while building that, we obviously put together all the research that Google had done at that time. And I got to learn a lot. So I left Google in 2013, moved from the US to Australia, had some family reasons to move. And around 2015, I remember being involved in a freelancing gig where this person is like, hey, can we use a graph database? And I was like, well,
Starting point is 00:07:04 the existing graph databases, they are not that good. They don't scale very well. They have issues with consistency. And in general, they are just never considered primary databases. And that's what triggered me to say, hey, maybe we should build something which would be like that. I looked around. The biggest one was Neo4j, which is a single server database. In fact, the most popular one in the market, but yeah, limited by data corruption issues and performance issues. And then there were some others which were not databases, but more like graph layers. You would think of Titan DB, DataStax DSE Graph, JanusGraph, which are built on top of other distributed databases. So you put HBase below it, or you put Cassandra.
Starting point is 00:07:57 And then you put a layer above it, which again suffers on performance. You need to run multiple systems. So Dgraph really started as a way by which we could have a native graph database, which could also scale horizontally and perform with a pretty tight latency. And I used a lot of concepts that I learned back at Google. On top of that, while we were building it, we realized if you were to build a database, which has to be a primary database for big companies, it must support transactions. It must support synchronous replication.
Starting point is 00:08:34 It must provide linearizable reads. Because when you build these things into the database, applications have it a lot easier. They don't need to worry about, hey, whether I'm hitting the master or the replica. They don't have to worry about any of that. They just hit any of the servers in the cluster and they are guaranteed to get the freshest response back. So those were the ideals that we built Dgraph for. It started and launched 0.1 in December 2015. We went on to raise $3 million over the course of two years.
Starting point is 00:09:12 Launched 1.0 in December 2017. And now we are in a place where Dgraph is close to being used in production at a few big companies. And obviously we have a huge open source community. Very cool. Well, you mentioned Neo4j just in the news yesterday,
Starting point is 00:09:29 I believe they raised a Series E, the company behind Neo4j, $80 million Series E. So definitely investment interest in this space. And Neo4j has been around for quite some time. So Dgraph, you said its advantage is that it's built for distributed from the ground up, also potentially some of the technology, or that's just the timing of Dgraph in terms of it starting in 2015. Can you give some of the underlying technology, languages, or tools that you're using in the open source software and speak to that for us? So I'm a big
Starting point is 00:10:06 fan of the Go language. This was not when I was at Google; I was pretty much writing C++ there. But after I left, Python just could never stick with me, and the moment I got to know about Go, I started trying it out. And back in 2015, I think CockroachDB, another database company in New York, they had raised a Series A. I saw their stack was Go, and that immediately excited me. So Dgraph is written purely in Go. We use gRPC for communication, both internal between the cluster and for external communication from a client to the cluster. We were initially using RocksDB as the embedded key value database to put our data in.
Starting point is 00:10:55 But then we realized that when you go from Go's user space to C, Go to C++, which is what RocksDB is written in, it just causes a lot of headache. Go tools, the Go memory profilers for example, do not get to see what's happening in C land. The Go performance profilers do not get to see what's happening in C land either. So at some point, after much thought, we decided that we should just build a good sort of RocksDB alternative purely in Go. And we looked at the alternatives at the time. One was BoltDB, which was a B+ tree based key value database. And there was, obviously, LevelDB and stuff. RocksDB was already an improvement over LevelDB, so for us that seemed like not a great choice. BoltDB's
Starting point is 00:11:54 write performance, and not just BoltDB, but in general any B+ tree's write performance, is definitely always a bottleneck. So we wrote something which was based upon a new paper by University of Wisconsin-Madison, and what it did was, it took some of the negatives of LSM trees and addressed them by separating the values from the keys. So the values go into a log
Starting point is 00:12:24 and the keys go into the LSM tree. And we based our main design upon that. And it took us a while to really get it right because the paper didn't talk about all the nuances involved with having a separate value log. So that's something that we have been sort of perfecting over time. But the end result was that the performance of this thing called Badger,
Starting point is 00:12:50 it basically outperforms RocksDB on a lot of use cases. It works out pretty well for us. So we use Badger as the underlying embedded key value database. Very cool. One thing you mentioned earlier is you said that many people were using graph databases not for their primary data store, but as perhaps a secondary data store.
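The key-value separation Manish describes for Badger (values in a log, keys in the LSM tree) can be illustrated with a toy sketch in Go. This is only an illustration under loud assumptions: the type and function names (`valueLog`, `store`, `Set`, `Get`, `Keys`) are hypothetical, a plain map stands in for the LSM tree, an in-memory slice stands in for the on-disk value log, and none of this is Badger's actual API or implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// valueLog is an append-only list standing in for an on-disk value log.
// (Hypothetical type, for illustration only.)
type valueLog struct {
	entries [][]byte
}

// write appends a value to the log and returns its offset.
func (l *valueLog) write(v []byte) int {
	l.entries = append(l.entries, v)
	return len(l.entries) - 1
}

// store keeps only keys and small offsets in its index (a map standing in
// for the LSM tree), while the values themselves live in the log.
type store struct {
	index map[string]int
	log   valueLog
}

func newStore() *store {
	return &store{index: make(map[string]int)}
}

// Set appends the value to the log and points the key at the new offset.
// Older values for the same key stay in the log until some cleanup step,
// which this sketch does not implement.
func (s *store) Set(key string, value []byte) {
	s.index[key] = s.log.write(value)
}

// Get looks up the offset in the index, then reads the value from the log.
func (s *store) Get(key string) ([]byte, bool) {
	off, ok := s.index[key]
	if !ok {
		return nil, false
	}
	return s.log.entries[off], true
}

// Keys lists keys in sorted order without reading any values.
func (s *store) Keys() []string {
	keys := make([]string, 0, len(s.index))
	for k := range s.index {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := newStore()
	s.Set("b", []byte("two"))
	s.Set("a", []byte("one"))
	s.Set("a", []byte("uno")) // overwrite: index moves, old value stays in the log
	v, _ := s.Get("a")
	fmt.Println(string(v), s.Keys(), len(s.log.entries)) // uno [a b] 3
}
```

The point of the split is that the key index stays small because it holds offsets rather than full values. The price is the extra bookkeeping around stale log entries, which is presumably among the nuances Manish says the paper did not cover.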
Starting point is 00:13:12 Maybe they put their, not their relational, but their social network style data in the graph database, but maybe they have a more traditional relational database management system for their primary tables. Can you give a high level decision?
Starting point is 00:13:27 Of course, once you decide I need a graph DB, now you have to pick a graph database. You may say, OK, Dgraph or Neo4j or perhaps a proprietary option. But what about even like, do I need a graph database versus a Postgres or a MySQL? Help people with that decision. Is there a pretty simple flow you can go through in your mind to decide, is this the data store for me, especially if you're going to pick it as a primary? That is a tough question for a lot of people. MySQL and Postgres have been around for such a long time.
Starting point is 00:14:06 Literally, SQL is being taught in schools and colleges all over the world. It's hard to convince somebody who is, let's say, a Postgres fan or a SQL fan to switch to something else. So I try not to, you know, engage sort of directly or try to convince anybody to use graphs. What happens for us is, so, Postgres and MySQL are very popular with very young startups, but as they progress and they start to realize the limits of these systems, the limitations of their join power, the limitations of not being able to do recursive queries across tables and stuff.
Starting point is 00:14:51 All that code that goes into the application because database is so simple, as the company size grows, they start to hit those limitations. And at some point, a new team, a new project would be like, hey, it would be great if we had a graph database for this. It would really save a lot of work. Or hey, we tried this with SQL.
Starting point is 00:15:13 It's just too slow for our users. Maybe we should switch over to a graph database. So that's what happened. Then they started looking into a graph database. Obviously, they come across some of the popular choices. They try them out, and then, accidentally almost, they get to hear about Dgraph and that sticks. So it's kind of one of these things where you'll know it if you need it because you'll have grown past certain needs potentially in your traditional
Starting point is 00:15:47 relational database. And so that makes it actually a pretty nice space for an enterprise offering, because your community is enterprise. It's larger, it's companies that have grown, at least data-wise, to a size where they feel the need already. And so they're probably a certain level of successful, at least hopefully. Or they're even doing special things with their data, more so than simply like, hey, we have a web app with basic CRUD. Like MySQL, Postgres, those databases are perfect and great and fine for those types of apps.
Starting point is 00:16:21 But once you're past a certain point, you want to actually make more sense or get insights or analytics that really draw relations or different things from a database you may want to experiment and even use in addition to versus simply replacing. Yeah, absolutely.
Starting point is 00:16:40 So we do see some of these medium to big size companies. I think they are the most, I would say, active users of graph technologies. Even if you were to look at yesterday's news article about Neo4j getting the $80 million, they said that 20 out of 27 or 24 top banks in the US are using Neo4j. So it gives you some idea for how popular graphs are with enterprises. But you know, I do want to say one thing though, I feel, and we've actually done some work on that as well, even for some basic stuff, which you think typically think is squarely in the SQL space,
Starting point is 00:17:27 for example, building a question answering website, right? You have Quora, you have Stack Overflow, and you have like a bunch of these things, which have even Facebook, right? You have a post, you have comments on the post, you have likes on those comments, you have comments on comments, likes on those and so on and so forth. It's a very recursive sort of, you know, if you need to show a post, it's a recursive traversal. And that's exactly what graphs are great at. So what we did, for example, I think it was last year, was we, so Stack Overflow does this data dumps
Starting point is 00:18:02 that you can just pick up. It's an XML file. You can pick it up and just you can do whatever with it. So we picked that up and we loaded that into Dgraph. And we thought, hey, let's build the three most popular pages on Stack Overflow. One of them is the questions page. One of them is the home page. And there was one more page.
Starting point is 00:18:24 I forgot which one it was. And we just built those three pages. The amount of backend code that we needed was not that much because the query language, in this case of Dgraph, was sufficiently complex that it could just retrieve all the data for you, give it to you in a nice JSON. So all the work that needs to be done is just in the front end in rendering it, as opposed to in the backend where you pick up the question from the questions table, then pick up the answer from the answers table, pick up the likes and upvotes from another table, and then try to join them together. You don't have to write any of that code. It just happens automatically at the database level. So I feel graphs can be used in a lot more broad way, and they are a lot
Starting point is 00:19:10 nicer and faster for developers. But that level of developer awareness, that takes some time to build. Yeah, that's a great idea for getting people to see how easy it is to build these recursive, you know, data fetches, is to use something we all are very well aware of, which I don't know. Do you think developers know what Stack Overflow looks like? Perhaps. Also, another cool one on your homepage is play with 21 million facts from the Freebase
Starting point is 00:19:40 film data loaded up on a demo Dgraph instance. So you can just hop in there and see what different queries will look like. And speaking a little bit to the timing of Dgraph in terms of its competitive advantage over potential other graph databases, its query language is inspired by GraphQL, which just couldn't have been inspired by GraphQL if it was 10 years ago. So this is something that's very familiar, at least to front-end web developers. Can you talk about that? Yeah, I think GraphQL was, I would say, a great choice for us. It was very early on, in fact.
Starting point is 00:20:17 I think Facebook just had released GraphQL or something, and I remember looking at it, I'm like, hey, this looks like it just fits. Because when you go to a graph database, you want to get a subgraph back, you don't want to get a list back. Because if you get a list back, it's hard to know what was connected to what. You cannot create a subgraph from a list, but you can take a subgraph and convert it to a list. And most of the other graph query languages, Cypher and Gremlin, they are all returning lists of things back just like SQL does. So they lose some of that relationship data between things. And I looked at GraphQL and I was like, hmm, this is very interesting. In fact, I went back and checked with the CTO at Metaweb who was at Google and showed it to him. He
Starting point is 00:21:12 was like, what do you think about this? And he said that it was very close to Metaweb's own query language called MQL, which was popular at the time. And so we decided, hey, let's use this as a query language. Now, the thing about GraphQL that we did not realize at the time was that it was really a replacement for REST APIs. And it was still designed keeping SQL in mind. The types in GraphQL, I really think of them as SQL tables and the connections are similar. So we started to quickly hit some of the seams of GraphQL where we felt like we could not
Starting point is 00:21:55 really work with it if you want to build a graph database. So we had to then start to modify the spec, basically go outside of the spec and modify. We simplified some, we added some features like shortest path, we added filters in a simple way, and so on and so forth. And we still don't have a good name for this language. We just call it GraphQL plus minus, because we added some and we removed some. It's first of our areas. I was looking at that plus minus, I thought maybe it was like a typo there because it looks like it was accidentally in the link. But right, yeah, that's a good name, just plus some and minus some. Is plus plus still being used often?
Starting point is 00:22:42 i kind of feel like it had its heyday. You know what I mean? I remember it from maybe 10 years ago, maybe even eight. I don't know. It doesn't seem like it was a couple. Is it still kind of a current known naming pattern? It's like a hacker thing, something plus plus? I think so. I think it's still out there.
Starting point is 00:23:01 I mean, people still, hackers are still typing it on the daily. Plus minus is brand new. We still use C++ and we were like, hey, is it GraphQL++? Right. But then I was like, well, it doesn't do everything
Starting point is 00:23:16 that GraphQL does. So it would be wrong to call it plus plus. It has to be plus minus. I dig it. So is that something that potentially those pluses or maybe the minuses could work their way back into GraphQL? Or is it just because working with graph databases, there are things that just don't make sense for the broader web API GraphQL? Honestly, that's a question on my mind almost every other day. We do see how popular GraphQL has become. In fact, it has become way
Starting point is 00:23:49 more popular than I anticipated. And there's an open ticket on Dgraph to support the official GraphQL spec. So it will play well with all the tooling out there. Apollo raised a bunch of money and Apollo is being used quite a lot in the GraphQL community. And we would like GraphQL to play well with all of those tools. So I think there's definitely something that we want to do is to support the official one. It probably takes a deeper discussion with the authors of GraphQL to see if they would like to integrate some of the modifications that we have done back into the spec. That's probably a harder discussion though. This episode is brought to you by DigitalOcean.
Starting point is 00:24:51 DigitalOcean is a cloud computing platform built with simplicity at the forefront. So managing infrastructure is easy. Whether you're a business running one single virtual machine or 10,000, DigitalOcean gets out of your way so teams can build, deploy, and scale cloud apps faster and more efficiently. Join the ranks of Docker, GitLab, Slack, HashiCorp, WeWork, Fastly, and more. Enjoy simple, predictable pricing. Sign up, deploy your app in seconds. Head to do.co slash changelog, and our listeners get a free $100 credit to spend in your first 60 days. Try it free.
Starting point is 00:25:27 Once again, head to do.co slash changelog. So, Manish, help us understand some of the killer use cases, the sweet spot for graph databases. Similar to the idea of, you know, I think Mongo came out really talking about document based data stores and saying, if you're running an e-commerce site such as Magento, look at all these crazy joins on these different tables just to pull together a shopping cart. Really, that's a document,
Starting point is 00:26:13 so let's have a document database. And that was, I think, a compelling use case or at least selling point for that style of data store. When I think of graph databases, I think of social networks, but that's just me. From your perspective, what's the sweet spot for these types of data stores? Yeah, so there are certain use cases where people immediately think about using a graph database, and I think there's a sweet spot there. The top one which comes to my mind is real-time recommendations. These days companies have a lot of data around their users.
Starting point is 00:26:48 For example, you have credit cards or you have rewards cards from even big airlines or hotels or e-commerce companies around what users have purchased in the past and what other people have purchased. Amazon comes to mind. Amazon runs an amazing recommendation system. That's one of probably the most demanded features or most demanded use cases from a graph database. Then we have seen particularly medium to big companies go really hard after real-time fraud detection. It's very easy in a graph to find circles where they can identify if it's the same person or entity trying to create multiple cards or multiple money sources and figure out if it's a ring and sort of catch that. We have also seen identity reconciliation, you know, people trying to
Starting point is 00:27:55 figure out if you're the same person in, let's say, Instagram, in Facebook, in Twitter, and so on and so forth. So those kind of reconciliations, now you can apply them to other data sources. That's actually a good use for graphs. And the last one, this is actually the most relevant to particularly big companies. They have a lot of data silos. They have a lot of different databases or even just different database instances where they actually grab data and just one silo never talks to the other one. And what they then do is they unify all of the data from these different silos into a graph database.
Starting point is 00:28:40 Because remember graph databases do not have any boundaries. The idea of graphs is that you just put all the data into one place, and it can traverse from any node in the graph to any other node, however far away it might be. There's no tables. There's no different databases. It's just one graph. And so that actually, that concept really helps when you want to query across multiple
Starting point is 00:29:08 data sources. And the fifth one, which is really jumping up these days, is around artificial intelligence. There was just a paper, I think, by Google, I think I was reading like last week, around how they realized that they have reached the limits and they need to use a graph database to be able to do better AI. And they even launched a small graph library that you can use to integrate with TensorFlow. And in fact, just reading it from yesterday's post around Neo4j funding, AI was the top thing that they're going to go after with the new money that they're getting.
Starting point is 00:29:51 So, you know, I think for AI graphs are a no brainer. If you had to give somebody a graph database 101, would you just say it's like a string that threads different data points and that strings, as you kind of said, there can infinitely scale? What would be, if you had to give a, you know, a 101 of what a graph database is, how long might that be and could you do it here? Absolutely. I think graphs are probably the simplest things to think about, really. You know, people think about SQL tables, you have a row and you have some columns. Think of
Starting point is 00:30:25 a graph as three columns there. You have a subject, a predicate, and an object. And if you put together a whole bunch of these things, you get a graph. So a subject is essentially, think of it as an entity. A predicate is the relationship, and the object is either another entity or a value. So subject could be, let's say, me, and my relationship might be lives in, and the object might be San Francisco. Right. Right. Or it could be me, name is Manish.
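The three-column picture above (subject, predicate, object) can be written down as a small Go sketch. The `Triple` type, the `objectsOf` helper, and the identifiers such as `manish` and `lives_in` are made up for illustration; this is not how Dgraph stores or names anything.

```go
package main

import "fmt"

// Triple is one fact: a subject, a predicate, and an object.
// (Hypothetical type, for illustration only.)
type Triple struct {
	Subject   string
	Predicate string
	Object    string
	IsValue   bool // true when Object is a plain value, false when it names another entity
}

// objectsOf returns every object reachable from subject via predicate.
func objectsOf(facts []Triple, subject, predicate string) []string {
	var out []string
	for _, f := range facts {
		if f.Subject == subject && f.Predicate == predicate {
			out = append(out, f.Object)
		}
	}
	return out
}

func main() {
	facts := []Triple{
		// An edge to another entity: me, lives in, San Francisco.
		{Subject: "manish", Predicate: "lives_in", Object: "san_francisco"},
		// A property: me, name, "Manish".
		{Subject: "manish", Predicate: "name", Object: "Manish", IsValue: true},
	}
	fmt.Println(objectsOf(facts, "manish", "lives_in")) // [san_francisco]
	fmt.Println(objectsOf(facts, "manish", "name"))     // [Manish]
}
```

The only difference between a relationship and a property in this sketch is whether the object names another entity or holds a plain value, which mirrors the distinction drawn in the conversation.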
Starting point is 00:31:01 And that's sort of like a property. So you just put together a whole bunch of these, what we call facts or triples, and you get a graph. And then other people who live in San Francisco would have similar facts, and then you could run a graph query around, hey, tell me all the people who live in San Francisco and who eat sushi. So you do like a bit of, you pick up all the people who live in San Francisco, you intersect with people in the world who eat sushi, which are completely different facts. You didn't store, you didn't create them as this person, you know, lives in San Francisco and eats sushi. This is something that we're doing on the
Starting point is 00:31:41 fly. So you pick up all the people in San Francisco, pick up all the people in the world who eat sushi, you intersect the two lists. Now you get people in San Francisco who eat sushi. Now you can take that result and say, give me all the people, intersect with all the people who have been to Japan. Right. You pick up another list of people who have been to Japan, intersect it with this. Now you get people who live in San Francisco, who eat sushi, and who have been to Japan. Right. So the power of graphs is really in these joins that you can do, coming from just very simple facts. That makes sense too, why in part one you mentioned not having to rewrite a bunch of code. You know, when you explain it in the 101, these things naturally appear based on the way you query the data versus traditional ways you
Starting point is 00:32:32 might have done it with MySQL or Postgres, relational databases. In this case, the graph of these points becomes more and more clear as you intersect or cross over the data, because it's just naturally how it works. And you're saving not only time, but also getting insights that were just so much harder to get to in traditional ways or other database ways. That's absolutely true. And, you know, I was playing with the movie database, the Freebase movie set that we have also on our website. And one of the interesting things, you can look at the data all you want and you never really find these tidbits,
Starting point is 00:33:11 but I put it into Dgraph and ran some queries, and it turns out that the directors of Indiana Jones movies were also in the movie, right? I mean, Steven Spielberg was in one of the Indiana Jones movies as one of the characters in the movie. Some of these interesting things, they just become really obvious when you put them in a graph.
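The sushi example from a moment ago (people in San Francisco, intersected with people who eat sushi, intersected with people who have been to Japan) can be sketched as plain list intersections in Go. Everything here is an assumption for illustration: the `fact` type, the `subjectsWith` and `intersect` helpers, and the sample people are invented, and this is not a description of how Dgraph actually executes queries.

```go
package main

import (
	"fmt"
	"sort"
)

// fact is one subject-predicate-object triple. All names and data below are
// made up for illustration.
type fact struct {
	subject, predicate, object string
}

// subjectsWith returns the sorted, de-duplicated subjects that have the
// given predicate pointing at the given object.
func subjectsWith(facts []fact, predicate, object string) []string {
	seen := make(map[string]bool)
	for _, f := range facts {
		if f.predicate == predicate && f.object == object {
			seen[f.subject] = true
		}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

// intersect walks two sorted lists and keeps only the shared entries.
func intersect(a, b []string) []string {
	var out []string
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	facts := []fact{
		{"alice", "lives_in", "san_francisco"},
		{"alice", "eats", "sushi"},
		{"alice", "visited", "japan"},
		{"bob", "lives_in", "san_francisco"},
		{"bob", "eats", "sushi"},
		{"carol", "lives_in", "sydney"},
		{"carol", "eats", "sushi"},
		{"carol", "visited", "japan"},
		{"dave", "lives_in", "san_francisco"},
		{"dave", "visited", "japan"},
	}
	// Each list comes from independent facts; the combination is computed on the fly.
	inSF := subjectsWith(facts, "lives_in", "san_francisco")
	sushi := subjectsWith(facts, "eats", "sushi")
	japan := subjectsWith(facts, "visited", "japan")

	sfSushi := intersect(inSF, sushi)
	fmt.Println(sfSushi)                   // [alice bob]
	fmt.Println(intersect(sfSushi, japan)) // [alice]
}
```

No fact in the sample data says "lives in San Francisco and eats sushi". The combined answer only exists once the per-predicate lists are intersected, which is the on-the-fly join being described.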
Starting point is 00:33:34 That's interesting. You add that, the built-in ACID transactions, which gives you a lot of safety. What are you missing then? Is everything better in graph DB land, or are there things that relational databases still do better today? Like, what are the drawbacks? I used to say the drawback was that a graph database, Dgraph, was not great for financial transactions, but then we added transactions.
Starting point is 00:34:07 And so now it's great for financial transactions. The other drawback that we still have is that it's not really great for flat data. And by flat data, I mean like time series data, right? You just have tons of things which are not really connections, but just more and more record points for the same thing. That kind of flat data is really just not done very well with graph databases.
Starting point is 00:34:32 You could use a graph for that, but it's better if you aggregate it somewhere else and bring in the results into a graph database than to try to do the aggregation or storage in the graph database. So basically in a world full of subjects that have many verbs with many like-minded objects, graph databases apply. Absolutely, I think. Any SQL table, which is essentially row and column and the data,
Starting point is 00:34:59 can be easily converted into graphs. And I think every time we have tried to switch from a SQL use case to a graph use case, just the amount of backend code that was there in play before reduces by at least half because the query language is so much more powerful. So to go further into Jared's question of where you reach for a graph database over, say, Postgres or MySQL or relational, you said you used to not recommend it for transactional, but then you built it. Is there a checklist of things where you'd reach for Postgres over a graph DB, Dgraph or other graph databases, a list that is consistently being chiseled away, where a graph database just wins out? Sorry, could you repeat that question?
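As a rough illustration of the point that a SQL table converts easily into a graph, the Python sketch below turns rows into subject-predicate-object triples. The function name `rows_to_triples`, the `table:id` node naming, and the `foreign_keys` mapping are all assumptions made up for this example; it is not Dgraph's import format or API.

```python
def rows_to_triples(table, rows, key, foreign_keys=None):
    """Convert relational rows into (subject, predicate, object) triples.

    foreign_keys maps a column name to the table it references, so that
    the value becomes an edge to another node instead of a plain literal.
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = f"{table}:{row[key]}"  # one graph node per row
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the node id; skip NULLs
            if column in foreign_keys:
                value = f"{foreign_keys[column]}:{value}"  # edge to another node
            triples.append((subject, column, value))
    return triples


# Hypothetical sample data: one movie row with a foreign key to a person row.
movies = [{"id": 1, "title": "Raiders of the Lost Ark", "director_id": 7}]
people = [{"id": 7, "name": "Steven Spielberg"}]

triples = (
    rows_to_triples("movie", movies, key="id", foreign_keys={"director_id": "person"})
    + rows_to_triples("person", people, key="id")
)
for triple in triples:
    print(triple)
```

Each column becomes a predicate, and a foreign key becomes an edge between two nodes, which is what lets a graph query follow relationships without a join.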
Starting point is 00:35:50 Basically meaning, is there a list of things where you recommend, well, okay, if you're in these scenarios, don't use a graph database? Like you said before, you didn't recommend it for transactional workloads, and then you built transactions. So now you take that back. Maybe that was the list. That was the list. Okay. Well, is there one thing in the list or not? Well, flat data, right? So if you don't have a lot of relationships, then, I mean, according to what you just said, Manish, you can use them, but they're not
Starting point is 00:36:20 necessarily optimized for that, right? You're not going to get the advantages, necessarily. Right. So I think time series data is the one which I mentioned. Yeah, it's just not great for graphs. What about management and maintenance? So I am a Postgres user and have been for years, and so I always look at these shiny different data stores and I think, this sounds great when I'm in development. And then I have to actually put the thing into the world and run it and, like, back it up and make sure it's always up and so on and so forth. And then it's like, now I have to relearn or learn a brand new set of maintenance or management skills that I already own on the Postgres side. So I think that's probably a barrier for a lot of people. What's the story with deploying this thing?
Starting point is 00:37:12 I know it's built to be distributed, so it's going to shard horizontally for you, which sounds amazing, but also potentially scary. I don't know. Tell us about deployment. So deployment is where you lose customers, I think. Not for Dgraph in particular, I'm just talking in general. This is where you can easily lose customers, because DevOps guys are always hard to impress, and we have spent a lot of time making sure that DevOps guys are happy with Dgraph. So we already built in, as I said, it's distributed, so it can shard the data for you, but it is also replicated, and all of that is part of the open core.
Starting point is 00:37:53 So a bunch of deployments that we're doing right now, they use what we call a six-node cluster, where we have three replicas for Dgraph Zero and three replicas for Dgraph Alpha. Don't worry about the terminology here, but just understand that it's three replicas each. And Dgraph uses a consensus algorithm called Raft to make sure that every piece of data that you put into Dgraph reaches a quorum and gets replicated across a majority of these replicas before the acknowledgement is sent back to the user. So in case one of the servers crashes, nothing happens. Your queries would keep on running, your data will keep on mutating, everything will just be fine. The DevOps guy would get a notification, and they can
Starting point is 00:38:45 either swap the machine or, if you're using Kubernetes, the machine just comes back up automatically, and your users don't even see it. So it becomes really easy as a DevOps person to just run Dgraph and keep everything happy. And one more thing that happens at the developer level is that, as I said before, sometimes with Postgres, for example, or any database which has eventual consistency in the replication system, they will, let's say, create a new account on the master.
Starting point is 00:39:24 And then they want to read this new user's account. And they end up going to a replica. And the replica still doesn't have that new record. So it will show, hey, account not found, which is just a bad experience for a user. So there are a lot of systems built on top, or you have to build it yourself, to make sure that if you're doing a read after write, then the read goes back to the master, which basically means your
Starting point is 00:39:51 replicas are not used as well, or you have to do a bunch of application-level tweaks and techniques to make it work. Now, in Dgraph, you don't have to worry about any of that because it's all consistent. So even if a node crashes and is down for a long time, and it comes back up and you immediately run a query, the query would block until the node has caught up to the rest of the cluster. And only once the data is up to date would it reply back. And obviously, there are also ways by which you can time out and query another server. So all of these things are built in to make sure that you always get the freshest data, what we call linearizable reads. So it tackles some of the common issues from both the DevOps side and also from the developer side.
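The two behaviors described above, acknowledging a write only once a majority of replicas has it and making a stale node catch up before it answers a read, can be sketched with a toy model in Python. This is not Raft and not Dgraph's implementation; the `Replica` and `Cluster` classes and their methods are hypothetical names invented for the illustration, and a real system would block or time out where this model simply copies the missing entries.

```python
class Replica:
    def __init__(self):
        self.log = []   # entries this replica has stored, in order
        self.up = True  # False while the machine is crashed


class Cluster:
    def __init__(self, size=3):
        self.replicas = [Replica() for _ in range(size)]
        self.committed = []  # entries acknowledged to clients

    def _catch_up(self, replica):
        # Copy any committed entries the replica missed while it was down.
        replica.log.extend(self.committed[len(replica.log):])

    def write(self, entry):
        """Acknowledge a write only if a majority of replicas stored it."""
        live = [r for r in self.replicas if r.up]
        if len(live) <= len(self.replicas) // 2:
            raise RuntimeError("no quorum: write rejected")
        for replica in live:
            self._catch_up(replica)
            replica.log.append(entry)
        self.committed.append(entry)
        return len(live)  # number of replicas that stored the entry

    def read(self, index):
        """Serve a read from one replica, but only after it has caught up."""
        replica = self.replicas[index]
        if not replica.up:
            raise RuntimeError("replica is down: query another server")
        self._catch_up(replica)  # a real system would block here instead
        return list(replica.log)


cluster = Cluster(size=3)
cluster.write("create account alice")
cluster.replicas[2].up = False               # one server crashes
acks = cluster.write("create account bob")   # still a majority: 2 of 3
cluster.replicas[2].up = True                # the server comes back, stale
fresh = cluster.read(2)                      # catches up before replying
print(acks, fresh)
```

In this model a read from the recovered replica never returns "account not found" for an acknowledged write, and a write is refused outright when fewer than two of the three replicas are up, which is the consistency-over-availability trade-off discussed next.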
Starting point is 00:40:48 So does it give up availability then, in that case, when the query blocks until it's consistent? So you're losing availability? Yeah. So in the CAP theorem, it goes for consistency and partition tolerance instead of availability. But note that a lot of people mistake this: availability in the CAP theorem is not the same as high availability. Dgraph is highly available, but it still goes for CP instead of AP. So I have some pretty awesome news to share. We are now partnered with Algolia. If you've ever searched Hacker News, Teespring, Medium, Twitch, or even Product Hunt,
Starting point is 00:41:39 then you've experienced the results of Algolia's search API. And as we expand our content, we knew that one day we'd have to either roll our own search solution on top of Postgres, or we could partner up with Algolia. And I'm happy to report that phase one of our search is now powered by Algolia. We're able to fine tune our indexing, gain insights from search patterns and analytics. We can create custom query rules to influence ranking behavior, as well as improve our search experience by adding synonyms and alternative corrections to queries. Sure, we could build search ourselves, but that would mean we would be busy doing that instead of shipping shows like you're listening to right now.
Starting point is 00:42:13 Huge thanks to our friends at Algolia for working with us. Check the show notes for a link to get started for free or learn more by heading to Algolia.com. And by GoCD. GoCD is an open source continuous delivery server built by ThoughtWorks. Check them out at gocd.org or on GitHub at github.com slash gocd. GoCD provides continuous delivery out of the box with its built-in pipelines, advanced traceability, and value stream visualization. With GoCD, you can easily model, orchestrate, and visualize complex workflows from end to end with no problem. They support Kubernetes and modern infrastructure with elastic on-demand agents and cloud deployments.
Starting point is 00:42:52 To learn more about GoCD, visit gocd.org slash changelog. It's free to use, and they have professional support and enterprise add-ons available from ThoughtWorks. Once again, gocd.org slash changelog. So Manish, based on what you've shared with us so far, it sounds like the initial start for D-Graph as a company was 2013. Is that right? 2015. 2015. And 2015, you did a round, you raised $3.1 million, if I remember correctly.
Starting point is 00:43:37 Is that right? So we did a round in early 2016 and another round in sort of late 2017. Okay. Just a total of, I think, 2.9-ish million. So that means somebody trusts you with millions of dollars, basically, is what I'm trying to get at. You're establishing a company, you build a technology that's obviously proven itself, and somebody said,
Starting point is 00:43:58 yeah, here's money, I trust you, I trust what you're trying to build, and I think it makes sense to do so. And sometimes that means that you've licensed things appropriately. The project has been open core, open source. You can tell us more about the inner details of that and what that means. But somehow, someway, at some point, you chose the right license that allowed you to take on funding and build a company around it. Can you kind of walk us through what that is? Because I'm imagining there's just so many developers out there, you know, going to choosealicense.com, or is it .org,
Starting point is 00:44:30 and they're getting enough information, but still the wisdom is not there. The definitions and details are, but I feel like you can bring some bloody knuckles and some wisdom here. So preach. Absolutely. So I think when I was starting Dgraph, and this is towards the end of 2015, I naturally went for open source. And it was not clear to me at that time how the business model would work. I think, in fact, a lot of people I talked to
Starting point is 00:45:04 around this idea of, hey, I'm going to build a graph database and make it open source, they were like, what? You're putting all the IP out there, then what's left for you to make money off? And so the business models around open source only became sort of clear to me slightly later. I think a lot of people who are in the Valley probably are more aware of them, but definitely people in Australia were not. You get open core and so on and so forth. Now, the choice of licensing was kind of important to me. The behemoth in the graph space, Neo4j, was licensed as
Starting point is 00:45:47 AGPL, which is considered to be a copyleft license. Now, what AGPL does is that if you were to touch any code and use this AGPL code as, let's say, a library, then you must open source your code also as AGPL. It's sort of like a viral license: if you touch it, it affects you as well. And we decided to go with a more permissive sort of Apache license. Now, a lot of people think the reason to open source something is around getting contributions from developers all over the world. And I would say that is true,
Starting point is 00:46:42 but it is not the main benefit of open sourcing something. The biggest benefit of open sourcing software in my mind is around adoption. It's basically free marketing. You put your code as open source, anybody can see it, they feel more comfortable using it, they don't have to pay you a dime to use it, particularly in permissive licenses like Apache and BSD and MIT, etc. And these days, if you want to build an infrastructure company, I've noticed most startups and most tech-based companies, they really want the underlying technology to be open source. And they have multiple benefits of doing so.
Starting point is 00:47:32 When they have the code available to them, they already have the engineering talent. That talent can potentially go and modify the code base to improve it or modify it to their liking, etc. So the biggest thing I've seen around permissive licenses is adoption. And also you get contributions. But more importantly, I think over the journey of both DGraph and Badger that I've noticed is just the fact that people tell you, people give you feedback around issues that they run into. And that feedback I feel is more important sometimes than the actual core contributions that you get. So if you look at any open source repository,
Starting point is 00:48:17 you'll see, you know, 90% of the contributions are being done by the core team of three or four people, and then there's a whole long tail of small contributions done by the bigger open source community. That's sort of like the ugly truth, or unknown truth, about open source projects. So really, I think it's the feedback that improves the robustness of code. That's definitely an interesting take. I think most people would say that the contributions are the main reason, but I think that's a compelling statement that you have there with regard to the feedback versus actual code contributions. So you mentioned picking Apache versus
Starting point is 00:49:03 AGPL. Tell us about AGPL, maybe even contrast it with GPL, which it is a modification of to a certain degree, and then why it was unattractive to you as a license. So I think, let me just start with explaining a bit about AGPL itself. And again, this is to the best of my understanding. With GPL, the idea is that the code is in the same place and the users are sort of linking to it as a library. And again, the virality of this whole GPL series comes into play. So if you link your code to GPL code, your code is supposed to become GPL
Starting point is 00:49:45 as well. And you must make it open source on the GPL terms. Now, AGPL was then devised as a way to tackle GPL code running as a server, with you interfacing with it over the network. So I think the idea is to try to make the same virality affect you if you are running GPL code on the server and interfacing with it through a client. That's my understanding as well. The GPL had a quote-unquote loophole because it was designed before the proliferation of services, websites, web servers, web services, where you're not delivering the end code, you're delivering a byproduct of the code. And so the AGPL was basically a fix for that loophole
Starting point is 00:50:36 to also make the server side, even if you don't deliver the code to the end user, still covered under what you called the virality portion of the GPL. So I think we're in agreement with that being the primary aim, and I also think it was effective in that regard. Absolutely. And a lot of companies who still want to hold on tightly to their code base tend to use AGPL as a sort of stopgap between going fully permissive open source
Starting point is 00:51:13 while still trying to make sure that they have a more solid sort of business model around it. Now, with Dgraph, we actually did also try to convert from Apache to AGPL. Now, when you do such a conversion, the first thing that you have to make sure of is that, even before the project started, you have a good ICLA in place. Now, what's an ICLA? It's an
Starting point is 00:51:47 individual contributor license agreement, which means that for any contribution that you take into the open source project, the rights to that contribution are given back to the company running the open source project. And we put that in place in Dgraph very early on, even when we were under Apache. So that means that in a way, the authors of that contribution, they hand the rights back to the company, which means the company can now change the licensing if need be.
Starting point is 00:52:20 We do not accept any contributions without the author signing the ICLA. And it's just a standard practice I've noticed across not just Dgraph, but other open source companies as well. So that meant that we could change the licensing terms. And we did change it to AGPL. This was, I think, after MongoDB went IPO, and MongoDB was using AGPL. And we felt maybe that's a better way for us to make sure that we have a good business
Starting point is 00:52:51 model. And once we had switched over to AGPL, we started hitting some of these things that we did not really understand before. Now, to give you a bit of history, Google explicitly bans AGPL code. Google's open source guy, Chris DiBona, in fact, sort of famously said that no AGPL code is useful or good, and we don't need to use it. They banned it. Now, when Google goes and bans a license, other companies follow, right? Now, Facebook doesn't publish it openly, and I don't really know, but I know this much: at Facebook and at Apple and some of these big companies, it is very hard or almost impossible to bring in
Starting point is 00:53:42 any AGPL code. And we actually saw some of these things. So somebody wants to play with Dgraph at one of these big companies, and they are unable to, because they can't even bring the code into the company at all. So we started realizing that because of this, people were having a hard time adopting Dgraph. And again, this goes back to my point about why you would choose open source over a proprietary
Starting point is 00:54:11 license? It's largely for adoption. So we started seeing some of those issues. And we switched over from Apache to AGPL in March 2017, if I'm not wrong. And then towards the end of 2017, we decided, hey, we need a better solution here. AGPL seems too toxic to be used for Dgraph. And somewhere after that, we started a discussion with the Redis Labs folks. And, you know, together we came up with this thing called the Commons Clause.
Starting point is 00:54:53 Now, the idea behind Commons Clause is that you use a permissive license like Apache or, in the case of Redis, BSD. And you add a clause which basically prohibits some company or some person from selling the software as it is. And why would we go to AGPL, or why would we go to Commons Clause? The reason is that what's been happening lately, and what none of the open source licenses have thought about, is that big companies, these platform-as-a-service or infrastructure-as-a-service companies, most notably Amazon and the Chinese counterparts, would pick up an open source
Starting point is 00:55:40 project and run it as a service at a much cheaper price. And because they have the bandwidth and the engineering talent and the money for it, they would run it as a service without contributing back to the open source project. And the main thing that we were going for is to avoid that. If you want to sell this thing to developers, you should at least contribute back, or you should help the company financially who is actually doing most of the contributions. So all of these licenses, AGPL or Commons Clause, and now Mongo's SSPL, they are really around trying to dissuade big providers, service providers, from just ripping off an open source project.
Starting point is 00:56:36 It seems like this stems, based on your earlier points, is like your motivations, right? Your lens for which you're navigating this. And in your case, in particular with Dgraph, you know, you're optimizing as open source for adoption, not so much contributions, right? So you still want contributions. It's still important as part of the world
Starting point is 00:56:59 how open source works, but you're doing it based on adoption. So you've had to go through different licenses, and you want to have a liberal license with a clause that protects you so you can be a company and actually be viable and sustainable. And there are some that say that that added clause basically makes you not open source. What do you say to that? Yeah, I think it's a very delicate trade-off between trying to choose a permissive license, which allows most users to just use the software, while also dissuading a big company from coming in
Starting point is 00:57:33 and stealing your financial longevity in some sense. And if you put Commons Clause in place, it is true, the project is no longer open source, because Commons Clause is not OSI approved. Now, Redis did a smart thing where they kept most of their code base under the BSD license, which is still open source, but chose some of the modules that they had built and put them under Commons Clause. So you can think of it again as this open core model in some sense, where most of your
Starting point is 00:58:13 code is open source, but then some of your code is not. And when we applied Commons Clause to Dgraph, we applied it fully, which means all of the code base was under Commons Clause. And we were just not convinced that that was the right move. And this became very apparent when, again, Google went in and banned Commons Clause as well. Now, I don't agree with Google's reasoning for banning Commons Clause, which was that they feel that Commons Clause prohibits all commercial usage, which is completely wrong, really. Commons Clause has this condition that
Starting point is 00:58:58 if the code is substantially the same as the original code, then you can't sell it. Substantially is a term used very commonly in legal documents to basically indicate that if you tweak things a bit, it doesn't make it different. And that is just a way of saying that if you're largely selling the same thing, which is selling, let's say, Redis modules in this case, or selling Dgraph, then you would not be allowed to do that. But you can build something on top of it. For example, you could build a question answering website, or you could build some other proprietary service on top of Dgraph, and you can sell that. Nobody stops you from doing that, because
Starting point is 00:59:43 it is not substantially the same thing. So that was the idea behind Commons Clause. I feel that the intentions were correct, but it was very hard to convey to people in the community, and even Google in this case, what substantially meant. I think we went through many, many debates around explaining to people substantially does not mean this, substantially doesn't mean that. But I don't think it's a fight that is easy to win. And so in the end, we realized that Commons Clause being banned by Google brings us back to the same place as AGPL being banned by Google.
Starting point is 01:00:30 And again, it affects adoption. And so we decided that we would switch back to Apache license. Now, there's an interesting sort of backdrop here. This is back in 2017, I think. CockroachDB, a database company in New York, they had come up with a license, which was essentially Apache plus enterprise license, what they call the Cockroach license. And what they did was,
Starting point is 01:00:56 instead of trying to close source their enterprise modules, they made them source visible, and they co-located them right next to their open source code base. So now what they have is the main source tree, which is Apache licensed, and then certain modules which are under the enterprise license but still with the code visible. And that was a very attractive sort of system, and it was very well received by their community. And it's something that I had in the back
Starting point is 01:01:36 of my mind for a while. And I felt that Dgraph was still sort of young enough, and we had only started to build our enterprise features, so I felt that we could easily switch over to that license and make it work. So what we have done now is that we have brought Dgraph back to Apache without any clause, and we're going to build enterprise modules which would be source visible. This system is also adopted, if I'm not wrong, by Elasticsearch. And it's just in general a very big win for liberal open source licenses in some sense. One more thing on top of this is that, you know, so this is our journey. That's where our journey sort of concludes.
Starting point is 01:02:23 But after we switched over to Apache license and enterprise license, MongoDB, which was previously AGPL, has grown even stricter and created a license called SSPL, which is server-side public license. Now, you know,
Starting point is 01:02:39 as AGPL was sort of stricter than GPL, SSPL is even stricter than AGPL. And it tries to do the same thing as Commons Clause in some sense, but does it a bit differently. So what they say is that if you run MongoDB as a service, then you must open source the code base which helps you run MongoDB as a service. Again, it's a jab at the big service providers like Amazon, but it's just done in a different way, where they probably have a better chance of getting it approved by OSI. But in my mind, it's trying to achieve the same thing
Starting point is 01:03:22 as what Redis was doing with Commons Clause. So there are plenty of people out there that are vehemently opposed to Commons Clause with regards to open source software. Because as you said, the OSI has not approved it and potentially will not approve it. And so there are Commons Clause licensed projects that claim to be open source. And even on commonsclause.com, it says, is this open source? And it says no, because of that specific thing. That being said, do you believe the Commons Clause is in the spirit of open source? Because I'm on the fence there. It seems like freedom to modify or to distribute, it seems like a
Starting point is 01:04:06 bit anti-freedom, but only for a small subset, right? It's like, large corporations slash service providers, we're taking your freedom away, but everybody else is still free. I don't know. This was something you've gone down the path on, you've implemented it. It's kind of there and back again: Apache 2, then AGPL, then Commons Clause. You've had some pushback from your community. You mentioned Google banning it was the showstopper, which makes a lot of sense for adoption. But all along the way, it seems like your intentions are good, from what I can tell from this conversation. So what do you think about the Commons Clause? Maybe it's not open source approved, but do you believe
Starting point is 01:04:51 it's in the heart, in the spirit of open source, or not? I absolutely believe it is. I feel it is more in the spirit of open source than AGPL is. Why is that? The problem with AGPL being used at any medium to big company is that the moment you bring in AGPL, you have to be afraid about, hey, do I need to open source my own code base? And the problem with big companies is that they have this spaghetti code,
Starting point is 01:05:19 which is part proprietary, part ancient. It's very hard to say, okay, this piece I can break off and maybe open source this, but this piece I can keep proprietary. It's very hard to say that. And therefore, if you look at, you know, Google, for example, when they built Kubernetes or when they built gRPC, they didn't just open source their existing systems, Borg and Stubby. They had to rewrite them from scratch to make it open source.
Starting point is 01:05:48 And so AGPL puts this restriction upon these companies that if they use any AGPL code, they must open source because of virality. It's very prohibitive. Now, you bring in Commons Clause plus Apache. With Apache, basically you can do anything with the code base. You don't have to open source. It's not viral. And Commons Clause stops you from selling the database in this case, or whatever the code base is, from selling that particular code. It works for big companies. Very cut and dry. It should work for, you know, let's say Google.
Starting point is 01:06:27 It should work for Facebook because they're not trying to sell Redis. They're not trying to sell Dgraph. They're just trying to use it. So I feel that it is more permissive than AGPL. The only companies it should really affect is if you are Amazon trying to sell Redis and the particular modules that they put under the Commons Clause; then you're not able to sell that. Which I feel is fine, because if they did not contribute, then maybe they shouldn't sell it, and maybe they should let the contributors sell that.
Starting point is 01:07:06 So that's my take on it. For AGPL, I might have a somewhat analogous take on this, so to speak. It reminds me of CSS in a way. There's a cascade, an unwanted effect of using it, which is not always clear when you make changes or use a class or something like that. There are hidden things. So if I use AGPL, it may affect licenses of other software I use in the future in unwanted ways. And those unwanted ways create ambiguity, and it's not clear.
Starting point is 01:07:35 So for those reasons, if that's accurate, then I can see why it's less likely to be used. Whereas Commons Clause is more like a razor blade. Like, it's a clear cut. You know, I can license my code permissively at one level and then clause in, or add an addendum, the point of which is, here's one clause, and it's only for this project, and it doesn't affect any other things it touches. It's just like, if you're trying to resell my thing here, then that's just not possible. So I'm with you too, Jared.
Starting point is 01:08:10 Like, I mean, Manish seems like a great guy. I like him. You know, he's still here. We haven't hung up on him yet. I can hear you. Right. You know, this is where I think this needs to be a dialogue. And blog posts are great for getting points across.
Starting point is 01:08:28 I really feel like this needs some sort of like at large literal discussion because behind all software is human beings with often great intentions. Right. Manish isn't trying to hurt people. He just wants to be able to create awesome tech and have people use it. He said that here and he's trying to look for, and he and his team, and I'm sure his investors too, are trying to make sure that remains possible. And so I'm for that. Couldn't he just do that now that we're talking about him and he's not here anymore?
Starting point is 01:09:00 Couldn't he just do that by having closed source software? Like, isn't that just a way of going? I mean, if you want to do that, I'm just playing devil's advocate. Right. If you want to do that, obviously Manish, please feel free to respond. We're not actually talking about you.
Starting point is 01:09:11 Like you're not here. Couldn't you just close source? I mean, keep it proprietary and then you get to say hands off. You don't have these problems. So the thing about closed source is, again, it goes back to the reason about why you want to open source in the first place. I think it's not about the contributions.
Starting point is 01:09:30 I mean, obviously, if you get contributions, I always thank people for contributions, thank them for the feedback. But the reason you make anything open source is adoption. You want to build something which a lot of companies, a lot of people are going to base their entire tech stack upon, in this case a database. They're going to trust you with their data. They want to be able to look at the code and make sure that the code base is good quality, it doesn't have any weird bugs, that they are able to modify the code. And what if the company dies tomorrow they should still be able to adopt that code base and then maybe run with it so i feel
Starting point is 01:10:12 open... You can do that with a proprietary license as well. I mean, you could ship them binaries plus source code as part of their license. This isn't something they wouldn't be able to do. That's the thing about proprietary: you can do whatever you want with it. True. The other part of this equation is that when you make something proprietary, the selling becomes a lot more work. You need to have an entire sales team to be able to go to individual companies and be like, hey, have you heard about this thing called, you know, Dgraph? And it's a proprietary thing. You can't see it online, but we can sell it to you for use. It is a lot harder pitch than, hey, developers, it's just free.
Starting point is 01:10:55 It's out there. You can try it. And if you don't like it, it's fine. If you like it, it's fine. You don't have to talk to us. And I think that's the beauty of open source: it avoids having to have salespeople running around, and you just become part of a developer conversation anywhere in the world. Nobody has to pay you to try it. Okay. I think the problem, though, is being seen as masquerading as open source, but not really being open source.
Starting point is 01:11:25 It goes back to like original things. It's been said like the anti-commons clause or whatever. Just in terms of the spirit of open source. And sure, it is open. You can see it. I can contribute back if I want to. But I think what the community is really pushing back on is less like, hey,
Starting point is 01:11:45 that's a bad thing and more like, hey, this really isn't open source. So just don't call it open source and we'll be okay. Yeah. It's potentially a namespace conflict, right? As all things are, because, you know, the benefits of open source are immense, as you've said, Manish. And in many cases, especially in infrastructure-style, like, mission-critical enterprise software in 2018, it's almost table stakes for success because people expect it. As you said, your sales processes are easier. People can, like, the trust is immediately there. And yet when you add Commons Clause to it, it's restricting in that regard.
Starting point is 01:12:28 And so now it's like, well, right there on commonsclause.com, this is not open source. It's something else. But then there's also like you want. It's almost like
Starting point is 01:12:35 I'm not saying this personally against you or against Dgraph, but it's as if you want the benefits of open source without actually being open source.
Starting point is 01:12:42 And so maybe it needs to be like available source or readable source, or, you know, it's almost like we've just got to come up with some more nomenclature, similar to how we have copyleft, copyright, or, you know, free and libre versus open source.
Starting point is 01:12:56 We have all these different terms. Maybe there's a need for another term for this style. I don't know. What do you think about that, Manish? We were very careful when we switched to Commons Clause and Apache that we removed all the references to open source and we swapped them with a liberal license because, I think, it goes back to my take on this, which is that it's more liberal than AGPL and some of the other open source licenses.
Starting point is 01:13:26 So we had to switch it over to liberal license. It was a bit of a heartache for me because I've been an open source guy for a long time. Back in 2005, I wrote this thing called FlickrFS to build a file system on top of Flickr, which was the most popular image sharing site at the time. So I've been a through-and-through open source guy and it was a bit of a hard decision. And just to clarify, right? So we have moved away from Commons Clause, but still, I would sort of defend it. The thought at the time was that it is probably not approved by the folks who are at OSI.
Starting point is 01:14:05 But in terms of the spirit of open source, I feel it was there. I think open source has to evolve to a point where people who are building open source can sustain themselves from what they are building as opposed to having to ask for donations or having to work for another company or having to be acquired by another company who is writing proprietary code. Every time I see some open source author having to go join a company and abandon their open source project which is very popular, it hurts me in some sense. It feels bad. Why shouldn't a person who is writing amazing code be able to sustain themselves, with the right intentions in their mind? Which is that, hey, open source obviously makes sense. Now there should be a deeper conversation about, hey, open source makes sense, we all agree. Let's figure out how do we make money, how do we make
Starting point is 01:15:06 sure that people who are in open source continue to make money, and not just by making open source their secondary project, but having open source as their primary project and the source of income. It definitely, I mean, hearing it from that perspective and then also knowing, you know, what a history you have in open source, back to FlickrFS, you know, it makes you really consider this, what you say is a necessary evolution of open source. Because based on what you just said there and how you said it, the free and libre of open source is there, but at some point it does restrict potentially the sustainability by restricting its original creators and maintainers and community from being able to profit in certain ways from it
Starting point is 01:16:00 because of just sheer competition. You can't compete with Amazon. Maybe you can, maybe you really can, but I mean, like most, if Amazon launches a furniture line, well, Wayfair's stock goes down 6% in a day. I mean, that happens, right? So, you know, how can we expect little old you guys in your team to compete? And the restriction comes back to the original core team and how you can sustain it financially without having to, as you
Starting point is 01:16:34 said, the examples were either ask for donations, work for a company... You're not liberated to operate a company around this source code in a way that is financially feasible if you have to face the sheer weight of competition that is just so massive. Does that summarize somewhat of what you're trying to say there? Yeah, I think one thing we failed to mention is the three models of open source money making. I think I should quickly mention that. So it all ties together. You know, the first one is that you have this open core, which is under an open source license,
Starting point is 01:17:13 and you build proprietary features on top of it, which you sell. That's the first one. And in Redis Labs' case, they basically tried to put those modules sort of under Commons Clause so that they can sell those. The second is that, obviously, support and training comes in, right?
Starting point is 01:17:33 Red Hat pioneered this a long time ago, and every open source company does support and training. That's how they make money. The third one is that you run that software as a service. I think this is where the Amazon story comes into the picture. For example, with Redis Labs, Amazon is probably running Redis behind the scenes for either ElastiCache or I forget what it is. And they're literally just running that without paying anything back to Redis Labs. And Redis Labs in this case also has a competing Redis-as-a-service offering. And so both MongoDB and Redis and whoever is trying to use Commons Clause is trying to avoid a big company like Amazon.
Starting point is 01:18:19 And also, and now these days, their Chinese counterparts, I think, I forget the name of that, but they also are running Redis and Mongo behind their service providers and charging customers for it. So these companies are like, hey, we built this thing. You shouldn't be competing with us on this and we should be getting that money. Trying to stop the leeches, you know, stop leeching off people, you know, contribute back. And it makes you kind of mad, even though you don't, I totally get it. Right. I can see it from Amazon's side.
Starting point is 01:18:53 Right. But yeah, it's like the leech clause. There you go. The leech clause. Well, in a free world, people are free to do literally whatever they want. And so I think in the spirit of open source, the idea has been for it to be a free world in most or all senses of the word. And I think when you restrict that, recognize, you know, this leech scenario and the viability of it. If we continue to allow that to happen and not have conversations that hear all sides, then we
Starting point is 01:19:40 essentially allow the freedom of the software, as good as it may be, to stagnate and potentially, like you said, Jared, why not go into proprietary? And then we wouldn't even be talking to Manish,
Starting point is 01:19:54 you know, cause I mean like what would be the point, right? The open source is one that's in quotes. It's been said not just by us, but others, you know, so that's in quotes.
Starting point is 01:20:04 Well, I mean, it's official. Nadia Eghbal said it on Request for Commits many times and others agreed. So that's why I say it's in quotes, because it's been said not just here by us but by others. Yeah, I mean, I think it just needs more attention. I'm not saying I agree or it's wrong or it's right. I definitely see the pain points and we need some sort of evolution. I would like to add one thing. I think this is, it seems like a fresh thing.
Starting point is 01:20:30 It seems like a new thing that there's this attack on open source in some sense by this Commons Clause, et cetera. But this was done before. If you look at GPL, the idea behind GPL was that, hey, open source is important. We must do open source. In fact, we force you to do open source.
Starting point is 01:20:46 If you use our code, you must also open source your code, right? And then AGPL was an evolution of that to say, hey, also on the network, same thing. And then the MongoDB SSPL is extending that to say, hey, if you run it as a service, same thing, right? But think about what they are really doing practically. The practical consequence of this is that, in some sense, they're dissuading others who have not contributed from leeching off it, right?
Starting point is 01:21:17 And I think that's the direction that Commons Clause and SSPL are all going. I recall we had Joseph Jacks of OSS Capital on the show a couple of weeks back, and we asked him about Commons Clause because it was drafted by Heather Meeker. She's part of OSS Capital, so I'm sure you know her as well. And one thing that he said about it is he sees it as a stepping stone or as an effort in a specific direction, and that, as you said, there's a necessary evolution that has to happen for the greater open source community to continue to not strive so much, but thrive.
Starting point is 01:21:56 Right. And so I'm happy to have this conversation. I've learned a lot here, Manish. Thanks so much for coming on and just continuing to talk about these things. I know it's kind of the nitty-gritty of licensing, kind of a dry topic, but there's so many facets to these decisions and the implications of changing a license, picking a license. They're just massive. And we're definitely living in a brave new world where we're trying to figure this out together. And clearly a world with big numbers.
Starting point is 01:22:25 I mean, we've seen the headlines on changelog.com this week in the news feed. You know, billion-dollar valuations, multi, hundreds of millions of dollars invested into, you know, new companies or companies that are now unicorns. HashiCorp being an open-core model type company that's just taken on, you know, a new round of gigantic funding. So there's clearly lots of money at play here, you know, and it's a new world for open source every single day. So where do we go from here? I mean, clearly we've had a great conversation that's led from not only Dgraph as a tech and how it applies, graph databases 101, on through how they can be used. You're clearly super smart. You've had to relicense. You've been through a journey. What do you suggest? Maybe the next step, maybe not here
Starting point is 01:23:10 today because we're running out of time, but what are some suggestions from you to continue this conversation in ways that are meaningful, that can get to meaningful change? Do we have a conference about it? Do we do a sustained, like, unconference or just kind of a gathering? How can this best be approached by the right people in ways that are not vicious and attacking, but in ways that are meant to actually get to change? What do you suggest? I think it's a tough conversation. It's a conversation of ideals versus practicality. It would require flexibility from the maintainers or the people in charge at OSI
Starting point is 01:23:54 to think through some of the practical considerations of running an open source company in today's environment. And I think it would definitely need a bigger dialogue. I feel, you know, if MongoDB's SSPL gets approved by OSI, that would be probably a great outcome of this, and I can easily see a bunch of other companies jumping onto that bandwagon. If it gets rejected,
Starting point is 01:24:23 then other open source companies are going to keep coming up with something new, which might work. There's definitely a need for a change here. I think that much is clear. Well, let's close the show with anything for you. I know you got lots of stuff happening. We've obviously covered quite a bit of ground,
Starting point is 01:24:41 but if people are following along with you, where do they go? What do they do? Do you have anything to announce here at the close of the show? Yeah, I do want to announce something. I think, you know, we are doing, we are solving really complex problems at Dgraph
Starting point is 01:24:54 and we also have Badger, both written purely in Go, both open source. If you want to help us and if you want to experience these challenging problems, come join us. You can go to https://dgraph.io
Starting point is 01:25:09 and see some of the job openings. We are looking for backend engineers. So apply. Manish, thank you so much for sharing not only your story, but your wisdom here. I know it's a tough subject, and going on record, because we do have an awesome transcript for this show. Thank you, Alexander, for being so awesome, and all the contributors out there who help us to make them, like our friends mentioned at the top of the show, make our shows less unintelligible and more intelligible, so to speak. So, I mean, I know it's tough to be on record about very tough subjects and we just
Starting point is 01:25:45 appreciate your courage to share how you feel and the willingness to continue to go on the road, even when it's bumpy. And thank you for, thank you for sharing your time with us. Thanks for having me, guys. All right. Thanks for tuning into this episode of the Changelog.
Starting point is 01:26:03 If you enjoyed this show, do us a favor. Go into iTunes or Apple Podcasts and leave us a rating or a review. Go into Overcast and favorite it. Tweet a link to it. Share it with a friend. And, of course, I want to thank our awesome sponsors and partners, Rollbar, DigitalOcean, Algolia, and GoCD.
Starting point is 01:26:21 Also, thanks to Fastly, our bandwidth partner. Head to fastly.com to learn more. And we're able to move fast around here and fix things because of Rollbar. Check them out at rollbar.com. And we're hosted on Linode cloud servers. Head to linode.com slash changelog. Support this show. This episode was hosted by myself, Adam Stachowiak, and Jerod Santo. Editing was by Tim Smith. And the mix and master also by Tim Smith. Music is by the ever awesome Breakmaster Cylinder. And if you want to hear more episodes like this, subscribe to our master feed at changelog.com slash master
Starting point is 01:26:58 or go into your podcast app and search for Changelog Master. You'll find it. Subscribe, get all of our shows, as well as some extras that only hit the master feed. Thanks for tuning in. We'll see you soon.
