The Changelog: Software Development, Open Source - There and back again (Dgraph's tale) (Interview)
Episode Date: November 9, 2018
This week we talk with Manish Jain about Dgraph, graph databases, and licensing and re-licensing woes. Manish is the creator and founder of Dgraph, and we talked through all the details. We covered what a graph database is, the uses of a graph database, and how and when to choose a graph database over a relational database. We also talked through the hard subject of licensing/re-licensing. In this case, Dgraph has had to change their license a few times to maintain their focus on adoption while respecting the core ideas around what open source really means to developers.
Transcript
Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. We move fast and fix
things here at Changelog because of Rollbar. Check them out at rollbar.com and we're hosted
on Linode servers. Head to linode.com slash changelog. This episode is brought to you by
our friends at Rollbar. Check them out at rollbar.com slash changelog. Move fast and fix
things like we do here at Changelog. Catch your errors before your users do with Rollbar.
If you're not using Rollbar yet or you haven't tried it yet, they have a special offer for you.
Go to Rollbar.com slash changelog.
Sign up and integrate Rollbar to get $100 to donate to open source projects via Open Collective.
Once again, Rollbar.com slash changelog.
Welcome back.
You are listening to the Changelog, a podcast featuring the hackers, the leaders, and the
innovators of software development.
I'm Adam Stachowiak,
Editor-in-Chief here at Changelog. Today, Jared and I are talking to Manish Jain about Dgraph,
graph databases, licensing, and relicensing woes. Manish is the creator and founder of Dgraph. We
talk through all the details, what a graph database is, the uses of a graph database,
and how and when to choose a graph database over a relational database.
We also talked through the hard subject of licensing and relicensing.
In this case, Dgraph has had to change their license a few times to maintain their focus on adoption
while respecting the core ideas of what open source really means.
So we have a two-pronged episode, two for the price of one today, and the price of one is zero or free, so really, really lucking out.
We're here to talk first about Dgraph, which is the world's most advanced graph database, according to
Dgraph.io.
And then we also are going to talk about some licensing and some re-licensing woes.
Some of the stuff that open source developers and popular projects have to go through, but
are kind of the difficult, weedy, like, how do we do this?
How do we re-license if we change our mind?
And Manish has done all that with Dgraph.
It's gone through a few different iterations of licensing.
And so he's here to tell us that story.
So, Manish, thanks for coming on the Changelog.
Thanks for having me, guys.
And we should probably give a shout out to Ping
because this is an episode that started in Ping.
If you've never heard of our Ping repo,
it's on GitHub at
thechangelog slash ping. Hop in there and give us your thoughts on what shows we should do.
This one was actually opened up by an epic transcriber, Horst Rutter. Adam, you know Horst.
Yes, he has been faithfully transcribing our... or not transcribing, but fixing our transcripts,
improving our transcripts.
The unintelligibles are missing.
They are missing when it's unintelligible.
Horst has gone in there and corrected them.
He has done a ton and we appreciate that.
And he was interested in hearing about some of the decisions and some of the process of how you change your license from one to another.
And then a follow up to that was Vespertilian.
Oh, that was bad.
Vespertilian, a.k.a. Cameron,
which is probably the real name,
who pointed us at Dgraph as a user of Dgraph
and one who had watched the Commons Clause license
and the Apache 2.0 and the AGPL and all of this over the last,
I don't know, six to eight months happening over at Dgraph.
He said that this would be a good project to kind of focus on that conversation.
So thanks to those two for being a part of our community and thanks for suggesting this
and getting us hooked up with Manish.
So with that out of the way, Manish, let's talk about Dgraph.
Tell us about this project, where it came from, how long it's been around,
what you're up to with it.
Sure. Maybe I can start with my
own journey a bit before I get into Dgraph. So I
used to work at Google in Mountain View, California for
six and a half years working in the web search infrastructure team where
we were dealing with real-time distributed systems. In fact, we built an incremental indexing system and launched that
in 2010, got an OC award for it. And basically what that did was to reduce the latency that it
takes for a web page to go from the first time we crawl it to the first time a user sees it on google.com
from four days to a few hours. So that was the biggest Bigtable database installation at
Google at the time, and, you know, it gave me a lot of freedom to work on real-time
distributed systems. Now, back in 2010, after we launched this thing, I started looking around and seeing, hey,
what else could I sink my teeth into?
And turns out Google had acquired Metaweb, which is the company which brought Knowledge
Graph to Google.
So the Knowledge Graph that we hear about these days came from Metaweb.
And I started a couple of projects there.
One of the projects was to unite all structured data at Google.
That was all the, what we call OneBoxes.
So that would be weather and events and movie showtimes and flights, et cetera,
and the Knowledge Graph, into a single graph indexing and serving system.
And that was a big challenge, obviously. We didn't have a graph serving system at Google. We had a web search index serving system,
but not a graph one. And so along with a few other tech leads, one was in India,
one was in San Francisco, and I was in Mountain View. We started this project to build something which would be able to do arbitrary depth
joins and would do traversals and do them in sub-second latency.
In fact, we had a limit on how much latency it can have because if the system does not
respond to a web search request
internally, that search would just move on and would not surface anything interesting from the
knowledge graph. So I was involved in that. And while building that, we obviously put together
all the research that Google had done at that time. And I got to learn a lot. So I left Google in 2013, moved from the US to Australia,
had some family reasons to move. And around 2015, I remember being involved in a freelancing
gig where this person is like, hey, can we use a graph database? And I was like, well,
the existing graph databases, they are not that good.
They don't scale very well. They have issues with consistency. And in general, they were just never
considered primary databases. And that's what triggered me to say, hey, maybe we should build
something which would be like that. I looked around.
The biggest one was Neo4j, which is a single-server database. In fact,
the most popular one in the market, but yeah, limited by data corruption issues and performance
issues. And then there were some others which were not databases, but more like graph layers. You would think of TitanDB, DataStax DSE Graph, JanusGraph, which
are built on top of other distributed databases. So you put HBase below it, or you put Cassandra,
and then you put a layer above it, which again suffers on performance. You need to run multiple
systems.
So Dgraph really started as a way by which we could have a native graph database, which
could also scale horizontally and perform with a pretty tight latency.
And I used a lot of concepts that I learned back at Google.
On top of that, while we were building it, we realized if you were to build a database,
which has to be a primary database for big companies, it must support transactions.
It must support synchronous replication.
It must provide linearizable reads.
Because when you build these things into the database, applications have it a lot easier.
They don't need to worry about, hey, whether I'm hitting the master or the replica. They don't have to worry about any of
that. They just hit any of the servers in the cluster and they are guaranteed to
get the freshest response back. So those were the ideals that we
built Dgraph for. It started and launched 0.1 in December 2015.
We went on to raise $3 million
over the course of two years.
Launched 1.0 in December 2017.
And now we are in a place
where Dgraph is close to being used
in production at a few big companies.
And obviously we have a huge
open source community.
Very cool.
Well, you mentioned Neo4j just in the news yesterday,
I believe they raised a Series E,
the company behind Neo4j, $80 million Series E.
So definitely investment interest in this space.
And Neo4j has been around for quite some time.
So with Dgraph, you said its advantage is that it's built for
distributed from the ground up. Also potentially some of the technology, or that's just the timing
of Dgraph in terms of it starting in 2015. Can you give some of the underlying technology,
languages, or tools that you're using in the open source software and speak to that for us?
So I'm a big fan of the Go language. This was not when I was at Google; I was pretty much writing C++.
But after I left, Python just could never stick with me. And the moment I got to know about Go,
I started trying it out. And back in 2015, I think CockroachDB, another
database company in New York, they had raised a Series A. I saw their stack was Go, and that
immediately excited me. So Dgraph is written purely in Go. We use gRPC for communication,
both internal between the cluster and for external communication from
a client to the cluster.
We were initially using RocksDB as the embedded key value database to put our data in.
But then we realized that when you go from Go's user space to C, Go to C++, which is
what RocksDB is written in, it just causes a
lot of headache. Go tools, the Go memory profilers for example, do not get to see
what's happening in C land. The Go performance profilers do not get to see what's happening in
C land either. So at some point, after much thought, we decided that we should just build a good
sort of RocksDB alternative purely in Go. And we looked at the alternatives at the time. One was
BoltDB, which was a B+ tree based key-value database. And there was, obviously, LevelDB and such. RocksDB was already an
improvement over LevelDB, so for us that seemed like not a great choice. BoltDB's
write performance, and not just BoltDB but in general any B+ tree's
write performance, is definitely always a bottleneck. So we wrote something which was based upon a new paper
by University of Wisconsin-Madison,
and what it did was,
it took some of the negatives of LSM trees
and got around them by
separating the values from the keys.
So the values go into a log
and the keys go into the LSM tree.
And we based our main design upon that.
And it took us a while to really get it right
because the paper didn't talk about all the nuances
involved with having a separate value log.
So that's something that we have been
sort of perfecting over time.
But the end result was that the performance of this thing called Badger,
it basically outperforms RocksDB on a lot of use cases.
It works out pretty well for us.
So we use Badger as the underlying embedded key value database.
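Editor's sketch: the key-value separation described above (values in a log, keys in the LSM tree) can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions, not Badger's API or design: the class name TinyKVSeparatedStore is hypothetical, and a plain Python dict stands in for the LSM tree.

```python
class TinyKVSeparatedStore:
    """Toy store: values go to an append-only log, keys map to log offsets."""

    def __init__(self):
        self.value_log = bytearray()  # append-only value log
        self.index = {}               # stand-in for the LSM tree: key -> (offset, length)

    def put(self, key: str, value: bytes) -> None:
        # Append the value to the log and keep only a small pointer in the index.
        offset = len(self.value_log)
        self.value_log += value
        self.index[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        # Look up the pointer, then read the value out of the log.
        offset, length = self.index[key]
        return bytes(self.value_log[offset:offset + length])

    def live_bytes(self) -> int:
        # Bytes still referenced by the index; the rest of the log is stale.
        return sum(length for _, length in self.index.values())


store = TinyKVSeparatedStore()
store.put("user:1", b"alice")
store.put("user:2", b"bob")
store.put("user:1", b"alice-v2")  # overwrite appends; the old bytes stay in the log
```

Because the index holds only small pointers, the key structure stays compact while large values sit in the log. The gap between the log size and live_bytes() after the overwrite hints at one kind of bookkeeping a separate value log brings with it.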
Very cool.
One thing you mentioned earlier is you said that many people
were using graph databases
not for their primary data store,
but as perhaps a secondary data store.
Maybe they put their,
not their relational,
but their social network style data
in the graph database,
but maybe they have a more traditional
relational database management system
for their primary tables.
Can you give a high level decision?
Of course, once you decide I need a graph DB, now you have to pick a graph database.
You may say, OK, Dgraph or Neo4j or perhaps a proprietary option.
But what about even like, do I need a graph database versus a Postgres or a MySQL?
Help people with that decision.
Is there a pretty simple flow you can go through in your mind to decide,
is this the data store for me, especially if you're going to pick it as a primary?
That is a tough question for a lot of people.
MySQL and Postgres have been around for such a long time.
Literally, SQL is being taught in schools and colleges all over the world.
It's hard to convince somebody who is, let's say, a Postgres fan or a SQL fan to switch to something else.
So I try not to, you know, engage directly or try to convince
anybody to use graphs. What happens for us is, so, Postgres and
MySQL are very popular with very young startups, but as they progress, they start
to realize the limits of these systems: the limitations of their join power,
the limitations of not being able to do recursive queries
across tables and stuff,
all that code that goes into the application
because the database is so simple.
As the company size grows,
they start to hit those limitations.
And at some point, a new team, a new project would be like, hey, it would be great if we
had a graph database for this.
It would really save a lot of work.
Or hey, we tried this with SQL.
It's just too slow for our users.
Maybe we should switch over to a graph database.
So that's what happened.
Then they started looking into a graph database.
Obviously, they come across
some of the popular choices. They try them out, and then, accidentally almost, they get to
hear about Dgraph, and that sticks.
So it's kind of one of these things where you'll know it if
you need it, because you'll have grown past certain needs potentially in your traditional
relational database. And so that makes it actually a pretty nice space for an enterprise
offering, because your community is enterprise. It's larger. It's companies that
have grown, at least data-wise, to a size where they feel the need already. And so they're probably a certain level of successful,
at least hopefully.
Or they're even doing special things with their data,
more so than simply like, hey, we have a web app with basic CRUD,
like MySQL, Postgres, those databases are perfect and great and fine
for those types of apps.
But once you're past a certain point, you want to actually make more sense
or get insights or analytics
that really draw relations
or different things from a database
you may want to experiment
and even use in addition to
versus simply replacing.
Yeah, absolutely.
So we do see some of these
medium to big size companies.
I think they are the most, I would say, active users of graph technologies.
Even if you were to look at yesterday's news article about Neo4j getting the $80 million,
they said that 20 out of 27 or 24 top banks in the US are using Neo4j. So it gives you some idea for
how popular graphs are with enterprises. But you know, I do want to say one thing though,
I feel, and we've actually done some work on that as well, even for some basic stuff,
which you typically think is squarely in the SQL space,
for example, building a question answering website, right? You have Quora, you have Stack
Overflow, and you have a bunch of these things, even Facebook, right? You have
a post, you have comments on the post, you have likes on those comments, you have comments on
comments, likes on those and so on and so forth. It's a very recursive sort of, you know, if you need to show a post,
it's a recursive traversal.
And that's exactly what graphs are great at.
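Editor's sketch: the recursive shape described here (a post with comments, comments on comments, and likes at every level) can be shown with a short recursive expansion. The node ids and like counts below are hypothetical, and plain dicts stand in for a graph store; this is not how Dgraph executes a query.

```python
# Hypothetical data: node -> child comments, node -> number of likes.
comments = {
    "post": ["c1", "c2"],
    "c1": ["c3"],
    "c2": [],
    "c3": [],
}
likes = {"post": 5, "c1": 2, "c2": 0, "c3": 1}


def expand(node):
    """Recursively build the nested thread rooted at `node`."""
    return {
        "id": node,
        "likes": likes.get(node, 0),
        "comments": [expand(child) for child in comments.get(node, [])],
    }


thread = expand("post")
```

The result is one nested structure that follows the comment edges to whatever depth they go, which is the kind of traversal that takes a self-join per level when the same data sits in flat tables.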
So what we did, for example, I think it was last year,
was, so, Stack Overflow does these data dumps
that you can just pick up.
It's an XML file.
You can pick it up and just you can do whatever with it.
So we picked that up and we loaded that into Dgraph.
And we thought, hey, let's build the three most popular pages on Stack Overflow.
One of them is the questions page.
One of them is the home page.
And there was one more page.
I forgot which one was it. And we just built those three
pages. The amount of backend code that we needed was not that much because the query language,
in this case of Dgraph, was sufficiently complex that it could just retrieve all the data for you,
give it to you in a nice JSON. So all the work that needs to be done
is just in the front end in rendering it, as opposed to in the backend where you pick up the
question from the questions table, then pick up the answer from the answers table,
pick up the likes and upvotes from another table, and then try to join them together.
You don't have to write any of that code. It just happens automatically at the database level. So I feel graphs can be used in a much broader way, and they are a lot
nicer and faster for developers. But that level of developer awareness takes some time to build.
Yeah, that's a great idea for getting people to see how easy it is to build these recursive, you know,
data fetches, is to use something we all are very well aware of,
which, I don't know,
do you think developers know what Stack Overflow looks like?
Perhaps.
Also, you have a cool,
another one on your homepage is play with 21 million facts from the Freebase
film data loaded up on a demo Dgraph instance.
So you can just hop in there and see what different
queries will look like. And speaking a little bit to the timing of Dgraph in terms of its
competitive advantage over potential other graph databases, its query language is inspired by
GraphQL, which just couldn't have been inspired by GraphQL if it was 10 years ago. So this is something that's very familiar, at least to front-end web developers.
Can you talk about that?
Yeah, I think GraphQL was, I would say, a great choice for us.
It was very early on, in fact.
I think Facebook had just released GraphQL or something, and I remember looking at it, I'm like, hey, this looks like it just fits.
Because when you go to a graph database, you want to get a subgraph back, you don't want to
get a list back. Because if you get a list back, it's hard to know what was connected to what.
You cannot create a subgraph from a list. But you can take a subgraph and convert it to a list.
And most of the other graph queries, Cypher and Gremlin, they are all returning lists of things
back just like SQL does. So they lose some of that relationship data between things.
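Editor's sketch: the point that a subgraph can be turned into a list, but not the other way around, can be seen by flattening a small nested result. The node names are hypothetical.

```python
# A hypothetical nested result, shaped like a subgraph.
subgraph = {
    "name": "post",
    "children": [
        {"name": "comment A", "children": [{"name": "reply A1", "children": []}]},
        {"name": "comment B", "children": []},
    ],
}


def flatten(node):
    """Depth-first walk that keeps the names and drops the parent links."""
    names = [node["name"]]
    for child in node["children"]:
        names.extend(flatten(child))
    return names


flat = flatten(subgraph)
```

The flat list still names every node, but nothing in it says whether "reply A1" hangs off "comment A" or off the post itself, so the original subgraph cannot be rebuilt from it.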
And I looked at GraphQL and I was like, hmm, this is very interesting. In fact,
I went back and checked with the CTO at Metaweb who was at Google and showed it to him. I
was like, what do you think about this? And he said that it was very close to Metaweb's
own query language called MQL, which was popular at the time. And so we decided, hey, let's use this as a query language.
Now, the thing about GraphQL that we did not realize at the time
was that it was really a replacement for REST APIs.
And it was still designed keeping SQL in mind.
The types in the GraphQL, I really think of them as SQL tables
and the connections are similar.
So we started to quickly hit some of the seams of GraphQL where we felt like we could not
really work with it if you want to build a graph database.
So we had to then start to modify the spec, basically go outside of the spec
and modify. We simplified some. We added some features like shortest path. We added
filters in a simple way, and so on and so forth. And we still don't have a good name for
this language. We just call it GraphQL plus minus, because we added some
and we removed some.
It's first of our areas. I was looking at that plus minus. I thought maybe
it was like a typo there, because it looks like it was accidentally in the link. But right,
yeah, that's a good name, just plus some and minus some. Is plus plus still being used often?
i kind of feel like it had its heyday.
You know what I mean? I remember it from maybe 10 years ago, maybe even eight.
I don't know.
It doesn't seem like it was a couple.
Is it still kind of a current known naming pattern?
It's like a hacker thing, something plus plus?
I think so.
I think it's still out there.
I mean, people still, hackers are still typing it on the daily.
Plus minus is brand new.
We still use C++
and we were like,
hey, is it GraphQL++?
Right.
But then I was like,
well, it doesn't do everything
that GraphQL does.
So it would be wrong to call it plus plus.
It has to be plus minus.
I dig it.
So is that something that potentially those pluses or
maybe the minuses could work their way back into GraphQL? Or is it just because, working with graph
databases, there are things that just don't make sense for the broader web API GraphQL?
Honestly, that's a question on my mind almost every other day. We do see how popular GraphQL has become. In fact, it has become way
more popular than I anticipated. And there's an open ticket on Dgraph to support the official
GraphQL spec. So it will play well with all the tooling out there. Apollo raised a bunch of money
and Apollo is being used quite a lot in the GraphQL community. And we would like GraphQL to play well with all of those tools.
So I think there's definitely something that we want to do is to support the official one.
It probably takes a deeper discussion with the authors of GraphQL to see if they would like to
integrate some of the modifications
that we have done back into the spec.
That's probably a harder discussion though.
This episode is brought to you by DigitalOcean.
DigitalOcean is a cloud computing platform built with simplicity at the forefront.
So managing infrastructure is easy.
Whether you're a business running one single virtual machine or 10,000,
DigitalOcean gets out of your way so teams can build, deploy, and scale cloud apps
faster and more efficiently. Join the ranks of Docker, GitLab, Slack, HashiCorp, WeWork,
Fastly, and more. Enjoy simple, predictable pricing. Sign up, deploy your app in seconds.
Head to do.co slash changelog, and our listeners get a free $100 credit to spend in your first 60 days.
Try it free.
Once again, head to do.co slash changelog.
So, Manish, help us understand some of the killer use cases, the sweet spot for graph
databases.
Similar to the idea of, you know, I think Mongo came out really talking about document
based data stores and saying, if you're running an e-commerce site such as Magento,
look at all these crazy joins on these different tables
just to pull together a shopping cart.
Really, that's a document,
so let's have a document database.
And that was, I think, a compelling use case
or at least selling point for that style data store.
When I think of graph databases, I think of social networks, but that's just me.
From your perspective, what's the sweet spot for these types of data stores?
Yeah, so there are
certain use cases where people immediately think about using a graph database, and I
think there's a sweet spot there. The top one which comes to my mind is real-time
recommendations. These days companies have a lot of data around their users.
For example, you have credit cards or you have rewards cards from even big airlines or hotels or e-commerce companies around what users have purchased in the past and what other people have purchased.
Amazon comes to mind.
Amazon runs an amazing recommendation system.
That's one of probably the most demanded features or most demanded use cases from a graph database.
Then we have seen particularly medium to big companies go really hard after real-time fraud detection.
It's very easy in a graph to find circles where they can identify if it's the same person or
entity trying to create multiple cards or multiple money sources and figure out if it's a ring and sort of
catch that. We have also seen identity reconciliation, you know, people trying to
figure out if you're the same person in, let's say, Instagram, Facebook,
Twitter, and so on and so forth. So those kinds of reconciliations, now you can apply them to
other data sources. That's actually a good use for graphs. And the last one, this is actually
the most relevant to particularly big companies. They have a lot of data silos. They have a lot
of different databases or even just different database instances where they
actually grab data and just one silo never talks to the other one.
And what they then do is they unify all of the data from these different silos into a
graph database.
Because remember graph databases do not have any boundaries.
The idea of graphs is that you just put all the data into one place,
and it can traverse from any node in the graph to any other node,
however far away it might be.
There's no tables.
There's no different databases.
It's just one graph.
And so that actually, that concept really helps when you want to query across multiple
data sources.
And the fifth one, which is really jumping up these days, is around artificial intelligence.
There was just a paper, I think, by Google, I think I was reading like last week, around
how they realized that they have reached the limits
and they need to use a graph database to be able to do better AI. And they even launched a small
graph library that you can use to integrate with TensorFlow. And in fact, just reading it from
yesterday's post around Neo4j funding, AI was the top
thing that they're going to go after with the new money that they're getting.
So, you know, I think for AI graphs are a no brainer.
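Editor's sketch: the fraud-detection use case mentioned above, spotting one person or entity behind several accounts, can be approximated by linking accounts to the attributes they share and walking the resulting graph. This uses connected components via breadth-first search as a simple stand-in for ring detection; the account names, phone numbers, and device ids are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edges between accounts and the attributes they were opened with.
edges = [
    ("acct1", "phone:555-0100"),
    ("acct2", "phone:555-0100"),
    ("acct2", "device:abc"),
    ("acct3", "device:abc"),
    ("acct4", "phone:555-0199"),
]

# Build an undirected adjacency map.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def component(start):
    """Return every node reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


ring = sorted(n for n in component("acct1") if n.startswith("acct"))
alone = sorted(n for n in component("acct4") if n.startswith("acct"))
```

Here acct1 and acct3 share nothing directly, yet they end up in the same group through acct2. That kind of indirect link is easy to follow in a graph and awkward to express as a fixed number of table joins.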
If you had to give somebody a graph database 101, would you just say it's like a string
that threads different data points, and that string, as you kind of said, can infinitely
scale? What
would be, if you had to give a, you know, a 101 of what a graph database is, how long
might that be, and could you do it here?
Absolutely. I think graphs
are probably the simplest things to think about, really. You know, people think
about SQL tables: you have a row and you have some columns. Think of a
graph as three columns there: you have a subject, a predicate, and an object. And if
you put together a whole bunch of these things, you get a graph. So a subject is
essentially, think of it as an entity; a predicate is the relationship; and the object is either another entity or a value.
So subject could be, let's say, me and my relationship might be lives in and the object might be San Francisco.
Right.
Right.
Or it could be me.
Name is Manish.
And that's sort of like a property.
Right.
So you just put together a whole bunch of these,
what we call facts or triples, and you get a graph. And then other people who live in San Francisco
would have similar facts, and then you could run a graph query around, hey, tell me all the people
who live in San Francisco and who eat sushi. Right? So you pick
up all the people who live in San Francisco, you intersect with people in the world who eat sushi,
which are completely different facts. You didn't store, you didn't create them as, this
person, you know, lives in San Francisco and eats sushi. This is something that we're doing on the
fly. So you pick up all the people in San Francisco, pick up all the people in the world who eat sushi, you intersect the two lists. Now you
get people in San Francisco who eat sushi. Now you can take that result and say,
intersect with all the people who have been to Japan, right? You pick up another list of people
who have been to Japan, intersect it with this. Now you get people who live in San Francisco, who eat sushi, and who have been to Japan, right? So the power of graphs is really
in these joins that you can do, coming from just very simple facts.
That makes sense too,
why in part one you mentioned not having to rewrite a bunch of code. You know, when you
explain it in the 101, that
these things naturally appear based on the way you query the data versus traditional ways you
might have done it with MySQL or Postgres relational databases. In this case, the
graph of these points becomes more and more clear as you intersect or cross over the data, because
it's just naturally how it works,
and you're saving time, but also getting insights that were just so much harder to get to in traditional
ways or other database ways.
That's absolutely true. And, you know, I was playing
with the movie database, the Freebase movie set that we also have on our website,
and one of the interesting things: you can look at the data all you want
and you never really find these tidbits,
but I put it into Dgraph and ran some queries, and it turns out that the
directors of Indiana Jones movies were also in the movie, right?
I mean,
Steven Spielberg was in one of the Indiana Jones movies as
one of the characters in the movie.
Some of these
interesting things, they just become
really obvious when you put them in a graph.
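Editor's sketch: the subject, predicate, object triples from the 101 earlier, and the intersections built on top of them, fit in a few lines. The first two facts mirror the examples given in the conversation; every other fact and the ids p1 to p3 are made up for illustration, and a Python list stands in for the database.

```python
# (subject, predicate, object) facts. Only the first two come from the
# conversation; the rest are invented for this sketch.
triples = [
    ("p1", "lives_in", "San Francisco"),
    ("p1", "name", "Manish"),
    ("p1", "eats", "sushi"),
    ("p1", "visited", "Japan"),
    ("p2", "lives_in", "San Francisco"),
    ("p2", "eats", "sushi"),
    ("p3", "lives_in", "Sydney"),
    ("p3", "eats", "sushi"),
    ("p3", "visited", "Japan"),
]


def subjects(predicate, obj):
    """All subjects that have the given predicate pointing at the given object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


in_sf = subjects("lives_in", "San Francisco")
sf_sushi = in_sf & subjects("eats", "sushi")              # first intersection
sf_sushi_japan = sf_sushi & subjects("visited", "Japan")  # second intersection
```

No fact in the list says "lives in San Francisco and eats sushi and has been to Japan". The answer is assembled on the fly by intersecting the sets that each simple fact produces.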
That's interesting.
You add that, the built-in ACID
transactions, which gives you a lot of
safety. What are you
missing then?
Is everything better in graph DB land, or are there things that relational databases still do better today? Like, what are the drawbacks?
I used to say the drawback was that a graph database, Dgraph, was not great for financial
transactions, but then we added transactions.
And so now it's great for financial transactions.
The other drawback that we still have is that it's not really great for flat data.
And by flat data, I mean like time series data, right?
You just have tons of things
which are not really connections,
but just more and more record points for the same thing.
That kind of flat data is really just not done very well
with graph databases.
You could use a graph for that,
but it's better if you aggregate it somewhere else
and bring in the results into a graph database
than to try to do the aggregation or storage in the graph database.
So basically in a world full of subjects that have many verbs
with many like-minded objects, graph databases apply.
Absolutely, I think.
Any SQL table, which is essentially row and column and the data,
can be easily converted into graphs.
And I think every time we have tried to switch from a SQL use case to a graph use case, just
the amount of backend code that was there in play before reduces by at least half because
the query language is so much more powerful.
So to go further into Jared's question of like where you reach for a graph database over, say, Postgres or MySQL or relational, you said you used to not recommend it for transactional, but then you built it.
Is there a checklist of things that is like you'd reach for Postgres over GraphDB or Dgraph or other graph databases that is like consistently being chiseled away
where a graph database just wins out?
Sorry, could you repeat that question?
Basically meaning is there a list of things where you recommend, well, okay,
if you're in these scenarios, don't use a graph database that you, you know,
like you said before, you don't recommend it for transactional database
and then you built transactions.
So now you
take that back. Maybe that was the list.
That was the list.
Okay, well, there's one thing
in the list, or not.
Well, flat data, right? So if you don't have a lot of relationships, then...
I mean, according to what you just said, Manish, you can use them, but they're not
necessarily optimized for that, right? You're not going to get the advantages necessarily.
Right. So I think the time series data is the one which I mentioned. Yeah, it's just
not great for graphs.
What about management and maintenance? Because, so, I am a
Postgres user and have been for years, and so I always look at these shiny different data stores
and I think, this sounds great when I'm in development.
And then I have to actually put the thing into the world and run it and like back it up and make sure it's always up and so on and so forth.
And then it's like now I have to relearn or learn a brand new set of maintenance or management skills that I already own on the Postgres side.
So I think that's probably a barrier for a lot of people.
What's the story with deploying this thing?
I know it's built and distributed, so it's going to shard horizontally for you,
which sounds amazing, but also potentially scary.
I don't know. Tell us about deployment.
So deployment is where you lose customers.
I think not for Dgraph in particular, I'm just talking in general: this is where you can
easily lose customers, because DevOps guys are always hard to impress, and we have spent a lot
of time making sure that DevOps guys are happy with Dgraph. So, as I said, it's distributed, so it can shard the data for you, but it is
also replicated, and all of that is part of the open core.
So a bunch of deployments that we're doing right now use what we call a six-node
cluster, where we have three replicas for Dgraph Zero and three replicas for Dgraph Alpha. Don't worry about the
terminology here, but just understand that it's three replicas each. And Dgraph uses a consensus
algorithm called Raft to make sure that every piece of data that you put into Dgraph reaches a quorum and gets replicated across a majority of these replicas
before the acknowledgement is sent back to the user.
So in case one of the servers crashes, nothing happens.
Your queries would keep on running, your data will keep on mutating, everything will just be fine.
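The quorum arithmetic behind that claim can be sketched in a few lines. This is a back-of-the-envelope illustration of majority quorums as used by Raft-style consensus, not Dgraph's code; the function names are made up for the example.

```python
def quorum_size(replicas):
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be down while a majority is still reachable."""
    return replicas - quorum_size(replicas)


def is_committed(acks, replicas):
    """A write is acknowledged to the user only once a majority has it."""
    return acks >= quorum_size(replicas)


# With three replicas per group, as in the six-node cluster described above:
print(quorum_size(3))         # replicas needed to acknowledge a write
print(tolerated_failures(3))  # crashes the group can absorb
print(is_committed(2, 3))     # two of three acks is enough
print(is_committed(1, 3))     # one of three is not
```

This is why a three-replica group keeps serving reads and writes when a single server crashes: the two surviving replicas still form a majority.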
The DevOps guy would get a notification. They can
either swap the machine or, if you're using Kubernetes, the machine just comes
back up automatically, and your users don't even see it. So it becomes really easy as a DevOps
person to just run Dgraph and keep everything happy.
And one more thing that happens at the developer level
is that, as I said before,
that sometimes with Postgres, for example,
or any database which has eventual consistency
in the replication system,
they will, let's say, create a new account into the master.
And then they want to read this new user's account.
And they end up going to a replica.
And the replica still doesn't have that new record.
So it will show, hey, account not found,
which is just bad experience for a user.
So there's a lot of systems built on top,
or you have to build it yourself to make sure that
if you're doing a read after write, then the read goes back to the master, which basically means your
replicas are not used as well, or you have to do a bunch of application-level tweaks and techniques
to make it work. Now, in Dgraph, you don't have to worry about any of that because
it's all consistent. So even if a node crashes and is down for a long time, then comes back up and you
immediately run a query, the query would block until the node has caught up to the rest of the
cluster. And only once the data is up to date would it reply back. And obviously, there
are also ways by which you can time out and query another server. So all of these things
are built in to make sure that you always get the freshest data, what we call linearizable
reads. So it tackles some of the common issues from both the DevOps side and also from the developer side.
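The difference between a stale replica read and a read that waits for the replica to catch up can be sketched with a toy model. Everything here is an assumption for illustration: the `Replica` class, its method names, and the idea of passing a `commit_index` are invented for the example and do not describe Dgraph's internals. "Blocking" is simulated by applying pending log entries in a loop rather than by real waiting.

```python
class Replica:
    """Toy replica that applies replicated log entries in order."""

    def __init__(self):
        self.applied_index = 0  # index of the last log entry applied
        self.data = {}
        self.pending = []       # entries received but not yet applied

    def receive(self, index, key, value):
        self.pending.append((index, key, value))

    def apply_next(self):
        """Apply one pending entry; return False if there is none."""
        if not self.pending:
            return False
        index, key, value = self.pending.pop(0)
        self.data[key] = value
        self.applied_index = index
        return True

    def stale_read(self, key):
        # Eventually consistent read: answers from whatever is applied.
        return self.data.get(key)

    def consistent_read(self, key, commit_index, max_steps=100):
        # Do not answer until this replica has applied everything up to
        # the cluster's commit index; give up after max_steps so the
        # caller can time out and query another server.
        steps = 0
        while self.applied_index < commit_index:
            if steps >= max_steps or not self.apply_next():
                raise TimeoutError("replica has not caught up; try another server")
            steps += 1
        return self.data.get(key)


replica = Replica()
replica.receive(1, "account:42", "created")    # replicated, not yet applied

stale = replica.stale_read("account:42")       # the "account not found" case
fresh = replica.consistent_read("account:42", commit_index=1)
print(stale, fresh)
```

The stale read reproduces the "account not found" experience described earlier, while the consistent read only answers once the replica has applied the write.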
So does it give up availability then in that case when the query blocks until it's consistent?
So you're losing availability?
Yeah.
So in the CAP theorem, it goes for consistency and partition tolerance instead of availability. But note, a lot of
people mistake this: availability in the CAP theorem is not the same as high availability. Dgraph is
highly available, but it still goes for CP instead of CA.
So I have some pretty awesome news to share.
We are now partnered with Algolia.
If you've ever searched Hacker News, Teespring, Medium, Twitch, or even Product Hunt,
then you've experienced the results of Algolia's search API.
And as we expand our content, we knew that one day we'd have to
either roll our own search solution on top of Postgres, or we could partner up with Algolia.
And I'm happy to report that phase one of our search is now powered by Algolia.
We're able to fine tune our indexing, gain insights from search patterns and analytics.
We can create custom query rules to influence ranking behavior, as well as improve our search
experience by adding synonyms and alternative corrections to queries.
Sure, we could build search ourselves, but that would mean we would be busy doing that instead of shipping shows like you're listening to right now.
Huge thanks to our friends at Algolia for working with us.
Check the show notes for a link to get started for free or learn more by heading to Algolia.com.
And by GoCD.
GoCD is an open source continuous delivery server built by
ThoughtWorks. Check them out at gocd.org or on GitHub at github.com slash gocd. GoCD provides continuous
delivery out of the box with its built-in pipelines, advanced traceability, and value stream
visualization. With GoCD, you can easily model, orchestrate, and visualize complex workflows from end to end with no problem.
They support Kubernetes and modern infrastructure with Elastic on-demand agents and cloud deployments.
To learn more about GoCD, visit gocd.org slash changelog.
It's free to use, and they have professional support and enterprise add-ons available from ThoughtWorks.
Once again, gocd.org slash changelog.
So Manish, based on what you've shared with us so far, it sounds like the initial start for D-Graph as a company was 2013.
Is that right?
2015.
2015.
And 2015, you did a round, you raised $3.1 million, if I remember correctly.
Is that right?
So we did a round in early 2016 and another round in sort of late 2017.
Okay.
Just a total of, I think, 2.9-ish million.
So that means somebody trusts you with millions of dollars,
basically, is what I'm trying to get at.
You're establishing a company, you build a technology
that's obviously proven itself, and somebody said,
yeah, here's money, I trust you, I trust what you're trying to build,
and I think it makes sense to do so.
And sometimes that means that you've licensed things appropriately.
The project has been open core, open source.
You can tell us more about the inner details of that and what that means.
But somehow, someway, at some point, you chose the right license that allowed you to take on funding and build a company around it.
Can you kind of walk us through what that is?
Because I'm imagining there are just so many developers out there, you know, going to choosealicense.com, or is it .org,
and they're getting enough information, but still the wisdom is not there. Maybe the
definitions and details are, but I feel like you can bring some bloody knuckles and some wisdom here. So preach.
Absolutely.
So I think when I was starting Dgraph,
and this is towards the end of 2015,
I naturally went for open source.
And it was not clear to me at that time how the business model would work.
I think, in fact, a lot of people I talked to
around this idea of,
hey, I'm going to build a graph database and make it open source. And they were
like, what? You're putting all the IP out there? Then what's left for you to make money off?
And I think, so the business models around open source only became sort of clear to me
slightly later, you know, and I think, I think a lot of people who are in the Valley probably
are more aware of them, but definitely people in Australia were not. You get OpenCore and
so on and so forth.
Now, the choice of licensing was kind of important to me. The behemoth in the graph space, Neo4j, was licensed as
AGPL, which is considered to be a copyleft license. Now, what AGPL does is
that if you were to touch any code and use this AGPL code as, let's say, a library, then you must open
source your code also as AGPL. It's sort of like a viral license: if you touch it,
it affects you as well. And we decided to go with a more permissive sort of Apache license.
Now, a lot of people think the reason to open source
something is around getting contributions
from just developers all over the world.
And I would say that is true,
but it is not the main benefit of open sourcing something.
The biggest benefit of open sourcing software in my mind is around adoption.
It's basically free marketing. You put your code as open source, anybody can see it,
they feel more comfortable using it,
they don't have to pay you a dime to use it, particularly in permissive licenses like Apache
and BSD and MIT, etc. And these days, if you want to build an infrastructure company, I've noticed
most startups and most tech-based companies, they really want the underlying technology to be open source.
And they have multiple benefits of doing so.
When they have the code available to them, they already have the engineering talent.
That talent can potentially go and modify the code base to improve it or modify it to their liking, etc.
So the biggest thing I've seen around permissive licenses is adoption.
And also you get contributions.
But more importantly, I think over the journey of both DGraph and Badger that I've noticed
is just the fact that people tell you, people give you feedback
around issues that they run into. And that feedback I feel is more important sometimes
than the actual core contributions that you get. So if you look at any open source repository,
you'll see, you know, 90% of the contributions are being done by the core team of three or four people, and then there's a
whole long tail of small contributions done by the bigger open source community. That's sort of
like the ugly truth, or unknown truth, about open source projects. So really, I think it's the
feedback that improves the robustness of code.
That's definitely an interesting take. I think most people would say that
the contributions are the main reason, but I think
that's a compelling statement that you have there with regard to the feedback versus
actual code contributions. So you mentioned picking Apache versus
AGPL. Tell us about AGPL,
maybe even contrast it with GPL, which it is a modification of to a certain degree, and then why
it was unattractive to you as a license.
So I think, let me just start with explaining a bit
about AGPL itself. And again, this is to the best of my understanding. With GPL,
the idea is that, you know, the code is in the same place and the users are sort of
linking to it as a library.
And again, the virality of this whole GPL series comes into play.
So if you link your code to GPL code, your code becomes, it's supposed to become GPL
as well. And you must make it open source on the GPL terms. Now, AGPL was then devised as a way
by which it can tackle GPL code running as a server and you interfacing with it over the network. So I think the idea is to try to make the same virality affect you if you
are running GPL code on the server and interfacing with it over a client.
That's my understanding as well. The GPL had a quote-unquote loophole because it was designed before the proliferation of services,
websites, web servers, web services,
where you're not delivering the end code,
you're delivering a byproduct of the code.
And so the AGPL was basically a fix for that loophole
to also make the server side,
even if you don't deliver the code to the end user,
still covered under, as you
said, the virality portion of the GPL. So I think we're in agreement on that being the
primary aim, and also I think it was effective in that regard.
Absolutely. And a lot of companies who still want to hold on tightly to their code base tend to use AGPL
as a sort of stop gap
between going fully permissive open source
and still trying to make sure
that they have a more solid sort of business model around it.
Now, we actually
like Dgraph initially, we did also try to convert
from Apache to AGPL. Now, when you do
such a conversion, the first thing that you have to make sure of is that
even before the project started, you have a good
ICLA in place. Now, what's an ICLA? It's an
individual contributor license agreement, which means that any contribution that you take in
into the open source project, the rights to that contribution are given back to the
company running the open source project. And we put that in place in Dgraph very early on,
even when we were under Apache.
So that means that in a way,
the authors of that contribution,
they hand the rights back to the company,
which means the company can now change the licensing if need be.
We do not accept any contributions
without the author signing ICLA.
And it's just a standard practice I've noticed across not just Dgraph, but other open source
companies as well.
So that meant that we could change the licensing terms.
And we did change it to AGPL.
This was, I think, after MongoDB went IPO and MongoDB was using AGPL.
And we felt maybe that's a better way for us to make sure that we have a good business
model.
And once we had switched over to AGPL, we started hitting some of these things that
we did not really understand before.
Now, to give you a bit of history, Google explicitly bans
AGPL code. Google's open source guy, Chris DiBona, in fact, sort of famously said that
no AGPL code is useful or good, and we don't need to use it. They banned it. Now, when Google goes and bans a license, other companies follow, right? So Facebook...
now, Facebook doesn't publish it openly, and I don't really know, but I know this much: at Facebook
and at Apple and some of these big companies, it is very hard or almost impossible to bring in
any AGPL code. And we actually had some of these cases.
So somebody wants to play with Dgraph
at one of these big companies, they are unable to,
because they can't even bring the code
into the company at all.
So we started realizing that because of this,
people were having a hard time adopting Dgraph.
And again, this goes back to my point about why you would choose open source over a proprietary
license?
It's largely for adoption.
So we started seeing some of those issues.
And we switched over from Apache to AGPL in March 2017, if I'm not wrong. And then towards the end of 2017, we decided, hey, we need a better solution here.
AGPL seems too toxic to be used for Dgraph.
And around that time, or somewhere after that, we started a discussion with the Redis Labs folks.
And, you know, together we came up with this thing
called the Commons Clause.
Now, the idea behind Commons Clause is that
you use a permissive license like Apache,
or in the case of Redis, they use BSD.
And you add a clause
which basically prohibits some company or some person from selling the software as it is. And why would we go
to AGPL, or why would we go to Commons Clause? The reason is that what's been happening lately, and what none of the open source licenses have thought
about, is that big companies, these platform-as-a-service or infrastructure-as-a-service
companies, most notably Amazon and the Chinese counterparts, would pick up an open source
project and they will run it as a service at a much cheaper price. And, you know, they, because they have the, they have the bandwidth and the engineering
talent and the money for it, they would, they would run it as a service without contributing
back to the open source project.
And the main thing that, that we were going for is to avoid that. If you want to sell this thing to developers,
you should at least contribute back, or you should help the company financially who is actually doing
most of the contributions. So all of these licenses, AGPL or Commons Clause, and now Mongo's SSPL,
they are really around trying to dissuade big providers,
service providers, from just ripping off an open source project.
It seems like this stems, based on your earlier points,
is like your motivations, right?
Your lens for which you're navigating this.
And in your case, in particular with Dgraph,
you know, you're optimizing as open source for adoption,
not so much contributions, right?
So you still want contributions.
It's still important as part of the world
how open source works,
but you're doing it based on adoption.
So you've had to go through different licenses
and you want to be,
you want to have a liberal license with a clause that protects you, so you can be a company and actually be viable and sustainable. And there are some that say that that added clause
basically makes you not open source. What do you say to that?
Yeah, I think it's a very
delicate trade-off between trying to choose a permissive license,
which allows most users to just use the software while also dissuading a big company from coming in
and stealing your financial longevity in some sense.
And if you put Commons Clause in place, it is true: the project is no longer open source, because
Commons Clause is not OSI approved.
Now Redis did a smart thing where they kept most of their code base
under the BSD license, which is, it is still open source,
but chose some of the modules that they had built
and put them under
Commons Clause. So you can think of it again as this open core model in some sense, where most of your
code is open source, but then some of your code is not. And when we applied Commons Clause to Dgraph,
we applied it fully, which means all of the code base was under
Commons Clause. And we were just not convinced that that was the right move. And this became
very apparent when, again, Google went in and banned Commons Clause as well. Now, I don't agree with the reasoning for Google to ban Commons Clause,
which was that they feel that commons clause
prohibits all commercial usages,
which is completely wrong, really.
Commons clause has this term called
if the code is substantially the same
as the original code, then you can't sell it.
Substantially is a term used very commonly in legal documents to basically indicate that if you tweak things a bit,
it doesn't make it different. And that is just a way of saying that if largely you're selling
the same thing, which is selling, let let's say redis modules in this case or
selling dgraph then you would not be allowed to do that but you can build something on top of it
for example you could build a question answering website you could build some other proprietary
service on top of dgraph and you can sell that nobody stops you from doing that because
it is not substantially the same thing.
So that was the idea behind Commons Clause. I feel that the intentions were correct,
but it was very hard to convey to people in the community, and even Google in this case, what "substantially" meant.
I think we went through many, many debates around
explaining to people substantially does not mean this, substantially doesn't mean that.
But I don't think it's a fight that is easy to win. And
so in the end, we realized that Commons Clause being banned by Google
brings us back to the same place where AGPL is banned by Google.
And again, it affects adoption.
And so we decided that we would switch back to Apache license.
Now, there's an interesting sort of backdrop here.
This is back in 2017, I think.
CockroachDB, a database company in New York, they had come up with a license,
which was essentially Apache plus enterprise license,
what they call the Cockroach license.
And what they did was,
instead of trying to close source their enterprise modules,
they made it source visible,
and they co-located it right next to their open source code base.
So now what they have is they have the main source tree which is Apache licensed
and then certain modules which are under the enterprise license
are still with the code visible.
And that was a very attractive sort of system,
and it was very well received by their community. It's something that I had in the back
of my mind for a while, and I felt that Dgraph was still sort of young enough, and we have not
yet, well, we have started to build our enterprise features,
but I felt that we could easily switch over to that license and make it work. So what we have
done now is that we have brought Dgraph back to Apache without any clause, and we're going to
build enterprise modules which would be source visible. This system is also adopted, if I'm not wrong, by Elasticsearch.
And it's just in general a very big win for liberal open source licenses in some sense.
One more thing on top of this is that, you know, so this is our journey.
That's where our journey sort of like kind of concludes.
But after we switched over to Apache license and
enterprise license, MongoDB,
which was previously AGPL, has grown even
stricter and created a license called SSPL,
which is Server Side Public License.
Now, you know, as AGPL was sort of stricter than GPL,
SSPL is even stricter than AGPL.
And what it does is try to do the same thing as Commons Clause in some sense, but a bit differently. So what they say is that if you run MongoDB as a service, then you must open source the code base which helps you run MongoDB as a service.
Again, it's a jab at the big
service providers like Amazon, but it's just done in a different way, where they probably have a
better chance of getting it approved by OSI. But in my mind, it's trying to achieve the same thing
as what Redis was doing with Commons Clause.
So there are plenty of people out there that are vehemently opposed to Commons Clause with regards to open source software.
Because as you said, the OSI has not approved it and potentially will not approve it.
And so there's Commons Clause licensed projects that claim to be open source.
And even on the commonsclause.com, it says, is this open source?
And it says no, because of that specific thing.
That being said, do you believe the Commons Clause is in the spirit of open source?
Because I'm on the fence there.
It seems like freedom to modify, or to distribute... it seems like a
bit anti-freedom, but only for a small subset, right? It's like, large corporations slash service
providers, you can't; like, we're taking your freedom away, but everybody else is still free. I don't know.
This was something you've gone down the path on, you implemented it. It's kind of there
and back again: Apache 2, maybe AGPL,
maybe Commons Clause. You've had some pushback from your community. You mentioned Google banning it
was the showstopper, which makes a lot of sense for adoption. But all along the way, it seems like
your intentions are good, from what I can tell from this conversation. So what do you think about the
Commons Clause? Maybe it's not open source approved, but do you believe
it's in the heart, in the spirit of open source, or not?
I absolutely believe it is. I feel it
is more in the spirit of open source than AGPL is.
Why is that?
The problem with AGPL being used
at any medium to big company
is that the moment you bring in AGPL,
you have to be afraid about,
hey, do I need to open source my own code base?
And the problem with big companies
is that they have this spaghetti code,
which is part proprietary, part ancient.
It's very hard to say, okay, this piece I can break off
and maybe open source this, but this piece I can keep proprietary.
It's very hard to say that.
And therefore, if you look at, you know, Google, for example,
when they built Kubernetes or when they built gRPC,
they didn't just open source their existing systems, Borg and Stubby.
They had to rewrite them from scratch to make it open source.
And so AGPL puts this restriction upon these companies that if they use any AGPL code,
they must open source because of virality.
It's very prohibitive.
Now, you bring in Commons Clause plus Apache.
With Apache, basically you
can do anything with the code base. You don't have to open source; it's not viral. And Commons Clause
stops you from selling the database in this case, or whatever the code base is, from
selling that particular code. It works for big companies. Very cut and dry. It should work for, you know, let's say Google.
It should work for Facebook, because they're not trying to sell Redis.
They're not trying to sell Dgraph.
They're just trying to use it.
So I feel that it is more permissive than AGPL.
The only companies it should really affect is if you are Amazon trying to sell Redis
and the particular modules that they put under the Commons Clause; then you're not
able to sell that. Which I feel is fine, because if they did not contribute, then
maybe they shouldn't sell it, and maybe they should let the contributors sell that.
So that's my take on it.
For AGPL, I might have a somewhat analogous take on this, so to speak.
It reminds me of CSS in a way.
There's a cascade, an unwanted effect of using it,
which is not always clear when you make changes or use a class or something like that.
There's hidden things.
So if I use AGPL, it may affect the licenses of other future software I ever use in unwanted
ways.
And those unwanted ways provide ambiguity, and it's not clear.
So for those reasons, if that's accurate,
then I can see why it's less liked.
Whereas Commons Clause is more like a razor blade. Like, it's a clear cut.
You know, it's like I can license my code permissively at one level and then
clause in or add an addendum, and the point of it is: here's one clause, and it's only for this
project, and it doesn't affect any other things it touches. It's just like, if you're trying to resell my thing here, then that's just not possible.
So I'm with you too, Jared.
Like, Manish seems like a great guy.
I like him.
Um, you know, he's still here.
We haven't hung up on him yet.
I can hear you.
Right.
You know, it's, this is where I think this needs to be a dialogue.
And blog posts are great for getting points across.
I really feel like this needs some sort of like at large literal discussion because behind all software is human beings with often great intentions.
Right.
Manish isn't trying to hurt people.
He just wants to be able to create awesome tech and have people use it.
He said that here and he's trying to look for, and he and his team, and I'm sure his
investors too, are trying to make sure that remains possible.
And so I'm for that.
Couldn't he just do that now that we're talking about him and he's not here anymore?
Couldn't he just do that by having closed source software?
Like, isn't that just a way of going?
I mean, if you want to do that,
I'm just playing devil's advocate.
Right.
If you want to do that,
obviously Manish, please feel free to respond.
We're not actually talking about you.
Like you're not here.
Couldn't you just close source?
I mean, keep it proprietary
and then you get to say hands off.
You don't have these problems.
So the thing about closed source is,
again, it goes back to the reason about why do you want to open source in the first place?
I think it's not about the contributions.
I mean, obviously, if you get contributions, I always thank people for contributions, thank them for the feedback.
But the reason you make anything open source is adoption.
You want to build something which a lot of companies, a lot of people are going to base
their entire tech stack upon, in this case a database.
They're going to trust you with their data.
They want to be able to look at the code and make sure that the code base is good quality,
it doesn't have any weird bugs, that they are able to modify the code.
And what if the company dies tomorrow? They should still be able to adopt that code base and then maybe run with it. So I feel open...
You can do that with a proprietary license as well. I mean, you could ship them binaries plus
source code as part of their license. This isn't something they wouldn't be able to do. That's the
thing about proprietary: you can do whatever you want with it.
True. The other part of this equation is that when you make something proprietary,
the selling becomes a lot more work. You need to have an entire sales team to be able to
go to individual companies and be like, hey, have you heard about this thing called, you know,
Dgraph? And it's a proprietary thing.
You can't see it online, but we can sell it to you for use.
It is a much harder pitch than, hey, developers, it's just free.
It's out there.
You can try it.
And if you don't like it, it's fine.
If you like it, it's fine.
You don't have to talk to us.
And I think that's the beauty of open source: it avoids having to have salespeople running around, and you just become part of a
developer conversation anywhere in the world. Nobody has to pay you to try it.
Okay. I think
the problem, though, is being seen as masquerading as open source, but not really being open source.
It goes back to like original things.
It's been said like the anti-commons clause or whatever.
Just in terms of the spirit of open source.
And sure, it is open.
You can see it.
I can contribute back if I want to.
But I think what the community is really pushing back on
is less like, hey,
that's a bad thing and more like, hey, this really isn't open source. So just don't call it open
source and we'll be okay. Yeah. It's potentially a namespace conflict, right? As all things are,
because, you know, the benefits of open source are immense, as you've said, Manish. And in many
cases, especially in infrastructure-style,
mission-critical enterprise software in 2018, it's almost table stakes for success
because people expect it. As you said, your sales processes are easier. The
trust is immediately there. And yet when you add Commons Clause to it,
it's restricting in that regard.
And so now it's like, well, right there on commonsclause.com: this is not open source.
It's something else.
But then there's also, like... it's almost like,
and I'm not saying this personally against you or against Dgraph,
but it's as if you want the benefits of open source
without actually being open source.
And so maybe it needs to be like available source
or readable source, or, you know,
it's almost like we just got to come up
with some more nomenclature,
similar to how we have copyleft, copyright,
or free, you know, free and libre versus open source.
We have all these different terms.
Maybe there's a need for another term for this style.
I don't know.
What do you think about that, Manish?
We were very careful
when we switched to uh to commons close and apache that we removed all the references to open source
and we swapped them with a liberal license because i think goes back into my uh my take on this is
that it's more liberal than agpl and some of the other open source licenses.
So we had to switch it over to liberal license. It was a bit of a heartache for me because I've
been an open source guy for a long time. Back in 2005, I wrote this thing called FlickrFS to
build a file system on top of Flickr, which was the most popular image sharing site at the time.
So I've been a through-and-through open source guy, and it was a bit of a hard
decision. And just to clarify, right?
We have moved away from Commons Clause. But still,
I would sort of defend it. The thought at the time was that
it is probably not approved by the folks who are at OSI.
But in terms of the spirit of open source, I feel it was there.
I think open source has to evolve to a point where people who are building open source can sustain themselves from what they are building as opposed to having to ask for donations or having to work for another company
or having to be acquired by another company who is writing proprietary code.
Every time I see some open source author having to go join a company and abandon their open source
project, which is very popular, it hurts me in some sense. It feels bad. Why shouldn't a person who is writing
amazing code be able to sustain themselves, with the right intentions in their mind? Which is that,
hey, open source obviously makes sense. Now there should be a deeper
conversation about, hey, open source makes sense, we all agree, let's figure out how we make money, how we make
sure that people who are in open source continue to make money, and not just by
making open source their secondary project, but by having open source as their primary project and
their source of income.
Definitely. I mean, hearing it from that perspective, and then also knowing, you know, what a history you have in open source
back to FlickrFS, you know, it makes you really consider this, what you say is a necessary evolution
of open source. Because based on what you just said there and how you said it,
the free and libre of open source is there,
but at some point it does restrict potentially the sustainability by restricting its original creators and maintainers and community
from being able to profit in certain ways from it
because of just sheer competition.
You can't compete with Amazon.
Maybe you can, maybe you really can, but I mean, like most,
if Amazon launches a furniture line, well, Wayfair's stock goes down 6% in a day.
I mean, that happens, right?
So, you know, how can we expect little old you guys and your team to compete?
And the restriction comes
back to the original core team and how you can sustain it financially without having to, as you
said, the examples were either ask for donations or work for a company. You're not
liberated to operate a company around this source code in a way that is financially feasible if you have to face the sheer weight of competition that is just so massive.
Does that summarize somewhat of what you're trying to say there?
Yeah, I think one thing we failed to mention is the three models of open source money making.
I think I should quickly mention that.
So it all ties together.
You know, the first one is that you have this open core,
which is under an open source license,
and you build proprietary features on top of it,
and which you sell.
That's the first one.
And in Redis Labs' case,
they basically try to make those modules
sort of under a Commons Clause so that they
can sell those.
The second is that, obviously, support and training comes in, right?
Red Hat pioneered this a long time ago, and every open source company does open support
and training.
That's how they make money.
The third one is that you run that software as a service. I think this is where the Amazon story
comes into the picture. For example, with Redis Labs, Amazon is probably running Redis
behind the scenes for either ElastiCache or I forget what it is. And they're literally just
running that without paying anything back to Redis Labs. And Redis Labs in this case also has a competing Redis-as-a-service offering.
And so both MongoDB and Redis and whoever is trying to use Commons Clause is trying to avoid a big company like Amazon.
And now these days, their Chinese counterparts, I forget the names, but they are also running Redis and Mongo behind their services and charging customers for it.
So these companies are like, hey, we build this thing.
We should be, you shouldn't be competing with us on this and we should be getting that money.
Trying to stop the leeches, you know, stop leeching off people, you know, contribute
back.
And it makes you kind of mad, even though you don't, I totally get it.
Right.
I can see it from Amazon's side.
Right.
But yeah, it's like the leech clause.
There you go.
The leech clause.
Well, in a free world, people are free to do literally whatever they want. And so I think, in
the spirit of open source, the idea has been for it to be a free world in most or all senses
of the word. And I think when you restrict that, you recognize, you know, this leech scenario and the viability of it.
If we continue to allow that to happen and not have conversations that hear all sides, then we
essentially allow the freedom of the software, as good as it may be, to
stagnate and potentially,
like you said,
Jared,
why not go into proprietary?
And then we
wouldn't even be talking to
Manish,
you know,
because, I mean, like, what would be the point,
right?
Open source has won. That's in quotes.
It's been said not just by us,
but others,
you know,
so that's in quotes.
Well,
I mean, it's official. Nadia Eghbal said it on Request for Commits many times and others agreed, so that's why I say it's in quotes,
because it's been said not just here by us but by others. Yeah, I mean, I think it just needs more
attention. I'm not saying I agree or it's wrong or it's right. I definitely see the pain points,
and we need some sort of evolution.
I would like to add one thing.
I think this is,
it seems like a fresh thing.
It seems like a new thing that there's this attack on open source
in some sense by this Commons Clause, et cetera.
But this was done before.
If you look at GPL,
the idea behind GPL was that,
hey, open source is important.
We must do open source.
In fact, we force you to do open source.
If you use our code, you must also open source your code, right?
And then AGPL was an evolution of that to say, hey, also on the network, same thing.
And then the MongoDB SSPL is extending that to say, hey, if you run it as a service, same thing, right?
But think about what they are really doing practically,
like what are the practical consequences of this
is that in some sense,
they're dissuading others who have not contributed
from leeching off it in some sense, right?
And I think that's the direction
that Commons Clause and SSPL are all going.
I recall we had Joseph Jacks of OSS Capital on the show a couple of weeks back,
and we asked him about Commons Clause because it was drafted by Heather Meeker;
she's part of OSS Capital, so I'm sure you know her as well.
And one thing that he said about it is he sees it as a stepping stone
or as an effort in a specific direction,
and, as you said, there's a necessary evolution that has to happen for the greater open source community to continue to not strive so much, but thrive.
Right.
And so I'm happy to have this conversation.
I've learned a lot here, Manish.
Thanks so much for coming on and just continuing
to talk about these things. I know it's kind of the nitty-gritty of licensing, kind
of a dry topic, but there's so many facets to these decisions and the implications of changing
a license, picking a license. They're just massive. And we're definitely living in a brave
new world where we're trying to figure this out together. And clearly a world with big numbers.
I mean, we've seen the headlines on changelog.com this week in the news feed.
You know, billion-dollar valuations, multi, hundreds of millions of dollars invested into, you know, new companies or companies that are now unicorns.
HashiCorp being an open-core model type company that's just taken on, you know, a new round
of gigantic funding. So there's clearly lots of money at play here, you know, and it's a new world
for open source every single day. So where do we go from here? I mean, clearly we've had a great
conversation that's led from Dgraph as a tech and how it applies, graph databases 101, on
through how they can be used. You're clearly super smart. You've had
to relicense. You've been through a journey. What do you suggest as maybe the next step? Maybe not here
today, because we're running out of time, but what are some suggestions from you to continue this
conversation in ways that are meaningful, that can get to meaningful change? Do we have a conference
about it? Do we do a sustained, like, unconference or just kind of a
gathering? How can this best be approached by the right people in ways that are not vicious and
attacking, but in ways that are meant to actually get to change? What do you suggest?
I think it's a
tough conversation. It's a conversation of ideals versus practicality.
It would require flexibility from the maintainers
or the people in charge at OSI
to think through some of the practical considerations
of running an open source company in today's environment.
And I think it would definitely need a bigger dialogue.
I feel, you know, if MongoDB's SSPL gets approved by OSI,
that would probably be a great outcome of this,
and I can easily see a bunch of other companies
jumping onto that bandwagon.
If it gets rejected,
then other open source companies
are going to keep coming up with something new,
which might work.
There's definitely a need for a change here.
I think that much is clear.
Well, let's close the show with anything from you.
I know you got lots of stuff happening.
We've obviously covered quite a bit of ground,
but if people are following along with you,
where do they go?
What do they do?
Do you have anything to announce
here at the close of the show?
Yeah, I do want to announce something.
I think, you know, we are doing,
we are solving really complex problems at Dgraph
and we also have Badger,
both written purely in Go,
both open source.
If you want to help us
and if you want to experience
these challenging problems,
come join us.
You can go to https://dgraph.io
and see some of the job openings.
We are looking for backend engineers.
So apply.
Manish, thank you so much for sharing
not only your story, but your wisdom here.
I know it's a tough subject and going on record
because we do have an awesome transcript for this show.
Thank you, Alexander, for being so awesome, and all the contributors out there who help us, like our friends mentioned at the top of the show, make our shows less unintelligible and more intelligible, so to speak. So, I mean, I know it's tough to be on record about very tough subjects and we just
appreciate your courage to share how you feel and the willingness to continue
to go on the road,
even when it's bumpy.
And thank you for,
thank you for sharing your time with us.
Thanks for having me guys.
All right.
Thanks for tuning into this episode of the Changelog.
If you enjoyed this show,
do us a favor.
Go into iTunes or Apple Podcasts and leave us a rating or a review.
Go into Overcast and favorite it.
Tweet a link to it.
Share it with a friend.
And, of course, I want to thank our awesome sponsors and partners,
Rollbar, DigitalOcean, Algolia, and GoCD.
Also, thanks to Fastly, our bandwidth partner.
Head to fastly.com to learn more.
And we're able to move fast around here and fix things because of Rollbar. Check them out at
rollbar.com. And we're hosted on Linode cloud servers. Head to linode.com slash changelog. Support this show.
This episode was hosted by myself, Adam Stachowiak, and Jared Santo. Editing was by Tim Smith. And the mix and master also by Tim Smith.
Music is by the ever awesome Breakmaster Cylinder.
And if you want to hear more episodes like this,
subscribe to our master feed at changelog.com slash master
or go into your podcast app and search for Changelog Master.
You'll find it.
Subscribe, get all of our shows,
as well as some extras that only hit the master feed.
Thanks for tuning in.
We'll see you soon.