Orchestrate all the Things - Databases, graphs, and GraphQL - past, present, and future. Featuring Manish Jain, Dgraph CTO and founder, and Josh McKenzie, Apollo VP of Software Engineering

Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Manish Jain and Josh McKenzie are both engineering rock stars who wear many hats. They also have something else in common. They are both avid GraphQL users and builders, despite getting there from different starting points. GraphQL is an API query language that has taken the software development world by storm, and Jain and McKenzie exemplify this. Jain, founder and CTO of Dgraph, was among the first to take note of GraphQL in 2015. He

Starting point is 00:00:36 liked it so much that he built Dgraph around it. McKenzie got exposed to GraphQL in 2020 while building Stargate during his time at DataStax. He liked it so much that he is now VP of Software Engineering at Apollo, a vendor built around GraphQL. We were not as early as Jain, but in 2018 we noted the rise of GraphQL for database development. We caught up with Jain and McKenzie to talk about its technical, business, and social evolution. A natural way to start the discussion, for the benefit of the people who may be listening, would be to actually introduce yourselves

Starting point is 00:01:25 and just say a few words about your background and your current roles. And from that, we'll move on to the specifics of the discussion, GraphQL, databases, and how they're related and all of that. So, you can pick who gets to go first. You wanna go first, Josh? Sure, I can go ahead and start.

Starting point is 00:01:48 Josh McKenzie, I'm currently VP of engineering at Apollo. The crucible that I kind of, I guess, grew up in was as a systems engineer in high-frequency trading in Chicago, back in the late aughts, before microservices were a real thing that was widely adopted. And that was where my, I guess, pattern of seeing the pain points that came with massive systems and how you deal with data and the disjoint between

Starting point is 00:02:12 like modern SOA and databases became pretty clear. When you have tens of thousands of microservices and one monolithic Postgres server, you kind of see your single point of failure pretty clearly. But after that, I spent a couple of years doing calculations in Vertica at Morningstar Financial, and then ended up the last seven years at DataStax working on Apache Cassandra. So I'm a committer and a PMC member on the Cassandra code base. I did the executive VP of engineering round at DataStax as well. Kind of had a foot in the proprietary world and a foot in the open source world for quite some time. And really last year was my introduction to GraphQL. So I'm pretty new to the ecosystem and the space, but the Stargate project, which DataStax open sourced, was looking at adding new APIs to

Starting point is 00:02:54 Cassandra. And the more I dug into GraphQL, the more it struck me like this is the technological answer to the proliferation of microservices. Like it is the thing that lets people have their cake and eat it too when it comes to developer velocity. And so really the more I dug into it, the more interested in the technology I became, the more it was like, well, the world has accepted eventual consistency, which is the hill we were climbing with Cassandra for years. And they've accepted that in the form of microservices to begin with and GraphQL as their data, essentially control plane on top of that. So yeah, really that was what kind of led to my transition over to Apollo is realizing

Starting point is 00:03:28 there's a group of humans that see the same thing in the shape of the technology, that believe the same things about kind of democratizing that technology to the world. And that's where I am now. So I'm definitely excited to be there. But I've been with Apollo for about six weeks now, so I'm very fresh in the space. Good, thank you, Josh. So, Manish, then it's your turn, I guess. Yeah. Hi, I'm Manish Jain. I'm the founder and, recently, the CTO of Dgraph Labs. I started Dgraph some years ago.

Starting point is 00:04:05 The version 0.1 of Dgraph came out in December 2015. And before I go into Dgraph, I'd like to talk about my own background from before. So I grew up in India, went to college in Singapore, and joined Google right out of college. So I was working at Google in Mountain View on web search infrastructure for the first three years of my life there, from 2007 to 2010. And then started working on the knowledge graph effort, which came out of an acquisition at Google.

Starting point is 00:04:36 And that was a big effort because we already had a bunch of structured data, which we were serving out of these things we call OneBoxes. And each of them was a different backend, right? And it was becoming very hard to maintain, because different teams would run it in different ways, have different SLAs, and so on and so forth. And so there was an internal effort by the name of Dgraph at Google, which was sort of like uniting all of these OneBoxes. OneBoxes means like, let's say, flights and weather and events. So if you search for like weather in San Francisco

Starting point is 00:05:08 or like, you know, events in New York, the thing which shows up, that's the OneBox. And so the idea was to take the knowledge graph and build an equivalent graph indexing and serving system at Google and then wrap up all of the OneBoxes' knowledge as well into this one system. So I was sort of like one of the tech leads to build that system, worked on both the indexing

Starting point is 00:05:31 and the serving part of it and learned a bunch about how you sort of like build a low latency, high throughput graph serving system which can scale to Google's needs, because something like 30% of the web search traffic was supposed to hit that system. So learned a bunch there. I think overall I spent like six, six and a half years at Google. Did a short stint at Quora just to see how a startup works and, you know, the over-reliance on Postgres, as Josh mentioned, was definitely there.

Starting point is 00:06:05 And then around 2015 is when I started to look at the Graph space and felt like, you know, what we learned, what I learned back at Google could be used to build something better. And around the same time is when Facebook came out with GraphQL. And I really liked the query language even back then in the, whatever the, it was before 1.0, I think,

Starting point is 00:06:27 the spec was, and quite liked it. And decided that, you know, for the graph system that I'll build, we'll use GraphQL as a query language. Now it was a very bold choice, right? Bold even by today's standards, right? But we then ended up sort of like forking GraphQL to build a flavor of GraphQL, which we called GraphQL+- for a while and now call DQL. And it's been five years since then. And, you know, so Dgraph is sort of like a mix between a graph database and a GraphQL-native database, and that's where we are

Starting point is 00:07:09 today. In terms of my own transition within Dgraph, I was the CEO of Dgraph up until now, and we just brought in a new CEO to help me run the business side of things, while I can go back and refocus on the product and the engineering. Plus, I just had a newborn last year. So, you know, with two kids in hand and a company to run, I could really use the extra help. Well, yeah. And congratulations on finding and hiring a new CEO. I'm sure it wasn't an easy one. And your dilemma is something I've also kind of faced in the past. Well, the stakes were not exactly the same. I never ran a company the size of Dgraph before, but I know how it is, you know, having the sense of, well,

Starting point is 00:07:57 this is a technical issue I should be attending to, but I have all these other things to do. Exactly. Exactly. Yeah. And I felt like there was like, you can never win because if you focus on the tech, then you're losing on the, you know,

Starting point is 00:08:11 you're not able to focus on the business side of things. If you focus on the business side of things, you can't focus on the tech. So you're just constantly juggling and never satisfied. And I think, and having like two people

Starting point is 00:08:21 sort of like focus on different aspects really helps. It does. It does. So, yeah, great. Thanks both for the introductions. And actually, before we get into the specifics, even deeper actually into the specifics, another starting point we should get covered is, well, GraphQL actually.

Starting point is 00:08:42 So some of the people will be familiar with it, some others not so much. And actually for me, the first time I heard about GraphQL, and I guess for many other people as well, my initial thought was like, okay, so this is a graph query language. And it's, in my opinion, a very popular misconception. And let's clarify that. Let's get it out of the way. So the simplest definition I can possibly think of about what GraphQL really is, is that it's a kind of API query language, really.

Starting point is 00:09:18 It does have a kind of graph structure, but it's not really a graph query language in the strict definition. So what's your take on that? Yeah, absolutely agree. I think it's definitely a misnomer. And I think we joke that GraphQL is as much a graph query language as relational databases are about relationships, right? So you're right. It's really like an API query language and it's able to do the kind of retrieval

Starting point is 00:09:55 that you would want to do in any typical application, which was not as easily possible with REST APIs. So people compare it generally with REST. And as we sort of like learned firsthand while building Dgraph, to really make it a graph query language, you have to add a whole bunch of functionality on top of it to be able to do the kind of complex queries

Starting point is 00:10:23 that you would typically want to do from a graph system. And so it sort of falls way short of that. And it's not even trying to be that, right? I mean, we have talked to the GraphQL community, GraphQL sort of like founders and stuff and what they are trying to achieve is a general spec, which can be implemented easily on any system, as opposed to a very specific spec,

Starting point is 00:10:53 which would be complete, right? It's not a SQL equivalent. SQL is a complete spec and it can do a lot of stuff on its own. GraphQL is not that and it's not trying to be that. And similarly GraphQL leaves a lot of holes, I would say in how you implement things, because it's not an implementation, it's a spec.

Starting point is 00:11:13 So the way you would implement, let's say, filtering would be different for Dgraph, different for, let's say, Hasura, different for, let's say, Fauna or anybody else who implements it. They could look slightly different because the spec doesn't tell you exactly what it should look like, right?
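
To make the point about filtering concrete, here is a minimal sketch in Python. The two "dialects" below are hypothetical: they are loosely modeled on conventions seen in GraphQL servers, but the type, argument, and operator names are assumptions for illustration and not any vendor's documented API. The GraphQL spec says a field may take arguments; it does not say what a filter argument is called or how operators are spelled, so each implementation picks its own shape.

```python
# Hypothetical illustration: one logical filter ("title equals Alien")
# rendered for two invented server dialects. Neither is a real product's API.

def render_dialect_a(type_name: str, field: str, value: str) -> str:
    """Dialect A (assumed): a `filter` argument with bare operator names."""
    return (
        f'{{ query{type_name}(filter: {{ {field}: {{ eq: "{value}" }} }}) '
        f'{{ {field} }} }}'
    )


def render_dialect_b(type_name: str, field: str, value: str) -> str:
    """Dialect B (assumed): a `where` argument with underscore-prefixed operators."""
    return (
        f'{{ {type_name.lower()}(where: {{ {field}: {{ _eq: "{value}" }} }}) '
        f'{{ {field} }} }}'
    )


query_a = render_dialect_a("Movie", "title", "Alien")
query_b = render_dialect_b("Movie", "title", "Alien")
print(query_a)
print(query_b)
```

Both strings are syntactically valid GraphQL, and both ask the same question, yet a client written against one server would not work against the other without changes.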

Starting point is 00:11:33 So in that regard, absolutely agree, yeah. Yeah, I have a slightly different perspective on it, which is like there's the top-down going from the computer science definition of graph and looking at GraphQL and saying, yeah, obviously those two don't match. And then there's the bottom-up of backing into a graph-based data structure by just building things in different product teams. And that's one of the things you see strategically in the direction Apollo's headed with Federation, not to muddy the waters of GraphQL, but there is a shape of a graph computer science problem

Starting point is 00:12:02 starting to surface as these things get larger and larger, as the usage of GraphQL gets broader and broader and more widely adopted. So I see the technology growing into the name. I don't know whether that was just immense forethought five years ago when they named it that, to see the shape that it was going to take as it evolved. But certainly there's a future in which, if this thing gets adopted at scale, like I believe that it will, hence I put my chips in with Apollo, we're going to start needing to think about those things in the query planner space. And some of the different algorithms in the graph space will become quite relevant in that respect as well. One of the things that I wrestled with in the past,

Starting point is 00:12:40 TinkerPop was compatible with Cassandra. Built on top of Cassandra, there was pressure for the users to go from understanding CQL to understanding graph modeling and graph theory. And there's still a pretty big lift when it comes to graph-shaped problems being modeled in a very domain-specific way, which I think is doing a disservice to how powerful graph actually is as a technology.

Starting point is 00:13:08 And so it's this question of like, who's the first that's going to crack that nut of making a human-consumable, graph-shaped API, which I think, you know, Dgraph is going a long way towards. Like I look at Gremlin and think that is not DX that I want to get involved with, right? So there's that question of how much of a lift you are asking from users to use your technology. And I do think GraphQL is right in the sweet spot there, where there's a little bit of an ask for ramping on the language itself.

Starting point is 00:13:34 But I think in whatever direction it evolves, it's probably not going to be surfacing graph-shaped problems or graph-shaped API extension points to end users in the GraphQL spec proper. So it makes a ton of sense, the direction that Dgraph has gone with the language extensions there. And one more thing to add on there, absolutely agree with Josh. One more thing to add on is like, going back to 2015 when I first saw the spec and why I liked it was actually the response was what was really interesting to me,

Starting point is 00:14:06 the way responses were shaped, because they were responding with a subgraph, a JSON-based subgraph, but responding with a subgraph in the exact same shape as the query, right? And that has a lot of benefits, because if you look at a typical database, let's say, look at SQL, right? The responses are always lists of things, right? But you kind of miss how the list came to be, like who was connected to what and what the result is for, right? So you have to have another thing which would tell you that, and so on and so forth. But in GraphQL, that relationship was never sort of lost, right? So you would say, let's say, go from movies to all the actors, to the directors and the actors, or whatever, and you can get the entire sort of subgraph back and you know exactly what the relationships were

Starting point is 00:14:48 in the response. And that, I feel, was really powerful, and what really caught my eye back in the day. Just to add to what both of you have mentioned, and specifically actually probably more to what Josh mentioned. Well, historically, I think part of the reason is, you know, because GraphQL came out of Facebook, and people there do have the tendency to, you know, name things after graphs as much as possible. But you're definitely right in the sense of, well,

Starting point is 00:15:21 it does answer to a graph structure, in the sense of having this interconnected mess of microservices, if you look at it from a distance. But that said, I think Manish is, well, actually, definitely Manish is also right in the sense that it's not the traditional, it's not the typical graph query language in the way that, you know, people would define it. It's not Gremlin, you know, in any sense.
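
The point made above about responses coming back in the same shape as the query can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the in-memory data, the function names, and the use of a nested dict as a stand-in for a GraphQL selection set. It is not how any real GraphQL engine is implemented; it only contrasts a nested, query-shaped result with the flat rows a join would return.

```python
# Toy in-memory "graph"; all data and names are made up for illustration.
MOVIES = {
    "m1": {"title": "Movie A", "actors": ["p1", "p2"], "director": "p3"},
}
PEOPLE = {
    "p1": {"name": "Actor One"},
    "p2": {"name": "Actor Two"},
    "p3": {"name": "Director One"},
}


def resolve_person(person_id: str, selection: dict) -> dict:
    """Return only the requested scalar fields of one person."""
    return {field: PEOPLE[person_id][field] for field in selection}


def resolve_movie(movie_id: str, selection: dict) -> dict:
    """Walk the selection and build a result with the same nesting."""
    movie = MOVIES[movie_id]
    result = {}
    for field, sub_selection in selection.items():
        if field == "actors":
            result[field] = [resolve_person(pid, sub_selection) for pid in movie["actors"]]
        elif field == "director":
            result[field] = resolve_person(movie["director"], sub_selection)
        else:
            result[field] = movie[field]  # scalar leaf
    return result


def flat_rows(movie_id: str) -> list:
    """Join-style output: one (movie, actor) row per pair, nesting lost."""
    movie = MOVIES[movie_id]
    return [(movie["title"], PEOPLE[pid]["name"]) for pid in movie["actors"]]


# A selection shaped like a query; None marks a scalar leaf field.
selection = {"title": None, "actors": {"name": None}, "director": {"name": None}}
nested = resolve_movie("m1", selection)
rows = flat_rows("m1")
print(nested)
print(rows)
```

The nested result has exactly the keys of the selection, so the caller can see which actors and which director belong to which movie. The flat rows repeat the movie title per actor and would need a second query, or extra columns, to bring the director along.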

Starting point is 00:15:49 Yeah. For the vast majority of end users, the word graph has no meaning to them for their interactions with this API right now. Like that's, that's part of the disjoint, but whether that happens under the hood and behind the scenes is a different story entirely. Yeah. Yeah. And it's an interesting point to explore, actually, what you hinted at, whether over time it may grow in that direction or even incorporate serving some kind of graph traversals maybe. That's, in my opinion, it's very far removed from where we are now. But let's see what happens.

Starting point is 00:16:22 Actually, before we get there, before we get to the graph database specific stuff, there is one intermediate point to explore. A couple of years, two or three years back, I started seeing this emergent property, let's call it. There were frameworks such as Hasura, which Manish mentioned before, or Apollo, which at least back then was not so much geared specifically towards databases. Perhaps things have changed now, so Josh, you can fill us in on that. And there were also a bunch of other frameworks. And then eventually, databases started adopting GraphQL or adding GraphQL layers on their own. So we have Fauna, and then MongoDB added support for GraphQL along the way, and then YugaByte. And so it started getting more and more popular, let's say.

Starting point is 00:17:37 So how do you think GraphQL works as a database access layer? And to the best of my knowledge, Dgraph at the moment is probably the only database that relies exclusively on GraphQL. The others that I'm aware of, some of them do offer a GraphQL access layer, but they still have, you know, the kind of traditional, let's say, query language. So interested to hear your experiences in that space. Yeah, I can actually start out with this if you're okay with that, Manish. One of the tensions that we often navigate with APIs for databases is how much of the underlying representation on disk you're exposing through the API itself. And the more of that you expose,

Starting point is 00:18:25 the more performant you can expect queries to be, but the heavier the lift is for the end user to really be able to interface with that API and understand it. On the Cassandra project, we went through that, where originally it was Thrift, it was untyped, it was just, you know, wild west, do whatever you want. And then over time, we moved more and more into CQL, with this notion that it's closer to SQL, people will understand it, they'll have a lot easier time adopting it. And objectively it took power out of the hands of the end user by making it more user-friendly for them. And throughout that same time, on that same trajectory, you see from a strategic perspective that Mongo's play was basically to say, how do we most empower the end user? How do we put the most flexibility and power into their hands,

Starting point is 00:19:03 and then we'll figure out the technology side and we'll keep revising and investing in that over time. For me, this distills down to this question of are you optimizing for CPU compute time or optimizing for human heartbeats? Which of those two things is the resource that you want your API to really leverage and facilitate? GraphQL is firmly in the camp of being more human-usable and human-consumable than a lot of the bespoke ones. Again, I go back to Gremlin. But back to these others, like the extreme example, I don't know if the two of you are familiar with KDB and the Q query language. The thing is so bespoke to how it's actually working as a columnar data store that the lift to learn it is massive, but it's hugely performant once you learn it. Whereas on the other extreme end, you can end up with programming languages and things that are targeting human expressivity

Starting point is 00:19:54 and the development investment required from people in that space. So I think GraphQL is kind of in the place where, from a meta perspective, it isn't as close to the metal. It's not as interested in shaping things to the way databases are actually materializing data on disk, necessarily, out of the gate. And I think you see a lot of the relational database implementations, largely third normal form, as a reflection of how these things are storing B-tree based indices and other things on disk and giving power to the end user to be able to, in a human-consumable way, figure out how to navigate that and have that be reproducible. I think GraphQL is taking a different stance on that. And we can see that with the adoption curve for it in

Starting point is 00:20:34 different demographics, how it's significantly higher, but there's no right answer with that at all. And there are certain problems that are large enough and that are complex enough that you really have to have that lift to be able to work with those data set sizes at scale. And the question just becomes where you calibrate on that continuum, I guess, as a technology, if that makes sense. Yeah. Thank you, Josh. Yeah. Agree with that. I think GraphQL's charm is that it's simple to work with, simple to understand. And therein lies its Achilles heel: it's simple. And so when you want to do something more complex,

Starting point is 00:21:16 then you kind of have to start working around it in various ways. And I think what people have done is, you know, the whole resolver pattern, which is what Apollo sort of, like, were a pioneer of. The whole resolver pattern makes it possible for you to have all of that complexity hidden away from the consumers of GraphQL, and it can then talk to anything, really. But that does raise the question of what happens to the performance, right? Now you are doing all this, then you end up in like these N+1 problems

Starting point is 00:21:46 or other kinds of problems that you run into. And then people sometimes complain about, hey, the GraphQL is not scaling as well as I thought it would be. But it's because, you know, you're doing graph operations on things which are not really built

Starting point is 00:21:58 for graphs, right? And I think like the reason, you know, so Hasura and Apollo and Prisma, they make a lot of sense because they're abstractions on top and they sort of like hide away that complexity of the actual implementation away from the consumers. But the reason I feel the GraphQL is becoming interesting for databases to implement is like if you look at a typical REST API, right, it's hard to define like what is it that will be useful to the user. So as a database, you have to think about, I think this is the type that will be useful and out of this type, these are the fields that the user might want. And that becomes like a permutation problem of like, well, some of them might not want this, some of them might want this. So it's very hard to build a REST API on top of a database to make it easier for people to directly consume it.
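
The N+1 problem mentioned above can be shown with a short Python sketch. The `FakeDB` class, its methods, and the post/author data are all hypothetical, invented only to count round trips; no real database driver or GraphQL library is used. The naive version resolves each post's author with its own lookup, the way a per-field resolver might; the batched version collects the keys first and fetches them in one call, which is the idea behind batching tools such as DataLoader.

```python
class FakeDB:
    """Hypothetical data store that counts how many round trips it receives."""

    def __init__(self):
        self.calls = 0
        self.authors = {1: "Ann", 2: "Bob", 3: "Cy"}
        self.posts = [
            {"id": 10, "author_id": 1},
            {"id": 11, "author_id": 2},
            {"id": 12, "author_id": 3},
        ]

    def list_posts(self):
        self.calls += 1
        return list(self.posts)

    def get_author(self, author_id):
        self.calls += 1
        return self.authors[author_id]

    def get_authors(self, author_ids):
        self.calls += 1
        return {aid: self.authors[aid] for aid in author_ids}


def posts_naive(db):
    """One query for the list, then one more query per post: N+1 round trips."""
    posts = db.list_posts()
    return [{"id": p["id"], "author": db.get_author(p["author_id"])} for p in posts]


def posts_batched(db):
    """One query for the list, one batched query for all authors: 2 round trips."""
    posts = db.list_posts()
    authors = db.get_authors({p["author_id"] for p in posts})
    return [{"id": p["id"], "author": authors[p["author_id"]]} for p in posts]


naive_db, batched_db = FakeDB(), FakeDB()
naive_result = posts_naive(naive_db)
batched_result = posts_batched(batched_db)
print(naive_db.calls, batched_db.calls)
```

With three posts the naive path makes four round trips and the batched path makes two, for the same result. The gap grows linearly with the number of posts, which is why resolver-based servers that sit on top of stores not built for traversals can feel slow at scale.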

Starting point is 00:22:46 But with GraphQL, databases find that there is a way that they can sort of like, you know, match that need because the user can tell them this is what I want and the databases can quickly translate that to how they store data on disk. And so, plus that opens up their gates to like all of this front end developers who are really just sort of like,

Starting point is 00:23:07 really liking GraphQL. And that's why I think, you know, databases have started to integrate GraphQL directly, which we did not see with a REST API, or, I don't know, a SOAP API, or whatever came before that, right? And Dgraph took that sort of one step further, right? We were like, you know what, this language is good. We think we can really build a system around it, because a lot of

Starting point is 00:23:32 what it's doing is traversals and filters, right? And if you make that easier for people to do and make it more modern, right? Compared to, let's say, as Josh mentioned, Gremlin, right? If you look at Gremlin the first time as a front-end developer or anybody from the modern developer world, they'll be like, what is this? Like I'm literally walking a graph. It talks in a graph walking format,

Starting point is 00:23:58 but you show them GraphQL, they're like, I can kind of make sense of this. I can sort of understand what's happening here. I can see how you're accessing stuff. And that's the consistent feedback, even with our modification, the DQL, which is a fork of GraphQL. Even the modification, we kept on getting the feedback

Starting point is 00:24:14 is like, it's very easy to work with. You know, we know exactly what's going on. We see like how things are being moved around and so on and so forth. And so I think GraphQL has played a bigger role than anticipated in the database journey, right? And I remember when we were starting with GraphQL, people were like, hey, but this is not a language, right,

Starting point is 00:24:36 for database, do you understand that? And we were like, yes, we do understand that. But now it's great to see so many other databases also adopting GraphQL because it just makes sense for modern developers. Yeah. And I will say to add on to what Manish is saying here, the resolver pattern and being able to do things in the server side with GraphQL is a key part of it because you can still leverage all of the database specific APIs and things you need to get things to scale. But the trick is it lets you decouple.

Starting point is 00:25:05 You can have a team that specializes in that and then surfaces data. And then the consumers don't have to actually know about any of what's happening in there. Whereas historically, like, I have those pains firsthand in my career in the past, where you're going through a shared service team and you're revving APIs and like everything is requiring data model changes. GraphQL is like the technical solution that allows those two teams to collaborate

Starting point is 00:25:25 being lightly, lightly coupled. And that lets you leverage whatever specific strengths the database has, whether it's in the backend or the microservices or whatever it's connecting to, without necessarily having to expose them to the end user, which is a pretty huge change. Yeah. Just to add, this is where we take a slightly controversial stance again, right?

Starting point is 00:25:47 So, I mean, we took one five years ago and now we're taking it again. Like, we love the resolver pattern. And it's great when you, like, as Apollo's layer, it's great because you can talk to, like, a microservice. You can talk to a database. You can talk to, like, file on disk, right? Who cares, right? You can talk to anything, really. And DGraph's stance is that, you know what?

Starting point is 00:26:04 Resolver patterns are great, but they are not as performant as we would like them to be. Because if you're building a database around GraphQL, you don't want resolvers. You want it to be as strictly tied to the on-disk representation as possible. And so what we have done with DQL, the fork, is that we have tried to expose as much as we can directly to the caller through the query language. And with Dgraph Cloud, we now allow a lambda function that they can run if they need to do business logic that wouldn't be possible using DQL itself. So we have flipped around the resolver pattern to have a stronger query language, and then have lambda functions that you can call to do more data massaging before returning to the end user.

Starting point is 00:26:59 And the beauty of that is like, it's just the use case demands it. And some use cases demand that level of performance. And so that level of coupling is very, very empowering for the users that need it, and other use cases don't. And this is something we constantly wrestled with with Cassandra as well, because its masterless query coordination pattern

Starting point is 00:27:14 essentially means you're funneling all your queries through a hop to get to where your data is actually stored, in the Bigtable-style LSM tree in the backend. And there are certain use cases that were so demanding for performance. That's where you get into tunable consistency. That's where you get into only going to one node. So it was also Cassandra's Achilles heel: that level of complexity and that level of optionality became very difficult for users to reason about, because you're basically saying, you know, end user, you now need to understand all the nuances of consistency and make your

Starting point is 00:27:41 choice, which is a lot of foot guns. But it is really interesting to see there's just so many different needs in terms of use cases across this continuum that all these different technologies can thrive. Great. Yeah, and actually that's such a great point, Josh, the one you made earlier about how GraphQL and the resolver pattern enable you to organically decouple different teams, basically.
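
As a side note on the tunable consistency mentioned above, here is a minimal Python sketch of the standard quorum overlap rule: with N replicas, a read that contacts R of them is guaranteed to touch at least one replica holding the latest acknowledged write to W of them when R + W > N. The level names ONE, QUORUM, and ALL match Cassandra consistency levels, but the functions are hypothetical helpers, and the model is deliberately simplified: it ignores node failures, repair, and the datacenter-aware levels.

```python
def replicas_for(level: str, replication_factor: int) -> int:
    """How many replicas must respond at a given consistency level (simplified)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    r = replicas_for(read_level, replication_factor)
    w = replicas_for(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_overlaps_write(read_level, write_level, 3))
```

With three replicas, reading and writing at ONE is the fast path that "only goes to one node" but gives no overlap guarantee, while QUORUM on both sides, or ONE paired with ALL, does. Having to pick a point on that curve per query is the optionality described above as difficult for users to reason about.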

Starting point is 00:28:09 Let's say the backend team and the frontend team. And that has traditionally been a huge pain point for projects where these two different sets of people had to work with each other. And GraphQL kind of offers a natural way out of this. But that said, the problem there is that, well, maybe GraphQL can serve, I don't know, let's say roughly 75 or even 80, 85,

Starting point is 00:28:32 whatever percent of use cases. You know, the simple, typical stuff like, you know, creates and reads and updates and what have you. What about the rest though? I mean, when, you know, before GraphQL, the way you could deal with that would be like, I don't know, the front-end people would go to the back-end people and tell them, well, hey, we need this complex query. Can you do something for us? And then the back-end people would, I don't know, build a view or, you know, implement some complex

Starting point is 00:28:59 group by logic or what have you and do something custom and expose it to the front end people. With GraphQL, how do you deal with that? And since, you know, taking into account the fact that as Manish said in his introduction, by design, GraphQL does not actually really cater to the really complex and bespoke stuff. Yeah, it has room for the complex and bespoke stuff, but you still have that pattern where an individual that is an expert in whatever the backing data store domain is has to build that retrieval.

Starting point is 00:29:32 And so you either have a full stack dev who's been, we see this pattern a lot with Apollo client, Apollo server, where a product team will be responsible for the entire stack of how they're interacting with data. And they'll have that business logic and that translation, et cetera, happening in Apollo server and the resolvers. But at the end of the day,

Starting point is 00:29:49 you can't get around having the expertise and the skillset of working with a specific backing data store. If you need the affordances that backing data store gives you and, you know, GraphQL, at least right now in its current form, isn't going to solve that. And that's where we get into conversations about predicate pushdown, about hoisting things out of databases into the GraphQL space, if that makes sense or not, et cetera. But certainly in its current form, like you can expect the workflow of a developer using GraphQL to be one that's exploratory and that is discovery based.

Starting point is 00:30:20 And if they find they don't have the data they need to compose together to build what they need to build, they either need to go and surface that into the graph or go to a team that knows how to do that with a request to say basically like, here's my use case. Here's what success looks like. Can you please surface this data for me? And in the past, that takes the form of a REST API being exposed or something else. And in GraphQL, it takes the form of it being introduced into the graph and then they can consume it and they can work with it. But at least for right now, there's no getting around the fact that different data stores have very, very different needs

Starting point is 00:30:49 and they have different languages and APIs to reflect that and to surface that. And somebody somewhere has to pay the piper to make that stuff available. I guess, Manish, for you, you already kind of spoke to that.

Starting point is 00:31:04 For you, it's been a slightly different experience because GraphQL is all you have. You can't bypass it. All you can do is basically extend it. And that's what you've gone and done. I mean, if you look very carefully, you start to realize that DQL is really slightly different because it's a fork of it.

Starting point is 00:31:23 So it's actually different from GraphQL. But what it gives you is that, so, I mean, as Josh mentioned, you still have to pay the piper, right? If you want to do some of the complex stuff, for example, like the way we implement the official GraphQL spec is that, you know, you can do pretty much the,

Starting point is 00:31:40 like the schema that you have is how it's represented on the disk as well. And you can do most of the queries, but then you start to build something a bit more complex and you want to use DQL to achieve that. So what we have is like, we allow you to define a field in GraphQL, which then maps to a DQL query, right?
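The idea of a schema field that maps to a backend query can be sketched roughly as follows. This is a toy illustration in Python, not Dgraph's implementation or API: the data (`FOLLOWS`), the field name `mostFollowed`, and the helpers `most_followed_backend` and `resolve` are all hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, List

# Toy data standing in for the backing store (hypothetical).
FOLLOWS: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
}


def most_followed_backend(limit: int) -> List[Dict[str, Any]]:
    """Stand-in for a complex backend query (a group-by plus ordering)."""
    counts: Dict[str, int] = {name: 0 for name in FOLLOWS}
    for followees in FOLLOWS.values():
        for followee in followees:
            counts[followee] += 1
    # Order by follower count descending, then by name for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"name": n, "followers": c} for n, c in ranked[:limit]]


# Schema-level mapping: API field name -> backend query to run for it.
FIELD_RESOLVERS: Dict[str, Callable[..., Any]] = {
    "mostFollowed": most_followed_backend,
}


def resolve(field: str, **args: Any) -> Any:
    """The API consumer only names the field; the mapping picks the query."""
    return FIELD_RESOLVERS[field](**args)


result = resolve("mostFollowed", limit=2)
print(result)
```

The consumer asks for `mostFollowed` and never sees the aggregation logic, which is the point being made here: the complex query is set up once, at the schema level.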

Starting point is 00:31:58 So behind the scenes, we run a DQL query for it, which is not different from the resolver pattern in some sense, right? There's something changing to something, but because it's so tightly integrated, it's almost not a resolver pattern, right? So that's how we have sort of like tackled some of those issues, where we can still provide a simple GraphQL API while doing the complex stuff behind the scenes by setting them up in the schema itself, right? And that sort of like gives you the raw performance

Starting point is 00:32:30 that you care about while also giving the simplicity that you care about. And going back to some of the things that George mentioned about Cassandra and that's interesting journey for me, right, because I was, I've been looking at Cassandra for a while. I mean, there is this tussle between, hey, do you want simple and sort of like,

Starting point is 00:32:49 limited in what you can achieve or do you want powerful but complex to understand, right? And I think our response to that is that, our powerful still looks relatively simple because it's based upon GraphQL, but then you can make it simpler by doing this bit of translation between a GraphQL field to a DQL query and stuff like that. Yeah. And that's a really, really great point. You can choose specific frontiers

Starting point is 00:33:17 to democratize and make complex things simple by actually automating and building out infrastructure. It's similar to what Apollo has taken with their federation approach, as opposed to schema stitching, right? You have a lot of complex machinery in the backend to make certain use cases a lot more easily consumable and digestible. But then you have a, and it isn't a problem if it matches the use cases, but you have something of a calcification because those are the things that you can now do well, but you have lost the flexibility, same pattern from Thrift to CQL, et cetera. And there's nothing wrong with that, right? That's about product market fit and finding the right kind of blend of what the technology and the API should offer. And then you get massive uptake from people saying, yes, and like validating

Starting point is 00:33:55 the way that you're doing this solves enough of a need for me that I'm going to adopt it at scale. It's definitely one of those things that we see happen over and over and over with technology, and especially in open source, see things forking off to express a specific opinion that matches a broad set of specific needs, but it allows you to go deeper and expose those things more simply by automating how that actually happens in the tech itself. Right. Another interesting project that emerged recently in that space was Stargate, actually. And Josh, I know that you've been closely involved in that. And I'm curious to hear your experience and to share how I first got exposed to that. So when I first saw it, my first thought was like, OK, so this actually looks a lot like GraphQL. So why reinvent the wheel?

Starting point is 00:34:45 Then I realized, okay, so it's actually not exactly reinventing the wheel. It kind of wraps GraphQL, but then it also allows you to do other things. And then it started making more sense. And then I started looking at it as more like, okay, so this could actually be GraphQL for databases because that's kind of what it seemed to me it was built to do. So if you can share a little bit on, you know, having been, I guess, closely involved with that, a little bit of the thinking that was behind it, and

Starting point is 00:35:18 whether you see that this could possibly grow into actually being a GraphQL specifically for databases or not? Yeah, I think that's a great point. So the evolution of it was essentially going from having a bunch of microservices that were serving different APIs in front of Cassandra to saying, like, you have microservices that are going to a coordinator, which is essentially a very fat microservice, which is then coordinating queries to storage engines. Why do you have two things doing the same job?

Starting point is 00:35:45 And the coordinator in Cassandra actually has, it's essentially Dynamo, right? It's open source Dynamo for the masses in terms of its consistency levels, in terms of its integration with Paxos to have masterless consensus. There is a future in which that technology really could be, I view it as the marriage of GraphQL and Dynamo.

Starting point is 00:36:04 And so that's bringing Paxos, bringing consensus, bringing the query coordination algorithms and replication factors into the GraphQL space and actually having that be separate from whatever storage engine you choose to put behind it. And I mean, there definitely is a path forward for it, but that's a very complex problem and that's a big lift. And there's no getting around, like you see the same thing with Cosmos DB on Azure, where it's like you have six different types of consistency levels. And so you've got to reason about those and you've got to know like what your serialization needs become. You've got to know what your acidity guarantees become. So this goes back to the paying the piper metaphor. Like there isn't really a way to expose the specific things that make Dynamo so powerful in a simplified fashion that's digestible for people. I think that's going to be the big existential lift there is how can you actually distill those things to service the majority of the use cases by making these easier to reason about and putting them in the hands of the users,

Starting point is 00:37:05 much like Dgraph is doing with graph-related constructs, right? How do you simplify it, but still provide 90% of the power and then take the other 10% and hide it behind advanced users and customizability? There's definitely a future for Stargate for sure. And certainly for me, the more I went through that, the more I started questioning, like, why even have REST support? Why even have support for other APIs? Like, why not just have GraphQL as the API that this thing supports? Because GraphQL really does allow incremental adoption as well and slowly migrating things over. So that provides a, anyway, so that's part of my genesis and my journey personally of just seeing like, okay, GraphQL, I can see why it's got the same kind of gravity well as Kubernetes does.

Starting point is 00:37:48 Like it's the right technical solution to a complex technical problem. But yeah, that's kind of just a bit unfocused, but a scattering of thoughts about Stargate and its potential future and its architectural role. Cool, thank you. Manish, any thoughts on Stargate? I was just looking at Stargate as Josh was talking. It definitely looks really, really interesting. I don't actually have much context on that, but it's all the stuff that,

Starting point is 00:38:14 I mean, if it's doing like Paxos and if it's doing like some of those guarantees, that's exactly what Dgraph has been doing as well. Dgraph is pretty inspired by Spanner, right? So it does like consistent replication and distributed transactions and provides some pretty solid guarantees and so on and so forth. So definitely, I mean, it seems like if Stargate is doing something similar,

Starting point is 00:38:39 worth checking out. Cool. And that was my thought. That were my thoughts when I saw it. I was like, okay, so even though it was custom built for Cassandra, really, I thought, you know, this may have a chance of becoming somewhat of an industry standard, let's say, on how people can access databases using GraphQL. Actually, Stargate doesn't only support GraphQL, but I think it's the most prominent or maybe most easy way to use it.

Starting point is 00:39:11 Yeah. I think the, one of the big questions you run into with adoption of technology is the, the friction and inertia against the users getting involved with it and tinkering and, and branching and making their own functionality in it. And that's one place where I'd say GraphQL being in the, you know, the GraphQL implementations and the TypeScript space kind of have a leg up. Cause the Cassandra code base is 12 years old at this point.

Starting point is 00:39:32 And it was Lakshman's research project internally from Facebook that they were using for their inbox search. So like the code base itself has grown a lot and has certain concerns that are completely unrelated to people putting their own arbitrary like data backends in there, which is something that we've wrestled with the project for some time. You can see some work about pluggable storage engines and realize everything is all wired together inside that code base for the purposes of performance. So whereas you can add

Starting point is 00:40:00 other resolvers pretty easily in the GraphQL space and other things that you're querying and to Manish's point, like you can actually build out to where you're querying, you know, DQL in the backend based on the things that you're putting with a front-end API. That's actually a significantly bigger technical lift to do that at least today. Because, you know, in Stargate, for instance, you're not allowing people to run like sandbox JavaScript. And that's a big impediment to product development teams to have to work in a polyglot environment in different languages to actually add extensions to their data translation layer. So that's one of the things we've seen massive uptake on with GraphQL, just because the number of JavaScript developers in the world is just explosively growing in insane numbers. So the

Starting point is 00:40:40 obvious answer becomes meet the people where they are with the technology that they know. So I think for Stargate to have that kind of adoption, they would probably need to be considering the support of other languages and other developers that are using different technologies and how to integrate that correctly with their pipelines. Yeah. And it's also, you know, very much, very much not really a technical thing, but rather, you know, community building and evangelizing and all of those things. And it's up to the people, I guess, who came up with Stargate, whether they want to do that and to that extent

Starting point is 00:41:15 and how successful they can be with that. But just, you know, just looking at it from the purely technical point of view, I just thought, well, okay, this could work that way. Absolutely. Another question I had for you, I guess, Josh, mostly was, well, I know that you're fresh with Apollo, but if you have any kind of feeling as to how much people that use Apollo are actually interested in specifically accessing databases

Starting point is 00:41:48 and how much Apollo can cater to that compared to frameworks that are specifically geared towards that goal, like, you know, Hasura or the other ones that we mentioned earlier? That is a question I don't have a ton of context around because I've been looking in that space right now. So Apollo Client, Apollo Server, Federation, et cetera, the different projects that we have, that we've open sourced,

Starting point is 00:42:17 aren't actually touching the databases themselves. And so we're not taking that approach necessarily of scaffolding and saying, hey, if you're a Postgres developer, here's the plugin that you should use to interface with Postgres. Or if you're Cassandra, here's the plugin you should use there. Right now, that's distant from what we're focused on. Largely, what we're focused on is if you have two teams together, if you've got five teams together, in the case of Netflix Studio, if you have 70 teams, how on earth can they all collaborate across this backend tapestry of data stores? So that's a big enough problem to focus on that I don't think we've gotten into the

Starting point is 00:42:48 optimization or the enablement for certain database communities at this time, really. Okay. Fair enough. Okay. So yeah, covered lots of ground. And another topic that I wanted to explore before we get to wrapping up was where do you think this space is headed towards? So the areas that you think GraphQL should be addressing and where you see that moving in the future. So I know Josh, you started with mentioning the prospect of potentially extending to cover things such as graph algorithms. And I don't know how close or how far we possibly are to that. But just interested in hearing your thoughts as to where do you think the people in the GraphQL committee are going with that? And where would you personally like to see this go? For where I think it's going right now, there's some rumbling lately and there has been for

Starting point is 00:43:51 multiple years about namespacing, about the ability to have multiple subgraphs that are all coming together and the ability to essentially segment data off from one another without clashing with each other, et cetera. And this comes back to that whole multiple teams working with a shared data graph. What are they going to do there? And some of the tension that you see is the spec coming out of Facebook and the Facebook engineers still being a large part of the steering of GraphQL as a spec. The question becomes, do they need it at their scale? And if they don't need it at their scale, why should anyone else? And then you get into the debates about the patterns of usage of the technology. And oftentimes what we see in open source communities is that really large companies have the engineering resources

Starting point is 00:44:33 and skills and capability available to them that most enterprises don't. And so there's this tension between open source projects that are tailored towards the usage of a single company or a small handful of companies with exceptional engineering capabilities, and then other people who have a job to be done and their attempts to get that job done. So that's the way I see things going just from the pressure that I see from the individual developers and what they're running into. From a personal preference, I'm a firm, huge believer in the democratization of graph, the fundamental computer science graph power into the hands of end users

Starting point is 00:45:11 without them having to go into all the lift of learning the details of how it works. And I really do think that the first group of people that crack that nut, that figure out the right way to democratize that. I mean, you're talking about the democratization of the business models of some of the biggest companies in the world.

Starting point is 00:45:26 They are based on graphs of information and the massive network effects therein. And so the first group that manages to technologically democratize that to end users is gonna be like, that's lightning in a bottle. Like they're gonna be off to the races. So I'm personally incredibly bullish on that as a topic, just because like again and again, we see that from a business perspective, the implications are massive, but it requires incredible specialization and lift from engineers and from integration with your business to get there at that point.

Starting point is 00:45:56 So now whether or not GraphQL evolves in that way towards, you know, the empowerment through graph technology to end users, not sure. But, you know, I'd like to see it moving in that direction, certainly. Cool, thank you. And I think really great point Josh, and I think to that point, I think that's the vision for Dgraph is that, we can actually take this complex graph stuff that people are afraid of and make it usable

Starting point is 00:46:21 and make it accessible to developers and bring it to the modern world, which is why we wanted to choose GraphQL as a query language, as opposed to, you know, some of the others which are available, right? The way, you know, I think about some of the problems that exist for GraphQL, you know, we have largely been able to sort of like resolve those, for example, George mentioned namespaces, right? So Dgraph supports multi-tenancy and you can have different, a single company could have like different GraphQL schemas for different data

Starting point is 00:46:50 in the same, in the same place. And they could even sort of like, you know, be able to, to query across depending upon what they could want, or they have access control to not be able to query, right? So there's, there's certain things, basically the problems, the way we have been approaching all of this is that if we see a problem with the spec, we devise a solution. We don't try to push hard to bring it back to the official spec because, you know, we feel like there's a lot of like, you know, complexity that you can add to the language, which we have done to make it more powerful. But at the same time, if some other third company, which doesn't care about these

Starting point is 00:47:30 things, they don't want to have to build it now because the spec says that you should build it this way. For example, we have variables, right? So if you look at SQL, SQL has variables, you can say C as count, et cetera, et cetera. Then again, use that C in the other part of the same query, and it's much more performant than having to run two different queries or three different queries. Now Dgraph's query language supports that, right? But do we want that back in GraphQL?
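The SQL side of that analogy can be sketched as follows. This is a minimal illustration using Python's standard `sqlite3` module, not DQL; the table `follows`, its columns, and the sample rows are made up for the example. The count is computed once, named `c`, and reused twice within a single query instead of issuing separate queries.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE follows (follower TEXT, followee TEXT);
    INSERT INTO follows VALUES ('a', 'x'), ('b', 'x'), ('c', 'x'), ('a', 'y');
    """
)

# The CTE computes the per-followee count once, under the name c.
# The outer query then reuses c both as a filter and inside an expression.
rows = conn.execute(
    """
    WITH counts AS (
        SELECT followee, COUNT(*) AS c
        FROM follows
        GROUP BY followee
    )
    SELECT followee, c, c * 1.0 / (SELECT SUM(c) FROM counts) AS share
    FROM counts
    WHERE c >= 2
    ORDER BY followee
    """
).fetchall()

print(rows)
```

Binding an intermediate result to a name and referring to it later in the same query is the capability being described; how a given query language spells it differs.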

Starting point is 00:47:55 Don't know, because that actually is a hard problem to crack for everybody else who's using the GraphQL spec. So I think like, you know, where I see GraphQL growing, I definitely see it being used by a lot, but I think it's being used by a lot of people

Starting point is 00:48:10 because of the simplicity of it. And whatever we do to it, we should just make sure that it remains simple and easily like consumable and implementable by different people. I would like to see some standardization around some common stuff, like let's say filters and the way filters are run across different implementations. Like some of that is like I feel is generic enough that it's worth putting into the standard.

Starting point is 00:48:35 But I just worry about, you know, having the standard getting too complex. And because that actually does not benefit anybody, right? Everybody gets like put off by a complex standard. And if you look at REST, right? REST is extremely simple to understand, you know, nothing complicated there. And we want GraphQL to be similar in that approach. And one other point to that, to add onto that,

Starting point is 00:49:04 that I think is part of what makes open source so powerful is the ability to fork things and express a different opinion and then let the market decide what to do with that. The core of GraphQL remaining simple, remaining consumable, and having the right primitives that other opinions can actually be built on top of that is crucial because that's fundamentally the same thing Apollo is doing, the same thing that Dgraph is doing. You see that in the Cassandra ecosystem, exact same pattern. That's part of a healthy open source ecosystem is the branching out and then the bringing back if there's a critical mass of adoption with certain opinions that are expressed there.

Starting point is 00:49:36 So yeah, it's really good to see that the core of GraphQL remaining stable and simple and solid is facilitating all these different exploratory patterns and businesses that branch off it. Like that's really one of the core powers of open source ultimately. Just a follow-up question for Manish. And I was just curious whether you have seen people basically being drawn to Dgraph coming from, you know, people that have been using GraphQL and were, I don't know, frustrated, let's say, but by things that they could not express, that they could not

Starting point is 00:50:14 do using GraphQL and discovering your query language and being drawn to the database that way. Absolutely. I think if you look at our roadmap, the biggest request that we got from GraphQL users was a better integration with DQL, the Dgraph query language. And so that's the top item on our roadmap for this year.

Starting point is 00:50:35 We already had like multiple ways to call DQL from GraphQL, but our users were like, well, it's still not as simple as I want it to be because they do want to tap into the power of DQL. Then we started like supporting the official spec. Initially we felt like there'll be a clear distinction between the DQL users and the GraphQL users, GraphQL spec users.

Starting point is 00:50:55 But we realized like the moment GraphQL spec users like started to come in, they immediately wanted DQL. They're like, we're coming in because of the power of DQL, but we also want the simplicity of the official spec and how it works with the rest of the ecosystem. And so that is what we are seeing, people using DQL to express complex stuff. The other stuff that we are seeing is graph users, very traditional graph users, are coming in and they want GraphQL. And so we are seeing people coming in from two different streams. The GraphQL user is coming in asking for more DQL graph stuff, and the graph user is coming in asking for more GraphQL stuff.

Starting point is 00:51:31 And there's like a broad mix of people who are using both of them interchangeably. And that's sort of like the big roadmap for us for this year is to make it really seamless and make sure that, you know, it's easy to pick and choose and build a basket of all the different functionalities. To Josh's earlier point about having, for example, the ability to integrate graph analytics or graph algorithms in GraphQL, you're probably the one person that's closest to realizing that. And I wonder if it's in your roadmap at all. Oh, yeah. So the way we have...

Starting point is 00:52:17 Sorry, you mentioned Josh and I was wondering if it's for Josh. So the way we have done that is that we have done all of the graph algorithm stuff in DQL. And now we have made a very simple way by which you can call DQL from GraphQL. So for example, if you want shortest path, you could create a field which is called, let's say, shortest path or whatever. And it would internally call the DQL query, which would give you the shortest paths, right?
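A rough sketch of that shape, in Python rather than DQL or GraphQL: a graph algorithm (here a plain breadth-first search over an unweighted graph) sits behind a single field-style function, so the caller only sees a `path` result. The graph data, `shortest_path`, and `resolve_shortest_path` are hypothetical names for illustration and do not reflect how Dgraph implements its shortest-path support.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical directed graph as an adjacency list.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": ["f"],
    "f": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one fewest-hops path or None."""
    if start == goal:
        return [start]
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in parents:
                continue  # already reached by a path at least as short
            parents[nxt] = node
            if nxt == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


def resolve_shortest_path(args: Dict[str, str]) -> Dict[str, Optional[List[str]]]:
    """Field-style wrapper: the caller never sees the algorithm behind it."""
    return {"path": shortest_path(GRAPH, args["from"], args["to"])}


answer = resolve_shortest_path({"from": "a", "to": "f"})
print(answer)
```

From the consumer's side this is just another field returning nodes to traverse; swapping the algorithm behind it would not change the calling code.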

Starting point is 00:52:41 And then you can traverse it just like GraphQL without knowing that you actually had to call, you know, the consumer of the API does not need to know that it actually was calling the shortest path in DQL. So yeah, so that's our approach to bringing the power of graph to the GraphQL user. And that's the interoperation that we want to do more of. So another example, languages.

Starting point is 00:53:05 So Dgraph, actually Dgraph at Google supported languages because we have interfaces in different languages. So we could have Greek, we could have German, we could have English. But you might not have the translations for every language. So we had a way by which what we could do is we could say, give me the name in German. And if German is not available, give me the name in English

Starting point is 00:53:27 or even if it's not available, give me the name in Russian or vice versa. And so Dgraph built that support very early on. Now it's relatively unheard of in the GraphQL ecosystem. Like there's no native support for human languages there, but people are asking us for it. And so these are some of the other things that we actually want to bring to GraphQL as well,

Starting point is 00:53:47 by which you can have that automatic, you know, fallback mechanisms of switching between human languages to build your interface for different consumers in different demographics. So some of these things, which are pure graph stuff and pure even Dgraph stuff, are being made accessible to a GraphQL user of Dgraph. Cool, thank you. Okay, well, I think that's

Starting point is 00:54:14 that that's even more than what I had in in my list so thank you thank you both for a very very interesting discussion and yeah that's that's it from my side. If you have any closing thoughts or comments you'd like to share, feel free. Otherwise, thank you both, gentlemen. Thanks for having us, George. Yeah, thanks, George. It's great to meet you both.

Starting point is 00:54:36 It's been good. I hope you enjoyed the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
