Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Yaniv Tal: The Graph – A Marketplace for Web3 Data Indexes Based on GraphQL
Episode Date: April 16, 2019. At the core of the Web 2.0 stack lies the REST API. It's the fiber which allows frontend applications to communicate with their backend counterparts, as well as the services on which they depend. But the API model is highly constrained and inflexible. The API is divorced from the data model, which creates a number of restrictions and inefficiencies. Most blockchain clients, including Geth, Parity and Bitcoin Core, use a JSON-RPC model which suffers from similar issues. Several Ethereum DApps maintain high-availability, centralized data indexes which sit between the client and the blockchain. Though user experience is greatly improved, the practice means most of the ecosystem relies on centralized infrastructure. We're joined by Yaniv Tal, Project Lead at The Graph. The project aims to create a scalable marketplace of robust and high-availability blockchain data indexes. Relying on the modern GraphQL data query language initially developed by Facebook, The Graph allows developers to make complex queries to a robust and high-availability data infrastructure. Launched as a hosted service earlier this year, The Graph plans to move to a decentralized model in the future.
Topics covered in this episode:
- The vision of The Graph and why the team chose to work on this problem
- The REST API client-server model in the Web 2.0 paradigm
- The state of the Ethereum ecosystem and the challenges relating to data availability
- How DApps work behind the scenes and their backend infrastructure
- GraphQL as the evolution of the API model
- How The Graph addresses the issue of data querying and availability
- Their hosted service and plans to move to a hybrid model
- How The Graph addresses privacy and scalability
- The incentive mechanisms and economics related to data integrity
- Early applications and the project's near-term roadmap

Episode links:
- The Graph - A Query Protocol for Blockchains
- The Graph on Medium
- Browse and Explore Subgraphs
- Graph Docs
- GraphQL Will Power the Decentralized Web
- The Graph's Research and Specifications are Now Open Source
- Our Investment In The Graph - Multicoin Capital
- GraphQL
- Yaniv Tal on Twitter

Sponsors:
- Cosmos: Join the most interoperable ecosystem of connected blockchains - http://cosmos.network/epicenter

This episode is hosted by Brian Fabian Crain & Sebastien Couture. Show notes and listening options: epicenter.tv/283
Transcript
This is Epicenter, Episode 283 with guest, Yaniv Tal.
This episode of Epicenter is brought to you by Cosmos.
Cosmos is building the internet of blockchains, an ecosystem where thousands of blockchains can interoperate, creating the foundation for a new token economy.
If you have an idea for a DApp, visit cosmos.network slash epicenter to learn more and to get in touch with the Cosmos team.
Hi, welcome to Epicenter. My name is Sebastien Couture.
And my name is Brian Fabian Crain.
So today we have Yaniv Tal as our guest.
Yaniv is the project lead at The Graph.
You may have heard of The Graph.
It recently launched.
And The Graph is interesting because it solves a pretty prevalent problem in the Ethereum space.
And that is the centralization of Ethereum.
No, I'm being facetious here.
But as many of you know, DApps that are building on Ethereum are having to rely on indexed data. So, you know, the blockchain itself isn't actually built to, you know, sort of index data and have that data readily available for DApps. And so what many DApps are doing is that they're building quite centralized infrastructure on servers and allowing DApps to query that infrastructure with traditional Web 2 APIs. And so this is obviously a problem if the goal of Ethereum is to build a world computer that's totally decentralized.
So The Graph is taking an interesting approach in that they're building a graph database on top of Ethereum, as a layer on top of Ethereum.
And we sort of go into what a GraphQL database is, if you're not familiar with that technology, and allowing DApps to query that directly, so making it much more efficient.
And so their goal is to have this become a decentralized marketplace for indexes of the Ethereum blockchain.
So it's an early project.
I think generally we try to wait a little bit longer in a project's maturity to bring the project on the show.
But in this case, we just thought it was such an interesting project and felt that it was at least now a good point to have them on, and perhaps we'll have them on again in the future when the project fully goes live and is fully decentralized.
So last week I solicited contributions for the Wikipedia article.
That solicitation is still open.
So if you want to contribute to the Wikipedia article that is in draft, you can go to epicenter.rocks slash wiki and you'll get directly to the article.
Another thing that we should mention, and if you're hearing this, that's good news because
we've updated our podcast hosting service, which means that our feed has changed.
And so your podcast app should have reset your feed.
So you should be hearing the new feed.
One thing you'll notice that's different in the feed is that episode numbers are no longer in the episode title.
And that's simply because modern podcast players have a native field for episode numbers.
So you just won't see the episode number in the title anymore.
If you're not seeing the episode numbers and you want to see them, then I would recommend, you know, updating to a more mature podcast app.
So like the iTunes app supports them, Pocket Casts supports them, this sort of thing.
So just wanted to mention that.
So without further delay, here's our interview with Yaniv Tal.
So we're here today with Yaniv Tal,
and he's the founder and the project lead of The Graph.
Now we're going to speak with Yaniv about, you know, a lot of different things about querying and, you know, probably one aspect of the decentralized application Web 3 stack that a lot of people are not all that familiar with. So thanks so much for joining us today, Yaniv.
Thank you so much for having me.
Oh, maybe where we can start is if you can speak a little bit about how you became
interested in blockchain and, you know, the perspective you brought to blockchain.
Sure. So I first started getting interested in the idea of blockchains in 2011, I think.
You know, following the Occupy Wall Street movement, I became very interested in thinking
through how we could improve governance and the economy through digital money.
But I didn't actually make it like my day job until 2017.
So a lot happened in between.
I spent a lot of time at different startups.
I did my own startups with my co-founders here at The Graph
and focused on developer tools for quite a few years.
And the perspective that I took into blockchains
was trying to figure out how to make it easier for people
to build software just in general.
So we had a startup where we were doing React developer tools
to make it easier to build user interfaces.
We got into functional programming,
looking at building on immutable data.
And along that path,
what got me really excited about Ethereum
was the idea that for the first time
we could have this kind of global, immutable database
that isn't just bounded by a single organization,
so that everybody can agree on a global state of truth.
And that to me was just really exciting as being like a foundational primitive for just how software development could move forward generally.
What was interesting, I listened to another interview you did,
and it was interesting to hear you speak about blockchain kind of from this perspective of,
okay, developer efficiency, you know, because generally when people speak about
developing blockchain applications, everyone complains, oh, it's so inefficient, so hard to build.
And, you know, there's these other benefits like maybe decentralization,
and you put up with all of the hard time of developing something
because there are those other things that make it so much better.
And now you're kind of looking at it from a perspective,
oh, actually, it can make it easier to develop.
So I thought that was super interesting to hear.
That's exactly right.
And I think that this immaturity is just because it's a brand new platform.
And we've spent like 20 years building up the web platform to what it is today.
So at the beginning, people were building their own servers, web servers in like C++.
And that wasn't easy, but they put up with it because, you know, the web was exciting for them.
And so similarly, I think, you know, we are in the infrastructure phase.
And, you know, I think as we build out more of this infrastructure,
there's no reason why it wouldn't be even easier to build on Web 3 than it is on Web 2.
And I'm quite confident that that'll be the case.
But there's plenty of work that needs to be done
before we get to that point.
So why did you guys decide to start working on The Graph?
Well, we got interested in Ethereum seriously
in early 2017.
And we started building different DApps.
And we quickly stumbled upon this problem that it's actually really difficult to get data that you need to power a web or a mobile app directly from an Ethereum node.
You know, I think we got into the space probably for a lot of like idealistic reasons, the same reason a lot of other people got drawn to the idea of decentralization.
But The Graph specifically came partially from realizing that this indexing and query layer is completely missing from the Web3 stack today.
And we've actually spent a lot of time working on this part of the stack at previous companies.
So, for example, at our last startup, we built a custom framework on top of an immutable database called Datomic.
And we built this GraphQL query language on top of this immutable database.
And we just did that kind of out of passion just because we're always trying to refine
what's just the best way to build software, what are the right abstractions just for building
applications. And so, you know, we'd already spent a lot of time thinking about that before we got into Ethereum, and then when we saw that this part of the stack was missing for Web 3, you know, we decided that's where we'd focus our efforts.
One thing I thought it would be interesting to speak a little bit about, and that I thought was really fascinating, is the way you guys described kind of the problem of, you know, the way databases and APIs work
in, you know, Web 2 and, you know, the kind of effects of that, such as rigidity of APIs,
duplication of data and effort. Can you explain a little bit, you know, how that works?
What's the status quo here? And what are some of the downsides of the status quo?
Yeah. So, you know, today every Web 2 company is basically running a fully vertical stack,
right? They manage the infrastructure, the database, the application, you know, the user interfaces.
And you really have to, you know, trust these companies
to kind of, you know, continue to manage this data
and make it available to their applications.
And, you know, you end up with these kind of data silos
where, you know, the model for Web2 is, you know,
you build some kind of, you know,
process around some data and then you restrict access to that data.
And that is generally called like a moat.
And how well you're able to build a moat basically determines how successful of a company
you are in Web 2.
And I think that's just very limiting for enabling innovation to continue.
And if you kind of look at this larger arc of, you know, software development, it's taken maybe 20 years to really figure out
how to build applications that people really want to use, you know, even just on the web.
And, you know, we've seen the transition from, you know, PHP to, you know, Ruby on Rails to
client-side frameworks and microservices and data science and all of these different disciplines
where we're just figuring out how do you actually deliver software at scale globally on the web?
And the companies that figure this out were able to build these moats and data silos
where they control access to that data.
But we've kind of gotten to a point now where, first of all, these are more or less solved problems.
And so people now know actually like how to build full stack web applications that scale globally.
And yet we've continued to have these kind of, you know, monopolies that now kind of control access to that data.
And so it seems to me that the next kind of phase here is to basically commoditize what, you know, we've collectively learned, and to actually, you know, make that data more generally available so that we can continue to make progress.
And then the thing that you guys described, I thought was nice, is if you have like the current paradigm where, you know, I have some data and maybe I make an API available, but then I have to decide, you know, what gets exposed in what way, and so I have to somewhat anticipate what people are going to try to build with it, right?
And then if they try to do something else, it doesn't really work.
Yeah, so that's a big problem with REST.
So basically, the way REST has kind of evolved is initially the idea is you have one endpoint per resource.
And that way, if you have some interface that requires you to combine multiple resources together,
then you can make several round-trip calls to each of those different endpoints,
and that's how you get your data.
And as we try to build applications that load really fast,
we realize that you actually can't make all of these round-trip requests
because otherwise you're going to be staring at a spinner for too long.
And so what companies have ended up doing is building custom endpoints
for their different user interfaces so that you can load the data that you need really quickly.
But the problem is that now you have to basically,
you have this tight coupling between that one API and the client.
And so if I'm building a new feature,
suddenly I need like, you know, a new field to put up in the interface.
I have to go talk to the server engineer, ask them to add that field,
and it really slows down the pace of development.
And so, you know, Facebook is a massive company.
They're constantly adding features and building new products.
And they, like everybody else, you know, were grappling with this limitation.
And because of that, they invented GraphQL, which is a query language that is really built with clients in mind
so that you can specify a schema upfront on the server.
And then the clients can request exactly what data they need and get just that data back in a single response.
Okay, that's a good explanation.
I think one way of looking at this issue is that APIs themselves are sort of data silos.
So for every resource, there is a data silo.
If the company building the API actually has a need for that silo, they'll build it into the API, or if they think the users will have a need for that silo, they'll build it into the API.
So one way to look at this is to say, okay, if I need to know the data relative to the organization,
which is attached to a user, well, first I have to make a query to the user endpoint, which is kind of a silo in itself,
and then that will return the data with the user, so name, you know, bio, photo, et cetera.
And then an ID for an organization, which I then need to query to retrieve the information about that organization.
So in this case, it would require two queries, whereas with something like GraphQL, you could do this in one query.
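To make that concrete, the two REST round trips described above collapse into a single nested GraphQL query, something roughly like this (the field names are invented for illustration):

```graphql
# One request returns the user and the related
# organization together, instead of two REST calls.
query {
  user(id: "42") {
    name
    bio
    photo
    organization {    # traversed in the same request,
      name            # no second round trip needed
      description
    }
  }
}
```

The response comes back as JSON whose shape mirrors the query, so the client never has to stitch the two resources together.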
So at this point, maybe let's dive into GraphQL and how that sort of changes the paradigm with regards to SQL and how we've been doing things for the last 20 years plus.
Yeah, so GraphQL is a query language that is really built with application developers in mind.
And so you can define your schema, which is basically all the different entities and how they relate to each other.
So in the example you gave, you say this is an organization, an organization has many members,
and these are the different fields that organizations and the members have.
And then depending on what screen I'm building, what particular user interface I want to build,
I can specify the fields that I want, and I can seamlessly traverse across these relationships so that I can get whatever data I want back in just a single response.
So it's effectively putting a query language at the API endpoint itself.
So rather than having a database that's within the company, you know, siloed, closed off,
and then you need to build the API endpoints for each of the queries that you think your users are going to be using, it's effectively putting the query language right at the edge where the user can access it directly and make those complex queries in a very seamless kind of way.
Yeah, that's exactly right.
And there are a few restrictions on what you can do.
So it's not as generally powerful as a query language like SQL, but it actually maps really well to what you need if you're building an application.
So one of these limitations is that you basically need to know
how all of this data is structured ahead of time.
So you have to define that schema.
And if you have data that you want to be able to query,
you can only query according to that schema.
But that actually is how most applications are built.
So you generally know how your different entity types
relate to each other ahead of time.
And you can, for example, denormalize your data
to make aggregations and various computations
that you want available ahead of time.
You can also parameterize these things.
So GraphQL allows you to, for example,
if you want to query a collection,
you can include parameters.
So you can do, for example, pagination or filtering
or any of these kinds of things.
But essentially what you get back is JSON, an object that exactly matches the structure of those entities and fields that you requested.
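As a rough sketch of the parameterized collection queries described here (the entity and argument names are hypothetical, though `first`/`skip`-style pagination arguments are a common GraphQL convention):

```graphql
# A paginated, filtered collection query -- the
# available arguments are defined by the server's schema.
query {
  members(first: 10, skip: 20, where: { active: true }) {
    name
    joinedAt
  }
}
```

The result is, again, a JSON object containing just those ten `members` entries with just the `name` and `joinedAt` fields.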
This episode of Epicenter is brought to you by Cosmos, the internet of blockchains.
Cosmos is live and we couldn't be more excited to see so many projects already building on it.
Blockchain technologies are evolving fast and development shouldn't be one size fits all.
As a DAP developer, you need the tools that will allow your DAP to scale, grow, and evolve over time.
The Cosmos SDK is a user-friendly, modular framework which allows you to customize your DAP to best suit your needs.
It's powered by Tendermint Core, an advanced implementation of the BFT proof-of-stake protocol.
Cosmos takes care of networking and consensus and allows you to focus on building your application in your language of choice.
Ethereum smart contracts will be supported soon, and the SDK makes it simple for you to connect to other blockchains in the Cosmos network.
If you have an idea for a DApp and would like to learn more about the Cosmos SDK, or if you'd like to connect your existing DApp to Cosmos, visit cosmos.network slash epicenter.
For Epicenter listeners, the Cosmos team will reach out to answer your questions and help you
get started. We'd like to thank Cosmos for their support of Epicenter.
So there's a nice blog post and we'll link to it in the show notes too, where your co-founder
talked a lot about, you know, kind of the benefit of GraphQL. And one of the ways he explained
it was that with REST-based APIs, you have this overfetching and underfetching problem.
Can you explain what that is and how GraphQL solves that?
Yeah, so this is a problem with most REST API endpoints.
So typically with REST, if you request a resource, you get back all of the fields that the server
engineer thinks you might want for that resource. And so it might be that I want to render a card
that just has a user's name and profile photo, but if I request the user object, I actually get back a hundred different fields, everything that the server knows about that user. And so that's
not very efficient. The converse is maybe I want the user's name and photo and also the count of
their friends, but the only way that I can get back the friend information is if I, you know,
actually request all of their friend objects. And so now you're also sending me back, you know, one friend object for each of those friends. And so these are the types of
problems that you end up with with REST APIs all the time. And so, you know, GraphQL solves that
very elegantly by just sending you back the data that you're requesting. No more, no less.
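A sketch of the card example above: instead of the full user object plus every friend object, the query names only what the card renders (the schema here, including the `totalCount` field, is made up for the example):

```graphql
query {
  user(id: "42") {
    name           # just the two display fields...
    profilePhoto
    friends {
      totalCount   # ...plus the count, without fetching
    }              # each individual friend object
  }
}
```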
Okay. And so then this is something that Facebook basically built because they have so many queries internally. They were just, okay, we need to, like, save costs, save bandwidth, make things more efficient, and so they built this kind of query language because of that.
That's right. The other thing that Facebook deals with is, you know, with mobile apps,
people tend to run old versions of these applications. And so they're supporting, you know,
probably hundreds of different client versions, if not more.
And so that makes it really difficult if you have this tight coupling
between the user interface and the API,
because the backwards compatibility becomes a nightmare to support.
And this is actually quite similar to the situation that we have in Web3,
where you have a protocol that is going to be supporting a large number of clients.
You know, one of the big benefits of Web3 is you can have a lot of different applications
built on top of, you know, these protocols.
And so to be able to support, you know, multiple applications and multiple versions of those
applications, you end up having the same exact kind of situation.
So in one of the blog posts that you've published, you mentioned that GraphQL will likely be the
preferred query language and database of the decentralized web.
What did you mean by that and why do you think that is the case?
GraphQL has been rising in popularity just in traditional web development.
So there's a lot of companies like GitHub and Twitter and Yelp and many others that, you know,
have already switched to GraphQL.
And so this is already like a really big
trend happening in web development generally.
And really what you need when you're building applications is essentially like a standard
for how you want to access your data.
And you basically need an abstraction.
And we believe that GraphQL basically is just kind of the right level for this sort of abstraction,
where you don't want to have to know, you know, what block some data was updated in, and all of the, like, internals of a blockchain, if you're trying to build an application. And so having an abstraction on top that is just the data model is, we think, you know, the right level for how people will choose to build their apps, and then you can kind of deal with all of the implementation details on the back side of that GraphQL API.
All right. Now, switching over to DApps then, so how do these problems relate to the DApp space?
I mean, because in the DApp space, we don't have databases or SQL or any of that stuff.
We're just querying the blockchain, right? You know, why would we need a database on top of the blockchain?
Yeah. So, you know, a lot of the applications that were built in the early days of Ethereum were very simplistic.
But you're exactly right that basically this indexing and query layer was completely missing from blockchains.
And, you know, I think it's just a function of, you know, these blockchains have a lot of work to do.
It's a lot of work just to maintain consensus.
And so, you know, Ethereum, for example, is focused on these like scalability problems
and how to make it so you can build these smart contracts.
But indexing is very much its own kind of layer in the stack
and its own problem altogether.
And I remember when people started building DApps on top of Ethereum, maybe there's only 10 transactions that have gone into the smart contract. And so you want to view that in whatever the application is.
And it's easy to just show 10 things on a screen
and feel like everything's OK.
But if you remember, you know, the applications that started taking off, suddenly, you know,
there's like a hundred things on the screen, a thousand things on the screen, and suddenly you just
have this really long list and you're scrolling really far.
And, you know, anyone who's used like a good web or mobile app would never think to themselves
that that's an acceptable experience.
But, you know, we're, you know, we've been kind of building toys.
And it's kind of this growing up process to go from having like a proof of concept to having an
application that people are happy to use. And I think in that transition, a lot of people have
realized that like, wait a minute, you actually need to be able to, you know, search for specific
data and filter and sort and paginate. And these are all things that you need to have
indexes to enable. And so one company that I think has quite a significance in the Ethereum space is Infura, right, which is a ConsenSys project that serves tons of API calls. Is that basically the problem that they aim to solve as well?
No, so Infura doesn't do any indexing.
Infura is just kind of a managed Ethereum node service.
So it is nice to not have to sync a whole node in order to start interfacing with Ethereum.
And so that's the problem that Infura solves.
But then the indexing problem actually happens a level on top.
And so assuming that you can talk to an Ethereum node,
you basically have access to the JSON-RPC interface.
That's the interface that the Ethereum node exposes.
And this JSON-RPC interface allows you to get certain fields
that are stored in storage in your smart contracts.
But you're very limited in how you can access that data.
And so, for example, if I've got some field that's, you know, a count that's stored in my smart contract, then I can get that count.
But if I have like a list that's, for example,
let's say that I'm a marketplace,
and I have a bunch of different listings
of things that people can buy and sell in the marketplace,
and I want to filter to find a specific category of listings, and then I want to sort that to just get, you know,
the most expensive items for sale in that category.
For that, I need to be able to do filtering, and that's not something that is exposed in that JSON-RPC interface, and it's not exposed because it would be too expensive to actually
expose that without having indices.
And these Ethereum nodes don't maintain these indexes.
And so that's that missing indexing layer.
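For contrast, the marketplace query that the JSON-RPC interface can't serve is exactly the kind of thing an indexed GraphQL endpoint can. In the style of The Graph's query API (the `listings` entity and its fields are invented for the example; the `where`/`orderBy`/`orderDirection`/`first` arguments follow The Graph's conventions):

```graphql
{
  listings(
    where: { category: "art" }   # filter by category
    orderBy: price               # sort...
    orderDirection: desc         # ...most expensive first
    first: 10                    # limit the page size
  ) {
    id
    price
    seller
  }
}
```

Serving this cheaply requires an index over listings by category and price, which is precisely what the Ethereum nodes themselves don't maintain.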
So how are DApps, you know, and companies building DApps and all these projects that need indexing solving this problem now? Because a lot of DApps are relying on indexing of data in one form or another.
And Ethereum is fully decentralized, so there must be a way that we're dealing with this issue.
Yeah.
So every project that has gotten to the point,
where they're trying to build a really good web
or mobile application has hit this issue.
And what most of them have done
is built these custom proprietary indexing servers.
So they're like, crap, we can't actually run the queries
that we need.
And so let's build a server that syncs data from Ethereum,
stuff that into a SQL database, and serve it over an API, and then our front ends will just hit that custom API.
I'd say that that's probably what like 90 plus percent
of the projects have done so far.
The other option is if you want to keep your app
completely decentralized, then you can try
to just sync everything on the client.
So basically, you know, in that case
where you want to let people filter particular listings,
you could load up all
of the listings on the client and then filter it locally on the client. And there are a lot of
applications that do that also. So, you know, that second option only works when you have some
small amount of data or if you're willing to make the users wait a very, very long time before you
can show them a screen. So some applications have chosen to do that, but, you know, the alternative
is you have to run and operate your own servers. And, you know, I think it's one of these cognitive dissonance things, where we're trying to build DApps, and a big part of building a DApp is this idea of, you know, it's completely serverless and you don't have to trust anyone to operate servers and infrastructure. And yet, in order to build applications that are actually usable, we have to do exactly that.
Okay, so let me get this straight, so we're clear here. So currently what you're saying is that
90% of Ethereum DApps are running their own proprietary software that's sitting on top of the Ethereum node, that's being hosted on server infrastructure. And they're serving SQL queries through APIs, allowing users that are using those apps on their clients to query the databases of these centralized choke points before they can even access the blockchain database.
That's exactly right.
Very well said.
Okay.
Great.
I'm glad we got that straight.
I'm being a little facetious here because this is something that I've kind of stumbled upon recently.
I mean, I hadn't really realized to what extent this was actually happening.
And through digging and speaking to a lot of app developers about different DevOps problems that they were having, I realized that, yeah, this is a big issue.
And one of the ways that people have talked to me about potentially solving this is
making this proprietary layer open source. And so then the idea is that, you know,
different people out of the goodness of their hearts would host this infrastructure. For example,
you have, you know, DApp A, and DApp A has all this proprietary infrastructure. They open source that stuff, and then users of those DApps, and maybe those who are particularly interested in making sure that their DApp client runs well, and maybe they need access to certain types of data, like, for example, if you're running a fund on one of these DApps, might host that infrastructure.
And I just thought, well, you know, if you don't have the incentive mechanisms there to make that work,
well, it's unlikely that we'll ever get past this point of like hypercentralization.
Yeah, you're exactly right, right?
We want, you know, Web3 to operate on top of this like public global infrastructure.
And if we want that infrastructure to be sustainable, then kind of payment needs to be built in.
There needs to be incentives for operating the infrastructure, and that's the only way to ensure that it continues to be available.
So my next question, and I guess to me, maybe I'm missing something about the technical infrastructure of Ethereum, but why wouldn't we just have this query language built into Ethereum?
Why would Ethereum nodes themselves simply not expose GraphQL endpoints, so that the clients would have direct access to the node and direct access to do these GraphQL queries?
So there actually is an effort
to introduce GraphQL natively into the nodes
as an improvement over JSON-RPC
because the Ethereum nodes actually do maintain a few indexes.
So for example, if I want to get the ether balance
for a particular account, that's pretty fast
because that is something that's indexed internally.
So that is something that we're gonna be seeing.
But the reason why the Ethereum nodes don't maintain more indexes is that maintaining indexes is actually really expensive.
And which indexes to maintain is a function of which applications are getting built on top.
So, you know, with traditional SQL databases, you usually have like a DBA, for example,
that's looking at, like, what are the slow queries?
You know, let's add, you know, these specific indexes to make these queries faster.
And there's no way for the Ethereum nodes to know which indices are the right ones to maintain.
And there's no incentive for them to do that.
So generally with software, we look to build things in stacks, right?
Where there's like layers that build on top of each other.
And I think it really makes sense to separate out basically layer one from these kinds of,
you know, you could call this almost like a layer two problem.
One is really concerned with things like consensus and, you know, data availability.
And then, you know, what they produce is blocks in a single global state that everyone can agree on.
And then the problem of how do you organize that data so that it's easy to access for all the
different applications that want to build on top
really sits very cleanly as a separate layer on top in the stack.
Cool. Well, let's dive in a little bit into, you know, The Graph. I mean, we've
spoken about the problem of databases and querying in a traditional web application
and in the decentralized context. We spoke a little bit about GraphQL. So what is it
that The Graph brings to the table?
So The Graph is an indexing and query protocol for blockchains and storage networks.
So we index data from these different Web3 data sources
and make it available over GraphQL.
So the very first thing that we set out to do
was to basically build a standardized way of doing this indexing.
So we've already kind of talked about how most of the projects
in the space have done their own custom proprietary
indexing server.
And the first step towards being able to kind of introduce this indexing layer in the stack
is to basically come up with a standardized way of doing that indexing.
So we launched our standalone graph node in July of last year and open sourced it.
And basically that defined the developer APIs for building what we call a subgraph.
And a subgraph basically defines how to do this indexing work in a way that can eventually run on a decentralized network.
And essentially what you do is you define, here are the data sources that I want to listen to.
Here's a mapping script.
So there's a Turing-complete language in which you can transform that data at ingestion time.
And here's the GraphQL schema for how we want to be able to query that data.
And with that subgraph definition, you now have a standardized way of indexing that data and making it available over GraphQL.
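The three pieces Yaniv lists (data sources, a mapping, a schema) can be mocked up as follows. This is a Python stand-in, not the actual developer APIs (real subgraphs use a YAML manifest, AssemblyScript mappings, and a GraphQL schema), and the event and entity names here are hypothetical:

```python
# 1. Data sources: which contract events to listen to (hypothetical names).
DATA_SOURCES = {"Gravatar": ["NewGravatar", "UpdatedGravatar"]}

# 2. Mapping: transform each event into an entity at ingestion time.
store = {}  # entity_id -> entity; this is the queryable index being built

def handle_event(event_name, payload):
    """A mapping handler: upsert one entity per gravatar id."""
    if event_name in ("NewGravatar", "UpdatedGravatar"):
        store[payload["id"]] = {
            "id": payload["id"],
            "displayName": payload["displayName"],
        }

# 3. Schema: the shape clients can query. In GraphQL this would be:
#    type Gravatar { id: ID!, displayName: String! }

handle_event("NewGravatar", {"id": "1", "displayName": "alice"})
handle_event("UpdatedGravatar", {"id": "1", "displayName": "alice2"})
assert store["1"]["displayName"] == "alice2"
```

The key property is that the mapping is deterministic over the event stream, which is what lets any node on a decentralized network rebuild the same index from the same inputs.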
Basically, the way it was before, right, is people built their own SQL databases and their own way of querying that database.
And now you guys say, okay, there is basically a standard way of doing it.
You can use the graph node.
And then use these subgraphs, which is basically kind of,
like, okay, I want to have this data, and as long as I comply exactly with this format,
I can sort of add it.
And is then, in the end, the idea that we will have one graph node that contains all of the
different subgraphs?
Or would I have like maybe my local node with different subgraphs than you do?
Well, eventually the goal is to combine all of these graph nodes together into a global
decentralized network.
So any graph node itself can choose which subgraphs it wants to index.
And what we want to do is actually open this up to a global marketplace and use market forces
to do the resource allocation for which nodes are indexing what data.
So yeah, let's get to the public network, decentralized network in a little bit.
But first, right now you guys are running this as basically like a hosted service.
Why did you decide to, you know, start with that?
So, you know, we really believe in just shipping early and often and the best way to build software
is to get it in people's hands and to work closely with users in developing that software.
And so it was really just kind of a pragmatic choice for basically coming up with what are these
intermediate milestones that we can hit where we can, you know, ship something, make sure that
we're solving real problems for developers, and then improve it over time. So the first milestone
that we had was just open sourcing the graph node, and we learned a ton after that milestone
because projects were able to build on us, and so we were able to quickly improve the software
at that stage, but it was still kind of there was this barrier to entry where people would have to run their own nodes if they wanted to use it.
And so the second milestone for us was launching the hosted service where we could run a bunch of nodes for you,
and you could just deploy to our nodes, and we have a really nice user interface where you can see the status as the nodes are like indexing these subgraphs,
and you can easily run queries in the browser,
so you can, like, test as you're developing.
And that's something that we launched at Graph Day, January this year.
And the next milestone for us is the hybrid network.
And so it's really just kind of a practical intermediate step
for making it really easy for people to build on the graph today.
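As a concrete sense of what "deploy to our nodes and just run queries" looks like from the client side, here is a hedged sketch of building a GraphQL request for a hosted subgraph. The endpoint URL and field names are placeholders, not The Graph's actual hosted-service addresses:

```python
import json

# Hypothetical hosted-subgraph endpoint (placeholder, not a real URL).
ENDPOINT = "https://api.example.com/subgraphs/name/some-org/some-subgraph"

def build_payload(first=5, skip=0):
    """Build the JSON body a GraphQL HTTP client would POST."""
    query = """
    query Gravatars($first: Int!, $skip: Int!) {
      gravatars(first: $first, skip: $skip) { id displayName }
    }
    """
    return json.dumps({"query": query,
                       "variables": {"first": first, "skip": skip}})

payload = build_payload()
assert json.loads(payload)["variables"]["first"] == 5
# An HTTP client would then POST `payload` to ENDPOINT with
# Content-Type: application/json and read the `data` field of the response.
```

The same payload shape works against any GraphQL server, which is part of why standardizing on GraphQL (rather than per-project REST endpoints) makes the indexing layer reusable.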
Right.
So basically the choice that developers have today is:
they can either, you know, build their own SQL database and host it on their own server,
and build their own, like, querying and API for that.
Or they do these subgraphs, which is a little bit like developing their API sort of, right?
They define the data format and stuff.
And then you, you guys would host it and they can just query that and you basically offer it as a hosted service.
Exactly.
And really, you know, we've been very kind of transparent about our goals to build this
decentralized network from the very beginning.
So our model for the hosted service is we are going to be releasing a paid tier, and the idea is
to essentially just kind of cover the costs of running the infrastructure, because, you know,
as I mentioned, it's really important that, you know, payment is built in and that this
infrastructure is sustainable.
And so, you know, this is kind of one step along, you know, the path of, you know, launching the decentralized network.
But, you know, this way it's really easy to kind of get started and we can start proving out a lot of, you know, these pieces that need to be built on top.
And so what about the hybrid network? What's that going to look like?
So the hybrid network is our next milestone, where we are no longer running all of these indexing nodes.
So in the hybrid network, we're going to introduce our work token.
So at Graph Day, Brandon Ramirez gave a research talk where he described a bunch of new details
about our decentralized network design, and we published the specs for the hybrid network.
So you can check that out if you go to github.com slash graphprotocol slash research. The token has two uses in the network. One is a work token for people that
slash research. The token has two uses in the network. One is a work token for people that
want to run these indexing nodes. And the second is for data curation and staking on the subgraphs
themselves. So for the work token model, anyone can come in and stake tokens to run a graph node,
and then they can charge fees per query. And that's done in an open marketplace where they can set
their own prices per query. And so in the hybrid network, we're going to open it up so that other
people can run these nodes, but we're going to be running a centralized fisherman service, and
Graph Protocol, the company, is going to be very involved in kind of enforcing security in the network
in this kind of intermediate phase. So a couple of things here, and I guess regarding
scaling, how does this scale? Because it occurs to me that, so one, you know, the Ethereum
blockchain is already quite large, you know, several hundred gigabytes.
Now, if you have to build an indexing service that has all of this data, but then, you know, structured in a way that presumably makes it much larger, how do you deal with that?
But then also, the other question is, you know, you mentioned earlier that with GraphQL, you have the schemas.
And it's unclear to me how we're going to come up with, like, a generally agreed-upon schema that everybody agrees on, like, this is the schema
that Ethereum should have and that is optimized for all of the different dapps, right?
So, like, one, you know, unless I'm missing something here, but one dapp might require a different
data schema than another dapp. Yeah, great, great question. So first on the scaling part,
you're exactly right. From a sheer amount of data perspective, the graph is probably going to be
like one of the largest networks in terms of just how much data we need to, you know,
store and make available.
And that's really kind of what our layer in the stack is focused on.
And so, for example, that's why we kind of have sharding, which right now is kind of at the
subgraph level.
And then it's actually going to be down to the individual index level.
Each graph node only needs to index some very small subset of the data.
But yeah, to make data, you know, very quickly accessible to clients all over the world,
you would really need to scale that out to many, many nodes that are geographically, you know, distributed at the edge.
And that's a core part of what we need to do.
Now, as far as coming up with these, you know, schemas, that's where, you know, governance comes in.
So right now, anyone can define their own schemas for their particular application for their subgraphs.
And so you can think of subgraphs as being like kind of a unit of governance.
But over time, one of the problems that we would love to solve is to, you know, add this layer of governance for these subgraphs and for these, you know, data types to exactly solve the problem that you're describing, which is, you know, if I have, like, a very custom application, you know, I should be able to define my own schema and not have to talk to anybody.
But one of the big benefits of Web 3 is to enable interoperability.
And if you want to provide these APIs and you want to make data available
so that many different applications can be built on top,
then you do need to have some coordination.
So we've seen like standards bodies in the past that try to come up with, you know,
standardized data formats.
And these standards bodies tend to move really slowly.
And this is something that really excites me about Ethereum,
that we actually now have a platform that enables kind of large-scale coordination.
And my hope is that we can build great governance systems on top,
where, for example, people can vote on changes that they want to make
to these globally shared schemas,
and that this can create a scalable way of evolving standards
that enables for a lot more interoperability.
So you mentioned that you guys will need sharding, and that there could be,
initially, some sort of shard that you have, you know, per subgraph.
But what's the timeline here?
I mean, sharding seems to be not super close, right?
If you look at all of the sharding efforts, like, when do you think, like, yeah, what's the sort
of timeline for, like, the hybrid network, when that's going to launch, and, like, a sharded network?
Well, luckily, the problem is quite a bit easier for us than it is for these layer one blockchains.
So, you know, layer ones are responsible for solving things like data availability and, you know, a whole set of issues that we don't need to solve, right?
Because at our layer of the stack, we only accept these, like, you know, strong layer ones as inputs, as our data sources.
And so we get to assume that that data will always be available.
And all we're doing is processing that data and indexing it to make it more easily accessible.
And so at our layer of the stack, all we're concerned about is kind of quality of service.
You know, if for some reason, you know, an underlying shard kind of goes offline, like, you know, we can always rebuild the index.
And so that makes sharding a lot easier for us than for some of these layer ones.
Yeah, let me see if I understand that, because, you know, I guess if you look at Ethereum
and the sharding there, well, what's hard is you have these different shards and then,
like, they have to basically be able to communicate with each other.
They have these smart contract calls across shards and, you know, it gets very complicated.
But here it would be like, okay, I want a node on the graph and I can just go and say,
okay, I'm going to go in the Augur subgraph and I just take that, you know, that kind of
schema and I just have that data and I don't have to care about any other subgraph, right?
They just have to be able to serve that information.
Is that roughly correct?
Yeah, so as long as you have access to the underlying data sources that you need to process
your subgraph, then you're good to go and you can build up your index and you can start
serving clients.
So another question that came up for me is that, you know, even at Chorus One, you know,
we've got a Cosmos validator, we've been doing some of this stuff, right, like getting the data
out of the blockchain, putting it in a SQL database, reading it out.
And, you know, let's say different projects will have their own, you know, maybe analytics
or insightful things, like proprietary stuff that they're doing and that they may not want
to expose exactly what they're querying.
Because, you know, I could imagine if you can, like, see all of the queries that Coinbase is doing, like, maybe you can figure out what they're going to develop next.
So how does privacy work in the graph network?
Yeah.
So I think there's a lot of use cases like you're describing and people are doing like prop trading, things like that, where they don't want to make that data available to others.
And for that, you wouldn't need to use the graph.
It might be that you actually still want to build on the graph because it just makes it easier for you to use those tools internally, but then you'll just use it offline.
You can run your own graph node.
It's all open source, and so that option's available to you.
You know, the network that we're building is all around public data.
And, you know, we think that there's going to be more and more, you know, many orders of magnitude more open data that's going to become available as part of this move
to blockchains. And so there, if you have data that you want to make available,
you know, for other developers and for different applications, um, that's when, when you
would use the graph.
Well, yeah, actually, I think that's a good point, right? Because you could just, like, run your own
graph node with the necessary subgraphs and then query that, and then that's not
visible to the network, right? Right. Yeah. So you have that option available, just like you can
run your own private Ethereum, you know, network yourself if you want to do that.
So moving now to economics, we talked about the hybrid network a little bit. So talk about some of
the, you know, game theory and economics that go into the hybrid network, which is sort of
the next phase in the life of the product. Yeah. So, you know, the first step is
making sure that you've got indexing nodes that are incentivized to index data sets.
So, you know, the main incentive for the nodes is that they can set their own prices per query.
And so, you know, they can set those prices based on essentially demand.
So if there's a subgraph that isn't being used a lot and they need to cover their cost of doing the indexing,
then they could set the price higher.
If it's a really profitable subgraph,
and people are querying it all the time,
then they can set the price lower.
And that's essentially the incentive.
Then you want to have an incentive for people to add data onto the graph.
And so we've got this kind of curation,
which is basically you can stake on a subgraph,
and that entitles you to future revenue
that's proportional to the number of queries on that subgraph.
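The curation mechanic just described, stake on a subgraph and earn revenue proportional to the queries it serves, can be written out as a tiny pro-rata model. This is a simplified sketch of the incentive as stated here, not the actual mechanism from the hybrid-network spec:

```python
# Simplified curation payout: curators split a subgraph's query revenue
# pro rata to their stake. Names and numbers are purely illustrative.

def curator_payouts(stakes, query_revenue):
    """stakes: curator -> tokens staked on one subgraph."""
    total = sum(stakes.values())
    return {who: query_revenue * amt / total for who, amt in stakes.items()}

payouts = curator_payouts({"alice": 300, "bob": 100}, query_revenue=40.0)
assert payouts == {"alice": 30.0, "bob": 10.0}
```

Even this toy version makes the gaming problem Yaniv raises next easy to see: if the revenue signal is just query volume, a curator who also controls a node can pay for queries against their own subgraph to inflate its apparent value.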
Now, there's actually a lot of things that could make doing that quite difficult,
and this is one element of the game theory that gets quite interesting.
So you would want to make it so that if I, you know, create a valuable data set, then I can make more money.
And that creates an incentive for people to add more and more valuable data onto the graph.
But that can actually be quite easy to game.
Right.
So, you know, for example, you know, maybe I could find a subgraph that's doing really well and I could copy it and then try to get people to use my subgraph instead of yours.
I could choose to run my own graph node and then query it myself, where I'm just paying myself,
so I'm just moving money from my left hand to my right hand to make it look like there's a lot of demand for this data set when really I'm just paying myself.
So in order to actually make it so that you can create, like, a long-term incentive
for adding valuable data to the graph,
we have to solve these kinds of problems.
Yeah, no, that reminds me a lot of, like, Ocean Protocol,
which, you know, I've spent some time looking at,
because they have a lot of the same problems.
And I think it's pretty tricky
to get the game theory and incentives right there.
Yeah, now, you know, one thing that we're doing
very differently is that all of this
is for public data.
And so we're not trying to solve the problem of,
you know, someone's keeping this data private and you have to kind of like set up an agreement
in order to access that data because we think that ultimately most information is going to be
public. Information wants to be free and, you know, for the next like evolution of the web, we should
just assume that that's going to be the case. But just because information is free and available,
it doesn't mean that you're not willing to pay for fast, efficient, secure access.
And so what you're paying for isn't really for the information,
but it's for the work that somebody is doing to organize it
and make it easily accessible to you.
So, you know, one way of thinking about it is like, you know, maybe Uber.
So, you know, if I want to, you know, travel somewhere in San Francisco,
I can walk, right?
That option is available to me.
Nothing's stopping me from, you know,
tying up my shoes and walking five miles.
But if I want a car to take me to my destination a lot faster,
then I can pay a little bit of money and, you know, get in a car,
and I can get there, you know, significantly faster.
And, you know, how fast I can get to my destination
is basically a function of the number of cars that are
moving around the city. So if Uber only had 10 cars, then, you know, it would take me longer to get a car.
And, you know, if Uber has, you know, thousands or tens of thousands of cars in a city,
then the average time that I have to wait to get a car is going to be significantly shorter.
And so this is kind of, you know, a way to think about, like, you know, liquidity in, like,
you know, an indexing marketplace, where, you know, essentially you have this two-sided market
where you have, like, the data producers and the data consumers.
And you want to have like a thriving marketplace where there's incentives for both sides to provide a service and to consume the service.
So that's an interesting point then to look at this as a marketplace.
And it ties into my next question, which is, you know, what are the costs that we're looking at here?
So already, as a dapp user, there are costs associated with using the Ethereum network
and making transactions, and, you know, people are working on ways to, you know, improve things like
reducing the cost of maintaining state, for instance. But there's a bunch of costs there. And then for
DAP developers, there's costs as well, whether those be development costs or cost to store data.
So there's already a bunch of costs here. And now with this marketplace that
is like a layer on top, you're introducing another set of costs, which is to access data,
like indexes and things like this. What are your thoughts on, you know, the increasing costs
of using a blockchain and adding more layers of costs there? Yeah. So, you know, blockchains today
are expensive, but I would expect to see the cost go down significantly. If you have computation
that you need to have replicated on tens of thousands
of machines, that's going to be expensive.
And if you have data that you need to make available forever,
that's going to be expensive.
But I think we're already seeing a lot of new designs
for blockchains that should be able to make it significantly cheaper
and maybe different applications need different levels of security.
And so I think we'll see those costs go down.
For the data layer, we are introducing a cost, but it's significantly cheaper.
So I wouldn't be surprised to see the costs end up somewhere around, you know, 0.001 cents per query.
Like if you actually think about the cost of computation, it's actually one of the cheapest
things that a company spends money on.
So, like, in a traditional startup, you know, maybe you're spending,
you know, $1,000 or $2,000 on server costs, at least in the early days, whereas maybe you're
spending, you know, over $100,000 a year per engineer. So, you know, I think actually the kind of
social scalability costs, I think, are much more important to consider than the raw cost
of computation. And I think that when you actually build the payments into the protocols, I think
in the end, we're going to see that, you know, it ends up being cheap enough when all is said and done.
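As a back-of-envelope check on the per-query figure mentioned above (0.001 cents per query; the workload number here is a purely hypothetical dapp):

```python
# 0.001 cents per query, expressed in dollars.
price_per_query_usd = 0.001 / 100

# Hypothetical dapp serving a million queries a month.
queries_per_month = 1_000_000

monthly_cost = price_per_query_usd * queries_per_month
# About $10/month for a million queries, consistent with the claim that
# query costs are small next to engineering or connectivity costs.
assert abs(monthly_cost - 10.0) < 1e-9
```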
So today, you know, most people are probably used to spending $30 a month maybe on like, actually it's probably more, like $70 a month for like, you know, your mobile broadband.
You're used to spending on, you know, your different services, like your Netflix and whatever else.
And I think that with Web3, you know, people are going to have to, you know, it's kind of a new category.
It's a new expense for sure that people need to get used to like, hey, I am now paying to access like Web3.
But, you know, I think, you know, we can kind of look at how much that's going to cost in total and what users are going to get back in return.
and I think it's going to be more than worth it to basically have, you know, an explosion in the number of applications that you can use that all interoperate.
And I think that the kinds of apps that will be able to come out of that are going to be more than worth the costs.
So right now you guys are focusing on Ethereum. Is the idea that this is going to be, you know, The Graph is going to be this network that's going to index,
you know, all of the data on all of the blockchains and make it queryable?
Or are you, so what's the plans there?
Yes, that's the vision.
So, yeah, we're going to index all the data.
And, you know, I think that is going to become increasingly important as we see more and more blockchains.
But already you're seeing just with Ethereum, a lot of people are choosing to go to Infura, for example,
because it's just an easier way to access data.
And imagine what's going to happen when there's,
you know, 10 or 20 different blockchains and each one has many different shards and you have to figure out which full node to go talk to to get what data, or you're, you know, you yourself are going to start running 50 full nodes.
It's not really going to scale. And so I think it's, you know, that makes it even more important for there to be like a single network that's indexing all of the data across all of these different data sources and making it easily available.
You know, one thing that, you know, I would want to kind of mention here is that we're essentially acting as a sort of aggregator.
Web3 is, you know, all about decentralization and, you know, we've got, you know, these different networks and, and you want everything to kind of be decentralized.
But, you know, from a user perspective, there's a reason why aggregators exist.
You know, if, you know, the web itself was decentralized, but like, if I couldn't go to
Google.com to search for something and find a website, you know, those web pages wouldn't be
very useful to me. Similarly, there's lots of different, you know, items that are for sale
and different merchants on Amazon, but if I couldn't browse and search for items for sale,
those merchants wouldn't be much use to me.
And so you always have this kind of aggregation
that becomes important to make things usable.
And typically, you know, those aggregators
become your kind of centralized points of failure.
And that's exactly why decentralization is so important to us.
Because, you know, we want to make Web3 usable
but we think it's really important that if you have this global API that's indexing all of the world's structured information and making it available and that every application can use this global API to access its data, that it's really important that that API is not owned and controlled by a single company or, you know, a single entity.
And so, you know, that's why we're so passionate about making this a decentralized network.
So some projects are already using the graph.
So you recently, I think in February,
had this event in San Francisco and several projects
participated and spoke there.
And there you unveiled the hosted version of the graph
and the Graph Explorer, which from a product perspective,
like I thought looked really nice.
I played around with the Explorer.
And it looks like you guys spent a lot of time
on design and making sure that it's quite usable and like developer docs and things like this.
So can you tell us a little bit about the projects that are using the graph now and maybe
highlighting one that comes to mind and how they're using the product?
Yeah, so we do have a bunch of projects using the graph already.
One that I'm really excited about is Moloch.
So if you go to molochdao.com, that's actually using The Graph to power
that user interface.
So I'm sure many of your listeners know,
Moloch is a DAO for funding Ethereum infrastructure.
And it's got members, and those members vote on different proposals
for how they'd like to spend their funds.
And all of that is being indexed on The Graph
and powering their user interface.
So you can go to the graph
Explorer to find all of this data that's being indexed and is available for people to build
applications at thegraph.com slash Explorer.
You can see a bunch of featured subgraphs there right now and then a bunch of community
subgraphs.
So it's really easy for anyone who wants to build a subgraph.
You can deploy it to the hosted service and it'll show up in the Explorer.
And we just want to grow the amount of data that's being indexed and is available
to people that want to build applications.
Cool.
So let's talk about, before we wrap up, timeline and what's the roadmap?
So how long should it be before like the hybrid network is released?
And what are some important points on the roadmap coming up?
Yeah, we've got a bunch of features that we're going to be adding this year.
So one of the big ones is expanding to multiple blockchains.
We've got things like pending transactions and something called the Confirmations API
that we're really excited about.
So right now, when you query the graph, it just gives you back the latest state that it knows
from whatever Ethereum node it's connected to.
But we think that one element that's been very overlooked in building dapps
is communicating to users kind of the finality of various actions
that are taking place in the interface.
So a typical UI that you'd see today is you perform some action
and then it sends you a link to Etherscan,
where, you know, you pop over to this other, you know,
website to kind of see if this transaction has been accepted in a block.
And, you know, we've all kind of put up with this,
but I think it's a really great example of like a UX,
you know, paradigm that is not going to scale to mainstream users.
And so being able to actually show in the interface, like, you know, when a transaction has been accepted, when this action has been performed; you know, if I'm looking at a crypto collectible and it says that I own it, have I owned it for a while,
or has this just changed as of, you know, two blocks ago?
So that's the Confirmations API coming later this year.
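The kind of finality signal a Confirmations API could surface can be sketched with a few lines. This is our guess at the semantics, not the actual API: a transaction's confirmation count is just its distance from the chain head, and a UI maps that to a label:

```python
def confirmations(tx_block, head_block):
    """Blocks built on top of (and including) the transaction's block."""
    return max(0, head_block - tx_block + 1)

def finality_label(tx_block, head_block, threshold=12):
    """Map a confirmation count to a user-facing label.
    The 12-block threshold is a common heuristic, not a protocol rule."""
    n = confirmations(tx_block, head_block)
    if n == 0:
        return "pending"
    return "final" if n >= threshold else f"{n} confirmations"

assert finality_label(100, 99) == "pending"
assert finality_label(100, 105) == "6 confirmations"
assert finality_label(100, 120) == "final"
```

Exposing this directly in the query layer is what would let a dapp show "owned as of 2 blocks ago" instead of linking users out to a block explorer.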
We've made a lot of progress on the hybrid network.
So like I mentioned, the specs are already available.
We'd love for people to jump in and provide feedback.
We've actually built the first version of the smart contracts to run that.
So we're doing a lot of, like, you know, testing and development internally.
I would expect it in the early part of next year, but I generally try not to comment on dates too much.
But that's kind of the roadmap.
And really what we're focused on this year is really just getting adoption,
because we think the most important thing is to help dapps build usable experiences as soon as possible.
You know, there's something nice about this progression of actually starting centralized and
decentralizing over time: it allows us to, you know, iterate a lot faster. And, you know,
if we get to a point where there's, you know, a hundred or more applications that are built on top
and the graph works really well for them, then we know that by the time we, you know, launch the
decentralized network and open it up to many more nodes to be doing the indexing, we have
something that works really, really well.
Yeah, Yaniv, it was very interesting to be speaking with you about The Graph today, and great to have
you on to discuss this topic. I think, you know, as I was alluding to earlier, too much centralization
in Ethereum obviously isn't desirable, and that's really the situation that we're in
at the moment, at least for most dapps. So the vision of The Graph is one that I think will resonate
with a lot of people and kind of goes in the direction that I think the original ideas behind
Ethereum meant to lead us towards.
One, I guess, I wouldn't want to call it a criticism, but one thing that we should be,
I think, careful about is creating too much dependency on one service.
So, for example, for any additional layer that sits on top of Ethereum, I don't
think it would be much better if every dapp in the ecosystem would now depend on that one layer.
And should that layer cease to function, or the economics be flawed, or for some reason it no longer
really works, then you have a situation where it's not a whole lot better than a centralized
platform.
But hopefully the standard will allow for a multitude of marketplaces and similar types of products
to emerge, where people can choose, you know, which version of the graph they're pointing their
dapps to.
Yeah, I think this idea of having it be open, right, which means open source so you can
run your own nodes and you can verify everything yourself and, you know, you can experiment
and that nobody's kind of locked into any one particular solution is really important.
And so we agree with that vision, you know, completely.
Great.
Thanks for coming on.
And we look forward to seeing future developments of the graph.
And so we will be linking to everything we've talked about, the blog posts, the website, the Explorer, the GitHub, and everything, in the show notes.
Is there anywhere else that you would like to point people if they want to get involved, other than those resources
already mentioned? Yeah, we're most active just on Twitter and Medium.
Great. Thanks again for coming on. Thank you so much.
Thank you for joining us on this week's episode. We release new episodes every week.
You can find and subscribe to the show on iTunes, Spotify, YouTube, SoundCloud, or wherever you listen to podcasts.
And if you have a Google Home or Alexa device, you can tell it to listen to the latest episode of the Epicenter podcast.
Go to epicenter.tv slash subscribe for a full list of places where you can
watch and listen. And while you're there, be sure to sign up for the newsletter, so you get new
episodes in your inbox as they're released. If you want to interact with us, the guest or other
podcast listeners, you can follow us on Twitter. And please leave us a review on iTunes. It helps
people find the show, and we're always happy to read them. So thanks so much, and we look forward
to being back next week.
