Orchestrate all the Things - Knowledge graph evolution: Platforms that speak your language. ZDNet Article
Episode Date: May 6, 2020
Knowledge graphs are among the most important technologies for the 2020s. Here is how they are evolving, with vendors and standards bodies listening, and platforms becoming fluent in many query languages. Article published on ZDNet, January 2020.
Transcript
Welcome to the Orchestrate All The Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
This is episode number one of the podcast called
Knowledge Graph Evolution, Platforms That Speak Your Language.
It is based on an article I wrote for ZDNet, published in January 2020.
To make the article available as a podcast,
a combination of AI-generated narrator voices has been used.
I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter and LinkedIn.
Knowledge Graph Evolution. Platforms that speak your language.
This may come as a shock if you've first encountered knowledge graphs in Gartner's Hype Cycles and Trends, or in the extensive coverage they are getting lately. But here it is: knowledge graph technology is about 20 years old. This,
however, does not mean it's stagnating. On the contrary, Gartner predicted that the application
of graph processing and graph databases will grow at 100% annually through 2022 to continuously
accelerate data preparation, and enable more complex and adaptive data science.
Graph database vendors seem to verify this across the board. 2019 was a very good year.
Having identified knowledge graphs as a key technology for the 2020s,
we take a look at how they are evolving.
The 20-Year-Old Hype
First, let's quickly recap those 20 years of history. What we call knowledge graphs today
was largely initiated by none other than Tim Berners-Lee in 2001. Berners-Lee, who is also
credited as the inventor of the web, published his Semantic Web Manifesto in
Scientific American in 2001. The core concepts for knowledge graphs were laid out there.
The Semantic Web Manifesto was in many ways ahead of its time. Looking back today, we can see some
parts of it going strong, while others have faded. Building on a foundation of standards for
interoperability, such as Unicode, URIs, and RDF, the core of the vision has always been semantics,
instilling meaning in web content. The semantic web got a bad name for being academic,
while some technical choices such as XML did not quite work out. The thing is,
however, that crawling and categorizing content on the web is a very hard problem to solve without
semantics and metadata. This is why Google adopted the technology in 2010, by acquiring Metaweb.
In 2012, the term knowledge graph was introduced. A very successful rebranding indeed, and that's
not all we have Google to thank for.
Google employs key people in the domain and is the driving force behind schema.org.
Schema.org is the core of Google's knowledge graph. It is, unsurprisingly, a schema. Knowledge graphs and schemas are foundationally bound. While not all knowledge graphs are as big as Google's,
every one of them is based on a schema. Knowledge graph neophytes do not always
realize this, but whether it's implicit or explicit, there's always a schema. Which brings
us to the point.
Knowledge graphs and graph databases.
Knowledge graphs can be stored in
any back-end, from files to relational databases or document stores. But since they are, well,
graphs, it does make sense to store them in a graph database.
This greatly facilitates storage and retrieval, as graph databases offer specialized structures,
APIs, and query languages tailored for graphs. In addition, many graph databases today offer a lot more than just a store for data. They come packaged with algorithms for graph analytics,
visualization capabilities, machine learning features, and development environments. They have essentially grown from databases to platforms. But there is further
nuance here. Graph databases come in two main flavors, depending on which graph model they
support: property graph and RDF. In general, RDF graph databases emphasize semantics and
interoperability, while property graph databases emphasize ease of use and performance.
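To make the two flavors a bit more tangible, here is a minimal sketch of the same facts expressed in each model. It uses plain Python data structures rather than any vendor's API, and every identifier in it (ex:alice, WORKS_FOR, the to_triples helper) is made up for illustration.

```python
# Illustrative only: the same facts in the two graph models.
# All identifiers (alice, acme, WORKS_FOR, the "ex:" prefix) are hypothetical.

# Property graph: nodes and edges are records that carry key-value properties.
property_graph = {
    "nodes": {
        "alice": {"label": "Person", "name": "Alice"},
        "acme": {"label": "Company", "name": "Acme"},
    },
    "edges": [
        {"from": "alice", "to": "acme", "type": "WORKS_FOR"},
    ],
}


def to_triples(graph, prefix="ex:"):
    """Flatten a property graph (without edge properties) into RDF-style triples.

    Each node property becomes one (subject, predicate, object) triple,
    and each edge becomes one triple linking two node identifiers.
    """
    triples = set()
    for node_id, props in graph["nodes"].items():
        for key, value in props.items():
            triples.add((prefix + node_id, key, value))
    for edge in graph["edges"]:
        triples.add((prefix + edge["from"], edge["type"], prefix + edge["to"]))
    return triples


triples = to_triples(property_graph)
```

In this sketch the property graph keeps related data together in one record, which is part of its ease of use, while the triple form reduces everything to uniform statements that are easier to exchange and merge. Note that the helper deliberately ignores properties on edges, which is exactly where the two models diverge.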
When it comes to knowledge graphs, RDF graph databases are a natural match.
It's not impossible to build knowledge graphs on top of property graph databases.
Usually, however, this results in having to learn knowledge management fundamentals the hard way,
and re-implement relevant features. While lessons don't come for free,
building on platforms centered around knowledge management helps. Property graphs and RDF graphs are not that different
conceptually. Having interoperability between them would be both possible and desirable.
This is why in March 2019 a W3C workshop on web standardization for graph data took place,
as the first step towards standardization in the graph database world. A key element to
bridge the gap is something called RDF* (RDF star). RDF* is a proposal to standardize a modeling
construct for RDF graphs, namely the addition of properties to edges. Although this is possible in
RDF, there is no standard way of doing it. Standardizing it would not only help interoperability
with property graphs but also interoperability among RDF graphs.
From secret handshakes to RDF*.
As Steve Sarsfield, VP of Product at Cambridge
Semantics, put it, before RDF*, if people wanted to use edge properties in RDF graphs, they had to
rely on secret handshakes. This is not ideal, especially considering one of the key advantages
of the RDF stack is standardization
and interoperability. In the wake of the W3C initiative, a couple of RDF graph database
vendors went ahead and implemented RDF*. Cambridge Semantics is one of them. Its AnzoGraph database
supports RDF*, as well as SPARQL*. SPARQL is the standard query language for RDF, and SPARQL* is
its extension that works with RDF*. Cambridge
Semantics recently unveiled AnzoGraph DB version 2, and when discussing the release with Sarsfield,
we wondered what their experience from the field has been. Are people asking for RDF*,
and has it helped adoption? Bridging the gap with property graphs has enabled AnzoGraph to get
an implementation of Cypher, the most popular language for querying property graphs, underway.
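The edge-property construct at the heart of this discussion can be sketched in a few lines. The example below contrasts one common workaround in plain RDF, reification through a helper statement node using the rdf:Statement vocabulary, with the RDF*-style approach where the edge itself becomes the subject of further triples. Triples are modeled as Python tuples, the ex: names and the statement identifier are hypothetical, and whether the annotated edge is also asserted on its own is a design choice this sketch simply assumes.

```python
# Illustrative only: "ex:" identifiers and the statement id are hypothetical.
edge = ("ex:alice", "ex:worksFor", "ex:acme")
edge_properties = {"ex:since": "2018"}


def reify(triple, properties, statement_id="ex:stmt1"):
    """Plain RDF workaround: describe the edge through a helper node."""
    s, p, o = triple
    triples = [
        triple,
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    # The edge properties hang off the helper node, not off the edge itself.
    triples += [(statement_id, key, value) for key, value in properties.items()]
    return triples


def rdf_star(triple, properties):
    """RDF*-style: the edge (a triple) is itself the subject of new triples."""
    return [triple] + [(triple, key, value) for key, value in properties.items()]


reified = reify(edge, edge_properties)
starred = rdf_star(edge, edge_properties)
```

Here one edge with one property takes six triples when reified and two in the RDF* style. In the Turtle* notation proposed alongside RDF*, the second form would be written roughly as `<< ex:alice ex:worksFor ex:acme >> ex:since "2018"`.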
Sarsfield noted that it's still relatively early days for knowledge graph adoption.
As such, many of the organizations that use AnzoGraph tend to have highly skilled people on
board. For them, switching between data models and query languages is not much of an issue.
For mainstream adoption, however, this is important.
Stardog is another RDF graph database vendor that has implemented RDF*.
Mike Grove, Stardog co-founder and VP engineering, said this has been in the works for a while,
and they are very excited about it.
Stardog started working on the plumbing as part of the Stardog 7 development effort,
and they were very happy to be able to ship the feature.
Regarding its reception, Grove noted that what people wanted was a more user-friendly way to have edge properties. Neo4j obviously got this
right. RDF* does a fantastic job of bringing the same ease of use to semantic graphs. He went on
to add that customers are excited, and many are already working on integrating it into their
applications. Technically, RDF* and SPARQL* are not yet standardized. Both were introduced by
Olaf Hartig, a researcher at Linköping University. When we inquired about their status,
Hartig noted that while there have been delays, he hopes the standardization process will pick
up speed soon.
For knowledge graph platforms, too, GraphQL is a plus.
Both Sarsfield and Grove noted that they expect RDF* to boost knowledge graph adoption.
Implementation is key, and having early adopters and real-world usage may also catalyze the
standardization process. Sarsfield and Grove expressed their support for the process, as well
as the need to get the word out. RDF* can make a difference, but it's not the only thing going on
in the knowledge graph world. As knowledge graphs entail several layers and can be a central piece of infrastructure for organizations, graph databases are growing into platforms.
AnzoGraph started as part of the Anzo platform before becoming a product in its own right.
Stardog also touts its product as a platform, emphasizing features such as visualization and
virtualization built around the graph database core. Another RDF graph database vendor,
Ontotext, recently
announced a new version of its own platform. An interesting feature that Stardog's and Ontotext's
platforms share is support for GraphQL. Unfortunately, GraphQL's name does not do it justice.
As if there was not enough confusion already regarding graphs, GraphQL is not a graph query
language. GraphQL is a replacement for REST APIs.
Despite the misnomer, it's very useful, and its popularity among developers is growing.
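To illustrate why GraphQL is about APIs rather than graphs, here is a minimal sketch of its central idea: the client names the fields it wants, and the response mirrors that shape. This is not a GraphQL implementation and uses no GraphQL library. The select function, the sample record, and the dictionary-based query format are all assumptions made for this example.

```python
def select(data, selection):
    """Return only the fields named in `selection`, recursing into nested objects.

    `selection` maps a field name to None (a leaf field) or to a nested
    selection. Lists are handled by applying the selection to each item.
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is None else select(value, sub)
    return result


# Hypothetical record that a service might hold behind its API.
product = {
    "name": "ExampleDB",
    "version": "1.0",
    "vendor": {"name": "Example Inc.", "employees": 42},
    "languages": [
        {"name": "SPARQL", "standard": True},
        {"name": "Cypher", "standard": False},
    ],
}

# Roughly corresponds to the GraphQL query:
#   { name vendor { name } languages { name } }
query = {"name": None, "vendor": {"name": None}, "languages": {"name": None}}
result = select(product, query)
```

The result contains only the name, the vendor's name, and the language names. There is no traversal, pattern matching, or path finding involved, which is what a graph query language would add.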
This is why more and more databases are adding support for GraphQL,
with names such as MongoDB joining the GraphQL wave. Graph databases are no exception.
Stardog announced it in 2017, and Ontotext is in the process of adding it. As Stardog put it, more developers know and are learning GraphQL than all the graph query languages combined.
Ontotext on its part put together a rather elaborate post on the use of GraphQL in its platform.
Whichever way you approach it, however, GraphQL makes lots of sense for accessing services built around database platforms.
GraphQL plus variants.
Stardog reports GraphQL success within its customer base. Grove mentioned that one of the
big Silicon Valley tech companies exclusively uses GraphQL to interact with Stardog. Both Grove and
Jem Rayfield, Ontotext's chief architect, agree that GraphQL can work well in some cases, but by
its very design, the expressiveness of GraphQL is quite limited.
Most people who don't know GraphQL assume it's a graph database query language.
Most people who know GraphQL wonder how a graph database can be powered by it.
This statement comes from Manish Jain, the CEO and founder of Dgraph.
Dgraph is a graph database powered by GraphQL, or something like it. GraphQL+- is a derivative of GraphQL,
developed and used exclusively by Dgraph until today. In a 2019 interview with ZDNet,
Jain expressed no interest in standardization for GraphQL+-. No other vendor we know of
has expressed interest in adopting GraphQL+- either. But that's not all there is to
GraphQL for graph databases. Most approaches are about what GraphQL
can do for knowledge graphs. But to close the loop with the semantic web underpinning of knowledge
graphs, here's an idea. What if GraphQL resources were annotated with URIs? URIs are global
identifiers, which can denote concepts from shared vocabularies, such as schema.org or other ontologies.
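As a rough sketch of what such annotation could look like, the example below maps GraphQL type and field names to schema.org URIs and rewrites a response accordingly. The mapping table, the annotate function, and the "@type" key are assumptions made for illustration, not part of GraphQL or of any vendor's product. The schema.org terms themselves (Person, name, worksFor) are real.

```python
# Hypothetical mapping from GraphQL type and field names to schema.org URIs.
CONTEXT = {
    "Person": "https://schema.org/Person",
    "name": "https://schema.org/name",
    "worksFor": "https://schema.org/worksFor",
}


def annotate(type_name, response, context=CONTEXT):
    """Rewrite a flat GraphQL-style response so that its keys are global URIs.

    Fields without an entry in the mapping are kept under their local name.
    """
    annotated = {"@type": context[type_name]}
    for field, value in response.items():
        annotated[context.get(field, field)] = value
    return annotated


response = {"name": "Alice", "worksFor": "Acme", "nickname": "Al"}
annotated = annotate("Person", response)
```

With keys expressed as shared URIs, two services that both return a "name" field can be recognized as talking about the same concept, which is the interoperability argument made for RDF throughout this piece.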
This seems like a natural fit, and one that both Grove and Rayfield
agree has potential. There is another working group set up to align RDF and GraphQL, although
it does not look like it's moving very fast.
Knowledge Graphs in the 2020s. We speak your language.
It seems we are moving towards a new status quo. If NoSQL stands for not only SQL,
we could call this NoSPARQL, not only SPARQL.
SPARQL remains the language of choice for taking full advantage of knowledge graph capabilities.
It also doubles as an API, its expressiveness is beyond what GraphQL can attain, and SPARQL's
federated query and data integration capabilities are unique. But vendors seem set to meet users
where they are, be it GraphQL or any other language.
Even SQL. As Stardog's Grove put it, we've always strived to bring our technology to the users.
GraphQL was a step in that plan. Supporting SQL is the next step in that journey,
not because SQL is better than GraphQL, but because of what that support enables.
SQL enables existing tooling to work on top of graph databases,
making them accessible to a wider audience. Stardog is not the first graph database platform to have added a SQL connectivity
layer. Cambridge Semantics also offers a connectivity layer for Tableau. More graph
databases support SQL, and there is an ongoing standardization effort to add graph extensions
to SQL itself. Eventually, even natural language support could
be an option. "No matter how you feel about SQL, SPARQL, GraphQL, or any other query syntax
or language, natural language is just better. Why ask someone to learn an esoteric syntax when
they can just simply type?" said Grove. Grove mentioned Stardog will be launching a natural
language interface to the knowledge graph. A pipe dream?
This may not be too far off. There is ongoing research for natural language interfaces for databases. And, to add to this, there are also existing integrations for accessing databases
via voice assistants. So, you can see where this is going. We don't know whether conversational
knowledge graphs are something everyone would be comfortable with. What we do know is that having more options is a good thing, and exciting times are ahead.
Stay tuned as we keep exploring the years of the graph.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter and LinkedIn.