Orchestrate all the Things - Knowledge graph evolution: Platforms that speak your language. ZDNet Article

Starting point is 00:00:00 Welcome to the Orchestrate All The Things podcast. I'm George Anadiotis and we'll be connecting the dots together. This is episode number one of the podcast, called Knowledge Graph Evolution, Platforms That Speak Your Language. It is based on an article I wrote for ZDNet, published in January 2020. To make the article available as a podcast, a combination of AI-generated narrator voices has been used. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter and LinkedIn.

Starting point is 00:00:33 Knowledge Graph Evolution. Platforms that speak your language. This may come as a shock if you've first encountered knowledge graphs in Gartner's Hype Cycles and Trends, or in the extensive coverage they are getting lately. But here it is: knowledge graph technology is about 20 years old. This, however, does not mean it's stagnating. On the contrary, Gartner predicted that the application of graph processing and graph databases will grow at 100% annually through 2022 to continuously accelerate data preparation, and enable more complex and adaptive data science. Graph database vendors seem to verify this across the board. 2019 was a very good year. Having identified knowledge graphs as a key technology for the 2020s, we take a look at how they are evolving. The 20-Year-Old Hype.

Starting point is 00:01:18 First, let's quickly recap those 20 years of history. What we call knowledge graphs today was largely initiated by none other than Tim Berners-Lee in 2001. Berners-Lee, who is also credited as the inventor of the web, published his Semantic Web Manifesto in Scientific American in 2001. The core concepts for knowledge graphs were laid out there. The Semantic Web Manifesto was in many ways ahead of its time. Looking back today, we can see some parts of it going strong, while others have faded. Building on a foundation of standards for interoperability, such as Unicode, URIs, and RDF, the core of the vision has always been semantics, instilling meaning in web content. The semantic web got a bad name for being academic,

Starting point is 00:02:00 while some technical choices such as XML did not quite work out. The thing is, however, that crawling and categorizing content on the web is a very hard problem to solve without semantics and metadata. This is why Google adopted the technology in 2010, by acquiring Metaweb. In 2012, the term knowledge graph was introduced. A very successful rebranding indeed, and that's not all we have Google to thank for. Google employs key people in the domain and is the driving force behind schema.org. Schema.org is the core of Google's knowledge graph. It is, unsurprisingly, a schema. Knowledge graphs and schemas are foundationally bound. While not all knowledge graphs are as big as Google's, every one of them is based on a schema. Knowledge graph neophytes do not always

Starting point is 00:02:45 realize this, but whether it's implicit or explicit, there's always a schema. Which brings us to the point. Knowledge graphs and graph databases. Knowledge graphs can be stored in any back-end, from files to relational databases or document stores. But since they are, well, graphs, it does make sense to store them in a graph database. This greatly facilitates storage and retrieval, as graph databases offer specialized structures, APIs, and query languages tailored for graphs. In addition, many graph databases today offer a lot more than just a store for data. They come packaged with algorithms for graph analytics, visualization capabilities, machine learning features, and development environments. They have essentially grown from databases to platforms. But there is further nuance here. Graph databases come in two main flavors, depending on which graph model they

Starting point is 00:03:34 support, property graph and RDF. In general, RDF graph databases emphasize semantics and interoperability, while property graph databases emphasize ease of use and performance. When it comes to knowledge graphs, RDF graph databases are a natural match. It's not impossible to build knowledge graphs on top of property graph databases. Usually, however, this results in having to learn knowledge management fundamentals the hard way, and re-implement relevant features. While lessons don't come for free, building on platforms centered around knowledge management helps. Property graphs and RDF graphs are not that different conceptually. Having interoperability between them would be both possible and desirable.
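To illustrate how close the two models are conceptually, here is a minimal Python sketch that holds the same facts first as RDF-style triples and then as a property graph. Everything in it is an assumption made for illustration: the example.org namespace, the alice and acme resources, and the Node, Edge, and to_property_graph names are hypothetical and do not come from any vendor's API.

```python
from dataclasses import dataclass, field

# RDF view: a graph is a set of (subject, predicate, object) triples.
# The example.org identifiers are made up for this sketch.
EX = "http://example.org/"
rdf_graph = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "acme", EX + "name", "Acme Corp"),
}


@dataclass
class Node:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


def to_property_graph(triples):
    """Triples whose object is a URI become edges; literal objects become node properties."""
    nodes, edges = {}, []
    for s, p, o in sorted(triples):
        nodes.setdefault(s, Node(s))
        key = p.rsplit("/", 1)[-1]  # short local name, e.g. "worksFor"
        if o.startswith("http"):
            nodes.setdefault(o, Node(o))
            edges.append(Edge(s, key, o))
        else:
            nodes[s].properties[key] = o
    return nodes, edges


nodes, edges = to_property_graph(rdf_graph)
```

Note that each Edge carries its own properties dictionary, while a plain triple has no place to hang extra data on the relationship itself.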

Starting point is 00:04:14 This is why in March 2019 a W3C workshop on web standardization for graph data took place, as the first step towards standardization in the graph database world. A key element to bridge the gap is something called RDF*, pronounced RDF star. RDF* is a proposal to standardize a modeling construct for RDF graphs, namely the addition of properties to edges. Although this is possible in RDF, there is no standard way of doing it. Standardizing it would not only help interoperability with property graphs but also interoperability among RDF graphs. From secret handshakes to RDF stars. As Steve Sarsfield, VP of Product at Cambridge Semantics, put it, before RDF*, if people wanted to use edge properties in RDF graphs, they had to rely on secret handshakes. This is not ideal, especially considering one of the key advantages

Starting point is 00:05:03 of the RDF stack is standardization and interoperability. In the wake of the W3C initiative, a couple of RDF graph database vendors went ahead and implemented RDF*. Cambridge Semantics is one of them. Its AnzoGraph database supports RDF*, as well as SPARQL*. SPARQL is the standard query language for RDF, and SPARQL* is its extension that works with RDF*. Cambridge Semantics recently unveiled AnzoGraph DB version 2, and when discussing the release with Sarsfield, we wondered what their experience from the field has been. Are people asking for RDF*? Has it helped adoption? Bridging the gap with property graphs has enabled AnzoGraph to get

Starting point is 00:05:40 an implementation of Cypher, the most popular language for querying property graphs, underway. Sarsfield noted that it's still relatively early days for knowledge graph adoption. As such, many of the organizations that use AnzoGraph tend to have highly skilled people on board. For them, switching between data models and query languages is not much of an issue. For mainstream adoption, however, this is important. Stardog is another RDF graph database vendor that has implemented RDF*. Mike Grove, Stardog co-founder and VP engineering, said this has been in the works for a while, and they are very excited about it. Stardog started working on the plumbing as part of the Stardog 7 development effort,

Starting point is 00:06:19 and they were very happy to be able to ship the feature. Regarding its reception, Grove noted that what people wanted was a more user-friendly way to have edge properties. Neo4j obviously got this right. RDF* does a fantastic job of bringing the same ease of use to semantic graphs. He went on to add that customers are excited, and many are already working on integrating it into their applications. Technically, RDF* and SPARQL* are not yet standardized. Both have been introduced by Olaf Hartig, a researcher at Linköping University. When inquiring about their status, Hartig noted that while there have been delays, he hopes the standardization process will pick up speed soon. For knowledge graph platforms, too, GraphQL is a plus.
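To make the edge-property discussion above more concrete, here is a small Python sketch contrasting one long-standing workaround, reification with the rdf:Statement vocabulary, against the RDF* idea of using a triple itself as the subject of further triples. This is only a model built from Python tuples, not any vendor's implementation: the example.org namespace, the since property, and the reify and annotate_star helpers are hypothetical names chosen for illustration. In the proposed Turtle-like RDF* syntax, the second form would be written roughly as << :alice :worksFor :acme >> :since 2015 .

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# The edge we want to attach a property to.
fact = (EX + "alice", EX + "worksFor", EX + "acme")


def reify(triple, statement_id, annotations):
    """Classic reification: describe the triple through a helper statement node."""
    s, p, o = triple
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]
    triples += [(statement_id, k, v) for k, v in annotations.items()]
    return triples


def annotate_star(triple, annotations):
    """RDF*-style: the triple itself is the subject of each annotation."""
    return [(triple, k, v) for k, v in annotations.items()]


since = {EX + "since": 2015}
plain = reify(fact, EX + "stmt1", since)
star = annotate_star(fact, since)
```

In this toy model the reified form needs five triples to say what the RDF*-style form says in one, which is one way to read the ease-of-use argument made by the vendors.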

Starting point is 00:07:00 Both Sarsfield and Grove noted that they expect RDF* to boost knowledge graph adoption. Implementation is key, and having early adopters and real-world usage may also catalyze the standardization process. Sarsfield and Grove expressed their support for the process, as well as the need to get the word out. RDF* can make a difference, but it's not the only thing going on in the knowledge graph world. As knowledge graphs entail several layers and can be a central piece of infrastructure for organizations, graph databases are growing into platforms. AnzoGraph started as part of the Anzo platform before becoming a product in its own right. Stardog also touts its product as a platform, emphasizing features such as visualization and virtualization built around the graph database core. Another RDF graph database vendor,

Starting point is 00:07:44 Ontotext, recently announced a new version of its own platform. An interesting feature that Stardog's and Ontotext's platforms share is support for GraphQL. Unfortunately, GraphQL's name does not do it justice. As if there was not enough confusion already regarding graphs, GraphQL is not a graph query language. GraphQL is a replacement for REST APIs. Despite the misnomer, it's very useful, and its popularity among developers is growing. This is why more and more databases are adding support for GraphQL, with names such as MongoDB joining the GraphQL wave. Graph databases are no exception.
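As a rough illustration of GraphQL as an API layer rather than a graph query language, the Python sketch below builds the JSON body that a client would typically POST to a single GraphQL endpoint, naming exactly the fields it wants back. The company and employees fields, the CompanyWithStaff operation, and the build_graphql_request helper are hypothetical and are not taken from Stardog's or Ontotext's schemas.

```python
import json


def build_graphql_request(query, variables=None):
    """Serialize a GraphQL operation into the JSON body sent to one endpoint."""
    return json.dumps({"query": query, "variables": variables or {}})


# Hypothetical schema: the client asks for a company and the names of its staff
# in one round trip, instead of calling several REST resources.
QUERY = """
query CompanyWithStaff($id: ID!) {
  company(id: $id) {
    name
    employees { name }
  }
}
"""

body = build_graphql_request(QUERY, {"id": "acme"})
```

The query shape mirrors the shape of the response the client expects, which is a large part of GraphQL's appeal to application developers.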

Starting point is 00:08:25 Stardog announced GraphQL support in 2017, and Ontotext is in the process of adding it. As Stardog put it, more developers know and are learning GraphQL than all the graph query languages combined. Ontotext, for its part, put together a rather elaborate post on the use of GraphQL in its platform. Whichever way you approach it, however, GraphQL makes lots of sense for accessing services built around database platforms. GraphQL plus variants. Stardog reports GraphQL success within its customer base. Grove mentioned that one of the big Silicon Valley tech companies exclusively uses GraphQL to interact with Stardog. Both Grove and Jem Rayfield, Ontotext's chief architect, agree that GraphQL can work well in some cases, but by its very design, the expressiveness of GraphQL

Starting point is 00:09:05 is quite limited. Most people who don't know GraphQL assume it's a graph database query language. Most people who know GraphQL wonder how a graph database can be powered by it. This statement comes from Manish Jain, the CEO and founder of Dgraph. Dgraph is a graph database powered by GraphQL, or something like it. GraphQL+- is a derivative of GraphQL, developed and used exclusively by Dgraph to date. In a 2019 interview with ZDNet, Jain expressed no interest in standardization for GraphQL+-. No other vendor we know of has expressed interest in adopting GraphQL+- either. But that's not all there is to GraphQL for graph databases. Most approaches are about what GraphQL

Starting point is 00:09:45 can do for knowledge graphs. But to close the loop with the semantic web underpinning of knowledge graphs, here's an idea. What if GraphQL resources were annotated with URIs? URIs are global identifiers, which can denote concepts from shared vocabularies, such as schema.org or other ontologies. This seems like a natural fit, and one that both Grove and Rayfield agree has potential. There is another working group set up to align RDF and GraphQL, although it does not look like it's moving very fast. Knowledge Graphs in the 2020s. We speak your language. It seems we are moving towards a new status quo. If NoSQL stands for not only SQL, we could call this NoSPARQL, not only SPARQL.
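The idea of annotating GraphQL resources with URIs, raised above, can be sketched in a few lines of Python. This is a speculative illustration, not the approach of the working group or of any vendor: the FIELD_URIS table, the to_triples helper, and the example.org identifiers are hypothetical, while name and worksFor are existing schema.org properties.

```python
SCHEMA = "https://schema.org/"

# Hypothetical annotation table: GraphQL field name -> shared vocabulary URI.
FIELD_URIS = {
    "name": SCHEMA + "name",
    "worksFor": SCHEMA + "worksFor",
}


def to_triples(resource):
    """Flatten a GraphQL-style JSON object into triples using the URI annotations."""
    subject = resource["id"]
    triples = []
    for field_name, value in resource.items():
        if field_name == "id":
            continue
        predicate = FIELD_URIS[field_name]
        if isinstance(value, dict):
            # Nested object: link to it, then describe it in turn.
            triples.append((subject, predicate, value["id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, predicate, value))
    return triples


# A response as a GraphQL client might receive it, with made-up identifiers.
response = {
    "id": "http://example.org/alice",
    "name": "Alice",
    "worksFor": {"id": "http://example.org/acme", "name": "Acme Corp"},
}
triples = to_triples(response)
```

Once field names resolve to global identifiers, the nested JSON a developer already works with can be read as a small graph that lines up with data from other sources using the same vocabulary.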

Starting point is 00:10:26 SPARQL remains the language of choice for taking full advantage of knowledge graph capabilities. It also doubles as an API, its expressiveness is beyond what GraphQL can attain, and SPARQL's federated query and data integration capabilities are unique. But vendors seem set to meet users where they are, be it GraphQL or any other language. Even SQL. As Stardog's Grove put it, we've always strived to bring our technology to the users. GraphQL was a step in that plan. Supporting SQL is the next step in that journey, not because SQL is better than GraphQL, but because of what that support enables. SQL enables existing tooling to work on top of graph databases,

Starting point is 00:11:05 making them accessible to a wider audience. Stardog is not the first graph database platform to have added an SQL connectivity layer. Cambridge Semantics also offers a connectivity layer for Tableau. More graph databases support SQL, and there is an ongoing standardization effort to add graph extensions to SQL itself. Eventually, even natural language support could be an option. No matter how you feel about SQL, SPARQL, GraphQL, or any other query syntax or language, natural language is just better. Why ask someone to learn an esoteric syntax when they can just simply type? said Grove. Grove mentioned Stardog will be launching a natural language interface to the knowledge graph. A pipe dream?

Starting point is 00:11:50 This may not be too far off. There is ongoing research for natural language interfaces for databases. And, to add to this, there are also existing integrations for accessing databases via voice assistants. So, you can see where this is going. We don't know whether conversational knowledge graphs are something everyone would be comfortable with. What we do know is that having more options is a good thing, and exciting times are ahead. Stay tuned as we keep exploring the years of the graph. I hope you enjoyed the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter and LinkedIn.


Knowledge graphs are among the most important technologies for the 2020s. Here is how they are evolving, with vendors and standard bodies listening, and platforms becoming fluent in many query languages. Article published on ZDNet, January 2020.
