Software Huddle - Introduction to GraphRAG with Stephen Chin

Episode Date: September 4, 2024

Today we have Stephen Chin, VP of developer relations at Neo4j, on the show. Stephen is an author, speaker, and Java expert; we'll actually be crossing paths in person at the upcoming Infobip Shift conference in September. We got together to talk about GraphRAG. His CTO recently wrote an article titled The GraphRAG Manifesto, and Stephen joined us to explain how a knowledge graph can be used to improve performance over traditional RAG architectures. It also helps address some of the fundamental limitations to LLM adoption by enterprises today, like hallucinations and explainability. GraphRAG is relatively new, but looks like a very promising approach to improving performance for certain generative AI use cases, like customer support.

Transcript
Starting point is 00:00:00 What is a knowledge graph, and how does that fit into this world of RAG? Graph databases and knowledge graphs have been around for quite a while. Imagine if you had an LLM where part of its brain not only came from the large language model or the base model, which is loaded up with all of this vectorized information, which is a very statistical model, but it also has a knowledge graph, which is connected to the vector database in terms of linking between nodes and vectors. Now you can actually run search algorithms on which things are related, which parts of the graph are relevant for different user queries. And you can start to pull useful semantic knowledge rather than just relying on that statistical information. In terms of performance when compared to a traditional RAG architecture, how much better is this? So what I want to say is performance is hopefully similar, right? You want response time and query
Starting point is 00:01:00 results coming back just as quickly as you would from a standard LLM architecture with a vector database. What you're really looking for is higher accuracy, so the results you're getting back more often actually answer the question, and the ability to understand what the answer coming back is and how it was arrived at. Those are the main benefits you're going to get from a GraphRAG architecture over a standard vector database. What would you recommend for people to get up and running, and where do they start?
Starting point is 00:01:35 Hey everyone, Sean here. Today, I have Stephen Chin, VP of Developer Relations at Neo4j, on the show. Stephen's an author, speaker, and Java expert. We'll actually be crossing paths in person at the upcoming InfoBip Shift Conference in September. He and I got together to talk about GraphRAG. His CTO recently wrote an article titled
Starting point is 00:01:54 The GraphRAG Manifesto, and Stephen joined me to explain how a knowledge graph can be used to improve performance over traditional RAG architectures. It also helps address some of the fundamental limitations to LLM adoption for the enterprise today, things like hallucinations and explainability. GraphRAG was relatively new to me,
Starting point is 00:02:11 but it looks like a really promising approach to improving performance for certain generative AI use cases, like customer support. As always, if you have questions for the show, let Alex or me know. And with that, let's get you over to my interview with Steven. Steven, welcome to the Software Huddle. Very glad to be here. Yeah, I think this should be exciting. And also, we're both going to be speaking at an upcoming conference together. So I think this is a great
Starting point is 00:02:33 introduction. Yeah, this is a good way for us to, you know, meet virtually before we meet in person. That's always exciting. And I think it's your first time going to InfoBip Shift, the big software engineering conference in Croatia hosted by InfoBip. So you'll at least have a friendly face there that you can kind of grab onto. Yeah, it should be exciting. I've been at, I think, almost every developer conference in Europe, from Devoxx to JFokus to KubeCon and all the other big events.
Starting point is 00:03:06 But somehow I missed Croatia on my trips around the world. So I think this should be exciting. And it seems like it's a very engaged crew that really cares about getting good technologists, building a really good set of content for the audience. And I'm excited about it. Yeah, I might even say that at some point on this podcast, but I think that of the, I don't know,
Starting point is 00:03:29 probably thousands of conferences I've been to, at least as a speaker, I think Shift does one of the best jobs I've seen in terms of creating a really fun speaker experience. And I think part of that is they do have a really engaged crowd that's there. So you kind of feed off that energy. It's a terrible experience to be a speaker at a conference and everyone's just kind of checking their email
Starting point is 00:03:54 and no one seems to be participating in the thing that you spent all this time putting together to try to get people excited about. Yeah, exactly. So I'm excited. It should be a great event. Yeah, awesome. Well, so I wanted to kick things off by talking a little bit about enterprise LLM architecture.
Starting point is 00:04:12 So, you know, first of all, is any enterprise actually using LLMs in production today? I feel like there's a lot of AI companies, there's a lot of demos, but not necessarily a whole lot of deployment. Yeah, so that's the million-dollar question. Or maybe the billions-and-billions-of-dollars question. And I think where most companies are in their adoption
Starting point is 00:04:35 stage is they're building out proofs of concept or augmenting existing applications. They're getting good results. They're finding things which really work and provide value, but not a lot of companies are pushing to production, for various reasons we can talk about now. There are some folks who, at Neo4j,
Starting point is 00:05:01 we've seen be very successful with GraphRAG architectures and actually push them into production systems. Without getting into specific details, we've seen some large oil companies using it for analysis and for figuring out how they can optimize and resolve issues in production. We've had some customer support use cases with big companies, which have been using knowledge graphs and graph databases and GraphRAG architectures to solve problems. And I think one of the things which is challenging for large language models, and for LLMs in general, is the problem of hallucinations.
Starting point is 00:05:46 So for end-user systems, like if you're playing around with ChatGPT or if you're using Copilot to help you code, the fact that the LLM is not always right and sometimes makes up crazy stuff is, at best, entertaining, and at worst it's something which you can identify and fix, or filter out in the realm of "that's not really reasonable" or "that doesn't match expectations." But if you were building something for a commercial use case, like aircraft maintenance, and you have an LLM suggest "maybe this part would be good on the aircraft, and here's some instructions on how to install it," if it's the wrong part, the plane crashes and you're done. So the margin for error in industrial use
Starting point is 00:06:42 cases is much lower than in consumer use cases. And I think a lot of enterprises are struggling with this, because when you're doing a customer support system and you get, let's say, 70% or 80% correct or accepted answers by the folks who are using the system, you're literally telling 20% of your customers the wrong information. And is that something which is production ready? I mean, maybe if you have
Starting point is 00:07:10 a front layer of folks who are going to filter the bad responses out, yes, maybe it's a good productivity accelerator for the organization, but it's pretty far from the hype and promise that I think a lot of us expected to come out of Gen AI as a technology that's going to automate and solve all the world's problems. On the customer support side, people have been investing, not necessarily in large language models, but in ways of automating customer support for years, essentially. And those models, I would say, are significantly less capable, but they could loop a person in to answer those questions, or at least acknowledge, hey, I don't know the answer to this. Versus with a large language model, it's going to make something up. It's going to be able to answer a question regardless of whether it actually is able to answer that question accurately
Starting point is 00:08:17 or not. Yeah, yeah. So I think that's part of the problem: LLMs will always give an answer back, and quite authoritatively. In the absence of information, they'll come up with an answer which seems quite reasonable, quite plausible. And even somebody who's a trained expert might accidentally think, well, this sounds right, let me test it. And, oh, that doesn't actually work. So that's part of the problem. The other part of the problem is getting the knowledge and the data sets which can actually support the LLM. So if you look at the big
Starting point is 00:08:54 LLM models, they're trained on hundreds of millions of documents and sources, and so the amount of context they have eclipses what we as individuals are able to consume and take in. And they can do that because there's a lot of information and knowledge and good knowledge sources on the web and other places, which is not all correct,
Starting point is 00:09:22 but in aggregate, you can actually get good information out of it. For most organizations, even if you put a bunch of unstructured documents or relational tables or other data into an LLM, you don't have that kind of vast array of data and information sources on enterprise knowledge and enterprise systems. So if you think about your average knowledge base for customer support, it's trained off information from existing cases, from folks who ask questions, and it gets added to and augmented over time.
Starting point is 00:09:56 But there's a lot of cases where it just can't cover all of the cases and all the knowledge. So one question is, how accurate is your LLM? You want to get the highest accuracy possible. And then the next question is, how explainable is the answer coming out of it? If it gives an answer back and it's just a black box, you don't know why it got back the answer. If it can't reference documents or information, or the source by which it arrived at the answer, that's also not helpful,
Starting point is 00:10:31 and it makes it very hard to diagnose. And at some level, also, how auditable is it? To improve it, you need to find a way where you can tell when the LLM doesn't know the answer, and then go back and help to improve either the knowledge it's trained on or the information sources which it's working off of. And these are all quite big challenges for advancing LLMs. What a lot of our customers are doing is using knowledge graphs. And it might help to explain what a knowledge graph is as well, but basically they're using knowledge graphs as a data source for LLMs. And then, rather than just doing a vector search, where you're using a statistical model to get back results, when you do a search on a knowledge graph, you actually have an information set you can navigate and reason about, which helps you explain how you arrived at the answer, and whether it's the correct answer.
Starting point is 00:11:29 And sometimes there's no answer. If it's not found in the knowledge graph, the LLM can still give a response back to the user that might be helpful, but it can do it on the basis of not having the information, and say, well, maybe you should talk to a customer support person. Yeah, so I definitely want to dig into that in more detail. But just one comment before we move on: even outside of these challenges with hallucination and explainability, which are kind of usability problems.
Starting point is 00:12:00 It's an experience problem for me as a consumer of the LLM, where I'm going to have to figure out, is this a made-up answer or not? And where did this actually come from? On the technical side, there's also the problem that this is all an emerging tech stack. So even things like testing, observability, monitoring, these are challenges as well that companies are going to be facing in really moving these into production systems, because you're sort of cobbling together a variety of different tools that are all relatively new, probably, to your organization, to do everything from a training pipeline to versioning of your models to testing and iterating. Yeah, no, absolutely.
Starting point is 00:12:46 I think a lot of the process, the build pipelines and DevOps infrastructure for large language models and AI systems, still isn't quite there. And in addition to the things you said, performance and response time is a huge issue,
Starting point is 00:13:04 especially as folks are looking into agentic systems, where now they're pulling in from a whole bunch of different knowledge sources or they're trying multiple models. And to get a single answer, you can potentially have multiple layers and layers of systems doing querying, and you have additional response time issues and performance latency, which makes it very hard to set up automated test infrastructure and actually design things which can give you something close to what the end user experience is. Yeah. So you mentioned this knowledge base
Starting point is 00:13:44 approach, or GraphRAG, which is the way I saw it expressed by your CTO, who wrote this article, the GraphRAG Manifesto. So this is all about bringing knowledge graphs into the RAG architecture. Can you give a little bit of background on that? Maybe let's first answer the question of what a knowledge graph is, for anybody who doesn't know, and then how does that fit into this world of RAG? Yeah, so graph databases and knowledge graphs have been around for quite a while, and basically it's a more human way of looking at data. So instead of rows and columns, you have nodes, and a node in a graph can have multiple properties or things attached to it, so nodes can be quite rich, and then relationships between those nodes.
Starting point is 00:14:32 So if you're building out a family tree, you could say this person is the father of this other person. So you can actually build graphs that have multiple relationships between nodes and multiple layers. And the benefit of encoding things in a knowledge graph is, one, it gives you a more fluid data model. You can have a schema and make it structured, but you can also have an unstructured graph to represent things which are hard to represent with a very fixed schema. And also performance, especially when you're doing multi-level queries. Relational models kind of struggle when you have to go and query multiple levels on the same or different indexes to get a single answer, whereas this is the strength of graph databases. They love doing deep queries to get information, because they have a model which is graph-based rather than table-based.
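To make that father-of example concrete, here is a minimal sketch using the official Neo4j Python driver. The connection details and the tiny Person schema are illustrative assumptions, not something from the episode.

```python
from neo4j import GraphDatabase

# Hypothetical connection details for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Two nodes with properties, plus a relationship between them.
    session.run(
        """
        MERGE (a:Person {name: $father, born: 1960})
        MERGE (b:Person {name: $child, born: 1990})
        MERGE (a)-[:FATHER_OF]->(b)
        """,
        father="John", child="Alice",
    )

    # Traverse the relationship: whose father is John?
    result = session.run(
        "MATCH (:Person {name: $name})-[:FATHER_OF]->(c:Person) RETURN c.name AS child",
        name="John",
    )
    for record in result:
        print(record["child"])  # Alice

driver.close()
```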
Starting point is 00:15:30 And so with the typical sort of information queries you get off knowledge systems, it's quite fast. And the classic example where graphs are pretty much agreed upon to be the best answer from a technology standpoint is fraud systems, because you're trying to find unique relationships in data, or patterns of transactions. A really classic example of this is the Panama Papers. They used a graph database to figure out fraud and issues by sucking all the Panama Papers up into a graph database, finding the
Starting point is 00:16:12 relationship between different companies, individuals, entities, partners. And then you can start to do interesting things like find me different actors who are living at the same physical address, but they own businesses which have overlapping bank accounts. So maybe one person has an offshore bank account in Panama. The other person living at the same address has another individual account in the US. And there's some transactions
Starting point is 00:16:45 or things which are happening which are not directly addressable, but then, by inference, you can figure out that clearly there's some embezzlement or some money flow which shouldn't be happening. And the speed and the rate
Starting point is 00:17:02 at which you can query, identify patterns, and solve complex problems makes it an indispensable technology for being faster than the hackers, faster than the fraud artists, because now you can create expert systems and automated systems which use knowledge graphs for this traversal.
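A hedged sketch of the kind of multi-hop pattern he describes; the labels and relationship types here (LIVES_AT, OWNS, USES) are a hypothetical schema, not the actual Panama Papers model.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Two people at the same address whose companies share a bank account:
# no single link is suspicious alone, but the combined pattern is.
FRAUD_PATTERN = """
MATCH (p1:Person)-[:LIVES_AT]->(addr:Address)<-[:LIVES_AT]-(p2:Person),
      (p1)-[:OWNS]->(:Company)-[:USES]->(acct:Account)<-[:USES]-(:Company)<-[:OWNS]-(p2)
WHERE p1 <> p2
RETURN p1.name AS person_a, p2.name AS person_b,
       addr.street AS shared_address, acct.id AS shared_account
"""

with driver.session() as session:
    for record in session.run(FRAUD_PATTERN):
        print(record.data())

driver.close()
```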
Starting point is 00:17:19 And the same principle applies to a generative AI application. So imagine if you had an LLM where part of its brain, part of its knowledge, not only came from the large language model or the base model, which is loaded up with all of this vectorized information, which is a very statistical model, but it also has a knowledge graph, which is connected to the vector database in terms of linking between nodes and vectors. Now you can actually run search algorithms on which things are related, which parts of the graph are relevant for different user queries. And you can start to pull useful semantic
Starting point is 00:18:04 knowledge rather than just relying on that statistical information. Yeah, so I know my brother-in-law actually works in the space of investigative search, and this is a really popular thing they do there, because they're sucking in data from all different sources and they want to figure out these different relationships that would be difficult to tell in any other modeling fashion. In terms of comparing this to something like doing similarity search within a vector database: it is a statistical model, but part of the point of vectorizing the data is to be able to tell semantic relationships between objects in
Starting point is 00:18:46 space by their closeness under something like a cosine similarity metric. So where is it that a graph-based model is a better choice, essentially, for determining those relationships between the objects versus the more vector-based approach? Yeah, so it depends on the type of knowledge graph and the quality of the data that's underlying the knowledge graph. But if you have a knowledge graph which has a lot of that knowledge embedded inside of it, what you can start to do is pull relationships between nodes, and also extract even sub-portions of the knowledge graph and hand them to the LLM as context. So one example of this is a training workshop we run where we load up the SEC filings. And you can ask an LLM or a basic RAG architecture, for example, like one of
Starting point is 00:19:49 the data sets we loaded up was information about lithium-ion battery shortages. And you ask the LLM, you know, what's the effect of lithium-ion battery shortages on the industry? And it gives you a typical ChatGPT response, where it explains why lithium shortages are a problem, how it affects companies, and how these different companies in the industry might be impacted. Very authoritative, very professional, but it's not grounded in a lot of facts. Then you give the same LLM the filings as a knowledge graph rather than vectorized, so basically the same data set, but in knowledge graph format rather than vectors.
Starting point is 00:20:32 And basically what the algorithm's doing in the background is it runs the query, and it queries not only the vector database but also the knowledge graph, using a Cypher query. It pulls back part of the knowledge graph, with a bunch of companies, organizations, and relationships, and passes it in as context to the LLM. And then the LLM responds using not only the initial user query but also the additional context from the knowledge graph.
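A minimal sketch of that flow, hand-rolled rather than taken from the workshop code. The Topic/Company schema and the call_llm parameter are stand-ins for whatever graph model and chat client you actually use.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def retrieve_subgraph(topic: str) -> str:
    # Pull a slice of the graph around the topic; the schema is hypothetical.
    query = """
    MATCH (t:Topic {name: $topic})<-[r:AFFECTED_BY]-(c:Company)
    RETURN c.name AS company, type(r) AS rel, t.name AS topic
    LIMIT 25
    """
    with driver.session() as session:
        rows = session.run(query, topic=topic)
        return "\n".join(f"{r['company']} {r['rel']} {r['topic']}" for r in rows)

def answer(question: str, call_llm) -> str:
    # call_llm is a placeholder for your chat-completion client.
    facts = retrieve_subgraph("lithium-ion battery shortage")
    prompt = (
        "Use the graph facts below, alongside any vector-search context, "
        f"to answer the question.\n\nFacts:\n{facts}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```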
Starting point is 00:21:03 And in that answer, instead of just giving a broad definition of what a lithium-ion battery shortage is, it says, well, hey, these are the companies which are going to be impacted. Here's a case study on Black & Decker, how they're using these batteries, and how it impacts their supply chain.
Starting point is 00:21:19 Because now the LLM has all this additional knowledge and context loaded up. And the difference from just having an NLP system and an expert system over knowledge graphs is that, with an LLM in the mix, the LLM does all of the work to interpret the question, pull the relevant knowledge out of the knowledge graph context it was passed, and then give back a very professional, authoritative, perfect-English response that an end user is going to read and think, okay, this is helpful. This solves my problem. This gives me some real data I can go apply. And then
Starting point is 00:22:10 they get the results out of the system they're looking for. Yeah. So it kind of sounds like you're taking the best of both worlds of expert systems and large language models, where large language models are good at creating a readable response that a human can digest. Expert systems, maybe not so much, or it takes a lot of work to get there. But expert systems have this base knowledge that's actually grounded in reality
Starting point is 00:22:37 that we can leverage, essentially, in this GraphRAG architecture to inform the LLM, so it gives a response that's grounded in reality and also readable. Yeah. And if you think about it, I mean, this is literally what we do as humans when we join a company and you get a job in customer support, or as a pre-sales engineer, or some sort of technical
Starting point is 00:23:04 person interaction role. Your first job is months of training, learning either on the job or getting mentorship and workshops and training. And there's a lot of organizational and enterprise knowledge which you pick up and you learn as part of that process. And for LLMs to be effective in a lot of the roles and the types of tasks which we want them to accomplish, you essentially need to give them that same base knowledge, training,
Starting point is 00:23:36 understanding, in a way which the LLM can actually use, rather than just a black box. It becomes challenging for companies, and I think this is why, like we were chatting about, a lot of projects struggle getting to production. So if you have a system which maybe, by, you know,
Starting point is 00:23:56 testing and quality control measures and maybe some field testing, is 76% accurate, how do you get it to the level where it's actually going to be good enough that you can put it in production
Starting point is 00:24:11 and it could actually meet the end user goals? And a lot of that is making sure that it's grounded in a knowledge base where over time it's going to improve, it's going to get more answers, it's going to become a better assistant. And then you can look at the results coming out of it and you can say, OK, well, this answer was a bad answer. But why was it a bad answer?
Starting point is 00:24:36 Did it pull back nothing from the knowledge sources? Maybe there actually was no information, because it's a net new problem. Or maybe it didn't understand the question, so it got it wrong from an explainability standpoint. And a lot of the way of going that extra mile, getting from 76% to 80% or 85% or whatever your target is, is figuring out how you can tweak things: either improve the knowledge, or get better queries, or make those extra improvements so that you can get the LLM to meet the needs and requirements of stakeholders. And until
Starting point is 00:25:13 you can do that reliably, it's very hard to put a system into production where the ability to improve it to the level needed for it to actually be useful is unknown. And I think that's why a lot of projects get stalled in the prototype phase: it shows great promise, it solves the happy case great, but then what about all these corner cases and issues where the folks testing and doing quality control come back and say, oh, it failed all these other measures. Did you try this and this and this?
Starting point is 00:25:44 And then you're back to the drawing board. Yeah. I mean, it's a little bit like if you, in the customer support scenario, put any, I don't know, college-educated person on customer support, but they had zero knowledge of what the actual company did. They could probably manufacture answers to people and maybe even answer some of them competently. But a lot of those answers are going to be wrong, even if they are readable. The difference is that the person can actually acknowledge when they don't know the answer, versus the LLM...
Starting point is 00:26:14 They're essentially overconfident in their competence, where they're just going to say something really confidently, whether it's correct or not. So grounding it using the knowledge graph approach is a little bit like giving the new hire the training that they need in order to be able to answer questions that are grounded in reality as well. Exactly, exactly. In terms of what the RAG training pipeline looks like, is the only real difference here that, as you're presumably also building a vector database using vector embeddings, you're also building a knowledge base
Starting point is 00:26:49 or knowledge graph based on your input sources as well? Yeah, so the easiest way to do this is if you either have a knowledge graph or you build up a knowledge graph based on a combination of structured and unstructured data sources. And we have a new open source tool, a prototype, but it's quite good, called the Knowledge Graph Builder. And basically what it does is you can feed it a set of documents, a set of videos, a set of different sources,
Starting point is 00:27:18 and it will use an LLM to generate the knowledge graph. So the LLM reads the document sources and then uses that to construct a knowledge graph. It actually does quite a good job for a starting place. It gets some stuff wrong, it misses some of the relationships, and it's not a very well-tuned knowledge graph model, but it gives you a good starting place.
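A sketch of the same idea using LangChain's experimental graph transformer, rather than the Knowledge Graph Builder itself; exact class names and signatures vary by library version, so treat this as an assumption-laden outline.

```python
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs import Neo4jGraph

llm = ChatOpenAI(model="gpt-4o", temperature=0)
transformer = LLMGraphTransformer(llm=llm)

# Any unstructured source text; this document is a made-up example.
docs = [Document(page_content="Black & Decker sources lithium-ion cells from ...")]

# The LLM reads each document and proposes nodes and relationships.
graph_docs = transformer.convert_to_graph_documents(docs)

# Persist the extracted entities and relationships to Neo4j.
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
graph.add_graph_documents(graph_docs)
```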
Starting point is 00:27:40 So once you have a knowledge graph, you can do two things with it. One is you can just treat it like a normal RAG architecture and do vector searches against it. So you take the knowledge graph, you create your vector embeddings, you directly query it, and you pass in context to the LLM. That works just as well as using
Starting point is 00:28:02 the same data in a relational database structure. It's basically relying on the statistical probability of how good the vector model is at generating and creating relationships. It's hard to explain some of the answers, because it's kind of a black box, but it works well. And then, since you already have a knowledge graph, you can additionally do queries against the knowledge graph. There's a couple of different patterns for how you can accomplish this. The place most people start, and the most intuitive,
Starting point is 00:28:35 is you take the incoming query and then you convert that to a knowledge graph query, so a Cypher query. And what that does is it queries the knowledge graph. It will get back a response, and possibly a large portion of the graph, to pass in as context to the LLM. Sometimes it won't return a result, but then you still have the vector embeddings to fall back on. And then, over time, if you see that some user queries are very common and it can't generate the correct queries for them, you can also put in hand-tuned queries to optimize those use cases.
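A hedged sketch of that first pattern. Here, generate_cypher, graph.query, and vector_search stand in for an LLM call that writes Cypher, a graph client, and your embedding search, respectively.

```python
def graph_first_retrieval(question: str, graph, generate_cypher, vector_search):
    # 1. Have an LLM translate the user question into Cypher.
    cypher = generate_cypher(question)
    # 2. Run it against the knowledge graph.
    rows = graph.query(cypher)
    if rows:
        return rows                      # graph hit: use it as LLM context
    # 3. No result: fall back to the vector embeddings.
    return vector_search(question, k=5)
```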
Starting point is 00:29:10 The other one, which is not as obvious, but which more of our production customers have actually found to be really effective: when you query a vector database, it gives you back a bunch of candidates, and it really doesn't have much of a mechanism to prioritize those candidates and give back the best information to the LLM. Yeah, a lot of people end up applying a secondary model to basically filter the context window down small enough. So knowledge graphs are actually the best source for doing that second-level filtering, because now you can take the results, compare them to the knowledge graph, and start to use the knowledge graph to figure out which ones were closest, from a knowledge standpoint, to what the person was initially asking. And unlike generating a Cypher query, now you don't have that case where you have to fall back entirely on the vector database.
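And a sketch of this second pattern: rerank the vector candidates with the graph instead of generating Cypher. graph_distance is a placeholder that might, say, count relationship hops between a candidate's entities and the question's entities.

```python
def rerank_with_graph(question, vector_hits, graph_distance, top_k=5):
    # Fewer hops in the knowledge graph = more closely related content.
    ranked = sorted(vector_hits, key=lambda hit: graph_distance(question, hit))
    return ranked[:top_k]  # only the closest candidates reach the prompt
```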
Starting point is 00:30:07 Now you can just look at all the results coming out of the vector database, use the knowledge graph to augment that and prioritize them, and it gives you much better results than if you went directly off of the prioritization coming out of the vector database. What about from a performance standpoint? Inference is already a fairly
Starting point is 00:30:26 slow operation, especially in the chatbot scenario where I'm expecting relatively real-time responses. Now I'm adding another, essentially, database that I have to query as part of the inference process. Yeah, so I think
Starting point is 00:30:42 as we keep adding additional steps, there are things you can do to the normal workflow. So while you're doing the vector search, you can also do the knowledge graph search, which gives you the ability to streamline and parallelize things. And the second thing is that knowledge graph technology has been around for a while now, so if you're using a high-performance commercial
Starting point is 00:31:23 graph database, then really the response times you get are going to be really, really fast. So you can run your graph database query with similar response times to your vector search, feed it into the LLM, and then get a response back without slowing down your LLM processing. Now, that said, as you tune it more and you say, ah, well, let's add an extra prioritization step, let's use the knowledge graph to look at the answer which comes back and see if we can get additional insights, you can create complex architectures that add to the response time but give you benefits in higher accuracy or higher explainability. So what I would say is, it depends on the requirements of the system.
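One way to keep the graph lookup off the critical path, as he suggests, is to run both searches concurrently. A sketch with asyncio, where vector_search and graph_search are assumed to be async client calls.

```python
import asyncio

async def retrieve(question: str, vector_search, graph_search):
    # Fire both lookups at once; total latency is the slower of the two,
    # not the sum.
    vec_hits, graph_hits = await asyncio.gather(
        vector_search(question),
        graph_search(question),
    )
    return vec_hits, graph_hits
```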
Starting point is 00:32:07 If your goal is to get back the most highly accurate results, you're probably going to end up with a system with multiple stages and multiple knowledge systems that you're querying. And one thing you can do from a user interface standpoint is, as your agentic system is processing one request, you tell the user, okay, now I'm querying additional info from this other knowledge source. And now I'm verifying the answer. And you give them feedback as you're going through the phases.
Starting point is 00:32:36 So it's not just sitting and spinning for a minute or more. But I think that is a big usability issue, especially in systems which have a lot of disparate sources and need to do a lot of processing on the back end. How do you make it a good user experience when you have to do a lot of work to get the best answer possible? And I think we're all basically conditioned to expect the Google experience: I type a random string into a web search, and it comes back within milliseconds. I think it depends a lot on the use case, though. In the customer support scenario,
Starting point is 00:33:20 if I know I'm talking to essentially a bot, then my expectation is that I'm getting relatively close to a real-time response. But, you know, at my company we built an internal content copilot that we use for helping us create first drafts of documentation and blog posts and stuff like that. And there, accuracy is more important than real-time response. If it takes even two minutes to respond, but the result saves you an hour or more
Starting point is 00:33:52 and it's much more accurate, that's much, much higher value in that particular use case than having something that's real-time. Yeah, and I think even in the case of customer support systems, if you look at the experience today, it takes a very long time to get to a satisfactory response, even when you're talking to a real person. Well, it's even worse when you're talking to a real person, because usually they're handling probably 15 chats at the same time. Yeah.
Starting point is 00:34:22 I was trying to get a billing issue fixed on my mobile phone. I won't mention the company, but I literally knew what the problem was, exactly what the issue was. I pointed it out to them in the first minute of the call.
Starting point is 00:34:36 Two hours. And I think that's the experience a lot of folks have today when you're trying to resolve an issue: getting from the explanation of the problem to the result, even when the answer is known in the middle, when there's a knowledge source, is an ordeal through human-assisted customer service. So I'm very hopeful that once we get either fully automated customer service, or reasonable assistance where the customer support people can ask an expert system, with an LLM backing and a knowledge graph, hard questions and actually get back good answers, that this will be a better experience for all of us, where things which normally would have taken a very long time and been very tedious can be solved easily. And repeatedly as well,
Starting point is 00:35:35 because I think that's the biggest issue is if you're a company and you have a lot of different support cases and problems coming in, maybe it takes a lot of time to diagnose and figure it out the first time, but every additional time after that, it should just be an immediate fix. Yeah, absolutely. In terms of performance when compared to traditional RAG architecture, how much better is this?
Starting point is 00:36:03 So what I want to say is performance is hopefully similar, right? You want response time and query results coming back just as quickly as you would from a standard LLM architecture with a vector database. What you're really looking for is higher accuracy, so the results you're getting back more often actually answer the question, and the ability to understand what the answer coming back is and how it was arrived at. Those are the main benefits you're going to get from a GraphRAG architecture over a standard vector database. Yeah, well, do you have other performance
Starting point is 00:36:43 numbers, essentially, for comparing accuracy or reduction in hallucinations or anything like that? Oh, yeah. So there are some good studies on this, comparing similar results coming back from a RAG architecture versus a GraphRAG architecture. I think Microsoft Research did this; they published a research paper on it and showed comparative results coming back from their GraphRAG architecture, and it was quite good. There have been other industry research studies as well. I don't have the numbers at my fingertips, but they'll appear while we're talking. Are there challenges with context window lengths, where, if you're combining both knowledge graph information plus traditional
Starting point is 00:37:32 vector database search into the context window, you run into issues with having to condense that information to fit inside, depending on the model's context length? LLMs are limited in the context window, so you always have to take that into account in the design and how you're approaching building the system. Within the limited context window, I think one of the advantages of GraphRAG
Starting point is 00:37:56 is that if you give the more relevant information first, it improves the answer quality greatly. And one of the things which we've noticed building GraphRAG architectures is that LLMs are also notorious for ignoring information farther down the context window. So they look at the information closer to the beginning
Starting point is 00:38:15 of it, and if you give them too much information or too much data, their ability to get useful insights out of it declines. So I think prioritizing what you're putting in the context for the LLM is one of the most important things which you can do with a GraphRAG architecture.
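A sketch of that prioritization step, assuming string chunks and a placeholder graph_score that rates how connected a chunk is to the question in the knowledge graph.

```python
def build_context(chunks, question, graph_score, budget_chars=8000):
    # Most graph-relevant material first, since models attend to the
    # start of the window more reliably than the end.
    ranked = sorted(chunks, key=lambda c: graph_score(c, question), reverse=True)
    context, used = [], 0
    for chunk in ranked:
        if used + len(chunk) > budget_chars:
            break  # stop before overflowing the model's context budget
        context.append(chunk)
        used += len(chunk)
    return "\n\n".join(context)
```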
Starting point is 00:38:40 And getting back to your previous question about accuracy data, one example is from a data catalog company, data.world. They published a study on their GraphRAG architecture and showed a three times improvement in accuracy using a GraphRAG architecture over a traditional RAG architecture. So the accuracy of their LLM responses improved by 54.2%. Okay, that's pretty significant. What about, how big can these knowledge graphs get? They can get quite large. And actually, there's kind of two use cases for GraphRAG, which are very distinct. So what we've been primarily talking about is
Starting point is 00:39:26 augmenting expert systems, or providing a more grounded knowledge source for enterprise systems. Another use case for GraphRAG architectures is research. So let's say instead you were trying to solve a hard problem. You have a big data source, and you can represent it most cleanly as a knowledge graph. And you want to answer some very difficult questions, but more in a research-oriented fashion, where you don't care about the size of the context window.
Starting point is 00:40:05 You don't care about the response time. So basically, you're willing to invest additional compute resources into the LLM and to wait longer for a response, but you want to get some meaningful insights. And we actually had one customer which did this in the oil industry. They loaded a lot of information into an LLM, used it as an expert system, and started to find patterns in the data. They were quite successful in getting the LLM to glean some insights from an extremely large data set, which normally would have been impossible for humans, or would have taken a lot of data science and research time to understand and investigate. So I think that's a different use case for how to use knowledge graphs and LLMs. But if you have very large context windows and you can feed in a lot of data sources,
Starting point is 00:40:56 LLMs are actually really, really good at taking large data sets and gleaning insights from them where it would be very hard to do otherwise. Have there been any experiments with using only the knowledge graph, and actually not using a vector database for augmentation at all? It's possible to just use a knowledge graph, pull information back from the knowledge graph, and pass it into the context window only. I think the challenge with that is, like we were talking about, the text-to-query capabilities are not perfect yet. You hit failure cases where it either doesn't return a sufficient data set, or it misses entirely because the information doesn't exist in the knowledge graph to query against.
Starting point is 00:41:51 So from a user experience standpoint, those negative cases tend to be a bad experience. So augmenting that with an LLM which can respond and fill in the gap with a vector search is quite helpful. It gives a more generative-AI-style LLM response than if you just used a knowledge graph alone. So what we found is that the combination of the two is the best for giving back a good experience. And then, with traditional RAG, as you get new source material, you're probably going to go into some batch mode and update your vector database from time to time. With knowledge graphs, how does that work there?
Starting point is 00:42:42 Is the update cycle slightly different? Yes, I mean, it's just like any other database. So you can go in and update it programmatically, or you can import and re-import the data if you want to take it from other data sources. So it's quite easy to update the knowledge graph. And then, when you update the knowledge graph, assuming you built the vector database and everything on top of it, it automatically updates the entire stack. So now your LLM is pulling from the latest knowledge sources instead of from outdated information. That's quite easy to do.
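A sketch of what a programmatic update might look like, reusing the hypothetical Company/Topic schema from earlier; MERGE keeps the write idempotent, so re-importing a source doesn't duplicate nodes.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Upsert the entities and relationship; anything built on the graph
    # (including vector indexes over it) sees the change on next query.
    session.run(
        """
        MERGE (c:Company {name: $company})
        MERGE (t:Topic {name: $topic})
        MERGE (c)-[r:AFFECTED_BY]->(t)
        SET r.updated = date()
        """,
        company="Black & Decker", topic="lithium-ion battery shortage",
    )

driver.close()
```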
Starting point is 00:43:04 Are there frameworks that exist for building out GraphRAG architectures? So I think there's a bunch of good projects and foundations which are looking at ways of making it easier to get started with these technologies. We've published a bunch of open source projects for getting started with GraphRAG architectures and different things from our standpoint.
Starting point is 00:43:40 But now we're collaborating with the Open Platform for Enterprise AI, the OPEA project that's part of the LF AI & Data Foundation. And what they have is a repository of different RAG patterns and enterprise LLM patterns, which are all open source and easy for end users to pick up. We're contributing a GraphRAG example to the repository. And the great thing about working with foundations or industry-neutral bodies is nobody's selling you a platform. Nobody's trying to convince you to buy services.
Starting point is 00:44:20 It's basically designed for developers who are building their own LLM stacks and trying to figure out how to get the best results out of their systems. And I think the way generative AI technologies have grown up, the base models are the one part which is just too expensive and too costly for you to train yourself. But the rest of the pipeline you can build with technologies like LangChain and Ollama.
Starting point is 00:44:48 You can hook in your databases, your data sources, any agents which you need to respond to it. And so it makes it a very friendly open source ecosystem for you to build your enterprise applications on top of. And we want to support that ecosystem,
Starting point is 00:45:10 and we have integrations with LangChain and Ollama. We're contributing examples to projects like OPEA, and we're also part of the AI Alliance. And I think that if we have a great set of open source tools which everybody can use to build their Gen AI stacks, it's just much easier and better for everybody who's now building these technologies. And I've
Starting point is 00:45:33 been talking to a lot of developers as I've been speaking at conferences. So I was in India a couple of months ago at the Great International Developer Summit. I also gave the keynote at AI Dev Summit in San Francisco, and I've been at a bunch of smaller events around the world. And the folks who ask the best questions are the ones who basically, at the beginning of this year,
Starting point is 00:46:00 they got told, hey, go build a RAG application to augment our enterprise application, go figure it out. They did the basic, easy stuff, right? So they plugged in a vector database, they got all this stuff integrated,
Starting point is 00:46:17 and now they're trying to figure out how they can actually get good results out of the system. Because they have systems working: you can chat with the system, you can ask questions, and for the planted questions it does great; for the hard questions it does horribly. And so now they're trying to get to the next level with their architecture: can I turn this into something which provides real business value? Can I actually build something which solves hard problems? And I think there's enough folks who have gotten down this path and are looking for new approaches and new ways of improving the accuracy and explainability that there's a great ecosystem around this as well.
Starting point is 00:46:58 We even see this in research groups, where there's a lot of research happening in GraphRAG communities. There's a lot of conferences and hackathons that universities are putting on around data science and trying to solve these challenges. And I think, just like we saw huge advances in the LLM space, we should see a lot of great advances coming out in the knowledge graph and LLM field. Do you feel like the sort of momentum of
Starting point is 00:47:33 enterprise LLM architecture is kind of going in the direction of GraphRAG? Yes, I think for a lot of use cases it's just the right answer. And it depends on the type of data which you have as an organization, what your requirements are for accuracy of the information, and whether you need to be able to explain the answers coming out. And also, I think a lot of organizations are trying to figure out how to improve the quality of the systems they already have.
Starting point is 00:48:10 So it's a really good answer for all of those use cases. Now, I think as it evolves, as we see more people putting it into production, there will be clear cases where, you know, GraphRAG is the right architecture, and here are some other cases where you want to use an agentic system with these characteristics, or you want to use just a standard vector
Starting point is 00:48:32 database with these characteristics. I think that over time, we'll figure out what the exact best use cases are. One thing we've noticed for GraphRAG is that the customers which have a mix of structured and unstructured data, so some data stored in tables in a more regular format, and then some documents and other information which they want to add to it,
Starting point is 00:48:59 that type of mix of structured and unstructured data lends itself really well to putting into a knowledge graph. And also, a GraphRAG architecture does a better job of feeding the right documents and the right information into the LLM. So those systems usually give superior results. But it really depends on your use case and your data as an end user and what you're trying to build. Is it most helpful if you already have some sort of knowledge base created? I mean, if you already have a knowledge graph, then it's easy.
Starting point is 00:49:31 So that's the 100%, why-aren't-you-already-doing-GraphRAG case. But I think most companies have knowledge graphs for certain purposes, and then a lot of other information in unstructured documents or in database tables or other sources. So if you look at the mix: existing knowledge graph, of course, 100%. If you're looking at things where there's structured and unstructured data, that's a great fit.
Starting point is 00:50:00 You can load in a knowledge graph and get really good insights out of it. If your data is simple and it all fits inside a relational database and it's not very complex query-wise and you're getting good results out of a standard architecture, that could be a good answer for some people as well. I think it depends on your
Starting point is 00:50:19 data, and it's really about trying a few different approaches and seeing what gives you the best quality results. It seems kind of similar to how we've gotten to a place where we have all these reference architectures when it comes to building certain classes of applications. Over time, we need to develop something similar for different classes of generative AI use cases. If you're trying to solve this particular problem, then here's a reference architecture where using a combination of
Starting point is 00:50:51 a knowledge graph and a vector database makes sense; but maybe for this other class of problem you want to use this other reference architecture, maybe using other foundational building blocks. Exactly. I think it's similar. And even in the graph technology space,
Starting point is 00:51:09 there's clear places where it just wins: fraud, supply chain management, anything where you have that deep, complex relationship in data. And so I think over time, with LLMs, since it's very much a knowledge-based system, knowledge graphs will have a much larger share of the space in terms of what they're most applicable for. But I think we'll kind of converge on
Starting point is 00:51:35 use cases for LLM architectures and what the best design is given the requirements of different types of systems. In terms of people learning about these different approaches, what do you suggest? You travel around a lot, you talk to lots of folks at different conferences and are a thought leader in the space.
Starting point is 00:51:56 So what would you recommend for people to get up and running, and where do they start? Yeah, so I think a great place to learn in general about LLMs and get some content is deeplearning.ai. They have a lot of great course content. It's free. It's easy to get started on. We also contribute a GraphRAG course there.
Starting point is 00:52:19 So that's a great way, if you're interested in GraphRAG, to try it out, but also to try a bunch of different LLM technologies and learn about them. And specifically for GraphRAG, we have a bunch of free training on our GraphAcademy, and it teaches you how to build a chatbot. We have one for Python, one for JavaScript, and we're going to be coming out with additional courses as well. Again, free content. You don't need to sign up for anything. We just spin up a graph database, and all of it is really straightforward for learning how to build your first chatbot using a GraphRAG
Starting point is 00:52:56 architecture. And I think, really, for the generative AI space, this is a great time to learn and upskill. It's kind of unique, in that a lot of technologies that have trended the past couple of years, like maybe IoT or blockchain, are very, very niche, only applicable to a small percentage of use cases, and require fairly deep expertise, either learning specific languages or upskilling in certain technologies. Whereas generative AI technology is very easy to get started with and applies to a very, very broad set of applications.
Starting point is 00:53:38 The easy way of describing it is, back in, I'm dating myself, but back in early 2000, right before the bubble burst on the web, everybody was building web apps and e-commerce apps. And there was a lot of focus on a web app, and a web page was your interface to customers and to consumers. Yeah, it's basically like the introduction of the app server. Exactly, today's app server.
Starting point is 00:54:05 You can build lots of stuff with an app server. And generative AI is the new interface. People don't want to go through a website, go through a service. I forget the name of the device which they came out with, but basically it's a little portable device with a camera on it and a speaker and microphone. And basically, you talk to it, tell it what you want. It orders your Uber. It looks around the environment, tells you things about where you are, all using a Gen AI backend and agents to contact different services.
Starting point is 00:54:41 But it completely gets rid of your traditional phone user interface. So I think even the app ecosystem around mobile apps and all those things in the future could be replaced entirely with generative AI technologies. It's hard to even articulate, I think, how much this could change for various people. And I think people get a little bit overwhelmed by all the news around it. It's like, stop talking to me about AI at this point where there's other things that are happening in technology. But it is something that is going to, I think, fundamentally change so many things. So it's hard to kind of not be thinking about the future
Starting point is 00:55:25 and sort of disconnect that from the amount of things that generative AI is going to touch. Just so I can mention the device: the Rabbit was the one I was thinking of. The Rabbit? Yeah. So it has a tiny little screen on it and a camera, and basically you talk to it,
Starting point is 00:55:43 and it has a Gen AI back end for how it does things. Yeah, I mean, I think Humane's AI Pin is also probably going in that direction as well. Yeah, yeah. One of the guys at our recent meetings had a little open source audio pin, and basically it would act as a note transcriber: it would listen to the conversation, send it all to a backend, and summarize it with a Gen AI model. And he was using it for meetings, because you have so many meetings and so many different things you're bouncing between in a single day that it was helpful to get a summary, done by an AI assistant, of exactly what he'd done, what the action items were, and what things he needed to accomplish. So I think we're just going to find a whole bunch of things from a consumer's perspective where Gen AI enters our life.
Starting point is 00:56:32 And as software developers, I think the biggest challenge for us is going to be adapting these to the expectations and needs of enterprise software, because it's an entirely different set of expectations in terms of how responsive it is, what sort of knowledge it needs to bring in from expert systems, and how it can be extremely accurate and solve hard technical problems, which is entirely different from most of the consumer use cases where we see Gen AI popping up today. Yeah, I think consumer makes sense as the place to start. I think in terms of enterprise serving a business use case, there's also
Starting point is 00:57:11 a whole area around privacy and security, which is also a factor in consumer as well, that needs to be addressed. That's a huge challenge for enterprises to be able to truly leverage this technology too. Well, as we get close to time here, I have some quick-fire questions I want to run through with you. So first of all, if you could master one skill you don't have right now, what would it be? Mastering one skill... multitasking. What wastes the most time in your day?
Starting point is 00:57:53 Unfortunately, meetings. Yeah. If you could invest in one company that's not the company you work for, who would it be? Investing in one company... I'm more of a market guy. I like to buy the market. So
Starting point is 00:58:09 I'm going to pass on this one and say my favorite tech index fund, which I think right now is ITX.
Starting point is 00:58:18 That's a good tech mix if folks want to invest in the market, although it's down currently, which maybe is good, with NVIDIA and everybody kind of taking a hit. What tool or technology could you not live without?
Starting point is 00:58:32 Okay, so this is a surprising one. Because... am I not rapid-firing this fast enough? Well, you get to determine how quick the quick-fire is, so don't worry. So for years, I absolutely refused to wear any sort of watch, and it was basically because of carpal tunnel early on when I was a developer and geek. And so for years I had nothing on my wrists, and I would just keep my wrists elevated so I wouldn't hurt them. And I've been quite good about that,
Starting point is 00:59:11 but it's really hard, and it's socially unacceptable, in meetings or conferences or events to constantly pull out your phone and look at messages. So what I found I can't live without now is a smartwatch. And I mainly use it as a notifier: I'll just get notifications from my phone when an urgent call comes in or my boss wants something on Slack quickly. And sometimes I'll even respond from the dinner table or somewhere else random. So I've gone back, and actually I wear a watch now, but only because
Starting point is 00:59:50 it is a socially accepted corporate use of responding to things in the middle of a meeting. Yeah, I actually think I need to invest in that. I don't wear anything, so I end up pulling out my phone probably more than I need to,
Starting point is 01:00:08 and then that leads to further distractions. I was not going to get any sort of watch. I was dead set against it for years. But a couple of years ago, I realized that I was just missing calls and notifications in the middle of meetings or at conferences, and it was annoying enough that I started wearing one. All right.
Starting point is 01:00:29 Which person influenced you the most in your career? For this one, I would say Jim Weaver. So back when I was
Starting point is 01:00:41 mostly a geek slash architect slash engineering manager, in my evenings and weekends I was experimenting with different technology. And one of the technologies I was doing a lot with was JavaFX, which is a UI language for Java. I was doing some open source projects and some hobby things on the side, and Jim Weaver pulled me in and asked me, just because we came to know each other, if I wanted to co-author a book with him. And that's the reason why I'm speaking at conferences and
Starting point is 01:01:16 I'm in developer relations and like doing all this community stuff. Because he kind of had that big heart and invited me in to collaborate on a technical project, but also has been a great friend over the years. So yeah, he's a great guy. Awesome. And the last one, five years from now, will there be more people writing code or less? There'll be more people maintaining code. Yeah, I think Gen AI in particular has accelerated the pace of development. I think even as software engineers, we're expected to produce more code and better quality code and code faster.
Starting point is 01:01:55 And inevitably where that goes is there's more code to maintain, there's more things which are broken and need fixing. And I think the pace of software development isn't going to slow down; it's just going to continue accelerating. Yeah, I agree. Is there anything else you'd like to share? No, I hope for folks watching, this is educational and informative.
Starting point is 01:02:16 And I think that part of what I try to do is help folks in their career to learn new technologies, to upskill, to kind of become better. And a great way of doing it too is to meet folks at conferences. It could be a meetup, it can be a local conference. So I would encourage folks to take the time to both network with their peers and also to get out there and say hi to conference speakers and folks like us. Because if we don't hear feedback from folks,
Starting point is 01:02:49 if we don't hear what people are really doing with technology, then we lose that grounding and connection that helps us to be effective and to kind of keep moving the technology industry along. Awesome. Well said.
Starting point is 01:03:01 Steven, thanks so much for doing this. It was great to see you. And I will see you probably in, I guess, a few weeks. Awesome. Thanks a lot, Sean. Yep. Cheers.
