The Good Tech Companies - Mastering LLM Knowledge Graphs: Build and Implement GraphRAG in Just 5 Minutes

Episode Date: October 18, 2024

This story was originally published on HackerNoon at: https://hackernoon.com/mastering-llm-knowledge-graphs-build-and-implement-graphrag-in-just-5-minutes. Extract and use knowledge graphs in your GenAI applications with the LLM Knowledge Graph Builder. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #llm-knowledge-graphs, #neo4j, #graph-rag, #ml-models, #llm-knowledge-graph-builder, #retrieval-augmented-generation, #unstructured-data-processing, #good-company, and more. This story was written by: @neo4j. Learn more about this writer by checking @neo4j's about page, and for more stories, please visit hackernoon.com. The Neo4j LLM Knowledge Graph Builder is an innovative application for turning unstructured text into a knowledge graph. It uses ML models (LLMs: OpenAI, Gemini, Diffbot) to transform PDFs, web pages, and YouTube videos. This capability is particularly exciting as it allows for intuitive interaction with the data, akin to having a conversation with the knowledge graph itself.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Mastering LLM Knowledge Graphs: Build and Implement GraphRAG in Just 5 Minutes, by Neo4j. The LLM Knowledge Graph Builder is one of Neo4j's GraphRAG ecosystem tools that empowers you to transform unstructured data into dynamic knowledge graphs. It is integrated with a retrieval-augmented generation (RAG) chatbot, enabling natural language querying and explainable insights into your data. Get started with GraphRAG: Neo4j's ecosystem tools. What is the Neo4j LLM Knowledge Graph Builder? The Neo4j LLM Knowledge Graph Builder is an
Starting point is 00:00:40 innovative online application for turning unstructured text into a knowledge graph with no code and no Cypher, providing a magical text-to-graph experience. It uses ML models (LLMs: OpenAI, Gemini, Diffbot) to transform PDFs, webpages, and YouTube videos into a knowledge graph of entities and their relationships. The front end is a React application based on our Needle Starter Kit, and the back end is a Python FastAPI application. It uses the LLM Graph Transformer module that Neo4j contributed to LangChain. The application provides a seamless experience following four simple steps: 1. Data Ingestion supports various data sources, including PDF documents, Wikipedia pages, YouTube videos, and more. 2. Entity Recognition uses LLMs to identify and extract entities and relationships from unstructured text.
Starting point is 00:01:34 3. Graph Construction converts recognized entities and relationships into a graph format using Neo4j's graph capabilities. 4. User Interface provides an intuitive web interface for users to interact with the application, facilitating the upload of data sources, visualization of the generated graph, and interaction with a RAG agent. This capability is particularly exciting as it allows for intuitive interaction with the data, akin to having a conversation with the knowledge graph itself. No technical knowledge is required. Let's try it out. We provide the application on our Neo4j-hosted environment with no credit cards required and no LLM keys, friction-free. Alternatively, to run it locally or within your environment, visit the public GitHub repo and follow the
Starting point is 00:02:20 step-by-step instructions we will cover in this post. Before we open and use the LLM Knowledge Graph Builder, let's create a new Neo4j database. For that, we can use a free AuraDB database by following these steps: log in or create an account at https://console.neo4j.io, under Instances, create an AuraDB Free database, download the credentials file, and wait until the instance is running. Now that we have our Neo4j database running and our credentials, we can open the LLM Knowledge Graph Builder and click Connect to Neo4j in the top right corner. Drop the previously downloaded credentials file on the connection dialog. All the information should be automatically filled. Alternatively, you can enter everything manually.
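As an aside for readers who also want to script against that same instance: below is a minimal sketch (not part of the builder) that connects to the new AuraDB database with the official neo4j Python driver. The URI and password are placeholders you would copy from the downloaded credentials file.

```python
# Minimal connectivity check against the AuraDB instance created above.
# The URI, user, and password are placeholders from the credentials file.
from neo4j import GraphDatabase

NEO4J_URI = "neo4j+s://<your-instance-id>.databases.neo4j.io"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "<password-from-credentials-file>"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
driver.verify_connectivity()  # raises an exception if the instance is unreachable
print("Connected to AuraDB")
```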
Starting point is 00:03:09 Creating the knowledge graph. The process begins with the ingestion of your unstructured data, which is then passed through the LLM to identify key entities and their relationships. You can drag and drop PDFs and other files into the first input zone on the left. The second input will let you copy/paste the link to a YouTube video you want to use, while the third input takes a Wikipedia page link. For this example, I will load a few PDFs I have about a supply chain company called Graph ACME, a press article from Forbes, and a YouTube video about the Corporate Sustainability Due Diligence Directive (CSDDD), as well as two pages from Wikipedia: Corporate Sustainability Due Diligence Directive and Bangladesh.
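The builder handles this ingestion for you, via the LangChain document loaders mentioned next. For illustration only, here is a rough sketch of loading the same kinds of sources into LangChain Document objects yourself, assuming langchain-community plus the pypdf, wikipedia, and youtube-transcript-api packages are installed; the file name and video URL are placeholders, not the actual sources used above.

```python
# Illustrative only: load a PDF, a Wikipedia page, and a YouTube transcript
# into LangChain Document objects. The path and URL are placeholders.
from langchain_community.document_loaders import (
    PyPDFLoader,
    WikipediaLoader,
    YoutubeLoader,
)

pdf_docs = PyPDFLoader("graphacme_supply_chain.pdf").load()
wiki_docs = WikipediaLoader(
    query="Corporate Sustainability Due Diligence Directive", load_max_docs=1
).load()
yt_docs = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=<video-id>", add_video_info=False
).load()

documents = pdf_docs + wiki_docs + yt_docs
print(f"Loaded {len(documents)} documents")
```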
Starting point is 00:03:46 While uploading the files, the application will store the uploaded sources as document nodes in the graph using LangChain document loaders and YouTube parsers. Once all files have been uploaded, you should see something similar to this. All we need to do now is select the model to use, click Generate Graph, and let the magic do the rest for you. If you only want to generate a graph from a selection of files, you can select the files first with the checkbox in the first column of the table and click Generate Graph. Note that if you want to use a pre-defined or your own graph schema, you can click the settings icon in the top right corner and select a pre-defined schema from the drop-down, use your own by writing down the node labels and relationships, pull the schema from an existing Neo4j database, or copy/paste text and ask the LLM to analyze it and come up with a suggested schema.
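For the curious, the LLM Graph Transformer that the builder relies on accepts the same kind of schema constraint programmatically. A hedged sketch follows; the node labels and relationship types below are made-up examples, not the builder's defaults.

```python
# Sketch: constrain extraction to a pre-defined schema with the
# LLMGraphTransformer that Neo4j contributed to LangChain.
# The labels and relationship types below are illustrative only.
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm = ChatOpenAI(model="gpt-4o", temperature=0)
transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Company", "Product", "Regulation", "Country"],
    allowed_relationships=["PRODUCES", "SUBJECT_TO", "BASED_IN"],
)
```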
Starting point is 00:04:36 While it is processing your files and creating your knowledge graph, let me summarize what is happening under the hood: 1. The content is split into chunks. 2. Chunks are stored in the graph and connected to the document node and to each other for advanced RAG patterns. 3. Highly similar chunks are connected with a SIMILAR relationship to form a k-nearest-neighbors graph. 4. Embeddings are computed and stored in the chunks and vector index. 5. Using the LLM Graph Transformer or Diffbot Graph Transformer, entities and relationships are extracted from the text. 6. Entities are stored in the graph and connected to the originating chunks.
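Here is a hedged code sketch of those steps, continuing the earlier snippets (documents, transformer, and the NEO4J_* values come from above). The chunk sizes and index name are assumptions, and the k-nearest-neighbors step is omitted; the builder's real implementation lives in its GitHub repo.

```python
# Sketch of the under-the-hood pipeline: split into chunks, store chunks
# with embeddings and a vector index, then extract and store entities.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector

# 1. Split the loaded documents into chunks (sizes are illustrative)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# 2-4. Store chunks with embeddings and create a vector index in Neo4j
Neo4jVector.from_documents(
    chunks,
    OpenAIEmbeddings(),
    url=NEO4J_URI,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    index_name="chunk_vector",  # assumed name, not necessarily the builder's
)

# 5-6. Extract entities/relationships and store them, keeping each entity
#      connected to its originating chunk via include_source=True
graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USER, password=NEO4J_PASSWORD)
graph_documents = transformer.convert_to_graph_documents(chunks)
graph.add_graph_documents(graph_documents, baseEntityLabel=True, include_source=True)
```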
Starting point is 00:05:09 Explore your knowledge graph. The information extracted from your documents is structured into a graph format, where entities become nodes and relationships turn into edges connecting those nodes. The beauty of using Neo4j lies in its ability to efficiently store and query these complex data networks, making the generated knowledge graph immediately useful for a variety of applications.
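If you prefer a query to the UI, here is a hedged sketch of inspecting the result with Cypher through the driver opened earlier. The fileName property and the PART_OF and HAS_ENTITY relationship types are assumptions about the builder's data model; check your own graph's schema first, for example with CALL db.schema.visualization().

```python
# Sketch: count chunks and entities per source document.
# fileName, PART_OF, and HAS_ENTITY are assumed names; verify them first.
records, _, _ = driver.execute_query(
    """
    MATCH (d:Document)<-[:PART_OF]-(c:Chunk)-[:HAS_ENTITY]->(e)
    RETURN d.fileName AS source,
           count(DISTINCT c) AS chunks,
           count(DISTINCT e) AS entities
    """
)
for record in records:
    print(record["source"], record["chunks"], record["entities"])
```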
Starting point is 00:05:45 Before we use the RAG agent to ask questions about our data, we can select one document, or many, with the checkbox and click Show Graph. This will display the entities created for the documents you selected. You can also display the document and chunk nodes in that view. The Open Graph With Bloom button will open Neo4j Bloom to help you visualize and navigate your newly created knowledge graph. The next action, Delete Files, deletes the selected documents and chunks from the graph (and entities, if you select that option). Talk to your knowledge. Now comes the last part: the RAG agent you can see in the right panel. The retrieval process: how does it work? The image below shows a simplified view of the GraphRAG process. When the user asks a question, we use the Neo4j vector index with a retrieval query
Starting point is 00:06:26 to find the most relevant chunks for the question and their connected entities, up to a depth of two hops. We also summarize the chat history and use it as an element to enrich the context. The various inputs and sources (the question, vector results, chat history) are all sent to the selected LLM model in a custom prompt, asking it to provide and format a response to the question asked based on the elements and context provided. Of course, the prompt has more magic, such as formatting, asking to cite sources, and to not speculate if an answer is not known, etc. The full prompt and instructions can be found as final_prompt in QA_integration.py.
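As a rough approximation of that retrieval step in code: the sketch below runs a vector search over chunks and expands to connected entities with a custom retrieval query. The index name, relationship types, and returned metadata are assumptions; the builder's actual retrieval query and prompt live in its GitHub repo.

```python
# Sketch of graph-augmented retrieval: vector search over chunks, then
# expand to neighboring entities. Names below are assumptions.
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

retrieval_query = """
WITH node AS chunk, score
OPTIONAL MATCH (chunk)-[:HAS_ENTITY]->(e)-[rel]-(nbr)
WITH chunk, score,
     collect(DISTINCT e.id + ' -' + type(rel) + '-> ' + nbr.id)[..25] AS facts
RETURN chunk.text AS text, score, {entity_facts: facts} AS metadata
"""

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=NEO4J_URI,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    index_name="chunk_vector",       # must match the index used at build time
    retrieval_query=retrieval_query,
)
results = store.similarity_search_with_score(
    "Which Graph ACME products are affected by the CSDDD?", k=4
)
# Each result carries the chunk text plus neighboring entity facts, which a
# custom prompt can then pass to the chosen LLM along with chat history.
```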
Starting point is 00:07:06 Ask questions related to your data. In this example, I loaded internal documents about a fake company named Graph ACME, based in Europe, producing and documenting their whole supply chain strategy and products. I also loaded a press article and a YouTube video explaining the new CSDDD, its impact, and regulation. We can now ask the chatbot questions about our internal (fake) company knowledge, questions about the CSDDD law, or even questions across both, such as asking for the list of products Graph ACME produces, whether they will be affected by the CSDDD regulation, and if so, how it will impact the company. Chat features. On the right side of the home screen,
Starting point is 00:07:49 you will notice three buttons attached to the chat window: Close will close the chatbot interface. Clear Chat History will delete the current session's chat history. Maximize Window will open the chatbot interface in full-screen mode. On the RAG agent's answers, you will find three features after the response: Details will open a retrieval-information pop-up showing how the RAG agent collected and used sources (documents, chunks, and entities). Information about the model used and the token consumption is also included. Copy will copy the content of the response to the clipboard. Text-to-Speech will read the response content aloud. Wrap-up. To dive deeper into the LLM Knowledge Graph Builder,
Starting point is 00:08:24 the GitHub repository offers a wealth of information, including source code and documentation. Additionally, our documentation provides detailed guidance on getting started, and the GenAI ecosystem offers further insights into the broader tools and applications available. What's next? Contributing and extension capabilities: your experience with the LLM Knowledge Graph Builder is invaluable. If you encounter bugs, have suggestions for new features, want to contribute, or wish to see certain enhancements, the Community Platform is the perfect place to share your thoughts.
Starting point is 00:08:57 For those adept in coding, contributing directly on GitHub can be a rewarding way to help evolve the project. Your input and contributions not only help improve the tool but also foster a collaborative and innovative community. Neo4j online community: join the Neo4j Discord server. Resources: learn more about new resources for GenAI applications with Neo4j's GraphRAG ecosystem tools. These open-source tools make it easy to get started with GenAI applications grounded with knowledge graphs, which help improve response quality and explainability
Starting point is 00:09:30 and accelerate app development and adoption. Video: [YouTube link]. Links: Get Started With GraphRAG: Neo4j's Ecosystem Tools; GitHub: Neo4j Labs LLM Graph Builder; Neo4j: Graph Construction From Unstructured Data; Neo4j LLM Knowledge Graph Builder: Extract Nodes and Relationships From Unstructured Text (PDF, YouTube, Webpages); Neo4j Labs GenAI Ecosystem; Neo4j Labs Needle Starter Kit 2.0: Templates, Chatbot, and more. Thank you for listening to this HackerNoon story, read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.
