The Good Tech Companies - Mastering LLM Knowledge Graphs: Build and Implement GraphRAG in Just 5 Minutes
Episode Date: October 18, 2024
This story was originally published on HackerNoon at: https://hackernoon.com/mastering-llm-knowledge-graphs-build-and-implement-graphrag-in-just-5-minutes. Extract and use knowledge graphs in your GenAI applications with the LLM Knowledge Graph Builder. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #llm-knowledge-graphs, #neo4j, #graph-rag, #ml-models, #llm-knowledge-graph-builder, #retrieval-augmented-generation, #unstructured-data-processing, #good-company, and more. This story was written by: @neo4j. Learn more about this writer by checking @neo4j's about page, and for more stories, please visit hackernoon.com. The Neo4j LLM Knowledge Graph Builder is an innovative application for turning unstructured text into a knowledge graph. It uses ML models (LLMs: OpenAI, Gemini, Diffbot) to transform PDFs, web pages, and YouTube videos. This capability is particularly exciting as it allows for intuitive interaction with the data, akin to having a conversation with the knowledge graph itself.
Transcript
This audio is presented by HackerNoon, where anyone can learn anything about any technology.
Mastering LLM Knowledge Graphs: Build and Implement GraphRAG in Just 5 Minutes, by Neo4j.
The LLM Knowledge Graph Builder is one of Neo4j's GraphRAG ecosystem tools that empowers you to transform unstructured data into dynamic knowledge graphs. It is integrated with a retrieval-augmented generation (RAG) chatbot, enabling natural language querying and explainable insights into your data.
Get started with GraphRAG: Neo4j's ecosystem tools
What is the Neo4j LLM Knowledge Graph Builder?
The Neo4j LLM Knowledge Graph Builder is an innovative online application for turning unstructured text into a knowledge graph with no code and no Cypher, providing a magical text-to-graph experience. It uses ML models (LLMs: OpenAI, Gemini, Diffbot) to transform PDFs, web pages, and YouTube videos into a knowledge graph of entities and their relationships.
The front end is a React application based on our Needle Starter Kit, and the back end is a Python FastAPI application. It uses the LLM Graph Transformer module that Neo4j contributed to LangChain.
The application provides a seamless experience following four
simple steps:
1. Data Ingestion. Supports various data sources, including PDF documents, Wikipedia pages, YouTube videos, and more.
2. Entity Recognition. Uses LLMs to identify and extract entities and relationships from unstructured text.
3. Graph Construction. Converts recognized entities and relationships into a graph format using Neo4j graph capabilities.
4. User Interface. Provides an intuitive web interface for users to interact with the application, facilitating the upload of data sources, visualization of the generated graph, and interaction with a RAG agent. This capability is particularly exciting as it allows for intuitive interaction with the data, akin to having a conversation with the knowledge graph itself. No technical knowledge is required.
Let's try it out
We provide the application on our Neo4j-hosted environment with no credit card and no LLM keys required, friction-free. Alternatively, to run it locally or within your environment, visit the public GitHub repo and follow the step-by-step instructions we will cover in this post.
Before we open and use the
LLM Knowledge Graph Builder, let's create a new Neo4j database. For that, we can use a free AuraDB database by following these steps:
1. Log in or create an account at https://console.neo4j.io.
2. Under Instances, create a new AuraDB Free database.
3. Download the credentials file.
4. Wait until the instance is running.
Now that we have our Neo4j database running and our credentials, we can open the LLM Knowledge Graph Builder and click Connect to Neo4j in the top right corner.
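For readers who would rather inspect the downloaded credentials before connecting, here is a minimal sketch of reading such a file in Python. It assumes the file is plain text made of KEY=VALUE lines with keys such as NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD. That layout, the helper name parse_credentials, and every sample value below are assumptions for illustration, not something stated in this story.

```python
from typing import Dict


def parse_credentials(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    creds: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split only once: a password may itself contain '='.
        key, value = line.split("=", 1)
        creds[key.strip()] = value.strip()
    return creds


# Hypothetical file content; the URI and password are placeholders.
SAMPLE = """\
# example credentials (placeholder values)
NEO4J_URI=neo4j+s://example.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=not-a-real=password
"""

creds = parse_credentials(SAMPLE)
print(sorted(creds))
```

The parsed values are the same ones the connection dialog asks for, so they could also be handed to a database client in a script. That step is left out here because it needs a running instance.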
Drop the previously downloaded credentials file on the connection dialog. All the information should be filled in automatically. Alternatively, you can enter everything manually.
Creating the knowledge graph
The process begins with the ingestion of your unstructured data, which is then passed through the LLM to identify key entities and their relationships. You can drag and drop PDFs and other files into the first input zone on the left. The second input lets you copy and paste the link to a YouTube video you want to use, while the third input takes a Wikipedia page link.
For this example, I will load a few PDFs I have about a supply chain company called Graph ACME, a press article from Forbes, and a YouTube video about the Corporate Sustainability Due Diligence Directive (CSDDD), as well as two pages from Wikipedia: Corporate Sustainability Due Diligence Directive and Bangladesh.
While uploading the files, the application will store the uploaded sources as Document nodes in the graph using LangChain document loaders and YouTube parsers. Once all files have been uploaded, you should see something similar to this. All we need
to do now is select the model to use, click Generate Graph, and let the magic do the rest for you. If you only want to generate the graph for a selection of files, you can select the files first with the checkbox in the first column of the table and then click Generate Graph.
Note that if you want to use a pre-defined or your own graph schema, you can click on the settings icon in the top right corner and select a pre-defined schema from the drop-down, use your own by writing down the node labels and relationships, pull the existing schema from an existing Neo4j database, or copy and paste text and ask the LLM to analyze it and come up with a suggested schema.
While it is processing your files and creating your knowledge graph, let me summarize what is happening under the hood:
1. The content is split into chunks.
2. Chunks are stored in the graph and connected to the Document node and to each other for advanced RAG patterns.
3. Highly similar chunks are connected with a SIMILAR relationship to form a k-nearest neighbors graph.
4. Embeddings are computed and stored in the chunks and vector index.
5. Using the LLM Graph Transformer or Diffbot Graph Transformer, entities and relationships are extracted from the text.
6. Entities are stored in the graph and connected to the originating chunks.
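The chunking and similarity steps in the list above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the application's code: the word-count "embedding" stands in for a real embedding model, the chunk sizes, k, and the score threshold are arbitrary, and NEXT_CHUNK is a made-up name for the link between consecutive chunks (only SIMILAR comes from the description above).

```python
import math
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[int, str, int]


def split_into_chunks(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Cut text into fixed-size character windows that overlap slightly."""
    if not text:
        return []
    step = size - overlap  # assumes overlap < size
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embedding(chunk: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(chunk.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_chunk_edges(chunks: List[str], k: int = 2, min_score: float = 0.8) -> List[Edge]:
    """Link consecutive chunks, then link each chunk to its k nearest neighbors."""
    # Sequence edges keep the reading order of the document.
    edges: List[Edge] = [(i, "NEXT_CHUNK", i + 1) for i in range(len(chunks) - 1)]
    vectors = [toy_embedding(c) for c in chunks]
    similar: Set[Edge] = set()
    for i, vec in enumerate(vectors):
        scored = sorted(
            ((cosine(vec, other), j) for j, other in enumerate(vectors) if j != i),
            reverse=True,
        )
        # Keep only the k best neighbors that are also above the threshold.
        for score, j in scored[:k]:
            if score >= min_score:
                similar.add((min(i, j), "SIMILAR", max(i, j)))
    return edges + sorted(similar)


chunks = ["graph acme ships bikes", "graph acme ships bikes daily", "weather is sunny"]
edges = build_chunk_edges(chunks)
print(edges)
```

Storing each SIMILAR pair once, with the smaller index first, avoids duplicate edges when two chunks pick each other as neighbors.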
Explore your knowledge graph
The information extracted from your document is structured into a graph format, where entities become nodes and relationships turn into edges connecting these nodes. The beauty of using Neo4j lies in its ability to efficiently store and query these complex data networks, making the generated knowledge graph immediately useful for a variety of applications.
Before we use the RAG agent to ask questions about our data, we can select one document, or many, with the checkbox and click Show Graph. This will display the entities created for the documents you selected. You can also display the Document and Chunk nodes in that view.
The Open Graph with Bloom button will open Neo4j Bloom to help you visualize and navigate your newly created knowledge graph. The next action, Delete Files, deletes the selected documents and chunks from the graph, and their entities if you select that option.
Talk to your knowledge
Now comes the
last part: the RAG agent you can see in the right panel.
The retrieval process: how does it work?
The image below shows a simplified view of the GraphRAG process. When the user asks a question, we use the Neo4j vector index with a retrieval query to find the most relevant chunks for the question and their connected entities up to a depth of two hops. We also summarize the chat history and use it as an element to enrich the context.
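To make the retrieval step concrete, here is a small in-memory sketch of the same idea: rank chunks by vector similarity to the question, then collect everything reachable from each selected chunk within two hops. It is a toy under stated assumptions. The two-dimensional vectors, the node names, and the helpers top_k_chunks, within_hops, and retrieve are all hypothetical, and in the application this work is done by the Neo4j vector index and a retrieval query rather than Python loops.

```python
import math
from collections import deque
from typing import Dict, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(question: List[float], chunk_vectors: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k chunks most similar to the question vector."""
    ranked = sorted(chunk_vectors, key=lambda cid: cosine(question, chunk_vectors[cid]), reverse=True)
    return ranked[:k]


def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int = 2) -> Set[str]:
    """Breadth-first search that stops expanding after max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def retrieve(question: List[float], chunk_vectors: Dict[str, List[float]],
             graph: Dict[str, Set[str]], k: int = 1) -> Dict[str, List[str]]:
    """Map each selected chunk to the nodes found within two hops of it."""
    return {cid: sorted(within_hops(graph, cid)) for cid in top_k_chunks(question, chunk_vectors, k)}


# Toy data: undirected adjacency between chunks and entities (all names illustrative).
CHUNK_VECTORS = {"chunk-1": [1.0, 0.0], "chunk-2": [0.0, 1.0]}
GRAPH = {
    "chunk-1": {"Graph ACME"},
    "Graph ACME": {"chunk-1", "Europe"},
    "Europe": {"Graph ACME", "CSDDD"},
    "CSDDD": {"Europe", "chunk-2"},
    "chunk-2": {"CSDDD"},
}

result = retrieve([0.9, 0.1], CHUNK_VECTORS, GRAPH, k=1)
print(result)
```

In this toy graph, CSDDD sits three hops away from chunk-1, so it is left out of the context for that chunk, which is the effect of the two-hop limit.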
The various inputs and sources (the question, vector results, chat history) are all sent to the selected LLM in a custom prompt, asking it to provide and format a response to the question asked based on the elements and context provided. Of course, the prompt has more magic, such as formatting, asking to cite sources, to not speculate if an answer is not known, etc. The full prompt and instructions can be found as FINAL_PROMPT in QA_integration.py.
Ask questions related to your data
In this example, I loaded internal documents about a fake company named Graph ACME, based in Europe, producing and documenting their whole supply chain strategy and products. I also loaded a press article and YouTube video explaining the new CSDDD, its impact, and regulation. We can now ask the chatbot questions about our internal (fake) company knowledge, questions about the CSDDD law, or even questions across both, such as asking for the list of products Graph ACME produces, whether they will be affected by the CSDDD regulation, and if so, how it will impact the company.
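A question like the ones above reaches the LLM wrapped in a prompt that combines the question, the retrieved chunks, and the summarized chat history. The sketch below shows one way such a prompt could be assembled. The wording is invented for illustration and is not the application's actual prompt, and summarize_history is a placeholder that simply keeps the last few turns where the application uses a summary.

```python
from typing import List


def summarize_history(history: List[str], max_turns: int = 3) -> str:
    """Placeholder for a real summary: keep only the most recent turns."""
    return " | ".join(history[-max_turns:]) if history else "(none)"


def build_prompt(question: str, chunks: List[str], history: List[str]) -> str:
    """Combine question, numbered sources, and history into one prompt string."""
    sources = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n"
        "Cite sources by their bracketed number. "
        "If the answer is not in the context, say that you do not know.\n\n"
        f"Chat history summary: {summarize_history(history)}\n\n"
        f"Context:\n{sources}\n\n"
        f"Question: {question}\n"
    )


# Illustrative inputs only; the chunk texts are made up.
prompt = build_prompt(
    question="Which Graph ACME products are affected by the CSDDD?",
    chunks=["Graph ACME documents its supply chain.", "The CSDDD applies to large companies."],
    history=["User asked what Graph ACME produces."],
)
print(prompt)
```

Numbering the sources in the prompt is what lets the model cite them, and the explicit "do not know" instruction mirrors the instruction not to speculate.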
Chat features
On the right side of the home screen, you will notice three buttons attached to the chat window:
Close will close the chatbot interface.
Clear chat history will delete the current session's chat history.
Maximize window will open the chatbot interface in full-screen mode.
On the RAG agent's answers, you will find three features after the response:
Details will open a retrieval information pop-up showing how the RAG agent collected and used sources (documents), chunks, and entities. Information about the model used and the token consumption is also included.
Copy will copy the content of the response to the clipboard.
Text-to-speech will read the response content aloud.
Wrap-up
To dive deeper into the LLM Knowledge Graph Builder, the GitHub repository offers a wealth of information, including source code and documentation. Additionally, our documentation provides detailed guidance on getting started, and the GenAI ecosystem offers further insights into the broader tools and applications available.
What's next: contributing and extension capabilities
Your experience with the LLM Knowledge Graph Builder is invaluable. If you encounter bugs, have suggestions for new features, want to contribute, or wish to see certain enhancements, the community platform is the perfect place to share your thoughts. For those adept in coding, contributing directly on GitHub can be a rewarding way to help evolve the project. Your input and contributions not only help improve the tool but also foster a collaborative and innovative community.
Neo4j Online Community
Join the Neo4j Discord server
Resources
Learn more about new resources for GenAI applications: Neo4j GraphRAG ecosystem tools. These open-source tools make it easy to get started with GenAI applications grounded with knowledge graphs, which help improve response quality and explainability and accelerate app development and adoption.
Video: https://youtu.be/Z42VVH9QNGO?si=LFU3IE5ZS9WOQ8h9&embeddable=true
Links
Get started with GraphRAG: Neo4j's ecosystem tools
GitHub: neo4j-labs/llm-graph-builder: Neo4j graph construction from unstructured data
Neo4j LLM Knowledge Graph Builder: Extract Nodes and Relationships from Unstructured Text (PDF, YouTube, Webpages), Neo4j Labs
GenAI Ecosystem, Neo4j Labs
Needle Starter Kit 2.0: Templates, Chatbot, and More
Thank you for listening to this HackerNoon story, read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.