The Good Tech Companies - Using MinIO to Build a Retrieval Augmented Generation Chat Application
Episode Date: September 18, 2024. This story was originally published on HackerNoon at: https://hackernoon.com/using-minio-to-build-a-retrieval-augmented-generation-chat-application. Building a production-grade RAG application demands a suitable data infrastructure to store, version, process, evaluate, and query chunks of data that comprise your proprietary corpus. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #retrieval-augmented-generation, #minio, #minio-blog, #rag, #modern-datalake, #data-science, #llms, #good-company, and more. This story was written by: @minio. Learn more about this writer by checking @minio's about page, and for more stories, please visit hackernoon.com.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Using MinIO to Build a Retrieval Augmented Generation Chat Application,
by MinIO. It's often been said that in the age of AI, data is your moat. To that end,
building a production-grade RAG application demands a suitable data infrastructure to store,
version, process, evaluate, and query chunks of data that comprise your proprietary corpus.
Since MinIO takes a data-first approach to AI, our default initial infrastructure recommendation
for a project of this type is to set up a modern data lake (MinIO) and a vector database.
While other auxiliary tools may need to be plugged in along the way,
these two infrastructure units are foundational. They will serve as the center
of gravity for nearly all tasks subsequently encountered in getting your RAG application
into production. But you may be in a conundrum: you have heard the terms LLM and RAG before,
but beyond that you have not ventured much into the unknown. Wouldn't it be nice if there
were a "Hello World" or boilerplate app that could help you get started? Don't worry, I was in the same boat.
So in this blog, we will demonstrate how to use MinIO to build a retrieval augmented generation
(RAG) based chat application using commodity hardware.
- Use MinIO to store all the documents, processed chunks, and the embeddings using the vector database.
- Use MinIO's bucket notification feature to trigger events when adding or removing documents to a bucket.
- A webhook consumes the event, processes the documents using LangChain, and saves the metadata and chunked documents to a metadata bucket.
- MinIO bucket notification events are triggered for newly added or removed chunked documents.
- A webhook consumes those events, generates embeddings, and saves them to the vector database (LanceDB), which is persisted in MinIO.
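Both webhooks in this flow consume the same S3-compatible event payload that MinIO emits. As a hedged sketch (the Records layout is assumed from MinIO's S3-style notification format; the helper name is hypothetical), pulling out what changed might look like:

```python
def parse_minio_event(event: dict) -> list[tuple[str, str, str]]:
    """Extract (event_name, bucket, object_key) triples from a MinIO
    bucket-notification payload, assuming the S3-compatible Records layout."""
    triples = []
    for record in event.get("Records", []):
        triples.append((
            record["eventName"],             # e.g. s3:ObjectCreated:Put
            record["s3"]["bucket"]["name"],  # bucket that fired the event
            record["s3"]["object"]["key"],   # object key that was added or removed
        ))
    return triples
```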
Key tools used:
- MinIO: object store to persist all the data.
- LanceDB: serverless open-source vector database that persists data in the object store.
- Ollama: to run the LLM and embedding model locally (OpenAI API compatible).
- Gradio: interface through which to interact with the RAG application.
- FastAPI: server for the webhooks that receive bucket notifications from MinIO, and which exposes the Gradio app.
- LangChain and unstructured: to extract useful text from our documents and chunk them for embedding.
Models used:
- LLM: Phi-3 128K (3.8B parameters).
- Embeddings: Nomic Embed Text v1.5 (Matryoshka embeddings, 768 dimensions, 8K context).
First, start the Ollama server and download the LLM and embedding model (download Ollama from here); a sketch follows.
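As a quick sketch (assuming the ollama Python package and the phi3 and nomic-embed-text tags from the Ollama model registry), pulling and smoke-testing both models could look like:

```python
import ollama  # pip install ollama; assumes the Ollama server is already running locally

# Pull the LLM and the embedding model (tags assumed from the Ollama registry)
ollama.pull("phi3")
ollama.pull("nomic-embed-text")

# Smoke-test the LLM
reply = ollama.chat(model="phi3", messages=[{"role": "user", "content": "Say hello"}])
print(reply["message"]["content"])

# Smoke-test the embedding model
emb = ollama.embeddings(model="nomic-embed-text", prompt="hello world")
print(len(emb["embedding"]))  # expect 768 dimensions
```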
Next, create a basic Gradio app using FastAPI to test the LLM and the embedding model; a minimal sketch follows.
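A minimal sketch of that test app (assuming Gradio's ChatInterface and mount_gradio_app, with the Ollama calls from the previous snippet; no retrieval yet):

```python
import gradio as gr
import ollama
from fastapi import FastAPI

app = FastAPI()

def chat(message, history):
    # Forward the user message straight to the local LLM (no retrieval yet)
    reply = ollama.chat(model="phi3", messages=[{"role": "user", "content": message}])
    return reply["message"]["content"]

# Mount the Gradio chat UI on the FastAPI server
demo = gr.ChatInterface(fn=chat)
app = gr.mount_gradio_app(app, demo, path="/chat")

# Run with: uvicorn main:app --port 8808
```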
Ingestion pipeline overview. Create the MinIO buckets (use the mc command or do it from the UI):
- custom-corpus: to store all the documents.
- warehouse: to store all the metadata, chunks, and vector embeddings.
A sketch for creating them follows.
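As a minimal sketch (assuming the minio Python client and a local deployment at localhost:9000 with default minioadmin credentials), the two buckets could be created like this:

```python
from minio import Minio

# Connect to the local MinIO deployment (endpoint and credentials are assumptions)
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

for bucket in ("custom-corpus", "warehouse"):
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
```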
Next, create a webhook that consumes bucket notifications from the custom-corpus bucket. Create MinIO event notifications and link them to the custom-corpus bucket. To create the webhook event in the console, go to Events to add an event destination of type Webhook. Fill the fields with the following values and hit Save: Identifier: doc-webhook; Endpoint: http://localhost:8808/api/v1/document/notification. Click Restart MinIO at the top when prompted. (Note: you can also use mc for this.) A sketch of the receiving endpoint follows.
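On the receiving side, a sketch of that endpoint (the route matches the endpoint configured above; the payload fields assume the S3-style event layout shown earlier):

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/api/v1/document/notification")
async def document_notification(request: Request):
    event = await request.json()
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        event_name = record["eventName"]  # s3:ObjectCreated:Put or s3:ObjectRemoved:Delete
        print(f"{event_name}: {bucket}/{key}")  # chunking logic is added here later
    return {"status": "ok"}
```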
Link the webhook event to the custom-corpus bucket. In the console, go to Buckets (administrator), then to custom-corpus, then to Events. Fill the fields with the following values and hit Save: ARN: select the doc-webhook from the drop-down; Select Events: check PUT and DELETE. (Note: you can also use mc for this.) We now have our first webhook set up; test it by adding and removing an object.
Extract data from the documents and chunk them. We will use LangChain and unstructured to
read an object from MinIO and split documents into multiple chunks. Add the chunking logic to the webhook, save the metadata and chunks to the warehouse bucket, and update the FastAPI server with the new logic; a sketch of the chunking step follows.
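A sketch of that chunking step (assuming the unstructured and LangChain packages and the minio client from earlier; the chunk sizes and the metadata/ key layout are assumptions):

```python
import io, json
from minio import Minio
from unstructured.partition.auto import partition
from langchain.text_splitter import RecursiveCharacterTextSplitter

client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

def chunk_document(bucket: str, key: str) -> None:
    # Read the raw document out of MinIO
    resp = client.get_object(bucket, key)
    raw = io.BytesIO(resp.read())
    resp.close()
    resp.release_conn()

    # Extract text with unstructured, then split it into chunks
    elements = partition(file=raw, metadata_filename=key)
    text = "\n".join(el.text for el in elements if el.text)
    for i, chunk in enumerate(splitter.split_text(text)):
        payload = json.dumps({"source": key, "chunk_id": i,
                              "page_content": chunk}).encode()
        client.put_object("warehouse", f"metadata/{key}/chunk_{i}.json",
                          io.BytesIO(payload), length=len(payload))
```

Writing the chunks under the metadata/ prefix with a .json suffix lines up with the prefix and suffix filters configured for the metadata webhook below.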
Add a new webhook to process the document metadata and chunks. Now that we have the first webhook working, the next step is to get all the chunks with their metadata, generate the embeddings, and store them in the vector database. Create MinIO event notifications and link them to the warehouse bucket. Create the webhook event in the console: go to Events to add an event destination of type Webhook, with the endpoint http://localhost:8808/api/v1/metadata/notification. Click Restart MinIO at the top when prompted. (Note: you can also use mc for this.) A sketch of this embedding webhook follows.
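A sketch of that embedding webhook (reusing the Ollama embedding call; the route and field names carry over the earlier assumptions):

```python
import json
import ollama
from fastapi import FastAPI, Request
from minio import Minio

app = FastAPI()
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

@app.post("/api/v1/metadata/notification")
async def metadata_notification(request: Request):
    event = await request.json()
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        chunk = json.loads(client.get_object("warehouse", key).read())
        # Embed the chunk text with the local embedding model
        emb = ollama.embeddings(model="nomic-embed-text",
                                prompt=chunk["page_content"])
        chunk["vector"] = emb["embedding"]  # 768-dim vector destined for LanceDB
        # ...write `chunk` to the LanceDB table (see the LanceDB sketch below)
    return {"status": "ok"}
```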
Link the webhook event to the warehouse bucket. In the console, go to Buckets (administrator), then to warehouse, then to Events. Fill the fields with the following values and hit Save: ARN: select the metadata webhook from the drop-down; Prefix: metadata; Suffix: .json; Select Events: check PUT and DELETE. (Note: you can also use mc for this.)
We now have our second webhook set up; test it by adding and removing an object in custom-corpus and see if this webhook gets triggered. Create the LanceDB vector database in MinIO. Now that we have the basic webhooks working, let's set up the LanceDB vector database in the MinIO warehouse bucket, in which we will save all the embeddings and additional metadata fields. Add the storing and removing of data from LanceDB to the metadata webhook, add a scheduler that processes data from the queues, and update FastAPI with the vector embedding changes; a LanceDB sketch follows.
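A sketch of that setup (LanceDB reads S3 credentials and the endpoint from environment variables; the exact variable names, bucket path, and table schema here are assumptions):

```python
import os
import lancedb

# Point LanceDB's S3 layer at the local MinIO deployment (values are assumptions)
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"
os.environ["AWS_ENDPOINT"] = "http://localhost:9000"
os.environ["ALLOW_HTTP"] = "true"

db = lancedb.connect("s3://warehouse/v-db")

# Create the table once from a seed record; LanceDB infers the schema from it
table = db.create_table("documents", data=[{
    "source": "seed.txt",     # originating object key
    "chunk_id": 0,            # position of the chunk within the document
    "page_content": "hello",  # raw chunk text
    "vector": [0.0] * 768,    # 768-dim embedding (nomic-embed-text v1.5)
}])

# Subsequent writes from the metadata webhook
table.add([{"source": "doc.pdf", "chunk_id": 1,
            "page_content": "some chunk text", "vector": [0.1] * 768}])
```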
Now that we have the ingestion pipeline working, let's integrate the final RAG pipeline. First, add vector search capability over the documents ingested into LanceDB. Then prompt the LLM to use the relevant documents, and update the FastAPI chat endpoint to use RAG; a sketch of the retrieval-plus-generation flow follows.
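Putting retrieval and generation together, a sketch of the RAG flow (the table name, model tags, and prompt template carry over the earlier assumptions):

```python
import lancedb
import ollama

db = lancedb.connect("s3://warehouse/v-db")  # env vars set as in the LanceDB sketch
table = db.open_table("documents")

def rag_answer(question: str) -> str:
    # Embed the question and retrieve the most similar chunks
    qvec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = table.search(qvec).limit(3).to_list()
    context = "\n\n".join(hit["page_content"] for hit in hits)

    # Ask the LLM to answer using only the retrieved context
    prompt = (f"Answer the question using only this context:\n{context}\n\n"
              f"Question: {question}")
    reply = ollama.chat(model="phi3", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```

The FastAPI chat endpoint (or the Gradio chat function from earlier) can then call rag_answer instead of passing the raw message straight to the LLM.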
Were you able to go through and implement the RAG-based chat with MinIO as the data lake backend? We will in the near future do a webinar on this same topic where we will give you a live demo as we build this RAG-based chat application. RAGs are us. As a developer focused on AI integration at MinIO, I am constantly
exploring how our tools can be seamlessly integrated into modern AI architectures to enhance efficiency and scalability. In this article, we showed you
how to integrate MinIO with Retrieval Augmented Generation, RAG, to build a chat application.
This is just the tip of the iceberg, to give you a boost in your quest to build more unique
use cases for RAG and MinIO. Now you have the building blocks to do it.
Let's do it. If you have any questions on MinIO RAG integration, be sure to reach out to us on
Slack. Thank you for listening to this HackerNoon story, read by Artificial Intelligence.
Visit HackerNoon.com to read, write, learn and publish.