The Good Tech Companies - Using MinIO to Build a Retrieval Augmented Generation Chat Application

Episode Date: September 18, 2024

This story was originally published on HackerNoon at: https://hackernoon.com/using-minio-to-build-a-retrieval-augmented-generation-chat-application. Building a production-grade RAG application demands a suitable data infrastructure to store, version, process, evaluate, and query chunks of data. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #retrieval-augmented-generation, #minio, #minio-blog, #rag, #modern-datalake, #data-science, #llms, #good-company, and more. This story was written by: @minio. Learn more about this writer by checking @minio's about page, and for more stories, please visit hackernoon.com.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Using MinIO to build a retrieval augmented generation chat application, by MinIO. It's often been said that in the age of AI, data is your moat. To that end, building a production-grade RAG application demands a suitable data infrastructure to store, version, process, evaluate, and query chunks of data that comprise your proprietary corpus. Since MinIO takes a data-first approach to AI, our default initial infrastructure recommendation for a project of this type is to set up a modern data lake, MinIO, and a vector database. While other ancillary tools may need to be plugged in along the way,
Starting point is 00:00:41 these two infrastructure units are foundational. They will serve as the center of gravity for nearly all tasks subsequently encountered in getting your RAG application into production. But you are in a conundrum: you've heard the terms LLM and RAG before, but beyond that you haven't ventured much because of the unknown. Wouldn't it be nice if there was a Hello World or boilerplate app that could help you get started? Don't worry, I was in the same boat. So in this blog, we will demonstrate how to use MinIO to build a retrieval augmented generation, RAG, based chat application using commodity hardware. Use MinIO to store all the documents, processed chunks, and the embeddings using the vector database.
Starting point is 00:01:20 Use MinIO's bucket notification feature to trigger events when adding or removing documents to a bucket. A webhook consumes the event, processes the documents using LangChain, and saves the metadata and chunked documents to a metadata bucket. Trigger MinIO bucket notification events for newly added or removed chunked documents. A webhook consumes those events, generates embeddings, and saves them to the vector database, LanceDB, which is persisted in MinIO. Key tools used.
Starting point is 00:01:50 Minio. Object store to persist all the data. LanceDB. Serverless open source vector database that persists data in object store. Alama. To run LLM and embedding model locally. OpenAI API compatible. Gradio. Interface through which to interact with RAG application. FastAPI, server for the webhooks that receives bucket notification
Starting point is 00:02:12 from Minio and exposes the Gradio app. Langchain and unstructured, to extract useful text from our documents and chunk them for embedding. Models used. LLM, PHY 3-128K. 3. 8B Parameters. Embeddings. NOMIC Embed Text V1. 5. Matryoshka Embeddings. 768 DIMM. 8K Context. Here, start Ollama server plus download LLM and embedding model download Ollama from here. Create a basic GRADIO app using FastAPI to test the model, test embedding model, ingestion pipeline OVERVIEWCREATE MINIO buckets use MCC command or do it from UI custom corpus. To store all the documents. Warehouse. To store all the metadata, chunks and vector embeddings. Create webhook that consumes bucket notifications from custom corpus bucket. Create minio event notifications and link it to custom corpus bucket. Create webhook event in console. Go to events to add event destination to webhook. Fill the fields with following values and hit save identifier. Doc webhook endpoint. http://localhostport8808.api.v1.document.notification.
Starting point is 00:03:31 Click restart minio at the top when prompted to. Note. You can also use mic for this. Link the webhook event to custom corpus bucket events in console. Go to buckets. Administrator. To custom corpus to events fill the fields with following values and hit save arn select the doc webhook from drop down select events check put and delete node you can also use mic for this we have our first webhook setup now test by adding and removing an objectextract data from the documents and c h u n k w e will use langchain and unstructured to read an object from minio and split documents into multiples chunks add the chunking logic to webhook add the chunk logic to webhook and save the metadata and chunks to warehouse bucket
Starting point is 00:04:15 update fast api server with the new logic add new webhook to process document metadata chunks now that we have the first webhook working next step is the get all the chunks with metadata generate the embeddings and store it in the vector database create minio event notifications and link it to warehouse bucket. Create webhook event and console go to events to add event destination to webhook endpoint, http://localhostport8808.api.v1.metadata.notification. Click restart Minio at the top when prompted to. Note. You can also use MIC for this. Link the webhook event to custom corpus bucket events in console. Go to buckets, administrator, to warehouse to events. Fill the fields with following values and hit save ARN.
Starting point is 00:05:07 Select the metadata webhook from drop-down prefix, metadata, suffix. JS on select events, check put and delete, note. You can also use MIC for this. We have our first webhook setup now test by adding and removing an object in custom corpus and see if this webhook gets trig red create lance db vector database in minio. Now that we have the basic webhook working, let's set up the lance db vector databs in minio warehouse bucket in which we will save all the embeddings and additional metadata fields, add storing, removing data from lance db to metadata webhook. Add a scheduler that processes data from queues. Update FastAPI with the vector embedding changes. Now that we have the ingestion pipeline working let's integrate the final RAG
Starting point is 00:05:51 pipeline. Add vector search capability. Now that we have the document ingested into the LANsDB let's add the search capability. Prompt LLM to use the relevant documents. Update FastAPI chat endpoint to use RAG. Were you able to go through and implement RAG-based chat with Minio as the data lake backend? We will in the near future do a webinar on this same topic where we will give you a live demo as we build this RAG-based chat application. Rags are us. As a developer focused on AI integration at Minio, I am constantly exploring how our tools can be seamlessly integrated into modern AI architectures to enhance efficiency and scalability. In this article, we showed you how to integrate MinIO with Retrieval Augmented Generation, RAG, to build a chat application.
Starting point is 00:06:36 This is just the tip of the iceberg, to give you a boost in your quest to build more unique used cases for RAG and Minio. Now you have the building blocks to do it. Let's do it. If you have any questions on Minio RAG integration be sure to reach out to us in Slack. Thank you for listening to this HackerNoon story, read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.
