The Good Tech Companies - Let's build a customer support chatbot using RAG and your company's documentation in OpenWebUI
Episode Date: July 9, 2024. This story was originally published on HackerNoon at: https://hackernoon.com/lets-build-a-customer-support-chatbot-using-rag-and-your-companys-documentation-in-openwebui. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai, #openwebui, #ai-tools, #chatbots, #technical-support-chatbot, #gpu-servers, #chatbot-development, #good-company, and more. This story was written by: @hostkey. Learn more about this writer by checking @hostkey's about page, and for more stories, please visit hackernoon.com. OpenWebUI offers a unique opportunity to build fascinating and useful chatbots even without extensive coding experience. In this article, we'll share our journey of creating a technical support chatbot designed to assist our front-line team by answering user questions.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Let's build a customer support chatbot using RAG and your company's documentation in OpenWebUI,
by hostkey.com. OpenWebUI is a comprehensive platform featuring a suite of AI tools:
OpenAI, Ollama, Automatic1111, ComfyUI, Whisper API, custom model training, LangChain-based RAG with ChromaDB,
hybrid BM25 search, web search, and more. While all of this has been available for some time,
documented, and implementable with Python programming knowledge, OpenWebUI offers a
unique opportunity to build fascinating and useful chatbots even without extensive coding experience.
In this article, we'll share our
journey of creating a technical support chatbot designed to assist our frontline team by answering
user questions and eventually becoming a part of our team itself.
> Rent GPU servers with instant deployment, or a server with a custom configuration with professional-grade NVIDIA Tesla A100 / H100 80 GB or A5000 / A4000 cards. GPU servers with gaming RTX 4090 cards are also available.
Starting point: we have user documentation built using Material for MkDocs.
This results in a directory structure containing MD files with markdown formatting.
We also have a deployed OpenWebUI and Ollama setup with the Llama 3 8B Instruct model loaded.
Project goals:
1. Develop a custom chatbot. This chatbot will interact with users and provide information based on our documentation.
2. Convert documentation into a format suitable for LLMs. We need to transform our Markdown documentation into a format that can be efficiently processed by LLMs for retrieval-augmented generation (RAG).
3. Enable data updates and additions. The system should allow for ongoing updates and additions to the vector database containing our documentation.
4. Focus on question answering. The chatbot should primarily function as a question-answering system and avoid engaging in non-IT-related conversations.
5. Provide source links. Whenever possible, the chatbot should link back to the original documentation sources for the information provided.
6. Implement question filtering. We need the ability to configure question restrictions for the chatbot. For example, we might want to prevent it from answering questions based on geographical location.
Naive implementation. Our initial attempt was to simply load our existing documentation in its original Markdown format and use the Llama 3 model without any modifications.
The results, to put it mildly, were disappointing. First, our Markdown files
contain various elements like
image tags, footnotes, code blocks, bold and italic formatting, internal and external links,
icons, and even `**…**` constructions for buttons. All of this extra noise creates
problems when breaking the documentation into chunks for processing. Second, the sentence-transformers all-MiniLM-L6-v2 model,
which OpenWebUI uses by default for representing sentences and paragraphs in a 384-dimensional
vector space (essential for RAG tasks like clustering and semantic search), is primarily
trained on English. We'd like our bot to eventually support other languages as well. Third, while
Llama 3 is an instruct model, it can still be steered into off-topic discussions rather than
focusing on answering user queries. A 70B model might be more suitable, but it requires a GPU
with 40 GB of video memory, whereas Llama 3 8B can run on a GPU with just 8 GB.
While the third issue could potentially be addressed
by creating a custom model (an "agent" in OpenAI terminology), the first two require more
significant workarounds. Here's what we've come up with so far. Step by step: setting up a
technical support chatbot in OpenWebUI. First, we'll convert the documentation into a format suitable for loading into our RAG
system.
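The conversion boils down to a chain of stream-editor substitutions over every .md file. Here's a minimal runnable sketch in bash; the directory names, URL scheme, and sed patterns are illustrative assumptions, not the production script:

```shell
#!/bin/bash
# Illustrative Markdown-cleanup pass; patterns, directory names, and the URL
# scheme are assumptions for demonstration, not the actual production script.
SRC_DIR="docs"
OUT_DIR="i_data"
mkdir -p "$SRC_DIR" "$OUT_DIR"

# Sample input file so the sketch is runnable end to end.
printf '![logo](img/logo.png)\n**Save** your changes\n> internal note\nPlain line\r\n' > "$SRC_DIR/page.md"

find "$SRC_DIR" -name '*.md' | while read -r f; do
  out="$OUT_DIR/$(basename "$f" .md).txt"
  # Hypothetical URL scheme for linking back to the hosted documentation.
  url="https://example.com/documentation/$(basename "$f" .md)/"
  # sed: strip image markup, unwrap **bold**, drop '>' lines;
  # then convert CRLF to LF and squeeze repeated blank lines (cat -s).
  sed -E \
    -e 's/!\[[^]]*\]\([^)]*\)//g' \
    -e 's/\*\*([^*]+)\*\*/\1/g' \
    -e '/^>/d' \
    "$f" \
  | sed -e 's/\r$//' \
  | cat -s > "$out"
  # Append the generated source URL to the end of each processed file.
  echo "$url" >> "$out"
done
```

Each substitution here corresponds to one of the cleanup steps described in the article; the real script handles many more Markdown constructs.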
We've created a powerful bash script called i_text_generator to automate this process.
The script traverses all documentation directories and uses regular expressions within sed, awk,
and perl to remove and replace Markdown markup that's not needed by RAG. Finally, it adds a link to the original documentation hosted at https://hostkey.com
at the end of each document. This script meticulously prepares your documentation
for use with a RAG system in OpenWebUI. Here's a step-by-step summary of its actions:
1. URL generation. It generates a complete URL for each documentation file.
2. Image markup removal. Removes all Markdown markup related to images.
3. Annotation deletion. Strips out all annotations from the text.
4. Button formatting. Transforms Markdown's `**…**` syntax into plain text labels, effectively formatting them as buttons.
5. Heading removal. Deletes lines that begin with `>`, which are likely used for creating an outline or table of contents.
6. Icon removal. Removes any Markdown markup or code that represents icons.
7. Bold text formatting. Removes Markdown's bold text formatting.
8. Link modification. Deletes internal links while preserving external links.
9. Email link formatting. Reformats links to email addresses.
10. Whitespace normalization. Removes extra spaces at the beginning of each line, up to the first character.
11. Line ending conversion. Converts CRLF (Windows line endings) to Unix format (LF).
12. Empty line reduction. Eliminates consecutive empty lines exceeding one.
13. URL appending. Appends the generated URL to the end of each processed file.
After running the script, the i_data directory will contain a set of files ready
for loading into OpenWebUI's RAG system. Next, we need to add a new model to OpenWebUI for working
with our document vector database and the Ollama LLM. This model should support a more casual,
conversational tone, not just in English. We're planning to add support for other languages like Turkish
in the future. 1. To get started, we'll go to the Admin Panel, Settings, Documents. In the Embedding Model field, we'll select sentence-transformers all-MiniLM-L12-v2.
We've tested all the recommended models from this list, https://www.sbert.net/docs/sentence_transformer/pretrained_models.html, and found this one to be the best fit.
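To make the embedding model's role concrete: models in this family map each sentence or paragraph to a 384-dimensional vector, and RAG retrieval ranks document chunks by cosine similarity to the query's vector. A toy Python sketch, with tiny hand-made vectors standing in for real 384-dimensional embeddings:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" standing in for real model output.
chunks = {
    "How to reboot a GPU server": [0.9, 0.1, 0.0],
    "Billing and invoices": [0.1, 0.9, 0.1],
    "Restarting your dedicated server": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "how do I restart my server?"

# Rank chunks by similarity -- this is the "semantic search" step of RAG.
ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
print(ranked)  # the reboot/restart chunks outrank the billing one
```

The Top K parameter discussed below simply takes the first K entries of such a ranking.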
2. We'll click the download icon next to the embedding model field to download and install it.
3. Right away, we'll set up the RAG parameters. Top K = 10: the system will
consider the top 10 most relevant documents when generating a response. Chunk Size = 1024:
documents will be broken down into chunks of 1024 tokens for processing. Chunk Overlap = 100:
there will be a 100-token overlap between consecutive chunks. After that, you can head to the
Workspace, Documents section and upload our documentation. It's a good idea to give it a
specific collection tag, in our case hostkey_en, to make it easier to connect it to the model or API requests later on.
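The Chunk Size / Chunk Overlap parameters describe a sliding window over each document. Here's a character-based Python sketch (OpenWebUI's splitter works on tokens, but the mechanics are the same):

```python
def split_into_chunks(text, chunk_size=1024, overlap=100):
    """Sliding-window splitter: each chunk shares `overlap` units
    with the previous one, so context isn't lost at chunk borders."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Tiny demo with characters instead of tokens.
doc = "abcdefghijklmnopqrstuvwxyz" * 10  # 260 characters
chunks = split_into_chunks(doc, chunk_size=100, overlap=10)
print(len(chunks))                        # 3 chunks
print(chunks[0][-10:] == chunks[1][:10])  # consecutive chunks overlap -> True
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one of the two chunks.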
Next, we'll create a custom model for our chatbot. To do this, we'll go back to workspace,
models and click the plus icon. We'll give our chatbot a name and select the base model,
in our case, llama3:latest. Then, we'll define the system prompt.
This is what tells the chatbot how to see itself and behave. It outlines its role,
limitations, and our desired outcomes. Here's the system prompt we've designed for our tech
support chatbot. Next, we'll connect the necessary document collection. In the knowledge section,
we'll click the Select Documents button and choose the collection we need based on its tag.
We also need to configure some additional parameters hidden under the Advanced Params
tab.
Clicking Show will reveal these settings.
We'll set Temperature to 0.3 and Context Length to 4089.
Finally, we click Save and Update to create our custom tech support chatbot model.
And there you have it!
Our chatbot is ready to work and handle user
requests. It's polite, patient, and available 24/7. Tips for working with RAG in OpenWebUI.
Here are some important tips to keep in mind. 1. If you're working with a large number of
documents in RAG, it's highly recommended to install OpenWebUI with GPU support
(the open-webui:cuda branch). 2. Any modifications to the embedding model,
switching, loading, etc. will require you to re-index your documents into the vector database.
Changing RAG parameters doesn't necessitate this. 3. When adding or removing documents,
always go into your custom model, delete the collection of those documents, and add them back
in. Otherwise,
your search may not work correctly or will be significantly less effective.
4. If your bot is giving incorrect answers even though the documentation containing the necessary information has been uploaded (e.g. in XLSX, PPT, PPTX, or TXT format),
it's best practice to upload documents as plain text for optimal performance.
5. While hybrid search can improve results, it's resource-intensive and can significantly
increase response times (20, 30, 40 seconds or more), even on a powerful GPU.
This is a known issue, and the developers are working on a solution.
Now that we've tested the chatbot, the next step is integrating it into our company's
existing chat system.
OpenWebUI offers an API and can function as a proxy to Ollama, adding its own unique features.
However, the documentation is still lacking, making integration a bit of a challenge.
By examining the code and commit history, we've gleaned some insights into how to structure API
requests, but it's not quite working as expected yet. We've managed to call the custom model,
but without RAG functionality. We're eagerly awaiting the developers' promised features in upcoming releases,
including RAG, web search, and detailed examples and descriptions.
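For the record, here's roughly what our API experiments amount to. The endpoint path, authorization header, and especially the files/collection field below are assumptions pieced together from the code and commit history, not documented behavior:

```python
import json

# Assumed OpenAI-compatible route exposed by a local OpenWebUI instance.
API_URL = "http://localhost:3000/api/chat/completions"
API_KEY = "sk-..."  # placeholder for an OpenWebUI API key

def build_request(question, model="llama3:latest", collection="hostkey_en"):
    """Assemble an OpenAI-style chat request. The 'files' field for
    attaching a RAG collection by tag is our guess from the commit
    history, not a documented part of the API."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Hypothetical: attach the document collection by its tag.
        "files": [{"type": "collection", "id": collection}],
    }
    return headers, json.dumps(payload)

headers, body = build_request("How do I reboot my GPU server?")
print(body)
```

Posting this payload with any HTTP client did reach the custom model in our tests, but, as noted above, the RAG functionality did not yet engage.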
The testing process also revealed some inconsistencies and redundancies in our documentation.
This presents an opportunity to both enhance the chatbot's performance and improve the overall
clarity and accuracy of our documentation.
Thank you for listening to this HackerNoon story,
read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.