The Good Tech Companies - Let's build a customer support chatbot using RAG and your company's documentation in OpenWebUI
Episode Date: July 9, 2024. This story was originally published on HackerNoon at: https://hackernoon.com/lets-build-a-customer-support-chatbot-using-rag-and-your-companys-documentation-in-openwebui. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai, #openwebui, #ai-tools, #chatbots, #technical-support-chatbot, #gpu-servers, #chatbot-development, #good-company, and more. This story was written by: @hostkey. Learn more about this writer by checking @hostkey's about page, and for more stories, please visit hackernoon.com. OpenWebUI offers a unique opportunity to build fascinating and useful chatbots even without extensive coding experience. In this article, we'll share our journey of creating a technical support chatbot designed to assist our front-line team by answering user questions.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Let's build a customer support chatbot using RAG and your company's documentation in OpenWebUI,
by hostkey.com. OpenWebUI is a comprehensive platform featuring a suite of AI tools:
OpenAI, Ollama, Automatic1111, ComfyUI, Whisper API, custom model training, LangChain-based RAG with ChromaDB,
hybrid BM25 search, web search, and more. While all of this has been available for some time,
documented, and implementable with Python programming knowledge, OpenWebUI offers a
unique opportunity to build fascinating and useful chatbots even without extensive coding experience.
In this article, we'll share our
journey of creating a technical support chatbot designed to assist our frontline team by answering
user questions and eventually becoming a part of our team itself.
> Rent GPU servers with instant deployment, or a server with a custom configuration with professional-grade NVIDIA Tesla A100 / H100 80 GB or A5000 / A4000 cards. GPU servers with gaming RTX 4090 cards are also available.
Starting point: we have user documentation built using Material for MkDocs.
This results in a directory structure containing MD files with markdown formatting.
We also have a deployed OpenWebUI and Ollama setup with the Llama 3 8B Instruct model loaded.
Project goals:
1. Develop a custom chatbot. This chatbot will interact with users and provide information based on our documentation.
2. Convert documentation into a format suitable for LLMs. We need to transform our Markdown documentation into a format that can be efficiently processed by LLMs for retrieval-augmented generation (RAG).
3. Enable data updates and additions. The system should allow for ongoing updates and additions to the vector database containing our documentation.
4. Focus on question answering. The chatbot should primarily function as a question-answering system and avoid engaging in non-IT-related conversations.
5. Provide source links. Whenever possible, the chatbot should link back to the original documentation sources for the information provided.
6. Implement question filtering. We need the ability to configure question restrictions for the chatbot. For example, we might want to prevent it from answering questions based on geographical location.
Naive implementation. Our initial attempt was to simply load our existing documentation in its original Markdown format and use the Llama 3 model without any modifications.
The results, to put it mildly, were disappointing. First, our Markdown files
contain various elements like
image tags, footnotes, code blocks, bold and italic formatting, internal and external links,
icons, and even `**…**` constructions for buttons. All of this extra noise creates
problems when breaking the documentation into chunks for processing. Second, the sentence-transformers all-MiniLM-L6-v2 model,
which OpenWebUI uses by default for representing sentences and paragraphs in a 384-dimensional
vector space (essential for RAG tasks like clustering and semantic search), is primarily
trained on English. We'd like our bot to eventually support other languages as well. Third, while
Llama 3 is an instruct model, it can still be steered into off-topic discussions rather than
focusing on answering user queries. A 70B model might be more suitable, but it requires a GPU
with 40 GB of video memory, whereas Llama 3 8B can run on a GPU with just 8 GB.
While the third issue could potentially be addressed
by creating a custom model (an "agent" in OpenAI terminology), the first two require more
significant workarounds. Here's what we've come up with so far. Step by step: setting up a
technical support chatbot in OpenWebUI. First, we'll convert the documentation into a format suitable for loading into our RAG
system.
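The conversion boils down to a chain of stream-editor substitutions over every .md file. Here's a minimal runnable sketch in bash; the directory names, URL scheme, and sed patterns are illustrative assumptions, not the production script:

```shell
#!/bin/bash
# Illustrative Markdown-cleanup pass; patterns, directory names, and the URL
# scheme are assumptions for demonstration, not the actual production script.
SRC_DIR="docs"
OUT_DIR="i_data"
mkdir -p "$SRC_DIR" "$OUT_DIR"

# Sample input file so the sketch is runnable end to end.
printf '![logo](img/logo.png)\n**Save** your changes\n> internal note\nPlain line\r\n' > "$SRC_DIR/page.md"

find "$SRC_DIR" -name '*.md' | while read -r f; do
  out="$OUT_DIR/$(basename "$f" .md).txt"
  # Hypothetical URL scheme for linking back to the hosted documentation.
  url="https://example.com/documentation/$(basename "$f" .md)/"
  # sed: strip image markup, unwrap **bold**, drop '>' lines;
  # then convert CRLF to LF and squeeze repeated blank lines (cat -s).
  sed -E \
    -e 's/!\[[^]]*\]\([^)]*\)//g' \
    -e 's/\*\*([^*]+)\*\*/\1/g' \
    -e '/^>/d' \
    "$f" \
  | sed -e 's/\r$//' \
  | cat -s > "$out"
  # Append the generated source URL to the end of each processed file.
  echo "$url" >> "$out"
done
```

Each substitution here corresponds to one of the cleanup steps described in the article; the real script handles many more Markdown constructs.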
We've created a powerful bash script called i_text_generator to automate this process.
The script traverses all documentation directories and uses regular expressions within sed, awk,
and perl to remove and replace Markdown markup that's not needed by RAG. Finally, it adds a link to the original documentation hosted at https://hostkey.com
at the end of each document. This script meticulously prepares your documentation
for use with a RAG system in OpenWebUI. Here's a step-by-step summary of its actions:
1. URL generation. It generates a complete URL for each documentation file.
2. Image markup removal. Removes all Markdown markup related to images.
3. Annotation deletion. Strips out all annotations from the text.
4. Button formatting. Transforms Markdown's `**…**` syntax into plain text labels, effectively formatting them as buttons.
5. Heading removal. Deletes lines that begin with `>`, which are likely used for creating an outline or table of contents.
6. Icon removal. Removes any Markdown markup or code that represents icons.
7. Bold text formatting. Removes Markdown's bold text formatting.
8. Link modification. Deletes internal links while preserving external links.
9. Email link formatting. Reformats links to email addresses.
10. Whitespace normalization. Removes extra spaces at the beginning of each line, up to the first character.
11. Line ending conversion. Converts CRLF (Windows line endings) to Unix format (LF).
12. Empty line reduction. Eliminates consecutive empty lines exceeding one.
13. URL appending. Appends the generated URL to the end of each processed file.
After running the script, the i_data directory will contain a set of files ready
for loading into OpenWebUI's RAG system. Next, we need to add a new model to OpenWebUI for working
with our document vector database and the Ollama LLM. This model should support a more casual,
conversational tone, not just in English. We're planning to add support for other languages like Turkish
in the future. 1. To get started, we'll go to the Admin Panel, Settings, Documents. In the Embedding Model field, we'll select sentence-transformers all-MiniLM-L12-v2.
We've tested all the recommended models from this list, https://www.sbert.net/docs/sentence_transformer/pretrained_models.html, and found this one to be the best fit.
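To make the embedding model's role concrete: models in this family map each sentence or paragraph to a 384-dimensional vector, and RAG retrieval ranks document chunks by cosine similarity to the query's vector. A toy Python sketch, with tiny hand-made vectors standing in for real 384-dimensional embeddings:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" standing in for real model output.
chunks = {
    "How to reboot a GPU server": [0.9, 0.1, 0.0],
    "Billing and invoices": [0.1, 0.9, 0.1],
    "Restarting your dedicated server": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "how do I restart my server?"

# Rank chunks by similarity -- this is the "semantic search" step of RAG.
ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
print(ranked)  # the reboot/restart chunks outrank the billing one
```

The Top K parameter discussed below simply takes the first K entries of such a ranking.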
2. We'll click the download icon next to the embedding model field to download and install it.
3. Right away, we'll set up the RAG parameters. Top K = 10: the system will
consider the top 10 most relevant documents when generating a response. Chunk Size = 1024:
documents will be broken down into chunks of 1024 tokens for processing. Chunk Overlap = 100:
there will be a 100-token overlap between consecutive chunks. After that, you can head to the
Workspace, Documents section and upload our documentation. It's a good idea to give it a
specific collection tag, in our case hostkey_en, to make it easier to connect it to the model or API requests later on.
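The Chunk Size / Chunk Overlap parameters describe a sliding window over each document. Here's a character-based Python sketch (OpenWebUI's splitter works on tokens, but the mechanics are the same):

```python
def split_into_chunks(text, chunk_size=1024, overlap=100):
    """Sliding-window splitter: each chunk shares `overlap` units
    with the previous one, so context isn't lost at chunk borders."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Tiny demo with characters instead of tokens.
doc = "abcdefghijklmnopqrstuvwxyz" * 10  # 260 characters
chunks = split_into_chunks(doc, chunk_size=100, overlap=10)
print(len(chunks))                        # 3 chunks
print(chunks[0][-10:] == chunks[1][:10])  # consecutive chunks overlap -> True
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one of the two chunks.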
Next, we'll create a custom model for our chatbot. To do this, we'll go back to workspace,
models and click the plus icon. We'll give our chatbot a name and select the base model,
in our case, llama3:latest. Then, we'll define the system prompt.
This is what tells the chatbot how to see itself and behave. It outlines its role,
limitations, and our desired outcomes. Here's the system prompt we've designed for our tech
support chatbot. Next, we'll connect the necessary document collection. In the knowledge section,
we'll click the Select Documents button and choose the collection we need based on its tag.
We also need to configure some additional parameters hidden under the Advanced Params
tab.
Clicking Show will reveal these settings.
We'll set Temperature to 0.3 and Context Length to 4089.
Finally, we click Save and Update to create our custom tech support chatbot model.
And there you have it!
Our chatbot is ready to work and handle user
requests. It's polite, patient, and available 24/7. Tips for working with RAG in OpenWebUI.
Here are some important tips to keep in mind. 1. If you're working with a large number of
documents in RAG, it's highly recommended to install OpenWebUI with GPU support
(the open-webui:cuda branch). 2. Any modifications to the embedding model,
switching, loading, etc. will require you to re-index your documents into the vector database.
Changing RAG parameters doesn't necessitate this. 3. When adding or removing documents,
always go into your custom model, delete the collection of those documents, and add them back
in. Otherwise,
your search may not work correctly or will be significantly less effective.
4. If your bot is giving incorrect answers even though the documentation containing the necessary information has been uploaded (e.g. in XLSX, PPT, PPTX, or TXT format),
it's best practice to upload documents as plain text for optimal performance.
5. While hybrid search can improve results, it's resource-intensive and can significantly
increase response times (20, 30, 40 seconds or more), even on a powerful GPU.
This is a known issue, and the developers are working on a solution.
Now that we've tested the chatbot, the next step is integrating it into our company's
existing chat system.
OpenWebUI offers an API and can function as a proxy to Ollama, adding its own unique features.
However, the documentation is still lacking, making integration a bit of a challenge.
By examining the code and commit history, we've gleaned some insights into how to structure API
requests, but it's not quite working as expected yet. We've managed to call the custom model,
but without RAG functionality. We're eagerly awaiting the developers' promised features in upcoming releases,
including RAG, web search, and detailed examples and descriptions.
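For the record, here's roughly what our API experiments amount to. The endpoint path, authorization header, and especially the files/collection field below are assumptions pieced together from the code and commit history, not documented behavior:

```python
import json

# Assumed OpenAI-compatible route exposed by a local OpenWebUI instance.
API_URL = "http://localhost:3000/api/chat/completions"
API_KEY = "sk-..."  # placeholder for an OpenWebUI API key

def build_request(question, model="llama3:latest", collection="hostkey_en"):
    """Assemble an OpenAI-style chat request. The 'files' field for
    attaching a RAG collection by tag is our guess from the commit
    history, not a documented part of the API."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Hypothetical: attach the document collection by its tag.
        "files": [{"type": "collection", "id": collection}],
    }
    return headers, json.dumps(payload)

headers, body = build_request("How do I reboot my GPU server?")
print(body)
```

Posting this payload with any HTTP client did reach the custom model in our tests, but, as noted above, the RAG functionality did not yet engage.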
The testing process also revealed some inconsistencies and redundancies in our documentation.
This presents an opportunity to both enhance the chatbot's performance and improve the overall
clarity and accuracy of our documentation.
Thank you for listening to this HackerNoon story,
read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.