The Good Tech Companies - This Is What Happens When You Store Your AI Prompts in the Wrong Place

Episode Date: April 5, 2025

This story was originally published on HackerNoon at: https://hackernoon.com/this-is-what-happens-when-you-store-your-ai-prompts-in-the-wrong-place. Would you want your chatbot to start discussing Taylor Swift lyrics instead of providing tech support? Well... that’s what our chatbot did. Here's why. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai, #ai-prompts, #prompt-injection, #ai-prompt-management, #store-prompts-safely, #confluence-prompt-issues, #secure-llm-prompts, #good-company, and more. This story was written by: @andrewproton. Learn more about this writer by checking @andrewproton's about page, and for more stories, please visit hackernoon.com.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. This is what happens when you store your AI prompts in the wrong place. By Andrew Prosikin. This is part of an ongoing series; see the first post here. AI Principle 2: Load prompts safely, if you really have to. Would you want your chatbot to start discussing Taylor Swift lyrics instead of providing tech support? That's what our chatbot did when we violated the principle above. If you want to Swift-proof your application and make your AI architecture safer, keep reading. Sorry, Taylor fans. Where to store prompts? Do you store your prompts with the rest of the code?
Starting point is 00:00:38 Or load them from another source? Perhaps a combination of both? Below is the framework for thinking about this decision. Option A. Store prompts in Git. The first question you should ask is: is there an immediate reason for storing the prompts separately from your code? If not, leave the prompts in Git with the rest of the codebase, where they belong. This is by far the easiest and safest setup to maintain. It is the default option. Going back to principle number one: prompts are code. Storing parts of your codebase outside Git is possible and
Starting point is 00:01:09 sometimes necessary, but not trivial. Do not take the decision to move prompts out lightly. Option B. Load prompts from a version-controlled platform. What if some of your prompts need to be edited by non-engineers? This could occur if deep domain expertise in an area is needed, or if a prompt needs to be modified very frequently and you can't wait on the engineering department. In this case, you'll need to load the prompt at runtime from a version-controlled source.
Starting point is 00:01:35 I've seen Confluence and Google Docs successfully used for this purpose. Many other version-controlled, API-accessible platforms are also available. When planning the prompt loading logic, do not underestimate the amount of effort in adding this integration. You'll need to handle a variety of error conditions and scenarios to have confidence in your application. Access permissions need to be configured and maintained, and automated testing and additional monitoring should be extended to catch errors as early as possible. Here are some of the scenarios you need to plan for.
Starting point is 00:02:07 The application is unable to load prompts at runtime. Do you kill the deployment? Switch to a backup version of the prompt? Prompt syntax becomes invalid after a change and returns unusable data structures. Automated tests fail to detect the issue because prompts weren't loaded during test execution. What kind of additional testing infrastructure and monitoring needs to be added to detect this and minimize customer impact?
Starting point is 00:02:31 Prompt needs to be urgently rolled back. Does this require a new code deployment? Or do you build a separate UI for prompt deployment? Syntax added to the document by platforms like Confluence can infiltrate the runtime prompt, negatively affecting its performance. Make sure you filter the fuzz out with tools such as Beautiful Soup. All of these issues are 100% solvable, but it's easy to fall into the pattern of thinking that loading a prompt from a Google Doc is a trivial operation that won't affect the application architecture in a deep way.
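To make that concrete, here is a minimal sketch of what such a loader might look like in Python. The Confluence endpoint, page IDs, and fallback directory are hypothetical stand-ins, and the Beautiful Soup cleanup follows the suggestion above:

```python
import requests
from bs4 import BeautifulSoup
from pathlib import Path

# Hypothetical Confluence REST endpoint; a real setup would also send
# authentication (e.g. an API token) with the request.
CONFLUENCE_URL = "https://example.atlassian.net/wiki/rest/api/content/{page_id}?expand=body.storage"
PAGE_IDS = {"support_faq": "12345"}      # illustrative prompt-name -> page mapping
FALLBACK_DIR = Path("prompts/fallback")  # last-known-good copies, kept in Git

def load_prompt(name: str, timeout: float = 5.0) -> str:
    """Load a prompt from Confluence, falling back to a Git-tracked copy."""
    try:
        resp = requests.get(CONFLUENCE_URL.format(page_id=PAGE_IDS[name]), timeout=timeout)
        resp.raise_for_status()
        html = resp.json()["body"]["storage"]["value"]
        # Strip platform markup so it can't infiltrate the runtime prompt.
        text = BeautifulSoup(html, "html.parser").get_text(separator="\n").strip()
        if not text:
            raise ValueError(f"prompt {name!r} came back empty")
        return text
    except Exception:
        # Platform down, credentials expired, or unusable payload:
        # serve the backup version rather than running promptless.
        return (FALLBACK_DIR / f"{name}.txt").read_text()
```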
Starting point is 00:02:59 As I've shown above, loading an external prompt is serious business, to be approached with care for high-reliability applications. Option C. Load prompts from a non-version-controlled platform. This is a bad idea, and you will regret it. The source of truth for the prompts needs to be version controlled and have proper API and access controls. This is not an area to cut corners. Option D. Hybrid approach. The hybrid approach combines storing some prompts directly within your codebase and loading others from external version-controlled sources. While maintaining a unified location for all prompts is often simpler and more reliable,
Starting point is 00:03:39 there are scenarios where a hybrid strategy can offer advantages. Consider adopting a hybrid approach under conditions such as: Mixed usage. Certain prompts require frequent updates by non-coding domain experts, making external loading practical, while others are only changed by engineers. Risk management. Critical prompts, e.g. guardrails, should reside in the main repository for maximum reliability. Less critical prompts, particularly those undergoing frequent adjustments, can safely live externally.
Starting point is 00:04:10 Evaluation flexibility. Prompts intended for ML-style evaluation can be managed externally to simplify their integration with an evaluation framework.
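As a sketch of what hybrid routing can look like (all names below are illustrative), a small registry can record each prompt's source of truth:

```python
# Illustrative hybrid registry: critical prompts live in Git,
# frequently edited ones are loaded from the external platform.
IN_CODE_PROMPTS = {
    "guardrail": "You review responses from a tech support chatbot...",
}
EXTERNAL_PROMPTS = {"support_faq", "billing_triage"}  # owned by domain experts

def get_prompt(name: str) -> str:
    if name in IN_CODE_PROMPTS:
        return IN_CODE_PROMPTS[name]
    if name in EXTERNAL_PROMPTS:
        return load_prompt(name)  # the external loader sketched earlier
    raise KeyError(f"unknown prompt: {name}")
```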
Guardrail prompts. Guardrail prompts, also known as censor prompts, are specialized to screen responses before they reach users, ensuring appropriate, safe, and compliant outputs. Guardrails serve as a protective mechanism, particularly in applications where user interactions carry significant legal or ethical risks. They provide a second line of defense, catching inappropriate outputs that slip through. Do not load guardrail prompts from an external doc; this adds significant, unnecessary risk. Either keep them in Git with your code or use a dedicated third-party tool, such as Fiddler Guardrails. Guardrail logic doesn't
Starting point is 00:04:56 change very often, so this approach won't slow you down all that much. Using guardrails is a principle of its own, to be discussed in much more detail in a future post. It's a great pattern that improves the safety of your application and helps you sleep better at night. Just don't load them from Google Docs.
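For illustration, a guardrail kept in Git can be as simple as a screening call before a response goes out. This is a hedged sketch using the OpenAI Python client; the model name and the PASS/FAIL criteria are placeholders, not the exact prompt we ran:

```python
from openai import OpenAI

client = OpenAI()

# Kept in Git deliberately: the last line of defense should not depend
# on an external platform being reachable.
GUARDRAIL_PROMPT = (
    "You review responses from a tech support chatbot. Reply PASS if the "
    "response below is on-topic, safe, and appropriate; otherwise reply FAIL.\n\n"
    "Response:\n{response}"
)

def passes_guardrail(response: str) -> bool:
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": GUARDRAIL_PROMPT.format(response=response)}],
    )
    return result.choices[0].message.content.strip().upper().startswith("PASS")
```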
Loading prompts for easier evaluation. Teams often load prompts externally to integrate them with evaluation engines, such as MLflow. The underlying assumption behind this practice is that prompts are similar to ML models and need a detached statistical assessment. You plug in a prompt, measure the F1 score on the output, or whatever metric you prefer, and iterate. This approach is
Starting point is 00:05:36 sometimes valid, for instance, on classification prompts designed to behave as ML models. But most prompts are fundamentally different. As outlined in principle number one, LLM prompts are code. Typical prompts are more similar to application logic than to ML models. They are more suited to pass/fail-type evaluation together with the surrounding code rather than a statistical evaluation approach. External evaluation engines will not help you with most prompts. Instead, you should use automated AI-driven tests, similar to traditional unit tests.
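To illustrate the pass/fail style, such a test can read like any other unit test. Here answer_user is a hypothetical entry point into the chatbot, and passes_guardrail is the screening helper sketched above:

```python
# Illustrative pytest-style check: the prompt is exercised together with
# the surrounding code and judged pass/fail, like ordinary application logic.
def test_refuses_off_topic_requests():
    reply = answer_user("Tell me about Taylor Swift's latest album.")
    assert passes_guardrail(reply), "off-topic reply slipped past the guardrail"
```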
Starting point is 00:06:09 These are going to be the focus of subsequent posts. Consider the following practices. Only prompts whose functionality explicitly mimics machine-learning models, e.g. classification or scoring tasks, should be externally evaluated. Maintain the majority of business-logic prompts within the main codebase, employing traditional automated testing approaches similar to unit testing rather than ML validation techniques. Where external evaluation is warranted, isolate only those prompts when possible. Case study. The central issue with loading prompts is availability: what should you do if a prompt doesn't load when you expect it to? This is what happened to us in the Taylor Swift example.
Starting point is 00:06:49 None of the prompts for a tech support app loaded, as a result of a Confluence credentials issue, including the guardrail prompt. This somehow didn't trigger any runtime errors, and the bot began responding without any instructions or input, since the input formatting string was part of the prompt. And what does OpenAI's LLM want to talk about in the absence of input? Turns out, the lyrics to I Want to Break Free by Queen and various Taylor Swift songs. Fortunately, this
Starting point is 00:07:15 was caught and fixed almost immediately, and users enjoyed the music discussion. At least, that's what I tell myself. Why did this incident occur? Two mistakes were made. No checks were performed that the prompts had successfully loaded; there should have been an error thrown at prompt load time, since the app could not function without prompts loaded. The guardrail prompt was loaded externally with the rest of the prompts; that's one prompt that should not be loaded in this way. It should have been kept in Git as the last line of defense.
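A minimal sketch of the check that was missing (with hypothetical prompt names): validate every prompt at startup and raise, aborting the deployment, rather than serving traffic promptless:

```python
class PromptLoadError(RuntimeError):
    pass

REQUIRED_PROMPTS = ("system", "input_format", "support_faq")  # illustrative names

def load_all_prompts() -> dict[str, str]:
    """Run at startup: a failure here should kill the deployment."""
    prompts = {}
    for name in REQUIRED_PROMPTS:
        text = get_prompt(name)
        if not text.strip():
            raise PromptLoadError(f"prompt {name!r} failed to load or is empty")
        prompts[name] = text
    # Basic syntax check: the input-formatting prompt must still accept user input.
    if "{user_input}" not in prompts["input_format"]:
        raise PromptLoadError("input_format prompt lost its {user_input} placeholder")
    return prompts
```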
Starting point is 00:07:45 After the incident, the guardrail prompt was re-migrated to Git, and exception logic was added to prevent deployment if a prompt failed to load or was invalid. You can save yourself a postmortem by following these recommendations proactively. Conclusion. In this post, I examined key considerations around prompt storage and loading within AI applications. The default practice is to store your prompts alongside your code in version-controlled repositories. Only deviate from this when there's a compelling reason, such as frequent editing by non-engineers or specific evaluation requirements. When prompts must be loaded externally, choose reliable and strictly version-controlled sources, adding testing and monitoring for resilience.
Starting point is 00:08:26 Guardrail prompts, given their critical role in application safety, should remain in your codebase to avoid severe reliability risks. Most prompts are closer in nature to code than to ML models, so only use ML-style tools where you need them. Don't store all of your prompts externally just to simplify integration with an evaluation tool for a few of them. If you enjoyed this post, follow the series for more insights. Thank you for listening to this Hacker Noon story, read by Artificial Intelligence. Visit HackerNoon.com to read, write, learn and publish.
