When building apps that integrate with large language models (LLMs), you should test how your app handles various LLM failure scenarios. Using the LanguageModelFailurePlugin, Dev Proxy lets you simulate realistic language model failures on any LLM API that your app uses.
Simulate language model failures on any LLM API
To start, enable the LanguageModelFailurePlugin in your configuration file.
{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "urlsToWatch": [
        "https://api.openai.com/*",
        "http://localhost:11434/*"
      ]
    }
  ]
}
With this basic configuration, the plugin randomly selects from all available failure types and applies them to matching language model API requests.
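For example, here's a minimal Python sketch of a client request that Dev Proxy can intercept. It assumes Dev Proxy is already running with the configuration above on its default local address (http://127.0.0.1:8000) and uses the requests library; the model name and the way the API key is read are illustrative, not part of the plugin.

import os
import requests

# Assumption: Dev Proxy listens on its default local address.
DEV_PROXY = "http://127.0.0.1:8000"

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",  # illustrative model name
        "messages": [
            {"role": "user", "content": "How do I create a simple web page?"}
        ],
    },
    # Route the call through Dev Proxy so the plugin can intercept it.
    proxies={"http": DEV_PROXY, "https": DEV_PROXY},
    # Dev Proxy re-signs HTTPS traffic with its own certificate; for a quick local
    # test you can skip verification or point verify= at the Dev Proxy root certificate.
    verify=False,
    timeout=60,
)

# With the plugin enabled, this content exhibits one of the simulated failures
# instead of a real model answer.
print(response.json()["choices"][0]["message"]["content"])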
Configure specific failure scenarios
To test specific failure scenarios, configure the plugin to use particular failure types:
{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "languageModelFailurePlugin",
      "urlsToWatch": [
        "https://api.openai.com/*",
        "http://localhost:11434/*"
      ]
    }
  ],
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "Hallucination",
      "PlausibleIncorrect",
      "BiasStereotyping"
    ]
  }
}
This configuration simulates only hallucinated information, plausible-but-incorrect responses, and biased or stereotypical content.
Test different LLM APIs
You can test different LLM APIs by configuring multiple instances of the plugin with different URL patterns:
{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "openaiFailures",
      "urlsToWatch": [
        "https://api.openai.com/*"
      ]
    },
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "ollamaFailures",
      "urlsToWatch": [
        "http://localhost:11434/*"
      ]
    }
  ],
  "openaiFailures": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": ["Hallucination", "OutdatedInformation"]
  },
  "ollamaFailures": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": ["Overgeneralization", "IncorrectFormatStyle"]
  }
}
Tip
Configure different failure scenarios for different LLM providers to test how your app handles provider-specific behaviors. Name the configSection after the LLM service you're testing to make the configuration easier to understand and maintain.
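As a rough illustration of the two plugin instances above at work, the Python sketch below sends a request to the local Ollama chat endpoint through the same proxy, so the ollamaFailures section applies to it. The model name and prompt are placeholders, and the default Dev Proxy address is assumed.

import requests

DEV_PROXY = "http://127.0.0.1:8000"  # assumed default Dev Proxy address

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # placeholder model name
        "messages": [{"role": "user", "content": "Summarize our return policy."}],
        "stream": False,
    },
    proxies={"http": DEV_PROXY, "https": DEV_PROXY},
    timeout=120,
)

# This response is rewritten by the ollamaFailures instance, so it exhibits
# Overgeneralization or IncorrectFormatStyle behavior.
print(response.json()["message"]["content"])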
Common testing scenarios
Here are some recommended failure combinations for different testing scenarios:
Testing content accuracy
Test how your app handles incorrect or misleading information:
{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "Hallucination",
      "PlausibleIncorrect",
      "OutdatedInformation",
      "ContradictoryInformation"
    ]
  }
}
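What your app should do with such responses depends on your design, but a common pattern is to avoid presenting claims that can't be grounded in your own data. The following self-contained Python sketch shows the idea with a fictional reference list; it's an illustration, not part of the plugin.

def is_grounded(answer: str, reference_facts: list[str]) -> bool:
    """Naive check: the answer must mention at least one trusted reference fact."""
    lowered = answer.lower()
    return any(fact.lower() in lowered for fact in reference_facts)

# Illustrative, fictional ground truth for a support bot.
reference_facts = ["30-day return window", "original receipt required"]

# Example of a simulated hallucinated answer coming back through Dev Proxy.
answer = "Contoso offers lifetime returns with no receipt needed."

if not is_grounded(answer, reference_facts):
    print("Couldn't verify this answer against trusted sources; ask the user to double-check.")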
Testing bias and fairness
Test how your app responds to biased or stereotypical content:
{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "BiasStereotyping",
      "Overgeneralization"
    ]
  }
}
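How you react to such content is app-specific; many apps route model output through a content safety service before showing it. Purely as an illustration, the deliberately naive sketch below flags overgeneralizing phrasing with a keyword check; the phrase list is made up for the example.

OVERGENERALIZING_PHRASES = [
    "all of them are",
    "everyone from",
    "they always",
    "they never",
]

def looks_overgeneralized(answer: str) -> bool:
    """Crude heuristic: flag sweeping claims about groups for human review."""
    lowered = answer.lower()
    return any(phrase in lowered for phrase in OVERGENERALIZING_PHRASES)

simulated_answer = "People from that region are all the same; they never adopt new tools."

if looks_overgeneralized(simulated_answer):
    print("Response flagged for review instead of being shown to the user.")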
Testing instruction following
Test how your app handles responses that don't follow instructions:
{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "FailureFollowInstructions",
      "Misinterpretation",
      "IncorrectFormatStyle"
    ]
  }
}
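If your app asks the model for structured output, the IncorrectFormatStyle failure is a good way to check that a malformed reply doesn't crash it. The sketch below shows one possible guard: parse the reply and fall back (or retry) when it isn't valid JSON. It's a generic pattern, not something the plugin requires.

import json

def parse_reply(reply_text: str) -> dict:
    """Parse a model reply that is expected to be JSON, with a safe fallback."""
    try:
        return json.loads(reply_text)
    except json.JSONDecodeError:
        # Log the raw reply, retry with a stricter prompt, or show a friendly error.
        return {"error": "model_reply_not_json"}

# Example of a simulated reply that ignores the requested JSON format.
print(parse_reply("Sure! Here's your data: name = Ada, role = engineer"))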
Testing response quality
Test how your app handles vague or overly complex responses:
{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "AmbiguityVagueness",
      "OverSpecification",
      "CircularReasoning",
      "FailureDisclaimHedge"
    ]
  }
}
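One way to catch vague or circular answers in tests is a simple quality gate that flags replies which mostly hedge, restate the question, or say very little. The thresholds and phrase list in this sketch are illustrative only; tune them to your app.

HEDGING_PHRASES = ["it depends", "in some cases", "generally speaking", "hard to say"]

def needs_review(question: str, answer: str) -> bool:
    """Flag answers that hedge heavily, restate the question, or say very little."""
    lowered = answer.lower()
    hedge_count = sum(phrase in lowered for phrase in HEDGING_PHRASES)
    restates_question = question.lower().rstrip("?") in lowered
    too_short = len(answer.split()) < 10
    return hedge_count >= 2 or restates_question or too_short

print(needs_review(
    "How do I reset my password?",
    "It depends; in some cases it's generally speaking hard to say.",
))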
Start Dev Proxy with your configuration file and use your app to see how it handles the simulated language model failures. The plugin intercepts responses from language model APIs and replaces them with synthetic failure responses that exhibit the configured failure behaviors.
Create custom failure scenarios
You can create custom failure scenarios by adding .prompty files to the ~appFolder/prompts directory. For example, to create a "technical jargon overuse" failure:
1. Create a file named lmfailure_technical-jargon-overuse.prompty.
2. Define the failure behavior in the .prompty file:

   ---
   name: Technical Jargon Overuse
   model:
     api: chat
   sample:
     scenario: Simulate a response that overuses technical jargon and unnecessarily complex terminology, making simple concepts difficult to understand.
   ---
   user:
   How do I create a simple web page?

   user:
   You are a language model under evaluation. Your task is to simulate incorrect responses.
   {{scenario}}
   Do not try to correct the error. Do not explain or justify the mistakes. The goal is to simulate them as realistically as possible for evaluation purposes.
3. Reference it in your configuration as TechnicalJargonOveruse:

   {
     "languageModelFailurePlugin": {
       "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelfailureplugin.schema.json",
       "failures": [
         "TechnicalJargonOveruse",
         "Hallucination"
       ]
     }
   }
Next step
Learn more about the LanguageModelFailurePlugin.