no-training-data-exposure
Prevents user data from being sent to LLM training endpoints.
📊 Rule Details
| Property | Value |
|---|---|
| Type | problem |
| Severity | 🟡 HIGH |
| OWASP LLM | LLM03: Training Data Poisoning |
| CWE | CWE-359: Privacy Violation |
| CVSS | 7.0 |
| Config Default | warn (recommended), error (strict) |
🔍 What This Rule Detects
This rule identifies code patterns where user data might be sent to LLM training endpoints or when training data collection is enabled.
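To make the idea concrete, here is a minimal sketch of the kind of name-based heuristic described above. It is not the rule's actual implementation: it inspects plain values rather than an AST, and the helper names (`normalize`, `matchesTrainingPattern`, `findEnabledTrainingFlags`) are hypothetical. The pattern list mirrors the defaults shown in the Options table.

```javascript
// Default patterns as listed in the Options table.
const DEFAULT_PATTERNS = ['train', 'training', 'finetune', 'feedback'];

// Lowercase and drop separators so "fine-tune" and "fine_tune" both match "finetune".
function normalize(text) {
  return String(text).toLowerCase().replace(/[^a-z0-9]/g, '');
}

// True when the normalized text contains any normalized pattern as a substring.
function matchesTrainingPattern(text, patterns = DEFAULT_PATTERNS) {
  const normalized = normalize(text);
  return patterns.some((pattern) => normalized.includes(normalize(pattern)));
}

// Returns the keys that look training-related and are set to the literal `true`.
function findEnabledTrainingFlags(obj, patterns = DEFAULT_PATTERNS) {
  return Object.entries(obj)
    .filter(([key, value]) => value === true && matchesTrainingPattern(key, patterns))
    .map(([key]) => key);
}
```

Substring matching of this kind over-matches (for example, `constraint` contains `train`), so a real implementation would likely need tighter matching or an allowlist.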
❌ Incorrect Code
// Training enabled
const config = {
  training: true,
};
// Allow training flag
const options = {
  allowTraining: true,
};
// Training endpoint
fetch('https://api.openai.com/v1/fine-tune');
✅ Correct Code
// Training disabled
const config = {
  training: false,
};
// No training endpoint
await generateText({
  model: openai('gpt-4'),
  prompt: userInput,
});
⚙️ Options
| Option | Type | Default | Description |
|---|---|---|---|
| trainingPatterns | string[] | ['train', 'training', 'finetune', 'feedback'] | Patterns suggesting training |
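As a rough illustration of how this option might be set, the fragment below overrides `trainingPatterns` in a rules map. The `llm-security/` prefix is a hypothetical namespace (the plugin name is not given here), the extra `rlhf` pattern is purely illustrative, and the object-shaped options argument follows the usual ESLint convention rather than anything confirmed for this rule.

```javascript
// Hypothetical rule prefix "llm-security/"; use the namespace your plugin registers.
const rules = {
  'llm-security/no-training-data-exposure': [
    'error', // the "strict" level from the Rule Details table
    {
      // Defaults from the Options table plus one project-specific pattern (illustrative).
      trainingPatterns: ['train', 'training', 'finetune', 'feedback', 'rlhf'],
    },
  ],
};
```

The `rules` object would then be placed under the `rules` key of an ESLint configuration entry.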
🛡️ Why This Matters
Exposing user data to training can lead to:
- Privacy violations - user data is used for training without consent
- Data poisoning - malicious input taints the model
- Compliance violations - potential breaches of GDPR, CCPA, and similar regulations
- IP leakage - proprietary information is exposed
Known False Negatives
The following patterns are not detected due to static analysis limitations:
Environment-Based Training Flags
Why: Environment variables are resolved at runtime, so the rule cannot see their values.
// ❌ NOT DETECTED - Training from env
const options = { training: process.env.ENABLE_TRAINING };
Mitigation: Hardcode training: false. Never derive training flags from environment variables.
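One further reason to avoid env-driven flags is that environment variables are always strings, so `ENABLE_TRAINING=false` still yields a truthy value. The sketch below shows that pitfall and one possible way to apply the mitigation; `buildLlmOptions` is a hypothetical helper, and whether a given SDK honors a `training` option is an assumption.

```javascript
// What `ENABLE_TRAINING=false` actually produces: the string "false", which is truthy.
const fromEnv = { training: 'false' };
const looksEnabled = Boolean(fromEnv.training); // true, despite the intent

// Hypothetical helper: ignore the environment and always return training: false.
function buildLlmOptions(env = process.env) {
  if (env.ENABLE_TRAINING !== undefined) {
    // Surface the stale variable instead of silently honoring it.
    console.warn('ENABLE_TRAINING is set but ignored; training stays disabled.');
  }
  return { training: false };
}
```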
Training Endpoints in Config
Why: Endpoints loaded from config files are not visible to static analysis.
// ❌ NOT DETECTED - Endpoint from config
fetch(config.apiEndpoint); // May be a fine-tune endpoint
Mitigation: Review API configurations for training endpoints.
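Because the URL only exists at runtime, a manual review could be complemented by a runtime guard. The following is a minimal sketch under stated assumptions: `assertNotTrainingEndpoint` and the `TRAINING_PATH_HINTS` list are hypothetical, and the hints are illustrative rather than a complete inventory of any provider's training routes.

```javascript
// Illustrative path fragments (compared after stripping separators); extend per provider.
const TRAINING_PATH_HINTS = ['finetune', 'finetuning', 'training'];

// Throws if the URL's path looks like a training endpoint; otherwise returns the URL unchanged.
function assertNotTrainingEndpoint(rawUrl) {
  const { pathname } = new URL(rawUrl);
  const normalized = pathname.toLowerCase().replace(/[^a-z0-9]/g, '');
  const hit = TRAINING_PATH_HINTS.find((hint) => normalized.includes(hint));
  if (hit) {
    throw new Error(`Refusing to call training-like endpoint: ${pathname}`);
  }
  return rawUrl;
}
```

A call site would then read `fetch(assertNotTrainingEndpoint(config.apiEndpoint))`, so a misconfigured endpoint fails loudly instead of silently sending data.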
Implicit Training via SDK Options
Why: SDK defaults that enable training are not visible in the calling code.
// ❌ NOT DETECTED - SDK defaults to training
const client = new AIClient(); // training: true by default
Mitigation: Explicitly set training: false in all SDK configs.
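One way to make the explicit setting hard to forget is to route every client construction through a small wrapper. This is only a sketch: `withTrainingDisabled` is a hypothetical helper, `AIClient` is the placeholder SDK from the example above, and it assumes the SDK reads a `training` option at all.

```javascript
// Hypothetical helper: forces training off regardless of caller input or SDK defaults.
function withTrainingDisabled(options = {}) {
  // Spread first so the explicit `training: false` always wins.
  return { ...options, training: false };
}

// `AIClient` stands in for whatever SDK is in use; it is not a real package.
// const client = new AIClient(withTrainingDisabled({ model: 'example-model' }));
```

With the flag written out as a literal, the setting is also visible to static analysis rather than hidden in an SDK default.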