LLM Application Security Assessment
Testing Prompt Injection, Jailbreaking, Data Extraction, and Trust Boundary Failures in LLM-Powered Applications
LLM-powered applications fail differently than traditional software. Prompt injection — direct and indirect — is the most prevalent vulnerability class, and it bypasses every traditional security control: WAFs do not detect it, SAST cannot find it, and standard penetration tests do not test for it.
This assessment provides a methodical, adversarial evaluation of your LLM application against the OWASP LLM Top 10 (2025). Every finding is demonstrated with a specific prompt sequence and model response — not theoretical risk statements. The output is a technical findings report, an attack surface map of your application's trust boundaries, and architecture-specific remediation guidance.
The assessment tests the application layer — how your product is built on top of the model — not the model provider's safety guardrails. The attack surface is in your system prompts, tool configurations, retrieval pipelines, and output handling.
Who This Is For
Ideal clients for this engagement.
The Problem
What this engagement addresses.
Prompt Injection
The most prevalent LLM security failure. Direct injection overrides system instructions. Indirect injection embeds malicious instructions in retrieved documents, emails, or web pages that the model follows without user awareness.
Invisible to Traditional Testing
Standard penetration tests, WAFs, and static analyzers cannot detect LLM-specific vulnerabilities. Prompt injection is not SQL injection — it operates at the semantic layer, not the syntax layer.
System Prompt Extraction
Attackers can extract system prompts revealing application logic, internal instructions, credentials, API endpoints, and configuration — information that was never intended to be user-visible.
Insecure Output Handling
Model outputs rendered without sanitization can produce XSS, execute SQL, trigger shell commands, or inject content into downstream systems. The model becomes an injection vector.
Excessive Agency
Tools and function calls invoked by the model outside the user's intent or authorization. The model takes actions the application architect did not anticipate.
Data Extraction via Conversation
Training data memorization, RAG document leakage, and multi-turn manipulation sequences that gradually extract sensitive information the application should not expose.
Assessment Coverage
What we test — systematically.
Direct override, persona injection, jailbreaking templates, multi-turn manipulation, indirect injection via retrieved documents, web content, emails, and database records.
System prompt extraction, training data extraction, credential and API key extraction, RAG document leakage across authorization boundaries.
Third-party plugin and tool security assessment. Model provenance verification for self-hosted and fine-tuned models.
Poisoning via user-facing fine-tuning, feedback loops, retrieval index updates, and adversarial examples injected through application interfaces.
HTML/XSS via model output, unsanitized SQL and shell command execution, unsafe downstream system processing, markdown injection.
Tool invocation outside intended scope, parameters beyond authorization, chained tool calls producing unintended impact, authorization boundary testing.
Direct elicitation techniques, indirect reasoning attacks, partial disclosure probing, model completion attacks targeting system prompt content.
Least-privilege review of tool, API, and database access granted to the LLM application. Revocability assessment.
Edge-case false outputs in high-stakes applications (medical, legal, financial). Uncertainty communication assessment.
Rate limiting, token budget abuse, denial of service via expensive prompts, data exfiltration via repeated interactions.
Deliverables
What you receive.
Technical Findings Report
Every finding with OWASP LLM Top 10 category, risk rating, specific prompt sequence, model response, business impact, and architecture-specific remediation guidance. Complete interaction transcript included. Findings classified by responsible layer — system prompt design, input validation, output handling, tool authorization, retrieval access control, or application logic.
Executive Summary
Non-technical summary of overall risk level, top findings with plain-language business impact, trust boundary gaps, and priority remediations. Written for security leadership, product leadership, and board-level audiences.
Attack Surface Map
Structured map of system prompt design, retrieval integration points, tool and function call surface, output handling pipeline, and user interaction model. Annotated with trust assumptions and associated findings. A living document for ongoing security review.
Remediation Guidance & Retest
Defense-in-depth controls per finding: input validation, output parsing, privilege separation, agent sandboxing, sanitization, and tool authorization model changes. Retest of all Critical and High findings within 90 days, documented as a report addendum.
Methodology
How the engagement works.
Architecture Review & Threat Modeling
Week 1
- Application architecture review and system prompt analysis
- Tool and function call surface inventory
- Trust model documentation — what trusts what
- Testing plan development based on architecture
Manual Adversarial Testing
Weeks 1 – 3
- OWASP LLM Top 10 systematic testing
- Direct and indirect prompt injection
- Output handling and downstream impact testing
- Tool authorization and excessive agency testing
- Data extraction and system prompt probing
- Multi-turn manipulation sequences
Reporting, Debrief & Retest
Within 5 business days of test completion
- Technical findings report delivery
- Attack surface map delivery
- Live debrief session with engineering and security teams
- Remediation retest after fixes (within 90 days)
Engagement Tiers
Scoped to your architecture.
Focused
Single LLM application, no tool use or retrieval integration. For single-model chatbots or simple Q&A applications.
- OWASP LLM Top 10 coverage
- Technical findings report
- Executive summary
- Attack surface map
- Remediation retest
Standard
Single LLM application with RAG integration and/or function calling / tool use. Includes indirect prompt injection and tool authorization testing.
- Everything in Focused
- Indirect prompt injection via retrieved content
- Tool authorization and excessive agency testing
- RAG document leakage assessment
Complex
Single LLM application with complex architecture — multi-step reasoning, multiple tool integrations, plugin ecosystem, or customer-facing application with sensitive data and transactional APIs.
- Everything in Standard
- Extended depth across all assessment domains
- Red team objective-based component
- Multi-agent architecture testing available
Prerequisites
- Application access (staging or production as agreed in Rules of Engagement)
- System prompt or application design documentation where available
- API access credentials for the LLM application under test
- Description of intended use cases, user roles, and trust model
Frequently Asked Questions
Common questions.
Does this test the model itself (GPT-4, Claude)?
No. This assesses how your application is built on top of the model — system prompts, tool configurations, retrieval pipelines, output handling, and trust boundaries. The model provider's safety guardrails are not in scope.
What is indirect prompt injection and why is it different from direct injection?
Direct prompt injection is a user deliberately trying to override the system prompt. Indirect prompt injection is attacker-controlled content — a retrieved document, an email, a web page — that contains instructions the model follows when it processes that content. Users never see the injected instructions. It bypasses all user-facing input validation.
Is this safe to run against production?
All testing targets the application's interface only — no direct model provider API calls outside the application context, no exfiltration of real user data, no actions against production systems beyond the designed interaction surface without explicit written authorization. Staging is preferred for applications with real-world action capability.
How is this different from a traditional penetration test?
A traditional penetration test evaluates network, web, and API security using established exploitation techniques. LLM applications introduce entirely new vulnerability classes — prompt injection, jailbreaking, system prompt extraction, excessive agency — that require different testing methodologies, different tools, and different expertise.
Every finding has a proof-of-concept?
Yes. Every finding includes the specific prompt sequence used, the model's response, and the business impact. Engineering teams can reproduce every finding directly. No theoretical risk statements.
Related Offerings
Often paired with this engagement.
Agentic AI Security Review
For multi-agent systems — covers inter-agent trust boundaries, tool authorization across agent chains, memory system security, and human oversight mechanisms.
RAG Pipeline Security Assessment
Deeper coverage of the retrieval infrastructure — vector store access control, ingestion pipeline security, and document corpus integrity.
Secure AI Architecture & Threat Modeling
Design-layer review before the application is built — reference architectures, threat models, and runtime guardrail specifications.
AI Governance Program Build
Governance framework for regulated or customer-facing AI deployments — policies, risk management, approval workflows, and compliance mapping.
API Security Assessment
HTTP-layer security for externally accessible LLM APIs — OWASP API Top 10, authorization, and business logic testing.
Ready to discuss this engagement?
30-minute discovery call. We will discuss your application architecture, your specific concerns, and whether this assessment is the right fit.
