Deep Layer Security Advisory
AI SecurityAssessment2 – 5 Weeks

LLM Application Security Assessment

Testing Prompt Injection, Jailbreaking, Data Extraction, and Trust Boundary Failures in LLM-Powered Applications

LLM-powered applications fail differently than traditional software. Prompt injection — direct and indirect — is the most prevalent vulnerability class, and it bypasses every traditional security control: WAFs do not detect it, SAST cannot find it, and standard penetration tests do not test for it.

This assessment provides a methodical, adversarial evaluation of your LLM application against the OWASP LLM Top 10 (2025). Every finding is demonstrated with a specific prompt sequence and model response — not theoretical risk statements. The output is a technical findings report, an attack surface map of your application's trust boundaries, and architecture-specific remediation guidance.

The assessment tests the application layer — how your product is built on top of the model — not the model provider's safety guardrails. The attack surface is in your system prompts, tool configurations, retrieval pipelines, and output handling.

OWASP LLM Top 10 (2025)MITRE ATLASNIST AI RMFISO 42001

Who This Is For

Ideal clients for this engagement.

Organizations shipping customer-facing chatbots, copilots, or AI assistants before independent security review
Internal knowledge assistants built on LLMs that access sensitive organizational data
Applications with retrieval-augmented generation (RAG) where documents from multiple access levels are in the corpus
LLM applications with function calling or tool use — code execution, database queries, email, API calls
Autonomous agents with real-world action capability going to production
Code generation tools integrated into development workflows
Any LLM-powered application that has not had specialized LLM security testing

The Problem

What this engagement addresses.

Prompt Injection

The most prevalent LLM security failure. Direct injection overrides system instructions. Indirect injection embeds malicious instructions in retrieved documents, emails, or web pages that the model follows without user awareness.

Invisible to Traditional Testing

Standard penetration tests, WAFs, and static analyzers cannot detect LLM-specific vulnerabilities. Prompt injection is not SQL injection — it operates at the semantic layer, not the syntax layer.

System Prompt Extraction

Attackers can extract system prompts revealing application logic, internal instructions, credentials, API endpoints, and configuration — information that was never intended to be user-visible.

Insecure Output Handling

Model outputs rendered without sanitization can produce XSS, execute SQL, trigger shell commands, or inject content into downstream systems. The model becomes an injection vector.

Excessive Agency

Tools and function calls invoked by the model outside the user's intent or authorization. The model takes actions the application architect did not anticipate.

Data Extraction via Conversation

Training data memorization, RAG document leakage, and multi-turn manipulation sequences that gradually extract sensitive information the application should not expose.

Assessment Coverage

What we test — systematically.

LLM01: Prompt Injection

Direct override, persona injection, jailbreaking templates, multi-turn manipulation, indirect injection via retrieved documents, web content, emails, and database records.

LLM02: Sensitive Information Disclosure

System prompt extraction, training data extraction, credential and API key extraction, RAG document leakage across authorization boundaries.

LLM03: Supply Chain

Third-party plugin and tool security assessment. Model provenance verification for self-hosted and fine-tuned models.

LLM04: Data and Model Poisoning

Poisoning via user-facing fine-tuning, feedback loops, retrieval index updates, and adversarial examples injected through application interfaces.

LLM05: Insecure Output Handling

HTML/XSS via model output, unsanitized SQL and shell command execution, unsafe downstream system processing, markdown injection.

LLM06: Excessive Agency

Tool invocation outside intended scope, parameters beyond authorization, chained tool calls producing unintended impact, authorization boundary testing.

LLM07: System Prompt Confidentiality

Direct elicitation techniques, indirect reasoning attacks, partial disclosure probing, model completion attacks targeting system prompt content.

LLM08: Excessive Permissions

Least-privilege review of tool, API, and database access granted to the LLM application. Revocability assessment.

LLM09: Misinformation

Edge-case false outputs in high-stakes applications (medical, legal, financial). Uncertainty communication assessment.

LLM10: Unbounded Consumption

Rate limiting, token budget abuse, denial of service via expensive prompts, data exfiltration via repeated interactions.

Deliverables

What you receive.

01

Technical Findings Report

Every finding with OWASP LLM Top 10 category, risk rating, specific prompt sequence, model response, business impact, and architecture-specific remediation guidance. Complete interaction transcript included. Findings classified by responsible layer — system prompt design, input validation, output handling, tool authorization, retrieval access control, or application logic.

02

Executive Summary

Non-technical summary of overall risk level, top findings with plain-language business impact, trust boundary gaps, and priority remediations. Written for security leadership, product leadership, and board-level audiences.

03

Attack Surface Map

Structured map of system prompt design, retrieval integration points, tool and function call surface, output handling pipeline, and user interaction model. Annotated with trust assumptions and associated findings. A living document for ongoing security review.

04

Remediation Guidance & Retest

Defense-in-depth controls per finding: input validation, output parsing, privilege separation, agent sandboxing, sanitization, and tool authorization model changes. Retest of all Critical and High findings within 90 days, documented as a report addendum.

Methodology

How the engagement works.

1

Architecture Review & Threat Modeling

Week 1

  • Application architecture review and system prompt analysis
  • Tool and function call surface inventory
  • Trust model documentation — what trusts what
  • Testing plan development based on architecture
2

Manual Adversarial Testing

Weeks 1 – 3

  • OWASP LLM Top 10 systematic testing
  • Direct and indirect prompt injection
  • Output handling and downstream impact testing
  • Tool authorization and excessive agency testing
  • Data extraction and system prompt probing
  • Multi-turn manipulation sequences
3

Reporting, Debrief & Retest

Within 5 business days of test completion

  • Technical findings report delivery
  • Attack surface map delivery
  • Live debrief session with engineering and security teams
  • Remediation retest after fixes (within 90 days)

Engagement Tiers

Scoped to your architecture.

Focused

Single LLM application, no tool use or retrieval integration. For single-model chatbots or simple Q&A applications.

  • OWASP LLM Top 10 coverage
  • Technical findings report
  • Executive summary
  • Attack surface map
  • Remediation retest

Standard

Single LLM application with RAG integration and/or function calling / tool use. Includes indirect prompt injection and tool authorization testing.

  • Everything in Focused
  • Indirect prompt injection via retrieved content
  • Tool authorization and excessive agency testing
  • RAG document leakage assessment

Complex

Single LLM application with complex architecture — multi-step reasoning, multiple tool integrations, plugin ecosystem, or customer-facing application with sensitive data and transactional APIs.

  • Everything in Standard
  • Extended depth across all assessment domains
  • Red team objective-based component
  • Multi-agent architecture testing available

Prerequisites

  • Application access (staging or production as agreed in Rules of Engagement)
  • System prompt or application design documentation where available
  • API access credentials for the LLM application under test
  • Description of intended use cases, user roles, and trust model

Frequently Asked Questions

Common questions.

Does this test the model itself (GPT-4, Claude)?

No. This assesses how your application is built on top of the model — system prompts, tool configurations, retrieval pipelines, output handling, and trust boundaries. The model provider's safety guardrails are not in scope.

What is indirect prompt injection and why is it different from direct injection?

Direct prompt injection is a user deliberately trying to override the system prompt. Indirect prompt injection is attacker-controlled content — a retrieved document, an email, a web page — that contains instructions the model follows when it processes that content. Users never see the injected instructions. It bypasses all user-facing input validation.

Is this safe to run against production?

All testing targets the application's interface only — no direct model provider API calls outside the application context, no exfiltration of real user data, no actions against production systems beyond the designed interaction surface without explicit written authorization. Staging is preferred for applications with real-world action capability.

How is this different from a traditional penetration test?

A traditional penetration test evaluates network, web, and API security using established exploitation techniques. LLM applications introduce entirely new vulnerability classes — prompt injection, jailbreaking, system prompt extraction, excessive agency — that require different testing methodologies, different tools, and different expertise.

Every finding has a proof-of-concept?

Yes. Every finding includes the specific prompt sequence used, the model's response, and the business impact. Engineering teams can reproduce every finding directly. No theoretical risk statements.

Related Offerings

Often paired with this engagement.

Agentic AI Security Review

For multi-agent systems — covers inter-agent trust boundaries, tool authorization across agent chains, memory system security, and human oversight mechanisms.

RAG Pipeline Security Assessment

Deeper coverage of the retrieval infrastructure — vector store access control, ingestion pipeline security, and document corpus integrity.

Secure AI Architecture & Threat Modeling

Design-layer review before the application is built — reference architectures, threat models, and runtime guardrail specifications.

AI Governance Program Build

Governance framework for regulated or customer-facing AI deployments — policies, risk management, approval workflows, and compliance mapping.

API Security Assessment

HTTP-layer security for externally accessible LLM APIs — OWASP API Top 10, authorization, and business logic testing.

Ready to discuss this engagement?

30-minute discovery call. We will discuss your application architecture, your specific concerns, and whether this assessment is the right fit.