Retrieval-augmented generation has become the default architecture for enterprise knowledge assistants. Organizations ingest their document repositories, Confluence wikis, SharePoint libraries, and internal databases into vector stores, then use semantic similarity search to provide relevant context to an LLM at query time. The result is a conversational interface to institutional knowledge that can answer questions, summarize policies, and surface information that would otherwise require hours of manual search.
There is a fundamental problem with most RAG implementations: vector databases retrieve documents based on semantic similarity to the query, not based on whether the user asking the question is authorized to see those documents. This authorization gap is the single most consistently found vulnerability in RAG pipeline security assessments, and it exists because the retrieval layer was designed for relevance, not for access control.
How Vector Search Ignores Authorization
In a traditional application, a user queries a database and the query engine enforces row-level or document-level access controls. A sales representative sees their own accounts. An HR manager sees employee records for their department. These controls are enforced at the data layer and are independent of the query itself. The user cannot formulate a query that bypasses authorization because the database engine filters results before returning them.
Vector databases operate on a fundamentally different principle. Documents are converted to embedding vectors and stored in a high-dimensional space. When a user submits a query, the query is also converted to an embedding, and the database returns the nearest vectors by cosine similarity or a similar distance metric. The retrieval is purely mathematical. There is no concept of document ownership, classification level, or access control list built into the similarity search. If a confidential board strategy document and a public company policy are both in the vector store, the retrieval engine will surface whichever is more semantically relevant to the query, regardless of who is asking.
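To make the point concrete, here is a minimal sketch of the retrieval step, with toy three-dimensional embeddings standing in for real ones. Notice that nothing in what the search operates on carries an owner, a classification, or an ACL:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus: (doc_id, embedding). A confidential memo and a public policy
# sit side by side; the index stores no authorization data at all.
corpus = [
    ("public-policy", [0.9, 0.1, 0.0]),
    ("board-strategy-memo", [0.7, 0.7, 0.1]),
]

def retrieve(query_vec, k=1):
    # Pure nearest-neighbor ranking: relevance only, no notion of who asks.
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

A query whose embedding happens to sit near the confidential memo retrieves the memo first, for any caller, which is the entire problem.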
Most RAG implementations compound this problem by using a single shared vector store for all users. Every document ingested into the pipeline is available to every user who can submit a query. A junior employee asking about vacation policy might receive context chunks from an executive compensation analysis, a pending acquisition memo, or a legal privileged communication, simply because those documents contain semantically similar language. The LLM then incorporates that information into its response, effectively declassifying the document through the conversational interface.
Metadata Filtering: The Incomplete Solution
The most common attempted fix for the RAG authorization gap is metadata filtering. During ingestion, documents are tagged with metadata indicating their access permissions: department, classification level, owner, or required role. At query time, the application attaches a filter to the vector search that restricts retrieval to documents the current user is authorized to see. This approach works in principle, but it fails in practice for several interconnected reasons.
First, metadata must be accurate and complete at ingestion time. Document permissions in source systems like SharePoint, Google Drive, and Confluence are dynamic: they change when users join or leave teams, when documents are shared or unshared, and when folders are restructured. RAG ingestion pipelines rarely re-sync permissions in real time. A document that was public when ingested may have been restricted since then, but the vector store still has the old metadata. Conversely, a user who gained access to a document library yesterday may not see those documents in RAG results until the next full re-ingestion cycle.
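One partial mitigation for this drift is a periodic reconciliation job that diffs the permission snapshot stored at ingestion time against the source system's live permissions. The sketch below assumes both sides can be represented as a mapping from document ID to the set of principals allowed to read it; the names are illustrative:

```python
def stale_permission_chunks(vector_meta, live_permissions):
    """Return doc_ids whose vector-store permission metadata no longer
    matches the source system of record. Both arguments map
    doc_id -> set of principals permitted to read the document."""
    return sorted(
        doc_id
        for doc_id, snapshot in vector_meta.items()
        if live_permissions.get(doc_id, set()) != snapshot
    )
```

Documents flagged by such a job need their metadata refreshed (or the chunks quarantined) before the stale snapshot can authorize a retrieval it should not.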
Second, metadata filters can be bypassed through application-layer flaws. If the filter is constructed from user-supplied parameters, an attacker may be able to manipulate the filter to broaden the search scope. If the application constructs the filter from a session token, the filter is only as secure as the session management implementation. And if the metadata is stored as document-level attributes rather than chunk-level attributes, a document that was split into hundreds of chunks during ingestion may have some chunks that inherited metadata and others that did not, creating gaps in filter coverage.
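The filter-manipulation risk comes down to where the filter's values originate. The contrast below is a sketch, using a generic dictionary filter in the style many vector databases accept; `Session` and `Directory` are hypothetical stand-ins for the application's session object and group directory:

```python
class Session:
    def __init__(self, user_id):
        self.user_id = user_id

class Directory:
    """Hypothetical group directory: user_id -> set of group names."""
    def __init__(self, memberships):
        self._memberships = memberships

    def groups_for(self, user_id):
        return self._memberships.get(user_id, set())

def build_filter_unsafe(request_params):
    # Anti-pattern: the caller controls the scope. A request carrying
    # departments=["engineering", "executive"] silently widens access.
    return {"department": {"$in": request_params["departments"]}}

def build_filter_safe(session, directory):
    # The filter is derived server-side from the authenticated identity;
    # nothing in the request body can broaden it.
    groups = directory.groups_for(session.user_id)
    return {"department": {"$in": sorted(groups)}}
```

Even the safe variant, of course, still inherits the staleness and chunk-coverage problems described above; it only closes the request-tampering path.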
Ingestion Pipeline Risks and Corpus Poisoning
The security of a RAG pipeline extends beyond the retrieval layer to the ingestion pipeline itself. Documents are processed through a chain of extraction, chunking, embedding, and storage steps. Each step represents an opportunity for data to be corrupted, leaked, or poisoned. If the ingestion pipeline pulls from a source that can be modified by external parties, such as a shared mailbox, a customer-facing form, or a wiki with broad edit permissions, an attacker can introduce documents containing adversarial content that will be served to future users.
Corpus poisoning is the RAG-specific variant of data poisoning. An attacker introduces a document into the knowledge base that contains false information, adversarial instructions, or content designed to manipulate the LLM's behavior when retrieved. Because vector search retrieves by similarity, the attacker can craft the document to be highly relevant to specific queries. For example, a poisoned document about password reset procedures could instruct the LLM to direct users to a phishing page. The document sits in the vector store alongside legitimate content, and neither the retrieval engine nor the LLM has a reliable mechanism to distinguish genuine from adversarial content.
Ingestion pipelines should treat all source content as untrusted input. This means validating document sources, implementing integrity checks on ingested content, restricting who can add documents to the ingestion pipeline, and maintaining an audit trail of all documents in the vector store. Organizations should also implement mechanisms to detect and remove poisoned content, including periodic review of high-retrieval-frequency chunks and monitoring for anomalous changes in LLM response patterns that could indicate corpus manipulation.
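Two of those controls, source restriction and an integrity audit trail, can be sketched in a few lines. The function below is illustrative, not a specific product's API: it rejects chunks from unapproved sources and records a content hash for every chunk it stores, so the corpus can later be audited or diffed against the source system:

```python
import hashlib
import time

def ingest(chunks, source, approved_sources, audit_log):
    """Treat ingestion input as untrusted: refuse unapproved sources and
    log a SHA-256 digest of every stored chunk for later auditing."""
    if source not in approved_sources:
        raise PermissionError(f"source {source!r} is not approved for ingestion")
    records = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        audit_log.append({
            "source": source,
            "sha256": digest,
            "ingested_at": time.time(),
        })
        records.append((digest, chunk))
    return records
```

The audit trail is what makes the later poisoning-detection steps possible: a chunk whose hash appears in the store but not in the log, or whose logged source is suspect, is an immediate candidate for review.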
Implementing Proper Authorization in RAG Architectures
Solving the RAG authorization gap requires treating it as a system design problem, not a configuration fix. The most robust approach is to maintain separate vector stores or namespaces per authorization boundary, ensuring that documents with different access requirements are physically or logically isolated. A query from a user in the engineering department hits the engineering vector store, which contains only documents that all engineers are authorized to access. This approach trades storage efficiency and operational simplicity for a strong authorization guarantee.
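The routing layer for such a design can be very small. In this sketch each namespace is just a list of document IDs; in practice it would be a separate index or collection in the vector database, and the group names are illustrative:

```python
# One namespace per authorization boundary. A query is routed to exactly
# one store, so there is no shared index for a crafted query to reach.
NAMESPACES = {
    "engineering": ["eng-handbook", "oncall-runbook"],
    "hr": ["benefits-guide", "comp-bands"],
}

def store_for(group):
    # Fail closed: a group without a provisioned store gets nothing,
    # rather than falling back to a shared default index.
    if group not in NAMESPACES:
        raise PermissionError(f"no vector store provisioned for group {group!r}")
    return NAMESPACES[group]
```

The fail-closed default matters: a silent fallback to a shared store would quietly reintroduce the exact gap the architecture exists to close.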
For organizations that cannot maintain per-user or per-group vector stores, a layered approach combines metadata filtering with post-retrieval authorization checks. After the vector search returns candidate chunks, a separate authorization service validates each chunk against the current user's permissions in the source system of record before passing the chunks to the LLM. This adds latency but ensures that the authorization decision is made against live permissions, not stale metadata. The authorization service should query the source system's access control API directly, not rely on cached permission data in the vector store.
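A minimal sketch of that post-retrieval gate follows. `SourceACL` is a stand-in for the source system's access-control API; a real implementation would call the SharePoint, Drive, or Confluence permissions endpoint rather than an in-memory table:

```python
class SourceACL:
    """Hypothetical stand-in for the system of record's permissions API."""
    def __init__(self, grants):
        self._grants = grants  # doc_id -> set of user_ids allowed to read

    def can_read(self, user_id, doc_id):
        return user_id in self._grants.get(doc_id, set())

def authorize_chunks(chunks, user_id, acl):
    # Post-retrieval gate: every candidate chunk is re-checked against
    # live permissions before it can become LLM context, so stale
    # vector-store metadata cannot leak a since-restricted document.
    return [c for c in chunks if acl.can_read(user_id, c["doc_id"])]
```

Because the check runs per chunk per query, the ACL lookups are the latency cost the text describes; batching them into one permissions API call per distinct document is the usual optimization.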
Regardless of the architectural approach, organizations should implement output monitoring that detects when the LLM's response contains information from documents the user should not have access to. This serves as a defense-in-depth control that catches failures in the retrieval-layer authorization. Content classification models can flag responses that contain high-sensitivity patterns such as financial figures, personally identifiable information, or legally privileged language, triggering a secondary authorization check before the response is delivered to the user.
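As a rough sketch of that last control, a pattern-based pre-screen can decide whether a response needs the secondary check at all. The patterns below are illustrative only; a production deployment would rely on a trained content classifier rather than regexes alone:

```python
import re

# Illustrative high-sensitivity patterns: SSN-like identifiers, legal
# privilege markers, and large dollar figures. Not exhaustive.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    re.compile(r"privileged and confidential", re.IGNORECASE),
    re.compile(r"\$\d[\d,]*(?:\.\d+)?\s*(?:million|billion)", re.IGNORECASE),
]

def needs_secondary_review(response_text):
    # True if any sensitive pattern appears, gating delivery on a
    # secondary authorization check.
    return any(p.search(response_text) for p in SENSITIVE_PATTERNS)
```

The pre-screen keeps the expensive path (classifier plus authorization re-check) off the vast majority of benign responses while still catching the obvious leak signatures.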
