Retrieval-augmented generation (RAG)

Retrieval-augmented generation (RAG) is a technique that fetches relevant documents at query time and feeds them to a language model, so its answer is grounded in specific sources rather than only its training data.

RAG grounds a language model's output in external information. Instead of relying solely on what the model learned during training, a RAG system retrieves relevant documents (from a vector database, a search index, or an API) at query time and includes them in the model's context, so the answer is based on specific, up-to-date sources.
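The retrieve-then-include flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: it assumes a tiny in-memory document list (DOCS) in place of a vector database or search index, uses word-count cosine similarity in place of learned embeddings, and stops at building the prompt rather than calling any particular model. The names tokenize, cosine, retrieve, build_prompt, and DOCS are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents most similar to the query, best first.

    Documents that share no words with the query are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages in the model's context ahead of the question."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical stand-in for a company's document store.
DOCS = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available by email on weekdays.",
    "Shipping to Canada takes five to seven business days.",
]

if __name__ == "__main__":
    question = "What is the refund window?"
    passages = retrieve(question, DOCS, k=1)
    # The resulting string is what would be sent to the language model.
    print(build_prompt(question, passages))
```

In a real system the scoring step would typically be replaced by an embedding model and an index, but the shape stays the same: score documents against the query, keep the best few, and put them in the prompt before the question.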

RAG became popular because it addresses two weaknesses of bare language models: they have a knowledge cutoff, and they can confidently invent facts. By fetching real passages and asking the model to answer from them, RAG reduces hallucination and lets a system reason over private or recent data the model never saw in training. It is a common foundation for production question answering over a company's own documents.

For software teams, it is worth distinguishing RAG, which retrieves information an agent asks for, from a context layer, which proactively delivers the knowledge a change requires before the agent thinks to ask. Retrieval answers a question; anchored context warns about a constraint no one asked about. The two are complementary: RAG is a powerful way to surface documents, while code-anchored context determines which knowledge applies to the files in play.
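One way to picture the difference is that retrieval is keyed on a question, while anchored context is keyed on the files being changed. The sketch below is an assumption about how such a layer might look, not a description of any specific product: ANCHORED_NOTES is a hypothetical table that attaches notes to path patterns, and context_for_change is a hypothetical function that returns every note whose pattern matches a changed file, with no query involved.

```python
from fnmatch import fnmatchcase

# Hypothetical knowledge anchored to code locations (glob pattern -> note).
ANCHORED_NOTES = {
    "billing/*.py": "Amounts are stored as integer cents; never use floats.",
    "migrations/*.sql": "Migrations must stay compatible with the previous release.",
}


def context_for_change(changed_files: list[str]) -> list[str]:
    """Return the notes anchored to any of the changed files.

    Nothing is asked: the set of files in play selects the knowledge.
    """
    notes = []
    for pattern, note in ANCHORED_NOTES.items():
        if any(fnmatchcase(path, pattern) for path in changed_files):
            notes.append(note)
    return notes


if __name__ == "__main__":
    # An agent editing a billing module receives the constraint up front.
    print(context_for_change(["billing/invoice.py", "README.md"]))
```

The two approaches compose naturally: notes selected this way could be placed in the model's context alongside passages returned by a retriever.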