A vector database stores data as high-dimensional numeric vectors — embeddings produced by a machine-learning model — and is optimized to find the vectors most similar to a query vector. Instead of matching exact keywords, it ranks results by closeness in this embedding space, which lets it retrieve items that are semantically related even when they share no words.
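To make the idea of ranking by closeness concrete, here is a minimal sketch of an exact nearest-neighbor lookup using cosine similarity. The three-dimensional vectors, the document titles, and the names `cosine_similarity`, `nearest`, and `STORE` are all illustrative assumptions; a real system would obtain embeddings with hundreds or thousands of dimensions from a model and would not scan every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, store, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical hand-made "embeddings"; real ones come from a model.
STORE = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recovering account access": [0.8, 0.2, 0.1],
    "quarterly revenue report": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of a query such as "I forgot my login".
query_vec = [0.85, 0.15, 0.05]
top = nearest(query_vec, STORE, k=2)
for doc_id, score in top:
    print(f"{score:.3f}  {doc_id}")
```

In this toy data the two account-related titles rank above the revenue report even though the assumed query text shares no words with them, which is the behavior the paragraph above describes.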
This capability underpins two now-common AI patterns. Semantic search returns documents by meaning rather than by literal terms. Retrieval-augmented generation uses the same nearest-neighbor lookup to fetch relevant passages and pass them to a language model so that its answer is grounded in real sources. To stay fast at scale, such systems typically rely on approximate nearest-neighbor indexes, which trade a little exactness for large speed gains.
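The exactness-for-speed trade can be illustrated with one simple approximate technique, random-hyperplane hashing (a form of locality-sensitive hashing). This is only a sketch under stated assumptions: the class `LshIndex`, its parameters, and the random test vectors are hypothetical, and production vector databases often use other index structures. The point is that a query scans only the vectors in its own bucket, so a true neighbor that landed in a different bucket can be missed.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class LshIndex:
    """Toy approximate index: vectors are bucketed by which side of each
    random hyperplane they fall on, and a search scans one bucket only."""

    def __init__(self, dim, n_planes=4, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = {}

    def _key(self, vec):
        # One bit per hyperplane: 1 if the vector is on its positive side.
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, doc_id, vec):
        self.buckets.setdefault(self._key(vec), []).append((doc_id, vec))

    def search(self, query, k=1):
        # Approximate: neighbors in other buckets are never considered.
        candidates = self.buckets.get(self._key(query), [])
        ranked = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical data: 200 random 8-dimensional vectors.
rng = random.Random(42)
vectors = {i: [rng.gauss(0.0, 1.0) for _ in range(8)] for i in range(200)}

index = LshIndex(dim=8, n_planes=4)
for doc_id, vec in vectors.items():
    index.add(doc_id, vec)

hits = index.search(vectors[17], k=3)
scanned = len(index.buckets[index._key(vectors[17])])
print(f"scanned {scanned} of {len(vectors)} vectors; hits: {hits}")
```

Adding more hyperplanes makes buckets smaller and searches faster, at the cost of missing more true neighbors; that tuning knob is the trade-off in miniature.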
For engineering teams, it is worth being clear about what a vector database is and is not. It is excellent infrastructure for retrieving text that is similar to what an agent asks for. It does not, on its own, know which knowledge a change to a specific file requires, nor does it warn about a constraint that no one queried. That is the job of code-anchored context, which matches knowledge to files by path rather than by embedding distance.
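For contrast, matching by path can be sketched in a few lines. Everything here is a hypothetical illustration and not a description of any particular tool: the `RULES` table, the glob patterns, the notes, and the function `context_for` are assumed for the example. The lookup is keyed on the path of the changed file, so a note surfaces without anyone having to phrase a query for it.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: each pairs a path glob with a note tied to those files.
RULES = [
    ("billing/*.py", "Amounts are integer cents; never use floats."),
    ("billing/migrations/*", "Migrations must be reversible."),
    ("*/auth/*", "Token lifetimes are set by the security team."),
]

def context_for(path):
    """Return every note whose glob matches the changed file's path.

    Note: in fnmatch-style patterns, '*' also matches '/' characters,
    so 'billing/*.py' covers nested directories too.
    """
    return [note for pattern, note in RULES if fnmatchcase(path, pattern)]

for changed in ["billing/invoice.py", "services/auth/tokens.py", "README.md"]:
    print(changed, "->", context_for(changed))
```

No embedding or similarity score is involved: a file either matches a pattern or it does not, which is what makes this kind of lookup deterministic.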