What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a technique used to improve the capabilities of a large language model by augmenting it with a text corpus and the ability to search that corpus on the fly. A system using RAG receives a query from a user, uses that query to search its corpus, and adds the results to the LLM's prompt to generate a relevant response.
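The basic flow can be illustrated with a short sketch. This is a minimal, hypothetical example, not a description of any particular system: the embed() and generate() functions below are placeholders standing in for a real embedding model and a real LLM.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real system would query a language model."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k corpus passages most similar to the query (cosine similarity)."""
    q = embed(query)
    scored = []
    for passage in corpus:
        p = embed(passage)
        sim = float(np.dot(q, p) / (np.linalg.norm(q) * np.linalg.norm(p)))
        scored.append((sim, passage))
    scored.sort(reverse=True)
    return [passage for _, passage in scored[:k]]

def answer(query: str, corpus: list[str]) -> str:
    """Add the retrieved passages to the prompt, then generate a response."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    prompt = (
        "Answer the question using the sources below, citing them where relevant.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```

In a real deployment, the corpus would typically be embedded once in advance and stored in a vector database, so that only the query needs to be embedded at answer time.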
Some advantages of using RAG are that it:
- Can enhance an LLM's knowledge of specific subjects through the use of specialized corpuses that it may not have encountered much, or at all, during its training.
- Allows the system to quote or cite the sources from the corpus that were returned by the query.
- Appears to reduce hallucinations in some contexts.
- Reduces inference costs by including only the relevant parts of the corpus in prompts.
- Makes it easier to keep an LLM's outputs current in domains where information changes quickly, since it's easier to keep a text corpus up to date than to retrain a model.
An example of a system based on RAG is aisafety.info’s chatbot.
Further reading: