What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a technique used to improve the capabilities of a large language model by augmenting it with a text corpus and the ability to search that corpus on the fly. A system using RAG receives a query from a user, uses that query to search its corpus, and adds the results to the LLM's prompt to generate a relevant response.
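The basic flow can be illustrated with a short sketch. This is a minimal, hypothetical example, not a description of any particular system: the embed() and generate() functions below are placeholders standing in for a real embedding model and a real LLM.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real system would query a language model."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k corpus passages most similar to the query (cosine similarity)."""
    q = embed(query)
    scored = []
    for passage in corpus:
        p = embed(passage)
        sim = float(np.dot(q, p) / (np.linalg.norm(q) * np.linalg.norm(p)))
        scored.append((sim, passage))
    scored.sort(reverse=True)
    return [passage for _, passage in scored[:k]]

def answer(query: str, corpus: list[str]) -> str:
    """Add the retrieved passages to the prompt, then generate a response."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    prompt = (
        "Answer the question using the sources below, citing them where relevant.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```

In a real deployment, the corpus would typically be embedded once in advance and stored in a vector database, so that only the query needs to be embedded at answer time.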
Some advantages of using RAG are that it:
- Can enhance an LLM's knowledge of specific subjects through the use of specialized corpuses that it may not have encountered much, or at all, during its training.
- Allows the system to quote or cite the sources from the corpus that were returned by the query.
- Appears to reduce hallucinations in some contexts.
- Reduces inference costs by including only the relevant parts of the corpus in prompts.
- Makes it easier to keep an LLM's outputs current in domains where information changes quickly, since it's easier to keep a text corpus up to date than to retrain a model.
An example of a system based on RAG is aisafety.info’s chatbot.
Further reading: