ChatGPT’s training data is limited to September 2021. If you ask ChatGPT about something that occurred after that cutoff, it cannot answer factually and may instead fabricate a plausible-sounding response. We commonly refer to this behavior as “hallucination.” Another downside of LLMs is that they lack domain-specific information.
RAG - Retrieval Augmented Generation
Retrieval Augmented Generation is the technique of retrieving external, context-specific data and feeding it to the LLM alongside the prompt.
This data can be domain-specific and is fetched for the LLM at inference time, which reduces the likelihood of hallucinations.
The limitation of current LLMs lies in their lack of awareness of specific business details, individual requirements, customer demographics, and the unique context of an application.
Retrieval Augmented Generation (RAG) offers a solution to this challenge by enriching the LLM's understanding with tailored context and factual data during the generation process. This can range from customer preferences and transaction records to segments of a play's dialogue, product specifications, real-time stock information, or even multimedia like voice recordings or songs.

Let's consider a different scenario to illustrate this concept:
In a healthcare application, RAG could be implemented to improve responses related to medical queries. The LLM, while powerful in its language processing capabilities, lacks awareness of a specific user's medical history, symptoms, or personalized treatment plans.
RAG would step in by integrating the LLM with a knowledge base containing detailed medical records, patient history, symptoms, and treatment options. When a user asks a health-related question, the querying stage retrieves pertinent information from this knowledge base. This information is then combined with the user's query and provided to the LLM, enriching its understanding.
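To make this concrete, here is a minimal, self-contained sketch of the prompt-augmentation pattern at the heart of RAG. The patient notes and the naive keyword-overlap retriever are illustrative placeholders; a real system would use embeddings and a vector store:

```python
# Minimal sketch of the RAG prompt-augmentation pattern.
# The notes and keyword-overlap retriever are toy stand-ins for a vector store.
import string

KNOWLEDGE_BASE = [
    "Patient reports a seasonal pollen allergy; prescribed antihistamines in 2020.",
    "Blood pressure readings were stable across recent visits.",
    "Patient is lactose intolerant; avoid dairy-based supplements.",
]

def _tokens(text: str) -> set[str]:
    """Lowercase and strip punctuation for crude keyword matching."""
    return {word.strip(string.punctuation) for word in text.lower().split()}

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank chunks by keyword overlap with the query and keep the top_k."""
    query_tokens = _tokens(query)
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda chunk: len(query_tokens & _tokens(chunk)),
        reverse=True,
    )
    return scored[:top_k]

def build_augmented_prompt(query: str) -> str:
    """Combine retrieved context with the user's query into one prompt."""
    context = "\n".join(retrieve(query))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# The augmented prompt, not the raw query, is what gets sent to the LLM.
print(build_augmented_prompt("What allergy does the patient have?"))
```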
RAG with LlamaIndex - High-Level Concepts
LlamaIndex is a platform that empowers developers to create applications utilizing the powerful capabilities of large language models (LLMs) like GPT-3.5. It operates based on the retrieval augmented generation (RAG) paradigm, enhancing LLMs with custom data. This guide will walk you through the high-level concepts and modules within LlamaIndex, focusing on the two main stages of RAG: indexing and querying.
Retrieval augmented generation (RAG) is a two-stage approach that combines a large language model (LLM) with custom data:
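Before looking at each stage in detail, here is a minimal end-to-end sketch using LlamaIndex. It assumes llama-index is installed, an OpenAI API key is set in the environment, and a ./data directory holds your documents; the import paths follow recent releases (llama_index.core), while older versions import directly from llama_index:

```python
# Minimal end-to-end RAG sketch with LlamaIndex.
# Assumes: `pip install llama-index`, OPENAI_API_KEY set in the environment,
# and a ./data directory containing your documents.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Indexing stage: load the documents and build a vector index over them.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Querying stage: retrieve relevant chunks and synthesize an answer.
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the key points in these documents.")
print(response)
```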
Indexing Stage

During this stage, a knowledge base is prepared. This involves organizing and structuring the custom data to make it easily retrievable and accessible. The knowledge base acts as a source of information for the LLM.
- Data Sources: This is the external data, which can come in many forms, e.g., CSV, PDF, or web pages. Based on the data source, the relevant data loaders are used to ingest it (see the sketch after this list).
- Documents / Nodes: A Node is the fundamental unit of data in LlamaIndex, containing a chunk of a source Document along with rich metadata and inter-node relationships that enable precise retrieval.
- Data Indexes (VectorStoreIndex): LlamaIndex streamlines data indexing by converting raw documents into intermediate representations, generating vector embeddings, and inferring metadata. The VectorStoreIndex is the most common index type and enables efficient retrieval.
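The indexing pipeline can also be spelled out module by module. In this sketch the Document is constructed by hand for illustration (data loaders normally produce Documents); the file name, chunk size, and overlap are illustrative assumptions:

```python
# The indexing pipeline spelled out step by step.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Data source -> Document: one unit of source data plus metadata.
doc = Document(
    text="Patient history: seasonal allergies, treated with antihistamines.",
    metadata={"source": "records.pdf"},  # hypothetical source file
)

# Document -> Nodes: chunks that carry metadata and inter-node relationships.
parser = SentenceSplitter(chunk_size=256, chunk_overlap=20)
nodes = parser.get_nodes_from_documents([doc])

# Nodes -> Index: embed each node and store it in a VectorStoreIndex.
index = VectorStoreIndex(nodes)
```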
Querying Stage

In this stage, the system retrieves relevant context from the knowledge base based on a query. This retrieved context is then used to augment the LLM's understanding and generation capabilities. The LLM can utilize this additional information to formulate more informed and accurate responses to user queries.
- Retrievers: A retriever defines how the relevant context is fetched from the knowledge base for a given user query (see the sketch after this list).
- Node Postprocessors: A node postprocessor takes in a set of retrieved nodes and applies transformation, filtering, or re-ranking logic to them.
- Response Synthesizers: A response synthesizer generates a response from an LLM, using a user query and a given set of retrieved text chunks.
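These three modules can be composed explicitly instead of relying on the as_query_engine() one-liner. A sketch under the same assumptions as earlier, reusing the index built in the indexing example; the top-k and similarity cutoff values are illustrative:

```python
# The querying-stage modules composed explicitly.
from llama_index.core import get_response_synthesizer
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.query_engine import RetrieverQueryEngine

# Retriever: fetch the top-k most similar nodes for the query.
retriever = index.as_retriever(similarity_top_k=5)

# Node postprocessor: drop retrieved nodes below a similarity cutoff.
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

# Response synthesizer: turn the query plus surviving chunks into an answer.
synthesizer = get_response_synthesizer()

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[postprocessor],
    response_synthesizer=synthesizer,
)
print(query_engine.query("What treatment was prescribed for the allergy?"))
```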
Advantages of RAG
- No need to retrain the model on new data
- Cost efficiency
- Reduced hallucinations
- Lets the LLM leverage domain-specific data