Evaluating Naive RAG and Advanced RAG Pipelines Using LangChain v0.1.0 and RAGAS

Feb 9, 2024

What is RAG (Retrieval Augmented Generation)?

Retrieval Augmented Generation (RAG) is a natural language processing (NLP) technique that combines two fundamental tasks in NLP: information retrieval and text generation. It aims to enhance the generation process by incorporating information from external sources through retrieval. The goal of RAG is to produce more accurate and contextually relevant responses in text generation tasks.

In traditional text generation models like GPT-3, the model generates text based on patterns learned from a large corpus of data, but it may not always have access to specific, up-to-date, or contextually relevant information. Retrieval Augmented Generation addresses this limitation by introducing an information retrieval component.

Here’s how RAG works:

Retrieval: The model performs a retrieval step to gather relevant information from external sources. These sources could include a database, a knowledge base, a set of documents, or even search engine results. The retrieval process aims to find snippets or passages of text that contain information related to the given input or prompt.

Augmentation: The retrieved information is then combined with the original input or prompt, enriching the context available to the model for generating the output. By incorporating external knowledge, the model can produce more informed and accurate responses.

Generation: Finally, the model generates the response, taking into account the retrieved information and the original input. The presence of this additional context helps the model produce more contextually appropriate and relevant outputs.
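The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline built later in this post: the word-overlap scorer, the three-sentence corpus, and the stub LLM are all assumptions made up for the example.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, k=2):
    """Retrieval: rank passages by word overlap with the query (toy scorer)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def augment(query, passages):
    """Augmentation: put the retrieved passages into the prompt with the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt, llm):
    """Generation: hand the augmented prompt to any callable LLM."""
    return llm(prompt)

# Hypothetical corpus and stub LLM, for illustration only.
corpus = [
    "LangChain 0.1.0 is the first stable release of the library.",
    "Chroma is an open-source vector database.",
    "BM25 is a keyword-based ranking function.",
]

def stub_llm(prompt):
    return f"stub answer (prompt was {len(prompt)} characters long)"

question = "What is the first stable release of LangChain?"
passages = retrieve(question, corpus, k=2)
prompt = augment(question, passages)
answer = generate(prompt, stub_llm)
print(prompt)
print(answer)
```

In a real pipeline the scorer would be an embedding or keyword index and `stub_llm` would be a call to an actual model, but the retrieve, augment, generate flow stays the same.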

RAG can be beneficial in various NLP tasks, such as question-answering, dialogue generation, summarization, and more. By incorporating external knowledge, RAG models have the potential to provide more accurate and informative responses compared to traditional generation models that rely solely on the data they were trained on.

What are the advantages of using RAG?

Retrieval Augmented Generation (RAG) offers several advantages over traditional text generation models, especially in scenarios where access to external information is beneficial. Some of the key advantages include:

1. Contextual Relevance: RAG models can produce responses that are more contextually relevant and informative. By incorporating information from external sources, the generated text is better grounded in real-world facts and up-to-date knowledge, leading to more accurate and context-aware responses.

2. Fact Checking and Verification: When the retrieval sources are reliable, RAG models can ground their output in the retrieved passages, so each claim can be checked against a source. This helps reduce false or misleading content, although it does not guarantee accuracy: the model can still misread or ignore what was retrieved.

3. Improved Knowledge Incorporation: RAG models can effectively utilize external knowledge bases or documents to enhance their responses. This is particularly useful in question-answering tasks, where the model can access relevant information from a wide range of sources to provide well-informed and accurate answers.

4. Flexibility and Adaptability: The ability to retrieve information from diverse sources makes RAG models more flexible and adaptable. They can handle a wide range of topics and tasks without requiring explicit fine-tuning for each specific scenario, as long as the retrieval mechanism is designed to access the relevant information.

5. Handling Out-of-Distribution Inputs: Traditional text generation models may struggle when faced with out-of-distribution or uncommon inputs that were not present in their training data. RAG models, on the other hand, can leverage the retrieval component to find relevant information, even for unseen or less common inputs.

6. Controlled Content Generation: RAG models can also be used for content-controlled generation. By guiding the retrieval process and specifying the sources, developers can control the type and quality of information the model uses to generate responses.

7. Reduced Bias: In some cases, the retrieval mechanism can help reduce bias in generated content. By using diverse sources of information, the model can provide a more balanced and unbiased response, compared to traditional models that may be influenced by the biases present in their training data.

While RAG offers significant advantages, it’s important to be aware of potential challenges and considerations, such as ensuring the reliability of the retrieval sources, handling contradictory information from different sources, and balancing the trade-off between retrieval accuracy and computational efficiency.

Naive RAG Pipeline Implementation

Advanced RAG Implementation

RAGAS Evaluation

Ragas is a powerful library that lets us evaluate our RAG pipeline by collecting input/output/context triplets and obtaining metrics relating to a number of different aspects of our RAG pipeline.

Here we will evaluate the above two pipelines using the RAGAS evaluation framework. Its four primary metrics, described below, are context precision, context recall, answer relevancy, and faithfulness.

The RAGAS evaluation framework evaluates the two main components of the RAG pipeline:

Retriever
Generator

The metrics associated with evaluating retrieval are as follows:

Context Precision: How relevant the retrieved context is to the question asked.
Context Recall: Whether the retriever is able to retrieve all of the relevant context pertaining to the ground truth.

The metrics associated with evaluating generation are as follows:

Answer Relevancy: How relevant the answer is to our initial question.
Faithfulness: The factual consistency of the generated answer against the given context.

Answer Correctness is an end-to-end RAGAS evaluation metric that combines the semantic similarity and the factual similarity between the generated answer and the ground truth.

Technology Stack used for implementation

LLM : OpenAI gpt-3.5-turbo
Langchain : Build LLM applications
RAGAS : Evaluation Framework

Code Implementation

Install required dependencies

!pip install -qU langchain pypdf llama-cpp-python huggingface_hub
!pip install -qU sentence_transformers
!pip install -q chromadb
!pip install rank_bm25
!pip install langchain-openai
!pip install -q ragas

Load Your Documents

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://blog.langchain.dev/langchain-v0-1-0/"
)

documents = loader.load()

Instantiate Embedding Model

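from langchain_openai import OpenAIEmbeddings

# OPENAI_API_KEY must already be set in the environment when this runs;
# the "Instantiate LLM" step shows how the key is loaded in Colab.
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002"
)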

Instantiate LLM

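import os
from google.colab import userdata  # Colab-only helper for reading stored secrets
from langchain_openai import OpenAI

# Expose the key to the OpenAI client through the environment.
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
# temperature=0 keeps answers as repeatable as possible for evaluation.
# No model is named here, so the wrapper's default completion model is used.
openai_llm = OpenAI(temperature=0)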

Document Splitter

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1250,
    chunk_overlap = 100,
    length_function = len,
    is_separator_regex = False
)
#
split_docs = text_splitter.split_documents(documents)
print(len(split_docs))
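To see what chunk_size and chunk_overlap control, here is a simplified fixed-width sliding-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences); the function name and the tiny sample string are assumptions for illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, where each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, which is the role the 100-character overlap plays above: a sentence that straddles a boundary still appears whole in at least one chunk.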

Instantiate the Vectorstore

from langchain_community.vectorstores import Chroma
vectorstore = Chroma(embedding_function=embeddings,
                     persist_directory="/content/drive/MyDrive/Vectorstore/chromadb",
                     collection_name="full_documents")

Load and persist the split documents into the vectorstore

vectorstore.add_documents(split_docs)
vectorstore.persist()

Instantiate the Keyword / Sparse embeddings model

from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.retrievers import ContextualCompressionRetriever
#
bm25_retriever = BM25Retriever.from_documents(split_docs)
bm25_retriever.k=10
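For readers who want to see what a BM25 retriever computes, here is a small sketch of the scoring formula. It uses whitespace tokenization and a non-negative IDF variant, both assumptions made for the example; the rank_bm25 package that BM25Retriever relies on may differ in those details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with an Okapi BM25 style formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates, and long documents are penalised.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini corpus.
docs = [
    "langchain adopts a new versioning standard",
    "chroma stores dense embeddings",
    "bm25 ranks documents by keyword overlap",
]
scores = bm25_scores("versioning standard", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because the score depends only on exact term matches, BM25 complements the dense vector retriever: it catches literal keywords such as version numbers that embeddings can blur.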

Instantiate Reranker — Cross Encoders

from __future__ import annotations
from typing import Dict, Optional, Sequence
from langchain.schema import Document
from langchain.pydantic_v1 import Extra, root_validator

from langchain.callbacks.manager import Callbacks
from langchain.retrievers.document_compressors.base import BaseDocumentCompressor

from sentence_transformers import CrossEncoder
# from config import bge_reranker_large

class BgeRerank(BaseDocumentCompressor):
    model_name:str = 'BAAI/bge-reranker-large'
    """Model name to use for reranking."""
    top_n: int = 3
    """Number of documents to return."""
    model:CrossEncoder = CrossEncoder(model_name)
    """CrossEncoder instance to use for reranking."""

    def bge_rerank(self,query,docs):
        model_inputs =  [[query, doc] for doc in docs]
        scores = self.model.predict(model_inputs)
        results = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
        return results[:self.top_n]


    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid
        arbitrary_types_allowed = True

    def compress_documents(
        self,
        documents: Sequence[Document],
        query: str,
        callbacks: Optional[Callbacks] = None,
    ) -> Sequence[Document]:
        """
        Compress documents using BAAI/bge-reranker models.

        Args:
            documents: A sequence of documents to compress.
            query: The query to use for compressing the documents.
            callbacks: Callbacks to run during the compression process.

        Returns:
            A sequence of compressed documents.
        """
        if len(documents) == 0:  # to avoid empty api call
            return []
        doc_list = list(documents)
        _docs = [d.page_content for d in doc_list]
        results = self.bge_rerank(query, _docs)
        final_results = []
        for r in results:
            doc = doc_list[r[0]]
            doc.metadata["relevance_score"] = r[1]
            final_results.append(doc)
        return final_results
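The core of the class above is the score, sort, and truncate step in bge_rerank. The sketch below isolates that logic with a stand-in scorer so it runs without downloading a model; overlap_score and the candidate strings are hypothetical, and a real cross-encoder would replace the scorer.

```python
def rerank(query, docs, score_fn, top_n=3):
    """Score every (query, doc) pair, then keep the top_n highest-scoring docs."""
    scores = [score_fn(query, doc) for doc in docs]
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(docs[i], score) for i, score in ranked[:top_n]]

def overlap_score(query, doc):
    """Hypothetical stand-in for a cross-encoder: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [
    "release notes and newsletter links",
    "the new versioning standard in langchain 0.1.0",
    "langchain 0.1.0 separates out langchain-core",
    "agents run an llm in a loop",
]
top = rerank("what changed in langchain 0.1.0", candidates, overlap_score, top_n=2)
for doc, score in top:
    print(score, doc)
```

The design point is that the expensive scorer only sees the handful of candidates the first-stage retrievers return, which is why rerankers are placed at the end of the pipeline.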

Instantiate a Contextual Compression Pipeline

from langchain_community.document_transformers.embeddings_redundant_filter import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_transformers.long_context_reorder import LongContextReorder
from langchain.retrievers.multi_query import MultiQueryRetriever
#
vs_retriever = vectorstore.as_retriever(search_kwargs={"k":10})
#

# Note: the keyword argument is `weights` (plural), one weight per retriever.
ensemble_retriever = EnsembleRetriever(retrievers=[bm25_retriever,vs_retriever],
                                       weights=[0.5,0.5])
#

redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
#
reordering = LongContextReorder()
#
reranker = BgeRerank()
#
pipeline_compressor = DocumentCompressorPipeline(transformers=[redundant_filter,reordering,reranker])
#
compression_pipeline = ContextualCompressionRetriever(base_compressor=pipeline_compressor,
                                                      base_retriever=ensemble_retriever)
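EnsembleRetriever is documented as merging the ranked lists from its retrievers with weighted Reciprocal Rank Fusion. The sketch below shows the general technique under that assumption; it is not the library's code, and the document IDs and the constant c=60 are illustrative choices.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists: each list contributes weight / (rank + c)
    to every document it contains, with rank starting at 1."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = weighted_rrf([keyword_hits, vector_hits], weights=[0.5, 0.5])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still survives into the fused list for the reranker to judge.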

Helper function to display retrieved documents

def pretty_print_docs(docs):
  print(
      f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n + {d.page_content}" for i,d in enumerate(docs)])
  )

pretty_print_docs(vs_retriever.get_relevant_documents("What are the major changes in v 0.1.0?"))


Document 1:

 + versioning policy for a little over a month now.langchain itself, however, still remained on 0.0.x versions. Having all releases on minor version 0 created a few challenges:Users couldn’t be confident that updating would not have breaking changeslangchain became bloated and unstable as we took a “maintain everything” approach to reduce breaking changes and deprecation notificationsHowever, starting today with the release of langchain 0.1.0, all future releases will follow a new versioning standard. Specifically:Any breaking changes to the public API will result in a minor version bump (the second digit)Any bug fixes or new features will result in a patch version bump (the third digit)We hope that this, combined with the previous architectural changes, will:Communicate clearly if breaking changes are made, allowing developers to update with confidenceGive us an avenue for officially deprecating and deleting old code, reducing bloatMore responsibly deal with integrations (whose SDKs are often changing as rapidly as LangChain)Even after we release a 0.2 version, we will commit to maintaining a branch of 0.1, but will only patch critical bug fixes. See more towards the end of this post on our plans for that.While
----------------------------------------------------------------------------------------------------
Document 2:

 + things that are top of mind for us are:Rewriting legacy chains in LCEL (with better streaming and debugging support)Adding new types of chainsAdding new types of agentsImproving our production ingestion capabilitiesRemoving old and unused functionalityImportantly, even though we are excited about removing some of the old and legacy code to make langchain slimmer and more focused, we also want to maintain support for people who are still using the old version. That is why we will maintain 0.1 as a stable branch (patching in critical bug fixes) for at least 3 months after 0.2 release. We plan to do this for every stable release from here on out.And if you've been wanting to get started contributing, there's never been a better time. We recently added a good getting started issue on GitHub if you're looking for a place to start.One More ThingA large part of LangChain v0.1.0 is stability and focus on the core areas outlined above. Now that we've identified the areas people love about LangChain, we can work on adding more advanced and complete tooling there.One of the main things people love about LangChain is it's support for agents. Most agents are largely defined as running an LLM in some sort of a loop. So far, the only way we've
----------------------------------------------------------------------------------------------------
Document 3:

 + LangChain v0.1.0




Release Notes

GitHub

Docs

Case Studies

Sign in
Subscribe
n v0.1.0

By LangChain
10 min read
Jan 8, 2024
----------------------------------------------------------------------------------------------------
Document 4:

 + only patch critical bug fixes. See more towards the end of this post on our plans for that.While re-architecting the package towards a path to a stable 0.1 release, we took the opportunity to talk to hundreds of developers about why they use LangChain and what they love about it. This input guided our direction and focus. We also used it as an opportunity to bring parity to the Python and JavaScript versions in the core areas outlined below. 💡While certain integrations and more tangential chains may be language specific, core abstractions and key functionality are implemented equally in both the Python and JavaScript packages.We want to share what we’ve heard and our plan to continually improve LangChain. We hope that sharing these learnings will increase transparency into our thinking and decisions, allowing others to better use, understand, and contribute to LangChain. After all, a huge part of LangChain is our community – both the user base and the 2000+ contributors – and we want everyone to come along for the journey. Third Party IntegrationsOne of the things that people most love about LangChain is how easy we make it to get started building on any stack. We have almost 700 integrations, ranging from LLMs to vector
----------------------------------------------------------------------------------------------------
Document 5:

 + Today we’re excited to announce the release of langchain 0.1.0, our first stable version. It is fully backwards compatible, comes in both Python and JavaScript, and comes with improved focus through both functionality and documentation. A stable version of LangChain helps us earn developer trust and gives us the ability to evolve the library systematically and safely.Python GitHub DiscussionPython v0.1.0 GuidesJS v0.1.0 GuidesYouTube WalkthroughIntroductionLangChain has been around for a little over a year and has changed a lot as it’s grown to become the default framework for building LLM applications. As we previewed a month ago, we recently decided to make significant changes to the  LangChain package architecture in order to better organize the project and strengthen the foundation. Specifically we made two large architectural changes: separating out langchain-core and separating out partner packages (either into langchain-community or standalone partner packages) from langchain. As a reminder, langchain-core contains the main abstractions, interfaces, and core functionality. This code is stable and has been following a stricter versioning policy for a little over a month now.langchain itself, however, still remained on
----------------------------------------------------------------------------------------------------
Document 6:

 + for LangSmith has been overwhelming, and we’re investing a lot in scalability so that we can release a public beta and then make it generally available in the coming months. We are also already supporting an enterprise version, which comes with a within-VPC deployment for enterprises with strict data privacy policies.We’ve also tackled observability in other ways. We’ve long had built in verbose and debug modes for different levels of logging throughout the pipeline. We recently introduced methods to visualize the chain you created, as well as get all prompts used.ComposabilityWhile it’s helpful to have prebuilt chains to get started, we very often see teams breaking outside of those architectures and wanting to customize their chain - not only customize the prompt, but also customize different parts of the orchestration. 💡Over the past few months, we’ve invested heavily in LangChain Expression Language (LCEL). This enables composition of arbitrary sequences, providing a lot of the same benefits as data orchestration tools do for data engineering pipelines (batching, parallelization, fallbacks). It also provides some benefits unique to LLM workloads - mainly LLM-specific observability (covered above), and streaming,
----------------------------------------------------------------------------------------------------
Document 7:

 + party integrations, which require breaking changes. These can now be reflected on an individual integration basis with proper versioning in the standalone integration package.ObservabilityBuilding LLM applications involves putting a non-deterministic component at the center of your system. These models can often output unexpected results, so having visibility into exactly what is happening in your system is integral. 💡We want to make langchain as observable and as debuggable as possible, whether through architectural decisions or tools we build on the side.We’ve set about this in a few ways.The main way we’ve tackled this is by building LangSmith. One of the main value props that LangSmith provides is a best-in-class debugging experience for your LLM application. We log exactly what steps are happening, what the inputs of each step are, what the outputs of each step are, how long each step takes, and more data. We display this in a user-friendly way, allowing you to identify which steps are taking the longest, enter a playground to debug unexpected LLM responses, track token usage and more. Even in private beta, the demand for LangSmith has been overwhelming, and we’re investing a lot in scalability so that we can
----------------------------------------------------------------------------------------------------
Document 8:

 + to get started building on any stack. We have almost 700 integrations, ranging from LLMs to vector stores to tools for agents to use. 💡LangChain is often used as the “glue” to join all the different pieces you need to build an LLM app together, and so prioritizing a robust integration ecosystem is a priority for us.About a month ago, we started making some changes we think will improve the robustness, stability, scalability, and general developer experience around integrations. We split out ALL third party integrations into langchain-community – this allows us to centralize integration-specific work. We have also begun to split out individual integrations into their own packages. So far we have done this for ~10 packages, including OpenAI, Google and Mistral. One benefit of this is better dependency management - previously, all dependencies were optional, leading to some headaches when trying to install specific versions. Now if integrations are in their own package, we can more strictly version their requirements, leading to easier installation. Another benefit is versioning. Oftentimes, there are changes to the third party integrations, which require breaking changes. These can now be reflected on an individual
----------------------------------------------------------------------------------------------------
Document 9:

 + unique to LLM workloads - mainly LLM-specific observability (covered above), and streaming, covered later in this post.The components for LCEL are in langchain-core. We’ve started to create higher level entry points for specific chains in LangChain. These will gradually replace pre-existing (now “Legacy”) chains, because chains built with LCEL will get streaming, ease of customization, observability, batching, retries out-of-the-box. Our goal is to make this transition seamless. Previously you may have done:ConversationalRetrievalChain.from_llm(llm, …)We want to simply make it:create_conversational_retrieval_chain(llm, …)Under the hood, it will create a specific LCEL chain and return it. If you want to modify the logic - no problem! Because it’s all written in LCEL it’s easy to modify part of it without having to subclass anything or override any methods.There are a lot of chains in LangChain, and a lot of them are heavily used. We will not deprecate the legacy version of the chain until an alternative constructor function exists and has been used and well-tested.StreamingLLMs can sometimes take a while to respond. It is important to show the end user that work is being done instead of staring at a blank screen.
----------------------------------------------------------------------------------------------------
Document 10:

 + workflow = Graph()

workflow.add_node("agent", agent)
workflow.add_node("tools", execute_tools)

workflow.set_entry_point("agent")

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue": "tools",
        "exit": END
    }
)

workflow.add_edge('tools', 'agent')

chain = workflow.compile()We've been working on this for the past six months, beta-testing it with users. It currently powers OpenGPTs. We'll be adding a lot more examples and documentation over the next few weeks - we're really excited about this!Try it out here.ConclusionLangChain has evolved significantly along with the ecosystem. We are incredibly grateful to our community and users for pushing us and building with us. With this 0.1 release, we’ve taken time to understand what you want and need in an LLM framework, and remain committed to building it. As the community’s needs evolve (or if we’re missing something), we want to hear your feedback, so we can address it. They say, “A journey of a thousand miles begins with a single step.” – or in our case, version 0.1.


Tags
By LangChain


Join our newsletter
Updates from the LangChain team and community

docs = compression_pipeline.get_relevant_documents("What are the major changes in v 0.1.0?")
pretty_print_docs(docs)

###### RESPONSE ################
Document 1:

 + things that are top of mind for us are:Rewriting legacy chains in LCEL (with better streaming and debugging support)Adding new types of chainsAdding new types of agentsImproving our production ingestion capabilitiesRemoving old and unused functionalityImportantly, even though we are excited about removing some of the old and legacy code to make langchain slimmer and more focused, we also want to maintain support for people who are still using the old version. That is why we will maintain 0.1 as a stable branch (patching in critical bug fixes) for at least 3 months after 0.2 release. We plan to do this for every stable release from here on out.And if you've been wanting to get started contributing, there's never been a better time. We recently added a good getting started issue on GitHub if you're looking for a place to start.One More ThingA large part of LangChain v0.1.0 is stability and focus on the core areas outlined above. Now that we've identified the areas people love about LangChain, we can work on adding more advanced and complete tooling there.One of the main things people love about LangChain is it's support for agents. Most agents are largely defined as running an LLM in some sort of a loop. So far, the only way we've
----------------------------------------------------------------------------------------------------
Document 2:

 + Today we’re excited to announce the release of langchain 0.1.0, our first stable version. It is fully backwards compatible, comes in both Python and JavaScript, and comes with improved focus through both functionality and documentation. A stable version of LangChain helps us earn developer trust and gives us the ability to evolve the library systematically and safely.Python GitHub DiscussionPython v0.1.0 GuidesJS v0.1.0 GuidesYouTube WalkthroughIntroductionLangChain has been around for a little over a year and has changed a lot as it’s grown to become the default framework for building LLM applications. As we previewed a month ago, we recently decided to make significant changes to the  LangChain package architecture in order to better organize the project and strengthen the foundation. Specifically we made two large architectural changes: separating out langchain-core and separating out partner packages (either into langchain-community or standalone partner packages) from langchain. As a reminder, langchain-core contains the main abstractions, interfaces, and core functionality. This code is stable and has been following a stricter versioning policy for a little over a month now.langchain itself, however, still remained on
----------------------------------------------------------------------------------------------------
Document 3:

 + versioning policy for a little over a month now.langchain itself, however, still remained on 0.0.x versions. Having all releases on minor version 0 created a few challenges:Users couldn’t be confident that updating would not have breaking changeslangchain became bloated and unstable as we took a “maintain everything” approach to reduce breaking changes and deprecation notificationsHowever, starting today with the release of langchain 0.1.0, all future releases will follow a new versioning standard. Specifically:Any breaking changes to the public API will result in a minor version bump (the second digit)Any bug fixes or new features will result in a patch version bump (the third digit)We hope that this, combined with the previous architectural changes, will:Communicate clearly if breaking changes are made, allowing developers to update with confidenceGive us an avenue for officially deprecating and deleting old code, reducing bloatMore responsibly deal with integrations (whose SDKs are often changing as rapidly as LangChain)Even after we release a 0.2 version, we will commit to maintaining a branch of 0.1, but will only patch critical bug fixes. See more towards the end of this post on our plans for that.While

Define a naive RAG

from langchain.chains import RetrievalQA
#
qa = RetrievalQA.from_chain_type(llm=openai_llm,
                                 chain_type="stuff",
                                 retriever=vectorstore.as_retriever(search_kwargs={"k":5}),
                                 return_source_documents=True)

naive_response = qa.invoke({"query": "What are the major changes in v 0.1.0?"})
naive_response["result"]

###### RESPONSE ##############
 The major changes in v 0.1.0 include a new versioning standard, improved stability and focus on core areas, and the separation of langchain-core and partner packages

Define an Advanced RAG

from langchain.chains import RetrievalQA
#
qa_advanced = RetrievalQA.from_chain_type(llm=openai_llm,
                                 chain_type="stuff",
                                 retriever=compression_pipeline,
                                 return_source_documents=True)
#
qa_adv_response = qa_advanced.invoke({"query": "What are the major changes in v 0.1.0?"})
qa_adv_response["result"]

#########RESPONSE ####################
 The major changes in v 0.1.0 include separating out langchain-core and partner packages, implementing a new versioning standard, and committing to maintaining a branch of 0.1 for critical

Evaluating Naive RAG and Advanced RAG using RAGAS evaluation Framework

Synthetic Test Set Generation

We can leverage Ragas’ synthetic test data generation functionality to generate our own synthetic question-context (QC) pairs, as well as a synthetic ground truth, quite easily!

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
#
#load documents again to avoid any kind of bias
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 200
)
documents = text_splitter.split_documents(documents)
len(documents)
#
#
generator = TestsetGenerator.with_openai()
#
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

testset.test_data[0]

###### Response
address them?', contexts=['we made two large architectural changes: separating out langchain-core and separating out partner packages (either into langchain-community or standalone partner packages) from langchain.\xa0As a reminder, langchain-core contains the main abstractions, interfaces, and core functionality. This code is stable and has been following a stricter versioning policy for a little over a month now.langchain itself, however, still remained on 0.0.x versions. Having all releases on minor version 0 created a few challenges:Users couldn’t be confident that updating would not have breaking changeslangchain became bloated and unstable as we took a “maintain everything” approach to reduce breaking changes and deprecation notificationsHowever, starting today with the release of langchain 0.1.0, all future releases will follow a new versioning standard. Specifically:Any breaking changes to the public API will result in a minor version bump (the second digit)Any bug fixes or new features will result'], ground_truth="The challenges faced with the previous versioning policy of langchain were that users couldn't be confident that updating would not have breaking changes and langchain became bloated and unstable. The new versioning standard will address these challenges by ensuring that any breaking changes to the public API will result in a minor version bump and any bug fixes or new features will result in a patch version bump.", evolution_type='simple')

Generating Responses with RAG Pipeline

Now that we have some QC pairs and some ground truths, let’s evaluate our RAG pipeline using Ragas. The process is, again, quite straightforward, thanks to Ragas and LangChain! Let’s start by extracting the questions and ground truths from the test set we created. We first convert the test dataset into a Pandas DataFrame.

test_df = testset.to_pandas()
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()
test_df.head()

Generate responses with our Naive RAG pipeline for the questions we’ve generated.

answers = []
contexts = []

for question in test_questions:
  response = qa.invoke({"query" : question})
  answers.append(response["result"])
  contexts.append([context.page_content for context in response['source_documents']])
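The same loop is needed again later for the advanced pipeline, so it can be worth wrapping it in a small helper. The following is a minimal sketch, not part of the original notebook: `collect_responses` is a hypothetical name, and `_FakeChain` and `_FakeDoc` are stand-ins used only so the snippet runs on its own. It assumes the chain returns a dict with `result` and `source_documents` keys, as the loop above relies on.

```python
from typing import Any, Dict, List, Tuple


def collect_responses(chain: Any, questions: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Run each question through a chain and gather answers and contexts.

    Assumes chain.invoke({"query": ...}) returns a dict with "result" and
    "source_documents" keys, as the loop above relies on.
    """
    answers: List[str] = []
    contexts: List[List[str]] = []
    for question in questions:
        response = chain.invoke({"query": question})
        answers.append(response["result"])
        contexts.append([doc.page_content for doc in response["source_documents"]])
    return answers, contexts


# Stand-ins for illustration only; a real run would pass `qa` instead.
class _FakeDoc:
    def __init__(self, page_content: str) -> None:
        self.page_content = page_content


class _FakeChain:
    def invoke(self, inputs: Dict[str, str]) -> Dict[str, Any]:
        query = inputs["query"]
        return {
            "result": f"answer to {query}",
            "source_documents": [_FakeDoc(f"context for {query}")],
        }


demo_answers, demo_contexts = collect_responses(_FakeChain(), ["q1", "q2"])
print(demo_answers)
print(demo_contexts)
```

With the real chains, the call would look like `answers, contexts = collect_responses(qa, test_questions)`.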

Wrap the information in a Hugging Face dataset for use in the Ragas library.

from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})
response_dataset[0]

####### RESPONSE ##########
{'question': 'What were the challenges faced with the previous versioning policy of langchain and how will the new versioning standard address them?',
 'answer': ' The previous versioning policy of langchain, where all releases were on minor version 0, created challenges such as users not being confident in updating due to potential breaking changes and the code becoming bloated and unstable. The new versioning standard, where any breaking changes result in a minor version bump and bug fixes or new features result in a patch version bump, will address these challenges by clearly communicating breaking changes, allowing for official deprecation and deletion of old code, and more responsibly dealing with integrations.',
 'contexts': ['versioning policy for a little over a month now.langchain itself, however, still remained on 0.0.x versions. Having all releases on minor version 0 created a few challenges:Users couldn‚Äôt be confident that updating would not have breaking changeslangchain became bloated and unstable as we took a ‚Äúmaintain everything‚Äù approach to reduce breaking changes and deprecation notificationsHowever, starting today with the release of langchain 0.1.0, all future releases will follow a new versioning standard. Specifically:Any breaking changes to the public API will result in a minor version bump (the second digit)Any bug fixes or new features will result in a patch version bump (the third digit)We hope that this, combined with the previous architectural changes, will:Communicate clearly if breaking changes are made, allowing developers to update with confidenceGive us an avenue for officially deprecating and deleting old code, reducing bloatMore responsibly deal with integrations (whose SDKs are often changing as rapidly as LangChain)Even after we release a 0.2 version, we will commit to maintaining a branch of 0.1, but will only patch critical bug fixes. See more towards the end of this post on our plans for that.While',
  "things that are top of mind for us are:Rewriting legacy chains in LCEL (with better streaming and debugging support)Adding new types of chainsAdding new types of agentsImproving our production ingestion capabilitiesRemoving old and unused functionalityImportantly, even though we are excited about removing some of the old and legacy code to make langchain slimmer and more focused, we also want to maintain support for people who are still using the old version. That is why we will maintain 0.1 as a stable branch (patching in critical bug fixes) for at least 3 months after 0.2 release. We plan to do this for every stable release from here on out.And if you've been wanting to get started contributing, there's never been a better time. We recently added a good getting started issue on GitHub if you're looking for a place to start.One More ThingA large part of LangChain v0.1.0 is stability and focus on the core areas outlined above. Now that we've identified the areas people love about LangChain, we can work on adding more advanced and complete tooling there.One of the main things people love about LangChain is it's support for agents. Most agents are largely defined as running an LLM in some sort of a loop. So far, the only way we've",
  'Today we‚Äôre excited to announce the release of langchain 0.1.0, our first stable version. It is fully backwards compatible, comes in both Python and JavaScript, and comes with improved focus through both functionality and documentation. A stable version of LangChain helps us earn developer trust and gives us the ability to evolve the library systematically and safely.Python GitHub DiscussionPython v0.1.0 GuidesJS v0.1.0 GuidesYouTube WalkthroughIntroductionLangChain has been around for a little over a year and has changed a lot as it‚Äôs grown to become the default framework for building LLM applications. As we previewed a month ago, we recently decided to make significant changes to the\xa0 LangChain package architecture in order to better organize the project and strengthen the foundation.\xa0Specifically we made two large architectural changes: separating out langchain-core and separating out partner packages (either into langchain-community or standalone partner packages) from langchain.\xa0As a reminder, langchain-core contains the main abstractions, interfaces, and core functionality. This code is stable and has been following a stricter versioning policy for a little over a month now.langchain itself, however, still remained on',
  'only patch critical bug fixes. See more towards the end of this post on our plans for that.While re-architecting the package towards a path to a stable 0.1 release, we took the opportunity to talk to hundreds of developers about why they use LangChain and what they love about it. This input guided our direction and focus. We also used it as an opportunity to bring parity to the Python and JavaScript versions in the core areas outlined below. \uf8ffüí°While certain integrations and more tangential chains may be language specific, core abstractions and key functionality are implemented equally in both the Python and JavaScript packages.We want to share what we‚Äôve heard and our plan to continually improve LangChain. We hope that sharing these learnings will increase transparency into our thinking and decisions, allowing others to better use, understand, and contribute to LangChain. After all, a huge part of LangChain is our community ‚Äì both the user base and the 2000+ contributors ‚Äì and we want everyone to come along for the journey.\xa0Third Party IntegrationsOne of the things that people most love about LangChain is how easy we make it to get started building on any stack. We have almost 700 integrations, ranging from LLMs to vector',
  'LangChain v0.1.0\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSkip to content\n\n\n\n\n\n\n\n\n                LangChain Blog\n        \n\n\n\n\n\n\nHome\n\n\n\n\nBy LangChain\n\n\n\n\nRelease Notes\n\n\n\n\nGitHub\n\n\n\n\nDocs\n\n\n\n\nCase Studies\n\n\n\n\n\nSign in\nSubscribe\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLangChain v0.1.0\n\nBy LangChain\n10 min read\nJan 8, 2024'],
 'ground_truth': "The challenges faced with the previous versioning policy of langchain were that users couldn't be confident that updating would not have breaking changes and langchain became bloated and unstable. The new versioning standard will address these challenges by ensuring that any breaking changes to the public API will result in a minor version bump and any bug fixes or new features will result in a patch version bump."}
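Before handing the dataset to Ragas, it can help to check that the four columns line up row by row and that every `contexts` entry is a list of strings, which is the shape the dataset above has. This is a small sketch under that assumption; `check_ragas_columns` is a hypothetical helper, not a Ragas function, and the example rows are made up.

```python
from typing import Dict


def check_ragas_columns(columns: Dict[str, list]) -> None:
    """Raise ValueError if the columns do not line up row by row."""
    required = ["question", "answer", "contexts", "ground_truth"]
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Every column must describe the same number of test questions.
    lengths = {name: len(columns[name]) for name in required}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"column lengths differ: {lengths}")
    # Each row's contexts should be a list of retrieved text chunks.
    for i, ctx in enumerate(columns["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise ValueError(f"contexts[{i}] must be a list of strings")


# Made-up rows for illustration only.
example_columns = {
    "question": ["What changed in 0.1.0?"],
    "answer": ["A new versioning standard."],
    "contexts": [["all future releases will follow a new versioning standard"]],
    "ground_truth": ["Breaking changes bump the minor version."],
}
check_ragas_columns(example_columns)  # passes silently
```

Calling it on the same dict that is passed to `Dataset.from_dict` catches a skipped or failed question before the evaluation, which makes LLM calls, is started.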

Evaluating with RAGAS

from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
]
# Score the naive pipeline; raise_exceptions=False keeps the run going if a metric fails
naive_results = evaluate(response_dataset, metrics, raise_exceptions=False)
naive_results
####### RESPONSE ##########
{'faithfulness': 0.8740, 'answer_relevancy': 0.9579, 'context_recall': 0.7599, 'context_precision': 0.7917, 'answer_correctness': 0.7307}
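Because the evaluation is run with `raise_exceptions=False`, a metric that fails on a row does not stop the run. Our understanding is that such a row is scored as NaN, so it is worth checking how many rows actually received a score before trusting the averages. The sketch below assumes the per-row scores are available as a list of dicts (for example via `naive_results.to_pandas().to_dict("records")`); `count_failed_scores` is a hypothetical helper and the rows are made up.

```python
import math
from typing import Dict, List


def count_failed_scores(rows: List[Dict[str, float]], metric_names: List[str]) -> Dict[str, int]:
    """Count rows whose score is missing or NaN, per metric."""
    failed = {name: 0 for name in metric_names}
    for row in rows:
        for name in metric_names:
            value = row.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                failed[name] += 1
    return failed


# Made-up per-row scores for illustration only.
example_rows = [
    {"faithfulness": 1.0, "answer_correctness": 0.7},
    {"faithfulness": float("nan"), "answer_correctness": 0.6},
]
failed_counts = count_failed_scores(example_rows, ["faithfulness", "answer_correctness"])
print(failed_counts)
```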

Generate responses with our Advanced RAG pipeline for the same questions.

adv_answers = []
adv_contexts = []

for question in test_questions:
  response = qa_advanced.invoke({"query" : question})
  adv_answers.append(response["result"])
  adv_contexts.append([context.page_content for context in response['source_documents']])

# Wrap into a Hugging Face dataset
response_dataset_advanced_retrieval = Dataset.from_dict({
    "question" : test_questions,
    "answer" : adv_answers,
    "contexts" : adv_contexts,
    "ground_truth" : test_groundtruths
})
response_dataset_advanced_retrieval[0]

###########RESPONSE
{'question': 'What were the challenges faced with the previous versioning policy of langchain and how will the new versioning standard address them?',
 'answer': ' The challenges faced with the previous versioning policy of langchain were that users couldn\'t be confident in updates not having breaking changes, and the framework became bloated and unstable due to a "maintain everything" approach. The new versioning standard will address these challenges by clearly communicating any breaking changes, providing an avenue for deprecating and deleting old code, and more responsibly dealing with integrations.',
 'contexts': ['versioning policy for a little over a month now.langchain itself, however, still remained on 0.0.x versions. Having all releases on minor version 0 created a few challenges:Users couldn‚Äôt be confident that updating would not have breaking changeslangchain became bloated and unstable as we took a ‚Äúmaintain everything‚Äù approach to reduce breaking changes and deprecation notificationsHowever, starting today with the release of langchain 0.1.0, all future releases will follow a new versioning standard. Specifically:Any breaking changes to the public API will result in a minor version bump (the second digit)Any bug fixes or new features will result in a patch version bump (the third digit)We hope that this, combined with the previous architectural changes, will:Communicate clearly if breaking changes are made, allowing developers to update with confidenceGive us an avenue for officially deprecating and deleting old code, reducing bloatMore responsibly deal with integrations (whose SDKs are often changing as rapidly as LangChain)Even after we release a 0.2 version, we will commit to maintaining a branch of 0.1, but will only patch critical bug fixes. See more towards the end of this post on our plans for that.While',
  'Today we‚Äôre excited to announce the release of langchain 0.1.0, our first stable version. It is fully backwards compatible, comes in both Python and JavaScript, and comes with improved focus through both functionality and documentation. A stable version of LangChain helps us earn developer trust and gives us the ability to evolve the library systematically and safely.Python GitHub DiscussionPython v0.1.0 GuidesJS v0.1.0 GuidesYouTube WalkthroughIntroductionLangChain has been around for a little over a year and has changed a lot as it‚Äôs grown to become the default framework for building LLM applications. As we previewed a month ago, we recently decided to make significant changes to the\xa0 LangChain package architecture in order to better organize the project and strengthen the foundation.\xa0Specifically we made two large architectural changes: separating out langchain-core and separating out partner packages (either into langchain-community or standalone partner packages) from langchain.\xa0As a reminder, langchain-core contains the main abstractions, interfaces, and core functionality. This code is stable and has been following a stricter versioning policy for a little over a month now.langchain itself, however, still remained on',
  'to get started building on any stack. We have almost 700 integrations, ranging from LLMs to vector stores to tools for agents to use. \uf8ffüí°LangChain is often used as the ‚Äúglue‚Äù to join all the different pieces you need to build an LLM app together, and so prioritizing a robust integration ecosystem is a priority for us.About a month ago, we started making some changes we think will improve the robustness, stability, scalability, and general developer experience around integrations. We split out ALL third party integrations into langchain-community ‚Äì this allows us to centralize integration-specific work. We have also begun to split out individual integrations into their own packages. So far we have done this for ~10 packages, including OpenAI, Google and Mistral. One benefit of this is better dependency management - previously, all dependencies were optional, leading to some headaches when trying to install specific versions. Now if integrations are in their own package, we can more strictly version their requirements, leading to easier installation. Another benefit is versioning. Oftentimes, there are changes to the third party integrations, which require breaking changes. These can now be reflected on an individual'],
 'ground_truth': "The challenges faced with the previous versioning policy of langchain were that users couldn't be confident that updating would not have breaking changes and langchain became bloated and unstable. The new versioning standard will address these challenges by ensuring that any breaking changes to the public API will result in a minor version bump and any bug fixes or new features will result in a patch version bump."}

advanced_retrieval_results = evaluate(response_dataset_advanced_retrieval, metrics, raise_exceptions=False)
advanced_retrieval_results

#### RESPONSE
{'faithfulness': 1.0000, 'answer_relevancy': 0.9301, 'context_recall': 0.8000, 'context_precision': 0.8000, 'answer_correctness': 0.5721}

Compare the evaluations

import pandas as pd

# One row per metric for each pipeline
df_original = pd.DataFrame(list(naive_results.items()), columns=['Metric', 'Baseline'])
df_comparison = pd.DataFrame(list(advanced_retrieval_results.items()), columns=['Metric', 'Contextual Compression with Document Stuffing'])

# Join on the metric name so the two scores sit side by side
df_merged = pd.merge(df_original, df_comparison, on='Metric')

# Positive delta means the advanced pipeline scored higher
df_merged['Delta'] = df_merged['Contextual Compression with Document Stuffing'] - df_merged['Baseline']

df_merged
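For readers who want to see the deltas without rerunning the notebook, the same comparison can be sketched with plain dictionaries. The scores below are copied from the two result dicts reported above; the variable names are our own.

```python
# Scores copied from the evaluation outputs reported above.
naive_scores = {
    "faithfulness": 0.8740,
    "answer_relevancy": 0.9579,
    "context_recall": 0.7599,
    "context_precision": 0.7917,
    "answer_correctness": 0.7307,
}
advanced_scores = {
    "faithfulness": 1.0000,
    "answer_relevancy": 0.9301,
    "context_recall": 0.8000,
    "context_precision": 0.8000,
    "answer_correctness": 0.5721,
}

# Positive delta means the advanced pipeline scored higher.
deltas = {
    metric: round(advanced_scores[metric] - naive_scores[metric], 4)
    for metric in naive_scores
}

print(f"{'Metric':<20}{'Baseline':>10}{'Advanced':>10}{'Delta':>10}")
for metric, delta in deltas.items():
    print(f"{metric:<20}{naive_scores[metric]:>10.4f}{advanced_scores[metric]:>10.4f}{delta:>+10.4f}")
```

Three metrics move up and two move down, which is why a single headline number would hide the trade-off between the two pipelines.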

Conclusion

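We can see that our faithfulness has improved (0.8740 to 1.0000), as have our context recall (0.7599 to 0.8000) and context precision (0.7917 to 0.8000), but answer relevancy dipped slightly (0.9579 to 0.9301) and we lost a significant amount of answer correctness (0.7307 to 0.5721).
We would need to do some more experimentation to determine how to improve our pipeline!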

References

Install | Ragas

GitHub - explodinggradients/ragas: Evaluation framework for your Retrieval Augmented Generation…

LangChain v0.1.0