

Evaluation of LLMs

Evaluating and ensuring the reliability of Large Language Model (LLM) applications built with LangChain is a crucial part of the development process. These evaluations help guarantee that your application produces consistent, useful results across a wide range of inputs, and that it remains compatible with the other software components in your application. To support this, LangChain offers a suite of evaluators and tools for testing and assessing your LLM applications. Let's look at each type of evaluator and why it matters:

1. String Evaluators: These evaluators assess the quality of the strings your LLM generates in response to a given input, typically by comparing the generated string against a reference or expected string. This evaluation type is valuable for tasks like text generation, completion, or summarization. By comparing the model's output with a ground-truth reference, you can quantify the accuracy and quality of the generated text (see the code sketch after this list).

2. Trajectory Evaluators: Trajectory evaluators come into play when your LLM application involves a sequence of actions or decisions, such as in chatbot conversations or gaming environments. These evaluators assess the entire trajectory of agent actions over time. They help ensure that your LLM consistently makes coherent and meaningful decisions throughout a conversation or interaction. This is crucial for maintaining a seamless user experience.

3. Comparison Evaluators: Comparison evaluators are designed to compare the predictions or outcomes generated by two different runs of your LLM on the same input. This type of evaluation is valuable when you want to compare the performance of different models, configurations, or versions of your LLM. It allows you to determine which variant produces more desirable results and make informed decisions about model selection.
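
As a rough illustration, the sketch below uses LangChain's `load_evaluator` helper to run a string evaluator and a comparison evaluator. Evaluator names, required extras (e.g. `rapidfuzz` for string distance), and the need for model credentials vary between LangChain versions, so treat this as a sketch rather than a definitive recipe. Trajectory evaluators follow the same pattern via `load_evaluator("trajectory")` and `evaluate_agent_trajectory(...)`, but they need a full agent run and are omitted here.

```python
# Minimal sketch of LangChain's evaluator interfaces. Assumes a LangChain
# release that exposes `load_evaluator`, plus an OpenAI API key in the
# environment for the LLM-backed comparison evaluator.
from langchain.evaluation import load_evaluator

# 1. String evaluator: compare a prediction against a reference string.
#    "string_distance" needs no LLM; it scores lexical similarity.
string_eval = load_evaluator("string_distance")
result = string_eval.evaluate_strings(
    prediction="The capital of France is Paris.",
    reference="Paris is the capital of France.",
)
print(result)  # e.g. {"score": ...}, where a lower distance means a closer match

# 3. Comparison evaluator: ask an LLM judge which of two outputs is better
#    for the same input (LLM-backed, so it requires model credentials).
pairwise_eval = load_evaluator("pairwise_string")
comparison = pairwise_eval.evaluate_string_pairs(
    prediction="Paris",
    prediction_b="Paris, the capital and largest city of France.",
    input="What is the capital of France?",
)
print(comparison)  # e.g. {"value": "A"/"B", "reasoning": "..."}
```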

Vector DB aka Vector Database

Problem:

While working with word embeddings, the first problem is speed: for thousands of vectors, each hundreds of dimensions long, similarity calculations can take a while, and with millions of vectors they take far too long for most applications. The second problem is retrieval: matching one case to another is surprisingly difficult and often requires awkward fuzzy-matching techniques.

About:

The vector database is a new type of database that is becoming popular in the world of ML and AI. Vector databases are different from traditional relational databases such as PostgreSQL, which were originally designed to store tabular data in rows and columns. They are also decidedly different from newer NoSQL databases, such as MongoDB, which store data in JSON documents. That's because a vector database is designed for storing and retrieving one specific type of data: vector embeddings.

  • The vector database computes a vector embedding for each data object as it is inserted into or updated in the database, using a given embedding model.
  • The embeddings are placed into an index so that the database can perform searches quickly.
  • For each query:
    • A vector embedding is computed using the same model that was used for the data objects.
    • Using a specialized nearest-neighbour algorithm, the database finds the vectors closest to the query vector (see the sketch below).
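
To make those mechanics concrete, here is a minimal, dependency-light sketch of that loop using NumPy and brute-force cosine similarity. The `embed` function is a hypothetical stand-in for whatever embedding model the database is configured with, and a real vector database would replace the linear scan with an approximate nearest-neighbour index.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a real embedding model (e.g. a sentence encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384)  # a typical embedding dimensionality

# "Insert": embed each document and keep the vectors alongside the originals.
documents = [
    "LangChain evaluators",
    "Vector databases store embeddings",
    "Chatbots on your own data",
]
index = np.stack([embed(doc) for doc in documents])    # shape: (n_docs, 384)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalise for cosine similarity

# "Query": embed the query with the same model and find the closest vectors.
query_vec = embed("How are embeddings stored?")
query_vec /= np.linalg.norm(query_vec)
scores = index @ query_vec              # cosine similarity per document
best = np.argsort(scores)[::-1][:2]     # top-2 matches
for i in best:
    print(documents[i], float(scores[i]))
```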

How do we store data in it?

The vector embedding is inserted into the vector database along with a reference to the original content it was created from. When the application issues a query, the same embedding model is used to embed the query, and that query embedding is used to search the database for similar vector embeddings.
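
For instance, the end-to-end flow with an off-the-shelf open-source vector database might look like the sketch below, which assumes the `chromadb` client library is installed and relies on its default embedding model; the collection name and documents are made up for illustration.

```python
import chromadb

# In-memory client; production deployments would use a persistent or hosted server.
client = chromadb.Client()
collection = client.create_collection(name="docs")  # hypothetical collection name

# Store the original text together with an id; the database computes and indexes
# the embeddings using its configured (default) embedding model.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "LangChain provides string, trajectory and comparison evaluators.",
        "A vector database stores embeddings and supports similarity search.",
    ],
)

# Query with natural language: the same embedding model embeds the query,
# and the database returns the most similar stored documents.
results = collection.query(query_texts=["How are LLM outputs evaluated?"], n_results=1)
print(results["documents"])
```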

  • A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes.
  • Each vector has a certain number of dimensions, which can range from tens to thousands, depending on the complexity and granularity of the data.
  • The vectors are usually generated by applying some kind of transformation or embedding function to the raw data, such as text, images, audio, or video.
  • The embedding function can be based on various methods, such as machine learning models, word embeddings, or feature-extraction algorithms (an example follows below).
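
As one concrete example of such an embedding function, LangChain wraps several providers behind a common interface. The sketch below assumes the OpenAI embeddings integration and an `OPENAI_API_KEY` in the environment; import paths differ slightly between LangChain versions.

```python
from langchain_openai import OpenAIEmbeddings  # older releases: from langchain.embeddings import OpenAIEmbeddings

embedder = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

# Embed a batch of raw documents and a single query with the same function,
# so both live in the same vector space.
doc_vectors = embedder.embed_documents(["a short document", "another document"])
query_vector = embedder.embed_query("a user question")

print(len(doc_vectors), len(doc_vectors[0]))  # number of documents, embedding dimensionality
print(len(query_vector))                      # same dimensionality as the document vectors
```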

With the rise of Large Language Models, demand for vector databases has grown, and a number of open-source vector databases are now widely used in industry.


In this module's notebooks, you will implement LLM evaluation with LangChain, work with a vector database, and build a project: a ChatGPT-style chatbot on your own data.
