In this article, we will discuss the latest release of BeyondLLM 0.2.1.
What is BeyondLLM?
BeyondLLM is a user-friendly library that prioritizes flexibility for Data Scientists. When building a RAG application, selecting the most effective retriever and an efficient chunking strategy is crucial. BeyondLLM provides source and auto_retriever components that facilitate the implementation of advanced retrieval techniques, which are essential for reducing hallucinations, a key challenge in the enterprise adoption of LLMs.
BeyondLLM not only simplifies the construction of complex RAG pipelines with minimal coding but also enhances the evaluation process with comprehensive benchmarks such as Context Relevance, Answer Relevance, Groundedness, and Ground Truth. These metrics assess everything from the retriever's ability to fetch relevant information to the accuracy and factual grounding of the LLM's responses, all streamlined within a framework that also automates quick experimentation.
Key Highlights of the Latest Release:
Observability Support for GPT Models: Enhanced monitoring and troubleshooting for GPT models, allowing for deeper insights and improved performance tracking.
Support for GPT-4o: Advanced language processing with improved accuracy and performance, making it ideal for complex text analysis and conversational agents.
Low Latency Inference with Groq LLM Model: Optimized for real-time processing and quick responses, perfect for interactive chatbots and live data analysis.
Streamlit Application for Ingesting and Inferencing: A user-friendly interface that simplifies data and model management, streamlining workflows for developers and data scientists.
Comprehensive Cookbook for Language Translation using GPT-4o: Detailed instructions and practical examples for implementing translation models, helping you create multilingual applications with ease.
For more details, keep scrolling ⬇️!
What’s New in BeyondLLM 0.2.1 🤔
Observability Support for GPT Models
Observability is essential in optimizing the performance of large language models (LLMs). It allows for deeper insights into the internal workings of GPT models, enhancing monitoring, troubleshooting, and optimization. This results in better performance tracking and an improved user experience.
How to Implement Observability
Integrating observability into your LLM applications is straightforward with BeyondLLM. Here’s how you can do it:
Set Up the Environment:
Ensure you have your OpenAI API key ready.
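For example, you can export the key as an environment variable (the value below is a placeholder):

```python
import os

# Replace the placeholder with your actual OpenAI API key.
os.environ["OPENAI_API_KEY"] = "sk-..."
```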
Initialize Observability:
Start the observability component to monitor your LLM's performance. When you run the observer, it provides a localhost URL that opens a dashboard where you can track metrics such as response time, token usage, and the type of API call (embedding, LLM, etc.).
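A minimal sketch, following the observability quickstart in the BeyondLLM docs (verify the class name against your installed version):

```python
from beyondllm import observe

# Starting the observer prints a localhost dashboard URL where
# response time, token usage, and API call types are traced.
observer = observe.Observer()
observer.run()
```

Start the observer before instantiating the LLM so that subsequent calls are captured.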
Set Up LLM and Embeddings:
Configure your LLM and embedding models. Currently, the observer can track only closed-source models such as GPT.
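For instance, pairing GPT-4o with OpenAI embeddings. The class names below follow the BeyondLLM docs, and both are assumed to pick up OPENAI_API_KEY from the environment when no key is passed explicitly:

```python
from beyondllm.llms import ChatOpenAIModel
from beyondllm.embeddings import OpenAIEmbeddings

llm = ChatOpenAIModel(model="gpt-4o")  # GPT-4o chat model
embed_model = OpenAIEmbeddings()       # default OpenAI embedding model
```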
Load Data and Create Retriever:
Fit your data source and create a retriever for querying the LLM. BeyondLLM offers various retriever types including Normal Retriever, Flag Embedding Reranker Retriever, Cross Encoder Reranker Retriever, and Hybrid Retriever, allowing efficient retrieval of relevant information based on user queries and data characteristics. In this case, we are using Normal Retriever.
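A sketch based on the quickstart, using a hypothetical PDF path:

```python
from beyondllm import source, retrieve

# Chunk the document and embed the chunks.
data = source.fit(path="my_document.pdf", dtype="pdf",
                  chunk_size=512, chunk_overlap=50)

# Normal retriever returning the top-4 most relevant chunks.
retriever = retrieve.auto_retriever(data, embed_model=embed_model,
                                    type="normal", top_k=4)
```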
Generate Responses:
Use the generator component to create responses from the retrieved data.
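For example (the question is a placeholder):

```python
from beyondllm import generator

pipeline = generator.Generate(question="What is this document about?",
                              retriever=retriever, llm=llm)
print(pipeline.call())  # response grounded in the retrieved chunks
```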
Now you can open the dashboard and track the performance of your RAG application.
For more details, check out the cookbook.
Low Latency Inference with Groq LLM Model
Groq’s low latency inference optimizes real-time processing and quick responses, making it ideal for interactive applications like chatbots and live data analysis.
How to Use Groq LLM Model with BeyondLLM
Import Libraries:
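Only the Groq model class and os (for the API key) are needed here:

```python
import os

from beyondllm.llms import GroqModel
```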
Set Groq API Key (Replace with your actual key):
Get your Groq API key from here: http://console.groq.com/
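For example (placeholder value shown):

```python
# Replace the placeholder with your key from console.groq.com.
os.environ["GROQ_API_KEY"] = "gsk_..."
```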
Define System Prompt:
This prompt instructs the Groq model on its role and goals:
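An illustrative prompt (the wording is an example, not prescribed by the release):

```python
system_prompt = (
    "You are a helpful assistant. Read the provided transcript and "
    "answer questions about it accurately and concisely."
)
```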
Prepare Your Text Data:
Groq can process text from various sources like strings or files. Ensure your text data is in a format compatible with Groq’s input requirements.
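For instance, reading a hypothetical transcript file:

```python
# Any plain-text source works; the filename here is illustrative.
with open("meeting_transcript.txt") as f:
    transcript = f.read()
```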
Define Prompt for Transcription:
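A sketch of a task prompt built from the system prompt and the transcript text:

```python
prompt = (f"{system_prompt}\n\n"
          f"Summarize the key points of this transcript:\n\n{transcript}")
```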
Create GroqModel and Get Transcription:
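The model name below is one example from Groq's lineup, and predict() follows the BeyondLLM LLM interface; verify both against the docs:

```python
llm = GroqModel(model="mixtral-8x7b-32768",
                groq_api_key=os.environ["GROQ_API_KEY"])
print(llm.predict(prompt))  # Groq returns the completion with low latency
```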
Refer to BeyondLLM's documentation for comprehensive details on Groq's capabilities and prompt formatting.
Why BeyondLLM?
BeyondLLM continues to offer a streamlined approach to developing RAG (Retrieval-Augmented Generation) and LLM applications, ensuring ease of use and powerful capabilities with minimal coding. Here’s a quick overview of what makes BeyondLLM a valuable tool:
Effortless Model Building: Construct robust RAG systems with just 5–7 lines of code. BeyondLLM simplifies the integration of various components and the management of associated hyperparameters, allowing you to focus on your application's core functionality (see the sketch after this list).
Comprehensive Evaluation Metrics: BeyondLLM supports various evaluation metrics, including Hit rate and Mean Reciprocal Rank (MRR) for embeddings, as well as multiple criteria for assessing LLMs. These metrics help you select the best models and configurations for your specific needs.
Advanced Techniques to Reduce Hallucinations: BeyondLLM incorporates techniques to minimize or eliminate hallucinations in LLM outputs. The framework includes advanced RAG features such as markdown splitting, chunking strategies, re-ranking with cross-encoders, and hybrid search, enhancing the reliability and accuracy of your applications.
Versatile Use Cases: BeyondLLM is suitable for a wide range of applications, including customer service bots, document search, multilingual support, and more. Its flexibility makes it an excellent choice for developers across various industries.
Community Driven: As an open-source project, BeyondLLM encourages collaboration and continuous improvement. The community’s contributions help ensure the framework remains at the cutting edge of AI technology.
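To make the "5–7 lines" claim concrete, here is a compact end-to-end sketch assembled from the snippets above (the path, question, and class names are the same assumptions as before); get_rag_triad_evals() is the quickstart's one-call way to run the evaluation metrics:

```python
from beyondllm import source, retrieve, generator
from beyondllm.embeddings import OpenAIEmbeddings
from beyondllm.llms import ChatOpenAIModel

# Hypothetical document and question; swap in your own.
data = source.fit(path="my_document.pdf", dtype="pdf", chunk_size=512, chunk_overlap=50)
retriever = retrieve.auto_retriever(data, embed_model=OpenAIEmbeddings(), type="normal", top_k=4)
pipeline = generator.Generate(question="What is this document about?",
                              retriever=retriever, llm=ChatOpenAIModel(model="gpt-4o"))
print(pipeline.call())                 # grounded answer
print(pipeline.get_rag_triad_evals())  # context relevance, answer relevance, groundedness
```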
Get Started with BeyondLLM 0.2.1
Explore BeyondLLM 0.2.1 and leverage its new features to enhance your AI projects. For more details, visit our documentation and check out the Quickstart notebook on Google Colab.
We welcome your feedback and contributions. Feel free to open issues or pull requests on our GitHub repository: aiplanethub/beyondllm (Build, evaluate and observe LLM apps).
By collaborating with the community, we can continue to improve BeyondLLM and drive innovation in the AI field.
You can reach out to us on Discord: Join the AI Planet Discord Server!
Don’t forget to ⭐️ and fork the repository to stay updated with the latest developments.