GenAI Stack Version 0.2 is here with LLM Caching, REST APIs, VectorDB Memory and more


In this article, we walk through what is new in the latest GenAI Stack release.

Key highlights of the latest release: 

  • Integrating LLM Cache (#56) with vector databases, so that repeated user queries are not sent to the LLM again. This significantly reduces costs and improves response time on LLM requests.
  • GenAI Stack installation via PyPI, with a Poetry-based setup (#78). You can now install GenAI Stack directly from PyPI.
  • A REST API server (#59) for developers. With the GenAI Stack REST API, all the required components and the stack itself are initialized internally.
  • Support for VectorDBMemory (#92), which stores conversation memory inside a vector database.

For more details, keep scrolling ⬇️!

New component: LLM Cache

As large language models (LLMs) grow in popularity, LLM caching has taken center stage as a way to improve performance. An LLM cache stores previous queries together with the model's responses and serves a repeated query from storage instead of calling the LLM again. This reduces the number of queries sent to the LLM and shortens response time, which translates into cost savings and a better user experience.
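To make the idea concrete, here is a minimal, self-contained sketch of a similarity-based LLM cache in plain Python. It is not GenAI Stack's implementation: the bag-of-words `embed` function, the `SimpleLLMCache` class, the `answer` helper and the 0.9 similarity threshold are all hypothetical stand-ins for a real embedding model and vector database.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words token count."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimpleLLMCache:
    """Hypothetical cache; the list of entries plays the role of the vector database."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # minimum similarity that counts as a cache hit
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the response of the most similar stored query, if similar enough."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def set(self, query: str, response: str) -> None:
        """Store a query vector together with the LLM's response."""
        self.entries.append((embed(query), response))


def answer(query: str, cache: SimpleLLMCache, llm: Callable[[str], str]) -> str:
    """Serve from the cache when possible; otherwise call the LLM and cache the result."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    response = llm(query)
    cache.set(query, response)
    return response
```

With a fake model such as `lambda q: "answer to: " + q`, calling `answer("What is GenAI Stack?", ...)` and then `answer("what is genai stack", ...)` invokes the model only once, because the second query has the same tokens and is served from the cache. An exact-match cache would be simpler; a similarity threshold trades a higher hit rate against the risk of returning the answer to a slightly different question.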

How to use LLM Cache?

The cache needs a vector database to store the LLM queries; Weaviate and ChromaDB are currently supported. To use the cache, you have to provide a vector database component to the stack: the cache component depends on other components and cannot be used on its own.

from genai_stack.vectordb import ChromaDB
from genai_stack.llm_cache import LLMCache

chromadb = ChromaDB.from_kwargs()
llm_cache = LLMCache.from_kwargs()
# Add llm_cache along with the other components in the Stack;
# see the documentation for a complete example.

Reference: example code usage in the LLM Cache documentation.

GenAI Stack now supports REST APIs 

From the initial version up to the 0.2.2 release of GenAI Stack, building an app or use case meant manually importing and initializing each component with its specific configuration, and then initializing the stack by passing in the component instances. Creating a new app or use case required either building a completely new stack or changing the configuration of the existing stack's components (mainly VectorDB and Memory).

With the GenAI Stack REST API, all the required components and the stack itself are initialized internally. You just create a stack_config.json containing the configuration for every component, run the server, and start calling the endpoints.

When you run the GenAI Stack REST API server, it sets up an SQLite database internally and persists the important configuration needed to keep apps or use cases isolated from one another.

For example, the session endpoint internally creates a collection for each of VectorDB, Memory, and LLM Cache; these collections isolate the context and conversations of each app.
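The isolation idea can be sketched with Python's built-in sqlite3 module. The table layout, the `create_session` and `get_collections` helpers, and the collection naming scheme below are hypothetical and are not the actual GenAI Stack server schema: each new session simply receives its own VectorDB, Memory and LLM Cache collection names, persisted so they can be looked up on later requests.

```python
import sqlite3
import uuid
from typing import Dict


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per session with its isolated collection names.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sessions (
               id TEXT PRIMARY KEY,
               vectordb_collection TEXT NOT NULL,
               memory_collection TEXT NOT NULL,
               cache_collection TEXT NOT NULL
           )"""
    )


def create_session(conn: sqlite3.Connection) -> str:
    """Create a session and persist its per-session collection names."""
    session_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO sessions VALUES (?, ?, ?, ?)",
        (
            session_id,
            f"vectordb_{session_id}",
            f"memory_{session_id}",
            f"cache_{session_id}",
        ),
    )
    conn.commit()
    return session_id


def get_collections(conn: sqlite3.Connection, session_id: str) -> Dict[str, str]:
    """Look up the collections that belong to one session."""
    row = conn.execute(
        "SELECT vectordb_collection, memory_collection, cache_collection "
        "FROM sessions WHERE id = ?",
        (session_id,),
    ).fetchone()
    if row is None:
        raise KeyError(session_id)
    return {"vectordb": row[0], "memory": row[1], "llm_cache": row[2]}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    first, second = create_session(conn), create_session(conn)
    # The two sessions never share a collection, so their contexts stay separate.
    print(get_collections(conn, first))
    print(get_collections(conn, second))
```

Because every read and write for a session goes through its own collection names, documents, conversation history and cached answers from one app cannot leak into another.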

A sample stack_config.json:

    {
        "components": {
            "vectordb": {
                "name": "weaviate_db",
                "config": {
                    "url": "http://localhost:8080/",
                    "index_name": "Testing",
                    "text_key": "test",
                    "attributes": ["page", "path"]
                }
            },
            "memory": {
                "name": "langchain",
                "config": {}
            },
            "llm_cache": {
                "name": "cache",
                "config": {}
            },
            "model": {
                "name": "gpt3.5",
                "config": {
                    "parameters": {
                        "openai_api_key": "your_api_key_here"
                    }
                }
            },
            "embedding": {
                "name": "langchain",
                "config": {
                    "name": "HuggingFaceEmbeddings",
                    "fields": {
                        "model_name": "sentence-transformers/all-mpnet-base-v2",
                        "model_kwargs": { "device": "cpu" },
                        "encode_kwargs": { "normalize_embeddings": false }
                    }
                }
            },
            "prompt_engine": {
                "name": "engine",
                "config": {
                    "should_validate": true
                }
            },
            "retriever": {
                "name": "langchain",
                "config": {}
            }
        }
    }

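Because every component in stack_config.json is described by a `name` and a `config` object, a server can validate the file and handle each component generically. The sketch below assumes a hypothetical `REQUIRED_COMPONENTS` tuple and `load_stack_config` helper; it only checks and returns the structure and does not construct real GenAI Stack components.

```python
import json
from typing import Any, Dict, Iterable, Tuple

# Hypothetical: the component keys used in the sample stack_config.json.
REQUIRED_COMPONENTS = (
    "vectordb", "memory", "llm_cache", "model",
    "embedding", "prompt_engine", "retriever",
)


def load_stack_config(
    text: str, required: Iterable[str] = REQUIRED_COMPONENTS
) -> Dict[str, Tuple[str, Dict[str, Any]]]:
    """Parse stack_config.json content into {component: (name, config)}."""
    data = json.loads(text)
    components = data.get("components") if isinstance(data, dict) else None
    if not isinstance(components, dict):
        raise ValueError('missing top-level "components" object')

    missing = [key for key in required if key not in components]
    if missing:
        raise ValueError("missing components: " + ", ".join(missing))

    parsed: Dict[str, Tuple[str, Dict[str, Any]]] = {}
    for key, spec in components.items():
        if not isinstance(spec, dict) or "name" not in spec:
            raise ValueError(f'component "{key}" needs a "name"')
        # A missing "config" object is treated as an empty configuration.
        parsed[key] = (spec["name"], spec.get("config", {}))
    return parsed
```

For the sample file, the `"vectordb"` entry of the result is a tuple whose first element is `"weaviate_db"` and whose second element holds the Weaviate settings. A real server would then map each `name` to a component class; that mapping is not shown here.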
Read more in the documentation: Setup Server and API Endpoint Reference.

Memory integration in VectorDB (#92)

The latest release adds memory support for vector databases. VectorDBMemory supports both ChromaDB and Weaviate; which one stores the conversations depends entirely on the vector database that is initialized and passed to the Stack for storing the documents.

Why is this important?

VectorDBMemory stores memories in a vector database and, every time it is called, queries the top-K most salient documents. This approach helps manage long-term memory in LLM applications.

It provides a way to persist relevant documents in a vector store and retrieve them later, which is useful for maintaining conversation history or other kinds of memory in an LLM application.
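The top-K retrieval step can be illustrated with a small in-memory stand-in. In the hypothetical `TopKMemory` class below, a Python list replaces the vector database and a bag-of-words vector replaces a real embedding model; it is not the VectorDBMemory implementation, only a sketch of the mechanism: every stored turn is vectorized, and each call returns the k stored turns most similar to the current query.

```python
import heapq
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words token count."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TopKMemory:
    """Hypothetical vector-store memory: keeps every turn, recalls the k most similar."""

    def __init__(self, k: int = 2):
        self.k = k
        self.records: List[Tuple[Counter, str]] = []  # stands in for the vector DB

    def add_turn(self, user: str, assistant: str) -> None:
        """Persist one conversation turn together with its vector."""
        text = f"User: {user}\nAssistant: {assistant}"
        self.records.append((embed(text), text))

    def relevant(self, query: str) -> List[str]:
        """Return up to k stored turns ranked by similarity to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.records]
        return [text for _, text in heapq.nlargest(self.k, scored)]
```

For example, after `add_turn("My favourite database is Weaviate", "Noted!")` and `add_turn("I live in Berlin", "Nice city.")`, a `TopKMemory(k=1)` answers `relevant("Which database is my favourite?")` with only the Weaviate turn, even though it is not the most recent one. That is the difference from a buffer memory that simply keeps the last N turns.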

How to use VectorDBMemory?

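from genai_stack.stack.stack import Stack
from genai_stack.vectordb import ChromaDB, Weaviate
from genai_stack.memory import VectorDBMemory

# Pick ONE vector database; VectorDBMemory stores the conversations in
# whichever vectordb is passed to the Stack.
vectordb = ChromaDB.from_kwargs()
# Alternatively, use Weaviate:
# vectordb = Weaviate.from_kwargs(
#     url="http://localhost:8080/", index_name="Testing", text_key="test"
# )

memory = VectorDBMemory.from_kwargs(index_name="Conversations")

stack = Stack(
    vectordb=vectordb,
    memory=memory,
    # Add the remaining components (model, embedding, etc.) as shown in the documentation.
)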

As an open-source project, GenAI Stack strives for continuous improvement through new features, enhancements, and bug fixes. The best part of an open-source project is being directly involved with the community.

LangChain is a framework for working with large language models, which are themselves a sub-field of deep learning. We feel GenAI Stack is focused on both machine learning and software development, which makes it a comfortable place for beginners to start contributing to open source.

What’s next

Check out the Getting Started with GenAI Stack demo notebook and begin building amazing chatbots with GenAI Stack.

You can reach out to us on our Discord community forum.

You can follow the project on GitHub. Don’t forget to give us a ⭐️ while you are there!
