LangChain
Imagine you're building an application that uses different Large Language Models (LLMs) to process audio and text data separately. With LangChain, an open-source orchestration framework available in Python and JavaScript, you can do this seamlessly. LangChain offers a unified interface to many LLMs, letting you develop and integrate LLM applications within a centralized development environment. It was introduced as an open-source project by Harrison Chase in October 2022, while he was working at the machine learning startup Robust Intelligence. In short, LangChain streamlines the development of LLM applications through abstraction: hiding low-level complexity so you can focus on the parts of your application that matter most.
In our pipeline, we may have various components such as an extractor, a text splitter, an embedding model, a vector store, and an LLM. LangChain lets us link all these components into a single entity called a chain, much like coupling train coaches. Unlike a train, however, these chains are typically non-linear.
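As an illustrative sketch of the chain idea (plain Python, not LangChain's actual API), a chain can be thought of as a sequence of callables where each component's output feeds into the next. The component names below are hypothetical stand-ins:

```python
# Toy illustration of the "chain" idea: each component is a callable,
# and the chain pipes one component's output into the next.
def build_chain(*components):
    def run(value):
        for component in components:
            value = component(value)
        return value
    return run

# Hypothetical stand-ins for real components (extractor, splitter, LLM, ...)
extract = lambda doc: doc.strip()
split = lambda text: text.split(". ")
summarize = lambda chunks: f"{len(chunks)} chunks processed"

chain = build_chain(extract, split, summarize)
print(chain("  First sentence. Second sentence. Third sentence."))
# -> 3 chunks processed
```

Real chains differ in that components may branch or run in parallel, which is why they are non-linear, but the core idea of composing components end to end is the same.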
Let's begin with the LLM. LangChain can accommodate nearly any LLM given just an API key or token. Its LLM class offers a standard interface compatible with most models, whether closed- or open-source. LangChain also provides Prompt Templates, which formalize prompt composition so you don't have to hand-code context and queries for every request. Chains are the backbone of LangChain's workflow: they combine LLMs with other components into specific sequences of functions, effectively forming entire applications. For tasks requiring external data (not included in the model's training data), LangChain has loaders such as Document Loaders, which ingest text in various formats including CSV and HTML. It also integrates vector stores like Chroma and Pinecone, as well as text splitters such as the Recursive Character Text Splitter and the Semantic Chunker. This paragraph is a very brief recap of the components discussed in previous modules.
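To make the splitting idea concrete, here is a minimal sketch of recursive character splitting (plain Python, not LangChain's actual RecursiveCharacterTextSplitter, whose behavior is more sophisticated): try coarse separators first, fall back to finer ones only for pieces that are still too long, and merge small neighbouring pieces back up toward the chunk size:

```python
# Minimal sketch of recursive character splitting: split on coarse
# separators first ("\n\n", then "\n", then " "), recursing on pieces
# that are still too long, then greedily merge small pieces back up.
def split_text(text, chunk_size, separators=("\n\n", "\n", " ")):
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        if len(part) <= chunk_size:
            pieces.append(part)
        else:
            pieces.extend(split_text(part, chunk_size, rest))
    # Greedily merge neighbouring pieces while staying under chunk_size.
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

doc = "Intro paragraph.\n\nA much longer body paragraph that needs splitting."
print(split_text(doc, chunk_size=30))  # every chunk is at most 30 characters
```

The key design choice is the ordered separator list: paragraph breaks are tried before line breaks and spaces, so chunks tend to respect the document's natural structure.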
By default, LLMs have no memory of past conversations unless that history is explicitly provided in the input. LangChain addresses this limitation with conversational memory that you can add to your application. Memory can be configured to retain the entire conversation or only parts of it, condensed into summaries. Additionally, Agents use a language model as a reasoning engine to decide which actions to take. When building a chain for an agent, the inputs include the list of available tools, the user input, and any relevant intermediate steps.
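The buffer-memory idea can be sketched in a few lines (plain Python, not LangChain's actual memory classes): store past turns and prepend them to every new prompt, which is how a stateless LLM appears to "remember" the conversation. The class and parameter names here are hypothetical:

```python
# Toy illustration of conversation-buffer memory: past turns are stored
# and prepended to each new prompt, so a stateless model sees the history.
class ConversationMemory:
    def __init__(self, max_turns=None):
        self.turns = []          # list of (speaker, message) pairs
        self.max_turns = max_turns

    def add(self, speaker, message):
        self.turns.append((speaker, message))
        if self.max_turns is not None:
            # Keep only a sliding window of the most recent turns.
            self.turns = self.turns[-self.max_turns:]

    def as_prompt(self, new_input):
        history = "\n".join(f"{s}: {m}" for s, m in self.turns)
        return f"{history}\nHuman: {new_input}\nAI:"

memory = ConversationMemory(max_turns=4)
memory.add("Human", "My name is Ada.")
memory.add("AI", "Nice to meet you, Ada!")
print(memory.as_prompt("What is my name?"))
```

A summarizing memory would replace the verbatim window with an LLM-generated summary of older turns, trading fidelity for a shorter prompt.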
You can develop applications for various purposes, such as:
Building chatbots within your application.
Summarizing lengthy emails and documents.
Creating tools for answering questions.
Developing virtual agents.
In the next module, we'll build a project that takes a PDF file as the data source, uses a recursive character text splitter to create chunks, and generates numerical embeddings with fast embeddings, storing them in the ChromaDB vector store (and retrieving them with a retriever). We'll then use an open-source language model via HuggingFaceHub. Finally, we'll define a Prompt Template and compose everything into a chain using the LangChain Expression Language (LCEL), which makes it easy to build complex chains with features like fallbacks and streaming.
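LCEL composes chains with the `|` operator. As a rough sketch of how that composition style works (plain Python overloading `__or__`, not LangChain's actual Runnable classes), with hypothetical stand-ins for the prompt, model, and parser:

```python
# Rough sketch of LCEL-style composition: the | operator chains steps
# left to right, so (a | b).invoke(x) == b.invoke(a.invoke(x)).
class Runnable:
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Hypothetical stand-ins for a prompt template, model, and output parser.
prompt = Runnable(lambda q: f"Answer briefly: {q}")
fake_llm = Runnable(lambda p: p.upper())   # pretend model call
parser = Runnable(lambda out: out.strip())

chain = prompt | fake_llm | parser
print(chain.invoke("what is LCEL?"))
# -> ANSWER BRIEFLY: WHAT IS LCEL?
```

Because every step shares the same `invoke` interface, features like fallbacks and streaming can be layered onto any link in the pipeline without changing the composition syntax.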