

Auto-regressive Language Models

An autoregressive language model is like a smart storyteller. It predicts the next word in a sentence based on the words that have come before it. It's as if the model is trying to guess what word should come next to make the sentence sound right.

For instance, let's say we have the sentence: "The sun is shining ___________."

An autoregressive language model would look at the words "The sun is shining" and predict that the next word could be "brightly." It considers the context provided by the previous words to make an educated guess about the next word.

One of the most famous auto-regressive language models is GPT (Generative Pre-trained Transformer).
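To make this concrete, here is a minimal sketch of next-word prediction using the Hugging Face transformers library. GPT-2 is used only as a convenient stand-in for an auto-regressive model; any similar checkpoint would work.

```python
# Minimal sketch: predict the most likely next token after a prompt.
# Assumes the `transformers` and `torch` packages are installed; GPT-2 is
# only a stand-in for a generic auto-regressive model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The sun is shining"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, vocab_size)

# The logits at the last position score every possible next token.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))     # a plausible continuation such as " bright"
```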

Auto-encoding Language Models

An autoencoding language model is like a painter trying to recreate an image. It takes an input, converts it into a simplified form, and then tries to reconstruct the original input from that simplified version.

Imagine you have a picture of a cat. The autoencoding model first compresses the image into a smaller set of numbers that represent the essential features of the cat. Then, it tries to recreate the original picture using only those numbers. The model learns to capture the most important details about the cat in those numbers.

The autoencoder learns to compress the input sentence into a lower-dimensional representation in the latent space and then reconstructs the sentence from this representation. The goal during training is to minimize the difference between the original input sentence and the reconstructed output sentence.

The encoder-decoder architecture is an example of an auto-encoding language model.
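The sketch below illustrates the compress-and-reconstruct idea with a tiny autoencoder in PyTorch. It is purely illustrative: real auto-encoding language models such as BERT use stacked transformer layers and masked-token objectives rather than this toy dense network.

```python
# Toy autoencoder: compress an input vector into a small latent code and
# reconstruct it, minimising the reconstruction error. Illustrative only.
import torch
import torch.nn as nn

class ToyAutoencoder(nn.Module):
    def __init__(self, input_dim=128, latent_dim=16):
        super().__init__()
        # Encoder: squeeze the input into a lower-dimensional latent representation
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        # Decoder: rebuild the original input from the latent representation
        self.decoder = nn.Linear(latent_dim, input_dim)

    def forward(self, x):
        latent = self.encoder(x)
        return self.decoder(latent)

model = ToyAutoencoder()
x = torch.randn(4, 128)                              # a batch of 4 input vectors
reconstruction = model(x)
loss = nn.functional.mse_loss(reconstruction, x)     # difference between input and output
print(loss.item())
```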

Zero-shot Learning

[Figure: illustration of zero-shot learning. Source: Zero-Shot Next-Item Recommendation using Large Pretrained Language Models paper]

Zero-shot learning refers to the ability of a model to perform a task or make predictions about data it has never seen before, without any specific training examples for that task. In other words, the model can generalize its understanding to new tasks based on the knowledge it has learned during training. This is typically done by providing the model with only a description or instruction for the task, and it uses its existing knowledge to make predictions.

Example: Let's say you have a language model trained to understand and generate text. During training, it learns about various topics like animals, food, and places. Now, if you give the model a prompt like "Describe a quokka," even if it has never encountered the word "quokka" before, it can still generate a coherent description based on its general knowledge about animals and their characteristics.
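One way to try zero-shot behaviour in code is the transformers zero-shot-classification pipeline, sketched below. The model checkpoint and candidate labels are illustrative choices, not requirements.

```python
# Zero-shot classification: the model was never trained on these exact labels,
# yet it can still pick the most fitting one.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",   # an illustrative checkpoint choice
)

result = classifier(
    "The quokka is a small wallaby found on Rottnest Island.",
    candidate_labels=["animals", "food", "places"],
)
print(result["labels"][0])   # expected to be "animals"
```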

Few-shot Learning

[Figure: illustration of few-shot learning. Source: KDnuggets]

Few-shot learning is a step beyond zero-shot learning. It involves training a model to perform a new task using only a small number of examples or examples from related tasks. The model leverages these limited examples to quickly adapt and make accurate predictions for the new task. Few-shot learning is particularly useful when there isn't an extensive dataset available for a specific task.

Example: Let's take the same language model that we discussed earlier. For few-shot learning, you might provide the model with a few examples of movie summaries and ask it to generate a brief summary for a new movie it hasn't encountered before. Even if it has only seen a handful of movie summaries, it can still use the patterns it has learned from those examples to generate a coherent summary for the new movie.
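A common way to apply few-shot learning with a language model is to put the handful of examples directly into the prompt, as in the sketch below. The movie summaries are made up for illustration, and gpt2 is only a stand-in; a larger instruction-tuned model would produce far better completions.

```python
# Few-shot prompting: a few worked examples in the prompt, then a new case
# for the model to complete in the same pattern.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # stand-in model

few_shot_prompt = (
    "Movie: A young wizard attends a school of magic.\n"
    "Summary: A boy discovers his magical powers and faces a dark enemy.\n\n"
    "Movie: A shark terrorises a small beach town.\n"
    "Summary: A town must band together to stop a deadly predator.\n\n"
    "Movie: A lone robot cleans an abandoned Earth.\n"
    "Summary:"
)

output = generator(few_shot_prompt, max_new_tokens=30, do_sample=False)
print(output[0]["generated_text"])
```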

In a nutshell, zero-shot learning enables models to perform tasks they've never seen before, while few-shot learning takes it a step further by allowing the model to adapt to new tasks using a small amount of relevant training data.

Open Source LLMs

Open source large language models (LLMs) are built using a combination of learning parameters, large datasets, and specific training techniques. Let's break down these components and how they work in the context of open source LLMs:

Learning Parameters

  • Model Architecture: Open source LLMs typically use architectures like GPT (Generative Pre-trained Transformer) or its variants. These architectures consist of multiple layers of transformer neural networks.
  • Hyperparameters: Training an LLM involves setting various hyperparameters, including the number of layers, hidden units, attention heads, and more. These hyperparameters affect the model's capacity and performance (see the configuration sketch after this list).
  • Training Duration: The training process involves running the model on a vast amount of data for multiple epochs. The duration of training can vary from weeks to months, depending on the scale of the model and available computational resources.
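As a rough illustration of how these hyperparameters appear in code, the sketch below builds a small GPT-style model from a transformers configuration object. The values are illustrative, not the settings of any particular open source LLM.

```python
# Architecture hyperparameters expressed as a configuration object.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    n_layer=12,        # number of transformer layers
    n_head=12,         # attention heads per layer
    n_embd=768,        # hidden size
    n_positions=1024,  # maximum sequence length
)

model = GPT2LMHeadModel(config)   # randomly initialised, ready for pre-training
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")
```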

Large Datasets

  • Text Corpora: Open source LLMs are trained on massive text corpora from the internet, which can include books, articles, websites, and more. These corpora contain a wide range of topics and languages to make the model more versatile.
  • Preprocessing: Data preprocessing involves tokenization, which splits text into smaller units (tokens), and formatting the data into suitable sequences for training (see the tokenization sketch after this list).
  • Data Augmentation: To improve model robustness and generalization, data augmentation techniques may be used to create variations in the training data.
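The tokenization step can be seen directly with a pretrained tokenizer, as in the sketch below; the GPT-2 tokenizer is just one example of a subword tokenizer.

```python
# Tokenization: split raw text into subword tokens and map them to integer IDs.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

text = "Open source LLMs are trained on massive text corpora."
token_ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)      # subword units ('Ġ' marks a leading space in GPT-2's vocabulary)
print(token_ids)   # the integer IDs the model actually sees
```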

Models

  • Pre-training: Open source LLMs start with a phase called "pre-training." During this phase, the model learns to predict the next word in a sentence using the massive amount of text data. This helps the model acquire a broad understanding of language and context.
  • Fine-tuning: After pre-training, the model can be fine-tuned for specific tasks or use cases. Fine-tuning involves training the model on a narrower dataset that is curated for the desired application. For example, fine-tuning can be done for translation, sentiment analysis, or question-answering tasks.
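The sketch below shows the end result of this pre-train-then-fine-tune process: loading a model that was pre-trained on general text and then fine-tuned for sentiment analysis. The checkpoint name is one commonly used example, not the only option.

```python
# Using a pre-trained model that has been fine-tuned for sentiment analysis.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example checkpoint
)

print(sentiment("Open source LLMs make experimentation accessible."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```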

In the practice notebook, you will work with these three pre-trained models:

  • StableLM
  • Orca-3b
  • GPT4All
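As a preview of the kind of call you will make in the notebook, here is a hedged sketch using the gpt4all Python package. The model file name is an assumed example from the public GPT4All catalogue; the notebook may use different checkpoints or a different loading approach.

```python
# Sketch: loading and prompting a local pre-trained model with the gpt4all package.
# The model file name below is an assumed example; substitute the checkpoint
# used in the practice notebook.
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")   # downloads the model on first use
response = model.generate("Explain zero-shot learning in one sentence.", max_tokens=60)
print(response)
```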