Reflexion

What it is about:

Reflexion is a framework for improving the performance of large language model (LLM) agents by letting them learn from their mistakes through self-reflection. Unlike traditional reinforcement learning, which relies on numerical rewards and weight updates, Reflexion uses verbal feedback stored in memory to guide the LLM toward better decisions on subsequent attempts.

How it works:

Reflexion is a cyclical process with several key components:

  • Actor: This component generates text and actions based on the current state of the task. It is typically an LLM prompted with techniques such as Chain-of-Thought (CoT) or ReAct, and it is paired with a memory component that provides context from earlier attempts.
  • Evaluator: This component assesses the performance of the actor. It receives the generated actions (trajectory) and assigns a score based on the task's specific requirements. Evaluators can use LLMs or rule-based heuristics depending on the situation.
  • Self-Reflection: This component analyzes the actor's performance and generates verbal feedback for it. It leverages the reward score, the current trajectory, and the actor's memory to create specific, actionable feedback, which is then stored in memory for future reference. (A minimal code sketch of the full loop follows this list.)
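
Putting these components together, the sketch below shows one way the loop could be wired up in Python. The `llm` helper, the prompt wording, the 0-to-1 scoring scheme, and the `max_trials`/`threshold` parameters are illustrative assumptions, not the framework's actual API.

```python
# Minimal sketch of the Reflexion loop: actor -> evaluator -> self-reflection,
# with reflections accumulated in memory and fed back into the next attempt.

def llm(prompt: str) -> str:
    """Placeholder: call an LLM of your choice and return its completion."""
    raise NotImplementedError

def actor(task: str, memory: list[str]) -> str:
    """Generate a trajectory (text and actions), conditioned on past reflections."""
    reflections = "\n".join(memory) or "None yet."
    return llm(f"Task: {task}\nPast reflections:\n{reflections}\nYour attempt:")

def evaluator(task: str, trajectory: str) -> float:
    """Score the trajectory; could equally be unit tests or a rule-based heuristic."""
    verdict = llm(f"Task: {task}\nAttempt: {trajectory}\nScore from 0 to 1:")
    return float(verdict.strip())

def self_reflect(task: str, trajectory: str, score: float) -> str:
    """Turn the scalar score and the trajectory into verbal feedback."""
    return llm(
        f"Task: {task}\nAttempt: {trajectory}\nScore: {score}\n"
        "In a few sentences, explain what went wrong and how to do better."
    )

def reflexion(task: str, max_trials: int = 5, threshold: float = 0.9) -> str:
    memory: list[str] = []  # long-term memory of verbal reflections
    trajectory = ""
    for _ in range(max_trials):
        trajectory = actor(task, memory)
        score = evaluator(task, trajectory)
        if score >= threshold:  # good enough: stop early
            break
        memory.append(self_reflect(task, trajectory, score))
    return trajectory
```

The key design point is that learning happens entirely in the prompt: the reflections accumulated in `memory` are the only thing that changes between trials, while the model's weights are never updated.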

Examples:

Learning a Game:

Imagine an LLM playing a text-based game using Reflexion:

  1. Action: The actor makes a decision (action) within the game environment.
  2. Evaluation: The evaluator determines if the action helped the LLM progress towards the goal (positive score) or hindered it (negative score).
  3. Self-Reflection: Based on the score and the game state, the self-reflection component generates feedback like "Your previous move put you in a difficult situation. Try exploring a different path next time." This feedback is stored in memory to guide future episodes, as in the sketch below.
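
As a rough sketch of how this could look in code, the snippet below runs one episode and appends a reflection on failure. The game object with `reset`/`step` methods, the prompt wording, and the `llm` completion function (passed in as a parameter) are hypothetical placeholders, not a real API.

```python
# Hypothetical wiring of Reflexion around one episode of a text-based game.
# `game.reset()` / `game.step(action)` are illustrative stand-ins;
# `llm` is any function that maps a prompt string to a completion string.

def play_episode(game, llm, memory: list[str], max_steps: int = 20) -> bool:
    obs = game.reset()
    for _ in range(max_steps):
        action = llm(
            f"Observation: {obs}\n"
            "Lessons from earlier episodes:\n" + "\n".join(memory) +
            "\nNext action:"
        )
        obs, reached_goal, failed = game.step(action)  # assumed step signature
        if reached_goal:
            return True   # positive evaluation: nothing to reflect on
        if failed:
            break
    # Negative evaluation: ask the model to verbalize what to change next time.
    reflection = llm(
        f"You failed this episode. Final observation: {obs}\n"
        "In one or two sentences, state what you should try differently."
    )
    memory.append(reflection)  # e.g. "The cellar path is a dead end; try the garden."
    return False
```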

Improving Reasoning:

Consider an LLM answering a question that requires reasoning about multiple facts:

  1. Action: The actor analyzes the question and retrieves information from relevant sources.
  2. Evaluation: The evaluator assesses the answer's accuracy based on the provided facts.
  3. Self-Reflection: If the answer is incorrect, the self-reflection component might suggest: "Your answer seems to miss a key detail mentioned in source B. Consider revisiting that source before finalizing your response." A small sketch of this loop follows the list.
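
A minimal sketch for this question-answering case might look like the following. The exact-match evaluator, the `gold_answer` argument, and the prompt wording are assumptions made for illustration; in practice the evaluator could itself be an LLM judge.

```python
# Hypothetical evaluator and reflection step for a multi-fact QA task.
# `llm` is any function mapping a prompt string to a completion string.

def answer_with_reflexion(question: str, sources: dict[str, str],
                          gold_answer: str, llm, max_trials: int = 3) -> str:
    memory: list[str] = []
    answer = ""
    context = "\n".join(f"[{name}] {text}" for name, text in sources.items())
    for _ in range(max_trials):
        answer = llm(
            f"Question: {question}\nSources:\n{context}\n"
            "Reflections so far:\n" + "\n".join(memory) + "\nAnswer:"
        )
        # Rule-based evaluation: a simple exact match against a known answer.
        if answer.strip().lower() == gold_answer.strip().lower():
            break
        # Self-reflection: point the actor back at the evidence it missed.
        memory.append(llm(
            f"The answer '{answer}' to '{question}' was judged incorrect.\n"
            f"Sources:\n{context}\n"
            "State which source or detail was overlooked and how to use it next time."
        ))
    return answer
```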

Learning to Code:

Reflexion can also be used to help LLMs iteratively improve the code they write:

  1. Action: The actor generates code based on the problem description.
  2. Evaluation: The evaluator checks if the code produces the expected output and adheres to coding best practices.
  3. Self-Reflection: In case of errors, the self-reflection component might provide feedback like "There's a syntax error in your code around line 15. Double-check the function definition." A test-driven sketch of this loop follows the list.
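
For code generation the evaluator can be purely rule-based: execute the candidate against unit tests and turn the resulting traceback into verbal feedback. The harness below is a sketch under that assumption; `run_tests`, the prompts, and the trial budget are illustrative, and `llm` is again any prompt-to-completion function passed in by the caller.

```python
# Hypothetical Reflexion-style loop for code generation. The evaluator runs the
# candidate against caller-supplied unit tests; any traceback becomes feedback.

import traceback

def run_tests(code: str, tests: str) -> str | None:
    """Return None if all tests pass, otherwise the error/traceback text."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function(s)
        exec(tests, namespace)  # run assert-based unit tests against them
        return None
    except Exception:
        return traceback.format_exc(limit=1)

def generate_code(problem: str, tests: str, llm, max_trials: int = 4) -> str:
    memory: list[str] = []
    code = ""
    for _ in range(max_trials):
        code = llm(
            f"Problem: {problem}\nPrevious feedback:\n" + "\n".join(memory) +
            "\nWrite a Python solution:"
        )
        error = run_tests(code, tests)
        if error is None:  # evaluation passed: all tests succeeded
            return code
        # Self-reflection grounded in the concrete failure.
        memory.append(llm(
            f"Your code failed with:\n{error}\n"
            "Briefly explain the bug and how the next attempt should fix it."
        ))
    return code
```

In practice the exec-based harness would be sandboxed; it is shown unguarded here only to keep the sketch short.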

When to use it:

Reflexion is particularly beneficial for tasks that involve:

  • Trial-and-Error Learning: Reflexion is ideal for tasks where LLMs need to learn from mistakes, such as decision-making, reasoning, and programming.
  • Limited Training Data: Unlike traditional reinforcement learning, which typically needs many training episodes or fine-tuning data, Reflexion can improve behavior within a handful of trials and without updating model weights, making it practical when data or compute is scarce.
  • Nuanced Feedback: The framework utilizes verbal feedback, allowing for more specific guidance compared to simple numerical rewards.
  • Interpretability: Reflexion promotes interpretability by storing self-reflections in memory, making it easier to understand the LLM's learning process.