Introducing Effi 7b and Effi 13b
We’re thrilled to unveil the Effi series, specifically Effi 7b and Effi 13b, our latest Large Language Models (LLMs). And here’s the game-changer: we’re making them open source for the wider AI community.
Effi 7b, an instruction-tuned model based on the Llama 7b architecture, pushes the boundaries, surpassing noteworthy models such as MPT-7B and Falcon-7B. Effi 13b, meanwhile, is trained on chain-of-thought data and boasts heightened reasoning capabilities.
Motivation
Since our inception in 2020, AI Planet’s mission has been to make AI accessible to everyone. We’ve consistently contributed to the global AI community through our free learning resources, bootcamps, and expert sessions. The philosophy is simple: cutting-edge research should not be an elite privilege but a global resource. Continuing in this spirit, we’re making our AI research freely available to all.
Approach
- Dataset:
- Effi 7b: Powered by an alpaca-style dataset of over 120K instruction samples, it builds on the Llama base model weights available on Hugging Face as huggyllama.
- Effi 13b: We leveraged the CoT-Collection dataset to instill chain-of-thought reasoning in the model. Building on the Llama 2 13B chat model, our goal was enhanced contextual reasoning, which showed promise in initial tests.
- High-level Model Architecture:
- Effi 7b: Fine-tuned with QLoRA on Google Colab Pro+ using A100 80GB GPU instances; a minimal fine-tuning sketch appears after the evaluation table below.
- Effi 13b: A causal decoder-only model by AI Planet, it stands on the shoulders of the Llama-2-13b-chat-hf architecture and benefits from the expansive CoT-Collection dataset of roughly 1.8M samples. Just a heads up: you’ll need a robust 85-100GB of memory for smooth effi-13b inference. Interested in the model weights? They’re accessible on Hugging Face, or from the Meta Research website upon request.
- Evaluation: For the empirically minded, the benchmark results on the Hugging Face Open LLM Leaderboard are as follows; you can find the complete list here.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| aiplanet/effi-7b | 52.2 | 55.12 | 78.07 | 35.91 | 39.71 |
| llama-7b | 49.72 | 51.02 | 77.82 | 35.71 | 34.33 |
| tiiuae/falcon-7b | 47.01 | 47.87 | 78.13 | 27.79 | 34.26 |
| mosaicml/mpt-7b-chat | 49.95 | 46.5 | 75.51 | 37.62 | 40.16 |
| aiplanet/effi-13b | 58.26 | 53.33 | 81.22 | 53.57 | 44.92 |
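For readers who want to reproduce numbers like these, the EleutherAI lm-evaluation-harness (the library behind the leaderboard) is one route. Below is a rough sketch using its v0.4+ Python API; the task names, batch size, and few-shot settings are assumptions and may differ from the leaderboard’s exact per-task configuration.

```python
# Rough evaluation sketch using EleutherAI's lm-evaluation-harness (v0.4+ API).
# Task names, batch size, and few-shot settings are assumptions and may differ
# from the Open LLM Leaderboard's exact configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                # Hugging Face causal LM backend
    model_args="pretrained=aiplanet/effi-7b",  # model to evaluate
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])  # per-task metric dictionaries
```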
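And for those curious how a QLoRA fine-tuning run like effi-7b’s can be set up, here is a minimal sketch using the Hugging Face transformers, peft, bitsandbytes, and trl libraries. The hyperparameters, the tatsu-lab/alpaca placeholder dataset, and the adapter targets are illustrative assumptions, not our exact recipe; details will follow in the upcoming deep-dive blog.

```python
# Minimal QLoRA fine-tuning sketch (illustrative; hyperparameters and dataset
# are assumptions, not the exact effi-7b recipe).
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer  # note: SFTTrainer's API varies across trl versions

base_model = "huggyllama/llama-7b"  # Llama base weights on Hugging Face

# Load the base model in 4-bit precision (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (illustrative choice).
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# An alpaca-style instruction dataset; placeholder, swap in your own.
dataset = load_dataset("tatsu-lab/alpaca", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="text",  # pre-formatted instruction/response text
    args=TrainingArguments(
        output_dir="effi-7b-qlora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
)
trainer.train()
```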
Future Direction
Our compass is set: we will delve deeper into LLM research and broaden the horizons by open-sourcing more models for the greater good.
Explore the Models
You can explore the models on Hugging Face:
- effi-7b: https://huggingface.co/aiplanet/effi-7b
- effi-13b: https://huggingface.co/aiplanet/effi-13b
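As a quick start, here is a minimal generation sketch with the transformers library. The prompt below is illustrative only, so check each model card for the recommended prompt format; and as noted above, effi-13b needs substantial memory, which device_map="auto" will spread across whatever hardware is available.

```python
# Minimal inference sketch for effi-13b with Hugging Face transformers.
# The prompt is illustrative; see the model card for the recommended format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/effi-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain step by step why the sky appears blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```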
Limitations and Considerations:
While the Effi LLMs shine in many respects, their training predominantly on English data can be a limitation. There is also the inherent challenge of biases and stereotypes present in the training data.
Authors:
- Plaban Nayak (Director of AI, AI Planet)
- Srikar Verma (NLP Research, AI Planet)
Implementation and Recommendations:
Eager to harness the power of the Effi LLMs? Dive into the Hugging Face model repositories linked above. But a word to the wise: stay cognizant of their potential biases and limitations. Equip your users with knowledge about each model’s strengths, weaknesses, and ethical considerations.
For all things Effi LLMs, or if curiosity gets the better of you, reach out to us at [email protected]. Your questions are always welcome!
Soon, we will release a detailed blog post on the model architecture and take a deep dive into the process we followed.