
November roundup: our favorite models

Written by Baasit Sharief | December 07, 2023


The GenAI model market moves extremely fast, and it's hard to keep up with the latest releases. In this monthly GenAI model roundup, we highlight the top 5 open-source GenAI models that you should have on your radar.

Mistral 7B
Who built it? Mistral AI

How large is it? 7B

Why is it interesting? Trained from scratch by Mistral AI (it is not a Llama 2 fine-tune), it punches above its weight on HELM benchmarks and in real-world usage; in our experience it beats GPT-3.5 8 out of 10 times for text-generation use cases.

Where can you use it? As a general-purpose replacement for Llama 2 or GPT-3.5 (see the loading sketch below).
[2310.06825] Mistral 7B
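
If you want to try it yourself, here is a minimal loading sketch using the Hugging Face transformers library. It assumes the mistralai/Mistral-7B-Instruct-v0.1 checkpoint, the accelerate package, and a GPU with enough memory; the prompt is illustrative.

```python
# Minimal sketch: load Mistral 7B Instruct via Hugging Face transformers.
# Assumes `transformers` and `accelerate` are installed and a GPU is available.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# The instruct variant expects the [INST] ... [/INST] prompt format.
prompt = "[INST] Summarize why small open-source LLMs matter. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```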

Zephyr
Who built it? Hugging Face H4 (helpful, honest, harmless, and huggy)

How large is it? 7B

Why is it interesting? It’s a fine-tuned version of Mistral-7B, trained on a mix of publicly available and synthetic data using Direct Preference Optimization (DPO).

Where can you use it? It is on par with GPT-3.5, GPT-4, Claude, and Llama-2-70B-chat on writing, humanities, STEM, and roleplay topics, but it lags a fair bit on complex tasks like mathematics, reasoning, and coding (see the chat sketch below).
[2310.16944] Zephyr: Direct Distillation of LM Alignment
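
A minimal chat sketch, following the pipeline usage shown on the zephyr-7b-beta model card; the messages and sampling parameters are illustrative.

```python
# Minimal sketch: chat with Zephyr using the transformers pipeline API.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful, concise assistant."},
    {"role": "user", "content": "Explain DPO in two sentences."},
]
# Zephyr ships a chat template, so the tokenizer can format the messages.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```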


Phi-1.5
Who built it? Microsoft

How large is it? 1.5B

Why is it interesting? It was trained on synthetic datasets curated with the help of larger LLMs; it significantly outperforms its counterparts in the same parameter range and is competitive with 13B-parameter models.

Where can you use it? When efficiency matters most: with so few parameters, it is cheap to run. It can also be used as a student model for knowledge distillation (a completion sketch follows below).
[2309.05463] Textbooks Are All You Need II: phi-1.5 technical report
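
A minimal completion sketch, assuming the microsoft/phi-1_5 checkpoint; older transformers releases may require trust_remote_code=True to load it.

```python
# Minimal sketch: Python code completion with Phi-1.5.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # small enough for CPU

# Phi-1.5 was trained heavily on code, so it completes function stubs well.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```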

Orca-2
Who built it? Microsoft

How large is it? Orca 2 is a fine-tuned version of Llama 2, available at 7B and 13B.

Why is it interesting? It shows that synthetic data created by models capable of complex workflows (advanced prompts, multiple calls) can teach Small Language Models (SLMs, i.e., on the order of 10B parameters) new capabilities, in this case reasoning.

Where can you use it? Fine-tuning and distillation (a prompting sketch follows below).
[2311.11045] Orca 2: Teaching Small Language Models to Reason
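
A minimal prompting sketch for the 7B variant, using the ChatML-style format described on the microsoft/Orca-2-7b model card; the system and user messages are illustrative.

```python
# Minimal sketch: reasoning prompt for Orca 2 (7B) in its ChatML-style format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Orca-2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

system = "You are Orca, an AI language model. Reason step by step."
user = "A train leaves at 9:40 and arrives at 13:05. How long is the trip?"
prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```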

Qwen
Who built it? Alibaba Cloud

How large is it? 7B and 14B

Why is it interesting? It is one of the first times a Chinese model has made its way to the top of the leaderboards. It is as good as GPT-3.5-16k at long-text understanding and Python coding, with a strong focus on both Chinese and English.

Where can you use it? It works especially well for tooling and agents, where it even beats GPT-4, and for use cases that occur primarily in Chinese (see the chat sketch below).
[2309.16609] Qwen Technical Report
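
A minimal chat sketch, assuming the Qwen/Qwen-7B-Chat checkpoint; the first-generation Qwen repos ship custom code, so trust_remote_code=True is required.

```python
# Minimal sketch: chat with Qwen-7B-Chat via its custom `chat` helper.
# The Qwen repo may also require `tiktoken` and `einops` to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# `chat` is custom code shipped with the Qwen repo, not a standard transformers API.
response, history = model.chat(
    tokenizer, "Write a Python function that reverses a string.", history=None
)
print(response)
```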

In proprietary-model land, we also want to give kudos to:

Claude 2.1 - 200K context length

GPT-4-turbo - 128K context length

Learn more

For an in-depth look at our capabilities, you can read our full launch blog post and check out the platform here.