Which open source LLMs are best for commercial use in 2026?

DeepSeek V4 Pro (MIT, April 2026), GLM-5.1 (MIT), and Kimi K2.6 (Modified MIT) are the top recommendations for commercial projects with the highest performance. GPT-OSS-120B (Apache 2.0) remains a strong, fully-Apache-licensed option. Llama 4 Maverick offers commercial use under the Llama 4 Community License for companies with up to 700 million monthly active users. Qwen3-235B-A22B-Thinking (Apache 2.0) and the original Kimi K2 (Modified MIT) are also excellent options for business-critical applications. For medium-sized projects, Gemma 3 27B, Phi-4 (14B), and Qwen3-32B are very well suited. Important: Always check the current license terms, as these can change.

What hardware do I need to run open source LLMs locally?

Hardware requirements vary significantly depending on model size:

For smaller models (7B parameters): An RTX 4090 with 24GB VRAM achieves 138 tokens/s and is sufficient for most applications. At least 16GB RAM and a fast NVMe SSD are recommended.
For larger models (70B parameters): Two RTX 4090s or professional GPUs like the A100 with 40-80GB VRAM are necessary. A system with 64GB+ RAM is ideal. DeepSeek V3 requires even more resources for optimal performance.
Alternative: Apple Silicon with unified memory is surprisingly effective. A Mac Studio with 192GB can run Llama 70B at 13.77 tokens/s.
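As a rough rule of thumb, the memory needed for the weights alone is the parameter count times the bytes per weight. The sketch below applies that arithmetic to the 7B and 70B sizes mentioned above. The function names and the 20% overhead factor (for KV cache and runtime buffers) are assumptions for illustration, not measured values; real usage depends on context length and the inference engine.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_weight


def fits(params_billion: float, bits_per_weight: int, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """True if weights plus an assumed 20% overhead fit into vram_gb."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead <= vram_gb


# Compare common precisions: 16-bit (unquantized), 8-bit, and 4-bit quantization.
for label, params in [("7B", 7), ("70B", 70)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        one_gpu = fits(params, bits, vram_gb=24)   # one 24GB card
        two_gpus = fits(params, bits, vram_gb=48)  # two 24GB cards
        print(f"{label} @ {bits}-bit: {gb:.1f} GB weights, "
              f"fits 24GB: {one_gpu}, fits 48GB: {two_gpus}")
```

Under these assumptions, a 7B model fits on a single 24GB card even unquantized, while a 70B model needs 4-bit quantization to fit across two 24GB cards.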

How do the top 3 models differ: DeepSeek V4 Pro, GLM-5.1, and Kimi K2.6?

DeepSeek V4 Pro (released April 24, 2026) leads in scientific reasoning with GPQA Diamond 90.1% and LiveCodeBench 93.5%, with 1.6 trillion total parameters and only 49B active. GLM-5.1 from Z.ai (formerly Zhipu) tops SWE-Bench Pro with 58.4%, leading all open-source models on real software engineering tasks. Kimi K2.6 from Moonshot AI delivers 92% on HumanEval, 90.5% on GPQA, and 96.4% on AIME 2026 with its 1T-parameter MoE architecture. Use DeepSeek V4 Pro for science and code agents, GLM-5.1 for repository-scale software engineering, and Kimi K2.6 for autonomous agentic coding. All three ship under permissive licenses (MIT or Modified MIT).

Which tools make it easier to use open source LLMs locally?

Several user-friendly tools significantly simplify local LLM usage:

Ollama: Easiest installation, supports all common models
LM Studio: Graphical user interface, ideal for beginners
GPT4All: Lightweight solution for consumer hardware
Jan: Open source ChatGPT alternative with local execution
vLLM: High-performance solution for production environments
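For readers who do want to script against a local model, Ollama exposes an HTTP API on the local machine. The following is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port and that a model has already been pulled; the model name in the commented call is a placeholder, and the helper names are made up for this example.

```python
import json
import urllib.request

# Default local endpoint of a running Ollama instance.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call; "llama3.2" is a placeholder for whichever model you have pulled:
# print(generate("llama3.2", "Explain Mixture-of-Experts in one sentence."))
```

Setting `stream` to false returns one JSON object instead of a stream of partial responses, which keeps the example short.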

Are open source LLMs really free or are there hidden costs?

The models themselves are free, but operating costs can be significant. Local use is cost-free after the hardware investment, but powerful GPUs cost $1,500-$15,000+. Power consumption for training and inference should not be underestimated. Managed API providers often offer free quotas, then charge fees similar to OpenAI/Anthropic. VPS hosting starts at $20/month for CPU-only, GPU servers cost significantly more. The true costs lie in hardware, electricity, and potential cloud usage.
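Whether local hardware pays off depends on how many tokens you process. A back-of-the-envelope break-even calculation can be sketched as follows. All numbers in the example call are hypothetical placeholders apart from the $1,500 GPU price, which is the lower bound mentioned above; plug in your own API price and electricity cost.

```python
def breakeven_million_tokens(hardware_usd: float,
                             api_usd_per_million: float,
                             power_usd_per_million: float = 0.0) -> float:
    """Millions of tokens after which local hardware is cheaper than an API.

    Returns infinity if running locally saves nothing per token.
    """
    saving_per_million = api_usd_per_million - power_usd_per_million
    if saving_per_million <= 0:
        return float("inf")
    return hardware_usd / saving_per_million


# Hypothetical scenario: $1,500 GPU, $2.00 per 1M tokens via API,
# $0.50 of electricity per 1M tokens generated locally.
tokens = breakeven_million_tokens(1500, api_usd_per_million=2.00,
                                  power_usd_per_million=0.50)
print(f"Break-even after about {tokens:,.0f} million tokens")
```

The sketch ignores hardware depreciation, maintenance, and the quality gap between a small local model and a large hosted one, so treat the result as a lower bound.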

How is the open source LLM landscape developing in 2026?

By April 2026, Chinese labs have taken over most of the top spots among open weights. DeepSeek V4 Pro (1.6T MoE, 49B active, MIT, April 24, 2026), GLM-5.1 from Z.ai (754B MoE, MIT, April 7, 2026), and Kimi K2.6 from Moonshot AI (1T MoE, Modified MIT) deliver near-frontier performance at a fraction of the proprietary cost. Mixture-of-Experts dominates: nearly all top models use MoE for efficiency with trillions of total parameters but only 13-49B active per token. The 2025 era leaders (GPT-OSS-120B, DeepSeek R1, Qwen3-235B, Llama 4 Maverick) are still solid, but they have been pushed down the rankings by these spring 2026 releases. Trend: highly specialized models for specific domains (code, math, agentic engineering) instead of general-purpose giants.
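To make the MoE efficiency argument concrete, the short sketch below computes which share of the parameters is active per token, using the total and active parameter counts listed in this article. It is plain arithmetic and says nothing about how each model routes tokens internally. Per-token compute scales roughly with the active parameters, while all parameters still have to be held in memory.

```python
# (total parameters, active parameters per token), both in billions,
# taken from the figures quoted in this article.
MOE_MODELS = {
    "DeepSeek V4 Pro": (1600, 49),
    "DeepSeek V4 Flash": (284, 13),
    "Kimi K2.6": (1000, 32),
}


def active_share(total_billion: float, active_billion: float) -> float:
    """Fraction of the parameters that is used for a single token."""
    return active_billion / total_billion


for name, (total, active) in MOE_MODELS.items():
    share = active_share(total, active)
    print(f"{name}: {active}B of {total}B active ({share:.1%})")
```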

The Best Open Source LLMs: 120+ Models Compared

Open source LLMs are one of the most important AI trends of 2026.

And for good reason:

Open source models were long significantly weaker than proprietary models. But by spring 2026 they have caught up, especially out of Chinese labs:

DeepSeek V4 Pro (released April 24, 2026), GLM-5.1 from Z.ai, Kimi K2.6 from Moonshot AI, and Qwen3.5 from Alibaba can compete with the best proprietary LLMs like Claude Opus 4.8, GPT-5.5, or Gemini 3.1 Pro, and even beat them on specific benchmarks like SWE-Bench Pro and HumanEval. The new proprietary reference point for Terminal-Bench 2.1 is GPT-5.6 Sol at 88.8%, with Sol Ultra at 91.9% (limited preview since June 26, 2026, for about 20 OpenAI partners including Codex CLI). Comparable open-source Terminal-Bench 2.1 scores have not been published yet.

In this article, you'll find a sortable, filterable directory of 120+ open source LLMs, including benchmark scores, licenses, API prices, context windows, and capabilities (as of July 2026).

Additionally, I'll show you how to easily and freely use open LLMs on your own computer (without needing to program or use the terminal).

TL;DR: Key Takeaways

DeepSeek V4 Pro (1.6T MoE, MIT, April 2026), Kimi K2.6 (1T MoE), and GLM-5.1 from Z.ai lead the April 2026 rankings, with GLM-5.1 topping SWE-Bench Pro at 58.4%
120+ open source LLMs in a filterable, sortable directory, from MIT and Apache 2.0 through to restricted research-only licenses. Columns like prices, context, and capabilities can be toggled individually
Chinese labs (DeepSeek, Moonshot AI, Z.ai, Alibaba) hold most top positions; the 2025 leaders (GPT-OSS-120B, DeepSeek R1, Qwen3-235B, Llama 4) are still solid but no longer at the top
Local usage possible with tools like Ollama, LM Studio, or GPT4All, but the new top models need serious hardware (multi-GPU or quantized variants for consumer rigs)

All Open Source LLMs at a Glance

The directory contains every open-weights model from the models.dev catalog plus curated classics, sorted by release date by default. Use "Columns" to reveal more data, such as modalities, knowledge cutoff, max output, or the number of API providers:

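127 of 127 models

Model	Developer	Parameters	Context	Knowledge	Reasoning	Coding	License	Input price	Output price	Release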
Ornith 1.0 31B	DeepReinforce	31B	256K	–	–	–	MIT	–	–	Jun 2026
Ornith 1.0 35B	DeepReinforce	35B	256K	–	–	75.6% SWE-Bench Verified	MIT	–	–	Jun 2026
Ornith 1.0 397B	DeepReinforce	397B	256K	–	–	82.4% SWE-Bench Verified	MIT	–	–	Jun 2026
Ornith 1.0 9B	DeepReinforce	9B	256K	–	–	69.4% SWE-Bench Verified	MIT	–	–	Jun 2026
GLM-5.2	Z.ai	–	1M	–	91.2% GPQA	62.1% SWE-Bench Pro	MIT	$0.50	$2.20	Jun 2026
Kimi K2.7 Code	Moonshot AI	1T (32B active)	256K	–	89.6% GPQA	67.4% Terminal-Bench	Modified MIT	$0.55	$2.25	Jun 2026
Kimi K2.7 Code Highspeed	Moonshot AI	1T (32B active)	256K	–	89.6% GPQA	67.4% Terminal-Bench	Modified MIT	$1.90	$8.00	Jun 2026
North Mini Code	Cohere	–	250K	–	–	61% SWE-Bench Verified	–	–	–	Jun 2026
MiMo-V2.5-Pro-UltraSpeed	Xiaomi	–	1M	–	–	–	MIT	$1.31	$2.61	Jun 2026
Nemotron 3 Ultra 550B A55B	NVIDIA	550B (55B active)	1M	86.8% MMLU-Pro	87% GPQA	89% LiveCodeBench	NVIDIA Open Model License	$0.50	$2.20	Jun 2026
MiniMax-M3	MiniMax	–	500K	–	92.9% GPQA	80.5% SWE-Bench Verified	MIT	$0.28	$1.10	Jun 2026
Step 3.7 Flash	StepFun	–	250K	–	–	76.5% SWE-Bench Verified	Apache 2.0	$0.20	$1.15	May 2026
Command A Plus	Cohere	–	125K	–	–	–	CC BY-NC-4.0	$2.50	$10.00	May 2026
Mistral Medium 3.5	Mistral AI	128B	256K	–	–	77.6% SWE-Bench Verified	–	$1.50	$6.90	Apr 2026
Nemotron 3 Nano Omni 30B A3B Reasoning	NVIDIA	30B (3B active)	250K	77.3% MMLU-Pro	72.2% GPQA	63.2% LiveCodeBench	NVIDIA Open Model License	$0.11	$0.42	Apr 2026
DeepSeek V4 Flash	DeepSeek	284B (13B active)	1M	83% MMLU-Pro	85% GPQA	88% LiveCodeBench	MIT	$0.089	$0.18	Apr 2026
DeepSeek V4 Pro	DeepSeek	1.6T (49B active)	1M	87.5% MMLU-Pro	90.1% GPQA	93.5% LiveCodeBench	MIT	$0.35	$0.74	Apr 2026
Qwen3.6 27B	Alibaba	27B	256K	86.2% MMLU-Pro	87.8% GPQA	77.2% SWE-Bench Verified	Apache 2.0	$0.20	$1.50	Apr 2026
MiMo-V2.5	Xiaomi	–	1M	86.3% MMLU	–	56.1% SWE-Bench Pro	MIT	$0.11	$0.28	Apr 2026
MiMo-V2.5-Pro	Xiaomi	–	1M	89.4% MMLU	–	78.9% SWE-Bench Verified	MIT	$0.40	$0.80	Apr 2026
Kimi K2.6	Moonshot AI	1T (32B active)	256K	84.6% MMLU-Pro	90.5% GPQA	92% HumanEval	Modified MIT	$0.15	$0.60	Apr 2026
Hy3 preview	Tencent	–	250K	87.4% MMLU	87.2% GPQA	74.4% SWE-Bench Verified	Tencent Hunyuan Community	$0.063	$0.21	Apr 2026
Qwen3.6 35B-A3B	Alibaba	35B (3B active)	256K	85.2% MMLU-Pro	86% GPQA	73.4% SWE-Bench Verified	Apache 2.0	$0.11	$0.80	Apr 2026
GLM-5.1	Z.ai	754B	200K	91.7% MMLU	85.7% GPQA	58.4% SWE-Bench Pro	MIT	$0.30	$2.15	Apr 2026
Gemma 4 26B A4B IT	Google	26B (4B active)	256K	82.6% MMLU-Pro	82.3% GPQA	77.1% LiveCodeBench	Gemma Terms of Use	$0.060	$0.30	Apr 2026
Gemma 4 31B IT	Google	31B	256K	85.2% MMLU-Pro	84.3% GPQA	80% LiveCodeBench	Gemma Terms of Use	$0.10	$0.30	Apr 2026
Gemma 4 E2B IT	Google	–	128K	60% MMLU-Pro	43.4% GPQA	44% LiveCodeBench	Gemma Terms of Use	–	–	Apr 2026
Gemma 4 E4B IT	Google	–	128K	69.4% MMLU-Pro	58.6% GPQA	52% LiveCodeBench	Gemma Terms of Use	–	–	Apr 2026
Step 3.5 Flash 2603	StepFun	–	250K	–	–	32.6% Terminal-Bench Hard	Apache 2.0	$0.10	$0.30	Apr 2026
Nemotron Cascade 2 30B A3B	NVIDIA	30B (3B active)	250K	79.8% MMLU-Pro	76.1% GPQA	87.2% LiveCodeBench	NVIDIA Open Model License	$0.14	$0.60	Mar 2026
MiniMax-M2.7	MiniMax	–	200K	81.8% MMLU-Pro	89.8% GPQA	79.9% SWE-Bench Verified	MIT	$0.18	$0.72	Mar 2026
MiniMax-M2.7-highspeed	MiniMax	–	200K	81.8% MMLU-Pro	89.8% GPQA	79.9% SWE-Bench Verified	MIT	$0.33	$1.32	Mar 2026
Mistral Small 4	Mistral AI	119B	250K	–	71.2% GPQA	17.4% Terminal-Bench Hard	Apache 2.0	$0.15	$0.60	Mar 2026
Nemotron VoiceChat	NVIDIA	–	125K	–	–	–	NVIDIA Open Model License	–	–	Mar 2026
Nemotron 3 Super 120B A12B	NVIDIA	120B (12B active)	256K	83.7% MMLU-Pro	79.2% GPQA	81.2% LiveCodeBench	NVIDIA Open Model License	$0.050	$0.25	Mar 2026
Qwen3.5 122B-A10B	Alibaba	122B (10B active)	256K	86.7% MMLU-Pro	86.6% GPQA	72% SWE-Bench Verified	Apache 2.0	$0.12	$0.92	Feb 2026
Qwen3.5 27B	Alibaba	27B	256K	86.1% MMLU-Pro	85.5% GPQA	72.4% SWE-Bench Verified	Apache 2.0	$0.086	$0.69	Feb 2026
Qwen3.5 35B-A3B	Alibaba	35B (3B active)	256K	85.3% MMLU-Pro	84.2% GPQA	74.6% LiveCodeBench	Apache 2.0	$0.057	$0.46	Feb 2026
Qwen3.5 9B	Alibaba	9B	256K	82.5% MMLU-Pro	81.7% GPQA	65.6% LiveCodeBench	Apache 2.0	$0.040	$0.15	Feb 2026
Sarvam 30B	Sarvam AI	30B	125K	85.1% MMLU	66.5% GPQA	70% LiveCodeBench	–	$0.020	$0.10	Feb 2026
Qwen3.5 397B-A17B	Alibaba	397B (17B active)	256K	87.8% MMLU-Pro	88.4% GPQA	76.4% SWE-Bench Verified	Apache 2.0	$0.17	$1.03	Feb 2026
MiniMax-M2.5-highspeed	MiniMax	–	200K	85.2% MMLU-Pro	85.2% GPQA	75.8% SWE-Bench Verified	MIT	$0.19	$1.24	Feb 2026
MiniMax-M2.5	MiniMax	–	200K	85.2% MMLU-Pro	85.2% GPQA	75.8% SWE-Bench Verified	MIT	$0.11	$0.48	Feb 2026
GLM-5	Z.ai	744B	200K	96% MMLU	94% GPQA	94.2% HumanEval	MIT	$0.30	$1.90	Feb 2026
Step 3.5 Flash	StepFun	–	250K	84.4% MMLU-Pro	83.5% GPQA	74.4% SWE-Bench Verified	Apache 2.0	$0.090	$0.29	Jan 2026
GLM-4.7-Flash	Z.ai	–	200K	–	75.2% GPQA	59.2% SWE-Bench Verified	MIT	$0.040	$0.30	Jan 2026
GLM-4.7-FlashX	Z.ai	–	200K	–	75.2% GPQA	59.2% SWE-Bench Verified	MIT	$0.060	$0.40	Jan 2026
Kimi K2.5	Moonshot AI	1T (32B active)	256K	92% MMLU	87.6% GPQA	99% HumanEval	Modified MIT	$0.30	$1.50	Jan 2026
MiniMax-M2.1	MiniMax	230B (10B active)	200K	88% MMLU-Pro	83% GPQA	74% SWE-Bench Verified	MIT	$0.27	$0.95	Dec 2025
GLM-4.7	Z.ai	–	200K	84.3% MMLU-Pro	85.7% GPQA	73.8% SWE-Bench Verified	MIT	$0.15	$0.80	Dec 2025

Benchmark score color coding:

Excellent: Top tier
Good: Above average
Average: Solid
Poor: Below average

1. Key Benchmarks Explained

To objectively compare open source LLMs, I use three central benchmark categories:

MMLU / MMLU-Pro: The Massive Multitask Language Understanding Benchmark tests general knowledge across 57 subjects (STEM, social sciences, humanities). MMLU-Pro is the more challenging variant with less contamination. Top models score 85-90% here.

MATH / GPQA: These benchmarks test mathematical and scientific reasoning. MATH-500 contains challenging math problems, while GPQA (Graduate-Level Google-Proof Q&A) tests expert knowledge in biology, physics, and chemistry. Top models score 70-97% here.

HumanEval / LiveCodeBench: These benchmarks test code generation. HumanEval contains Python programming tasks, LiveCodeBench tests code performance with current, uncontaminated tasks. Top models score 60-90% here.
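Code benchmarks such as HumanEval are commonly reported as pass@k: the probability that at least one of k sampled solutions passes the unit tests. The sketch below implements the standard unbiased estimator from n samples per task, of which c pass. Whether the scores in the directory are pass@1 or another setting is not stated here, so treat this as background on the metric rather than a description of how each score was produced.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of generated samples, c: samples that pass the tests,
    k: number of attempts the metric allows (k <= n).
    """
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per task, 3 of them correct.
print(f"pass@1 = {pass_at_k(10, 3, 1):.3f}")
print(f"pass@5 = {pass_at_k(10, 3, 5):.3f}")
```

A benchmark score is then the average of this value over all tasks.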

The table shows up to three benchmark scores per model; the small label under each badge tells you which benchmark it is. Older and niche models don't have every score, in which case you'll see a dash.

SWE-bench Verified shows how close the top open models have come to the proprietary flagships:

[Chart: "Open models within about 8 points of GPT-5.5 and Opus 4.8". SWE-bench Verified scores for DeepSeek V4 Pro, Kimi K2.6, Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro. Sources: DeepSeek, Moonshot AI, Anthropic, OpenAI, Google DeepMind]

The gap becomes even clearer on price. The leading open models deliver almost the same coding performance at a fraction of the API cost:

[Chart: "DeepSeek V4 Pro and Kimi K2.6: best price-performance in the scatter". Scatter plot of API price against coding performance for models from DeepSeek, Moonshot AI, Anthropic, OpenAI, and Google, with the efficiency frontier (best price-performance) marked; the ideal region is strong and cheap. Sources: official API price lists from DeepSeek, Moonshot AI, Anthropic, OpenAI, and Google. License: CC BY 4.0, gradually.ai]
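An efficiency frontier of this kind can be computed directly: a model is on the frontier if no other model is both cheaper and better. The sketch below is an illustration using a handful of rows from the directory above (SWE-Bench Verified score and the second price column, which is assumed here to be the output price). It is not the data behind the chart, and the type and function names are made up for this example.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price_usd: float  # second price column of the directory (assumed: output price)
    score: float      # SWE-Bench Verified, in percent


# A few rows copied from the directory above.
MODELS = [
    Model("MiniMax-M3", 1.10, 80.5),
    Model("Step 3.7 Flash", 1.15, 76.5),
    Model("Mistral Medium 3.5", 6.90, 77.6),
    Model("Qwen3.6 27B", 1.50, 77.2),
    Model("MiMo-V2.5-Pro", 0.80, 78.9),
    Model("Hy3 preview", 0.21, 74.4),
    Model("MiniMax-M2.7", 0.72, 79.9),
]


def efficiency_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no cheaper model matches or beats on score."""
    frontier: list[Model] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal price, best score first.
    for m in sorted(models, key=lambda m: (m.price_usd, -m.score)):
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


for m in efficiency_frontier(MODELS):
    print(f"{m.name}: ${m.price_usd:.2f}, {m.score}%")
```

For these rows the frontier consists of Hy3 preview, MiniMax-M2.7, and MiniMax-M3; every other model in the sample is both more expensive and weaker than one of them.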

2. Top Models of April 2026

DeepSeek V4 Pro (released April 24, 2026) is the new leader. The 1.6 trillion parameter MoE activates only 49B per token and scores 87.5% on MMLU-Pro, 90.1% on GPQA Diamond, and 93.5% on LiveCodeBench. It carries the same MIT license as the rest of the DeepSeek lineup and ships with native 1M-token context at roughly 27% of the inference FLOPs of V3.2.

Kimi K2.6 from Moonshot AI is the second-best open weight overall: 92% on HumanEval, 90.5% on GPQA Diamond, 96.4% on AIME 2026, with a 256K context window and native video input. It is a 1T-parameter MoE released under a Modified MIT license.

GLM-5.1 from Z.ai (formerly Zhipu) tops SWE-Bench Pro with 58.4%, beating GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). The 754B-parameter MoE was trained entirely on Huawei Ascend chips and ships under the MIT license. The reasoning sibling, GLM-5, hits 96% on MMLU and 94% on GPQA, the highest knowledge scores in the open-source space.

Kimi K2.5 still posts the highest HumanEval score on any leaderboard (99.0) and leads on MATH-500 (98.0). It is the best open weight purely for code generation when latency matters less than peak quality.

DeepSeek V4 Flash (284B / 13B active) is the cost-efficient sibling of V4 Pro and the most practical choice when you want frontier-class quality on a single high-end GPU.

The previous generation is still very usable: GPT-OSS-120B (OpenAI's first open-weight model since GPT-2), DeepSeek R1, Qwen3-235B-A22B-Thinking, and Llama 4 Maverick all remain strong, just no longer state-of-the-art.

Here are the five new top models side by side:

Feature	DeepSeek V4 Pro	Kimi K2.6	GLM-5.1	Kimi K2.5	DeepSeek V4 Flash
Developer	DeepSeek	Moonshot AI	Z.ai	Moonshot AI	DeepSeek
License (all permissive)	MIT	Modified MIT	MIT	Modified MIT	MIT
Parameters (all Mixture-of-Experts)	1.6T (49B active)	1T	754B	1T	284B (13B active)
Runs locally (e.g. with Ollama or LM Studio)	Yes	Yes	Yes	Yes	Yes
Local hardware needs	Multi-GPU or quantized	Multi-GPU or quantized	Multi-GPU or quantized	Multi-GPU or quantized	One high-end GPU
Standout feature	Native 1M-token context window	256K context, native video input	Trained entirely on Huawei Ascend chips	Leads MATH-500 at 98.0	Cost-efficient sibling of V4 Pro


3. LLM Licenses Explained

Here's an overview of the most commonly used licenses for open source LLMs.

Warning: Please always review the current license terms of LLMs yourself before using them. License conditions can change at any time.

MIT License

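A very permissive open source license, similar to Apache 2.0. It allows unrestricted use, modification, and distribution of the LLM, including in proprietary programs, as long as the copyright notice is retained. Note that a repository's MIT license may cover only the code: the original DeepSeek V3 release put its code under MIT, while the model weights came with a separate model license that restricts certain uses, including military use.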

Llama 2 Community / Llama 3 Community

Meta released Llama 2 and Llama 3 under these licenses. They allow free use of the LLMs for research and commercial applications by companies with up to 700 million monthly active users; larger companies must request a separate license from Meta. The source code and model weights are freely available.

Qwen License / Qianwen LICENSE

Qwen models are released under various licenses. While smaller models are often licensed under Apache 2.0, larger models like Qwen2.5-72B have special license terms that allow commercial use with certain restrictions.

Apache 2.0

A very permissive open source license with minimal restrictions. It allows use, modification, and distribution of the LLM, including in proprietary programs, as long as the copyright and license notices are retained and modified files are marked as changed. It also includes an explicit patent grant and contains no copyleft clause.

CC BY-NC-4.0

A Creative Commons license that allows editing and sharing the LLM in any form, but not for commercial purposes. The author's name must be credited.

CC BY-NC-SA-4.0

Similar to CC BY-NC-4.0, but with the additional Share-Alike condition. This means forks or modified versions of an LLM must be distributed under the same conditions.
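If you filter models by license in your own tooling, the summaries above can be captured in a small lookup table. The sketch below is a coarse reading of this section only, not of the license texts themselves. The category names are made up for this example, and anything missing from the table, such as vendor-specific licenses, is deliberately reported as unknown so that it gets a manual review.

```python
from enum import Enum
from typing import Optional


class CommercialUse(Enum):
    ALLOWED = "allowed"            # permissive, notice requirements only
    CONDITIONAL = "conditional"    # allowed with extra conditions
    NOT_ALLOWED = "not allowed"    # non-commercial licenses


# Coarse classification based on the license summaries in this section.
LICENSES = {
    "MIT": CommercialUse.ALLOWED,
    "Apache 2.0": CommercialUse.ALLOWED,
    "Llama 2 Community": CommercialUse.CONDITIONAL,  # 700M MAU threshold
    "Llama 3 Community": CommercialUse.CONDITIONAL,  # 700M MAU threshold
    "Qwen License": CommercialUse.CONDITIONAL,       # "certain restrictions"
    "CC BY-NC-4.0": CommercialUse.NOT_ALLOWED,
    "CC BY-NC-SA-4.0": CommercialUse.NOT_ALLOWED,
}


def commercial_use(license_name: str) -> Optional[CommercialUse]:
    """Look up a license; None means unknown and needs a manual check."""
    return LICENSES.get(license_name)


for name in ("MIT", "Llama 3 Community", "CC BY-NC-4.0", "Gemma Terms of Use"):
    result = commercial_use(name)
    print(f"{name}: {result.value if result else 'unknown, check manually'}")
```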