Which open source LLMs are best for commercial use in 2025?

GPT-OSS-120B (Apache 2.0) and DeepSeek R1 (MIT) are the top recommendations for commercial projects with the highest performance. Llama 4 Maverick offers commercial use under the Llama 4 Community License for companies with up to 700 million monthly active users. Qwen3-235B-A22B-Thinking and Kimi K2 (both MIT) are also excellent options for business-critical applications. For medium-sized projects, Gemma 3 27B, Phi-4 (14B), and Qwen3-32B are very well suited. Important: Always check the current license terms, as these can change.

What hardware do I need to run open source LLMs locally?

Hardware requirements vary significantly depending on model size: For smaller models (7B parameters): RTX 4090 with 24GB VRAM achieves 138 tokens/s and is sufficient for most applications. AtSign least 16GB RAM and a fast NVMe SSD are recommended. For larger models (70B parameters): Two RTX 4090s or professional GPUs like A100 with 40-80GB VRAM are necessary. A system with 64GB+ RAM is ideal. DeepSeek V3 requires even more resources for optimal performance. Alternative: Apple Silicon with unified memory is surprisingly effective - Mac Studio with 192GB can run Llama 70B at 13.77 tokens/s.

How do the top 3 models differ: GPT-OSS-120B, DeepSeek R1, and Qwen3-235B?

GPT-OSS-120B from OpenAI leads in scientific reasoning with GPQA: 80.1% and AIME: 96.6%, while DeepSeek R1 shows peak mathematical performance with MATH-500: 97.3%. Qwen3-235B-A22B-Thinking surpasses both in code tasks (LiveCodeBench: 74.1%) and beats DeepSeek R1 in 17/23 benchmarks. GPT-OSS-120B requires only 5.1B active parameters (out of 117B total), DeepSeek R1 37B (out of 671B), Qwen3 22B (out of 235B). For scientific tasks: GPT-OSS-120B, for mathematics: DeepSeek R1, for code: Qwen3-235B. All three use permissive licenses (Apache 2.0 or MIT).

Which tools make it easier to use open source LLMs locally?

Several user-friendly tools significantly simplify local LLM usage: Ollama: Easiest installation, supports all common models LM Studio: Graphical user interface, ideal for beginners GPT4All: Lightweight solution for consumer hardware Jan: Open source ChatGPT alternative with local execution vLLM: High-performance solution for production environments

Are open source LLMs really free or are there hidden costs?

The models themselves are free, but operating costs can be significant. Local use is cost-free after the hardware investment, but powerful GPUs cost $1,500-$15,000+. Power consumption for training and inference should not be underestimated. Managed API providers often offer free quotas, then charge fees similar to OpenAI/Anthropic. VPS hosting starts at $20/month for CPU-only, GPU servers cost significantly more. The true costs lie in hardware, electricity, and potential cloud usage.

How is the open source LLM landscape developing in 2025?

2025 marks the breakthrough for open source LLMs: With GPT-OSS-120B, OpenAI enters the space for the first time since GPT-2, while DeepSeek R1 and Qwen3 surpass proprietary models like GPT-4. Mixture-of-Experts (MoE) dominates - 8 of the top 10 use MoE for efficiency with trillions of parameters. Meta's Llama 4 Maverick (400B MoE, 17B active) shows that small activated parameters can achieve top performance. New players like Moonshot AI (Kimi K2 with 1T parameters) and evolved models from Google (Gemma 3), Microsoft (Phi-4), and IBM (Granite Code) intensify competition. Trend: Highly specialized models for specific domains (code, math, reasoning) instead of general-purpose giants.

The 50 Best Open Source LLMs (and How to Use Them)

Open source LLMs are one of the most important AI trends for 2025.

And for good reason:

Open source models were long significantly weaker than proprietary models. But this year they have caught up:

With GPT-OSS-120B, DeepSeek R1, Qwen3-235B-A22B-Thinking, Llama 4 Maverick, and Kimi K2, models have emerged that can compete with the best proprietary LLMs like GPT-5, Claude 4.5, or Gemini 2.5 (and even surpass them in some benchmarks).

In this article, you'll find an overview of the current 50 best open source LLMs with their key benchmark scores and licenses.

Additionally, I'll show you how to easily and freely use open LLMs on your own computer (without needing to program or use the terminal).

TL;DRKey Takeaways

GPT-OSS-120B (OpenAI), DeepSeek R1, and Qwen3-235B lead the 2025 open source rankings, surpassing GPT-4 in many benchmarks (MMLU: 90%+, MATH: 97%+)
50 open source LLMs available with various licenses - from MIT to Apache 2.0 to restricted commercial licenses
New 2025 models like Llama 4 Maverick, Kimi K2, and Gemma 3 27B set new standards for efficiency at smaller model sizes
Local usage possible with tools like Ollama, LM Studio, or GPT4All - but requires powerful hardware (RTX 4090+ recommended)

Open Source LLMs Compared

#	Model	MMLU	Math	Code	Developer	License
1	GPT-OSS-120B (117B MoE)	90.0%	80.1%	96.6%	OpenAI	Apache 2.0
2	DeepSeek-R1 (671B MoE)	90.8%	97.3%	71.5%	DeepSeek	MIT
3	Qwen3-235B-A22B-Thinking	87.0%	92.3%	74.1%	Alibaba	Apache 2.0
4	Llama 4 Maverick (400B MoE)	80.5%	69.8%	43.4%	Meta	Llama 4 Community
5	Kimi K2 (1T MoE)	97.4%	71.6%	53.7%	Moonshot AI	MIT
6	DeepSeek-V3 (671B MoE)	88.5%	90.2%	85.0%	DeepSeek	MIT
7	GPT-OSS-20B (20B MoE)	85.3%	96.0%	69.0%	OpenAI	Apache 2.0
8	Llama 3.3 70B Instruct	86.0%	77.3%	83.0%	Meta	Llama 3.3 Community
9	Qwen2.5-72B-Instruct	85.3%	82.3%	82.0%	Alibaba	Qwen License
10	Llama 3.1 405B Instruct	88.6%	81.1%	73.8%	Meta	Llama 3.1 Community
11	Gemma 3 27B	67.5%	42.4%	69.0%	Google	Gemma Terms of Use
12	Command R+ (104B)	88.2%	85.0%	92.0%	Cohere	CC BY-NC-4.0
13	Llama-3.1-Nemotron-70B	85.0%	57.6%	8.98	NVIDIA	Llama 3.1 Community
14	Mixtral-8x22B (141B MoE)	77.8%	68.0%	75.0%	Mistral AI	Apache 2.0
15	Mistral Large 2 (123B)	84.0%	76.9%	82.0%	Mistral AI	Mistral Research License
16	Phi-4 (14B)	56.1%	82.6%	80.4%	Microsoft	MIT
17	Qwen3-32B-Instruct	83.5%	77.0%	78.0%	Alibaba	Apache 2.0
18	OLMo 2 32B	74.0%	78.6%	84.0%	Allen Institute	Apache 2.0
19	DBRX (132B MoE)	73.7%	70.1%	66.9%	Databricks	Databricks Open Model
20	DeepSeek Coder V2 (236B MoE)	78.5%	90.2%	76.2%	DeepSeek	MIT
21	Llama 3.1 70B Instruct	79.3%	68.0%	80.5%	Meta	Llama 3.1 Community
22	Yi-34B	76.3%	67.6%	85.0%	01.AI	Apache 2.0
23	Falcon 3 10B	73.1%	42.5%	58.0%	TII	Falcon License
24	Qwen2.5-32B-Instruct	83.1%	75.5%	78.9%	Alibaba	Apache 2.0
25	Mistral NeMo 12B	68.0%	83.5%	76.8%	Mistral AI / NVIDIA	Apache 2.0
26	InternLM3 8B-Instruct	72.3%	75.0%	75.6%	Shanghai AI Lab	Apache 2.0
27	Granite Code 34B	75.4%	68.3%	67.5%	IBM	Apache 2.0
28	Falcon 180B	70.4%	85.3%	77.6%	TII	Falcon License
29	WizardLM-2 8x22B	77.2%	83.0%	73.2%	Microsoft	Apache 2.0
30	Qwen2-72B-Instruct	84.2%	89.5%	64.6%	Alibaba	Apache 2.0
31	Mixtral-8x7B (46.7B MoE)	70.6%	74.4%	40.2%	Mistral AI	Apache 2.0
32	Llama 3.1 8B Instruct	68.4%	84.5%	72.6%	Meta	Llama 3.1 Community
33	Gemma 3 8B	70.9%	77.9%	56.0%	Google	Gemma Terms of Use
34	Code Llama 70B Instruct	62.0%	67.8%	62.0%	Meta	Llama 2 Community
35	Falcon 3 7B	67.4%	39.2%	70.8%	TII	Falcon License
36	SOLAR 10.7B v1.0	66.0%	69.9%	71.0%	Upstage	Apache 2.0
37	Mistral 7B v0.3	62.5%	52.2%	83.0%	Mistral AI	Apache 2.0
38	Yi-1.5 34B	76.8%	80.1%	75.0%	01.AI	Apache 2.0
39	OLMo 2 13B	68.2%	71.4%	82.1%	Allen Institute	Apache 2.0
40	StarCoder2 15B	46.0%	36.6%	49.6%	BigCode	BigCode Open RAIL-M v1
41	Phi-3 Medium (14B)	78.0%	91.0%	62.2%	Microsoft	MIT
42	InternLM2-Chat-20B	67.0%	79.6%	67.1%	Shanghai AI Lab	Apache 2.0
43	DeepSeek LLM 67B	71.3%	63.4%	40.0%	DeepSeek	DeepSeek License
44	Vicuna 1.5 13B	55.0%	48.3%	81.6%	LMSYS	Llama 2 Community
45	Zephyr 7B Beta	61.4%	42.0%	61.1%	HuggingFace	MIT
46	Gemma 2 9B	71.3%	68.6%	51.8%	Google	Gemma Terms of Use
47	OLMo 2 7B	64.1%	62.5%	79.8%	Allen Institute	Apache 2.0
48	Baichuan 2 13B	59.5%	58.1%	52.8%	Baichuan Inc.	Baichuan 2 License
49	Orca 2 13B	59.0%	60.5%	61.7%	Microsoft	Microsoft Research License
50	Grok-1 (314B MoE)	73.0%	62.9%	63.2%	xAI	Apache 2.0

Benchmark score color coding:

ExcellentTop tier

GoodAbove average

AverageSolid

PoorBelow average

1. Key Benchmarks Explained

To objectively compare open source LLMs, I use three central benchmark categories:

MMLU / MMLU-Pro: The Massive Multitask Language Understanding Benchmark tests general knowledge across 57 subjects (STEM, social sciences, humanities). MMLU-Pro is the more challenging variant with less contamination. Top models score 85-90% here.

MATH / GPQA: These benchmarks test mathematical and scientific reasoning. MATH-500 contains challenging math problems, while GPQA (Graduate-Level Physics Questions Answers) tests expert knowledge in biology, physics, and chemistry. Top models score 70-97% here.

HumanEval / LiveCodeBench: These benchmarks test code generation. HumanEval contains Python programming tasks, LiveCodeBench tests code performance with current, uncontaminated tasks. Top models score 60-90% here.

The table shows three benchmark scores for each model, which vary depending on the model's strengths (e.g., code-focused models have higher HumanEval scores).

2. Top Models of 2025

GPT-OSS-120B from OpenAI leads the rankings (MMLU: 90.0%, GPQA: 80.1%, AIME: 96.6%) and is the first open-weight model from OpenAI since GPT-2.

DeepSeek R1 with its 671 billion parameters (only 37B active) surpasses GPT-4 in many areas (MMLU: 90.8%, MATH-500: 97.3%) and was trained for just $5.6 million.

Qwen3-235B-A22B-Thinking from Alibaba sets new standards for reasoning (AIME25: 92.3%, LiveCodeBench: 74.1%) and surpasses DeepSeek R1 in 17 out of 23 benchmarks.

Llama 4 Maverick from Meta achieves impressive scores with only 17B active parameters (out of 400B total) (MMLU-Pro: 80.5%, GPQA: 69.8%) and beats significantly larger models.

3. LLM Licenses Explained

Here's an overview of the most commonly used licenses for open source LLMs.

Warning

Note: Please always review the current license terms of LLMs yourself before using them. License conditions can change at any time.

MIT License

A very permissive open source license, similar to Apache 2.0. It allows unrestricted use, modification, and distribution of the LLM, including in proprietary programs, as long as the copyright notice is retained. DeepSeek V3 uses MIT with some restrictions for military use.

Llama 2 Community / Llama 3 Community

Meta released Llama 2 and Llama 3 under these licenses. They allow free use of the LLMs for research and commercial applications with up to 700 million monthly active users. The source code and model weights are freely available.

Qwen License / Qianwen LICENSE

Qwen models are released under various licenses. While smaller models are often licensed under Apache 2.0, larger models like Qwen2.5-72B have special license terms that allow commercial use with certain restrictions.

Apache 2.0

A very permissive open source license with minimal restrictions. It allows use, modification, and distribution of the LLM, including in proprietary programs, as long as the copyright notice is retained. It contains no copyleft clause.

CC BY-NC-4.0

A Creative Commons license that allows editing and sharing the LLM in any form, but not for commercial purposes. The author's name must be credited.

CC BY-NC-SA-4.0

Similar to CC BY-NC-4.0, but with the additional Share-Alike condition. This means forks or modified versions of an LLM must be distributed under the same conditions.

Non-Commercial

Here, using the LLM for commercial purposes is prohibited. However, what exactly counts as "commercial" is not always clearly defined or delimited.

Usually, "non-commercial" models are only released for research purposes or private use.

4. Using Open Source LLMs Locally on Your Own Computer

Using open source LLMs locally on your own computer is easier than you might think:

1. Download LM Studio

Download LM Studio from the website. It's free and available for Mac, Windows, and Linux:

2. Install and Open LM Studio

Next, install LM Studio on your computer and open it.

3. Download Your Desired Open Source LLMs

Now you need to download the open source LLMs you want to use in LM Studio.

Many popular LLMs are already on the home screen. To download an LLM, simply click the blue download button:

To find specific open source LLMs, you can also use the search function:

4. Important: Check System Requirements Before Downloading

Before downloading an LLM, you should check the system requirements.

Llama 3, for example, requires more than 8 GB RAM and 4.92 GB of free storage:

5. Chat with the Open Source LLM

After downloading an open source LLM, you can use it directly in LM Studio.

Simply click on the speech bubble icon (?) in the left sidebar.

The user interface and settings options are reminiscent of the OpenAI Playground:

Frequently Asked Questions About Open Source LLMs