Fine-tuning means adapting a pre-trained LLM (large language model) to a specific task or domain by further training it on a specialized dataset.
During this process, the model's weights are updated so that it better captures the specifics of the task or domain.
Fine-tuning thus allows you to optimize the broad but superficial knowledge base of LLMs for specific use cases.
Did that just go over your head?
No problem, here's a simpler explanation:
Think of an LLM as a new employee in your company who has a lot of general knowledge but little understanding of internal processes and communication within your organization.
Through fine-tuning, you feed your employee the necessary specialized knowledge so they can better fulfill their role in the company.
1. Benefits of Fine-Tuning
Fine-tuning improves on "few-shot learning" by training on many more examples than would fit in a prompt.
This means you no longer need to provide as many examples in your prompts to get the desired output. Additionally, you don't need to give the LLM as many details about its task, such as the writing style to use, the target audience, or the output length. This can save a lot of time.
Furthermore, fine-tuning can help an LLM respond with lower latency and consume fewer tokens. Fine-tuning can therefore also reduce costs for API usage or computing power.
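To make this concrete, here is a small sketch using the OpenAI Python SDK (the model names, fine-tuned model ID, and example content are placeholders, not recommendations): before fine-tuning, every request has to carry the instructions plus few-shot examples; afterwards, the prompt can shrink to just the actual question.

```python
from openai import OpenAI

client = OpenAI()

# Before fine-tuning: every request carries instructions plus few-shot examples.
few_shot_messages = [
    {"role": "system", "content": "You answer support questions in a friendly, concise tone, max. 2 sentences."},
    {"role": "user", "content": "Example question 1 ..."},
    {"role": "assistant", "content": "Example answer 1 ..."},
    {"role": "user", "content": "Example question 2 ..."},
    {"role": "assistant", "content": "Example answer 2 ..."},
    {"role": "user", "content": "How do I reset my password?"},
]
response = client.chat.completions.create(model="gpt-4o-mini", messages=few_shot_messages)

# After fine-tuning: style and format live in the model's weights,
# so the prompt only needs the actual question (fewer tokens per request).
response = client.chat.completions.create(
    model="ft:gpt-4o-mini:my-org::abc123",  # placeholder fine-tuned model ID
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
```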
2. What Steps Are Required for Fine-Tuning?
Fine-tuning sounds complicated. But it's actually a relatively simple, though very time-consuming, process.
The key steps in fine-tuning are:
- Prepare and upload training data (by far the most labor-intensive step for you)
- Train a new fine-tuned model
- Evaluate results and return to step 1 if needed
- Use your fine-tuned model
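With OpenAI models, these steps map to a handful of API calls. Here is a minimal sketch of the workflow using the OpenAI Python SDK; the file name and base model are placeholders, and in practice you would poll the job or use the dashboard rather than a single status check.

```python
from openai import OpenAI

client = OpenAI()

# Step 1: upload the prepared training data (JSONL, see section 2.1).
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Step 2: start a fine-tuning job on a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)

# Step 3: check the job until it has finished.
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)  # e.g. "running" or "succeeded"

# Step 4: once the job has succeeded, call the fine-tuned model like any other model.
response = client.chat.completions.create(
    model=job.fine_tuned_model,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```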
2.1 How Do You Prepare Training Data?
The most important and time-consuming step in fine-tuning is preparing the training data.
For this, you need to create a diverse set of example conversations that resemble the conversations the model will encounter in production.
Each example in the dataset must have a specific format. When fine-tuning OpenAI models, for example, the training data must be in the same format as the Chat Completions API.
To achieve the best results with fine-tuning, you should primarily train an LLM with cases where it doesn't give the desired responses and provide your preferred answers in the training data.
Here's an example from OpenAI where a chatbot named "Marv" is fine-tuned to give sarcastic responses:
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "About 384,400 kilometers. Give or take a few, as if that really matters."}]}3. Which AI Models Can Be Fine-Tuned?
3. Which AI Models Can Be Fine-Tuned?
Fine-tuning is available for many different AI models, from proprietary APIs to open-source models. Here's an overview of currently available models (as of January 2026):
| Provider | Model | Parameters | Platform | Method |
|---|---|---|---|---|
| Alibaba / Qwen | Qwen3 | 0.6B–235B | Open Source | Full, LoRA |
| Alibaba / Qwen | Qwen2.5-Max | MoE | Alibaba Cloud | SFT |
| Alibaba / Qwen | Qwen2.5 | 0.5B–72B | Open Source | Full, LoRA |
| Amazon | Amazon Nova 2 Lite | – | Amazon Bedrock | Reinforcement FT |
| Amazon | Amazon Titan Text | – | Amazon Bedrock | SFT |
| Anthropic | Claude 3 Haiku | – | Amazon Bedrock | SFT |
| Cohere | Command R (08-2024) | 32B | Cohere API | SFT, LoRA |
| Cohere | Command R+ (08-2024) | 104B | Cohere API | SFT, LoRA |
| DeepSeek | DeepSeek R1 Distill | 1.5B–70B | Open Source | LoRA, QLoRA |
| DeepSeek | DeepSeek V3 | 671B (37B MoE) | Open Source | QAT |
| Google | Gemini 2.5 Pro | – | Vertex AI | SFT |
| Google | Gemini 2.5 Flash | – | Vertex AI | SFT |
| Google | Gemini 2.5 Flash-Lite | – | Vertex AI | SFT |
| Google | Gemini 2.0 Flash | – | Vertex AI | SFT |
| Google | Gemma 3 | 1B–27B | Open Source | Full, LoRA |
| Meta | Llama 3.3 | 70B | Open Source | Full, LoRA |
| Meta | Llama 3.2 | 1B–90B | Open Source, Amazon Bedrock | Full, LoRA |
| Meta | Llama 3.1 | 8B–405B | Open Source | Full, LoRA |
| Mistral | Mistral Large 3 | 123B | Mistral API, Open Source | SFT, LoRA |
| Mistral | Mistral Nemo | 12B | Mistral API, Open Source | SFT, LoRA |
| Mistral | Codestral | – | Mistral API | SFT |
| Mistral | Mistral Small | – | Mistral API | SFT |
| Mistral | Mistral 7B | 7B | Open Source | Full, LoRA |
| OpenAI | GPT-4.1 | – | OpenAI API | SFT, DPO |
| OpenAI | GPT-4.1 mini | – | OpenAI API | SFT, DPO |
| OpenAI | GPT-4.1 nano | – | OpenAI API | SFT, DPO |
| OpenAI | GPT-4o (2024-08-06) | – | OpenAI API | SFT |
| OpenAI | GPT-4o mini | – | OpenAI API | SFT |
| OpenAI | GPT-3.5 Turbo | – | OpenAI API | SFT |
3.1 Explanation of Fine-Tuning Methods
- SFT (Supervised Fine-Tuning): Classic supervised fine-tuning with input-output pairs
- DPO (Direct Preference Optimization): Training with preference data (which answer is better)
- Full Fine-Tuning: All model weights are adjusted
- LoRA (Low-Rank Adaptation): Efficient method that only trains small adapter layers
- QLoRA: LoRA with quantized base model (requires less VRAM)
- QAT (Quantization-Aware Training): Training that accounts for later quantization
- Reinforcement FT: Fine-tuning with reinforcement learning from human feedback
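To illustrate the LoRA method from the table, here is a minimal sketch using Hugging Face's transformers and peft libraries; the base model and hyperparameters are example choices, not recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Example open-source base model; any causal LM from the table above could be used.
base_model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA: freeze the base weights and train only small low-rank adapter matrices.
lora_config = LoraConfig(
    r=8,                                   # rank of the adapter matrices
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights is trainable
```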
Note: As of 2025, Google AI Studio no longer supports fine-tuning. Gemini models must be fine-tuned via Vertex AI.
Tip: You can also further fine-tune an already fine-tuned model. This is useful when you receive additional data and don't want to repeat the previous training steps.
4. When Should You Use Fine-Tuning?
Fine-tuning is a great method for getting better output from an LLM and is especially useful when it's easier to "show than to explain."
The catch, however:
As already explained, fine-tuning is very time-consuming.
It therefore always makes sense to first check whether you can get better results with other methods, and to resort to fine-tuning only once you've exhausted them.
These include:
- Prompt Engineering (i.e., formulating prompts, such as adding a role, precisely defining the answer format, etc.)
- Prompt Chaining (breaking complex tasks into multiple prompts)
- Function Calling (e.g., calling external interfaces or databases)
A major advantage of these methods is that you get feedback much faster and more easily.
For example, if you add a role to your prompt, you can immediately compare the output with what the same prompt produces without the role.
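A sketch of such a quick comparison with the OpenAI Python SDK (the model and prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI()
question = "Explain fine-tuning in two sentences."

# Variant A: plain prompt without a role.
plain = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question}],
)

# Variant B: the same prompt with a role added via the system message.
with_role = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a patient teacher explaining AI topics to beginners."},
        {"role": "user", "content": question},
    ],
)

# Both outputs are available within seconds and can be compared side by side.
print(plain.choices[0].message.content)
print(with_role.choices[0].message.content)
```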
With fine-tuning, you often have to spend days or even weeks preparing your training data. Then you have to wait for the model to be fine-tuned and can only test what the fine-tuning actually achieved after these two steps.
5. Approaches to Fine-Tuning
There are various ways to fine-tune LLMs:
| Approach | Description | Analogy |
|---|---|---|
| Full Fine-Tuning | Retraining the entire model, requires a lot of data and resources | Completely training a new employee |
| Parameter Efficient Fine-Tuning (PEFT) | Adding new efficient adapters without changing the model structure | Further training an employee |
| Distillation | Training a smaller specialized model that replicates the decisions of the large model | Having an experienced employee train a new employee |
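For distillation, one simple recipe is to let the large model answer representative prompts and use those answers as training data for the smaller model. Here is a rough sketch (the prompts, model names, and file name are placeholders):

```python
import json
from openai import OpenAI

client = OpenAI()

# Placeholder prompts; in practice these would cover your real use case.
prompts = ["What is the capital of France?", "Who wrote 'Romeo and Juliet'?"]

# Let the large "teacher" model answer the prompts.
with open("distillation_data.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        teacher_answer = client.chat.completions.create(
            model="gpt-4.1",  # large teacher model
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content

        # Store the teacher's answer in the fine-tuning format from section 2.1.
        example = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": teacher_answer},
        ]}
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# The resulting file can then be used to fine-tune a smaller "student" model
# (e.g. a mini variant) with the workflow shown in section 2.
```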
