What are Sampling Parameters?
Sampling parameters control how a Large Language Model selects the next token (word or subword). They influence whether the output is creative and diverse or precise and deterministic.
The most important parameters are Temperature, Top-P (Nucleus Sampling), Top-K, as well as Frequency Penalty and Presence Penalty.
Temperature
Temperature controls the "creativity" of the model. It rescales the probability distribution for the next token: low values make the distribution sharper (the most probable tokens dominate), high values make it flatter (less probable tokens get a real chance).
- Temperature = 0: The model always chooses the most probable token. Deterministic, repeatable, but potentially boring.
- Temperature = 0.7: Good middle ground for most applications. Creative but still coherent.
- Temperature = 1.0: Default value. Balanced creativity.
- Temperature > 1.0: Very creative but increasingly chaotic and potentially nonsensical.
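To make the rescaling concrete, here is a minimal sketch (plain NumPy, with invented logits) of how dividing the logits by the Temperature sharpens or flattens the distribution before a token is sampled; it is an illustration, not any provider's exact implementation:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng=np.random.default_rng()):
    """Sample a token index from logits rescaled by the given temperature."""
    if temperature == 0:
        return int(np.argmax(logits))           # greedy: always the most probable token
    scaled = np.asarray(logits) / temperature   # <1 sharpens, >1 flattens the distribution
    probs = np.exp(scaled - scaled.max())       # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Hypothetical logits for four candidate tokens
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_with_temperature(logits, 0.2))   # almost always index 0
print(sample_with_temperature(logits, 1.2))   # other indices appear far more often
```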
When to Use Which Temperature?
- Low (0–0.3): Fact-based answers, code, math
- Medium (0.5–0.7): General conversation, text creation
- High (0.8–1.2): Creative writing, brainstorming
For example, a high Temperature might produce output like:
"The sun dives like a burning phoenix into the sea of clouds as the sky explodes in ecstatic colors." (very creative, intense)
Top-P (Nucleus Sampling)
Top-P limits the selection to the smallest group of tokens whose cumulative probability exceeds the value P.
- Top-P = 0.1: Only the most probable tokens (10% of probability mass)
- Top-P = 0.9: Broad selection, 90% of probability mass
- Top-P = 1.0: All tokens are possible
Recommendation: Use either Temperature OR Top-P, not both simultaneously. Many experts prefer Top-P for more control.
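To make the mechanism concrete, here is a short sketch (plain NumPy, illustrative probabilities only) of how nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches P and samples from that set:

```python
import numpy as np

def top_p_sample(probs, top_p, rng=np.random.default_rng()):
    """Sample from the smallest set of tokens whose cumulative probability reaches top_p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                        # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1   # size of the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize inside the nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))

# Hypothetical next-token distribution over five tokens
probs = [0.50, 0.25, 0.15, 0.07, 0.03]
print(top_p_sample(probs, 0.9))   # only the first three tokens can be chosen
```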
For example, a moderate Top-P might produce output like:
"The sky is blue due to a physical phenomenon called Rayleigh scattering, where shorter wavelengths of light are scattered more." (balanced, informative)
Top-K
Top-K limits the selection to the K most probable tokens.
- Top-K = 1: Only the most probable token (like Temperature 0)
- Top-K = 40: Selection from the 40 most probable tokens
- Top-K = 0: No limit (all tokens possible)
Top-K is less adaptive than Top-P, since it always keeps a fixed number of candidates regardless of how the probability mass is distributed among them.
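The following minimal sketch (again plain NumPy with invented probabilities) shows that difference in mechanism: Top-K keeps a fixed number of candidates, however the probability mass is spread:

```python
import numpy as np

def top_k_sample(probs, k, rng=np.random.default_rng()):
    """Sample from the k most probable tokens (k = 0 means no limit)."""
    probs = np.asarray(probs, dtype=float)
    if k <= 0 or k >= len(probs):
        candidates = np.arange(len(probs))           # no restriction
    else:
        candidates = np.argsort(probs)[::-1][:k]     # indices of the k most probable tokens
    kept = probs[candidates] / probs[candidates].sum()  # renormalize over the kept tokens
    return int(rng.choice(candidates, p=kept))

probs = [0.50, 0.25, 0.15, 0.07, 0.03]
print(top_k_sample(probs, 1))   # always token 0, like Temperature 0
print(top_k_sample(probs, 3))   # token 0, 1, or 2
```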
For example, a higher Top-K might produce output like:
"Papaya, pomegranate, lychee." (a more unusual selection)
Frequency Penalty
Frequency Penalty reduces the probability of tokens that already appear frequently in the text. The more often a token appears, the more it is "penalized."
- 0: No penalty
- 0.5–1.0: Moderate reduction of repetition
- 2.0: Strong reduction, can make text unnatural
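A hedged sketch of the idea (illustrative only, not any provider's exact formula): the logit of each candidate token is lowered in proportion to how often that token has already appeared in the generated text:

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Lower each token's logit in proportion to how often it has already appeared."""
    counts = Counter(generated_tokens)
    return {token: logit - penalty * counts.get(token, 0)
            for token, logit in logits.items()}

# Hypothetical logits for three candidate tokens after some text has been generated
logits = {"dog": 2.1, "cat": 1.8, "bird": 0.9}
generated = ["dog", "dog", "cat"]
print(apply_frequency_penalty(logits, generated, penalty=0.5))
# "dog" loses 1.0 (appeared twice), "cat" loses 0.5, "bird" is untouched
```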
For example, a high Frequency Penalty might produce output like:
"Loyal quadrupeds enrich human existence. Furry companions enjoy movement and offer unconditional affection." (actively different word choices)
Presence Penalty
Presence Penalty penalizes tokens that appear at all in the text, regardless of how often. It encourages introducing new topics.
- 0: No penalty
- 0.5: Encourages new words and concepts
- 1.0+: Strong promotion of diversity
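In contrast to the Frequency Penalty, the Presence Penalty uses a simple yes/no criterion. A short sketch of that difference (same caveat: illustrative, not a provider's exact formula):

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    """Lower the logit of every token that has appeared at least once, regardless of count."""
    seen = set(generated_tokens)
    return {token: logit - (penalty if token in seen else 0.0)
            for token, logit in logits.items()}

logits = {"dog": 2.1, "cat": 1.8, "bird": 0.9}
generated = ["dog", "dog", "cat"]
print(apply_presence_penalty(logits, generated, penalty=0.5))
# "dog" and "cat" each lose 0.5, no matter how often they occurred; "bird" is untouched
```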
For example, a higher Presence Penalty might produce output like:
"Mobility shapes our modern society. Sustainability and ecological footprint also play an important role." (moves toward related topics)
Practical Recommendations
| Use Case | Temperature | Top-P |
|---|---|---|
| Code Generation | 0–0.2 | 0.1–0.3 |
| Factual Answers | 0.3–0.5 | 0.5–0.7 |
| General Conversation | 0.7 | 0.9 |
| Creative Writing | 0.9–1.2 | 0.95–1.0 |
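In practice these values are simply passed as request parameters. As a sketch, assuming the OpenAI Python SDK (other providers expose similarly named parameters), a low-temperature setup for code generation might look like this:

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",            # example model name
    messages=[{"role": "user", "content": "Write a Python function that parses ISO dates."}],
    temperature=0.2,                # low: deterministic, fact/code oriented
    top_p=1.0,                      # left at default; adjust Temperature OR Top-P, not both
    frequency_penalty=0.0,
    presence_penalty=0.0,
)
print(response.choices[0].message.content)
```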
Conclusion
Sampling parameters are powerful tools for controlling LLM behavior. For most applications, experimenting with Temperature or Top-P is sufficient. Penalties are useful for avoiding repetition. Experiment with different values to find the optimal settings for your use case.
