What is Chain-of-Thought Prompting?
Chain-of-Thought (CoT) Prompting is a prompting technique where an AI model is instructed to explain its thinking process step by step before giving a final answer. This significantly improves quality on complex reasoning tasks.
The technique was introduced by Google researchers in 2022 and has since become one of the most important prompting techniques.
Why Does Chain-of-Thought Work?
LLMs generate text token by token. On complex problems, answering directly often leads to errors because the model has no "room to think." CoT forces the model to write out intermediate steps, which in the original benchmark results cuts the error rate on mathematical and logical tasks by roughly half.
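The idea is simple enough to sketch in a few lines. Below, `build_cot_prompt` is a hypothetical helper (not part of any specific API) that wraps a question in a step-by-step instruction before it is sent to a model:

```python
# Minimal sketch: wrapping a question in a Chain-of-Thought instruction.
# `build_cot_prompt` is a hypothetical helper, not part of any library API.

def build_cot_prompt(question: str) -> str:
    """Ask the model to lay out intermediate steps before answering."""
    return (
        f"{question}\n\n"
        "Think step by step. Show each intermediate step, "
        "then state the final answer on its own line."
    )

prompt = build_cot_prompt(
    "A train travels 120 km in 2 hours. How long does it take for 300 km?"
)
print(prompt)
```

The same wrapper works with any chat or completion API; only the instruction text matters.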
How Much Does Chain-of-Thought Really Help?
Research shows impressive improvements from Chain-of-Thought prompting, especially on math and logic tasks. Here are the concrete benchmark results from the key papers:
[Chart: CoT Performance Comparison – accuracy of different prompting methods on popular benchmarks (PaLM-540B). Source: Wei et al. 2022; Wang et al. 2022]
Key Research Insights
GSM8K: +218% Improvement
For math word problems, Chain-of-Thought increased accuracy from 17.9% to 56.9% – an improvement of over 200%.
– Wei et al. 2022

Self-Consistency: +17.5 Percentage Points
By majority voting over 40 reasoning paths, Self-Consistency improved GSM8K accuracy from 56.9% to 74.4%.
– Wang et al. 2022

Symbolic Reasoning: +794% on Last Letter
For symbolic tasks like letter concatenation, accuracy jumped from 6.6% to 59% – an almost nine-fold increase.
– Wei et al. 2022
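The Self-Consistency result above comes from a mechanically simple idea: sample many reasoning chains at non-zero temperature, extract each chain's final answer, and keep the majority. A minimal sketch, assuming the answers have already been parsed out of the sampled model outputs:

```python
from collections import Counter

# Sketch of Self-Consistency (Wang et al. 2022): sample several reasoning
# paths, extract each path's final answer, and take a majority vote.

def majority_vote(answers):
    """Return the most common final answer across sampled reasoning paths."""
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer

# Hypothetical final answers parsed from 5 sampled chains for one problem:
sampled_answers = ["74", "74", "72", "74", "68"]
print(majority_vote(sampled_answers))  # → 74
```

The vote discards the reasoning text entirely; only agreement between final answers counts, which is why flawed individual chains are tolerated.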
[Benchmark tabs: GSM8K (Math), SVAMP (Math), StrategyQA (Reasoning), CommonsenseQA (Commonsense), Last Letter (Symbolic), Coin Flip (Symbolic)]

CoT Variants: From Zero-Shot to Graph of Thoughts
Since Chain-of-Thought Prompting was introduced in 2022, numerous variants and extensions have been proposed – from the simple Zero-Shot variant to tree- and graph-structured approaches like Tree of Thoughts. Each technique has its strengths and optimal use cases.
The following overview compares all major CoT variants:
[Table: Chain-of-Thought Variants Comparison – scientifically grounded CoT techniques from current research papers]
When to Use Chain-of-Thought?
- Mathematical Problems: Word problems, calculations, statistics
- Logical Reasoning: "If A, then B" chains
- Multi-Step Tasks: Problems requiring multiple steps
- Code Debugging: Systematic error analysis
- Decision Making: Pro/con evaluations
Practical Examples
Without Chain-of-Thought
Question: "A train travels 120 km in 2 hours. How long does it take for 300 km?"
Answer: "5 hours" (the number happens to be right here, but on harder problems direct answers are often wrong – and there is no justification to check)
With Chain-of-Thought
Question: "A train travels 120 km in 2 hours. How long does it take for 300 km? Think step by step."
Answer: "Step 1: Calculate speed: 120 km ÷ 2 h = 60 km/h. Step 2: Time for 300 km: 300 km ÷ 60 km/h = 5 hours."
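The two steps in that answer are plain arithmetic and can be checked directly:

```python
# The two reasoning steps from the example answer, written out as arithmetic.
distance_done_km = 120
time_done_h = 2
speed_kmh = distance_done_km / time_done_h   # Step 1: 120 / 2 = 60 km/h

target_km = 300
time_needed_h = target_km / speed_kmh        # Step 2: 300 / 60 = 5 hours
print(time_needed_h)  # → 5.0
```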
Tips for Effective CoT
- Be explicit: "Show every step of your reasoning"
- Request structure: "Number your steps"
- Demand justifications: "Explain why you take each step"
- Combine with self-verification: "Verify your result at the end"
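The four tips above compose naturally into a single reusable template. A minimal sketch (the template text is illustrative, not a prescribed wording):

```python
# Sketch of a prompt template that applies all four tips at once:
# explicit steps, numbering, justifications, and self-verification.
COT_TEMPLATE = """{question}

Show every step of your reasoning.
Number your steps.
Explain why you take each step.
Verify your result at the end."""

def render_cot(question: str) -> str:
    """Fill the template with a concrete question."""
    return COT_TEMPLATE.format(question=question)

print(render_cot(
    "A train travels 120 km in 2 hours. How long does it take for 300 km?"
))
```

Keeping the instructions in one template makes them easy to A/B test against a plain prompt.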
Limitations of Chain-of-Thought
- More Tokens: CoT answers are longer and cost more
- Not Always Necessary: For simple questions, CoT is overkill
- Can Mislead: Convincing-sounding but incorrect reasoning chains
- Model-Dependent: Smaller models benefit less from CoT
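The token-cost point is easy to see with the example from above. Whitespace splitting is only a crude proxy for a real tokenizer, but the ratio is illustrative:

```python
# Rough illustration of the token-cost limitation, using the answers from
# the train example. Whitespace word counts stand in for real token counts.
direct_answer = "5 hours"
cot_answer = (
    "Step 1: Calculate speed: 120 km / 2 h = 60 km/h. "
    "Step 2: Time for 300 km: 300 km / 60 km/h = 5 hours."
)

direct_tokens = len(direct_answer.split())
cot_tokens = len(cot_answer.split())
print(cot_tokens / direct_tokens)  # CoT answer is many times longer
```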
Conclusion
Chain-of-Thought Prompting is one of the most effective techniques for leveraging the reasoning capabilities of LLMs. For complex tasks, "Think step by step" should be part of the standard repertoire. The technique is easy to apply and delivers measurably better results.
