How does "chain-of-thought" prompting work?
Chain-of-thought (CoT) prompting refers to techniques that get a language model to generate intermediate reasoning steps in its output before it gives a final answer. CoT methods can substantially reduce the number of errors a model makes.
Input:
Take the last letters of the name “Lady Gaga” and concatenate them.
Model Output (without intermediate reasoning steps):
The answer is “ya”.
Model Output (with intermediate reasoning steps):
The last letter of “Lady” is “y”. The last letter of “Gaga” is “a”. Concatenating them gives “ya”. So the answer is “ya”.
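To check answers on this kind of task, it helps to have a reference implementation to compare model outputs against. The following is a minimal sketch, assuming names are split on whitespace; the function name `last_letter_concat` is hypothetical and does not come from the cited paper.

```python
def last_letter_concat(name: str) -> str:
    """Concatenate the last letter of each whitespace-separated word in `name`."""
    # Assumption: words are separated by whitespace and each word is non-empty.
    return "".join(word[-1] for word in name.split())


# The worked example from the text.
print(last_letter_concat("Lady Gaga"))  # ya
```

Each step of the model's written reasoning ("the last letter of “Lady” is “y”") corresponds to one iteration of the loop above, which is why writing the steps out makes mistakes easier to spot.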
Chain-of-thought prompting can be done via few-shot prompting (i.e., giving the model examples of chain-of-thought reasoning to emulate) or zero-shot prompting (i.e., asking the model in the input prompt to "think step-by-step").
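The two variants differ only in how the input prompt is assembled. The sketch below builds both kinds of prompt as plain strings. The `Q:`/`A:` layout, the function names, the trigger phrase "Let's think step by step.", and the "Alan Turing" question are illustrative assumptions; the result would then be sent to whichever model API is in use.

```python
# One worked example (question, reasoning + answer), taken from the text above.
# A real few-shot prompt would typically contain several such examples.
FEW_SHOT_EXAMPLES = [
    (
        "Take the last letters of the name 'Lady Gaga' and concatenate them.",
        "The last letter of 'Lady' is 'y'. The last letter of 'Gaga' is 'a'. "
        "Concatenating them gives 'ya'. So the answer is 'ya'.",
    ),
]


def build_few_shot_prompt(question: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Few-shot CoT: show worked examples with reasoning, then ask the new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)


def build_zero_shot_prompt(question: str) -> str:
    """Zero-shot CoT: no examples, only an instruction to reason step by step."""
    # Assumed trigger wording; other phrasings of the instruction are possible.
    return f"Q: {question}\nA: Let's think step by step."


new_question = "Take the last letters of the name 'Alan Turing' and concatenate them."
print(build_few_shot_prompt(new_question))
print()
print(build_zero_shot_prompt(new_question))
```

In the few-shot case the model imitates the reasoning style of the examples; in the zero-shot case the trailing instruction alone is what prompts it to write out intermediate steps.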
The difference between standard prompting and chain-of-thought prompting for an arithmetic reasoning problem is illustrated in the figure below.
Source: Wei et al., Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2023)
Chain-of-thought prompting works especially well on difficult natural language processing tasks like multi-step arithmetic, symbolic reasoning, and common-sense reasoning. This prompting method can also be automated by using LLMs to generate prompts and evaluate responses.
A model’s ability to break down complex problems into intermediate steps is referred to as “chain-of-thought reasoning”, and it emerges as model scale increases. However, the explanations a model generates in chain-of-thought reasoning might not reflect the process by which it actually arrived at its answer. Chain-of-thought prompting therefore cannot be relied upon to provide a faithful, interpretable window into the model’s actual reasoning.
The main limitation of chain-of-thought prompting is that it generalizes poorly when the problem to be solved is harder than the examples in the input prompt. Further work on techniques like “least-to-most prompting” has tried to address this limitation.