Inference-Time Techniques for LLM Reasoning

Disclaimer: Prompt optimisation is a fast-moving field in which new techniques appear every day, so stay tuned!

This page outlines prompting techniques that can significantly improve an LLM’s reasoning at inference time. While these methods boost performance, they also lengthen the input prompt or require multiple prompts, which increases computational cost.

1. CoT Prompting Techniques

Chain-of-Thought (CoT) prompting encourages LLMs to break down complex reasoning into intermediate steps, similar to how humans solve problems. This approach helps models tackle multi-step reasoning tasks more effectively by providing a structured thinking path.

There are several CoT variants:

Few-shot CoT: Provides examples of step-by-step reasoning before asking the model to solve a problem
Zero-shot CoT: Uses simple prompts like “Let’s think step by step” without examples
Analogical prompting: Instructs the LLM to generate relevant examples itself before solving the problem
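
The following minimal Python sketch makes the variants concrete. It only builds the prompt strings and does not send them to a model. The template wording and the function names (`zero_shot`, `zero_shot_cot`, `few_shot_cot`, `analogical`) are illustrative assumptions, not canonical prompts from the original papers.

```python
# Hypothetical prompt templates for the CoT variants described above.
# The wording is illustrative, not taken from any specific paper or library.
from typing import List, Tuple


def zero_shot(question: str) -> str:
    """Basic zero-shot: ask the question directly."""
    return f"Q: {question}\nA:"


def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append a reasoning trigger phrase, no examples."""
    return f"Q: {question}\nA: Let's think step by step."


def few_shot_cot(question: str, examples: List[Tuple[str, str]]) -> str:
    """Few-shot CoT: prepend human-written (question, worked solution) pairs."""
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{demos}\n\nQ: {question}\nA:"


def analogical(question: str, k: int = 3) -> str:
    """Analogical prompting: the model recalls its own examples before solving."""
    return (
        f"Problem: {question}\n\n"
        f"First, recall {k} relevant and distinct problems and explain how each "
        "was solved. Then solve the problem above step by step."
    )


if __name__ == "__main__":
    q = "Pens cost $2 for 3. How much do 12 pens cost?"
    demo = [("What is 2 + 3 * 4?", "3 * 4 = 12, and 2 + 12 = 14. The answer is 14.")]
    for prompt in (zero_shot(q), zero_shot_cot(q), few_shot_cot(q, demo), analogical(q)):
        print(prompt, end="\n---\n")
```

Each builder returns plain text, so any chat or completion API can consume it. Only the analogical prompt asks the model to produce its own demonstrations, while few-shot CoT needs them supplied by a human.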

Effectiveness hierarchy (typical ordering on reasoning benchmarks; results vary by task and model):

┌────────────────────────┐
│ Most Effective         │
├────────────────────────┤
│ Analogical Prompting   │ ← (LLM self-generates relevant examples, comparable to human-written ones, before reasoning)
├────────────────────────┤
│ Few-shot CoT           │ ← (Human provides examples)
├────────────────────────┤
│ Zero-shot CoT          │ ← ("Let's think step by step")
├────────────────────────┤
│ Zero-shot (basic)      │ ← (Direct question)
├────────────────────────┤
│ Least Effective        │
└────────────────────────┘

2. LLM as a prompt optimiser

The optimiser can use two LLMs:

One that generates improved candidate prompts
One that evaluates each candidate prompt (e.g. by its accuracy on a small labelled set)
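
Below is a minimal sketch of this loop, loosely modelled on OPRO-style optimisation. The optimiser LLM sees earlier prompts together with their scores and proposes a new one, and the evaluator then scores that proposal. `propose` and `score` are hypothetical callables that stand in for the two LLMs, or for an LLM paired with an accuracy check on a labelled set.

```python
from typing import Callable, List, Tuple


def optimise_prompt(
    seed_prompt: str,
    propose: Callable[[str], str],  # hypothetical optimiser LLM: meta-prompt -> new prompt
    score: Callable[[str], float],  # hypothetical evaluator: prompt -> score, e.g. accuracy
    steps: int = 5,
    keep: int = 3,
) -> Tuple[str, float]:
    """Alternate between proposing prompts and scoring them; return the best one."""
    history: List[Tuple[str, float]] = [(seed_prompt, score(seed_prompt))]
    for _ in range(steps):
        # Show the optimiser the top-`keep` prompts so far, worst to best.
        top = sorted(history, key=lambda ps: ps[1])[-keep:]
        meta_prompt = "\n".join(f"Prompt: {p}\nScore: {s:.2f}" for p, s in top)
        meta_prompt += "\nWrite a new prompt that achieves a higher score than all of the above."
        candidate = propose(meta_prompt)
        history.append((candidate, score(candidate)))
    # max() returns the first prompt with the highest score
    return max(history, key=lambda ps: ps[1])
```

Showing only the top few prompts keeps the meta-prompt short. In practice `score` is the expensive step, because each call runs the candidate prompt over many examples.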

3. Least-to-most prompting

A prompt engineering method that improves the problem-solving capabilities of LLMs by breaking a complex problem into a series of simpler subproblems. The subproblems are solved sequentially, and the answers to earlier subproblems are passed to later ones. It can be implemented by chaining 3 prompts:

Generate (or find) example problems and decompose them
Decompose the original task into subproblems
Solve the subproblems in order
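
The three-prompt chain can be sketched as follows, assuming a hypothetical `llm(prompt) -> str` function. The prompt wording is an assumption. The third prompt is issued once per subproblem, and earlier answers stay in the context.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def least_to_most(problem: str, llm: LLM, demos: Optional[str] = None) -> str:
    # Prompt 1: generate example problems with their decompositions
    # (skipped if the caller supplies hand-written demos instead).
    if demos is None:
        demos = llm(
            "Write two example problems similar to the one below, each followed "
            f"by the simpler subquestions it breaks into:\n{problem}"
        )
    # Prompt 2: decompose the original problem, one subquestion per line,
    # with the original question last.
    decomposition = llm(
        f"{demos}\n\nBreak this problem into simpler subquestions, one per line, "
        f"ending with the original question:\n{problem}"
    )
    subquestions = [s.strip() for s in decomposition.splitlines() if s.strip()]
    # Prompt 3 (repeated): solve subquestions in order, feeding answers forward.
    context = f"Problem: {problem}\n"
    answer = ""
    for sub in subquestions:
        answer = llm(f"{context}Q: {sub}\nA:")
        context += f"Q: {sub}\nA: {answer}\n"
    return answer  # answer to the last subquestion, i.e. the original question
```

Because each solving call sees all previous question-answer pairs, later subproblems can build directly on earlier results. This makes the approach sequential rather than parallel.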

4. Self-Discover

Instruct the LLM to compose a reasoning structure for each task, in 2 stages:

Discover the reasoning structure for the task
Solve the task by following that structure
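
Here is a minimal two-stage sketch, assuming a hypothetical `llm(prompt) -> str` function. The short `REASONING_MODULES` list is a made-up stand-in for the generic reasoning hints that the model selects from and adapts. The structure is defined per task, so stage 1 could run once per task type and its output could be reused across instances.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out

# A short, hypothetical list of generic reasoning hints the model can pick from.
REASONING_MODULES = [
    "Break the problem into smaller sub-problems.",
    "Think step by step and verify each step.",
    "Consider edge cases and alternative interpretations.",
    "Work backwards from the goal.",
]


def self_discover(task: str, llm: LLM) -> str:
    # Stage 1: discover a task-specific reasoning structure (no solving yet).
    modules = "\n".join(f"- {m}" for m in REASONING_MODULES)
    structure = llm(
        f"Task: {task}\nReasoning modules:\n{modules}\n"
        "Select the modules that help with this task, adapt them to it, and "
        "write a numbered reasoning plan. Do not solve the task yet."
    )
    # Stage 2: solve the task by following the discovered structure.
    return llm(
        f"Task: {task}\nFollow this reasoning plan step by step:\n{structure}\n"
        "State the final answer on the last line."
    )
```

Keeping stage 1 free of any solving keeps the structure reusable. Only stage 2 depends on the specific instance.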

5. Self-Consistency

This technique samples multiple reasoning paths and selects the most consistent final answer, typically by majority vote. It significantly boosts performance compared to single-path generation, but increases token usage roughly n-fold, where n is the number of sampled answers.
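
Below is a minimal sketch of majority voting. It assumes a hypothetical `generate()` that performs one sampled CoT call (temperature above zero) and returns the completion text. Taking the last number as the final answer is a simplification that only suits numeric tasks.

```python
import re
from collections import Counter
from typing import Callable, Optional, Tuple


def extract_answer(completion: str) -> Optional[str]:
    """Treat the last number in a CoT completion as its final answer (a simplification)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistency(generate: Callable[[], str], n: int = 5) -> Tuple[Optional[str], float]:
    """Sample n reasoning paths and majority-vote over their final answers.

    `generate` is a hypothetical function making one sampled LLM call,
    so n samples cost roughly n times the tokens of a single call.
    """
    answers = [extract_answer(generate()) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None, 0.0
    answer, count = votes.most_common(1)[0]
    return answer, count / n  # share of all n samples that agree
```

The function also returns the vote share, which can serve as a rough confidence signal: answers backed by few samples deserve more scrutiny.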

6. Tree of Thoughts (ToT)

This approach extends CoT by exploring multiple reasoning paths in parallel. An evaluator (often the LLM itself, prompted to score partial solutions) assesses the intermediate steps, and only the most promising branches are expanded. With a good evaluator, pruning unpromising paths early can reduce cost.
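
A breadth-first variant with a fixed beam width can be sketched as below. `propose` (generates candidate next thoughts for a state) and `evaluate` (scores a partial solution) are hypothetical callables. In practice both would be LLM calls.

```python
from typing import Callable, List


def tree_of_thoughts(
    root: str,
    propose: Callable[[str], List[str]],  # hypothetical: state -> candidate next states
    evaluate: Callable[[str], float],     # hypothetical: state -> how promising it looks
    depth: int = 3,
    beam: int = 2,
) -> str:
    """Breadth-first search that keeps only the `beam` best states per level."""
    frontier = [root]
    for _ in range(depth):
        # Expand every state that survived the previous level.
        children = [child for state in frontier for child in propose(state)]
        if not children:
            break
        # Keep the most promising branches; everything else is pruned.
        frontier = sorted(children, key=evaluate, reverse=True)[:beam]
    return max(frontier, key=evaluate)
```

As a toy check, states can be digit strings: `propose` appends "0", "1" or "2", and `evaluate` sums the digits. With depth 3 and beam 2, the search returns "222" after 5 `propose` calls. An unpruned tree of the same depth would need 13.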

7. Iterative Self-Improvement

Methods such as Reflexion and Self-Refine have the LLM generate feedback on its own outputs and iteratively refine them. Self-debugging, which applies this idea to code generation, is also covered.
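
Here is a minimal self-debugging sketch. The feedback comes from running candidate code against assertion tests, and the error message is passed back for revision. `generate` is a hypothetical LLM call. Replacing `feedback` with the LLM's own critique of its draft would give a Self-Refine-style loop instead.

```python
from typing import Callable, Tuple


def run_tests(code: str, tests: str) -> Tuple[bool, str]:
    """Execute candidate code plus assertion tests; return (passed, feedback).

    WARNING: exec() runs arbitrary code. Sandbox LLM-generated code in practice.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # any error or failed assertion becomes feedback
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


def self_refine(
    task: str,
    generate: Callable[[str], str],               # hypothetical LLM call: prompt -> draft
    feedback: Callable[[str], Tuple[bool, str]],  # draft -> (done, critique)
    max_rounds: int = 3,
) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        done, critique = feedback(draft)
        if done:
            break
        draft = generate(
            f"{task}\n\nPrevious attempt:\n{draft}\n\nFeedback:\n{critique}\n\n"
            "Revise the attempt to address the feedback."
        )
    # If max_rounds runs out, the last draft is returned unverified.
    return draft
```

Test execution gives an external signal about correctness, which the model's self-critique does not.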

8. Limitations of Self-Correction

Without an oracle verifier (e.g. ground-truth labels or unit tests), self-correction can sometimes hurt performance: the LLM may misjudge a correct answer as wrong and change it. In practice, such an oracle is often unavailable.
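
A toy two-rate model shows why self-correction can hurt. This model is an illustrative assumption, not a result from the literature. Suppose the model turns a correct answer into a wrong one with probability `p_break` and fixes a wrong answer with probability `q_fix`. Accuracy after one round is then a(1 - p_break) + (1 - a)q_fix.

```python
def accuracy_after_self_correction(a: float, p_break: float, q_fix: float) -> float:
    """Expected accuracy after one round of self-correction (toy two-rate model).

    a       -- accuracy before self-correction
    p_break -- chance the model changes a correct answer into a wrong one
    q_fix   -- chance the model changes a wrong answer into the right one
    """
    return a * (1 - p_break) + (1 - a) * q_fix


def self_correction_helps(a: float, p_break: float, q_fix: float) -> bool:
    # Gains on wrong answers must outweigh losses on correct ones:
    # (1 - a) * q_fix > a * p_break
    return (1 - a) * q_fix > a * p_break
```

Take hypothetical rates a = 0.8, p_break = 0.1 and q_fix = 0.2: accuracy drops to about 0.76. An oracle that stops revision once an answer is correct sets p_break to 0, and the same q_fix then lifts accuracy to about 0.84.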

9. Optimization of Inference Cost

Prompting techniques can improve performance significantly. However, on complicated tasks where smaller models struggle to produce correct solutions, bigger (more expensive) models still provide a boost.
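
One way to frame this trade-off is to pick the cheapest configuration that reaches a target accuracy. The sketch below assumes you have measured the accuracy of each option on your own validation set. The class and field names are hypothetical, and any prices you plug in are placeholders rather than real vendor rates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InferenceOption:
    name: str
    price_per_1k_tokens: float  # placeholder, not a real price list
    tokens_per_call: int
    calls_per_question: int     # e.g. n samples for self-consistency
    accuracy: float             # should come from your own validation set

    @property
    def cost_per_question(self) -> float:
        return self.price_per_1k_tokens * self.tokens_per_call * self.calls_per_question / 1000


def cheapest_meeting_target(options: List[InferenceOption], target: float) -> Optional[InferenceOption]:
    """Return the cheapest option whose accuracy reaches `target`, or None."""
    eligible = [o for o in options if o.accuracy >= target]
    return min(eligible, key=lambda o: o.cost_per_question) if eligible else None
```

Here is an example with placeholder figures:

- Small model: 0.5 per 1k tokens, 400 tokens per call, 10 samples, 0.78 accuracy.
- Large model: 10.0 per 1k tokens, 300 tokens, 1 call, 0.85 accuracy.

The sampled small model costs 2.0 per question and the large model 3.0. The small model wins at a 0.75 target, but only the large model reaches 0.80.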

Overall

The presentation explores various techniques to improve LLM reasoning at inference time, emphasizing the importance of prompting strategies, generating multiple solutions, and iterative self-improvement. It also acknowledges the limitations of current self-correction methods and the ongoing research in this area.