Artificial Intelligence can't operate effectively without contemplation time
Lilian Weng, co-founder of Thinking Machines Lab and a former VP of AI Safety & Robotics at OpenAI, has been advocating a shift in how AI is built: instead of prioritising speed, the focus should be on how well AI thinks, reducing hallucinations and biased results. This approach aims to deliver not only better performance but also transparency, safety, and trust in AI systems.
Weng has written an article titled "Why We Think", explaining how giving language models the ability to "think" at test time can lead to significantly better performance. Methods like sequential revision and reinforcement learning on checkable tasks enable this progress by optimising for correctness, allowing models to learn not just to answer but to reason their way to better answers.
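The sequential revision idea can be made concrete with a short sketch. In the Python below, `generate` is a hypothetical stand-in for any language-model call; the draft-critique-revise loop is the point, not the stub:

```python
# A minimal sketch of sequential revision, assuming a hypothetical
# generate(prompt) stub in place of a real language-model call.
# The model drafts an answer, then is repeatedly asked to critique
# and revise its own previous attempt.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call."""
    return f"[model output for: {prompt[:40]}...]"

def sequential_revision(question: str, rounds: int = 3) -> str:
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        # Feed the previous attempt back in and ask for an improved version.
        revision_prompt = (
            f"Question: {question}\n"
            f"Previous answer: {answer}\n"
            "Identify any mistakes in the previous answer, "
            "then write a corrected answer:"
        )
        answer = generate(revision_prompt)
    return answer

print(sequential_revision("What is 17 * 24?"))
```

Each round spends additional inference compute on the same question, which is exactly the trade the rest of the article describes.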
Some approaches offload steps of AI reasoning to external tools such as code interpreters or knowledge bases. Weng's focus, however, is on internal strategies that can be built into the models themselves. For instance, "thinking tokens" can be injected into a model's input to slow down generation, buying it "thinking time" to engage in more deliberate, multi-step reasoning.
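As a rough illustration, the sketch below shows what injecting a thinking budget into a prompt could look like. The `<think>` token and the `build_prompt` helper are assumptions made for this example; real systems typically train a dedicated special token rather than pasting strings:

```python
# A minimal sketch of "thinking token" injection. Inserting a budget of
# dedicated filler/thinking tokens between the question and the answer
# gives the model extra forward passes before it must commit to an output.

THINK_TOKEN = "<think>"  # assumed special token for illustration

def build_prompt(question: str, thinking_budget: int = 16) -> str:
    # The repeated thinking tokens buy extra computation at decode time.
    pause = " ".join([THINK_TOKEN] * thinking_budget)
    return f"Question: {question}\n{pause}\nAnswer:"

print(build_prompt("A train travels 60 km in 45 minutes. Speed in km/h?"))
```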
This approach mirrors Daniel Kahneman’s dual-process theory, moving from fast, instinctive System 1 responses to slower, more deliberate System 2 reasoning. Techniques such as chain-of-thought (CoT) prompting and test-time compute help models break down problems into intermediate reasoning steps, reflect, and revise their outputs. This process improves accuracy and performance on complex tasks.
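A minimal sketch of chain-of-thought prompting, again with a hypothetical `generate` stub, shows how small the change is: the prompt simply asks for intermediate steps before the final answer.

```python
# A minimal sketch of chain-of-thought prompting. The only change from a
# direct prompt is an instruction to produce intermediate reasoning steps,
# which the model can then condition on when producing the answer.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call."""
    return f"[model output for: {prompt[:40]}...]"

def direct_prompt(question: str) -> str:
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    # "Let's think step by step" elicits intermediate reasoning tokens.
    return f"Q: {question}\nA: Let's think step by step."

question = ("A bat and a ball cost $1.10 in total; the bat costs "
            "$1 more than the ball. How much is the ball?")
print(generate(direct_prompt(question)))  # one-shot answer
print(generate(cot_prompt(question)))     # reasoning steps, then answer
```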
Allowing extra computation at inference time transforms thinking itself into a resource. Techniques such as beam search, best-of-N sampling, and sequential revision guide the model to consider multiple solution paths and choose the best one. This reasoning process enables smaller models, when given more "thinking time", to achieve results comparable to much larger models that rely on direct, single-step decoding.
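Best-of-N sampling is the simplest of these to sketch. In the hypothetical Python below, `sample` stands in for a stochastic model call and `score` for a verifier; the extra inference compute goes into drawing and ranking candidates:

```python
import random

# A minimal sketch of best-of-N sampling: draw several candidate solutions,
# score each with a verifier, and keep the highest-scoring one. Both stubs
# are hypothetical placeholders for a model and a learned or rule-based
# verifier.

def sample(question: str) -> str:
    """Hypothetical stochastic model call (temperature > 0)."""
    return f"candidate-{random.randint(0, 999)}"

def score(question: str, candidate: str) -> float:
    """Hypothetical verifier: higher means more likely correct."""
    return random.random()

def best_of_n(question: str, n: int = 8) -> str:
    # n solution paths are explored; the verifier picks the best.
    candidates = [sample(question) for _ in range(n)]
    return max(candidates, key=lambda c: score(question, c))

print(best_of_n("Prove that the sum of two even numbers is even."))
```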
Reinforcement learning on tasks with verifiable correctness further optimises the model’s ability to reason rather than just recall patterns. Hence, the core insight is that giving models the ability to "think longer" — by dynamically allocating computation and iteratively refining their answers — enables them to handle complexity and reasoning challenges more effectively, resulting in higher quality outputs.
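The "verifiable correctness" part is the key ingredient, and can be sketched on its own. For checkable tasks such as arithmetic, the reward can be a simple match against a reference answer; the `Answer:` output format and the `extract_final_answer` helper below are illustrative assumptions, and the RL optimiser itself is omitted:

```python
import re

# A minimal sketch of a verifiable reward signal for checkable tasks.
# The binary reward (correct / incorrect) is what a policy-gradient style
# update would then maximise.

def extract_final_answer(completion: str) -> str | None:
    # Assume the model is trained to end with "Answer: <value>".
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None

def verifiable_reward(completion: str, reference: str) -> float:
    """Reward 1.0 only if the checkable final answer matches the reference."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == reference else 0.0

print(verifiable_reward("The train covers 60 km in 0.75 h. Answer: 80", "80"))  # 1.0
print(verifiable_reward("Answer: 79", "80"))                                    # 0.0
```

Because the reward checks the final answer rather than the wording, the model is free to discover whatever intermediate reasoning reliably reaches it.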
Moreover, latent-variable training is being explored to model hidden thought processes in AI systems explicitly. The full breakdown of the logic and theory behind AI reasoning can be found in Weng's essay "Why We Think".
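Weng's essay gives the formal treatment; as one simplified illustration of the idea, a rejection-sampling scheme in the spirit of STaR-style training treats the reasoning trace as a latent variable: sample traces, keep only those whose answer verifies, and train on the survivors. All function names below are hypothetical stubs, not Weng's exact formulation:

```python
import random

# A simplified sketch of latent-variable style training over hidden
# reasoning traces: sample a rationale z for each question x, keep only
# (x, z, y) triples whose final answer y verifies against the reference.

def sample_rationale(question: str) -> tuple[str, str]:
    """Hypothetical model call returning (reasoning trace, final answer)."""
    answer = random.choice(["80", "79"])
    return f"reasoning trace ending in {answer}", answer

def check(answer: str, reference: str) -> bool:
    return answer == reference

def collect_training_data(dataset: list[tuple[str, str]], k: int = 4):
    kept = []
    for question, reference in dataset:
        for _ in range(k):
            rationale, answer = sample_rationale(question)
            if check(answer, reference):
                # Only rationales leading to a verified answer survive,
                # steering the model toward useful latent thoughts.
                kept.append((question, rationale, answer))
    return kept

print(collect_training_data([("60 km in 45 min; speed in km/h?", "80")]))
```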
In summary, giving AI models "thinking time" leads to better answers because it enables iterative, reflective reasoning akin to human deliberation, improving problem-solving and answer correctness beyond rapid, one-shot response generation.
As Weng argues, AI models are increasingly being designed to think more like humans, engaging in deliberate, multi-step reasoning to improve accuracy and performance. This is achieved through strategies like thinking tokens and techniques such as chain-of-thought prompting, which transform inference time into a thinking resource, optimising for correctness rather than just speed.