A Few Thoughts on OpenAI's o1 Family of Models

Thinking Models Are Good Models

OpenAI's latest o1 family of models (o1, o1-preview, and o1-mini) is genuinely very powerful, with remarkably impressive performance. I think the most noteworthy points are: 1. they possess extremely strong logical reasoning abilities; 2. they come with built-in CoT (Chain of Thought), requiring minimal prompting from the user.

I consider these two points very important because, in the past, language models faced with complex mathematical problems were often just "guessing answers" rather than truly reasoning step by step. This time it's different: OpenAI has built reinforcement learning (RL) and CoT on top of GPT-4o and added special "reasoning tokens," making the model truly "think."
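If you call these models through the API rather than ChatGPT, you can actually see those reasoning tokens accounted for in the usage statistics. Here is a minimal sketch using the OpenAI Python SDK (assuming a v1.x client and an OPENAI_API_KEY in the environment; the exact usage field names reflect the API at the time of writing):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "What is 279563 * 356104?"}],
)

print(response.choices[0].message.content)

# The hidden chain of thought itself is not returned, but its size is:
# reasoning tokens are counted separately from the visible output tokens.
details = response.usage.completion_tokens_details
print("reasoning tokens:", details.reasoning_tokens)
```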

For example, when I asked o1-preview and o1-mini to calculate 279563 multiplied by 356104, both were able to "think" first, self-reflect and correct themselves during the thinking process, and then give the answer. Upon verification, both were correct. In the past, this task would have yielded completely incorrect results from LLMs: they would either guess or make fatal logical errors between steps. The same improvement was evident when I gave them the most challenging math problems from China's 2024 College Entrance Exam (check the results here). Additionally, I tested them on several questions from this year's national competitions, and the results were also very good. So this iteration of models performs very strongly on reasoning-heavy tasks.

Logically rigorous, self-consistent reasoning is a necessary condition for reaching the next level of AI, namely agents: after all, an agent needs to act on behalf of humans, and we cannot allow it to make mistakes that could lead to catastrophic consequences. (Read more about Agent and Model Autonomy here)
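Incidentally, that multiplication is easy to verify exactly, since Python integers have arbitrary precision; a quick sanity check:

```python
# Exact big-integer arithmetic: the product the models should arrive at
print(279563 * 356104)  # 99553502552
```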

Moreover, notice the "Thought for x seconds" label in the interface. o1-preview's "thinking" time is relatively long, while o1-mini's is shorter (the latter has been specifically tuned on competition math problems). I think the potential this brings is limitless. Today the model "thinks" for a few seconds or minutes; in the future it may "think" for months, carrying out more complex reasoning and analysis and arriving at more accurate, more logical results.

These two phenomena remind me of System 1 and System 2, which I discussed in an article I wrote about possible future improvements in model reasoning abilities. The two concepts were originally used to describe human cognition, but now I see that o1 exhibits the same characteristic. By definition, System 1 handles intuitive, fast thinking, like 1+1=2; System 2 handles complex thinking that requires deliberate reasoning, such as hard mathematical problems. More deliberate thinking can yield better and deeper results.

This reminds me of another article I wrote, in which I suggested that if future models could reach the level of Nobel Prize winners, we could have hundreds of such AI copies form a research group and give them months to "think" and conduct research. Now that models are already very strong at logical reasoning, biology, and other fields, I believe the probability of this happening is very high. I look forward to seeing AI help humans develop important drugs, discover new materials, and even prove mathematical theorems.

The future of humanity looks bright at the moment.