OpenAI's Yann Dubois on GPT-5.5: Why AI Progress Suddenly Feels Real
The Takeaway: AI progress feels like a step function, but it is really a continuous rise in capability that has crossed a reliability threshold. Past that threshold, models become genuinely useful, especially as reinforcement learning (RL) moves beyond verifiable domains into messy real-world work.
Yann Dubois, co-lead of OpenAI's Post-Training Frontiers team and co-creator of Stanford Alpaca, shares insights from GPT-5.5 development. Model reliability crossed a key threshold around December of last year, so teams can now trust models with substantial work. He attributes the sense of sudden acceleration to three factors: crossing that reliability threshold, self-acceleration driven by better coding models, and RL expanding from math and coding competitions to real user use cases.
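One way to see why steady gains can feel like a step function is a toy compounding model. The sketch below is illustrative and not from Dubois: it assumes a task is a chain of independent steps that must all succeed, and the function name, the 50-step task length, and the reliability values are hypothetical.

```python
def task_success_rate(step_reliability: float, num_steps: int) -> float:
    """Chance that a multi-step task finishes with no failed step.

    Toy assumption: steps fail independently, each with the same
    per-step reliability, and one failure sinks the whole task.
    """
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_reliability ** num_steps


if __name__ == "__main__":
    # Hypothetical values: per-step reliability improves gradually,
    # while the task length stays fixed at 50 steps.
    for reliability in (0.90, 0.95, 0.99, 0.999):
        rate = task_success_rate(reliability, num_steps=50)
        print(f"per-step {reliability:.3f} -> 50-step task {rate:.1%}")
```

Under these assumptions, a per-step reliability of 0.90 finishes a 50-step task well under 1% of the time, while 0.99 finishes it roughly 60% of the time. The per-step improvement is gradual, but the usefulness of the whole task changes sharply.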
Dubois breaks down pre-training, mid-training, and post-training, and highlights how RL tooling generalized beyond verifiable rewards to ambiguous, messy real-world tasks. Efficiency improved markedly: GPT-5.5 is 2x faster on most tasks. He is particularly excited about continual learning, which he sees as one of the biggest unsolved problems.
"We moved from competitions to usefulness to users, and that's what we are feeling right now." This marks the shift from AI as a toy to AI as a genuine productivity tool.