OpenAI 首席科学家谈持续学习、RL 与对齐方向OpenAI Chief Scientist on Continual Learning, RL, and Alignment Directions
关键要点:通过扩展预训练和强化学习,OpenAI 已经在实现长时程 AI 代理和研究所需的持续学习能力,并正朝着 9 月达到研究级 AI 系统、2028 年实现全自动研究者的重大里程碑迈进。
OpenAI 首席科学家 Ako Paioki 是全球最重要的 AI 思想家之一,他参与了每一代模型的重大改进。他认为持续学习不是被忽视的问题,而是当前扩展工作的核心目标。数学和物理的进步为推理改进提供了清晰基准,重点转向真实的经济和科学影响。对于医学或法律等更难领域,更长的时程和自我评估部分进展是关键前沿,RL 扩展显示出前景。对齐受益于链式思考监控,因为推理轨迹没有被直接监督,为模型真实动机和来自预训练数据的泛化提供了洞见。
一句难忘的话:‘我绝对同意持续学习真的是关键。它真的是我们正在构建的东西,但我并不认为这是一个被忽视且偏离当前道路的问题。我认为它就是我们正在努力的方向。’
Paioki 敦促社会为自动化智力工作、就业转变以及强大 AI 组织的治理做好准备。
OpenAI 首席科学家 Ako Paioki 是全球最重要的 AI 思想家之一,他参与了每一代模型的重大改进。他认为持续学习不是被忽视的问题,而是当前扩展工作的核心目标。数学和物理的进步为推理改进提供了清晰基准,重点转向真实的经济和科学影响。对于医学或法律等更难领域,更长的时程和自我评估部分进展是关键前沿,RL 扩展显示出前景。对齐受益于链式思考监控,因为推理轨迹没有被直接监督,为模型真实动机和来自预训练数据的泛化提供了洞见。
一句难忘的话:‘我绝对同意持续学习真的是关键。它真的是我们正在构建的东西,但我并不认为这是一个被忽视且偏离当前道路的问题。我认为它就是我们正在努力的方向。’
Paioki 敦促社会为自动化智力工作、就业转变以及强大 AI 组织的治理做好准备。
The Takeaway: OpenAI is on track to achieve research intern-level AI capabilities by September and fully automated AI researchers by 2028, driven by scaling pretraining and reinforcement learning for longer-horizon tasks and better generalization.
OpenAI Chief Scientist Ako Paioki, one of the planet's most influential AI minds, has been at the forefront of every major model improvement. He sees continual learning not as a neglected problem but as the very goal of current scaling efforts. Math and physics progress serve as clear benchmarks for reasoning improvements, shifting focus to real economic and scientific impact. For harder domains like medicine or law, longer horizons and self-evaluation of partial progress are key frontiers, with RL scaling showing promise. Alignment benefits from chain-of-thought monitoring, as reasoning traces aren't directly supervised, offering insight into true model motivations and generalization from pretraining data.
A memorable quote: 'I definitely agree that continual learning is really the thing. It's really the thing that we're building, but I don't really think this is like a problem that's ignored and off the path of what we're doing currently. Think it it is what we're working towards.'
Paioki urges society to prepare for automated intellectual work, job shifts, and governance of powerful AI organizations.
查看原文 →
OpenAI Chief Scientist Ako Paioki, one of the planet's most influential AI minds, has been at the forefront of every major model improvement. He sees continual learning not as a neglected problem but as the very goal of current scaling efforts. Math and physics progress serve as clear benchmarks for reasoning improvements, shifting focus to real economic and scientific impact. For harder domains like medicine or law, longer horizons and self-evaluation of partial progress are key frontiers, with RL scaling showing promise. Alignment benefits from chain-of-thought monitoring, as reasoning traces aren't directly supervised, offering insight into true model motivations and generalization from pretraining data.
A memorable quote: 'I definitely agree that continual learning is really the thing. It's really the thing that we're building, but I don't really think this is like a problem that's ignored and off the path of what we're doing currently. Think it it is what we're working towards.'
Paioki urges society to prepare for automated intellectual work, job shifts, and governance of powerful AI organizations.