OpenAI Chief Scientist on Continual Learning, RL Beyond Code, and Alignment Directions
The Takeaway: OpenAI is on track for research-intern level AI capabilities by September, with continual learning as the core path they're scaling toward rather than a hyped side quest.
OpenAI Chief Scientist Ako Paioki is one of the most important people shaping the future of AI. He explains that coding tools have taken off inside OpenAI, now handling the majority of actual coding, while progress in math and physics provides measurable signals for closing the intelligence gap. For tougher domains like medicine or law, the challenge lies in longer-horizon tasks, where models must self-assess partial progress; this dovetails with scaling up RL.
On alignment, chain-of-thought offers a powerful inspection tool: because it is not directly supervised, it can reveal a model's true motivations in natural language. "I think there's something very exciting here about just not having the training signal fight against us." Paioki stresses the urgency of using model advances to accelerate research itself, and expects roles to evolve toward higher complexity alongside AI: humans will set the vision for bigger problems.