AI Builders Digest — 2026-04-10

🔥 热点话题

OpenAI 首席科学家谈持续学习、RL 与对齐方向OpenAI Chief Scientist on Continual Learning, RL, and Alignment Directions

关键要点：通过扩展预训练和强化学习，OpenAI 已经在实现长时程 AI 代理和研究所需的持续学习能力，并正朝着 9 月达到研究级 AI 系统、2028 年实现全自动研究者的重大里程碑迈进。

OpenAI 首席科学家 Ako Paioki 是全球最重要的 AI 思想家之一，他参与了每一代模型的重大改进。他认为持续学习不是被忽视的问题，而是当前扩展工作的核心目标。数学和物理的进步为推理改进提供了清晰基准，重点转向真实的经济和科学影响。对于医学或法律等更难领域，更长的时程和自我评估部分进展是关键前沿，RL 扩展显示出前景。对齐受益于链式思考监控，因为推理轨迹没有被直接监督，为模型真实动机和来自预训练数据的泛化提供了洞见。

一句难忘的话：‘我绝对同意持续学习真的是关键。它真的是我们正在构建的东西，但我并不认为这是一个被忽视且偏离当前道路的问题。我认为它就是我们正在努力的方向。’

Paioki 敦促社会为自动化智力工作、就业转变以及强大 AI 组织的治理做好准备。

The Takeaway: OpenAI is on track to achieve research intern-level AI capabilities by September and fully automated AI researchers by 2028, driven by scaling pretraining and reinforcement learning for longer-horizon tasks and better generalization.

OpenAI Chief Scientist Ako Paioki, one of the planet's most influential AI minds, has been at the forefront of every major model improvement. He sees continual learning not as a neglected problem but as the very goal of current scaling efforts. Math and physics progress serve as clear benchmarks for reasoning improvements, shifting focus to real economic and scientific impact. For harder domains like medicine or law, longer horizons and self-evaluation of partial progress are key frontiers, with RL scaling showing promise. Alignment benefits from chain-of-thought monitoring, as reasoning traces aren't directly supervised, offering insight into true model motivations and generalization from pretraining data.

A memorable quote: 'I definitely agree that continual learning is really the thing. It's really the thing that we're building, but I don't really think this is like a problem that's ignored and off the path of what we're doing currently. Think it it is what we're working towards.'

Paioki urges society to prepare for automated intellectual work, job shifts, and governance of powerful AI organizations.

查看原文 →

Andrej Karpathy 指出 AI 能力认知差距正在扩大Andrej Karpathy Highlights Growing AI Capability Perception Gap

Andrej Karpathy，前 Tesla AI 总监和 OpenAI 创始团队成员，观察到人们对 AI 能力的理解差距正在扩大。许多人基于去年的免费 ChatGPT 版本形成看法，看到旧模型的幻觉和怪癖，或语音模式在简单任务上出错。相比之下，在编程、数学和研究等技术领域付费使用最新前沿代理模型如 OpenAI Codex 和 Claude Code 的专业人士，则见证了惊人进步——模型现在能自主处理以前需要数周的多天编码任务。这得益于可验证奖励支持有效的 RL，以及高 B2B 价值促使这些改进优先发展。这两个群体在对话中各说各话，专家看到巨大的生产力和网络影响，而其他人只看到炒作。

Andrej Karpathy, AI researcher and former Tesla Director of AI and OpenAI founding team member, observes a growing gap in understanding AI capabilities. Many people base their views on last year's free ChatGPT tier, seeing hallucinations and quirks in old models or voice modes fumbling simple tasks. In contrast, professionals paying for and using the latest frontier agentic models like OpenAI Codex and Claude Code in technical domains such as programming, math, and research experience staggering progress—models now handle multi-day autonomous coding tasks that previously took weeks. This is due to verifiable rewards enabling effective RL and high B2B value prioritizing these improvements. The two groups speak past each other, with experts seeing massive productivity and cyber implications while others see hype.

查看原文 →查看原文 →

Box CEO Aaron Levie 谈 AI 自动化需求的被低估Box CEO Aaron Levie on Underestimated Demand for AI Automation

Box CEO Aaron Levie 认为，大多数人大幅低估了在非‘软件’领域对软件和自动化的总需求，例如 CPG、零售、制药、银行和医疗保健。代理现在使自动化非结构化数据、可变数据流和复杂工作流成为可能，这些以前成本太高或太复杂。AI 采用是两个世界的故事：聊天工具提供有限收益，而长时程代理提供 100-200% 的生产力提升且没有上限。Prompting 仍然是一项高杠杆技能，相当于给新同事清晰指示。企业面临数据上下文、合规和安全等棘手问题，为代理平台和新组织角色创造了机会。

Box CEO Aaron Levie believes most people substantially underestimate the total demand for software and automation in areas that don’t feel like ‘software,’ such as CPG, retail, pharma, banking, and healthcare. Agents now make it feasible to automate unstructured data, variable data flows, and complex workflows that were previously too costly or complex. AI adoption is a tale of two cities: chat tools offer capped gains, while long-running agents deliver 100-200% productivity with no upper limit. Prompting remains a high-leverage skill equivalent to giving clear instructions to a new colleague. Enterprises face thorny issues like data context, compliance, and security, creating opportunities for agentic platforms and new organizational roles.

查看原文 →查看原文 →查看原文 →

OpenAI 推出 100 美元 ChatGPT Pro 订阅层，Codex 广受欢迎OpenAI Launches $100 ChatGPT Pro Tier as Codex Gains Popularity

OpenAI CEO Sam Altman 表示，看到 Codex 受到如此多的喜爱非常好，并宣布根据强烈需求推出 100 美元 ChatGPT Pro 订阅层。

OpenAI CEO Sam Altman notes it is very nice to see Codex getting so much love and announces the launch of a $100 ChatGPT Pro tier by very popular demand.

查看原文 →

💰 创业成功案例

Google 在 Gemini App 中解锁完整 Lyria 3 音乐生成Google Unlocks Full Lyria 3 Music Generation in Gemini App

Google VP Josh Woodward 宣布，用户现在可以在 Gemini App 中免费创建第一首完整歌曲。在不到 50 天内，已生成超过 1 亿首歌曲。用户每天可获得高达 5 首完整曲目（每首约 3 分钟），使用 Lyria 3，如果达到限制则可创建 30 秒片段，并有升级选项。从图像和视频扩展到音乐创作。

Google VP Josh Woodward announces that users can now create their first full song for free in Gemini App. In under 50 days, over 100 million songs generated. Users get up to 5 full-length tracks (~3 mins each) daily with Lyria 3, or 30-second clips if limit hit, with upgrade option. Expanding from images and video to music creation.

查看原文 →

Vercel CEO Guillermo Rauch 谈代理基础设施是云的未来Vercel CEO Guillermo Rauch on Agentic Infrastructure as the Future of Cloud

Vercel CEO Guillermo Rauch 宣称代理基础设施是云的未来。对于 Claude Code、Codex、Cursor 等编码代理，基础设施必须为代理而非仅开发者设计。为部署代理：长时程计算、沙箱、token 交付网络。Vercel 本身正在变得自我修复、自我优化、自我安全，代理持有寻呼机。这将使现有公司更高效，并支持 AI 原生初创公司。他还指出，每秒一次 npx shadcn init 的惊人规模。

Vercel CEO Guillermo Rauch declares Agentic Infrastructure is the future of the cloud. For coding agents like Claude Code, Codex, Cursor, infra must support agents not just devs. To deploy agents: long-running compute, sandboxes, token delivery network. Vercel itself becoming self-healing, self-optimizing, self-securing with the agent holding the pager. This will make existing companies more efficient and support AI-native startups. He also notes remarkable scale with one npx shadcn init every second.

查看原文 →查看原文 →

🛠️ 开发者工具与技巧

Anthropic 为 Sonnet 代理引入 Opus 顾问工具Anthropic Introduces Opus Advisor Tool for Sonnet Agents

Anthropic 研究员 Alex Albert 和 Claude 团队宣布 Opus 顾问工具在 Claude Platform 上 beta 可用。Sonnet 或 Haiku 代理可在运行中通过调用 Opus 寻求‘打电话给朋友’处理困难决策，在 SWE-bench Multilingual 上性能提升 2.7 个百分点，同时任务成本降低 11.9%。通过 Messages API 在单个请求中添加。

Anthropic researchers Alex Albert and the Claude team announce the beta availability of the advisor tool on the Claude Platform. Sonnet or Haiku agents can 'phone a friend' by calling Opus for hard decisions mid-run, improving performance by 2.7 points on SWE-bench Multilingual while reducing cost by 11.9%. Add via Messages API in a single request.

查看原文 →查看原文 →查看原文 →查看原文 →

Anthropic 团队分享 Claude Code 设置和 Prompting 技巧Anthropic Team Shares Claude Code Setup and Prompting Tips

Anthropic 的 Cat Wu 宣布使用 Bedrock 和 Vertex 设置 Claude Code 的速度更快。同事 Thariq 强调了强大的 Monitor Tool，用户必须明确提示使用（例如，‘启动我的开发服务器并使用 MonitorTool 观察错误’）。Thariq 还强调 prompting 将继续是像写作或公开演讲一样的高杠杆技能——通过 harness 与代理沟通的艺术，以提升人机带宽。

Anthropic's Cat Wu announces faster setup for Claude Code with Bedrock and Vertex. Colleague Thariq highlights the powerful Monitor Tool, which users must explicitly prompt for (e.g., 'start my dev server and use the MonitorTool to observe for errors'). Thariq also emphasizes that prompting will remain a high-leverage skill like writing or public speaking—the art of communicating with agents via the harness to grow human-agent bandwidth.

查看原文 →查看原文 →查看原文 →

YC 总裁 Garry Tan 发布 GBrain 用于代理记忆YC President Garry Tan Releases GBrain for Agent Memory

Y Combinator 总裁兼 CEO Garry Tan 分享了 GBrain，这是一个 MIT 许可的开源工具，可为 OpenClaw 或 Hermes Agent 提供 10,000+ markdown 文件的完美全面回忆。它在 Hermes Agent 上使用相同安装脚本运行，并托管在 Railway 上。非常适合构建你的 mini-AGI。

Y Combinator President and CEO Garry Tan shares GBrain, an MIT-licensed open source tool for perfect total recall of 10,000+ markdown files with OpenClaw or Hermes Agent. It works on Hermes Agent with the same install script and is hosted on Railway. Ideal for building your mini-AGI.

查看原文 →查看原文 →查看原文 →

🌍 其他动态

Peter Yang 谈 AGI 抗性职业和代理偏好Peter Yang on AGI-Proof Careers and Agent Preferences

Roblox 产品负责人 Peter Yang 开玩笑说正在训练他的孩子从事 AGI 抗性职业。他指出两者都可以是正确的：没有模型能像 Opus 那样适合 OpenClaw，但使用 Claude Code 作为个人助理感觉不如 OpenClaw 那么‘我的’。他还询问 Hermes agent 是什么，称它是 OpenClaw 的奢侈包版本。

Product leader Peter Yang (at Roblox) jokes about training his kids for an AGI-proof career. He notes both can be true: no model matches Opus for OpenClaw, yet using Claude Code as personal assistant feels less 'mine' than OpenClaw. He also questions what Hermes agent is, calling it the luxury bag version of OpenClaw.

查看原文 →查看原文 →查看原文 →

Zara Zhang 谈 AI 的策略性使用以发挥个人优势Zara Zhang on Strategic Use of AI for Personal Strengths

Builder Zara Zhang 很少用 AI 写东西，因为她享受写作，觉得它既轻松又有趣。对于写作这种她在行的事，用 AI 反而常常花更多时间（迭代太多）。AI 对她最有用的是帮她处理舒适区之外的事情，比如编程，而不是舒适区之内的事。

Builder Zara Zhang rarely uses AI for writing because she enjoys it and finds it fun and efficient. AI is most useful for her in areas outside her comfort zone, like coding, rather than inside it like writing, where iterations can take more time than doing it herself.

查看原文 →