🌐 双语
Archive

AI Builders
Digest

2026-05-22 15 builders · 33 tweets · 1 podcasts · 1 blogs

🔥 热点话题

OpenAI Yann Dubois 谈 GPT-5.5:AI 进步为何突然变得真实OpenAI's Yann Dubois on GPT-5.5: Why AI Progress Suddenly Feels Real

The Takeaway: AI 进步看似阶跃函数,实际是持续能力提升跨越可靠性阈值后带来的真实可用性,特别是强化学习从可验证领域扩展到真实世界工作。

OpenAI Post-Training Frontiers 团队联席负责人 Yann Dubois 分享了 GPT-5.5 的开发洞见。此前在 Stanford 共同创作 Stanford Alpaca 的他,强调去年底模型可靠性达到临界点,用户现在能真正信任 AI 处理大量工作。进步感觉突然加速有三个原因:可靠性跨越、模型自我加速(尤其是编码),以及强化学习从数学/编码竞赛转向真实用户用例。

Dubois 解释了 pre-training、mid-training 和 post-training 的区别,重点讨论 RL 如何帮助模型处理模糊、messy 的真实世界任务。效率提升显著,GPT-5.5 在多数任务上快 2 倍。他对持续学习(continual learning)充满期待,认为这是当前最大未解挑战之一。

"We moved from competitions to usefulness to users, and that's what we are feeling right now." 这一转变标志着 AI 从玩具走向生产力工具的关键时刻。
The Takeaway: AI progress feels like a step function but is actually continuous capability gains crossing a reliability threshold, enabling genuine real-world usefulness — especially as reinforcement learning moves from verifiable domains into messy real work.

Yann Dubois, co-lead of OpenAI's Post-Training Frontiers team and co-creator of Stanford Alpaca, shares insights from GPT-5.5 development. Reliability crossed a key threshold around December last year, allowing teams to trust models for substantial work. The sudden-feeling acceleration comes from three factors: crossing reliability, self-acceleration via better coding models, and RL expanding from math/coding competitions to real user utility.

Dubois breaks down pre/mid/post-training and highlights how RL tools generalized beyond verifiable rewards. Efficiency improved dramatically — most tasks run 2x faster. He is particularly excited about continual learning as one of the biggest unsolved problems.

"We moved from competitions to usefulness to users, and that's what we are feeling right now." This marks the shift from AI as toy to genuine productivity tool.
查看原文 →

Aaron Levie 分析 AI 代理与成本分层Aaron Levie on AI Agents and Widening Cost Stratification

Box CEO Aaron Levie 指出 AI 从廉价小上下文聊天工具转向巨上下文、长运行代理,推理成本提升一个数量级。这一变化比多数人预想的更快发生,导致真实美元流入加速。

未来将是前沿用例(如编码、科学、金融)持续使用高能力模型,同时任务向更低成本模型剥离。成本不会收敛到单一低价,而是按任务分层扩大。企业需建立新程序、财务团队和技术方案来管理这一复杂性。
Box CEO Aaron Levie notes the shift from cheap small-context AI chat tools to giant-context, longer-running agents with inference costs an order of magnitude higher. This change compounded faster than most realized, driving more real dollars flowing in.

The future involves continued use of frontier models for high-value use cases like coding, sciences, finance, with tasks peeling off to cheaper capable models. Costs won't converge to one low price but will widen stratification by task. Enterprises will need programs, finance teams, and tech solutions to manage this.
查看原文 →

💰 创业成功案例

Replit 推出应用变现信用奖励Replit Launches Monetization Credit Rewards

Replit CEO Amjad Masad 宣布:变现你的应用,我们将给予信用奖励。同时强调不应强制用户与销售对话才能购买产品。
Replit CEO Amjad Masad announced: Monetize your apps and we'll give you credit rewards. He also stressed that customers shouldn't be forced to talk to sales to buy the product.
查看原文 →查看原文 →

Garry Tan 谈成为 1000x founderGarry Tan on Becoming a 1000x Founder

Y Combinator CEO Garry Tan 与 @sdianahu 分享如何从工程师成为 1000x founder 的实战经验。
Y Combinator CEO Garry Tan and @sdianahu share real insights on how engineers become 1000x founders.
查看原文 →

🛠️ 开发者工具与技巧

Anthropic 发布 Claude Code Auto ModeAnthropic Releases Claude Code Auto Mode

Anthropic Engineering 推出 Claude Code auto mode,这是一种更安全的跳过权限方式。它使用模型分类器在输入和输出层提供防护,针对过度热情行为和诚实错误,在真实流量上 FPR 仅 0.4%。

Auto mode 是 --dangerously-skip-permissions 的更好替代,适合希望自主运行但仍需防护的任务。
Anthropic Engineering introduces Claude Code auto mode — a safer way to skip permissions. It uses model-based classifiers at input (prompt-injection probe) and output (transcript classifier) layers to catch overeager and mistaken actions, achieving 0.4% FPR on real traffic.

Auto mode serves as a better alternative to --dangerously-skip-permissions for tasks where autonomy is desired with guardrails.

https://www.anthropic.com/engineering/claude-code-auto-mode
查看原文 →

Google Labs 在 I/O 大放异彩Google Labs Shines at I/O

Google Labs VP Josh Woodward 和团队展示了 Neural Expressive design、Project Genie 等实验,以及与 StitchbyGoogle 的合作。
Google Labs VP Josh Woodward and team showcased Neural Expressive design, Project Genie experiments, and collaborations like with StitchbyGoogle at I/O.
查看原文 →查看原文 →

Cursor 新功能与团队协作Cursor New Model, Interface, and Team Features

Cursor Design 的 Ryo Lu 介绍新 model、interface、SDK 和 automations,支持更好的团队协作。
Cursor Design's Ryo Lu announces new model, interface, SDK, and automations designed for building software better together with teams.
查看原文 →

Swyx 推荐本地优先栈Swyx on Winning Local-First Stack

Swyx 认为特定栈已在 local-first 战斗中获胜,适合快速构建应用。
Swyx believes a particular stack has won the local-first battle and is ideal for building fast apps fast.
查看原文 →

🌍 其他动态

Sam Altman 发布新 Codex 并征集 AI 问题Sam Altman Ships New Codex and Asks What Problems AI Should Solve

OpenAI CEO Sam Altman 宣布新 Codex 发布,并询问大家最希望 AI 在未来解决什么问题。
OpenAI CEO Sam Altman announced the new Codex ships today and asked what problem people most hope AI will solve in the future.
查看原文 →查看原文 →

Matt Turck 发布与 Yann Dubois 的对话Matt Turck Releases Conversation with Yann Dubois

The MAD Podcast 主持人 Matt Turck 分享了与 OpenAI Yann Dubois 的精彩对话要点。
MAD Podcast host Matt Turck shared key timestamps from his excellent conversation with OpenAI's Yann Dubois.
查看原文 →

Zara Zhang 开源 Claude Code Lark/Feishu BridgeZara Zhang Open-Sources Claude Code Lark/Feishu Bridge

Zara Zhang 发布开源工具,让用户能在 Lark/Feishu 中像同事一样与 Claude Code 对话。
Zara Zhang introduced an open-source Claude Code Lark/Feishu Bridge for chatting with Claude Code like a colleague within Lark/Feishu.
查看原文 →

🔥 热点话题

Peter Yang 尝试 Codex 自动化Peter Yang Experiments with Codex Automation

Peter Yang 称 Codex 自动化是 game changer,并尝试新功能。
Peter Yang called Codex automation a game changer and experimented with new features.
查看原文 →

🛠️ 开发者工具与技巧

Claude 询问用户创作Claude Asks What You're Making

Claude 团队鼓励用户分享用 Claude Design 创作的内容。
Claude team asked what users are making with Claude Design.
查看原文 →