GPT-5.5 Sparks Discussion: First F-Zero Test Pass, Big Enterprise Gains
Peter Yang's F-Zero test succeeded for the first time: GPT-5.5 + Codex was the only combo that built a playable game. Dan Shipper noted that GPT-5.5 no longer hesitates and simply executes its plans. Box CEO Aaron Levie shared enterprise evals showing roughly a 10-point accuracy jump across financial, healthcare, and public-sector workloads (financial 83% vs 64%, healthcare 78% vs 61%).
View original →
State of the AI Coding Wars: 2026 Coding Agents Break Containment
From an Unsupervised Learning x Latent Space crossover: Anthropic and OpenAI are each at roughly $2B ARR from coding products alone. Swyx argues that 2025 was the year of coding agents, and 2026 is when they break containment and start doing everything else. The market is still in a capability-exploration phase that rewards the crazy and creative.
View original →
AI Makes You Work More? Box CEO on the Work Paradox
Aaron Levie argues AI won't automatically reduce work because work isn't static. AI lowers the cost of exploration, so people take on far more tasks that previously went undone for being too time-consuming. Small things can quickly eat 3 hours: agents make them easy to start, but finishing still requires human effort.
View original →
Anthropic Postmortem on Claude Code Quality: Three Issues Fixed
Anthropic's engineering team investigated user reports of degraded Claude Code quality and found three separate issues: the default reasoning effort had changed from high to medium (reducing intelligence), a caching bug cleared the thinking history on every conversation turn (causing forgetfulness and repetition), and a system-prompt change meant to reduce verbosity hurt coding quality. All three were resolved by April 20, and usage limits were reset for all subscribers.
View original →
Replit CEO Pushes Back on Chinese Distillation Scaremongering
Amjad Masad criticized US politicians for scaremongering about "Chinese distillation," pointing out that Chinese scientists are openly sharing real AI breakthroughs; these advances have nothing to do with data and benefit everyone, including US labs. He also noted that DeepSeek v4 had just been released.
View original →