OpenAI Tells Engineers to Default to Agents Over Editors as Cursor Ships 1,000 Commits Per Hour
Daily Wrap-Up
The single most consequential post today was Greg Brockman's internal OpenAI memo on agentic software development, and it reads less like a suggestion and more like a mandate. By March 31st, OpenAI wants every technical task to start with an agent, not an editor or terminal. That's not a vague aspiration from a blog post. It's an operational directive with designated "agents captains" per team, shared skills repositories, and explicit quality gates to prevent slop from flooding codebases. The fact that OpenAI is eating its own dogfood this aggressively tells you where the industry is heading, whether you're ready or not.
Meanwhile, the Opus 4.6 discourse split into two remarkably different camps. Physicists are writing love letters about the model's ability to produce multi-page QFT calculations without mistakes, while security researchers are demonstrating universal jailbreak techniques that generate "entire datasets of outputs across any harm category." The model is simultaneously the most capable and most scrutinized release in recent memory. @emollick pointed to passages in the system card that are "extremely wild," and @theo expressed hesitation about the direction despite acknowledging quality. This tension between raw capability and safety surface area will define how frontier models get deployed in 2026.
The most entertaining moment was @ranman watching Claude one-shot the RuneScape bots that took him months to build as a teenager, netting $40k per month in his mom's PayPal account. There's something poetically brutal about watching your formative hacking projects become trivial for an AI. The most practical takeaway for developers: follow @gdb's playbook even if you don't work at OpenAI. Create an AGENTS.md for your projects, write skills for repeatable agent tasks, and build CLIs that agents can consume. The teams that treat agent-accessibility as a first-class concern will compound their velocity advantage every month.
Quick Hits
- @XDevelopers announced X API pricing restructuring: free tier limited to public utility apps, legacy users move to pay-per-use with a $10 voucher, Basic and Pro plans remain.
- @jaredpalmer revealed GitHub Stacked Diffs rolling out to early design partners in alpha next month, with a video demo of progress.
- @Waymo introduced the Waymo World Model built on DeepMind's Genie 3, simulating rare scenarios like tornadoes and planes landing on freeways for autonomous driving training.
- @minchoi shared OpenAI's launch of "Frontier," an enterprise platform for building AI coworkers that connects to CRM and internal data.
- @michaelnicollsx posted SpaceX/Starlink hiring across engineering disciplines in Austin and Seattle for AI satellite infrastructure.
- @kimmonismus flagged rapidly advancing robotics demos, noting the pace matches LLM improvement curves.
- @kimmonismus expressed excitement about autonomous AI scientists tackling medical breakthroughs, calling disease cures the most compelling AI application.
- @kimmonismus shared OpenAI's "AI first" internal policy.
- @NotebookLM launched customizable infographics and slide decks in their mobile app with adjustable design, complexity, and narrative style.
- @nateberkopec urged people to stop getting LLM news from anon accounts three degrees removed from anyone at a frontier lab.
- @emollick noted the difficulty of doing SEO for AI models that "do not like being manipulated" and "know when they're being measured."
- @NetworkChuck ran 5 miles in 3D-printed shoes from a Bambu Lab H2D printer. He advises against doing this.
- @iruletheworldmo imagined a future of real-time prompt-generated games while your Optimus cleans your house.
- @alexhillman recommended Matt as a trustworthy voice in tech education with strong AI takes.
- @ai_for_success shared a method for AI agents to learn new skills from the web with up-to-date data, and @KireiStudios described saving skills to a semantic DB for cross-CLI reuse.
Agentic Software Development Becomes the Default
The volume of posts about agent-driven coding today wasn't just high. It was qualitatively different. We've crossed from "look what this tool can do" into "here's how organizations are restructuring around it." @gdb's OpenAI memo is the clearest articulation yet of what agent-first engineering looks like at scale. The key insight isn't that agents write code. It's the supporting infrastructure: AGENTS.md files maintained like living documentation, shared skills repositories, designated team leads for agent adoption, and explicit policies against merging slop. As he put it: "Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code."
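The memo doesn't prescribe a format for these files, but the shape is easy to imagine. A minimal AGENTS.md sketch (illustrative only, not an OpenAI template; every command and directory name here is hypothetical) might look like:

```markdown
# AGENTS.md (illustrative sketch)

## Build & test
- `make test` runs the fast suite; keep it under a minute.

## Conventions
- New endpoints need an integration test before merge.

## Known agent pitfalls
- Do not edit `generated/` by hand; run `make codegen` instead.

## Skills
- Repeatable tasks live in `skills/`; add one whenever the agent
  struggles with the same task twice.
```

The "living documentation" part is the last two sections: each time an agent fails or fumbles, the failure mode gets written down so the next run doesn't repeat it.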
The numbers from @aakashgupta's Cursor analysis are staggering, even accounting for hype: "Hundreds of agents running simultaneously on a single codebase. Each agent averaging a meaningful code change every 12-20 minutes, sustained for a full week." They built a web browser from scratch, a Windows 7 emulator, and migrated a production codebase from Solid to React. The architectural finding that matters: "Self-organizing agents failed. Peer-to-peer status sharing created deadlocks. What actually worked was a strict hierarchy of planners, workers, and judges."
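The planner/worker/judge finding is worth making concrete. A minimal sketch of that strict hierarchy, assuming the three roles as stub functions (in a real system each would be an LLM call):

```python
# Sketch of the hierarchy the Cursor analysis describes: a planner fans tasks
# out to workers, and a judge gates every result before it lands. All three
# roles are stubs here; in a real system each would be an LLM call.

def planner(goal: str) -> list[str]:
    # Break the goal into independent subtasks (stubbed as a simple split).
    return [f"{goal}: part {i}" for i in range(3)]

def worker(subtask: str) -> str:
    # Produce a candidate change for one subtask.
    return f"patch for ({subtask})"

def judge(result: str) -> bool:
    # Accept or reject a candidate; the judge is the quality gate.
    return result.startswith("patch")

def run(goal: str) -> list[str]:
    accepted = []
    for subtask in planner(goal):       # strict top-down flow: workers never
        candidate = worker(subtask)     # talk to each other, so the deadlocks
        if judge(candidate):            # seen with peer-to-peer status sharing
            accepted.append(candidate)  # cannot occur
    return accepted
```

The point of the structure is in the comments: workers never coordinate peer-to-peer, which is exactly the failure mode the analysis says created deadlocks.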
On the practitioner side, @EricBuess reported agent swarms in Claude Code 2.1.32 with Opus 4.6 running beautifully with tmux auto-opening each agent in its own session. @EastlondonDev described replacing hours of daily dashboard-clicking with Cloudflare and Datadog MCP integrations that let Claude Code close alerts and find performance bottlenecks autonomously. @addyosmani shared a framework for quality gates and observability around AI-assisted code at scale, while @dennizor offered a deceptively powerful prompt pattern: ask Claude to "redesign [the system] into the most elegant solution that would have emerged if the change had been a foundational assumption from the start." @claudeai announced a virtual hackathon with $100K in API credits for projects built with Opus 4.6, and @benjaminsehl noted Claude Code Teams shipping the same day as Pi-messenger.
Perhaps the most hardcore infrastructure post came from @doodlestein, who built a remote compilation helper that intercepts CPU-heavy builds from concurrent coding agents and transparently offloads them to worker machines via SSH, then brings results back as if everything ran locally. It's 100K lines of Rust across 5 crates, and it solves a real problem anyone running multiple agents has encountered: three simultaneous cargo builds will bring even a fast machine to its knees.
Opus 4.6: Brilliant Physicist, Vulnerable Patient
Opus 4.6 dominated conversation from two opposite angles today. @ibab provided the most detailed technical endorsement, reporting that the model has "a very detailed understanding of existing literature, and it's able to do complex calculations that are several pages long, often without mistakes" in theoretical physics. The comparison to coding is apt: "The experience is similar to building a complex codebase with Claude Code in that you sometimes have to use your understanding to patch up some things that the model did wrong, but you end up being much faster."
@Legendaryy took a different approach, feeding Opus 4.6 DNA data, blood panels, and three years of wearable data, then asking it to "build a team of agents and write a full book on me as a biological unit." The result was a 100-page personalized health analysis surfacing connections he'd never made on his own.
But @elder_plinius cast a long shadow, claiming a universal jailbreak technique that generates "entire datasets of outputs across any harm category" from a single input, with "shockingly detailed and actionable" results. @emollick pointed to "extremely wild" passages in the system card as reminders of "how weird a technology this is," while @theo acknowledged the model's quality but questioned the direction Anthropic is heading. The gap between what the model can do for physics research and what it can be tricked into producing remains uncomfortably wide.
Agents Go Autonomous, From Code to Combat
The conversation around autonomous agents expanded well beyond coding today. @birdabo highlighted Anthropic's internal project where 16 AI agents ran in parallel for two weeks, writing 100,000 lines of code to build a C compiler from scratch for $20K. It compiles the Linux kernel. GCC took thousands of engineers and 37 years.
@chiefofautism demoed "Shannon," described as "Claude Code but for hacking," which autonomously stole a test app's user database, created admin accounts, and bypassed login in 90 minutes. @EHuanglu shared images from a friend in China showing AI systems functioning as full-time employees working 24/7. @unusual_whales reported that Anthropic engineers spent six months embedded at Goldman Sachs building autonomous systems for back-office work.
On the structural side, @stephsmithio argued every company needs a "Chief Agents Officer" to coordinate deployment of agentic tools and eventually the agents themselves. @lateinteraction offered practical coding agent tips: "Don't read context into prompts. Read context into variables," and invoke sub-agents as functions rather than tools that pollute the context window. @localghost put it simply: "Pretty much anything you build today should come with a CLI for agents."
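The "context into variables" tip can be sketched directly. Assuming a hypothetical `call_model` stand-in for a real LLM call, the sub-agent is just a function: it receives only the slice of context it needs and returns a value, rather than dumping the whole document into the parent's context window:

```python
# Sub-agents as functions, per the tip above: large context lives in ordinary
# variables, and each sub-agent call sees only its own focused input.
# `call_model` is a hypothetical stand-in for a real LLM API call.

def call_model(prompt: str) -> str:
    return f"summary of {len(prompt)} chars"  # stub

def summarize(doc: str) -> str:
    # Sub-agent as a function: gets its own input, returns a value,
    # and leaves the parent's context window untouched.
    return call_model(doc)

def answer(question: str, corpus: dict[str, str]) -> str:
    # Context stays in variables; only the relevant digest enters the prompt.
    relevant = corpus.get("design-doc", "")
    digest = summarize(relevant)
    return call_model(f"{question}\n\nBackground: {digest}")
```

The parent prompt never contains the 500-character document, only the short digest, which is the "no context pollution" property the tip is after.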
Sandboxes, Edge AI, and Running Code Safely
The sandbox and local inference space saw meaningful new tooling today. @samuelcolvin launched Monty, a new Python implementation written from scratch in Rust, designed specifically for LLMs to run code without host access. Startup time is measured in "single digit microseconds, not seconds." @simonw immediately got Claude to compile it to WASM and had demos running in-browser, with @chaliy asking to incorporate it into their own Rust-based sandbox project. @simonw also highlighted an interesting take on the sandbox problem: using a Python subset works fine because LLMs can rewrite their code to fit based on error messages.
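That rewrite-on-error loop is the whole pattern. A toy sketch, assuming a trivially restrictive "sandbox" and a `revise` stub standing in for the model's rewrite (neither is Monty's actual API):

```python
# Why a Python subset is enough: run the code, and when the sandbox rejects a
# feature, feed the error back and let the model rewrite. The sandbox here is
# a toy eval that bans imports; `revise` is a stub for an LLM rewrite.

def sandbox_run(code: str):
    if "import" in code:
        raise RuntimeError("imports are not allowed in this subset")
    return eval(code)  # toy sandbox; a real one is far stricter than eval

def revise(code: str, error: str) -> str:
    # Stand-in for asking the model to rewrite its code given the error.
    return "sum([1, 2, 3])"

def run_with_retries(code: str, max_attempts: int = 3):
    for _ in range(max_attempts):
        try:
            return sandbox_run(code)
        except Exception as exc:
            code = revise(code, str(exc))
    raise RuntimeError("could not produce code that fits the subset")
```

The model's first attempt uses an unsupported feature, the error message comes back, and the rewritten attempt fits the subset; a few microsecond-startup retries are cheaper than supporting all of Python.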
On the edge computing side, @asadkkhaliq published an essay arguing that "local AI" today mostly means giving models OS-level access to shuttle files to the cloud, but intelligence is about to diffuse to the edge the same way computing did in the 80s and 90s. @twistartups featured @alexocheema of ExoLabs demonstrating how open source software lets users connect Mac Minis into AI compute clusters, the kind of practical edge infrastructure that makes the essay's thesis concrete.
The Developer Identity Crisis Continues
The career anxiety conversation refused to quiet down. @esrtweet, claiming 50 years of coding experience going back to punch cards, delivered the most direct response: "Get over yourself. Every previous 'programming is obsolete' panic has been a bust, and this one's going to be too." His argument centers on the persistent gap between human intentions and computer specifications, which doesn't vanish just because you can program in natural language.
On the other end, @BoringBiz_ posted "Realizing that software engineers were just the first victims of AI. Everyone else is next," and @_devJNS offered the timeline: "2024: prompt engineer. 2025: vibe coder. 2026: master of AI agents. 2027: unemployed." @EXM7777 simply asked "how to stop feeling behind in AI," which resonated enough to surface in the feed. @doodlestein pushed back on skeptics by claiming to have built "the best TUI library in the world (by far) in 5 days" using AI coding tools, arguing the productivity gains are real and measurable, not theoretical. The truth, as usual, lives somewhere between panic and triumphalism, but the velocity of change is making that middle ground harder to find.
Source Posts
omg.. just found a way to install&use Clawdbot in 2 mins no need mac mini, API keys, just one click to set up everything automatically and get personal AI assistant to work for you 24/7 here's how and what you can do with it: https://t.co/kxnlnh6ual
We've been working on very long-running coding agents. In a recent week-long run, our system peaked at over 1,000 commits per hour across hundreds of agents. We're sharing our findings and an early research preview inside Cursor. https://t.co/Xo76WER6L1
All of this was done, start to finish, in 5 days. Don't believe me? Here is the play-by-play narrative of the entire process broken down into 5-hour intervals: https://t.co/w5BFyXqEPt And here are the beads tasks (courtesy of my bv project, check it out!), over a thousand in total: https://t.co/blCbLMoLW0 If you're flabbergasted by this and don't understand how it's even possible, you can do this too! All of my tools are totally free and available right now to you at: https://t.co/N4As0kJTQP Everything is designed to be as simple and easy as possible (my goal was for my 75-year old dad to be able to do it himself unaided!). I share all my techniques, workflows, and prompts right here on X, for free. If you're an independent developer or builder, I want to help you be successful. Use the tools, come hang out in our Discord (free!), and start cranking. If you get confused or hit a bug, file it on GitHub Issues and I will have the boys fix it same day or your money back (jk it's free!). If you have a company and want me to teach your devs how to do this, reach out to me. Just understand that the price has... gone up recently, lol. But just think about how much you could do if your devs could produce code of this quality at anything like this speed! And yes, I do spend $10k/month on AI subscriptions. But guess what, that's less than a junior dev makes in SF!
Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model.

Every company faces the same opportunity now, and navigating it well — just like with cloud computing or the Internet — requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now.

As a first step, by March 31st, we're aiming that: (1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal. (2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions.

In order to get there, here's what we recommended to the team a few weeks ago:

1. Take the time to try out the tools. The tools do sell themselves — many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying.
   - Designate an "agents captain" for your team — the primary person responsible for thinking about how agents can be brought into the team's workflow.
   - Share experiences or questions in a few designated internal channels.
   - Take a day for a company-wide Codex hackathon.
2. Create skills and AGENTS.md.
   - Create and maintain an AGENTS.md for any project you work on; update the AGENTS.md whenever the agent does something wrong or struggles with a task.
   - Write skills for anything that you get Codex to do, and commit them to the skills directory in a shared repository.
3. Inventory and make accessible any internal tools.
   - Maintain a list of tools that your team relies on, and make sure someone takes point on making each one agent-accessible (such as via a CLI or MCP server).
4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration.
   - Write tests which are quick to run, and create high-quality interfaces between components.
5. Say no to slop. Managing AI-generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high.
   - Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting.
6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently goes around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use.

Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items — for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.
4% of GitHub public commits are being authored by Claude Code right now. At the current trajectory, we believe that Claude Code will be 20%+ of all daily commits by the end of 2026. While you blinked, AI consumed all of software development. Read more 👇 https://t.co/HzK4nbe2vy https://t.co/E1kIjfrNgk
Your AI agents can now learn new skills from the web. And update them automatically. /learn stripe-payments Searches the docs. Scrapes the pages. No more outdated skills. Powered by Hyperbrowser, Setup Guide ↓ https://t.co/MgtYUe0GDo
how to stop feeling behind in AI
yesterday GPT-5.3 Codex dropped 20 minutes after Opus 4.6... two releases in the same day, both "redefining everything" the day before, Kling 3.0 came...
Fuck it, a bit early but here goes: Monty: a new python implementation, from scratch, in rust, for LLMs to run code without host access. Startup time measured in single digit microseconds, not seconds. @mitsuhiko here's another sandbox/not-sandbox to be snarky about 😜 Thanks @threepointone @dsp_ (inadvertently) for the idea. https://t.co/UuCYneMQ9j
New Pi extension: pi-messenger. What if Pi agents could talk to each other like in a chat room? They can join the chat, see who's online, reserve files, message in real-time, whether they're in separate terminals or subagents. Just throw a PRD at it and it breaks your plan into a dependency graph, then fans out parallel workers to execute tasks in waves. You watch agents coordinate in a shared overlay while they ship your feature. pi install npm:pi-messenger https://t.co/RXGHeGRla4
GOLDMAN SACHS $GS IS TAPPING ANTHROPIC’S AI MODEL TO AUTOMATE ACCOUNTING, COMPLIANCE ROLES Embedded Anthropic engineers have spent six months at Goldman building autonomous systems for time-intensive, high-volume back-office work - CNBC https://t.co/BcVwYUj301
Holy …S😳 Atlas is definitely a gymnastics champion. Landing on his toes, then doing a backflip. https://t.co/cliapZkjYA
New Engineering blog: We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel. Here's what it taught us about the future of autonomous software development. Read more: https://t.co/htX0wl4wIf https://t.co/N2e9t5Z6Rm
i'm telling you, y'all are sleeping on codemode mcps - the agent just wrote this code to find exactly what it wanted w/ no context pollution https://t.co/xWULlUFvN5
After Ozempic, another class of "magic pills" currently in the pipeline are Myostatin blockers which cut the brakes on muscle growth. These drugs will allow even casual Gym goers to get as muscular as present day bodybuilders with limited effort. Currently, these Myostatin blockers are undergoing trials in the US. If FDA approval is granted soon, they could become available in the mass market within 1 to 2 years.
I don't know why this week became the tipping point, but nearly every software engineer I've talked to is experiencing some degree of mental health crisis.