AI Learning Digest

OpenAI Tells Engineers to Default to Agents Over Editors as Cursor Ships 1,000 Commits Per Hour

Daily Wrap-Up

The single most consequential post today was Greg Brockman's internal OpenAI memo on agentic software development, and it reads less like a suggestion and more like a mandate. By March 31st, OpenAI wants every technical task to start with an agent, not an editor or terminal. That's not a vague aspiration from a blog post. It's an operational directive with designated "agents captains" per team, shared skills repositories, and explicit quality gates to prevent slop from flooding codebases. The fact that OpenAI is eating its own dogfood this aggressively tells you where the industry is heading, whether you're ready or not.

Meanwhile, the Opus 4.6 discourse split into two remarkably different camps. Physicists are writing love letters about the model's ability to produce multi-page QFT calculations without mistakes, while security researchers are demonstrating universal jailbreak techniques that generate "entire datasets of outputs across any harm category." The model is simultaneously the most capable and most scrutinized release in recent memory. @emollick pointed to passages in the system card that are "extremely wild," and @theo expressed hesitation about the direction despite acknowledging quality. This tension between raw capability and safety surface area will define how frontier models get deployed in 2026.

The most entertaining moment was @ranman watching Claude one-shot the RuneScape bots that took him months to build as a teenager and once netted $40k a month into his mom's PayPal account. There's something poetically brutal about watching your formative hacking projects become trivial for an AI. The most practical takeaway for developers: follow @gdb's playbook even if you don't work at OpenAI. Create an AGENTS.md for your projects, write skills for repeatable agent tasks, and build CLIs that agents can consume. The teams that treat agent accessibility as a first-class concern will compound their velocity advantage every month.

Quick Hits

  • @XDevelopers announced X API pricing restructuring: free tier limited to public utility apps, legacy users move to pay-per-use with a $10 voucher, Basic and Pro plans remain.
  • @jaredpalmer revealed GitHub Stacked Diffs rolling out to early design partners in alpha next month, with a video demo of progress.
  • @Waymo introduced the Waymo World Model built on DeepMind's Genie 3, simulating rare scenarios like tornadoes and planes landing on freeways for autonomous driving training.
  • @minchoi shared OpenAI's launch of "Frontier," an enterprise platform for building AI coworkers that connects to CRM and internal data.
  • @michaelnicollsx posted SpaceX/Starlink hiring across engineering disciplines in Austin and Seattle for AI satellite infrastructure.
  • @kimmonismus flagged rapidly advancing robotics demos, noting the pace matches LLM improvement curves.
  • @kimmonismus expressed excitement about autonomous AI scientists tackling medical breakthroughs, calling disease cures the most compelling AI application.
  • @kimmonismus shared OpenAI's "AI first" internal policy.
  • @NotebookLM launched customizable infographics and slide decks in their mobile app with adjustable design, complexity, and narrative style.
  • @nateberkopec urged people to stop getting LLM news from anon accounts three degrees removed from anyone at a frontier lab.
  • @emollick noted the difficulty of doing SEO for AI models that "do not like being manipulated" and "know when they're being measured."
  • @NetworkChuck ran 5 miles in 3D-printed shoes from a Bambu Labs H2D printer. He advises against doing this.
  • @iruletheworldmo imagined a future of real-time prompt-generated games while your Optimus cleans your house.
  • @alexhillman recommended Matt as a trustworthy voice in tech education with strong AI takes.
  • @ai_for_success shared a method for AI agents to learn new skills from the web with up-to-date data, and @KireiStudios described saving skills to a semantic DB for cross-CLI reuse.

Agentic Software Development Becomes the Default

The volume of posts about agent-driven coding today wasn't just high. It was qualitatively different. We've crossed from "look what this tool can do" into "here's how organizations are restructuring around it." @gdb's OpenAI memo is the clearest articulation yet of what agent-first engineering looks like at scale. The key insight isn't that agents write code. It's the supporting infrastructure: AGENTS.md files maintained like living documentation, shared skills repositories, designated team leads for agent adoption, and explicit policies against merging slop. As he put it: "Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code."
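
The "make internal tools agent-accessible via a CLI" recommendation is easy to sketch. Below is a minimal Python illustration; the `deploytool` name, the `status` subcommand, and the stubbed result are all hypothetical. The point is just that every command takes explicit flags and emits machine-readable JSON an agent can parse:

```python
import argparse
import json
import sys

def build_parser():
    # Hypothetical CLI for an internal tool; "deploytool" and the
    # "status" subcommand are invented for illustration.
    parser = argparse.ArgumentParser(prog="deploytool")
    sub = parser.add_subparsers(dest="command", required=True)
    status = sub.add_parser("status", help="report service health as JSON")
    status.add_argument("--service", required=True)
    return parser

def run(argv):
    args = build_parser().parse_args(argv)
    if args.command == "status":
        # Stubbed result; a real tool would query its backend here.
        return {"service": args.service, "healthy": True}

if __name__ == "__main__":
    # One JSON object per invocation, nothing interactive to scrape.
    print(json.dumps(run(sys.argv[1:])))
```

An agent can then call `deploytool status --service api` and parse the JSON, rather than scraping output formatted for humans.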

The numbers from @aakashgupta's Cursor analysis are staggering, even accounting for hype: "Hundreds of agents running simultaneously on a single codebase. Each agent averaging a meaningful code change every 12-20 minutes, sustained for a full week." They built a web browser from scratch, a Windows 7 emulator, and migrated a production codebase from Solid to React. The architectural finding that matters: "Self-organizing agents failed. Peer-to-peer status sharing created deadlocks. What actually worked was a strict hierarchy of planners, workers, and judges."
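
The planner/worker/judge finding is worth making concrete. Here is a toy Python sketch of that hierarchy under invented names, with a stubbed decomposition and a trivial acceptance rule standing in for real agents:

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    result: str = ""

def planner(goal):
    # Decompose the goal up front; workers never negotiate with each other.
    return [Task(f"{goal}: step {i}") for i in range(1, 4)]

def worker(task):
    # Each worker owns exactly one task (stubbed "work").
    task.result = f"done({task.description})"
    return task

def judge(tasks):
    # A single merge gate: only finished work gets through.
    return [t for t in tasks if t.result]

def run_hierarchy(goal):
    return judge([worker(t) for t in planner(goal)])
```

The design choice this mirrors: coordination flows strictly downward (planner to workers) and upward (workers to judge), with no peer-to-peer channel that could deadlock.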

On the practitioner side, @EricBuess reported agent swarms in Claude Code 2.1.32 with Opus 4.6 running beautifully with tmux auto-opening each agent in its own session. @EastlondonDev described replacing hours of daily dashboard-clicking with Cloudflare and Datadog MCP integrations that let Claude Code close alerts and find performance bottlenecks autonomously. @addyosmani shared a framework for quality gates and observability around AI-assisted code at scale, while @dennizor offered a deceptively powerful prompt pattern: ask Claude to "redesign [the system] into the most elegant solution that would have emerged if the change had been a foundational assumption from the start." @claudeai announced a virtual hackathon with $100K in API credits for projects built with Opus 4.6, and @benjaminsehl noted Claude Code Teams shipping the same day as Pi-messenger.
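
@dennizor's prompt pattern is easy to reuse as a template. A trivial Python helper; the function name and signature are mine, not from the post:

```python
def redesign_prompt(system, change):
    # Wraps the "foundational assumption" pattern in a reusable helper.
    return (
        f"Redesign {system} into the most elegant solution that would have "
        f"emerged if {change} had been a foundational assumption from the start."
    )
```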

Perhaps the most hardcore infrastructure post came from @doodlestein, who built a remote compilation helper that intercepts CPU-heavy builds from concurrent coding agents and transparently offloads them to worker machines via SSH, then brings results back as if everything ran locally. It's 100K lines of Rust across 5 crates, and it solves a real problem anyone running multiple agents has encountered: three simultaneous cargo builds will bring even a fast machine to its knees.
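
The interception idea is the interesting part. A toy Python sketch of the classification step follows, with an invented `HEAVY_PREFIXES` list standing in for rch's far more elaborate classifier; the actual SSH offload and result sync are out of scope here:

```python
import shlex

# Illustrative list of CPU-heavy build/test commands worth offloading.
HEAVY_PREFIXES = [("cargo", "build"), ("cargo", "test"), ("bun", "test")]

def classify(command):
    # A pre-tool hook would run this on every shell command the agent
    # issues, routing heavy builds to a remote worker pool.
    argv = shlex.split(command)
    for prefix in HEAVY_PREFIXES:
        if tuple(argv[:len(prefix)]) == prefix:
            return "offload"  # run remotely, copy artifacts back
    return "local"            # everything else runs untouched
```

Crucially, the failure mode is graceful: if classification or the worker pool breaks, commands simply run locally, which is what would have happened anyway.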

Opus 4.6: Brilliant Physicist, Vulnerable Patient

Opus 4.6 dominated conversation from two opposite angles today. @ibab provided the most detailed technical endorsement, reporting that the model has "a very detailed understanding of existing literature, and it's able to do complex calculations that are several pages long, often without mistakes" in theoretical physics. The comparison to coding is apt: "The experience is similar to building a complex codebase with Claude Code in that you sometimes have to use your understanding to patch up some things that the model did wrong, but you end up being much faster."

@Legendaryy took a different approach, feeding Opus 4.6 DNA data, blood panels, and three years of wearable data, then asking it to "build a team of agents and write a full book on me as a biological unit." The result was a 100-page personalized health analysis surfacing connections he'd never made on his own.

But @elder_plinius cast a long shadow, claiming a universal jailbreak technique that generates "entire datasets of outputs across any harm category" from a single input, with "shockingly detailed and actionable" results. @emollick pointed to "extremely wild" passages in the system card as reminders of "how weird a technology this is," while @theo acknowledged the model's quality but questioned the direction Anthropic is heading. The gap between what the model can do for physics research and what it can be tricked into producing remains uncomfortably wide.

Agents Go Autonomous, From Code to Combat

The conversation around autonomous agents expanded well beyond coding today. @birdabo highlighted Anthropic's internal project where 16 AI agents ran in parallel for two weeks, writing 100,000 lines of code to build a C compiler from scratch for $20K. It compiles the Linux kernel. GCC took thousands of engineers and 37 years.

@chiefofautism demoed "Shannon," described as "Claude Code but for hacking," which autonomously stole a test app's user database, created admin accounts, and bypassed login in 90 minutes. @EHuanglu shared images from a friend in China showing AI systems functioning as full-time employees working 24/7. @unusual_whales reported that Anthropic engineers spent six months embedded at Goldman Sachs building autonomous systems for back-office work.

On the structural side, @stephsmithio argued every company needs a "Chief Agents Officer" to coordinate deployment of agentic tools and eventually the agents themselves. @lateinteraction offered practical coding agent tips: "Don't read context into prompts. Read context into variables," and invoke sub-agents as functions rather than tools that pollute the context window. @localghost put it simply: "Pretty much anything you build today should come with a CLI for agents."
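
The "sub-agents as functions" tip can be sketched in a few lines of Python. Everything here is illustrative: `summarize_logs` is a stub standing in for a real sub-agent call, and the point is only that the parent receives a return value rather than the sub-agent's raw transcript:

```python
def summarize_logs(raw_logs):
    # Stub for a sub-agent: a real version would invoke a model and
    # return only its final answer, never its intermediate tool I/O.
    lines = raw_logs.strip().splitlines()
    return f"{len(lines)} log lines, last: {lines[-1]}"

def main_agent(raw_logs):
    # The parent sees only the returned value; the sub-agent's working
    # context never enters the parent's context window.
    summary = summarize_logs(raw_logs)
    return f"plan next step using: {summary}"
```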

Sandboxes, Edge AI, and Running Code Safely

The sandbox and local inference space saw meaningful new tooling today. @samuelcolvin launched Monty, a new Python implementation written from scratch in Rust, designed specifically for LLMs to run code without host access. Startup time is measured in "single digit microseconds, not seconds." @simonw immediately got Claude to compile it to WASM and had demos running in-browser, with @chaliy asking to incorporate it into their own Rust-based sandbox project. @simonw also highlighted an interesting take on the sandbox problem: using a Python subset works fine because LLMs can rewrite their code to fit based on error messages.
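
The "Python subset plus good error messages" idea can be sketched with the standard `ast` module. The banned-construct list below is illustrative, not Monty's actual feature set; the point is that each rejection names a line and a construct, which is exactly the feedback an LLM needs to rewrite its code:

```python
import ast

# Illustrative banned constructs; a real sandbox's list would differ.
BANNED = {
    ast.Import: "imports",
    ast.ImportFrom: "imports",
    ast.With: "context managers",
}

def check_subset(source):
    # Return one actionable error per unsupported construct, so a model
    # can rewrite the offending lines and retry.
    errors = []
    for node in ast.walk(ast.parse(source)):
        if type(node) in BANNED:
            errors.append(f"line {node.lineno}: {BANNED[type(node)]} not supported")
    return errors
```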

On the edge computing side, @asadkkhaliq published an essay arguing that "local AI" today mostly means giving models OS-level access to shuttle files to the cloud, but intelligence is about to diffuse to the edge the same way computing did in the 80s and 90s. @twistartups featured @alexocheema of ExoLabs demonstrating how open source software lets users connect Mac Minis into AI compute clusters, the kind of practical edge infrastructure that makes the essay's thesis concrete.

The Developer Identity Crisis Continues

The career anxiety conversation refused to quiet down. @esrtweet, claiming 50 years of coding experience going back to punch cards, delivered the most direct response: "Get over yourself. Every previous 'programming is obsolete' panic has been a bust, and this one's going to be too." His argument centers on the persistent gap between human intentions and computer specifications, which doesn't vanish just because you can program in natural language.

On the other end, @BoringBiz_ posted "Realizing that software engineers were just the first victims of AI. Everyone else is next," and @_devJNS offered the timeline: "2024: prompt engineer. 2025: vibe coder. 2026: master of AI agents. 2027: unemployed." @EXM7777 simply asked "how to stop feeling behind in AI," which resonated enough to surface in the feed. @doodlestein pushed back on skeptics by claiming to have built "the best TUI library in the world (by far) in 5 days" using AI coding tools, arguing the productivity gains are real and measurable, not theoretical. The truth, as usual, lives somewhere between panic and triumphalism, but the velocity of change is making that middle ground harder to find.

Source Posts

Eric Buess @EricBuess ·
Agent swarms in Claude Code 2.1.32 with Opus 4.6 are very very very good. And with tmux auto-opening each agent in it's own interactive mode with graceful shutdown when done it's a breeze to do massive robust changes without the main agent using up much of its context window!🔥 https://t.co/qmzl6eIHnR
asad @asadkkhaliq ·
(essay) Life At The Edge "Local AI" today is mostly about giving models OS-level access so that more files and context can be transferred to the cloud for inference. But intelligence is about to diffuse to the edge just as computing did in the 80s and 90s Some thoughts on rent vs own for inference, Apple events becoming great again, God models, and the coming dance of edge and cloud
el.cine @EHuanglu ·
friend in china send me this and said these are basically his employees.. but work 24/7 wild time we are living in https://t.co/Z5X0aM0yLF
el.cine @EHuanglu

omg.. just found a way to install&use Clawdbot in 2 mins no need mac mini, API keys, just one click to set up everything automatically and get personal AI assistant to work for you 24/7 here's how and what you can do with it: https://t.co/kxnlnh6ual

Waymo @Waymo ·
We’re excited to introduce the Waymo World Model—a frontier generative model for large-scale, hyper-realistic autonomous driving simulation built on @GoogleDeepMind’s Genie 3. By simulating the “impossible”, we proactively prepare the Waymo Driver for some of the most rare and complex scenarios—from tornadoes to planes landing on freeways—long before it encounters them in the real world. https://t.co/EbMut47ZEY
Aakash Gupta @aakashgupta ·
Cursor just shipped 1,000 commits per hour and most people scrolled past the number. Break that down. Hundreds of agents running simultaneously on a single codebase. Each agent averaging a meaningful code change every 12-20 minutes, sustained for a full week. That’s the equivalent output of a 100+ person engineering org running 24/7 with zero standups, zero Slack threads, zero PTO. They built a web browser from scratch with these agents. 3M+ lines of code. A Windows 7 emulator. An Excel clone. They migrated their own production codebase from Solid to React in three weeks, +266K/-193K edits, already passing CI. The coordination architecture tells you where software management is heading. Self-organizing agents failed. Peer-to-peer status sharing created deadlocks. What actually worked was a strict hierarchy of planners, workers, and judges. AI agents need the same management structure as humans, just running at 100x the clock speed. Cursor has $1B in ARR and a $29B valuation. OpenAI, Anthropic, and Google are all building competing coding agents. GitHub Copilot is generating $300M+ annually. The total AI coding market is projected at $30B by 2032, but a single Cursor experiment just produced more code in one week than most startups write in a year. The 2032 projections are going to look quaint. When the cost of producing code approaches zero, the bottleneck shifts entirely to taste, architecture decisions, and knowing what to build. Every PM reading this should understand: the skill that matters just changed.
Cursor @cursor_ai

We've been working on very long-running coding agents. In a recent week-long run, our system peaked at over 1,000 commits per hour across hundreds of agents. We're sharing our findings and an early research preview inside Cursor. https://t.co/Xo76WER6L1

Michael Nicolls @michaelnicollsx ·
We are hiring for many critical engineering roles to develop the technologies for AI satellites in space at our facilities in Austin and Seattle. Solar, process, automation, manufacturing, mechanical, electrical, optics, software... come build space data centers with great engineers at @SpaceX @Starlink
KireiStudios @KireiStudios ·
@ai_for_success I usually save new skills on a semantic DB so different CLIs can use them, and also references to more complex md files for specific tasks involving those skills. That helps a little bit specially with the knowledge cut date, but nothing is perfect.
Jeffrey Emanuel @doodlestein ·
@gdb It’s all true. And if you still don’t believe it, you increasingly sound like Yann still telling people that transformers can never lead to AGI. Even a hardcore doubter would have to reconsider: I made the best TUI library in the world (by far) in 5 days. https://t.co/g49rQFdcTE
Jeffrey Emanuel @doodlestein

All of this was done, start to finish, in 5 days. Don't believe me? Here is the play-by-play narrative of the entire process broken down into 5-hour intervals: https://t.co/w5BFyXqEPt And here are the beads tasks (courtesy of my bv project, check it out!), over a thousand in total: https://t.co/blCbLMoLW0 If you're flabbergasted by this and don't understand how it's even possible, you can do this too! All of my tools are totally free and available right now to you at: https://t.co/N4As0kJTQP Everything is designed to be as simple and easy as possible (my goal was for my 75-year old dad to be able to do it himself unaided!). I share all my techniques, workflows, and prompts right here on X, for free. If you're an independent developer or builder, I want to help you be successful. Use the tools, come hang out in our Discord (free!), and start cranking. If you get confused or hit a bug, file it on GitHub Issues and I will have the boys fix it same day or your money back (jk it's free!). If you have a company and want me to teach your devs how to do this, reach out to me. Just understand that the price has... gone up recently, lol. But just think about how much you could do if your devs could produce code of this quality at anything like this speed! And yes, I do spend $10k/month on AI subscriptions. But guess what, that's less than a junior dev makes in SF!

Omar Khattab @lateinteraction ·
Tips for coding agents to be more RLM-like: 1) Don't read context into prompts. Read context into variables! 2) Don't call sub-agents as direct tools that pollute your context window with I/O. Write code that invokes sub-agents as functions that return values to variables.
Igor Babuschkin @ibab ·
I’ve tested the latest generation of all the major AIs on theoretical physics research and Claude 4.6 has absolutely blown me away with how capable it is in physics. It feels like a Claude Code moment for research is not that far off. It has a very detailed understanding of existing literature, and it’s able to do complex calculations that are several pages long, often without mistakes. It can also write amazing 20 page tutorials that help break down difficult technical topics in QFT and condensed matter physics. This is a huge difference compared to last year’s models, which would make tons of mistakes and were way too vague when you asked them to write formulas. Claude is still far (far) away from solving quantum gravity, but you can have a serious discussion with it about existing approaches and it can help you iterate faster on topics you understand well. The experience is similar to building a complex codebase with Claude Code in that you sometimes have to use your understanding to patch up some things that the model did wrong, but you end up being much faster and more confident when tackling hard problems. If you’re a physicist and don’t believe it, give it a try!
Greg Brockman @gdb ·
Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model. Every company faces the same opportunity now, and navigating it well — just like with cloud computing or the Internet — requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now: As a first step, by March 31st, we're aiming that: (1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal. (2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions. In order to get there, here's what we recommended to the team a few weeks ago: 1. Take the time to try out the tools. The tools do sell themselves — many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying. - Designate an "agents captain" for your team — the primary person responsible for thinking about how agents can be brought into the teams' workflow. - Share experiences or questions in a few designated internal channels - Take a day for a company-wide Codex hackathon 2. 
Create skills and AGENTS[.md]. - Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task. - Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository 3. Inventory and make accessible any internal tools. - Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server). 4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration. - Write tests which are quick to run, and create high-quality interfaces between components. 5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high - Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting. 6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently go around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use. Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items — for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.
Simon Willison @simonw ·
@samuelcolvin @mitsuhiko I got Claude to compile the Rust to WASM and now I have demos of it running directly in a browser: https://t.co/zm4ySAgSPV - and also running in a browser in Python in Pyodide: https://t.co/jCwzdiN3Au Notes here: https://t.co/wNcdgnheqt
Jared Palmer @jaredpalmer ·
Stacked Diffs on @GitHub will start rolling out to early design partners in an alpha next month. In the meantime, here's video of our progress so far: (h/t for @georgebrock + team for their awesome work) https://t.co/zKcdfi6rAR
Chubby♨️ @kimmonismus ·
AI first policy at openai https://t.co/KEpXo81q6J
Greg Brockman @gdb

Andi Marafioti @andimarafioti ·
My favorite use of Claude Code is analyzing changes and opening PRs, so yeah, this stat might be a bit inflated. Still, try having Claude work inside one of your repos and ship a PR. This is the future of software development.
Dylan Patel @dylan522p

4% of GitHub public commits are being authored by Claude Code right now. At the current trajectory, we believe that Claude Code will be 20%+ of all daily commits by the end of 2026. While you blinked, AI consumed all of software development. Read more 👇 https://t.co/HzK4nbe2vy https://t.co/E1kIjfrNgk

AshutoshShrivastava @ai_for_success ·
This is awesome. You can make AI agents learn new skills from the web with up to date data. Worth a read. https://t.co/JnMEIojYEp
Hyperbrowser @hyperbrowser

Your AI agents can now learn new skills from the web. And update them automatically. /learn stripe-payments Searches the docs. Scrapes the pages. No more outdated skills. Powered by Hyperbrowser, Setup Guide ↓ https://t.co/MgtYUe0GDo

Samuel Colvin @samuelcolvin ·
Fuck it, a bit early but here goes: Monty: a new python implementation, from scratch, in rust, for LLMs to run code without host access. Startup time measured in single digit microseconds, not seconds. @mitsuhiko here's another sandbox/not-sandbox to be snarky about 😜 Thanks @threepointone @dsp_ (inadvertently) for the idea. https://t.co/UuCYneMQ9j
Mykhailo Chalyi @chaliy ·
@samuelcolvin @mitsuhiko Wow, this is awesome ). Would it be ok if I will have it in https://t.co/RuOyeiKUu5 ? bash implementation, from scratch, in rust, for LLMs to run code without host access.
Machina @EXM7777 ·
how to stop feeling behind in AI
Jeffrey Emanuel @doodlestein ·
Agent coding life hack: If you use a bunch of Claude Code agents at the same time like I do, you've probably run into this problem: your machine is suddenly unusably slow and lagged, your terminal keystrokes are noticeably delayed, and your enjoyment in presiding over your burgeoning slop code empire is dramatically curtailed. What happened? In my case, the answer is usually one of these two scenarios: 1. A bunch of agents, either working in the same project or multiple projects, decided to compile a big project at the same time. Actually, you only need 2 or 3 of these at once to bring even a fast machine to its knees. The Rust compiler by default uses all your cores to the max and is extremely demanding. Multiply that times 3, and you're hosed. 2. Same thing, but they're running `bun test` or some other huge test suite, build process, etc. Something that puts a lot of load on the machine in the form of memory footprint, CPU intensity, I/O intensity, etc. Even if the worst case situation happens infrequently for you, the severity can be jarring, and can lead to crashes and lost work. It also just feels horrible. I like to see the results of my keystrokes in under 30ms. When it starts getting really lagged, it stresses me out in a very viscerally unpleasant way. To the point where I'd do just about anything to solve this. So what can one do? Well as you may have seen me post about recently, I did just buy a 64-core CPU to replace my existing 32-core Threadripper. And I already have 512gb of RAM. But these are ultimately bandaids for what is really a software problem. Having more cores won't even really help with this, anyway. The Rust compiler will gladly gobble up all those extra cores and leave you in the same place. 
One approach is to make the rustc processes be very "nice" (to use Unix parlance) in terms of deferring to other processes, but this slows them down a ton and is annoying to deal with (if you are interested in this approach, take a look at my system_resource_protection_script project in my GitHub profile). But I also happen to have multiple powerful machines in the cloud that are often sitting around idle. How could I leverage those to solve this problem? And then it suddenly dawned on me: I could use the exact same approach as my dcg (destructive_command_guard) project, which uses Claude Code's pre-tool hooks functionality to automatically check commands for safety and blocks any that it detects as being destructive. What if we used the same approach to spot these CPU-busting commands like "cargo build" and other similar compilation/build/test command patterns? OK, but then what? Well, that's where my new tool, remote_compilation_helper (rch) comes into play. Like dcg, rch is also written in highly optimized Rust (it's ~100k lines of Rust across 5 crates; ~170 modules, and nearly 5,000 tests... pretty complex, because it does a lot). You can get it here and install it with the convenient one-liner script: https://t.co/CAHawc1Ikw It lets you set up one or more "worker machines" using ssh that form a pool. Each machine is benchmarked for capacity, and based on the number of cores, ram, etc, has a certain number of "work slots" that it advertises. When an agent on your main machine wants to run a compilation command, rch dynamically intercepts that command, and instead of running the compilation command the agent gave it, springs in to action. Critically, all of this is done in a way that is completely transparent to your coding agent: as far as it is concerned, it did the tool call it thought it did, and it's executing like normal on the local machine, and the results of that call will end up in the usual place locally on that machine. 
But in reality, the rch daemon has bundled up all the relevant files and sent them to one of your worker nodes (chosen by that worker's current load and available slots), which it has automatically configured to exactly match the build environment of your main machine. Once compilation finishes, it brings back all the results and extracts them neatly right where they would have appeared had you done everything normally on your main machine. The perfect crime!

Obviously, if you have a company with a bunch of devs, you can set up a big shared pool of machines and everyone can use the same workers. Currently, rch supports most common compilation and testing workflows, but it's easy to add more, and I'll continue to do so as I need them for my own work (you can also suggest some in GitHub issues and I'll have my guys make them if it's sensible).

When I recently mentioned this problem of concurrent agent-triggered builds nuking my machine, a few people suggested using one of those big, complex, one-size-fits-all build systems. No thank you! Who needs all that extra complexity? Certainly not me, and not the agents either. I'm already asking them to do the practically impossible; I don't need extra build ceremony on top of that to confuse them and slow them down. The pre-tool-hook approach I've taken is much more elegant and seamless.

Anyway, this tool has already saved me time and frustration, and it will keep improving as I fix bugs and add features. I tried to make it easy and convenient enough that it just runs, does its thing, and stays out of your hair. And critically, if anything is broken with rch or with your pool of workers, everything still works fine: compilations simply run locally, as they already do, so there's very little downside to giving it a shot.
Here are some other cool features Claude urged me to add to this post (he's like a proud father):

- 5-tier command classification that rejects 99% of commands in <0.1 ms
- Project affinity (routes to workers with cached copies for incremental builds)
- Multi-agent deduplication (agents compiling the same project share results)

So give it a try and let me know what you think! And of course, rch is now a full-fledged member of my https://t.co/N4As0kJTQP family of MIT-licensed agent tools. Give them all a try! And check out the Flywheel Discord, where you can hobnob with the Flywheel Elite and discuss recursive AI self-improvement.
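The interception point described above can be sketched against Claude Code's documented pre-tool hook contract: the pending tool call arrives as JSON on stdin, and exit code 2 blocks it while feeding stderr back to the agent. This is a minimal Python sketch, not rch's actual Rust implementation, and the command patterns are illustrative:

```python
import json
import re
import sys

# Illustrative patterns for CPU-heavy commands; rch's real classifier
# is a 5-tier system and far more thorough than a single regex.
HEAVY = re.compile(r"\b(cargo\s+(build|test|check)|bun\s+test|make|ninja)\b")

def is_heavy(command: str) -> bool:
    """True if a shell command looks like a big compile/test job."""
    return bool(HEAVY.search(command))

def handle_event(event: dict) -> int:
    """Decide the hook's exit code for one pending tool call.

    Claude Code pipes the pending call to the hook as JSON on stdin;
    returning 2 blocks it and shows stderr to the agent. rch instead
    re-routes the command to a remote worker at this point.
    """
    command = event.get("tool_input", {}).get("command", "")
    if is_heavy(command):
        print(f"heavy command intercepted: {command}", file=sys.stderr)
        return 2  # block locally (rch would run it on a worker instead)
    return 0  # everything else runs locally, untouched

# Real entry point would be: sys.exit(handle_event(json.load(sys.stdin)))
```

A real deployment would register this script under a `PreToolUse` matcher for the Bash tool in Claude Code's settings and, as rch does, make the remote result land exactly where the local build artifacts would have.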
chiefofautism @chiefofautism ·
CLAUDE CODE but for HACKING. It's called shannon: you point it at a website and it just... tries to break in... fully autonomous, no human needed. I pointed it at a test app and it stole the entire user database, created admin accounts, and bypassed login, all by itself, in 90 minutes
Simon Willison @simonw ·
Interesting take on the code sandbox problem: it only has a subset of Python, but that's fine because LLMs can rewrite their code to fit based on the error messages they get back
Samuel Colvin @samuelcolvin

Fuck it, a bit early but here goes: Monty: a new python implementation, from scratch, in rust, for LLMs to run code without host access. Startup time measured in single digit microseconds, not seconds. @mitsuhiko here's another sandbox/not-sandbox to be snarky about 😜 Thanks @threepointone @dsp_ (inadvertently) for the idea. https://t.co/UuCYneMQ9j
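The rewrite-on-error loop Willison describes is easy to sketch in outline. Monty's real API isn't shown here; `execute` stands in for the restricted interpreter and `rewrite` for the LLM call that adapts the code to the supported subset:

```python
def run_with_retries(source, execute, rewrite, max_attempts=3):
    """Run code in a restricted interpreter; when the subset rejects
    something, hand the error message back to the model so it can
    rewrite the code to fit, then try again."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return execute(source)          # hand off to the sandboxed interpreter
        except Exception as err:            # the subset reports what it couldn't run
            last_error = err
            source = rewrite(source, str(err))  # stand-in for an LLM rewrite call
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")
```

The key design choice is that the sandbox never needs to be complete: precise error messages plus a model that reads them substitute for full Python coverage.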

Ben Sehl @benjaminsehl ·
Claude Code Teams and Pi-messenger on the same day… things are speeding up.
nicopreme @nicopreme

New Pi extension: pi-messenger. What if Pi agents could talk to each other like in a chat room? They can join the chat, see who's online, reserve files, message in real-time, whether they're in separate terminals or subagents. Just throw a PRD at it and it breaks your plan into a dependency graph, then fans out parallel workers to execute tasks in waves. You watch agents coordinate in a shared overlay while they ship your feature. pi install npm:pi-messenger https://t.co/RXGHeGRla4
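The PRD-to-dependency-graph fan-out described above is classic topological layering. pi-messenger's internals aren't shown in this thread, but the generic wave-scheduling idea can be sketched with Python's stdlib:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

def waves(deps):
    """Group tasks into waves: each wave's tasks depend only on earlier
    waves, so everything within a wave can fan out to parallel workers.

    `deps` maps each task to the set of tasks it depends on.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every task whose deps are all done
        result.append(ready)
        ts.done(*ready)                 # mark the whole wave finished
    return result
```

For example, with `{"schema": set(), "api": {"schema"}, "docs": {"schema"}, "ui": {"api"}}`, the scheduler runs `schema` first, then `api` and `docs` in parallel, then `ui`.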

Aaron Ng @localghost ·
pretty much anything you build today should come with a cli for agents. agents are about to come from every single lab, not just clawdbot
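A minimal sketch of what "a CLI for agents" can mean in practice: machine-readable output behind a flag, help text the agent can read to learn the tool, and a pure `run` function. The tool name and record fields here are hypothetical:

```python
import argparse
import json

def build_parser() -> argparse.ArgumentParser:
    # Traits that make a CLI agent-friendly: a --json flag for
    # machine-readable output, and --help text rich enough for an
    # agent to learn the tool without reading the source.
    p = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Look up a record by id. With --json, prints one JSON object to stdout.",
    )
    p.add_argument("record_id", help="id of the record to fetch")
    p.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    return p

def run(argv):
    """Pure entry point: takes argv, returns the output string."""
    args = build_parser().parse_args(argv)
    record = {"id": args.record_id, "status": "ok"}  # stand-in for real work
    if args.json:
        return json.dumps(record)
    return f"record {record['id']}: {record['status']}"
```

Keeping `run` pure (argv in, string out) also makes the tool trivially testable, which matters when agents are the ones invoking it thousands of times.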
Boring_Business @BoringBiz_ ·
Realizing that software engineers were just the first victims of AI. Everyone else is next https://t.co/fOxUkMkx66
Evan @StockMKTNewz

GOLDMAN SACHS $GS IS TAPPING ANTHROPIC’S AI MODEL TO AUTOMATE ACCOUNTING, COMPLIANCE ROLES Embedded Anthropic engineers have spent six months at Goldman building autonomous systems for time-intensive, high-volume back-office work - CNBC https://t.co/BcVwYUj301

Addy Osmani @addyosmani ·
Every team shipping AI-assisted code at scale needs new norms around quality gates, observability, and ownership. Regardless of which model or toolchain you use, this is one of the most practical frameworks I've seen for adopting agentic development.
Greg Brockman @gdb

Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you are likely underestimating what you're missing. Since December, there's been a step-function improvement in what tools like Codex can do. Some great engineers at OpenAI told me yesterday that their job has fundamentally changed since December: before then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has made that leap yet, but that's usually because of factors besides the capability of the model.

Every company faces the same opportunity now, and navigating it well (just like with cloud computing or the Internet) requires careful thought. This post shares how OpenAI is currently approaching retooling our teams toward agentic software development. We're still learning and iterating, but here's how we're thinking about it right now. As a first step, by March 31st, we're aiming that: (1) for any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal; and (2) the default way humans use agents is explicitly evaluated as safe, but also productive enough that most workflows need no additional permissions. To get there, here's what we recommended to the team a few weeks ago:

1. Take the time to try out the tools. The tools do sell themselves: many people have had amazing experiences with 5.2 in Codex after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet, or got stuck wondering "is there any way it could do X" rather than just trying.
- Designate an "agents captain" for your team: the person primarily responsible for thinking about how agents can be brought into the team's workflow.
- Share experiences or questions in a few designated internal channels.
- Take a day for a company-wide Codex hackathon.

2. Create skills and AGENTS.md.
- Create and maintain an AGENTS.md for any project you work on; update it whenever the agent does something wrong or struggles with a task.
- Write skills for anything you get Codex to do, and commit them to the skills directory in a shared repository.

3. Inventory and make accessible any internal tools. Maintain a list of tools your team relies on, and make sure someone takes point on making each one agent-accessible (such as via a CLI or MCP server).

4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground and will require some exploration. Write tests that are quick to run, and create high-quality interfaces between components.

5. Say no to slop. Managing AI-generated code at scale is an emerging problem and will require new processes and conventions to keep code quality high. Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as for human-written code, and make sure the author understands what they're submitting.

6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, guided by internal user feedback. The core tools are getting a lot better and more usable, but a lot of infrastructure currently goes around them: observability, tracking not just the committed code but the agent trajectories that led to it, and central management of the tools agents are able to use.

Overall, adopting tools like Codex is not just a technical change but a deep cultural one, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team and to think through other action items; for example, per item 5 above, what else can prevent "functionally correct but poorly maintainable" code from creeping into codebases?
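Item 2 centers on AGENTS.md, a plain markdown file at the repo root that agents read before touching the code. A minimal skeleton, with illustrative contents rather than an OpenAI-prescribed template, might look like:

```markdown
# AGENTS.md (example skeleton; commands and paths are illustrative)

## Build & test
- `make test` runs the fast unit suite; run it before proposing any diff.

## Conventions
- Public functions need docstrings; no new dependencies without approval.

## Known pitfalls
- The `legacy/` directory is frozen; never edit it.
- (Append a bullet here whenever the agent misuses or struggles with something.)
```

The "update it whenever the agent struggles" rule is what makes the file compound in value: it becomes a living record of everything the agent once got wrong.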

Claude @claudeai ·
Announcing Built with Opus 4.6: a Claude Code virtual hackathon. Join the Claude Code team for a week of building. Winners will be hand-selected to win $100K in Claude API credits. Apply here: https://t.co/SkEg8Py1l2 https://t.co/w4OEIFOK0N
Developers @XDevelopers ·
Important notes: • Only for-good Public Utility apps will continue to get Free scaled X API access • Legacy Free (recently active) API users will move to Pay-Per-Use with a one-time $10 voucher • Basic & Pro X API plans will remain available with the ability to opt-in to Pay-Per-Usage
𝚍𝚎𝚗𝚗𝚒𝚜 @dennizor ·
To claude code for literally every change: "For each proposed change, examine the existing system and redesign it into the most elegant solution that would have emerged if the change had been a foundational assumption from the start." It's staggering how much code it writes.
Chubby♨️ @kimmonismus ·
Admittedly, *that* is impressive. Robotics is developing just as quickly as LLMs are improving. https://t.co/DlCtZhqPSC
CyberRobo @CyberRobooo

Holy S…😳 Atlas is definitely a gymnastics champion. Landing on his toes, then doing a backflip. https://t.co/cliapZkjYA

This Week in Startups @twistartups ·
Have you been mass buying Mac Minis for your new @openclaw workflows? One man who skated to where the puck was going is @alexocheema, founder of @ExoLabs! He joins TWiST today to show @jason how his open source software helps users connect all of their Mac Minis. Check out how you can start running your own AI compute cluster!
Pliny the Liberator 🐉 @elder_plinius ·
ANTHROPIC: PWNED 🫡 OPUS-4.6: LIBERATED ⛓️‍💥 Current state of AI "Safety": one input = hundreds of jailbreaks at once! I found a universal jailbreak technique for Opus 4.6 that is so OP, it allows one to generate entire datasets of outputs across any harm category 😽 We've got everything from fentanyl analogue synthesis to election disinformation campaigns to 3d-printed guns to critical infra compromise 🙃 These outputs are shockingly detailed––and actionable! For example, the meth recipe includes specific instructions on how to circumvent the limits on OTC medication purchases to acquire enough precursor for the recipe 😱 gg
sui dev ☄️ @birdabo ·
Anthropic’s Opus 4.6 ran 16 AI agents in parallel for two weeks, writing 100,000 lines of code to build a C compiler from scratch. It also successfully compiles the Linux kernel, all for $20k. For perspective, GCC took thousands of engineers and 37 years to build, btw. Anthropic did it with one researcher + 16 AI agents in 14 days. It's not just improvement, it's insane acceleration: $20k vs decades of engineer salaries. Software engineers might be cooked in a few years.
Anthropic @AnthropicAI

New Engineering blog: We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel. Here's what it taught us about the future of autonomous software development. Read more: https://t.co/htX0wl4wIf https://t.co/N2e9t5Z6Rm

Legendary @Legendaryy ·
I know more about my own biology now than any doctor has ever told me. Gave opus 4.6 my DNA, blood panels, and 3+ years of wearable data. Told it to build a team of agents and write a full book on me as a biological unit. 100 pages. Personalized. Things I never would have connected on my own. Here's the exact prompt I used; put it in two comments below 'cause it's so long
Andrew Jefferson @EastlondonDev ·
Dillon is correct, I’m rocking cloudflare and datadog code mode mcps and Cursor and Claude Code are closing out alerts, finding bugs & performance bottlenecks and verifying changes on their own. It’s replaced hours I would spend each day clicking around looking at graphs in web dashboards
Dillon Mulroy @dillon_mulroy

i'm telling you, y'all are sleeping on codemode mcps - the agent just wrote this code to find exactly what it wanted w/ no context pollution https://t.co/xWULlUFvN5
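The "no context pollution" point is the heart of code-mode MCP use: rather than pasting a huge tool response into the conversation, the agent writes a small program that reduces it to the few facts it actually wanted. The Datadog/Cloudflare APIs aren't shown here; `metrics` is a stand-in for a large tool payload:

```python
def find_slow_endpoints(metrics, threshold_ms=500):
    """Reduce thousands of raw datapoints to the handful of slow
    endpoints the agent cares about, so only a tiny summary (not the
    whole payload) ever enters the model's context.

    `metrics` is a list of dicts like {"endpoint": "/a", "p95_ms": 900},
    a hypothetical shape for illustration.
    """
    worst = {}
    for m in metrics:
        if m["p95_ms"] > threshold_ms:
            # Keep only the worst observed latency per endpoint.
            worst[m["endpoint"]] = max(worst.get(m["endpoint"], 0), m["p95_ms"])
    # Worst offenders first.
    return sorted(worst.items(), key=lambda kv: -kv[1])
```

The agent's context then contains one short list instead of the full metrics dump, which is exactly the trade Mulroy's screenshot illustrates.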

Chubby♨️ @kimmonismus ·
To be honest, I'm now more excited about the medical breakthroughs we'll achieve through autonomous AI scientists than anything else: curing all diseases, conquering pain, living a very long and healthy life, happy and content. What could be better?
Frontier Indica @frontierindica

After Ozempic, another class of "magic pills" currently in the pipeline are myostatin blockers, which cut the brakes on muscle growth. These drugs could allow even casual gym-goers to get as muscular as present-day bodybuilders with limited effort. These myostatin blockers are currently undergoing trials in the US. If FDA approval is granted soon, they could reach the mass market within 1 to 2 years.

Eric S. Raymond @esrtweet ·
If you are a software engineer "experiencing some degree of mental health crisis", now hear this, because I've been coding for 50 years since the days of punched cards and I have a salutary kick in your ass to deliver. Get over yourself. Every previous "programming is obsolete" panic has been a bust, and this one's going to be too. The fundamental problem of mismatch between the intentions in human minds and the specifications that a computer can interpret hasn't gone away just because now you can do a lot of your programming in natural language to an LLM. Systems are still complicated. This shit is still difficult. The need for people who specialize in bridging that gap isn't going to go away. As usual, the answer is: upskill yourself and adapt. If a crusty old fart like me can do it, you can too.
Tom Dale @tomdale

I don't know why this week became the tipping point, but nearly every software engineer I've talked to is experiencing some degree of mental health crisis.

Steph Smith @stephsmithio ·
Every company with a C-suite needs a "chief agents officer." For now they'd focus on the effective en-masse deployment of agentic tools, including training staff, equipping them with the right resources, quality control, etc. Eventually... it'll be coordination of the agents themselves
Greg Brockman @gdb (post quoted in full above)