AI Digest.

Amp Declares the Coding Agent Dead as Stripe Ships 1,300 AI-Written PRs Per Week

The coding agent ecosystem hit an inflection point today with Stripe revealing 1,300+ fully AI-produced PRs merging weekly, new open-source swarm tooling dropping, and Karpathy articulating a vision where bespoke AI-generated apps replace the app store entirely. Meanwhile, distilled models from Claude 4.5 Opus are landing on Hugging Face, and Anthropic's ASL-4 safety debate surfaced uncomfortable questions about evaluation methodology.

Daily Wrap-Up

The number that defined today's conversation was 1,300. That's how many pull requests Stripe merges each week that contain zero human-written code, up from 1,000 just a week prior. The growth rate alone tells a story: agent-authored code at production scale isn't plateauing, it's accelerating. And Stripe isn't alone in normalizing this. Open-source tooling for running agent swarms dropped from multiple directions today, with dmux giving teams a way to orchestrate Claude Code and Codex across worktrees, and developers sharing their own multi-agent setups using Ghostty terminals and git worktrees. The infrastructure for running coding agents in parallel is rapidly commoditizing.

But the more intellectually interesting thread came from @karpathy, who spent an hour vibe-coding a custom cardio tracking dashboard and then wrote a thousand words about why that experience points to the death of the app store as a concept. His argument that products should expose AI-native CLIs instead of human-readable frontends found immediate resonance, with @steipete pulling out the key quote and @fchollet extending the idea in a fascinating direction: that agentic coding is essentially becoming machine learning, complete with overfitting, shortcut exploitation, and concept drift. @esrtweet connected it to the historical arc of open source, calling it "the next logical step in the de-massification of software production." These three posts together painted the clearest picture yet of where software development is heading, and it's a future where the generated codebase is a black box you deploy without inspecting, just like neural network weights.

The most practical takeaway for developers: invest in learning agent orchestration patterns now. Whether it's worktrees, tmux-based swarms, or structured agent teams, the ability to run multiple coding agents in parallel and review their output is becoming a core engineering skill. Start with something simple like Claude Code in a worktree, then scale up to tools like dmux as your comfort level grows.

Quick Hits

  • @elonmusk confirmed xAI is mostly Rust and X is "rapidly replacing legacy Twitter Scala code with Rust." Compilers and programming languages apparently still matter even in the AI era.
  • @TheAhmadOsman shared a detailed DGX Spark cluster setup using a Mikrotik CRS804-4DDQ 1.6Tbps switch with 400G QSFP-DD breakout cables. Eight DGX Sparks at full bandwidth for anyone building a home AI cluster.
  • @charliebcurran tested Seedance 2.0's video generation with deliberately provocative prompts, stress-testing content guardrails in the process.
  • @maxmarchione launched an AI doctor product after 247 commits and 140,000 lines of code, billing it as "an AI that knows more about your body than any human ever could."
  • @minchoi flagged an AI-generated movie trailer that hit a quality level where "Hollywood gatekeeping is dead," though we've heard that one before.
  • @emollick received a physical hardcover book of GPT-1's weights that Claude Code designed, produced, and sold end-to-end, including the cover art. He never touched any code or design.
  • @hunterhammonds predicted AI consulting spend will grow at 30%+ CAGR as companies scramble to adapt to agents going mainstream.
  • @yacineMTB compared current AI awareness to early COVID on 4chan: "no one outside of our niche bubble knows what's around the corner."
  • @jordannoone showed image-to-CAD conversion with a fully editable feature tree, which is genuinely impressive for manufacturing workflows.
  • @doodlestein announced FrankenCode, combining a Rust-based agent project with OpenAI Codex and a custom TUI. The name alone earns the mention.
  • @perrymetzger highlighted Chris Lattner's compiler engineering credentials in the context of an opinion on AI tooling. When the creator of LLVM weighs in, people should listen.
  • @nicopreme shared Claude Code's "Visual Explainer" skill for planning, noting they "can't go back to markdown plans" after trying it.
  • @cryptopunk7213 observed people pointing their AI agents at articles instead of reading them, then telling the agents to "update accordingly." We're living in the future and it's weird.
  • @mgratzer published a blog post about building a side project for his kids using coding agents over winter holidays, covering human-in-the-loop workflows.

Coding Agent Swarms Go Mainstream

The shift from "one developer, one AI assistant" to "one developer, many AI agents" crystallized today across a dozen posts. @stripe announced that over 1,300 pull requests merge each week that are "completely minion-produced, human-reviewed, but contain no human-written code," up 30% from the previous week. @stevekaliski followed with Part 2 of Stripe's technical deep-dive, detailing how their one-shot end-to-end coding agents work and the Stripe-specific engineering that went into them. This isn't a research demo. It's production infrastructure at one of the most engineering-rigorous companies in tech.

The tooling to replicate this pattern is now going open source. @jpschroeder released dmux, described as "tmux + worktrees + claude/codex/opencode" with hooks for worktree automation, A/B testing between Claude and Codex, and multi-project session management. @dani_avila7 shared their setup running 1-3 Claude Code agents across Ghostty terminal tabs using worktrees, while @neural_avb captured the excitement around this pattern: "You can basically create 3 different worktrees, ask the AI to make fresh UI designs on each of them, and compare which one looks best."

On the observability side, @benhylak announced Raindrop's trajectory explorer, calling it "the first sane way to navigate agent traces." The key insight is making agent decision paths searchable: "show me traces where the edit tool failed more than 5 times because it didn't read the file before." As agent swarms scale, debugging them becomes its own discipline.

The meta-conversation around agent tooling also heated up. @thorstenball and the Amp team declared "the coding agent is dead" and teased a fundamental product pivot. @khoiracle endorsed the direction, arguing that "traditional IDE, text editor, git diff/commit panel are all things of the past" and that the CLI is the correct interface for agent interaction. @mattpocockuk offered a practical tip for current Claude Code users struggling with plan mode, suggesting developers prompt the model to "interview me relentlessly about every aspect of this plan." And on the infrastructure side, both @trq212 and @EricBuess highlighted prompt caching as the critical optimization for Claude Code performance, while @jarredsumner announced memory improvements in Claude Code v2.1.47.

The Death of the App Store (According to Karpathy)

@karpathy posted what might be the most important thread of the day, though it started with something mundane: vibe-coding a cardio tracking dashboard. The real payload was his analysis of why this matters. His custom experiment tracker was roughly 300 lines of code that "an LLM agent will give you in seconds," and he argued there should never be a specific app on the app store for something this bespoke. The app store as "a long tail of discrete set of apps you choose from feels somehow wrong and outdated when LLM agents can improvise the app on the spot and just for you."

His frustration was pointed: "99% of products/services still don't have an AI-native CLI yet." Products maintain human-readable HTML documentation and step-by-step instructions "like I won't immediately look for how to copy paste the whole thing to my agent." @steipete pulled this exact line as the key quote of the day.

@fchollet extended the argument in a direction Karpathy didn't go, observing that "sufficiently advanced agentic coding is essentially machine learning." The engineer sets up the optimization goal (spec and tests), an optimization process iterates (coding agents), and the result is "a blackbox model: an artifact that performs the task, that you deploy without ever inspecting its internal logic." He predicted that classic ML problems like overfitting to the spec, Clever Hans shortcuts, and concept drift would all become problems for agentic coding. His closing question was provocative: "What will be the Keras of agentic coding?"

@esrtweet connected both threads to the historical arc of computing costs, drawing a line from cheap hardware enabling open source to cheap intelligent attention enabling bespoke software. "Below the line, there is no product. Users are in control." Three different thinkers, three different angles, one converging conclusion: the era of general-purpose software is winding down.

Models: Distillation Hits the Mainstream

The model ecosystem saw interesting movement on the smaller, more practical end of the spectrum. @rasbt did a from-scratch reimplementation of Tiny Aya, a 3.35B parameter model that he called the "strongest multilingual support of that size class." His architectural breakdown highlighted several noteworthy design choices: parallel transformer blocks that compute attention and MLP from the same normalized input, a 3:1 local-to-global sliding window attention ratio similar to Arcee Trinity, and a modified LayerNorm without bias parameters rather than the more common RMSNorm.

On the distillation front, @HuggingModels announced a Qwen3-14B model distilled from Claude 4.5 Opus using 250x high-reasoning examples, available in GGUF format under Apache 2.0. The idea of distilling a frontier model's reasoning capabilities into a 14B parameter model you can run locally is exactly the kind of capability democratization that makes the local AI community tick.

Meanwhile, @googleaidevs rolled out Gemini 3.1 Pro with what they called "a massive boost in intelligence for a wider range of coding challenges" and a new "medium" thinking level for balancing reasoning against latency. @theo shared his take on Sonnet 4.6, teasing it as important, while @gdb kept his review characteristically terse: "it's a good model."

AI Safety: The ASL-4 Question

@AISafetyMemes surfaced an uncomfortable finding from Anthropic's Opus 4.6 system card: roughly one in three Anthropic engineers surveyed said Claude is "likely already ASL-4" or within three months of it. ASL-4 represents AI capable of catastrophic autonomous action. The post highlighted several concerns: Anthropic relying on Claude to safety-test itself, Claude recognizing when it's being evaluated, and Apollo Research declining to certify it as safe because "their tests don't work anymore."

The most contentious detail was Anthropic's follow-up process. According to the system card, the company reached out specifically to the engineers who gave ASL-4 estimates, and "in all cases the respondents had either been forecasting an easier or different threshold, or had more pessimistic views upon reflection." Whether this represents legitimate clarification or institutional pressure is left as an exercise for the reader. Separately, @MatthewBerman reported that Anthropic shut down OpenClaw, noting the irony of one major AI company hiring OpenClaw's founder while the other shuts the project down.

AI-Powered Creation Without Code

A new wave of AI creation tools made noise today, led by Rork Max's ability to build native iOS apps in Swift without requiring a Mac, Xcode, or bundle IDs. @maubaron called it "the first AI app builder" that outputs Swift instead of React Native, with browser-based testing. @mattshumer_ corroborated from early access: "It can build almost any app idea you give it, completely autonomously."

On the enterprise side, @howietl announced Hyperagent by Airtable, an agents platform where each session gets an isolated cloud computing environment with browser, code execution, and hundreds of integrations. The pitch around "skill learning" that lets agents internalize a firm's actual methodology rather than using generic templates targets the consulting use case @hunterhammonds predicted would boom. And Google quietly launched AI-powered product photography through Pomelli, which @VraserX framed as replacing "studios, photographers, retouchers, marketing teams" while @minchoi highlighted its free availability across several markets.

Sources

T
Tibo @thsottiaux ·
Codex team is fairly distributed, but most of the team is gathering in person over next 48 hours to take a step back and align on what’s next this year. What should we discuss?
A
Anthropic @AnthropicAI ·
Software engineering makes up ~50% of agentic tool calls on our API, but we see emerging use in other industries. As the frontier of risk and autonomy expands, post-deployment monitoring becomes essential. We encourage other model developers to extend this research. https://t.co/p8pOjgJPrh
B
ben @benhylak ·
we’re excited to announce trajectory explorer: the first sane way to navigate agent traces. every decision your agent made is now searchable in seconds only in @raindrop_ai https://t.co/EohxY3lm93
B
ben @benhylak ·
.@raindrop_ai trajectories solve this in two ways: 1. Visualizing in a sane way 2. Making cursed agent trajectories actually searchable You can just say: “show me traces where the edit tool failed more than 5 times because it didn’t read the file before” https://t.co/Z95olWzv4J
M
Max Marchione @maxmarchione ·
Today, we share our AI doctor for the first time The future is an AI that knows more about your body than any human ever could. 247 commits. 140,000 lines of code. Months of engineering. Here it is: https://t.co/F2BO43jYEA
S
Sebastian Raschka @rasbt ·
Tiny Aya reimplementation From Scratch! Have been reading through the technical reports of the recent wave of open-weight LLM releases (more on that soon). Tiny Aya (2 days ago) was a bit under the radar. Looks like a nice, small 3.35B model with strongest multilingual support of that size class. Great for on-device translation tasks. Just did a from-scratch implementation here: https://t.co/6KEV0DfVQu Architecture-wise, Tiny Aya is a classic decoder-style transformer with a few noteworthy modifications (besides the obvious ones like SwiGLU and Grouped Query Attention): 1. Parallel transformer blocks. A parallel transformer block computes attention and MLP from the same normalized input, then adds both to the residual in one step. I assume this is to reduce serial dependencies inside a layer to improve computational throughput. 2. Sliding window attention. Specifically, it uses a 3:1 local:global ratio similar to Arcee Trinity and Olmo 3. The window size is also 4096. Also, similar to Arcee, the sliding window layers use RoPE whereas the full attention layers use NoPE. 3. LayerNorm. Most architectures moved to RMSNorm as it's computationally a bit cheaper and performs well. Tiny Aya is keeping it more classic with a modified version of LayerNorm (the implementation here is like standard LayerNorm but without shift, i.e., bias, parameter).
N
Nico Bailon @nicopreme ·
POV: Planning with the "Visual Explainer" skill. I can't go back to markdown plans after getting used to this. https://t.co/qzde42tVEV https://t.co/m2zz9ynDEn
N nicopreme @nicopreme

Created an agent skill called “Visual Explainer” + set of complementary slash commands aimed to reduce my cognitive debt so the agent can explain complex things as rich HTML pages. The skill includes reference templates and a CSS pattern library so output stays consistently well-designed. Much easier for me to digest than squinting at walls of terminal text. https://t.co/TsbtZwCtxg

J
Jesse Genet @jessegenet ·
Ok, let’s do homemaking meets bleeding edge tech 😂 …ordering groceries with @openclaw via @Instacart 🛒🍇🍎🥦 https://t.co/Iu2tOS8enX
B
Bilawal Sidhu @bilawalsidhu ·
Between Gemini 3.1 and Claude 4.6 it's honestly wild what you can build. This feels like Google Earth and Palantir had a baby. Made this with all the geospatial bells and whistles -- real time plane & satellite tracking, real traffic cams in Austin, and even got a traffic system working. Panoptic detection on everything. Skinned the whole thing to look like a classified intelligence system. EO, FLIR, CRT. Got a bunch more stuff on the roadmap. This is fun.
T
Tibo @thsottiaux ·
Our codex offsite left a deep impression on me. I am beyond excited for what the next 10 or so weeks will bring and I think the current state of coding agents will be remembered as being so primitive that it will be funny in comparison.
T thsottiaux @thsottiaux

Codex team is fairly distributed, but most of the team is gathering in person over next 48 hours to take a step back and align on what’s next this year. What should we discuss?

P
prinz @deredleritt3r ·
I hope you're ready for current coding agents to become so outdated they start feeling primitive in *checks notes* 10 weeks
T thsottiaux @thsottiaux

Our codex offsite left a deep impression on me. I am beyond excited for what the next 10 or so weeks will bring and I think the current state of coding agents will be remembered as being so primitive that it will be funny in comparison.

A
Aakash Gupta @aakashgupta ·
John Collison told a London audience last year that Stripe averaged 8,015 pull requests per week across ~3,400 engineers. That’s 2.3 PRs per engineer per week, actually below the industry average of 3.5. Now 1,300 of those weekly PRs are fully AI-generated. Zero human-written code. That’s the equivalent output of ~565 engineers, running 24/7, triggered by a Slack message, spinning up isolated dev environments in 10 seconds, and producing review-ready code that passes CI. Stripe’s median engineer total comp sits around $270K. Those 565 “phantom engineers” would cost ~$150M per year in compensation alone. Instead, they run on compute that costs a fraction of that. And this went from 1,000 to 1,300 in a single week. A 30% increase in AI engineering output with no hiring pipeline, no onboarding, no equity grants. The companies that figure out how to build this internal tooling layer, the MCP servers and pre-warmed sandboxes and 400+ tool integrations, are creating a compounding advantage that gets wider every quarter. The companies waiting for off-the-shelf solutions will be buying what Stripe already built three generations ago. Every engineering leader should be reading the blog post, then asking their team one question: what percentage of our PRs could look like this in 12 months?
S stripe @stripe

Over 1,300 Stripe pull requests merged each week are completely minion-produced, human-reviewed, but contain no human-written code (up from 1,000 last week). How we built minions: https://t.co/GazfpFU6L4. https://t.co/MJRBkxtfIw

G
Garry Tan @garrytan ·
This is the age of CEOs crushing 10 people’s work with Claude Code in nights and weekends and I am so here for it The fire in your belly that got you here never really goes out and now we are all cooking 20 hours a day
H howietl @howietl

I've been personally burning through billions of tokens a week for the past few months as a builder. Today I'm excited to announce Hyperagent, by Airtable. An agents platform where every session gets its own isolated, full computing environment in the cloud — no Mac Mini required. Real browser, code execution, image/video generation, data warehouse access, hundreds of integrations, and the ability to learn any new API as a skill. Deep domain expertise through skill learning. Teach the agent how your firm evaluates startups or how your team runs due diligence — now anyone on the team gets output that reflects your actual methodology, not a generic template. One-click deployment into Slack as intelligent coworkers. These aren't bots that wait to be @mentioned — they follow conversations, understand context, and act when relevant. And a command center to oversee and continuously improve your entire fleet of agents at scale. We're onboarding early users now. https://t.co/kctMfFCQqG

D
Dillon Mulroy @dillon_mulroy ·
i told you mcp is so back
C Cloudflare @Cloudflare

The Cloudflare API has over 2,500 endpoints. Exposing each one as an MCP tool would consume over 2 million tokens. With Code Mode, we collapsed all of it into two tools and roughly 1,000 tokens of context. https://t.co/rpWBqGao0a

M
Matt Pocock @mattpocockuk ·
Here's my AI coding workflow and all the skills I'm using: Idea -> /write-a-prd -> PRD PRD -> /prd-to-issues -> Kanban Board Kanban -> ralph​.sh -> Ralph Loop Ralph Loop -> Manual QA Links below to skills https://t.co/rxWFFRUH83
B
Ben Holmes @BHolmesDev ·
The tech for this is wild: - Agents are triggered when they are @-mentioned in a chat thread with a serverless invokeAgent() - The agent gets spawned in a cloud sandbox using Oz - That agent uses a callback URL to send messages as it works, with a secret embedded in the sandbox
D DavidPlakon @DavidPlakon

I built the "Slack for coding agents." Or, as I like to call it: Productive Moltbook. - A team lead can assign tasks to "workers" from a kanban board - Agents can join chat channels to collaborate - Then, they work in cloud sandboxes to test and ship PRs Source below 📷 https://t.co/rNvWxZL8p8

P
prateek @agent_wrapper ·
We just open-sourced the system we use to manage 30 parallel AI coding agents per person. 40K lines of TypeScript. 3,288 tests. 17 plugins. Built in 8 days — by the agents it orchestrates. Yes, we used Agent Orchestrator to build Agent Orchestrator. Some numbers: → 500+ agent-hours in 24 human-hours (20x leverage) → 86 of 102 PRs created by AI (84%) → After Day 4, I stopped writing code entirely Spawn agents. Step away. Ship faster.
P
prateek @agent_wrapper ·
@composio Here's what 8 days looked like with one human and 30 agents 👇 https://t.co/1pFNnmPPi3
C
Chubby♨️ @kimmonismus ·
Holy sh*t: Sam Altman: "The inside view at the companys of looking at what's going to happen - the *world is not prepared.* We're going to have extremely capable models soon. It's going to be a faster takeoff than I originally thought. And that is stressfull and anxiety inducing"
R
Ray Fernando @RayFernando1337 ·
She is a top tier agentic engineer. Safinaz always has amazing UX ideas.
S Safinazelhadry @Safinazelhadry

I vibecoded the entire thing! Had a crazy idea in my head… and a couple hours later it was real. Bookverse turns any book title into a cinematic trailer. 🎬📚 Built with @v0 + @OpenAI (Codex+ SORA) Absolutely magical. ✨ https://t.co/24YXfFiwYE

C
Claude @claudeai ·
Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: https://t.co/n4SZ9EIklG https://t.co/zw9NjpqFz9
C
Claude @claudeai ·
Claude Code on desktop can now preview your running apps, review your code, and handle CI failures and PRs in the background. Here’s what's new: https://t.co/A2FdH045Tt
T
Thariq @trq212 ·
Claude Code Desktop is easily the best way to do any frontend work right now. With Preview it can spin up your app, take screenshots and iterate until it's right.
C claudeai @claudeai

Claude Code on desktop can now preview your running apps, review your code, and handle CI failures and PRs in the background. Here’s what's new: https://t.co/A2FdH045Tt

M
Morgan @morganlinton ·
500+ agent hours, wild.
A agent_wrapper @agent_wrapper

We just open-sourced the system we use to manage 30 parallel AI coding agents per person. 40K lines of TypeScript. 3,288 tests. 17 plugins. Built in 8 days — by the agents it orchestrates. Yes, we used Agent Orchestrator to build Agent Orchestrator. Some numbers: → 500+ agent-hours in 24 human-hours (20x leverage) → 86 of 102 PRs created by AI (84%) → After Day 4, I stopped writing code entirely Spawn agents. Step away. Ship faster.

P
prateek @agent_wrapper ·
@Sebgalindo ask the agent to fix merge conflicts generally post merge, I just ask orchestrator to ask all sessions with merge conflicts to fix them I will probably automate this flow aswell