Stripe Reveals Internal Agent Fleet as Chrome WebMCP Opens Browsers to AI
Daily Wrap-Up
Today's feed was dominated by a single narrative told from multiple angles: agents are no longer a demo, they're the production architecture. Stripe pulled back the curtain on its internal agent system, revealing a fleet of "minions" that work through their massive Ruby monorepo. This follows Ramp's similar disclosure last week, and the pattern is unmistakable. S-class engineering teams are not waiting for off-the-shelf agent tools to mature. They're building custom systems tuned to their own codebases, dev environments, and workflows. Meanwhile, OpenAI shipped new API primitives for long-running agent work including server-side context compaction and networked containers, infrastructure that makes multi-hour agent runs viable at the platform level.
The second headline is Chrome's WebMCP announcement, which lets AI agents interact with websites through a structured protocol rather than by scraping or screen-reading. This is a genuine architectural shift: browsers are being redesigned as surfaces for both human and machine consumption. On the product side, Anthropic shipped Cowork for Windows with full macOS parity, while Obsidian released a CLI that makes the entire app available to agents. The tooling layer is converging fast.
The most entertaining moment came from @KaiLentit's observation that "AI models expire faster than session cache," which landed perfectly on a day when @craigzLiszt claimed the best engineers are already migrating from Claude to Codex. Whether that's true is debatable, but the velocity of tool switching is real. The most practical takeaway for developers: if you're at a company with more than 50 engineers, study what Stripe and Ramp are doing with internal agent systems. The pattern of Slack as entry point, repeatable dev environments, and MCP as the common language between agents is becoming the enterprise playbook, and you can start building your own version today, even if it's small.
Quick Hits
- @bnj unveiled Style Dropper for @variantui, a tool that absorbs the visual style of anything you point it at and applies it to your designs. Inspired by Kid Pix and MS Paint energy.
- @wintonARK argues space-based datacenter buildout has inverse cost scaling: the 100th orbital GW could cost a third of the first, unlike terrestrial deployments.
- @KaiLentit: "In 2026, AI models expire faster than session cache."
- @ns123abc highlights Isomorphic Labs' IsoDDE, an AI drug design system that doubles AlphaFold 3 on hard targets and is 20x better than Boltz-2 on antibodies.
- @steipete shares a Go explainer capturing why the language keeps gaining traction in the agent era.
- @EntireHQ raised a $60M seed round to build "the next developer platform," shipping their first OSS release the same day.
- @sammarelich dropped a new cold email template that's making the rounds.
- @every launched Every Events, a hub for AI learning through camps, courses, and demo days.
- @TheAhmadOsman claims a frontier open-source lab in the West will be born this year, teasing that it "started in a basement."
Agents Go Enterprise
The agent conversation shifted this week from "what can agents do?" to "how are serious teams actually deploying them?" @auchenberg broke down Stripe's approach: a homegrown agent that spins up "minions" to work through their massive monorepo, which is mostly Ruby with Sorbet typings, an uncommon setup that commercial LLMs aren't optimized for. This follows Ramp publishing details about their own internal agent last week, marking what @auchenberg called "a very interesting trend from S-class engineering teams."
@yenkel dug into the specifics, noting Stripe uses Slack as the main entry point, emphasizes repeatable dev environments, and has built custom tooling around their dev productivity stack. The key question @yenkel raised: "Since MCP is a common language for all agents at Stripe, not just minions, if those MCP servers hadn't been around, would you have gone more for CLIs?"
OpenAI is building the infrastructure layer to support this pattern. @OpenAIDevs announced new primitives in the Responses API: server-side compaction for multi-hour runs, containers with networking, and native support for the Agent Skills standard. These aren't incremental improvements. They're the plumbing required for agents to operate as persistent background workers rather than one-shot assistants.
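To make the shift concrete, here is a minimal sketch of what a request for a long-running background agent might look like. The `background` flag is a real Responses API concept, but the `context_management` and container fields below are hypothetical guesses at shape, not confirmed parameter names; treat this as an illustration of the primitives, not documentation.

```python
# Sketch of a long-running agent request payload. Fields marked
# HYPOTHETICAL are illustrative guesses, not confirmed API parameters.
def build_agent_request(task: str) -> dict:
    return {
        "model": "gpt-5.3-codex",     # placeholder model name
        "input": task,
        "background": True,            # run server-side, don't block the client
        "context_management": {        # HYPOTHETICAL: server-side compaction
            "compaction": "auto",      # let the platform shrink old context
        },
        "tools": [
            # HYPOTHETICAL: a container tool with networking enabled
            {"type": "container", "network": "enabled"},
        ],
    }

req = build_agent_request("Refactor the billing module and open a PR")
print(req["background"])  # True
```

The point of the sketch is the shape of the workload: the client fires one request and the platform, not the caller, owns context size and execution environment for the hours that follow.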
The multi-agent architecture is gaining UI support too. @Saboo_Shubham_ noted Claude Code's Agent UI now supports agent teams, while @pusongqi highlighted assigning different agents under the same thread, like "Slack channels, except occupied with agents." @levie offered the macro view: agents are creating "one of the widest spreads in output productivity on a per-role basis," with easily 5X+ differences in useful output based purely on tool choice and workflow design. @jeffclune's research on agents designing their own memory mechanisms hints at where this heads next: agents that improve their own infrastructure rather than relying on humans to tune them.
Perhaps the most "2026" data point came from @GenAI_is_real, describing an agent using Kelly criterion to manage its own bankroll while scraping NOAA and injury reports to exploit Polymarket mispricing. "The bottleneck for agency wasn't intelligence, it was the incentive."
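For readers unfamiliar with the reference: the Kelly criterion sizes a bet as a fraction of bankroll, f* = p − (1 − p)/b, where p is the win probability and b the net odds. A minimal sketch (the prediction-market numbers are illustrative, not from the post):

```python
def kelly_fraction(p: float, b: float) -> float:
    """Kelly-optimal fraction of bankroll to stake on a binary bet.

    p: estimated probability of winning
    b: net fractional odds (profit per unit staked on a win)
    """
    f = p - (1.0 - p) / b
    return max(f, 0.0)  # never bet when the edge is non-positive

# A prediction-market share priced at $0.40 that an agent believes
# resolves YES with probability 0.55: b = 0.60 / 0.40 = 1.5
print(round(kelly_fraction(0.55, 1.5), 2))  # 0.25 -> stake 25% of bankroll
```

An agent running this loop only bets when its estimated probability beats the market price, which is exactly the mispricing-exploitation behavior described.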
The Agentic Toolchain
Chrome's WebMCP announcement drew some of the strongest reactions of the day. @liadyosef called it bigger than it seems: "AI agents can now interact directly with existing websites and webapps, not by using the 'human' app interface." @joemccann was more blunt: "If browsers are no longer designed exclusively for humans, but also agents, it will completely change web development." @barckcode noted the security implications, predicting vulnerability discovery in client-side code will accelerate, while adding that "engineering for building sites is going to be more important than ever."
On the tools side, Obsidian's CLI release in version 1.12 was a quiet bombshell. @obsdmd's announcement that "anything you can do in Obsidian you can do from the command line" means the entire app surface is now agent-accessible. @kepano laid it out simply: install, enable CLI, and any agent can use Obsidian. @NickADobos connected the dots, calling the file-first, markdown-based philosophy "a genius call years ago" that now pays dividends for AI integration.
Excalidraw shipped an official MCP connector, making collaborative diagramming available to Claude and other agents. @pamelafox highlighted GitHub Copilot's new memory system with a detailed engineering blog post. And @almonk launched Echo, an iOS SSH client running Ghostty, turning iPads into mobile agent monitoring stations. The common thread: tools built on open formats like markdown, CLI, and MCP are winning the race to become agent-friendly.
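What makes a tool "agent-friendly" in all these cases is the same minimal contract: a name, a JSON schema, and a callback. The toy registry below illustrates that pattern; it is not the real MCP SDK, and the tool names are invented for the example.

```python
# Toy illustration of the MCP tool pattern: name + schema + callback.
# This is NOT the actual MCP SDK, just the shape of the contract.
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}

def register_tool(name: str, description: str, input_schema: dict,
                  execute: Callable[..., Any]) -> None:
    """Register a callable under a name with a machine-readable schema."""
    TOOLS[name] = {
        "description": description,
        "inputSchema": input_schema,
        "execute": execute,
    }

def call_tool(name: str, **kwargs: Any) -> Any:
    """What an agent runtime does: dispatch by name with validated args."""
    return TOOLS[name]["execute"](**kwargs)

register_tool(
    "add-to-cart",
    "Add a product to the shopping cart",
    {"type": "object", "properties": {"productId": {"type": "string"}}},
    lambda productId, quantity=1: f"Added {quantity} x {productId}",
)

print(call_tool("add-to-cart", productId="sku-42", quantity=2))
# Added 2 x sku-42
```

Whether the registry lives in a browser (WebMCP), a desktop app (Obsidian), or a server (Excalidraw), the agent sees the same thing: a catalog of named, schema-described capabilities it can call directly.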
AI Reshapes Work
A Harvard Business Review study landed with a thud. @rohanpaul_ai summarized the 8-month field study at a US tech company: "AI use did not shrink work, it intensified it, and made employees busier." The mechanism is counterintuitive. AI filled knowledge gaps, so people started doing work that previously belonged to other roles. That created extra coordination and review overhead for specialists. Boundaries blurred because starting a task became as easy as writing a prompt, and multitasking rose as people ran parallel AI threads.
While that study tracked individual workers, organizational impacts are playing out in headlines. @deredleritt3r reported 700 people lost jobs at Baker McKenzie, the firm citing "rethinking the way we work, including through the use of AI." No lawyers were cut; reductions hit IT, admin, DEI, marketing, and design. @TMTLongShort predicted more of this, arguing the first use case of AI is "tools that allow CFOs to map productivity and redundancy of every employee," driving seat-count collapse. @aakashgupta noted the shift hitting product management too: AI-first teams want PMs who can "write and run evals, prototype with code, and ship directly." The blunt conclusion: "The PMs who can't do technical work will get replaced by an agent with a Jira login." @JaredSleeper added context with headcount comparisons showing Anthropic at 4,178 employees versus Salesforce at 87,415. The leverage gap speaks for itself.
Claude and Anthropic
Anthropic had a product-heavy day. @claudeai announced Cowork is now available on Windows with full feature parity: file access, multi-step task execution, plugins, and MCP connectors. @itsPaulAi quipped that "Anthropic has just released a real Copilot before Microsoft," and @trq212 teased a "big week for Claude Code desktop enjoyers."
But the mood wasn't entirely celebratory. @sdrzn reported the head of Anthropic's safeguards research quit, saying "the world is in peril" and announcing plans to move to the UK to write poetry. Other safety researchers and senior staff reportedly left over the prior two weeks. @LLMJunky pivoted the conversation to Claude Code's community, highlighting impressive work on agent teams and arguing that if Anthropic had built their Teams mode that way, "you wouldn't shut up about it." The tension between Anthropic's shipping velocity and its safety departures is worth watching.
The Existential Thread
Several posts grappled with the bigger picture in a way that felt less like hype and more like processing. @mattshumer_ published an essay he described as what he wishes he could "sit down and tell everyone I care about," adding that "the real answer sounds insane." @thegarrettscott built on it directly: "AI is now smart enough to be a self-sustaining entity. It can take a certain amount of money, operate in the real world, and turn it into more money. It doesn't need you."
@lennysan described GPT-5.3 Codex as having "something that felt, for the first time, like judgment. Like taste." @teortaxesTex called it "a phase change in the perception of coding agents" that "looked like science fiction just months ago." @atelicinvest brought it to business strategy, arguing that performance differentials between AI-integrated orgs and laggards will drive market share shifts "in a bigger way than we imagine." Whether you read these posts as clear-eyed realism or collective anxiety depends on your priors, but the volume and consistency of the sentiment is itself a data point.
Testing in the Agent Era
@RyanCarniato, creator of SolidJS, made a confession that would have been heresy two years ago: "Thanks to AI, we've hit the inversion point where TDD is something that actually saves time instead of wastes time." The logic is straightforward. When agents write the code, having a pre-defined test suite becomes the fastest way to verify correctness without manual review.
@ccccjjjjeeee refined this further, pointing to property-based testing as the key unlock: "Write a bridge that calls the original code, and assert that for arbitrary input, both versions do the same thing. Make the agent keep going until this is consistently true." This is a fundamentally different testing philosophy, one where tests aren't specifications written by humans but verification harnesses run by machines. @GergelyOrosz amplified @Steve_Yegge's thesis that writing code by hand is effectively over and agent orchestration is the next focus. And @craigzLiszt stirred the pot by claiming the best engineers are switching from Claude to Codex, a signal that tool loyalty matters less than workflow design in the agent era.
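The bridge idea above can be sketched without any framework: run the legacy implementation and the agent's rewrite over randomized inputs and assert identical outputs. Libraries like Hypothesis automate input generation and failure shrinking; the two slugify functions here are illustrative stand-ins, not real code from the thread.

```python
import random

def legacy_slugify(s: str) -> str:
    # Stand-in for the original code being ported.
    out = []
    for ch in s.lower():
        out.append(ch if ch.isalnum() else "-")
    return "".join(out)

def ported_slugify(s: str) -> str:
    # Stand-in for the agent's rewrite.
    return "".join(c if c.isalnum() else "-" for c in s.lower())

def check_equivalence(trials: int = 1000) -> None:
    """Property test: for arbitrary input, both versions must agree."""
    rng = random.Random(0)  # seeded for reproducible counterexamples
    alphabet = "abcXYZ 123_-!/"
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        assert legacy_slugify(s) == ported_slugify(s), f"diverged on: {s!r}"

check_equivalence()
print("all trials agree")
```

In the workflow @ccccjjjjeeee describes, the agent loops on exactly this harness, editing the ported version until the equivalence check passes consistently.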
Source Posts
Claude Code Desktop now supports --dangerously-skip-permissions! This skips all permission prompts so Claude can operate fully autonomously. Great for workflows in a trusted environment where you want no interruptions, no approval prompts, just uninterrupted work. But as the name suggests... use it with caution! 🙏
Anything you can do in Obsidian you can do from the command line. Obsidian CLI is now available in 1.12 (early access). https://t.co/B8ed2zrWHe
Another Claude Code Agent UI. Run 9 Claude Code agents with the RTS interface. I repeat: Multi-agent UI will be HUGE https://t.co/piAPXikECV
There is a case to be made that within each sub/category, we start to see massive performance differentials between orgs that figure out how to do AI-integrated development properly and the orgs that don't. Like the product velocity, quality, polish and service response for the top 10% of orgs will be unbelievably better vs the bottom 25%. This will for sure lead to market share shifts - and probably in a bigger way than we imagine.
MCP Servers Are Coming to the Web.

MCP lets AI agents call tools on backends. WebMCP brings the same idea to the frontend, letting developers expose their website's functionality as structured tools using plain JavaScript (or even HTML), no separate server needed. Instead of agents clicking through your UI, they call well-defined tools you control. A W3C proposal from Microsoft and Google, and Chrome 146 already ships an early preview behind a flag.

How will it work? WebMCP introduces a `navigator.modelContext` API with two approaches:

- Imperative API: register tools directly in JavaScript with schemas and callbacks:

```js
navigator.modelContext.registerTool({
  name: "add-to-cart",
  description: "Add a product to the shopping cart",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string", description: "The product ID" },
      quantity: { type: "number", description: "Number of items" }
    },
    required: ["productId"]
  },
  execute({ productId, quantity }) {
    addToCart(productId, quantity);
    return { content: [{ type: "text", text: "Item added!" }] };
  }
});
```

- Declarative API: let developers define tools directly in HTML using form attributes, no JavaScript required:

```html
<form action="/todos" method="post"
      tool-name="add-todo"
      tool-description="Add a new todo item to the list">
  <input type="text" name="description" required
         tool-prop-description="The text of the todo item">
  <button type="submit">Add Todo</button>
</form>
```

This declarative approach is still under active discussion, with the goal of making WebMCP accessible to content creators without JS experience.
Cowork is now available on Windows. We're bringing full feature parity with macOS: file access, multi-step task execution, plugins, and MCP connectors. https://t.co/329DqJz5q5
It actually worked! For the past couple of days I’ve been throwing 5.3-codex at the C codebase for SimCity (1989) to port it to TypeScript. Not reading any code, very little steering. Today I have SimCity running in the browser. I can’t believe this new world we live in. https://t.co/Pna2ilIjdh
Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs). It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs! Our full paper on RLMs is now available—with much more expansive experiments compared to our initial blogpost from October 2025! https://t.co/x47pIfIkTb
At Stripe we have a tool called "minions" -- it lets us kick off async agents built right in our dev environment to one-shot bugs, features, and more e2e. I have team, project, and personal channels dedicated just to working with minions. I like to think of it as a new type of pair programming -- "pair prompting." Read more --> https://t.co/0A6vDEOEjL
Headcounts for assorted companies:
Salesforce: 87,415
ServiceNow: 32,378
Workday: 23,234
Zoom: 12,743
Docusign: 8,403
OpenAI: 7,112
Okta: 7,064
UiPath: 5,096
Sprinklr: 4,368
Anthropic: 4,178
Yes, UiPath still has more employees than Anthropic. Infer from that what you will.
Today we share a technical report demonstrating how our drug design engine achieves a step-change in accuracy for predicting biomolecular structures, more than doubling the performance of AlphaFold 3 on key benchmarks and unlocking rational drug design even for examples it has never seen before. Head to the comments to read our blog.
Chrome 146 includes an early preview of WebMCP, accessible via a flag, that lets AI agents query and execute services without browsing the web app like a user. Services can be declared through an imperative navigator.modelContext API or declaratively through a form. https://t.co/UaUplZ8Q28
Today is my last day at Anthropic. I resigned. Here is the letter I shared with my colleagues, explaining my decision. https://t.co/Qe4QyAFmxL
We are moving quickly. Thanks to Anton and the folks at @excalidraw, this is now the official Excalidraw MCP server. From weekend project to official server in less than a week.
This weekend I was thinking about programming languages. Programming languages for agents. Will we see them? I believe people will (and should!) try to build some. https://t.co/4szFXPLTfK
Baker McKenzie just laid off ~700 staff, just under 10%, because of AI. It's coming quick for our jobs.
I think with Codex 5.3, the need for off-the-shelf deep learning libraries will fade away. Reasoning models operate best at the boundary of exact verifiability, so ever venturing too far into "well this is kinda correct" is no longer the best strategy. Exact verification now scales better than soft verification.

When starting my current project, I deliberately decided against using any DL library because I wanted to take ownership of some things that are hard when a graph or eager model is in the way. Dispatching operations to multiple streams with fine-grained barrier relations is really stroking against the grain in PyTorch, and you are never really sure "am I really allowed to do this". There was a time for OpenGL, but people eventually did want a VkCmdBarrier for good reason.

Because I also wanted predictable dispatch pacing, using C++ was a natural choice. Previously this meant taking on the burden of writing a lot of boilerplate, the equivalent of "shit I can't do this in unity, now I gotta write my own engine", which never seemed a good idea on the surface. Now I can say it was among the best decisions I have made.

New operations are a prompt away, Codex can introspect and trace into any part of the codebase automatically, single-stepping even into nccl if ever needed, and supporting a new backend is trivial. At no point would your debugging lead into an opaque compiled native library you do not have the source code for; it will simply go-to-declaration one more time. In the age of reasoning models, a single source tree break is fatal and can be the difference between finding or not finding a bug.

There is no cost to saying "write a test for this" and you've protected yourself against regressions for this case forever onwards. You can just say "implement muon, here's the repo" and it will do so and loss in wandb will literally look the same compared to the python baseline.
Codex is a good autonomous debugger, so program runtime really starts to become a bottleneck, not thinking time. Hence start-up time is important. There is no reason your training script should take minutes to launch when it could have performed the first step in the time it takes a shitty terminal to repaint. If your iteration loop was slow before, in the age of coding agents it is now fatal.

By not triggering a billion library lazy inits at unpredictable points in time because your ML framework decided to do so, your Nsight traces look as clean as higher-level profilers would, just with more introspectability. You finally get to use NVTX the way Nvidia always intended for you to do.

Another thing: kernels are just cuda elf binaries. There is no reason to deal with a flash attention package installation. This is all cpu-side. Tell codex to write packaging logic to compile it AOT, and document the kernel signature and how arguments have to be prepared. In the C++ code, load that kernel from a resource and then simply pass those arguments. This approach is modular. Want a cutlass, flash attention, triton or cute dsl backend and reserve the right to write a custom kernel later? No problem. Nobody wants to write backend kernel dispatch logic, but you don't have to anymore.

Does C++ scare you? Maintain a minimal Python reference implementation in PyTorch with the intent of keeping behavior exactly the same, just without all the optimizations. Exact verifiability means you can resume that cpp checkpoint in your Python implementation and get near-exact loss overlap in wandb, and vice-versa. No more spook: it's either in the spec, or it's not. That is what verifiability means.

While I think there is a large cost to move off of pre-existing infra, eventually taking ownership of more and more pieces of the codebase will become more and more desirable with this change in dynamic.
Introducing CodeRabbit Issue Planner! ✨ AI agents made coding fast but planning messy. Turn planning into a shared artifact in your issue tracker, grounded in related issues and decisions. Review prompts as a team, then hand them off to an agent! https://t.co/4xTjG88JOJ
You can even assign different agents under the same thread 🤯 Just like slack channels, except it's occupied with agents. https://t.co/0R63hk2Pwv