AI Learning Digest

Stripe Reveals Internal Agent Fleet as Chrome WebMCP Opens Browsers to AI

Daily Wrap-Up

Today's feed was dominated by a single narrative told from multiple angles: agents are no longer a demo, they're the production architecture. Stripe pulled back the curtain on its internal agent system, revealing a fleet of "minions" that work through their massive Ruby monorepo. This follows Ramp's similar disclosure last week, and the pattern is unmistakable. S-class engineering teams are not waiting for off-the-shelf agent tools to mature. They're building custom systems tuned to their own codebases, dev environments, and workflows. Meanwhile, OpenAI shipped new API primitives for long-running agent work including server-side context compaction and networked containers, infrastructure that makes multi-hour agent runs viable at the platform level.

The second headline is Chrome's WebMCP announcement, which lets AI agents interact with websites through a structured protocol rather than by scraping or screen-reading. This is a genuine architectural shift: browsers are being redesigned as surfaces for both human and machine consumption. On the product side, Anthropic shipped Cowork for Windows with full macOS parity, while Obsidian released a CLI that makes the entire app available to agents. The tooling layer is converging fast.

The most entertaining moment came from @KaiLentit's observation that "AI models expire faster than session cache," which landed perfectly on a day when @craigzLiszt claimed the best engineers are already migrating from Claude to Codex. Whether that's true is debatable, but the velocity of tool switching is real. The most practical takeaway for developers: if you're at a company with more than 50 engineers, study what Stripe and Ramp are doing with internal agent systems. The pattern of Slack as entry point, repeatable dev environments, and MCP as the common language between agents is becoming the enterprise playbook, and you can start building your own version today, even if it's small.

Quick Hits

  • @bnj unveiled Style Dropper for @variantui, a tool that absorbs the visual style of anything you point it at and applies it to your designs, with pure Kid Pix and MS Paint energy.
  • @wintonARK argues space-based datacenter buildout has inverse cost scaling: the 100th orbital GW could cost a third of the first, unlike terrestrial deployments.
  • @KaiLentit: "In 2026, AI models expire faster than session cache."
  • @ns123abc highlights Isomorphic Labs' IsoDDE, an AI drug design system that doubles AlphaFold 3 on hard targets and is 20x better than Boltz-2 on antibodies.
  • @steipete shares a Go explainer capturing why the language keeps gaining traction in the agent era.
  • @EntireHQ raised a $60M seed round to build "the next developer platform," shipping their first OSS release the same day.
  • @sammarelich dropped a new cold email template that's making the rounds.
  • @every launched Every Events, a hub for AI learning through camps, courses, and demo days.
  • @TheAhmadOsman claims a frontier open-source lab in the West will be born this year, teasing that it "started in a basement."

Agents Go Enterprise

The agent conversation shifted this week from "what can agents do?" to "how are serious teams actually deploying them?" @auchenberg broke down Stripe's approach: a homegrown agent that spins up "minions" to work through their massive monorepo, which is mostly Ruby with Sorbet typings, an uncommon setup that commercial LLMs aren't optimized for. This follows Ramp publishing details about their own internal agent last week, marking what @auchenberg called "a very interesting trend from S-class engineering teams."

@yenkel dug into the specifics, noting Stripe uses Slack as the main entry point, emphasizes repeatable dev environments, and has built custom tooling around their dev productivity stack. The key question @yenkel raised: "Since MCP is a common language for all agents at Stripe, not just minions, if those MCP servers hadn't been around, would you have gone more for CLIs?"
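The Slack-as-entry-point pattern is straightforward to prototype. Below is a minimal, hypothetical sketch of turning a Slack message into a dispatchable agent task; `parseTask`, the `@minion` trigger, and the `mcp://` tool URIs are illustrative stand-ins, not Stripe's actual code.

```javascript
// Hypothetical sketch of the Slack-as-entry-point pattern: a message mentioning
// the agent becomes a structured task an agent runner can pick up.
// All names here are illustrative, not Stripe's internal API.
function parseTask(slackMessage) {
  const match = slackMessage.text.match(/^@minion\s+(.+)$/);
  if (!match) return null; // not addressed to the agent
  return {
    description: match[1],
    channel: slackMessage.channel,  // replies land where the work was requested
    repo: "monorepo",               // one repo, one repeatable dev environment
    // MCP as the common language: tools are addressed uniformly (URIs assumed)
    tools: ["mcp://code-search", "mcp://test-runner"],
  };
}

const task = parseTask({
  text: "@minion fix flaky test in payments/refunds",
  channel: "#team-payments",
});
console.log(task.description); // → fix flaky test in payments/refunds
```

The point of the sketch is the shape, not the parsing: entry point, environment, and tool access are all declared up front, so any agent backend can consume the same task.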

OpenAI is building the infrastructure layer to support this pattern. @OpenAIDevs announced new primitives in the Responses API: server-side compaction for multi-hour runs, containers with networking, and native support for the Agent Skills standard. These aren't incremental improvements. They're the plumbing required for agents to operate as persistent background workers rather than one-shot assistants.
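To make the primitives concrete, here is an illustrative request shape for a long-running agent job. Only the Responses API itself and the three capabilities (server-side compaction, networked containers, Agent Skills) come from the announcement; every field name below for those features is an assumption, since the posts don't show the exact API.

```javascript
// Illustrative payload only: field names for compaction, container networking,
// and skills are ASSUMPTIONS, not OpenAI's documented parameters.
function buildAgentRequest(taskPrompt) {
  return {
    model: "gpt-5.3-codex", // model name taken from posts elsewhere in this digest
    input: taskPrompt,
    // Server-side compaction: the platform summarizes old turns instead of the
    // run dying when a multi-hour session exceeds the context window (name assumed).
    context_management: { compaction: "auto" },
    // Hosted container with controlled egress, e.g. to install libraries (assumed).
    container: {
      network: { allowed_domains: ["pypi.org", "registry.npmjs.org"] },
    },
    // Agent Skills standard, e.g. the pre-built spreadsheets skill (name assumed).
    skills: ["spreadsheets"],
  };
}

const req = buildAgentRequest("Fix the failing CI job and open a PR.");
```

Whatever the final field names turn out to be, the design shift is the same: context management, sandboxing, and skill loading move server-side, so the client only states intent.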

The multi-agent architecture is gaining UI support too. @Saboo_Shubham_ noted Claude Code's Agent UI now supports agent teams, while @pusongqi highlighted assigning different agents under the same thread, like "Slack channels, except occupied with agents." @levie offered the macro view: agents are creating "one of the widest spreads in output productivity on a per-role basis," with easily 5X+ differences in useful output based purely on tool choice and workflow design. @jeffclune's research on agents designing their own memory mechanisms hints at where this heads next: agents that improve their own infrastructure rather than relying on humans to tune them.

Perhaps the most "2026" data point came from @GenAI_is_real, describing an agent using the Kelly criterion to manage its own bankroll while scraping NOAA and injury reports to exploit Polymarket mispricing. "The bottleneck for agency wasn't intelligence, it was the incentive."
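The sizing math behind that post is standard Kelly. A minimal sketch (not the agent's actual code) for a binary prediction-market contract priced at q when your model says the true probability is p:

```javascript
// Kelly sizing for a binary contract that costs q and pays $1 if YES resolves.
// p: your estimated probability, q: market price (both in 0..1), bankroll in $.
function kellyStake(p, q, bankroll) {
  const b = (1 - q) / q;            // net odds: profit per $1 staked if YES wins
  const f = p - (1 - p) / b;        // Kelly fraction f* = (bp - (1 - p)) / b
  return Math.max(0, f) * bankroll; // no edge, no bet
}

// Model says 75% but the market prices YES at 50 cents:
console.log(kellyStake(0.75, 0.5, 1000)); // 500 — bet half the bankroll
// Price equals your estimate, so there is no edge:
console.log(kellyStake(0.5, 0.5, 1000)); // 0
```

Full Kelly is famously aggressive; real bankroll managers usually bet a fraction of f* to survive model error, which is exactly the "incentive" knob the post is pointing at.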

The Agentic Toolchain

Chrome's WebMCP announcement drew some of the strongest reactions of the day. @liadyosef called it bigger than it seems: "AI agents can now interact directly with existing websites and webapps, not by using the 'human' app interface." @joemccann was more blunt: "If browsers are no longer designed exclusively for humans, but also agents, it will completely change web development." @barckcode noted the security implications, predicting vulnerability discovery in client-side code will accelerate, while adding that "engineering for building sites is going to be more important than ever."
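The API shape, per the Chrome 146 early preview described in the source posts (behind a flag, and subject to change), registers page functionality as typed tools. The catalog-search tool below is invented for illustration; only `navigator.modelContext.registerTool` and the name/description/inputSchema/execute shape come from the proposal as quoted.

```javascript
// A WebMCP tool following the shape shown in the Chrome 146 early preview.
// The search logic is an illustrative stub, not a real catalog.
const searchTool = {
  name: "search-products",
  description: "Search the product catalog by keyword",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string", description: "Search keywords" } },
    required: ["query"],
  },
  execute({ query }) {
    // A real page would call its own search function here.
    const results = ["blue mug", "red mug"].filter((p) => p.includes(query));
    return { content: [{ type: "text", text: JSON.stringify(results) }] };
  },
};

// Register only where the preview API exists, so the page works everywhere else.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(searchTool);
}
```

Note what the agent never sees: the DOM. It calls `search-products` with structured input and gets structured output, which is the architectural shift the reactions above are describing.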

On the tools side, Obsidian's CLI release in version 1.12 was a quiet bombshell. @obsdmd's announcement that "anything you can do in Obsidian you can do from the command line" means the entire app surface is now agent-accessible. @kepano laid it out simply: install, enable CLI, and any agent can use Obsidian. @NickADobos connected the dots, calling the file-first, markdown-based philosophy "a genius call years ago" that now pays dividends for AI integration.

Excalidraw shipped an official MCP connector, making collaborative diagramming available to Claude and other agents. @pamelafox highlighted GitHub Copilot's new memory system with a detailed engineering blog post. And @almonk launched Echo, an iOS SSH client running Ghostty, turning iPads into mobile agent monitoring stations. The common thread: tools built on open formats like markdown, CLI, and MCP are winning the race to become agent-friendly.

AI Reshapes Work

A Harvard Business Review study landed with a thud. @rohanpaul_ai summarized the 8-month field study at a US tech company: "AI use did not shrink work, it intensified it, and made employees busier." The mechanism is counterintuitive. AI filled knowledge gaps, so people started doing work that previously belonged to other roles. That created extra coordination and review overhead for specialists. Boundaries blurred because starting a task became as easy as writing a prompt, and multitasking rose as people ran parallel AI threads.

While that study tracked individual workers, organizational impacts are playing out in headlines. @deredleritt3r reported 700 people lost jobs at Baker McKenzie, the firm citing "rethinking the way we work, including through the use of AI." No lawyers were cut; reductions hit IT, admin, DEI, marketing, and design. @TMTLongShort predicted more of this, arguing the first use case of AI is "tools that allow CFOs to map productivity and redundancy of every employee," driving seat-count collapse. @aakashgupta noted the shift hitting product management too: AI-first teams want PMs who can "write and run evals, prototype with code, and ship directly." The blunt conclusion: "The PMs who can't do technical work will get replaced by an agent with a Jira login." @JaredSleeper added context with headcount comparisons showing Anthropic at 4,178 employees versus Salesforce at 87,415. The leverage gap speaks for itself.

Claude and Anthropic

Anthropic had a product-heavy day. @claudeai announced Cowork is now available on Windows with full feature parity: file access, multi-step task execution, plugins, and MCP connectors. @itsPaulAi quipped that "Anthropic has just released a real Copilot before Microsoft," and @trq212 teased a "big week for Claude Code desktop enjoyers."

But the mood wasn't entirely celebratory. @sdrzn reported the head of Anthropic's safeguards research quit, saying "the world is in peril" and announcing plans to move to the UK to write poetry. Other safety researchers and senior staff reportedly left over the prior two weeks. @LLMJunky pivoted the conversation to Claude Code's community, highlighting impressive work on agent teams and arguing that if Anthropic had built their Teams mode that way, "you wouldn't shut up about it." The tension between Anthropic's shipping velocity and its safety departures is worth watching.

The Existential Thread

Several posts grappled with the bigger picture in a way that felt less like hype and more like processing. @mattshumer_ published an essay he described as what he wishes he could "sit down and tell everyone I care about," adding that "the real answer sounds insane." @thegarrettscott built on it directly: "AI is now smart enough to be a self-sustaining entity. It can take a certain amount of money, operate in the real world, and turn it into more money. It doesn't need you."

@lennysan described GPT-5.3 Codex as having "something that felt, for the first time, like judgment. Like taste." @teortaxesTex called it "a phase change in the perception of coding agents" that "looked like science fiction just months ago." @atelicinvest brought it to business strategy, arguing that performance differentials between AI-integrated orgs and laggards will drive market share shifts "in a bigger way than we imagine." Whether you read these posts as clear-eyed realism or collective anxiety depends on your priors, but the volume and consistency of the sentiment is itself a data point.

Testing in the Agent Era

@RyanCarniato, creator of SolidJS, made a confession that would have been heresy two years ago: "Thanks to AI, we've hit the inversion point where TDD is something that actually saves time instead of wastes time." The logic is straightforward. When agents write the code, having a pre-defined test suite becomes the fastest way to verify correctness without manual review.

@ccccjjjjeeee refined this further, pointing to property-based testing as the key unlock: "Write a bridge that calls the original code, and assert that for arbitrary input, both versions do the same thing. Make the agent keep going until this is consistently true." This is a fundamentally different testing philosophy, one where tests aren't specifications written by humans but verification harnesses run by machines. @GergelyOrosz amplified @Steve_Yegge's thesis that writing code by hand is effectively over and agent orchestration is the next focus. And @craigzLiszt stirred the pot by claiming the best engineers are switching from Claude to Codex, a signal that tool loyalty matters less than workflow design in the agent era.
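The bridge idea can be sketched without any library. Both implementations below are trivial stand-ins for a legacy version and an agent's port; a real harness would use a property-testing library with shrinking and would wrap the actual codebases.

```javascript
// Dependency-free sketch of the property-based "bridge": feed arbitrary inputs
// to the original and the ported implementation and demand they agree.
function originalSum(xs) {           // stand-in for the "legacy" code
  let total = 0;
  for (const x of xs) total += x;
  return total;
}
const portedSum = (xs) => xs.reduce((a, b) => a + b, 0); // stand-in for the port

// For arbitrary input, both versions must do the same thing.
function checkEquivalence(runs = 1000) {
  for (let i = 0; i < runs; i++) {
    const xs = Array.from({ length: Math.floor(Math.random() * 20) },
                          () => Math.floor(Math.random() * 200) - 100);
    if (originalSum(xs) !== portedSum(xs)) {
      return { ok: false, counterexample: xs }; // the agent iterates on this
    }
  }
  return { ok: true };
}

console.log(checkEquivalence().ok); // true — the port matches on 1000 random inputs
```

The loop is the whole philosophy in miniature: the human writes the equivalence property once, and the agent keeps regenerating the port until no counterexample can be found.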

Source Posts

Thariq @trq212 ·
big week for Claude code desktop enjoyers coming up
Lydia Hallie ✨ @lydiahallie

Claude Code Desktop now supports --dangerously-skip-permissions! This skips all permission prompts so Claude can operate fully autonomously. Great for workflows in a trusted environment where you want no interruptions, no approval prompts, just uninterrupted work. But as the name suggests... use it with caution! 🙏

kepano @kepano ·
1. install Obsidian 1.12 2. enable CLI 3. now OpenClaw, OpenCode, Claude Code, Codex, or any other agent can use Obsidian
Obsidian @obsdmd

Anything you can do in Obsidian you can do from the command line. Obsidian CLI is now available in 1.12 (early access). https://t.co/B8ed2zrWHe

Craig Weiss @craigzLiszt ·
nearly all of the best engineers i know are switching from claude to codex
Shubham Saboo @Saboo_Shubham_ ·
Claude Code Agent UI now support Agent teams. Multi-agent gaming UI will be HUGE
Shubham Saboo @Saboo_Shubham_

Another Claude Code Agent UI Run 9 Claude Code agents with the RTS interface. I repeat: Multi-agent UI will be HUGE https://t.co/piAPXikECV

Aaron Levie @levie ·
The effective use of agents is creating one of the widest spreads in output productivity we’ve seen on a per role basis. We didn’t see this with chatbots previously. Chatbots probably sped up work by maybe 10-20% in most cases because they largely accelerate the research on a topic you would otherwise do in a few steps manually. Now, with agents, you could take the exact same engineer and easily see a 5X+ difference in the amount of useful output simply based on their choice of tools and how they’ve designed their workflows. There probably hasn’t been a period in tech or where a couple decisions and changes to your process drive this much leverage. As this continues to expand beyond coding, this will be one of the biggest shocks to the system of what work looks like in most fields. This will happen in legal, finance, life sciences, and other areas that have previously been constrained by how much information you can process or produce. Most areas of knowledge work still imagine AI as a chatbot paradigm and not yet a full agent-executing-work-for-you paradigm. But it’s coming.
Unemployed Capital Allocator @atelicinvest

There is a case to be made that within each sub/category, we start to see massive performance differentials between orgs that figure out how to do Ai-integrated development properly and the orgs that don't. Like the product velocity, quality, polish and service response for the top 10% of org will be unbelievably better vs the bottom 25%. This will for sure lead to market share shifts - and probably in a bigger way than we imagine.

Maximiliano Firtman @firt ·
@tymzap This one runs in the frontend and it's consumed by agentic browsers
Cristian Córdova 🐧 @barckcode ·
👀 This WebMCP thing Chrome is adding is going to be wild. Scraping websites is going to be easier than ever, and it had already gotten much easier with AI. And I suspect client-side vulnerability discovery will too. Far from disappearing, the engineering that goes into building sites is going to be more important than ever
Philipp Schmid @_philschmid

MCP Servers Are Coming to the Web. MCP lets AI agents call tools on backends. WebMCP brings the same idea to the frontend, letting developers expose their website's functionality as structured tools using plain JavaScript (or even HTML), no separate server needed. Instead of agents clicking through your UI, they call well-defined tools you control. A W3C proposal from Microsoft and Google, and Chrome 146 already ships an early preview behind a flag.

## How will it work?

WebMCP introduces a `navigator.modelContext` API with two approaches:

- Imperative API: Register tools directly in JavaScript with schemas and callbacks:

```js
navigator.modelContext.registerTool({
  name: "add-to-cart",
  description: "Add a product to the shopping cart",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string", description: "The product ID" },
      quantity: { type: "number", description: "Number of items" }
    },
    required: ["productId"]
  },
  execute({ productId, quantity }) {
    addToCart(productId, quantity);
    return { content: [{ type: "text", text: "Item added!" }] };
  }
});
```

- Declarative API: Let developers define tools directly in HTML using form attributes, no JavaScript required:

```html
<form action="/todos" method="post" tool-name="add-todo"
      tool-description="Add a new todo item to the list">
  <input type="text" name="description" required
         tool-prop-description="The text of the todo item">
  <button type="submit">Add Todo</button>
</form>
```

This declarative approach is still under active discussion, with the goal of making WebMCP accessible to content creators without JS experience.

Obsidian @obsdmd ·
Anything you can do in Obsidian you can do from the command line. Obsidian CLI is now available in 1.12 (early access). https://t.co/B8ed2zrWHe
Ben South @bnj ·
Available now on https://t.co/mLvSkdCoHg
Paul Couvert @itsPaulAi ·
So Anthropic has just released a real Copilot before Microsoft...
Claude @claudeai

Cowork is now available on Windows. We’re bringing full feature parity with MacOS: file access, multi-step task execution, plugins, and MCP connectors. https://t.co/329DqJz5q5

Rohan Paul @rohanpaul_ai ·
A super interesting new study from Harvard Business Review. An 8-month field study at a US tech company with about 200 employees found that AI use did not shrink work, it intensified it, and made employees busier. Task expansion happened because AI filled in gaps in knowledge, so people started doing work that used to belong to other roles or would have been outsourced or deferred. That shift created extra coordination and review work for specialists, including fixing AI-assisted drafts and coaching colleagues whose work was only partly correct or complete. Boundaries blurred because starting became as easy as writing a prompt, so work slipped into lunch, meetings, and the minutes right before stepping away. Multitasking rose because people ran multiple AI threads at once and kept checking outputs, which increased attention switching and mental load. Over time, this faster rhythm raised expectations for speed through what became visible and normal, even without explicit pressure from managers.
Christopher Ehrlich @ccccjjjjeeee ·
By the way, the secret to this is property-based testing. Write a bridge that calls the original code, and assert that for arbitrary input, both versions do the same thing. Make the agent keep going until this is consistently true.
Christopher Ehrlich @ccccjjjjeeee

It actually worked! For the past couple of days I’ve been throwing 5.3-codex at the C codebase for SimCity (1989) to port it to TypeScript. Not reading any code, very little steering. Today I have SimCity running in the browser. I can’t believe this new world we live in. https://t.co/Pna2ilIjdh

Steven Pu @pusongqi ·
You can even assign different agents under the same thread 🤯 Just like slack channels, except it's occupied with agents. https://t.co/0R63hk2Pwv
Bryan Kim @kirbyman01 ·
A smaller model that recursively calls itself can now outperform a bigger model on hard tasks at lower cost. Founders who win: taste in system design + technical depth to appreciate new inference paradigms + product sense to turn capabilities into experiences.
Alex L Zhang @a1zhang

Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs). It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs! Our full paper on RLMs is now available—with much more expansive experiments compared to our initial blogpost from October 2025! https://t.co/x47pIfIkTb

OpenAI Developers @OpenAIDevs ·
We're introducing a new set of primitives in the Responses API for long-running agentic work on computers. Server-side compaction • Enable multi-hour agent runs without hitting context limits. Containers with networking • Give OpenAI-hosted containers controlled internet access to install libraries and run scripts. Skills in the API • Native support for the Agent Skills standard and our first pre-built spreadsheets skill. https://t.co/vK9fbhHQdq
yenkel @yenkel ·
following on @ramp’s steps, @StripeDev shares about their internal background dev agents

main takeaways
- slack as main entry point
- importance of repeatable dev env
- custom for their dev productivity tools

question @stevekaliski: “Since MCP is a common language for all agents at Stripe, not just minions” if those MCP servers hadn’t been around, would you have gone more for CLIs?

looking forward to part 2
Steve Kaliski @stevekaliski

At Stripe we have a tool called "minions" -- it lets us kick off async agents built right in our dev environment to one-shot bugs, features, and more e2e. I have team, project, and personal channels dedicated just to working with minions. I like to think of it as a new type of pair programming -- "pair prompting." Read more --> https://t.co/0A6vDEOEjL

Claude @claudeai ·
Cowork is now available on Windows. We’re bringing full feature parity with MacOS: file access, multi-step task execution, plugins, and MCP connectors. https://t.co/329DqJz5q5
Just Another Pod Guy @TMTLongShort ·
Bloodbath is coming. Budgets need to be freed up to simultaneously pay for GPUs/AI-tools while also showing investors rapid FCF-SBC expansion. The first use-case of AI is the tools that allow CFOs to map productivity and redundancy of every employee. This in-turn drives the “seat-count collapse” and “SaaS is dead” narratives forcing CFOs to be even more aggressive. Meanwhile every CEO will race to performatively lean into Claude Coding on weekends in the hopes that he convinces his board he is a “war-time CEO” even tho he has spent the last decade skiing in Aspen from Thursday - Sunday and hasn’t produced a line of code in a decade
Jared Sleeper @JaredSleeper

Headcounts for assorted companies: Salesforce: 87,415 ServiceNow: 32,378 Workday: 23,234 Zoom: 12,743 Docusign: 8,403 OpenAI: 7,112 Okta: 7,064 UiPath: 5,096 Sprinklr: 4,368 Anthropic: 4,178 Yes, UiPath still has more employees than Anthropic. Infer from that what you will.

Entire @EntireHQ ·
Beep, boop. Come in, rebels. We’ve raised a 60m seed round to build the next developer platform. Open. Scalable. Independent. And we ship our first OSS release today. https://t.co/OvPKCcjXbq
NIK @ns123abc ·
Pharma is COOKED

Isomorphic Labs just revealed IsoDDE: an AI system that designs drugs on a computer faster than any pharma R&D

>doubles AlphaFold 3 on hard targets
>20x better than Boltz-2 on antibodies
>beats the physics gold standard at binding
>found drug pockets from sequence alone that took 15 years to discover

IsoDDE isn’t new btw. They’ve already been cooking on real drug programs for YEARS: “Brilliant scientific breakthroughs for next gen medicines” already achieved —@maxjaderberg

And remember when Sir Demis Hassabis said all disease will be cured in 10 years? After today that doesn’t sound crazy anymore…

Isomorphic Labs is the most underrated lab on earth and it’s not even close
Isomorphic Labs @IsomorphicLabs

Today we share a technical report demonstrating how our drug design engine achieves a step-change in accuracy for predicting biomolecular structures, more than doubling the performance of AlphaFold 3 on key benchmarks and unlocking rational drug design even for examples it has never seen before. Head to the comments to read our blog.

Ryan Carniato @RyanCarniato ·
I never thought this day would come. Thanks to AI, we've hit the inversion point where TDD is something that actually saves time instead of wastes time. What a world we live in.
Kenneth Auchenberg 🛠 @auchenberg ·
Stripe built its own homegrown AI coding agent that spins up "minions" to go work on their massive monorepo, which is mostly written in Ruby (not Rails) with Sorbet typings, which is uncommon to most LLMs. Last week it was @tryramp that published details about their own internal agent. Very interesting trend from S-class engineering teams.
Steve Kaliski @stevekaliski

At Stripe we have a tool called "minions" -- it lets us kick off async agents built right in our dev environment to one-shot bugs, features, and more e2e. I have team, project, and personal channels dedicated just to working with minions. I like to think of it as a new type of pair programming -- "pair prompting." Read more --> https://t.co/0A6vDEOEjL

Ben South @bnj ·
We grew up on Kid Pix and MS Paint, and wanted to instill Style Dropper with that same sense of magic (And yes, it really does look that cool in Variant) https://t.co/XHBTDHCEtB
Brett Winton @wintonARK ·
On Earth the datacenter buildout is subject to backwards cost scaling. The 100th GW deployed will almost certainly be more costly, complex, time intensive and subject to negotiation than the 1st. In space, the opposite. The 100th orbital GW could be 1/3rd as costly as the 1st.
Liad Yosef @liadyosef ·
WebMCP is here 🤯 This is bigger than it seems. AI agents can now interact *directly* with existing websites and webapps - not by using the "human" app interface. This naturally complements MCP Apps towards the future of agentic UI. Great work by the @googlechrome team 👏
Maximiliano Firtman @firt

Chrome 146 includes an early preview of WebMCP, accessible via a flag, that lets AI agents query and execute services without browsing the web app like a user. Services can be declared through an imperative navigator.modelContext API or declaratively through a form. https://t.co/UaUplZ8Q28

Saoud Rizwan @sdrzn ·
head of anthropic’s safeguards research just quit and said “the world is in peril” and that he’s moving to the UK to write poetry and “become invisible”. other safety researchers and senior staff left over the last 2 weeks as well... probably nothing.
mrinank @MrinankSharma

Today is my last day at Anthropic. I resigned. Here is the letter I shared with my colleagues, explaining my decision. https://t.co/Qe4QyAFmxL

Excalidraw @excalidraw ·
Thanks to good people at @AnthropicAI we now have an official MCP for Excalidraw! Take it for a spin on @claudeai (search for Excalidraw in Connectors, or use in Claude Code and elsewhere). More to come. ✌ https://t.co/Cbrw8nXqW4
David Soria Parra @dsp_

We are moving quickly. Thanks to Anton and the folks at @excalidraw , this is now the official Excalidraw MCP server. From weekend project to official server in less than a week.

Kai Lentit (e/xcel) @KaiLentit ·
In 2026, AI models expire faster than session cache. https://t.co/tDOQ6UzISN
Peter Steinberger 🦞 @steipete ·
great explainer why I use go a lot these days.
Armin Ronacher ⇌ @mitsuhiko

This weekend I was thinking about programming languages. Programming languages for agents. Will we see them? I believe people will (and should!) try to build some. https://t.co/4szFXPLTfK

prinz @deredleritt3r ·
700 people just lost their jobs at the law firm Baker McKenzie, based on "rethinking the way we work, including through the use of AI". No lawyers impacted; cuts were made to "IT, knowledge, admin, DEI, leadership & learning, secretarial, marketing, and design teams".
JH @writeclimbrun

Baker McKenzie just laid off ~700 staff, just under 10%, because of AI. It's coming quick for our jobs.

Jeff Clune @jeffclune ·
Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/
Gergely Orosz @GergelyOrosz ·
Once again, @Steve_Yegge talks truth to power. He also has a history of being right, quite a lot. Including calling it mid-2025 how writing code by hand will be over, and late 2025 how agent orchestration will be the next hot topic with AI coding. Full: https://t.co/fYR25YHscJ https://t.co/BZAdXbkrAF
almonk @almonk ·
We built a new SSH client for iOS. It’s fast, and simple and runs Ghostty under the hood. It’s turned my iPad into the ultimate vibe coding computer. Take your agents on the go, monitor your OpenClaws, manage your servers, run `top`. It’s available today on AppStore. Say hi to Echo🐬 https://t.co/nUgQfrAdcG
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex ·
A phase change in the perception of coding agents. This looked like science fiction just… months ago. https://t.co/sKmmZ3AJQR
mike64_t @mike64_t

I think with Codex 5.3, the need for off-the-shelf deep learning libraries will fade away. Reasoning models operate best at the boundary of exact verifiabilty, so ever venturing too far into "well this is kinda correct" is no longer the best strategy. Exact verification now scales better than soft verification. When starting my current project, I deliberately decided against using any DL library because I wanted to take ownership of some things that are hard when a graph or eager model is in the way. Dispatching operations to multiple streams with fine-grained barrier relations is really stroking against the grain in PyTorch, and you are never really sure "am I really allowed to do this". There was a time for OpenGL, but people eventually did want a VkCmdBarrier for good reason. Because I also wanted predictable dispatch pacing, using C++ was a natural choice. Previously this meant taking on the burden of writing a lot of boilerplate, the equivalent of "shit I can't do this in unity, now I gotta write my own engine" which never seemed a good idea on the surface. Now I can say it was among the best decisions I have made. New operations are a prompt away, Codex can introspect and trace into any part of the codebase automatically, single-stepping even into nccl if ever needed, and supporting a new backend is trivial. At no point would your debugging lead into an opaque compiled native library you do not have the source code for, it will simply go-to-declaration one more time. In the age of reasoning models, a single source tree break is fatal and can be the difference between finding or not finding a bug. There is no cost to saying "write a test for this" and you've protected yourself against regressions for this case forever onwards. You can just say "implement muon, here's the repo" and it will do so and loss in wandb will literally look the same compared to the python baseline. 
Codex is a good autonomous debugger, so program runtime really starts to become a bottleneck, not thinking time. Hence start-up time is important. There is no reason your training script should take minutes to launch, when it could have performed the first step in the time it takes a shitty terminal to repaint. If your iteration loop was slow before, in the age of coding agents it is now fatal. By not triggering a billion library lazy inits at unpredictable points in time because your ML framework decided to do so, your Nsight traces look as clean as higher level profilers would, just with more introspectability. You finally get to use NVTX the way Nvidia always intended for you to do. Another thing, kernels are just cuda elf binaries. There is no reason to deal with a flash attention package installation. This is all cpu-side. Tell codex to write packaging logic to compile it AOT, and document the kernel signature how arguments have to be prepared. In the C++ code load that kernel from a resource and then simply pass those arguments. This approach is modular. Want a cutlass, flash attention, triton or cute dsl backend and reserve the right to write a custom kernel later? No problem. Nobody wants to write backend kernel dispatch logic, but you don't have to anymore. Does C++ scare you? Maintain a minimal Python reference implementation in PyTorch with the intent of keeping behavior exactly the same, just without all the optimizations. Exact verifiability means you can resume that cpp checkpoint in your Python implementation and get near-exact loss overlap in wandb and vice-versa. No more spook, it's either in the spec, or its not. That is what verifiability means. While I think there is a large cost to move off of pre-existing infra, eventually taking ownership of more and more pieces of the codebase will become more and more desirable with this change in dynamic.

Indra @IndraVahan ·
i think most people will scroll past this post without realizing the gravity of this launch.

you see, a typical dev team at a mid-large corp today has devs, senior engineers, scrum masters, BAs, PMs, QAs & more. this isn't really because companies love bureaucracy, but because translating “what we want” into “what gets built” is painfully hard. most of the software engineering time is burned in meetings, docs, tickets, clarifications, re-clarifications.

- first, agents started writing code. codex agents. cursor cloud agents and so on
- then coderabbit handled reviews. catching mistakes, enforcing standards and making sure the code pushed by these agents (or humans) matched a specific criteria

issue planner is a step beyond that. this plugs right into your jira, linear or github actions, it’s moving even further upstream into intent, scope, and context. ai is no longer helping you just write code anymore but it’s starting at planning, scope, intent and context. trying to answer “what are we even trying to build?”

this is a huge deal. software engineering is changing in real time and right in front of us. and where this leads is probably, certainly, irreversible. but blazingly fast.
CodeRabbit @coderabbitai

Introducing CodeRabbit Issue Planner! ✨ AI agents made coding fast but planning messy. Turn planning into a shared artifact in your issue tracker, grounded in related issues and decisions. Review prompts as a team, then hand them off to an agent! https://t.co/4xTjG88JOJ

Jared Sleeper @JaredSleeper ·
Headcounts for assorted companies: Salesforce: 87,415 ServiceNow: 32,378 Workday: 23,234 Zoom: 12,743 Docusign: 8,403 OpenAI: 7,112 Okta: 7,064 UiPath: 5,096 Sprinklr: 4,368 Anthropic: 4,178 Yes, UiPath still has more employees than Anthropic. Infer from that what you will.
am.will @LLMJunky ·
If you're a fan of Claude Code, you really need to see this. Steven is doing amazing work, and you're not following him? If Anthropic had built their Teams mode like this, you wouldn't shut up about it. 👇
Steven Pu @pusongqi

You can even assign different agents under the same thread 🤯 Just like slack channels, except it's occupied with agents. https://t.co/0R63hk2Pwv

@joemccann ·
This is a big fucking deal. If browsers are no longer designed exclusively for humans, but also agents, it will completely change web development.
Maximiliano Firtman @firt

Chrome 146 includes an early preview of WebMCP, accessible via a flag, that lets AI agents query and execute services without browsing the web app like a user. Services can be declared through an imperative navigator.modelContext API or declaratively through a form. https://t.co/UaUplZ8Q28

Unemployed Capital Allocator @atelicinvest ·
There is a case to be made that within each sub/category, we start to see massive performance differentials between orgs that figure out how to do Ai-integrated development properly and the orgs that don't. Like the product velocity, quality, polish and service response for the top 10% of org will be unbelievably better vs the bottom 25%. This will for sure lead to market share shifts - and probably in a bigger way than we imagine.
Ben South @bnj ·
We made a tool that lets you absorb the vibe of anything you point it at and apply it to your designs It's absurd and it just works Style Dropper, now available in @variantui https://t.co/B3eXDntYtw