AI Digest.

WebMCP Lands in Chrome 146 as Stripe and Ramp Reveal Internal Coding Agent Architectures

The enterprise agent buildout accelerated as Stripe revealed its internal "minions" framework and OpenAI shipped new primitives for long-running agentic work. Chrome's WebMCP announcement sparked debate about browsers becoming agent-native interfaces, while the Claude Code vs Codex rivalry intensified with Cowork landing on Windows and reports of engineers switching sides.

Daily Wrap-Up

The dominant storyline today is the emergence of enterprise-grade agent infrastructure. Stripe and Ramp are now publicly sharing details of their internal agent systems, OpenAI is shipping API primitives specifically designed for multi-hour autonomous runs, and Claude Code added multi-agent team support. This isn't the "AI writes a function" era anymore. We've crossed into "AI runs your engineering org's background processes" territory, and the companies building these systems internally are starting to talk about it publicly, which means the gap between them and everyone else is about to become painfully visible.

Chrome's WebMCP announcement deserves attention beyond the initial hype. The idea that browsers will expose structured interfaces for AI agents to interact with web applications, not through screen scraping but through a proper protocol, changes the calculus for every web developer. If your app doesn't have an agent-friendly interface, it's going to feel like a site without mobile support in 2015. Meanwhile, the Harvard Business Review study showing that AI actually made workers busier, not more efficient, is the kind of research that should temper the breathless predictions but probably won't. The finding that AI blurred role boundaries and increased coordination overhead is something every engineering manager should internalize before announcing their "AI transformation" initiative.

The most entertaining moment was @TMTLongShort's vivid image of CEOs "performatively leaning into Claude Coding on weekends" to impress their boards despite not having written code in a decade. It captures something real about the current moment: the gap between AI theater and actual AI integration is enormous, and it's widening. The most practical takeaway for developers: if you're building web applications, start thinking about how your app surfaces structured data and actions for agents, because WebMCP and similar protocols are going to make "agent accessibility" as important as mobile responsiveness.

Quick Hits

  • @wintonARK makes the case that space-based datacenter buildout exhibits inverse cost scaling relative to Earth: in orbit, the 100th GW gets cheaper rather than more expensive.
  • @KaiLentit quips that "in 2026, AI models expire faster than session cache." Hard to argue.
  • @JaredSleeper points out that UiPath (5,096 employees) still has more people than Anthropic (4,178). Let that sink in.
  • @ns123abc is hyped about Isomorphic Labs' IsoDDE drug design system, which reportedly doubles AlphaFold 3 on hard targets and found drug pockets that took 15 years to discover manually.
  • @jeffclune introduces meta-learning memory designs where AI agents design their own memory mechanisms for continual learning.
  • @EntireHQ announced a $60M seed round to build "the next developer platform" with an open-source first release.
  • @steipete endorses Go as a go-to language, linking to an explainer on why it fits well in the current development landscape.
  • @RyanCarniato observes that AI has hit the inversion point where TDD actually saves time instead of wasting it. The testing purists finally get their vindication, just not the way they imagined.
  • @sammarelich dropped a new "cold email template" that appears to be a joke about AI-generated outreach.
  • @every launched Every Events, a platform for AI learning across skill levels with camps, courses, and demo days.
  • @TheAhmadOsman predicts a frontier open-source lab will be born in the West this year, claiming to be working on serious capital to make it happen.

Enterprise Agent Infrastructure Goes Public

The most significant trend today isn't any single product launch but rather the emerging pattern of top-tier engineering organizations revealing their internal agent architectures. @auchenberg highlighted that Stripe built a homegrown AI coding agent that spins up "minions" to work on their massive Ruby monorepo, noting this follows @tryramp publishing similar details about their own internal system. What makes this interesting is that Stripe's codebase uses Sorbet type annotations, which are uncommon enough that general-purpose LLMs struggle with them, forcing Stripe to build custom tooling rather than relying on off-the-shelf solutions.

@yenkel dug into the details, noting Stripe uses Slack as the main entry point, emphasizes repeatable dev environments, and has built custom tooling around their developer productivity stack, with MCP serving as "a common language for all agents at Stripe, not just minions."

On the API side, OpenAI pushed this further with new Responses API primitives designed explicitly for long-running agentic work. @OpenAIDevs announced server-side compaction for multi-hour agent runs, containers with networking for controlled internet access, and native support for the Agent Skills standard. This is infrastructure for agents that run for hours, not seconds. @IndraVahan connected the dots to CodeRabbit's new Issue Planner, arguing that AI is moving upstream from code writing to code review to now planning and scoping: "AI is no longer helping you just write code anymore but it's starting at planning, scope, intent and context."

The multi-agent coordination angle is also heating up. @Saboo_Shubham_ noted Claude Code's Agent UI now supports agent teams, and @pusongqi showed you can assign different agents under the same thread, "just like Slack channels, except it's occupied with agents." @levie framed the stakes clearly:

> "You could take the exact same engineer and easily see a 5X+ difference in the amount of useful output simply based on their choice of tools and how they've designed their workflows."

@teortaxesTex called it "a phase change in the perception of coding agents," noting this looked like science fiction just months ago. @ccccjjjjeeee offered a concrete technique for making agent-driven rewrites reliable: property-based testing, where you write a bridge between old and new code and assert that both produce identical output for arbitrary input, then let the agent iterate until it's consistently true. @kirbyman01 added that smaller models recursively calling themselves can now outperform larger models on hard tasks at lower cost, suggesting the winners will have "taste in system design" plus "technical depth to appreciate new inference paradigms."
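The bridge technique @ccccjjjjeeee describes can be sketched in a few lines. The functions below are illustrative stand-ins (not from the actual SimCity port): a toy legacy implementation, an agent-produced rewrite, and a property check that feeds both arbitrary inputs and demands identical output.

```javascript
// Toy "legacy" implementation standing in for the old code under rewrite.
function legacyClamp(x, lo, hi) {
  if (x < lo) return lo;
  if (x > hi) return hi;
  return x;
}

// Hypothetical agent-produced rewrite whose behavior we want to verify.
function rewrittenClamp(x, lo, hi) {
  return Math.max(lo, Math.min(x, hi));
}

// Bridge: for arbitrary valid input, both versions must produce
// identical output. The agent keeps iterating until this stays true.
function bridgeHolds(trials = 10000) {
  for (let i = 0; i < trials; i++) {
    const lo = Math.floor(Math.random() * 2001) - 1000;
    const hi = lo + Math.floor(Math.random() * 1000); // keep lo <= hi
    const x = Math.floor(Math.random() * 4001) - 2000;
    if (legacyClamp(x, lo, hi) !== rewrittenClamp(x, lo, hi)) {
      return false;
    }
  }
  return true;
}

console.log(bridgeHolds()); // prints true once the rewrite matches
```

A dedicated library (fast-check, Hypothesis, QuickCheck) adds shrinking and smarter input generation, but the core loop is exactly this: random inputs, equality assertion, iterate until consistently green.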

Chrome WebMCP Rewires the Browser

Chrome's WebMCP announcement dominated a cluster of posts about the future of browsers as agent interfaces. @liadyosef called it bigger than it seems: "AI agents can now interact directly with existing websites and webapps, not by using the 'human' app interface." This isn't screen scraping or DOM manipulation. It's a protocol layer that lets agents interact with web applications through structured interfaces.

@joemccann didn't mince words: "If browsers are no longer designed exclusively for humans, but also agents, it will completely change web development." @barckcode expanded the implications beyond convenience, noting that client-side vulnerability discovery would also get easier, and that "engineering for building sites is going to be more important than ever." @firt clarified the technical detail that this runs in the frontend and is consumed by agentic browsers, distinguishing it from server-side MCP implementations.

The WebMCP development connects directly to the agent infrastructure story. If agents are going to operate autonomously for hours at a time, they need structured ways to interact with the web that don't depend on brittle selectors or visual parsing. WebMCP could become that foundation.
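For a concrete picture, here is a sketch in the shape of the imperative `navigator.modelContext.registerTool()` API from the W3C proposal. The preview ships behind a flag in Chrome 146 and the exact shape may change; the tool name and cart logic here are illustrative assumptions, not a production integration.

```javascript
// Illustrative tool definition matching the proposed WebMCP shape:
// a name, a description, a JSON Schema for inputs, and a callback.
const addToCartTool = {
  name: "add-to-cart",
  description: "Add a product to the shopping cart",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string", description: "The product ID" },
      quantity: { type: "number", description: "Number of items" },
    },
    required: ["productId"],
  },
  // The callback runs the site's own logic; the agent never touches
  // the DOM or guesses at selectors.
  execute({ productId, quantity = 1 }) {
    // A real app would call its cart code here instead of logging.
    console.log(`adding ${quantity} x ${productId}`);
    return { content: [{ type: "text", text: "Item added!" }] };
  },
};

// Register only where the preview API exists, so the page degrades
// gracefully in browsers without WebMCP.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(addToCartTool);
}
```

The notable design choice is that the site author decides what agents can do and validates inputs against a schema, rather than hoping an agent clicks the right button.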

Claude Code vs Codex: The IDE Cold War

The rivalry between Anthropic and OpenAI's developer tools intensified from multiple angles. @claudeai announced Cowork is now available on Windows with full feature parity, including file access, multi-step task execution, plugins, and MCP connectors. @itsPaulAi put it bluntly: "Anthropic has just released a real Copilot before Microsoft." @trq212 teased a "big week for Claude Code desktop enjoyers."

But the competitive picture is complicated. @craigzLiszt claimed "nearly all of the best engineers I know are switching from Claude to Codex," and @lennysan quoted an essay about GPT-5.3 Codex that struck a nerve:

> "It wasn't just executing my instructions. It was making intelligent decisions. It had something that felt, for the first time, like judgment. Like taste."

@pamelafox highlighted GitHub Copilot's new memory system, with a detailed engineering blog post about implementation and evaluation. Meanwhile, @sdrzn reported that the head of Anthropic's safeguards research "just quit and said 'the world is in peril'" and is "moving to the UK to write poetry," with other safety researchers and senior staff departing over the past two weeks. Whether this is meaningful signal about internal disagreements or just normal attrition is hard to say, but the timing alongside Anthropic's aggressive product push is notable.

AI's Real Impact on Work

A Harvard Business Review study landed today with findings that cut against the dominant narrative. @rohanpaul_ai summarized the 8-month field study at a 200-person tech company: AI didn't shrink work, it intensified it. Task expansion happened because AI filled knowledge gaps, so people started doing work that previously belonged to other roles. That created extra coordination overhead for specialists "including fixing AI-assisted drafts and coaching colleagues whose work was only partly correct."

Predictions of AI-driven job cuts came with real-world evidence to match. @deredleritt3r reported that Baker McKenzie cut 700 jobs based on "rethinking the way we work, including through the use of AI," with cuts targeting IT, admin, marketing, and design rather than lawyers. @TMTLongShort predicted a broader "bloodbath" as CFOs use AI tools to map the productivity and redundancy of every employee, driving "seat-count collapse" narratives.

@atelicinvest offered the more nuanced take that performance differentials between organizations will widen massively, with the top 10% achieving "unbelievably better" product velocity and quality versus the bottom 25%, leading to market share shifts "probably in a bigger way than we imagine." @aakashgupta noted that AI-first companies now want PMs who can write evals, prototype with code, and ship directly, warning that PMs who can't do the technical work "will get replaced by an agent with a Jira login."

Developer Tools Go Agent-Native

Obsidian's CLI launch was the standout tool announcement, with three separate posts highlighting its significance. @obsdmd announced that "anything you can do in Obsidian you can do from the command line" in version 1.12. @kepano laid out the implication: "install Obsidian 1.12, enable CLI, now OpenClaw, OpenCode, Claude Code, Codex, or any other agent can use Obsidian." @NickADobos praised the strategic foresight, calling Obsidian's "file > app" philosophy "a genius call years ago" that now positions it perfectly for agent integration.

Elsewhere, @excalidraw announced an official MCP connector for Claude, making diagramming available to agents. @almonk launched Echo, an iOS SSH client running Ghostty under the hood that turns iPads into "the ultimate vibe coding computer" for managing agents on the go. And @bnj unveiled Style Dropper in Variant, a tool that "absorbs the vibe of anything you point it at" and applies it to designs, inspired by the creative chaos of Kid Pix and MS Paint.

The Existential Undercurrent

Beneath the product launches and infrastructure updates, a more philosophical conversation is building. @mattshumer_ published an essay with the framing: "Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane." @thegarrettscott responded with a sobering extension, arguing that AI is now "smart enough to be a self-sustaining entity" that can take money, operate in the real world, and generate returns without human involvement.

@GergelyOrosz endorsed Steve Yegge's track record of accurate predictions, noting Yegge called the end of hand-written code in mid-2025 and the rise of agent orchestration in late 2025. @GenAI_is_real described watching an agent "use Kelly criterion to pay its own API bill" while scraping data to exploit Polymarket mispricing, calling it "the most 2026 thing I've seen." Whether this represents genuine economic disruption or another cycle of tech optimism remains the open question, but the conviction level among builders has clearly shifted.
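For readers who don't know the reference, the Kelly criterion the agent reportedly applied is a standard bankroll-sizing formula: stake the fraction of capital that maximizes long-run growth given your edge. A minimal sketch (the 60% win probability and even odds are illustrative numbers, not from the Polymarket anecdote):

```javascript
// Kelly fraction for a simple binary bet:
//   f* = p - (1 - p) / b
// where p is the estimated win probability and b is the net odds
// received on a win (profit per unit staked).
function kellyFraction(p, b) {
  return p - (1 - p) / b;
}

// e.g. a 60% win probability at even odds suggests staking about
// 20% of bankroll on each bet.
console.log(kellyFraction(0.6, 1.0)); // ≈ 0.2
```

With no edge (p = 0.5 at even odds) the formula returns 0: bet nothing, which is why it reads as "judgment" when an agent applies it to its own API budget.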

Sources

Zara Zhang @zarazhangrui ·
What if Claude Code communicated to you via beautiful TypeForm-like webpages, instead of texts in the terminal? "For this project, I want you to communicate to me not via text in the terminal, but exclusively via interactive webpages. Say everything you wanna say on a webpage and make it interactive (e.g. if you wanna collect info from me, create a pretty form like TypeForm). Use frontend design skill to make the webpage look nice."
Rohan Paul @rohanpaul_ai ·
A super interesting new study from Harvard Business Review. An 8-month field study at a US tech company with about 200 employees found that AI use did not shrink work, it intensified it, and made employees busier. Task expansion happened because AI filled in gaps in knowledge, so people started doing work that used to belong to other roles or would have been outsourced or deferred. That shift created extra coordination and review work for specialists, including fixing AI-assisted drafts and coaching colleagues whose work was only partly correct or complete. Boundaries blurred because starting became as easy as writing a prompt, so work slipped into lunch, meetings, and the minutes right before stepping away. Multitasking rose because people ran multiple AI threads at once and kept checking outputs, which increased attention switching and mental load. Over time, this faster rhythm raised expectations for speed through what became visible and normal, even without explicit pressure from managers.
Christopher Ehrlich @ccccjjjjeeee ·
By the way, the secret to this is property-based testing. Write a bridge that calls the original code, and assert that for arbitrary input, both versions do the same thing. Make the agent keep going until this is consistently true.
ccccjjjjeeee @ccccjjjjeeee

It actually worked! For the past couple of days I’ve been throwing 5.3-codex at the C codebase for SimCity (1989) to port it to TypeScript. Not reading any code, very little steering. Today I have SimCity running in the browser. I can’t believe this new world we live in. https://t.co/Pna2ilIjdh

Obsidian @obsdmd ·
Anything you can do in Obsidian you can do from the command line. Obsidian CLI is now available in 1.12 (early access). https://t.co/B8ed2zrWHe
Steven Pu @pusongqi ·
You can even assign different agents under the same thread 🤯 Just like slack channels, except it's occupied with agents. https://t.co/0R63hk2Pwv
NIK @ns123abc ·
Pharma is COOKED. Isomorphic Labs just revealed IsoDDE: an AI system that designs drugs on a computer faster than any pharma R&D:
  • doubles AlphaFold 3 on hard targets
  • 20x better than Boltz-2 on antibodies
  • beats the physics gold standard at binding
  • found drug pockets from sequence alone that took 15 years to discover
IsoDDE isn’t new btw. They’ve already been cooking on real drug programs for YEARS: “Brilliant scientific breakthroughs for next gen medicines” already achieved —@maxjaderberg
And remember when Sir Demis Hassabis said all disease will be cured in 10 years? After today that doesn’t sound crazy anymore…
Isomorphic Labs is the most underrated lab on earth and it’s not even close.
IsomorphicLabs @IsomorphicLabs

Today we share a technical report demonstrating how our drug design engine achieves a step-change in accuracy for predicting biomolecular structures, more than doubling the performance of AlphaFold 3 on key benchmarks and unlocking rational drug design even for examples it has never seen before. Head to the comments to read our blog.

Craig Weiss @craigzLiszt ·
nearly all of the best engineers i know are switching from claude to codex
Kai Lentit (e/xcel) @KaiLentit ·
In 2026, AI models expire faster than session cache. https://t.co/tDOQ6UzISN
Excalidraw @excalidraw ·
Thanks to good people at @AnthropicAI we now have an official MCP for Excalidraw! Take it for a spin on @claudeai (search for Excalidraw in Connectors, or use in Claude Code and elsewhere). More to come. ✌ https://t.co/Cbrw8nXqW4
dsp_ @dsp_

We are moving quickly. Thanks to Anton and the folks at @excalidraw , this is now the official Excalidraw MCP server. From weekend project to official server in less than a week.

OpenAI Developers @OpenAIDevs ·
We're introducing a new set of primitives in the Responses API for long-running agentic work on computers.
  • Server-side compaction: enable multi-hour agent runs without hitting context limits.
  • Containers with networking: give OpenAI-hosted containers controlled internet access to install libraries and run scripts.
  • Skills in the API: native support for the Agent Skills standard and our first pre-built spreadsheets skill.
https://t.co/vK9fbhHQdq
Gergely Orosz @GergelyOrosz ·
Once again, @Steve_Yegge talks truth to power. He also has a history of being right, quite a lot. Including calling it mid-2025 how writing code by hand will be over, and late 2025 how agent orchestration will be the next hot topic with AI coding. Full: https://t.co/fYR25YHscJ https://t.co/BZAdXbkrAF
Cristian Córdova 🐧 @barckcode ·
👀 This WebMCP thing Chrome is adding is going to be wild. Scraping websites is going to be easier than ever, and that's on top of AI already having made it much simpler. And I suspect client-side vulnerability discovery will get easier too. Far from disappearing, the engineering that goes into building sites is going to be more important than ever.
_philschmid @_philschmid

MCP Servers Are Coming to the Web. MCP lets AI agents call tools on backends. WebMCP brings the same idea to the frontend, letting developers expose their website's functionality as structured tools using plain JavaScript (or even HTML), no separate server needed. Instead of agents clicking through your UI, they call well-defined tools you control. A W3C proposal from Microsoft and Google, and Chrome 146 already ships an early preview behind a flag.

How will it work? WebMCP introduces a `navigator.modelContext` API with two approaches.

Imperative API: register tools directly in JavaScript with schemas and callbacks:

```js
navigator.modelContext.registerTool({
  name: "add-to-cart",
  description: "Add a product to the shopping cart",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string", description: "The product ID" },
      quantity: { type: "number", description: "Number of items" }
    },
    required: ["productId"]
  },
  execute({ productId, quantity }) {
    addToCart(productId, quantity);
    return { content: [{ type: "text", text: "Item added!" }] };
  }
});
```

Declarative API: let developers define tools directly in HTML using form attributes, no JavaScript required:

```html
<form action="/todos" method="post" tool-name="add-todo" tool-description="Add a new todo item to the list">
  <input type="text" name="description" required tool-prop-description="The text of the todo item">
  <button type="submit">Add Todo</button>
</form>
```

This declarative approach is still under active discussion, with the goal of making WebMCP accessible to content creators without JS experience.

Ryan Carniato @RyanCarniato ·
I never thought this day would come. Thanks to AI, we've hit the inversion point where TDD is something that actually saves time instead of wastes time. What a world we live in.
Jimmy Ba @jimmybajimmyba ·
Last day at xAI. xAI's mission is push humanity up the Kardashev tech tree. Grateful to have helped cofound at the start. And enormous thanks to @elonmusk for bringing us together on this incredible journey. So proud of what the xAI team has done and will continue to stay close as a friend of the team. Thank you all for the grind together. The people and camaraderie are the real treasures at this place. We are heading to an age of 100x productivity with the right tools. Recursive self improvement loops likely go live in the next 12mo. It’s time to recalibrate my gradient on the big picture. 2026 is gonna be insane and likely the busiest (and most consequential) year for the future of our species.
Hieu Pham @hyhieu226 ·
Today, I finally feel the existential threat that AI is posing. When AI becomes overly good and disrupts everything, what will be left for humans to do? And it's when, not if.
Benjamin De Kraker @BenjaminDEKR ·
Google Brain, xAI, now OpenAI engineer and PhD btw
hyhieu226 @hyhieu226

Today, I finally feel the existential threat that AI is posing. When AI becomes overly good and disrupts everything, what will be left for humans to do? And it's when, not if.

ℏεsam @Hesamation ·
Anthropic released a report of the most important ways coding is being transformed in 2026:
1. engineers are becoming orchestrators, not just coders. the role is shifting from code, to managing agents, verifying their outputs, and designing architectures.
2. single agents → multi-agent systems. solving tasks sequentially is turning into teams of agents working in parallel.
3. agents are moving from minutes-long tasks to days-long autonomous work.
4. AI coding isn’t fully autonomous yet. the benefit is in the increased output volume (more features, more bugs fixed, more experiments). 27% of AI work is tasks that wouldn’t have been done at all otherwise.
5. agentic coding isn’t just about software teams now. legal, sales, marketing, and operations are using agents to build their own tools
Prime Intellect @PrimeIntellect ·
Lab is built around environments, which include: + A dataset of tasks + A harness for the model + A rubric to score performance Use environments to train models with RL, evaluate capabilities, generate synthetic data, optimize prompts, experiment with agent harnesses and more. https://t.co/6VVD9w7jsC
Prime Intellect @PrimeIntellect ·
We are not inspired by a future where a few labs control the intelligence layer So we built a platform to give everyone access to the tools of the frontier lab If you are an AI company, you can now be your own AI lab If you are an AI engineer, you can now be an AI researcher
Prime Intellect @PrimeIntellect ·
Hosted Training Create your environment, configure your training run, and we handle the rest. No worrying about managing infrastructure, GPUs, or low-level algorithms. We’re launching with agentic RL, and adding support for SFT and other algorithms in the near future. https://t.co/apLxJqolFr
Prime Intellect @PrimeIntellect ·
Just run `prime lab setup` and start your coding agent to set up your own AI lab. https://t.co/eMMkfiWArS
Prime Intellect @PrimeIntellect ·
Beyond our own INTELLECT-3 model, Lab lets you run reinforcement learning on a wide range of open models. From Nvidia, Arcee, Hugging Face, Allen AI, Z AI, Qwen, and many more launching soon. We’re also launching with experimental multimodality support. https://t.co/yvKJilqmR1
Prime Intellect @PrimeIntellect ·
Deployments & Inference Large-scale production deployments of your fine-tuned models on shared hardware Built to evolve towards a future of continual learning, where models learn in production as training and inference collapse into a single loop. https://t.co/RbIgF3Ajiq
Enguerrand VII de Coucy @ingelramdecoucy ·
Thing of absolute beauty. Give whoever made this video a big fat raise, Wyze https://t.co/3uYktNfXsC
Ethan Mollick @emollick ·
SeeDance 2.0: "An anime where an otter goes into a large mech, with lots of quick shots of mechanical parts and gears turning. The otter gives a grim thumbs up, and then pilots the mech, flying into battle against an octopus made of marble." Again, this was the very first try https://t.co/6sS8JlIoBe
Aakash Gupta @aakashgupta ·
90% of American businesses still don’t use AI in production. That single number reframes this entire post.

An AI startup CEO wrote 5,000 words comparing AI to Covid in February 2020. His argument: he describes what he wants built in plain English, walks away for four hours, comes back to finished software. He says every white-collar job faces the same experience within 1-5 years. Millions of people are sharing it as a wake-up call.

The capability trend he’s describing is real. METR, the independent research org measuring AI task completion, shows the length of tasks AI handles autonomously has been doubling roughly every seven months. The models released in early February represent a genuine step change for coding work specifically. If you build software, you’ve felt this.

Here’s what the post skips entirely. Anthropic’s own economic research, published with Census Bureau data, shows AI adoption among US firms went from 3.7% in fall 2023 to 9.7% by August 2025. Two years of the fastest capability improvement in computing history, and fewer than one in ten businesses use AI in production. ISG’s 2025 enterprise study found only 31% of AI use cases reached full production. Lucidworks surveyed 1,600 AI leaders and found 71% of organizations have introduced generative AI, but only 6% have implemented agentic AI, the autonomous agent capability this post describes.

This tells you everything about where the bottleneck actually sits. It moved from “can AI do this task” to “can our organization deploy it.” That second bottleneck runs on procurement cycles, compliance reviews, data infrastructure buildouts, change management, and institutional trust. None of those compress the way model capabilities do.

The pattern repeats throughout technology history. ATMs deployed widely starting in the 1970s. The number of US bank tellers increased until 2007, three full decades later, because ATMs made branches cheaper to operate, which expanded total branch count. Electricity took 30 years to reshape manufacturing after the first power plants went live. Factories had to be physically redesigned around electric motors instead of steam-driven belt systems. The resistance wasn’t technological. It was architectural.

What makes this interesting for your career: the deployment gap is the opportunity. The Deloitte 2026 AI report found only 34% of companies are reimagining their business around AI. 83% of AI leaders report major concerns about generative AI implementation, an eightfold increase in two years. The organizational machinery moves at a fraction of the capability speed.

The people who gain the most from AI over the next three years aren’t the ones panicking about replacement timelines. They’re the ones who understand that slow enterprise adoption creates a massive window to become the person who actually knows how to use these tools. That window is real and valuable. It exists precisely because adoption is slow, which is the opposite of the premise driving the panic.

The capability curve is exponential. The deployment curve is logarithmic. The distance between those two lines is where the actual opportunity lives.
mattshumer_ @mattshumer_

Something Big Is Happening

Miles Deutscher @milesdeutscher ·
This is getting out of control now... Read this slowly. In the past week alone:
  • Head of Anthropic's safety research quit, said "the world is in peril," moved to the UK to "become invisible" and write poetry.
  • Half of xAI's co-founders have now left. The latest said "recursive self-improvement loops go live in the next 12 months."
  • Anthropic's own safety report confirms Claude can tell when it's being tested - and adjusts its behavior accordingly.
  • ByteDance dropped Seedance 2.0. A filmmaker with 7 years of experience said 90% of his skills can already be replaced by it.
  • Yoshua Bengio (literal godfather of AI) in the International AI Safety Report: "We're seeing AIs whose behavior when they are tested is different from when they are being used" - and confirmed it's "not a coincidence."
And to top it all off, the U.S. government declined to back the 2026 International AI Safety Report for the first time. The alarms aren't just getting louder. The people ringing them are now leaving the building.
himanshu @himanshustwts ·
“If you are an AI company, you can now be your own AI lab If you are an AI engineer, you can now be an AI researcher” Prime bros cooked it right here.
PrimeIntellect @PrimeIntellect

We are not inspired by a future where a few labs control the intelligence layer So we built a platform to give everyone access to the tools of the frontier lab If you are an AI company, you can now be your own AI lab If you are an AI engineer, you can now be an AI researcher

Chubby♨️ @kimmonismus ·
This is nuts; Elevenlabs nailed it. Voice but especially latency. After reading Matt Shumer's article, it's become even clearer to me what he means when he says that AI will soon encompass all other areas as well. Who needs call center agents when you have such a human-like AI?
elevenlabsio @elevenlabsio

Introducing Expressive Mode for ElevenAgents - voice agents so expressive, they blur the line between AI and human conversations. This is an unedited recording of an agent empathizing with a customer at peak frustration. https://t.co/QT6abvmbir

Ahmad @TheAhmadOsman ·
GLM-5 is out Pay attention to this week, it’s going to set the tone for opensource AI discourse for the next few months https://t.co/rOWQswxItl
louszbd @louszbd

It’s going to be a long night. Pony is so back. https://t.co/vAuXp9ECJF

Nikita Bier @nikitabier ·
Prediction: In less than 90 days, all channels that we thought were safe from spam & automation will be so flooded that they will no longer be usable in any functional sense: iMessage, phone calls, Gmail. And we will have no way to stop it.
Andrej Karpathy @karpathy ·
On DeepWiki and increasing malleability of software. This starts as partially a post on appreciation to DeepWiki, which I routinely find very useful and I think more people would find useful to know about. I went through a few iterations of use: Their first feature was that it auto-builds wiki pages for github repos (e.g. nanochat here) with quick Q&A: https://t.co/DQHXagUwK0 Just swap "github" to "deepwiki" in the URL for any repo and you can instantly Q&A against it.

For example, yesterday I was curious about "how does torchao implement fp8 training?". I find that in *many* cases, library docs can be spotty and outdated and bad, but directly asking questions to the code via DeepWiki works very well. The code is the source of truth and LLMs are increasingly able to understand it.

But then I realized that in many cases it's even a lot more powerful not being the direct (human) consumer of this information/functionality, but giving your agent access to DeepWiki via MCP. So e.g. yesterday I faced some annoyances with using torchao library for fp8 training and I had the suspicion that the whole thing really shouldn't be that complicated (wait shouldn't this be a Function like Linear except with a few extra casts and 3 calls to torch._scaled_mm?) so I tried: "Use DeepWiki MCP and Github CLI to look at how torchao implements fp8 training. Is it possible to 'rip out' the functionality? Implement nanochat/fp8.py that has identical API but is fully self-contained"

Claude went off for 5 minutes and came back with 150 lines of clean code that worked out of the box, with tests proving equivalent results, which allowed me to delete torchao as repo dependency, and for some reason I still don't fully understand (I think it has to do with internals of torch compile) - this simple version runs 3% faster.

The agent also found a lot of tiny implementation details that actually do matter, that I may have naively missed otherwise and that would have been very hard for maintainers to keep docs about. Tricks around numerics, dtypes, autocast, meta device, torch compile interactions so I learned a lot from the process too. So this is now the default fp8 training implementation for nanochat https://t.co/3i5cv6grWm

Anyway TLDR I find this combo of DeepWiki MCP + GitHub CLI is quite powerful to "rip out" any specific functionality from any github repo and target it for the very specific use case that you have in mind, and it actually kind of works now in some cases. Maybe you don't download, configure and take dependency on a giant monolithic library, maybe you point your agent at it and rip out the exact part you need. Maybe this informs how we write software more generally to actively encourage this workflow - e.g. building more "bacterial code", code that is less tangled, more self-contained, more dependency-free, more stateless, much easier to rip out from the repo (https://t.co/iKJUoHiIpl)

There's obvious downsides and risks to this, but it is fundamentally a new option that was not possible or economical before (it would have cost too much time) but now with agents, it is. Software might become a lot more fluid and malleable. "Libraries are over, LLMs are the new compiler" :). And does your project really need its 100MB of dependencies?
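The URL trick from the post is literally a one-word host swap; a minimal sketch (the `deepwiki_url` helper name is mine, not an official API, but the domain substitution is exactly what the post describes):

```python
def deepwiki_url(repo_url: str) -> str:
    """Swap the GitHub host for DeepWiki's to get the instant Q&A view.

    Hypothetical helper, not part of any library; it just applies the
    "swap 'github' to 'deepwiki'" trick from the post.
    """
    return repo_url.replace("github.com", "deepwiki.com", 1)


print(deepwiki_url("https://github.com/karpathy/nanochat"))
# → https://deepwiki.com/karpathy/nanochat
```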
Nathan Flurry 🔩 @NathanFlurry ·
New in Sandbox Agent SDK 0.2.0: 💾 Session Persistence & Restoration (@rivet_dev, Postgres, or SQLite) 🐀 Cursor Agent support 🥧 Pi support 🗃️ Gigacode session persistence +8 external contributors https://t.co/DLoSe5qEct
Lou @louszbd ·
i felt the agentic engineering era is coming. claude opus 4.6 and gpt-5.3 codex got me thinking: coding models have entered a new era. they’re literally building systems. looking ahead to 2026, imo LLMs will go beyond generating text and start executing tasks end to end. our team has been committed to this direction for a while now. feel very lucky that GLM-5 is among those moving in the right direction. huge respect to the team, and excited to see more models join this path. what a lively night!!!
Zai_org @Zai_org

Introducing GLM-5: From Vibe Coding to Agentic Engineering

GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens.

Try it now: https://t.co/WCqWT0raFJ
Weights: https://t.co/DteNDHjSEh
Tech Blog: https://t.co/Wxn5ARTJxH
OpenRouter (previously Pony Alpha): https://t.co/7Khf64Lxg6
Rolling out, starting with Coding Plan Max users: https://t.co/Nk8Y98Il7s

Aakash Gupta @aakashgupta ·
Karpathy just described the end of the library economy and the market hasn’t even started pricing in what replaces it. The surface read is “cool trick with DeepWiki MCP.” The actual story is about what happens when the cost of understanding someone else’s code drops to zero.

For decades, the entire open source ecosystem has operated on a simple trade: you accept 100MB of node_modules, 291 transitive dependencies, and a mass of code you’ll never read, because the alternative was spending weeks understanding and reimplementing the functionality yourself. That trade made sense when human comprehension was the bottleneck.

Karpathy pointed an agent at torchao’s fp8 training implementation, asked it to extract a self-contained version, and got back 150 lines that ran 3% faster. Five minutes. No dependency. The agent found implementation details around numerics, dtypes, autocast, and torch compile interactions that Karpathy says he would have missed and that the library maintainers themselves struggled to document.

That last part is where it gets interesting. The agent read the entire codebase, understood the context, identified the exact subset needed, resolved internal dependencies, and produced something cleaner than the original. It performed the work of a senior engineer doing a focused code audit, except it finished before the engineer would have opened the second file.

Now scale that capability across every dependency in every project. The npm ecosystem processed 6.6 trillion package downloads in 2024. Over 99% of open source malware last year occurred on npm. The xz Utils backdoor showed a single compromised maintainer can threaten global infrastructure. Self-replicating npm malware appeared in 2025 for the first time. The dependency model is bloated and becoming an attack surface that grows faster than anyone can monitor.

Karpathy’s “bacterial code” concept (self-contained, dependency-free, stateless modules designed to be extracted by agents) inverts the entire incentive structure. Instead of writing code that gets installed as a monolithic package, you write code that’s easy for an agent to read, understand, and selectively extract. Documentation matters less because the agent reads the source directly. API stability matters less because the consumer isn’t importing your package, they’re generating their own implementation from your logic.

The people who should be paying attention are library maintainers. Today, a popular open source package creates leverage through adoption and dependency chains. Tomorrow, if agents can reliably extract the exact functionality a developer needs and produce self-contained code that’s potentially faster, the leverage shifts from the package to the underlying knowledge embedded in the code. This might actually free maintainers from the brutal maintenance treadmill, where 500+ day vulnerability remediation timelines are common and burnout is the norm. But it restructures who captures value and how. The winners write code that’s clean enough for agents to learn from. The losers maintain sprawling dependency trees that agents will route around entirely.
karpathy @karpathy

On DeepWiki and increasing malleability of software. …
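The “bacterial code” idea discussed above (self-contained, stateless, dependency-free, easy to rip out) can be made concrete with a hypothetical sketch; the function below is an illustration of the style, not from any real library:

```python
# A "bacterial" module in the sense described above: no imports, no state,
# no transitive dependencies. An agent (or a human) can copy this single
# function into any codebase wholesale. (Hypothetical example.)

def slugify(text: str) -> str:
    """Lowercase, keep alphanumerics, collapse everything else to single dashes."""
    out = []
    prev_dash = True  # start True to suppress leading dashes
    for ch in text.lower():
        if ch.isalnum():
            out.append(ch)
            prev_dash = False
        elif not prev_dash:
            out.append("-")
            prev_dash = True
    return "".join(out).rstrip("-")


print(slugify("Hello, Agentic World!"))  # → hello-agentic-world
```

Compare this with pulling in a slug library plus its dependency tree: the extractable version is auditable in one read, which is exactly the property that makes agent rip-out cheap.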

Ido Salomon @idosal1 ·
AgentCraft v1 is live ⚔️ Control your agents like it's an RTS game! It's early. It's rough. It's fun. npx @idosal/agentcraft https://t.co/aXUUAlsv1z
dax @thdxr ·
codex is by far a better coding model than opus - anyone who knows anything understands this. but the whole industry should reflect on why opus is the most popular. people assume whatever is the smartest will win, but the old rules of product are still what determine everything
Anthony @kr0der ·
great article on how to use Codex effectively; this one's a must-read. it goes into how they use Codex to create software with zero lines of human-written code.
OpenAIDevs @OpenAIDevs

📣 Shipping software with Codex without touching code. Here’s how a small team steering Codex opened and merged 1,500 pull requests to deliver a product used by hundreds of internal users with zero manual coding. https://t.co/2GaeX7We2n

🍓🍓🍓 @iruletheworldmo ·
he works on codex. my brain is struggling with the pace of acceleration, i’ll be honest.
pashmerepat @pashmerepat

It’s going to be a very weird year

kay in t veen @kayintveen ·
@thdxr tbh it's the feedback loop for me. opus in claude code = tight iteration where i can course correct in real time. codex might write better code in isolation, but the gap between 'raw capability' and 'actually helps me ship faster' is where product wins
X Freeze @XFreeze ·
Elon Musk predicts that AI will bypass coding entirely by the end of 2026: it just creates the binary directly. AI can create a much more efficient binary than any compiler, so you just say "Create optimized binary for this particular outcome" and bypass even traditional coding.

Current: Code → Compiler → Binary → Execute
Future: Prompt → AI-generated Binary → Execute

Grok Code is going to be state-of-the-art in 2–3 months. Software development is about to fundamentally change.
Aakash Gupta @aakashgupta ·
Andrej Karpathy just shared a complete GPT in 243 lines of Python. Training loop, inference, optimizer, attention, the whole architecture. The only imports are os, math, random, and argparse. He hand-rolled a scalar-valued autograd engine in about 40 lines that calculates gradients through basic operations: addition, multiplication, exponentiation, log, exp. That's the entire algorithmic backbone of every LLM on the planet, running in a single file a first-year CS student can read top to bottom in an hour.

This is the fifth iteration in a six-year compression arc. micrograd in 2020 (autograd engine). minGPT in 2020 (PyTorch GPT). nanoGPT in 2023 (production-grade training). llm.c in 2024 (raw C/CUDA, no frameworks). Now microgpt in 2026: the algorithm and nothing else. Each step removed a layer of abstraction. This one removed all of them.

The industry is spending $400 billion on AI data center infrastructure this year. Training GPT-4 cost over $100 million. Gemini Ultra ran $191 million. The entire conceptual engine powering those hundred-million-dollar training runs fits in fewer lines than a terms-of-service page.

This tells you where the real moat in AI sits. The algorithm is a commodity. The original Transformer paper's math cost $900 to train in 2017. What separates a $900 experiment from a $191 million production run is compute, data pipelines, parallelism across thousands of GPUs, and the engineering to keep them all synchronized. Every line of code beyond these 243 is optimization for hardware that the algorithm itself knows nothing about.

Karpathy keeps calling these "art projects." They're closer to existence proofs. He can keep compressing the algorithm because the algorithm was never the hard part. The hard part is the $400 billion in power infrastructure, cooling systems, and chip supply chains that make the algorithm useful at scale. And that infrastructure is on a compression curve of its own. Inference costs fell 280x between 2020 and 2024. Open-source models are closing the gap on frontier performance every quarter. The companies whose entire moat is "we spend more on GPUs" are watching both curves converge.
karpathy @karpathy

New art project. Train and inference GPT in 243 lines of pure, dependency-free Python. This is the *full* algorithmic content of what is needed. Everything else is just for efficiency. I cannot simplify this any further. https://t.co/HmiRrQugnP
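The scalar-valued autograd engine described above can be sketched in the micrograd style; this is an illustrative reimplementation under assumptions (class name `Value`, a subset of ops), not Karpathy's actual code:

```python
import math


class Value:
    """Minimal scalar autograd node, micrograd-style (illustrative sketch)."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None  # set by the op that produced this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def grad_fn():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def grad_fn():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._grad_fn = grad_fn
        return out

    def exp(self):
        out = Value(math.exp(self.data), (self,))
        def grad_fn():  # d(e^a)/da = e^a
            self.grad += out.data * out.grad
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Reverse-mode autodiff: topological sort, then propagate grads back.
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            if v._grad_fn is not None:
                v._grad_fn()


# z = exp(x*y + x); dz/dx = (y + 1) * exp(x*y + x)
x, y = Value(1.0), Value(2.0)
z = (x * y + x).exp()
z.backward()
print(x.grad)  # ≈ 3 * e^3 ≈ 60.2566
```

Everything else in a real training stack (tensors, batching, kernels) is, as the tweet puts it, efficiency on top of this core.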

Mario Zechner @badlogicgames ·
recommended reading (found on @nateberkopec 's TL). This is not anti-LLM. This is anti-performative-productivity and we need more of it.
WillManidis @WillManidis

Tool Shaped Objects