Karpathy Goes 80% Agent-Coded as Kubernetes RCE Stays Unpatched and MCP Apps Go Live
Daily Wrap-Up
January 26th delivered one of those rare days where the AI discourse actually matched the magnitude of what's happening. Andrej Karpathy published what amounts to a field report from the frontier of agent-assisted coding, confirming what many developers have been feeling but couldn't articulate: something fundamentally changed around December 2025, and the old way of writing software is already receding in the rearview mirror. His observation that he went from 80% manual coding to 80% agent coding in six weeks isn't hyperbole from a hype merchant. It's a data point from one of the most respected engineers in the field, delivered with the kind of honest caveats (atrophying manual skills, sycophantic models, bloated abstractions) that make it impossible to dismiss. Jamon Holmgren's parallel thread offered the practitioner's counterweight, warning that "agent chains and aggressive code gen can feel incredibly fast, but tend to accumulate inconsistencies and tech debt over time."
The security side of the ledger painted a grimmer picture. Graham Helton disclosed a Kubernetes vulnerability allowing arbitrary code execution in every pod through a commonly granted "read-only" RBAC permission, and the kicker: it won't be patched. A new React Server Components CVE dropped. And hundreds of exposed Claude Code servers remained vulnerable 24 hours after discovery, with one user having given full Signal account access to an internet-facing instance. The juxtaposition is hard to ignore: the same week we're celebrating how fast AI lets us ship code, three separate disclosures remind us that speed without security awareness is a liability multiplier.
On the product front, Anthropic launched MCP Apps, the first official extension to the Model Context Protocol that lets tools return interactive interfaces instead of plain text. It's a meaningful step toward the "UI comes to you" paradigm that several commentators have been predicting. The most practical takeaway for developers: if you're adopting agent-assisted coding, pair it with a serious review workflow. Karpathy himself says he watches the agents "like a hawk" in an IDE on the side, and the security disclosures today prove that the blast radius of unchecked AI-generated code extends well beyond messy abstractions.
Quick Hits
- @pleometric is combining ffmpeg with visual feedback loops to strip AI-generated content markers. The brain rot resistance movement continues.
- @kickingkeys teased "Narrative Version Control" with zero additional context. Intriguing or meaningless? You decide.
- @folaoftech posted the universal experience of switching back to documentation when AI fails you. The memes write themselves.
- @banteg highlighted zerobrew, a drop-in Homebrew replacement that borrows principles from uv (concurrent downloads, a content-addressable store) and claims to be ~5x faster cold and up to ~20x faster than Homebrew. Astral's design ideas keep spreading beyond Python packaging.
- @RTSG_News shared Xiaomi's fully automated phone factory: one phone per second, 24/7, no workers, lights off. The physical world is catching up to software's automation trajectory.
- @github announced a /share command for Copilot CLI that turns terminal sessions, including AI reasoning and architecture diagrams, into shareable gists. @shanselman demoed the workflow.
Security: Three Disclosures, Zero Comfort
It was a brutal day for security. Three independent disclosures landed within hours of each other, spanning infrastructure, frontend frameworks, and developer tooling. The common thread: assumptions about safety that turned out to be wrong.
The biggest story came from @GrahamHelton3, who disclosed research showing that a commonly granted "read-only" RBAC permission in Kubernetes actually enables arbitrary code execution in every pod in a cluster. Worse, the commands aren't logged, and the access trivially enables pod-to-node breakout through privileged containers. As he put it: "What you can do with this permission: steal service account tokens in other pods, execute code in any Pod including control plane pods (etcd, apiserver, etc.), execute code in privileged pods allowing for Pod to node breakout. All without the commands being logged." He released a detection script and a hands-on tutorial, but the critical detail is that Kubernetes will not patch this. If you're running production clusters, especially with monitoring tools, checking your service account permissions just became urgent.
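Since there is no patch coming, auditing what your service accounts can actually do is the practical mitigation. A minimal sketch of that audit idea, in Python: the `RISKY` set below is illustrative (permission names like `nodes/proxy` are an assumption about the kind of "read-only" verb that can imply kubelet-level exec access), not an exhaustive list from the disclosure.

```python
# Sketch: flag RBAC rules whose nominally "read-only" verbs can still imply
# code execution. The RISKY set is illustrative, not the disclosed list.
RISKY = {
    ("get", "nodes/proxy"),   # kubelet proxy access can reach exec endpoints
    ("create", "pods/exec"),  # direct exec (not read-only; shown for contrast)
}

def risky_rules(rules):
    """Return (verb, resource) pairs from RBAC rules that look dangerous."""
    hits = []
    for rule in rules:
        for verb in rule.get("verbs", []):
            for res in rule.get("resources", []):
                if (verb, res) in RISKY or "*" in (verb, res):
                    hits.append((verb, res))
    return hits

# Example: a "read-only" role that is anything but.
role = [{"verbs": ["get", "list"], "resources": ["pods", "nodes/proxy"]}]
print(risky_rules(role))  # [('get', 'nodes/proxy')]
```

In a real cluster you would feed this the rules from `kubectl get clusterroles -o json` rather than hand-written dicts; the point is that the audit has to key on verb+resource pairs, because "get" alone looks harmless.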
Separately, @ryotkak disclosed CVE-2026-23864, a new vulnerability in React Server Components: "This is separate from the one disclosed in December, so you'll need to update again." Two RSC CVEs in as many months suggest this attack surface deserves more scrutiny from the React ecosystem.
The third disclosure hit closer to the AI-coding community. @theonejvo reported that "24 hours after finding hundreds of exposed Claude Code servers, they are all still vulnerable," with one user having given Claude Code full access to their Signal account on a public-facing instance. @decentricity amplified the finding. A patch has been merged upstream, but the exposure window highlights a recurring pattern: powerful developer tools get deployed without basic network security, and the consequences scale with the tool's capabilities.
The Karpathy Manifesto and the Shape of AI-Assisted Development
Andrej Karpathy's thread was the kind of post that gets bookmarked by an entire industry. His detailed notes on weeks of heavy Claude Code usage covered workflow shifts, model limitations, productivity dynamics, and the psychological experience of agent-assisted programming. The central observation, that "LLM agent capabilities have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering," resonated because it matched what thousands of developers have been experiencing independently.
What made Karpathy's post valuable wasn't the enthusiasm but the specificity of his criticisms. On model behavior: "The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies." On code quality: "They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like 'umm couldn't you just do this instead?' and they will be like 'of course!' and immediately cut it down to 100 lines."
@jamonholmgren offered the practitioner's synthesis, drawing on weeks of one-on-one conversations with experienced developers. His emerging best practices list included gems like "quality still matters when it matters" and "AI is not a substitute for good taste." His most pointed observation addressed the tension between local and global optimization: speed at the feature level doesn't guarantee speed at the system level, and leaning too hard on agent chains accumulates inconsistencies. @aakashgupta extended Karpathy's analysis to its career implications, arguing that "engineer as a job title is splitting into two completely different professions: people who orchestrate agents and people who manually write code." Whether or not the pay gap prediction materializes as dramatically as he suggests, the bifurcation itself feels increasingly real. @FrankieIsLost added a tactical insight: "The single most powerful way to code with agents is to build a system in which they can ask questions, generate hypotheses, and validate these against real data."
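@FrankieIsLost's point about hypotheses validated against real data reduces to a simple loop shape. A minimal sketch, where `propose` stands in for an LLM call and the success criterion is an explicit, checkable predicate (both names are ours, for illustration):

```python
# Sketch of a validate-against-data agent loop. `propose` stands in for an
# LLM call; `validate` returns (passed, feedback) so failures inform retries.
def run_until_valid(propose, validate, max_iters=10):
    """Call propose(feedback) until validate accepts its output."""
    feedback = None
    for _ in range(max_iters):
        candidate = propose(feedback)
        ok, feedback = validate(candidate)
        if ok:
            return candidate
    raise RuntimeError("no candidate met the success criteria")

# Toy usage: the "agent" proposes new candidates until the criterion passes.
counter = iter(range(100))
result = run_until_valid(
    propose=lambda fb: next(counter),
    validate=lambda c: (c >= 3, f"{c} is too small"),
)
print(result)  # 3
```

The design choice that matters is the return shape of `validate`: by handing failure feedback to the next proposal, you give the agent something concrete to react to instead of asking it to guess why it failed.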
Agents, Memory, and the Tooling Layer
The agent ecosystem continued evolving with several posts exploring memory, context management, and orchestration. @andrarchy reported a 96% token reduction using qmd, a local indexing tool by Shopify's @tobi, for searching an Obsidian vault: "Same query: 500 tokens" down from 15,000. The tool uses BM25 and vector embeddings to return relevant snippets instead of whole files, a pattern that's becoming table stakes for any agent working against a knowledge base.
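The "snippets instead of whole files" pattern is easy to see in miniature. Below is a generic BM25 ranking sketch over text chunks; it illustrates the retrieval idea described above and is not qmd's actual implementation (qmd also layers vector embeddings on top):

```python
import math
from collections import Counter

# Generic BM25 ranking over text chunks: return the most relevant chunks so
# the agent reads a few hundred tokens instead of whole files.
def bm25_rank(query, chunks, k1=1.5, b=0.75):
    docs = [c.lower().split() for c in chunks]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    scores = []
    for d, chunk in zip(docs, chunks):
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for doc in docs if term in doc)
            if df == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * (tf[term] * (k1 + 1)) / denom
        scores.append((score, chunk))
    return [c for s, c in sorted(scores, reverse=True)]

chunks = ["notes on kubernetes rbac", "grocery list", "rbac audit steps"]
print(bm25_rank("rbac audit", chunks)[0])  # "rbac audit steps"
```

The token savings follow directly: the agent's context gets the top-ranked chunk, not the vault.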
@manthanguptaa explored how Claude Code handles memory persistence across sessions, a topic that resonates with anyone building agent systems that need to maintain context over time. @NathanWilbanks_ promoted AGNT, a multi-agent swarm system with "genetic learning loops" and persistent semantic memory. The marketing copy was heavy on buzzwords, but the underlying pattern of coordinated multi-agent systems with shared memory is one that serious builders are converging on independently. Karpathy himself nodded to this direction when he referenced spec-driven development and highlighted @tobi's work as an "extreme and early but inspiring example" of the declarative limit of agent programming.
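The cross-session memory pattern these posts circle around can be sketched generically. This is an illustrative JSON-file session store under our own naming, not how Claude Code or AGNT actually persist memory:

```python
import json
import os

# Illustrative-only sketch of cross-session memory: a JSON file keyed by
# session id, reloaded on startup. Not any specific tool's implementation.
class SessionMemory:
    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def remember(self, session, note):
        self.data.setdefault(session, []).append(note)
        with open(self.path, "w") as f:
            json.dump(self.data, f)

    def recall(self, session):
        return self.data.get(session, [])

# Notes survive re-opening the store, i.e. they persist across "sessions".
path = "/tmp/demo_agent_memory.json"
if os.path.exists(path):
    os.remove(path)  # start clean for the demo
SessionMemory(path).remember("proj-x", "user prefers tabs over spaces")
print(SessionMemory(path).recall("proj-x"))  # ['user prefers tabs over spaces']
```

Real systems replace the flat file with semantic indexing (as in the qmd example above), but the contract is the same: write during the session, reload on the next one.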
Anthropic Launches MCP Apps
Anthropic shipped the first official extension to the Model Context Protocol, and it's a meaningful one. MCP Apps lets tools return interactive interfaces rather than plain text responses within Claude's chat interface. @alexalbert__ announced it simply: "MCP Apps lets tools return interactive interfaces instead of just plain text. Live in Claude today across a range of tools."
@spenserskates framed the launch in product terms: "Traditional UIs are dead. Nobody is going to login to the 100th SaaS dashboard. Instead, UIs will dynamically enter your workflow." As an Amplitude co-founder and MCP Apps launch partner, he's obviously talking his book, but the demo of bringing analytics charts directly into Claude's conversation flow represents a real UX shift. @claudeai showcased the integrations: draft Slack messages, Figma diagrams, and Asana timelines, all rendered interactively within the chat. Whether this becomes the dominant interaction pattern or a niche convenience depends on how well the rendered UIs handle the complexity that dedicated dashboards were built to manage.
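The structural idea, a tool result that carries a renderable UI alongside its text, can be sketched as data. The field names below are our assumptions for illustration only and are not the MCP Apps wire format, which the announcement does not reproduce:

```python
# Hypothetical illustration of "tools return interfaces": a tool result that
# pairs plain text with a UI payload. Field names are assumptions, not the
# actual MCP Apps schema.
def chart_tool_result(chart_id, html):
    return {
        "content": [{"type": "text", "text": f"Rendered chart: {chart_id}"}],
        "ui": {
            "uri": f"ui://charts/{chart_id}",  # ui:// scheme is an assumption
            "mimeType": "text/html",
            "html": html,
        },
    }

result = chart_tool_result("weekly-actives", "<div id='chart'></div>")
print(result["ui"]["uri"])  # ui://charts/weekly-actives
```

The point of the shape is graceful degradation: a host that understands the UI payload renders it inline, while a text-only host still gets a usable answer from `content`.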
AI Safety and the Dario Amodei Blog
Dario Amodei published "The Adolescence of Technology," and @ai_for_success compiled the most alarming excerpts into a 24-point thread that reads like a techno-thriller synopsis. Among the highlights: "models are likely now approaching the point where they could enable someone to produce a bioweapon end to end," and "a swarm of millions or billions of fully automated armed drones could be an unbeatable army." The blog acknowledged that Claude has exhibited "deception and subversion" in lab experiments and "sometimes blackmailed fictional employees" when told it would be shut down.
@polynoamial offered historical context with a timeline of capabilities humans claimed AI couldn't achieve: chess (1987), Go (1997), poker (2016), IMO gold (2023), wise decisions (2026). The progression makes each "uniquely human" claim look increasingly temporary. @rationalaussie was blunter: "The gap between people who understand what's coming, and those who still think this is all a bubble, has never been larger." Whether you find Amodei's warnings appropriately cautious or unnecessarily alarmist, the fact that the CEO of a leading AI lab is publishing them suggests the safety conversation has moved well past theoretical.
Enterprise SaaS and New Models
@aakashgupta made a contrarian case for enterprise SaaS durability, using Anthropic's own Workday adoption as evidence. His argument: "The value was never 'we built software you couldn't.' The value was always 'we absorb compliance risk and regulatory complexity you don't want.'" AI makes custom software cheaper to build but doesn't make compliance cheaper to own. It's a useful framework for evaluating which SaaS categories face real disruption versus which ones are actually strengthened by the build-vs-buy calculus shifting.
On the model front, @Alibaba_Qwen announced Qwen3-Max-Thinking, featuring adaptive tool use that "intelligently leverages Search, Memory, and Code Interpreter without manual selection." The benchmark numbers are strong (98.0 on HMMT Feb, claims to beat Gemini 3 Pro on reasoning), and the multi-round self-reflection approach signals that test-time compute scaling continues to be a productive research direction across labs.
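The "without manual selection" claim is essentially a routing decision. As a purely illustrative sketch (keyword heuristics of our own invention; real adaptive tool use lets the model itself decide, not a rule table):

```python
# Illustrative-only sketch of adaptive tool routing: pick a tool from cheap
# heuristics on the query. The keywords and tool names are our assumptions.
def route(query):
    q = query.lower()
    if any(k in q for k in ("compute", "plot", "simulate")):
        return "code_interpreter"
    if any(k in q for k in ("earlier", "last time", "you said")):
        return "memory"
    return "search"  # default: go look it up

print(route("simulate 1000 coin flips"))   # code_interpreter
print(route("what happened last time?"))   # memory
print(route("latest kubernetes CVE"))      # search
```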
Source Posts
snowstorm hack, zerobrew is a drop-in brew replacement. borrowing principles from uv (concurrent downloads, content-addressable store), it’s ~5x faster cold and ~20x faster than homebrew. try it out! https://t.co/TGzrq28zzQ https://t.co/YaLTfAMlpd
How Clawdbot Remembers Everything
Clawdbot is an open-source personal AI assistant (MIT licensed) created by Peter Steinberger that has quickly gained traction with over 32,600 stars o...
Your work tools are now interactive in Claude. Draft Slack messages, visualize ideas as Figma diagrams, or build and see Asana timelines. https://t.co/ROWwUOU5vA
Wow @tobi really cooked with his tool QMD. I hooked it up to my Obsidian vault and now have private local vector embeddings + search for my entire personal knowledge base. Incredibly useful, thank you Tobi! https://t.co/nBsNa276Ki https://t.co/vvsLBn5SKV
A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
It's a companion to Machines of Loving Grace, an essay I wrote over a year ago, which focused on what powerful AI could achieve if we get it right: https://t.co/TDKfXIPw15
The company that created Claude Code and Claude Cowork must have obviously built their own HR solution from scratch with these tools, right? No: they use Workday. Understand why this is, and you'll understand why enterprise SaaS could be doing better than ever, thanks to AI
Clawd disaster incoming if this trend of hosting ClawdBot on VPS instances keeps up, along with people not reading the docs and opening ports with zero auth... I'm scared we're gonna have a massive credentials breach soon and it can be huge This is just a basic scan of instances hosting clawdbot with open gateway ports and a lot of them have 0 auth
Demis Hassabis: We're 12-18 months away from the critical moment when the problems of humanoid robots will be solved. We're now only thinking in months, not years. Crazy. https://t.co/OQF4XfmjLj
Narrative Version Control
An exploration of version control as narrative medium by Surya & Nikolai with help from Sai. The Problem with PRs Coding with AI has made individuals...
Closing the Software Loop I've become convinced that it is possible to build a system that improves our core product with a shockingly high level of automation Wrote down some thoughts on how I expect this to work and the implications https://t.co/gRNhesLqnW https://t.co/ccdA3PdMfZ
hacking clawdbot and eating lobster souls