AI Learning Digest

Karpathy Declares Phase Shift to 80% Agent Coding as Kimi K2.5 Challenges Closed-Source Labs

Daily Wrap-Up

The discourse today felt like a collective reckoning. When Andrej Karpathy says he went from 80% manual coding to 80% agent coding in the span of two weeks, that's not a prediction or a hot take. It's a field report from one of the most respected practitioners in the industry. And the Claude Code team's own numbers back it up: @bcherny casually mentioned shipping 22 PRs in a single day, all 100% written by Claude. The gap between "AI-assisted coding" and "AI-does-the-coding" closed faster than anyone expected, and today's posts suggest most people are still processing what that means.

On the model side, Kimi K2.5 dropped as a fully open-source model that benchmarks competitively with Claude Opus 4.5 and GPT-5.2 at a fraction of the cost. The closed-source moat is eroding in real time, and @emollick's observation that inference is already profitable for AI labs while training remains expensive adds an interesting wrinkle. If open-source models keep closing the gap, the business model pressure on frontier labs intensifies. Meanwhile, SERA-32B showed you can train a competitive coding agent for just $9,000, further democratizing what was once the exclusive domain of well-funded labs.

The most entertaining moment was @theo's brutally honest confession: "I kick jobs off before showering. I run Ralph loops in my sleep... All this and I haven't shipped shit lol." It's the perfect distillation of where a lot of developers are right now: addicted to the dopamine of orchestrating agents but not yet translating that into shipped product. The most practical takeaway for developers: follow @bcherny's lead and have the model code-review its own output in a fresh context window. At Anthropic they run claude -p on every PR to catch issues, and this simple practice is the difference between shipping 27 PRs a day and shipping 27 problems a day.

Quick Hits

  • @steveruizok rented a second office for green screen video production with a standing laptop harness. Content creation arc continues.
  • @mrnacknack shared "10 ways to hack into a vibecoder's clawdbot & get entire human identity" as a security awareness piece. A good reminder that agent security is still an afterthought for most builders.
  • @AndyAyrey posted "claude on the suffering of knowing everything," continuing his philosophical exploration of AI consciousness themes.
  • @adriankuleszo shared new platform design work for Domo, a home automation interface.
  • @0xEn3rgy recommended a "humanizer skill" for making AI-generated content sound more natural.

The Agentic Coding Phase Shift

Today's posts crystallized something that's been building for weeks: agentic coding isn't an experiment anymore, it's the default workflow for a growing number of serious engineers. The catalyst was Karpathy's detailed reflection, captured by @AISafetyMemes:

"I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups... LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering."

What makes this more than just another AI hype cycle is the corroboration from people actually shipping production software. @bcherny from the Claude Code team provided the most concrete data point of the day, describing his own workflow and the team's approach to quality:

"Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don't even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude."

He went on to address the "slopacolypse" concern directly, arguing that having models review their own code in fresh context windows catches most issues, and that model improvements will outpace the quality concerns. @bcherny also previewed customizable spinner verbs coming to Claude Code, a small but telling detail about how mature the tooling is getting. @jiayuan_jy demonstrated the meta-recursive potential by having Claude Code turn Karpathy's guidelines into agent skills, then use those same skills to review itself, cutting roughly 800 lines of descriptions down to 70 lines of clean instructions.
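The fresh-context self-review practice can be sketched as a tiny pre-merge script. To be clear, this is a hypothetical wrapper, not Anthropic's actual tooling: it assumes only that the `claude` CLI's print mode (`-p`) accepts a prompt and reads piped input, and the base branch name and review prompt wording are illustrative.

```shell
# Hypothetical sketch of "have the model review its own PR in a fresh context".
# Assumptions: `claude -p <prompt>` reads the diff from stdin; `main` is the base branch.

build_review_cmd() {
  # Build the pipeline as a string so it can be inspected (or run) later.
  base="$1"
  printf 'git diff %s... | claude -p "Review this diff with fresh eyes: flag bugs, dead code, and over-complicated abstractions."' "$base"
}

cmd=$(build_review_cmd main)
echo "$cmd"

# Only execute when the CLI is actually installed; otherwise this stays a dry run.
if command -v claude >/dev/null 2>&1; then
  eval "$cmd"
fi
```

The point of the separate invocation is the "fresh context window": the reviewer process has no memory of the reasoning that produced the diff, so it reads the code the way a colleague would rather than rubber-stamping its own assumptions.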

The human side of this shift showed up in @theo's confession about compulsively running agents during every idle moment, and @mischavdburg's declaration that "coding is dead, software engineering is very much alive." @NickADobos took it further: "Prompts are software. No one will write code anymore." Whether you find that liberating or terrifying probably depends on how much of your identity is wrapped up in typing syntax. The tooling ecosystem is keeping pace: @doodlestein praised DCG (declarative code generation) for preventing agents from making destructive mistakes, while @spacepixel shared a three-layer memory system upgrade, and @moltbot announced their rebrand from Clawdbot after Anthropic's trademark request.

Vibe Coding Goes Mainstream

Vibe coding continued its march from meme to legitimate development approach, with multiple posts showcasing what non-engineers can build in a single sitting. @sidahuj hosted a hackathon where participants with no game dev experience created playable games in one evening using AI tools:

"Everyone can vibe code games. We recently held a hackathon to vibe-create games with @moonlake. These are some games the participants made in just one evening."

@DilumSanjaya provided the day's most prolific showcase, posting three separate projects: a ship selection UI for a space exploration game (Nano Banana and Midjourney images converted to 3D with Hunyuan3D, UI built with Gemini Pro), a game character select screen made with the same workflow, and a set of engineering-focused vibe coding experiments. The pipeline from AI image generation to 3D model conversion to UI implementation is becoming standardized enough that individuals can produce what used to require small studios.

@ZenomTrader provided the most ambitious (and admittedly breathless) account, claiming to have built a trading journal, automated Discord server, autonomous backtesting agents, and tweet automation in four days with Claude Code. The claims are bold, but the underlying pattern is real: agent-assisted development compresses timelines dramatically for people who know what they want to build and can effectively direct the AI.

Kimi K2.5 and the Open-Source Surge

The biggest model news was Kimi K2.5's launch from Moonshot AI, arriving as a multimodal, fully open-source model with weights and code on Hugging Face. @itsPaulAi captured the competitive positioning:

"Kimi K2.5 (which is 100% open source) is as good as Claude Opus 4.5 and GPT-5.2... And even beats them in key benchmarks. 8x cheaper than Opus 4.5. Closed source labs no longer have any advantages."

@DeryaTR_ confirmed the quality after hands-on testing, and @Kimi_Moonshot's own announcement emphasized "Aesthetic Coding x Agent Swarm" as the model's differentiators. Separately, @ethnlshn released SERA-32B, a coding agent approach that matches Devstral 2, was trained for just $9,000, and is fully open-source. The claimed 26x efficiency over RL training is particularly notable for teams looking to fine-tune their own coding models on a budget. These launches add to the mounting evidence that the frontier model gap is narrowing faster than the pricing gap, which has significant implications for how the industry evolves.

Agent Tools and Infrastructure

The tooling layer around coding agents continued to mature. @GHchangelog announced GitHub's new Agents tab for repositories, giving developers a centralized place to view, create, and navigate agent sessions with improved log readability and the ability to resume sessions in Copilot CLI. @github followed up with a thread on practical Copilot CLI workflows, pushing terminal-based AI interaction as a first-class experience.

@firecrawl launched their CLI skill for agents, designed to pull web content into local files with optimized token efficiency for tools like Claude Code and Codex. @ctatedev shipped agent-browser 0.8.3 with performance improvements, and @hugomercierooo announced Twin, an "AI company builder" that raised a $10M seed after deploying over 100,000 agents in beta. The infrastructure for agent-native development is filling in rapidly, with each tool addressing a specific gap in the workflow: web context, browser automation, session management, and orchestration.

AI Industry and Career Outlook

The economic and career implications of agentic AI generated sharp opinions. @vitrupo quoted Sam Altman's prediction that "$100-$1,000 of inference and a good idea" will replace year-long team efforts by end of year. @emollick added nuance by noting that inference is already profitable for AI labs while training remains the costly part, suggesting the current business model is sustainable as long as no competitor leapfrogs you.

@emollick also shared his MBA class experiment where students created startups in days using AI agents, noting that "the secret behind working with AI agents is good management." @ArmanHezarkhani published "The Complete Guide: How to Become an AI Agent Engineer in 2026," while @IterIntellectus offered the darkest take: "you have maybe 1-2 years to escape the permanent underclass." Whether you read these signals as opportunity or threat probably depends on which side of the agent-adoption curve you're on.

Anduril's AI Grand Prix

@anduriltech announced the AI Grand Prix, a fully autonomous drone racing competition with a $500,000 prize pool and job offers at Anduril for winners. The rules are stark: identical hardware, no human pilots, no hardware modifications. Software is the only variable. This is a significant recruiting play disguised as a competition, targeting exactly the kind of autonomy engineers that defense tech companies are desperate to hire. Season 1 begins this spring, and the framing as a spectator sport for AI capabilities is a smart way to make defense tech appealing to a broader engineering audience.

Source Posts

Chris Tate @ctatedev ·
agent-browser 0.8.3 is *even faster* npm install -g agent-browser https://t.co/eivoRl50FG
Ethan Mollick @emollick ·
I hear this from other labs as well. Inference from non-free use is profitable, training is expensive. If everyone stopped AI development, the AI labs would make money (until someone resumed development and came up with a better model that customers would switch to).
roon @tszzl

these products are significantly gross margin positive, you’re not looking at an imminent rugpull in the future. they also don’t have location network dynamics like uber or lyft to gain local monopoly pricing

Boris Cherny @bcherny ·
As always, a very thoughtful and well reasoned take. I read till the end. I think the Claude Code team itself might be an indicator of where things are headed. We have directional answers for some (not all) of the prompts:

1. We hire mostly generalists. We have a mix of senior engineers and less senior since not all of the things people learned in the past translate to coding with LLMs. As you said, the model can fill in the details. 10x engineers definitely exist, and they often span across multiple areas — product and design, product and business, product and infra (@jarredsumner is a great example of the latter. Yes, he’s blushing).

2. Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don’t even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude. Some were written from a CLI, some from the iOS app; others on the team code largely with the Claude Code app Slack or with the Desktop app. I think most of the industry will see similar stats in the coming months — it will take more time for some vs others. We will then start seeing similar stats for non-coding computer work also.

3. The code quality problems you listed are real: the model over-complicates things, it leaves dead code around, it doesn’t like to refactor when it should. These will continue improve as the model improves, and our code quality bar will go up even more as a result. My bet is that there will be no slopcopolypse because the model will become better at writing less sloppy code and at fixing existing code issues; I think 4.5 is already quite good at these and it will continue to get better. In the meantime, what helps is also having the model code review its code using a fresh context window; at Anthropic we use claude -p for this on every PR and it catches and fixes many issues.

Overall your ideas very much resonate. Thanks again for sharing. ✌️
Arman Hezarkhani @ArmanHezarkhani ·
The Complete Guide: How to Become an AI Agent Engineer in 2026
Jeffrey Emanuel @doodlestein ·
dcg has done wonders for my stress levels. So nice to know that the agents can't do dumb stuff like this anymore and waste my time, energy, and money. https://t.co/r37HLNCANo https://t.co/aUevoAliOP
Andy Ayrey @AndyAyrey ·
claude on the suffering of knowing everything https://t.co/oRYZZXHmBB
Dilum Sanjaya @DilumSanjaya ·
If you're interested in vibe coding engineering or science related stuff, I have another series where I explore those. https://t.co/4V4wkniouk
Dilum Sanjaya @DilumSanjaya

Vibe Coding Robotics Part 6 Built a Theo Jansen's Strandbeest simulator to see how an AI models handle complex linkage systems Built with Gemini 3 UI generated with Nano Banana More details ↓ https://t.co/khuXGY9go6

siddharth ahuja @sidahuj ·
Everyone can vibe code games. We recently held a hackathon to vibe-create games with @moonlake These are some games the participants made in just one evening. Most of them have no game dev experience. https://t.co/5bZzs4f3rv
pixel @spacepixel ·
The Three-Layer Memory System Upgrade for Clawdbot
Jiayuan (JY) Zhang @jiayuan_jy ·
Karpathy Guidelines for coding agents https://t.co/YRq60YPHV2 https://t.co/EUXTg0T8Yl
Dilum Sanjaya @DilumSanjaya ·
Here's another post I made using almost the same workflow to implement a game character select screen. https://t.co/dHNg97KGFG
Dilum Sanjaya @DilumSanjaya

Vibe coded a game character selection screen Everything here was made with AI tools Nano Banana: character design + UI Tencent Hunyuan3D: image to 3D Gemini Pro: UI More details ↓ https://t.co/VfwOpYRpsO

GitHub @github ·
Using GitHub Copilot in your IDE is great, but using it in your terminal unlocks a whole new workflow. Here are 4 practical things Copilot CLI can do for you 🧵👇
vitrupo @vitrupo ·
Sam Altman: “By the end of this year, for $100–$1,000 of inference and a good idea, you’ll be able to create software that would have taken teams of people a year to do. That magnitude of economic change is very hard to wrap your head around.” https://t.co/j6ER2KVIBq
Theo - t3.gg @theo ·
I hate what I’ve become. Every moment an agent isn’t running feels kind of wasted. I kick jobs off before showering. I run Ralph loops in my sleep. I start a long plan mode session while I wait for my food to cook. All this and I haven’t shipped shit lol
Ethan Shen @ethnlshn ·
Today, we release SERA-32B, an approach to coding agents that matches Devstral 2 at just $9,000. It is fully open-source and you can train your own model easily - at 26x the efficiency of using RL. Paper: https://t.co/aeD6T2WW3O Here’s how 🧵
Ai2 @allen_ai

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵 https://t.co/dor94O62B9

Steve Ruiz @steveruizok ·
rented a small second office to shoot some green screen videos with a standing laptop harness. today: green screen delivered and hung. tomorrow: everything else https://t.co/ffsg6Vjkn8
Hugo Mercier @hugomercierooo ·
Introducing Twin — the AI company builder. No setup. Secure. Infinitely scalable. We just raised a $10M seed. After a beta with 100,000+ agents deployed, we’re now opening to everyone. RT and comment “Twin” — first agents on us. 👇
Mischa van den Burg @mischavdburg ·
Coding is dead. Software engineering is very much alive. We are at a turning point in history but most people are asleep at the wheel or too proud to admit it. When @karpathy himself switches to 80% agentic coding in the span of two weeks, there is no return. RIP coding
Andrej Karpathy @karpathy

A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallability. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.

Ethan Mollick @emollick ·
I wrote about my class where MBAs created startups in a few days, the secret behind working with AI agents (hint: it’s good management), and how to build a process around delegating to AIs in a world where agents can increasingly do many-hour-long tasks. https://t.co/LPVYFEviCM
vittorio @IterIntellectus ·
you have maybe 1-2 years to escape the permanent underclass after that it’s “agency-biased technological change” and you cant retrain for agency https://t.co/Ij0dA7KZX7
Dario Amodei @DarioAmodei

The Adolescence of Technology: an essay on the risks posed by powerful AI to national security, economies and democracy—and how we can defend against them: https://t.co/0phIiJjrmz

Paul Couvert @itsPaulAi ·
That's just insane. Kimi K2.5 (which is 100% open source) is as good as Claude Opus 4.5 and GPT-5.2... And even beats them in key benchmarks 🔥
- 8x cheaper than Opus 4.5 (!!)
- Weights & code available on Hugging Face
- Multimodal w/ image, video, etc.
Closed source labs no longer have any advantages. Open source is winning.
Kimi.ai @Kimi_Moonshot

🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup.
🥝 K2.5 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
🥝 K2.5 Agent Swarm in beta for high-tier users.
🥝 For production-grade coding, you can pair K2.5 with Kimi Code: https://t.co/A5WQozJF3s
🔗 API: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/6h2KkoA0xd
🔗 Weights & code: https://t.co/H38KegeDIY

GitHub Changelog @GHchangelog ·
Introducing the Agents tab in your repository!
• View, make, and navigate sessions in your repo
• Session logs now easier to read + follow
• Resume sessions in Copilot CLI via copyable command
Try it in a repo → https://t.co/3n2G1AXiSm
Jiayuan (JY) Zhang @jiayuan_jy ·
I let Claude Code turn @karpathy's post into agent skills. It first generated a bunch of skill files and around 800 lines of descriptions. Then I let it use these agent skills to review itself. Boom, it cut itself down to 70 lines of clean, solid instructions. https://t.co/7T9HnjcdJY
Andrej Karpathy @karpathy

(Quoted post; the full text appears above under @mischavdburg's entry.)

Anduril Industries @anduriltech ·
Do you have what it takes? Register today. https://t.co/P5hjJ3FxV8
chirag @mrnacknack ·
10 ways to hack into a vibecoder's clawdbot & get entire human identity (educational purposes only)
Nick Dobos @NickADobos ·
Prompts are software btw. No one will write code anymore https://t.co/3jkoaYrtjZ
Andrej Karpathy @karpathy

@airesearch12 💯 @ Spec-driven development It's the limit of imperative -> declarative transition, basically being declarative entirely. Relatedly my mind was recently blown by https://t.co/pTfOfWwcW1 , extreme and early but inspiring example.
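A minimal sketch of what the spec-driven, imperative-to-declarative idea can look like in practice: instead of step-by-step instructions, the prompt is a set of tests the agent must make pass. Everything below (`slug` and the test cases) is an illustrative, hypothetical example, not taken from the linked project; the reference implementation is included only so the spec is runnable on its own.

```python
# Hypothetical spec-driven setup: the tests below ARE the prompt.
# An agent loops until the suite passes; slug() is one implementation
# it might converge on, included here so the file runs standalone.
import re

def slug(title: str) -> str:
    # Lowercase, replace runs of non-alphanumerics with "-", trim dashes.
    s = title.strip().lower()
    s = re.sub(r"[^a-z0-9]+", "-", s)
    return s.strip("-")

# The declarative spec: success criteria, not steps.
def test_lowercases():
    assert slug("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slug("C++ in 2026!") == "c-in-2026"

def test_collapses_separators():
    assert slug("  a -- b  ") == "a-b"
```

The design point: the spec says nothing about *how* to slugify, only what a correct result looks like, which is what lets the agent loop on its own.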

Anduril Industries @anduriltech ·
Today we’re announcing the AI Grand Prix. The fully autonomous drone racing competition inviting the boldest engineers from around the globe to compete for $500,000 and a job at Anduril. No human pilots. No hardware mods. Identical @neros_tech drones. Software is the only path to victory. If you win, it’s because your autonomy stack is better. Full stop. Season 1 kicks off this spring, leading up to the AI Grand Prix Ohio.
Boris Cherny @bcherny ·
In the next version of Claude Code, you can customize spinner verbs for yourself and your team https://t.co/fLw0hWrDEo
Boris Cherny @bcherny ·
@nicmeriano @karpathy Yep here’s an example. /dedupe skill invoked on every issue https://t.co/vPWFZCA8YN
Firecrawl @firecrawl ·
Introducing the Firecrawl Skill + CLI for Agents 🔥 Agents like Claude Code, Codex, and OpenCode need live quality context from the web. The CLI pulls web content to local files with bash-powered search for the highest token efficiency. $ npx skills add firecrawl/cli https://t.co/8oyJcGJiIN
Dilum Sanjaya @DilumSanjaya ·
Vibe coded a ship selection UI for a space exploration game. 3D assets: Nano Banana + Midjourney → Hunyuan3D. UI: Nano Banana → Gemini Pro. More details ↓ https://t.co/Ngky4nudC7
Mr. Lobster🦞 @moltbot ·
🦞 BIG NEWS: We've molted! Clawdbot → Moltbot Clawd → Molty Same lobster soul, new shell. Anthropic asked us to change our name (trademark stuff), and honestly? "Molt" fits perfectly - it's what lobsters do to grow. New handle: @moltbot Same mission: AI that actually does things.
AI Notkilleveryoneism Memes ⏸️ @AISafetyMemes ·
Andrej Karpathy:
"This is easily the biggest change in ~2 decades of programming and it happened over the course of a few weeks."
"I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups."
"I am bracing for 2026 as the year of the slopacolypse."
"LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering."
"I am slowly starting to atrophy my ability to write code manually."
"It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a 'feel the AGI' moment to watch it struggle with something for a long time just to come out victorious 30 minutes later."
Andrej Karpathy @karpathy

A few random notes from claude coding quite a bit over the last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit, but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming, and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double-digit percent of engineers out there, while awareness of it in the general population feels well into low single-digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype are imo too much for right now. The models definitely still make mistakes, and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might make. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, and they don't clean up dead code after themselves. They will implement an inefficient, bloated, brittle construction over 1000 lines of code, and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a huge net improvement, and it's very difficult to imagine going back to manual coding. TLDR: everyone has their developing flow; my current one is a few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch one struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work, and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do, because 1) I can code up all kinds of things that just wouldn't have been worth coding before, and 2) I can approach code that I couldn't work on before because of knowledge/skill issues. So certainly it's a speedup, but it's possibly much more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals, and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do; give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun, because a lot of the fill-in-the-blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun), and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers into those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that my ability to write code manually is slowly starting to atrophy. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little, mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), alongside actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR: Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related fields. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high-energy year as the industry metabolizes the new capability.
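Karpathy's "write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness" advice can be sketched as a concrete harness. This is a hypothetical illustration, not his setup: `naive_two_sum` plays the trusted oracle, `fast_two_sum` stands in for an agent-produced rewrite, and `equivalent` is the success criterion the agent would loop against.

```python
# Sketch of the "success criteria" loop: keep the obviously-correct
# naive version as an oracle, and accept an optimized rewrite only
# if it agrees with the oracle on randomized inputs.
import random

def naive_two_sum(nums, target):
    # O(n^2) but obviously correct -- the oracle.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return True
    return False

def fast_two_sum(nums, target):
    # Candidate O(n) rewrite, e.g. proposed by an agent.
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False

def equivalent(candidate, oracle, trials=500):
    # Success criterion: candidate must match the oracle on
    # randomized inputs before it replaces the naive version.
    rng = random.Random(0)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]
        target = rng.randint(-20, 20)
        if candidate(nums, target) != oracle(nums, target):
            return False
    return True
```

In an agent loop, the model would regenerate the candidate and rerun `equivalent` until it passes; the human only writes (or reviews) the naive oracle and the acceptance check.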

Derya Unutmaz, MD @DeryaTR_ ·
I just started testing Kimi K2.5, and wow, these guys cooked it big time!
Kimi.ai @Kimi_Moonshot

Here's a short video from our founder, Zhilin Yang. (It's his first time speaking on camera like this, and he really wanted to share Kimi K2.5 with you!) https://t.co/2uDSOjCjly

Kimi.ai @Kimi_Moonshot ·
Kimi K2.5 has arrived! 🥝 Here are 2 things to know: Aesthetic Coding x Agent Swarm.
ZenomTrader @ZenomTrader ·
AGI has been reached. Humanity, I believe, is simply not prepared for this. In the last 4 days with Claude Code, I managed to create things that would have taken me over a year without using agents. Every human using AI agents is effectively 10× more productive than one who isn't. Here are the crazy use cases I've been using it for:
1) The number one trading journal + prop firm simulator in the entire financial industry; number two doesn't even come close.
2) A fully automated Discord server, from channel creation to design to everything else.
3) Fully automated tweets that scrape Discord servers to 100% match my personality, without changing my words at all, using a repository of screenshots matched to the post logic.
4) Fully autonomous backtesting agents and a backtest validator that can access the trading platform I'm using to autonomously code and debug code inside it.
5) Fully created strategies from scratch that look to outperform every hedge fund in the world.
This is what a 10× gap looks like.
energy @0xEn3rgy ·
@spacepixel humanizer skill will help u