Karpathy Declares Phase Shift to 80% Agent Coding as Kimi K2.5 Challenges Closed-Source Labs
Daily Wrap-Up
The discourse today felt like a collective reckoning. When Andrej Karpathy says he went from 80% manual coding to 80% agent coding in the span of a few weeks, that's not a prediction or a hot take. It's a field report from one of the most respected practitioners in the industry. And the Claude Code team's own numbers back it up: @bcherny casually mentioned shipping 22 PRs in a single day, all 100% written by Claude. The gap between "AI-assisted coding" and "AI-does-the-coding" closed faster than anyone expected, and today's posts suggest most people are still processing what that means.
On the model side, Kimi K2.5 dropped as a fully open-source model that benchmarks competitively with Claude Opus 4.5 and GPT-5.2 at a fraction of the cost. The closed-source moat is eroding in real time, and @emollick's observation that inference is already profitable for AI labs while training remains expensive adds an interesting wrinkle. If open-source models keep closing the gap, the business model pressure on frontier labs intensifies. Meanwhile, SERA-32B showed you can train a competitive coding agent for just $9,000, further democratizing what was once the exclusive domain of well-funded labs.
The most entertaining moment was @theo's brutally honest confession: "I kick jobs off before showering. I run Ralph loops in my sleep... All this and I haven't shipped shit lol." It's the perfect distillation of where a lot of developers are right now: addicted to the dopamine of orchestrating agents but not yet translating that into shipped product. The most practical takeaway for developers: follow @bcherny's lead and have the model code-review its own output in a fresh context window. At Anthropic they run claude -p on every PR to catch issues, and this simple practice is the difference between shipping 27 PRs a day and shipping 27 problems a day.
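The fresh-context review practice can be sketched in a few lines of shell. This is a minimal, hypothetical example, not Anthropic's actual setup: it assumes the Claude Code CLI (with its non-interactive -p print mode) and the GitHub CLI are installed, and the PR number and review prompt are illustrative.

```shell
#!/bin/sh
# Sketch: have the model review a PR diff in a brand-new session, so the
# reviewer has no memory of having written the code it is critiquing.
PR=123
REVIEW_PROMPT="Review this diff as if someone else wrote it: flag bugs, missing tests, and needless complexity."

if command -v claude >/dev/null 2>&1; then
  # Each `claude -p` invocation starts a fresh context window.
  gh pr diff "$PR" | claude -p "$REVIEW_PROMPT"
else
  echo "claude CLI not found; would run: gh pr diff $PR | claude -p \"...\""
fi
```

Wiring something like this into CI on every PR is the cheap version of the practice @bcherny describes: the generation and the review happen in separate contexts, which is what catches the self-serving mistakes.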
Quick Hits
- @steveruizok rented a second office for green screen video production with a standing laptop harness. Content creation arc continues.
- @mrnacknack shared "10 ways to hack into a vibecoder's clawdbot & get entire human identity" as a security awareness piece. A good reminder that agent security is still an afterthought for most builders.
- @AndyAyrey posted "claude on the suffering of knowing everything," continuing his philosophical exploration of AI consciousness themes.
- @adriankuleszo shared new platform design work for Domo, a home automation interface.
- @0xEn3rgy recommended a "humanizer skill" for making AI-generated content sound more natural.
The Agentic Coding Phase Shift
Today's posts crystallized something that's been building for weeks: agentic coding isn't an experiment anymore, it's the default workflow for a growing number of serious engineers. The catalyst was Karpathy's detailed reflection, captured by @AISafetyMemes:
"I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups... LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering."
What makes this more than just another AI hype cycle is the corroboration from people actually shipping production software. @bcherny from the Claude Code team provided the most concrete data point of the day, describing his own workflow and the team's approach to quality:
"Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don't even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude."
He went on to address the "slopacolypse" concern directly, arguing that having models review their own code in fresh context windows catches most issues, and that model improvements will outpace the quality concerns. @bcherny also previewed customizable spinner verbs coming to Claude Code, a small but telling detail about how mature the tooling is getting. @jiayuan_jy demonstrated the meta-recursive potential by having Claude Code turn Karpathy's guidelines into agent skills, which then self-reviewed down from 800 lines to 70 lines of clean instructions.
The human side of this shift showed up in @theo's confession about compulsively running agents during every idle moment, and @mischavdburg's declaration that "coding is dead, software engineering is very much alive." @NickADobos took it further: "Prompts are software. No one will write code anymore." Whether you find that liberating or terrifying probably depends on how much of your identity is wrapped up in typing syntax. The tooling ecosystem is keeping pace: @doodlestein praised DCG (declarative code generation) for preventing agents from making destructive mistakes, while @spacepixel shared a three-layer memory system upgrade, and @moltbot announced their rebrand from Clawdbot after Anthropic's trademark request.
Vibe Coding Goes Mainstream
Vibe coding continued its march from meme to legitimate development approach, with multiple posts showcasing what non-engineers can build in a single sitting. @sidahuj hosted a hackathon where participants with no game dev experience created playable games in one evening using AI tools:
"Everyone can vibe code games. We recently held a hackathon to vibe-create games with @moonlake. These are some games the participants made in just one evening."
@DilumSanjaya was the day's most prolific builder, posting three separate projects: a ship selection UI for a space exploration game built with a Midjourney-to-Hunyuan3D pipeline for 3D assets and Gemini Pro for the UI, plus a game character select screen and engineering-focused vibe coding experiments. The pipeline of AI image generation to 3D model conversion to UI implementation is becoming standardized enough that individuals can produce what used to require small studios.
@ZenomTrader provided the most ambitious (and admittedly breathless) account, claiming to have built a trading journal, automated Discord server, autonomous backtesting agents, and tweet automation in four days with Claude Code. The claims are bold, but the underlying pattern is real: agent-assisted development compresses timelines dramatically for people who know what they want to build and can effectively direct the AI.
Kimi K2.5 and the Open-Source Surge
The biggest model news was Kimi K2.5's launch from Moonshot AI, arriving as a multimodal, fully open-source model with weights and code on Hugging Face. @itsPaulAi captured the competitive positioning:
"Kimi K2.5 (which is 100% open source) is as good as Claude Opus 4.5 and GPT-5.2... And even beats them in key benchmarks. 8x cheaper than Opus 4.5. Closed source labs no longer have any advantages."
@DeryaTR_ confirmed the quality after hands-on testing, and @Kimi_Moonshot's own announcement emphasized "Aesthetic Coding x Agent Swarm" as the model's differentiators. Separately, @ethnlshn released SERA-32B, a fully open-source coding agent that matches Devstral 2 and was trained for just $9,000. The claimed 26x efficiency gain from RL is particularly notable for teams looking to fine-tune their own coding models on a budget. These launches add to the mounting evidence that the frontier model gap is narrowing faster than the pricing gap, which has significant implications for how the industry evolves.
Agent Tools and Infrastructure
The tooling layer around coding agents continued to mature. @GHchangelog announced GitHub's new Agents tab for repositories, giving developers a centralized place to view, create, and navigate agent sessions with improved log readability and the ability to resume sessions in Copilot CLI. @github followed up with a thread on practical Copilot CLI workflows, pushing terminal-based AI interaction as a first-class experience.
@firecrawl launched their CLI skill for agents, designed to pull web content into local files with optimized token efficiency for tools like Claude Code and Codex. @ctatedev shipped agent-browser 0.8.3 with performance improvements, and @hugomercierooo announced Twin, an "AI company builder" that raised a $10M seed after deploying over 100,000 agents in beta. The infrastructure for agent-native development is filling in rapidly, with each tool addressing a specific gap in the workflow: web context, browser automation, session management, and orchestration.
AI Industry and Career Outlook
The economic and career implications of agentic AI generated sharp opinions. @vitrupo quoted Sam Altman's prediction that "$100-$1,000 of inference and a good idea" will replace year-long team efforts by end of year. @emollick added nuance by noting that inference is already profitable for AI labs while training remains the costly part, suggesting the current business model is sustainable as long as no competitor leapfrogs you.
@emollick also shared his MBA class experiment where students created startups in days using AI agents, noting that "the secret behind working with AI agents is good management." @ArmanHezarkhani published "The Complete Guide: How to Become an AI Agent Engineer in 2026," while @IterIntellectus offered the darkest take: "you have maybe 1-2 years to escape the permanent underclass." Whether you read these signals as opportunity or threat probably depends on which side of the agent-adoption curve you're on.
Anduril's AI Grand Prix
@anduriltech announced the AI Grand Prix, a fully autonomous drone racing competition with a $500,000 prize pool and job offers at Anduril for winners. The rules are stark: identical hardware, no human pilots, no hardware modifications. Software is the only variable. This is a significant recruiting play disguised as a competition, targeting exactly the kind of autonomy engineers that defense tech companies are desperate to hire. Season 1 begins this spring, and the framing as a spectator sport for AI capabilities is a smart way to make defense tech appealing to a broader engineering audience.
Source Posts
these products are significantly gross margin positive, you’re not looking at an imminent rugpull in the future. they also don’t have location network dynamics like uber or lyft to gain local monopoly pricing
The Complete Guide: How to Become an AI Agent Engineer in 2026
We're going to pay several engineers over $1,000,000 this year. Not founders. Engineers. The best AI agent engineers have absurd leverage—one person s...
Vibe Coding Robotics Part 6 Built a Theo Jansen's Strandbeest simulator to see how AI models handle complex linkage systems Built with Gemini 3 UI generated with Nano Banana More details ↓ https://t.co/khuXGY9go6
The Three-Layer Memory System Upgrade for Clawdbot
Give your Clawdbot a knowledge graph that compounds forever Most AI assistants forget by default. Clawdbot doesn’t—but out of the box, its memory is s...
Vibe coded a game character selection screen Everything here was made with AI tools Nano Banana: character design + UI Tencent Hunyuan3D: image to 3D Gemini Pro: UI More details ↓ https://t.co/VfwOpYRpsO
Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵 https://t.co/dor94O62B9
A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallability. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
The Adolescence of Technology: an essay on the risks posed by powerful AI to national security, economies and democracy—and how we can defend against them: https://t.co/0phIiJjrmz
🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence. 🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%) 🔹 Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%) 🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion. 🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup. - 🥝 K2.5 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode. 🥝 K2.5 Agent Swarm in beta for high-tier users. 🥝 For production-grade coding, you can pair K2.5 with Kimi Code: https://t.co/A5WQozJF3s - 🔗 API: https://t.co/EOZkbOwCN4 🔗 Tech blog: https://t.co/6h2KkoA0xd 🔗 Weights & code: https://t.co/H38KegeDIY
10 ways to hack into a vibecoder's clawdbot & get entire human identity (educational purposes only)
This is for education purposes only so that you understand how vibecoding can get vulnerable in setups like moltbot (previously clawdbot) and how you ...
@airesearch12 💯 @ Spec-driven development It's the limit of imperative -> declarative transition, basically being declarative entirely. Relatedly my mind was recently blown by https://t.co/pTfOfWwcW1 , extreme and early but inspiring example.
Here's a short video from our founder, Zhilin Yang. (It's his first time speaking on camera like this, and he really wanted to share Kimi K2.5 with you!) https://t.co/2uDSOjCjly