Karpathy Declares Phase Shift to 80% Agent Coding as Kimi K2.5 Challenges Closed-Source Labs

Chris Tate @ctatedev · Jan 27

agent-browser 0.8.3 is *even faster*
npm install -g agent-browser
https://t.co/eivoRl50FG

Ethan Mollick @emollick · Jan 27

I hear this from other labs as well. Inference from non-free use is profitable, training is expensive. If everyone stopped AI development, the AI labs would make money (until someone resumed development and came up with a better model that customers would switch to).

roon @tszzl

these products are significantly gross margin positive, you’re not looking at an imminent rugpull in the future. they also don’t have location network dynamics like uber or lyft to gain local monopoly pricing

Boris Cherny @bcherny · Jan 27

As always, a very thoughtful and well reasoned take. I read till the end. I think the Claude Code team itself might be an indicator of where things are headed. We have directional answers for some (not all) of the prompts:

1. We hire mostly generalists. We have a mix of senior engineers and less senior since not all of the things people learned in the past translate to coding with LLMs. As you said, the model can fill in the details. 10x engineers definitely exist, and they often span across multiple areas: product and design, product and business, product and infra (@jarredsumner is a great example of the latter. Yes, he’s blushing).

2. Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don’t even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude. Some were written from a CLI, some from the iOS app; others on the team code largely with the Claude Code app Slack or with the Desktop app. I think most of the industry will see similar stats in the coming months, though it will take more time for some than for others. We will then start seeing similar stats for non-coding computer work also.

3. The code quality problems you listed are real: the model over-complicates things, it leaves dead code around, it doesn’t like to refactor when it should. These will continue to improve as the model improves, and our code quality bar will go up even more as a result. My bet is that there will be no slopcopolypse because the model will become better at writing less sloppy code and at fixing existing code issues; I think 4.5 is already quite good at these and it will continue to get better. In the meantime, what helps is also having the model code review its code using a fresh context window; at Anthropic we use claude -p for this on every PR and it catches and fixes many issues.

Overall your ideas very much resonate. Thanks again for sharing. ✌️
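The fresh-context review idea in this post can be sketched roughly as follows. This is a minimal illustration, not Anthropic's actual PR tooling: it assumes the `claude` CLI is installed and that `-p` runs a single non-interactive prompt and accepts piped input. The prompt wording, the function names, and the `main` base branch are all hypothetical.

```python
import subprocess
from typing import Callable

# Hypothetical review prompt; the wording is an assumption, not from the post.
REVIEW_PROMPT = (
    "Review the following diff for bugs, dead code, and needless complexity. "
    "List concrete issues only."
)


def build_review_command(prompt: str = REVIEW_PROMPT) -> list[str]:
    """Return the argv for a one-shot, non-interactive reviewer call."""
    # Each invocation is a new process, so the reviewer does not inherit
    # the conversation that produced the code.
    return ["claude", "-p", prompt]


def review_diff(
    diff: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Pipe a diff to the reviewer on stdin and return its text output."""
    if not diff.strip():
        return ""  # nothing to review, skip the call entirely
    result = run(
        build_review_command(),
        input=diff,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def current_diff(base: str = "main") -> str:
    """Diff the working tree against a base branch (default name is assumed)."""
    completed = subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    )
    return completed.stdout
```

In a CI job this might be called as `print(review_diff(current_diff()))`. The `run` parameter exists only so the subprocess call can be swapped for a stub in tests.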

Arman Hezarkhani @ArmanHezarkhani · Jan 27

The Complete Guide: How to Become an AI Agent Engineer in 2026

We're going to pay several engineers over $1,000,000 this year. Not founders. Engineers. The best AI agent engineers have absurd leverage—one person s...

Jeffrey Emanuel @doodlestein · Jan 27

dcg has done wonders for my stress levels. So nice to know that the agents can't do dumb stuff like this anymore and waste my time, energy, and money. https://t.co/r37HLNCANo https://t.co/aUevoAliOP

Andy Ayrey @AndyAyrey · Jan 27

claude on the suffering of knowing everything https://t.co/oRYZZXHmBB

Dilum Sanjaya @DilumSanjaya · Jan 27

If you're interested in vibe coding engineering or science related stuff, I have another series where I explore those. https://t.co/4V4wkniouk

Dilum Sanjaya @DilumSanjaya

Vibe Coding Robotics Part 6: built a Theo Jansen Strandbeest simulator to see how AI models handle complex linkage systems. Built with Gemini 3, UI generated with Nano Banana. More details ↓ https://t.co/khuXGY9go6

siddharth ahuja @sidahuj · Jan 27

Everyone can vibe code games. We recently held a hackathon to vibe-create games with @moonlake. These are some games the participants made in just one evening. Most of them have no game dev experience. https://t.co/5bZzs4f3rv

pixel @spacepixel · Jan 27

The Three-Layer Memory System Upgrade for Clawdbot

Give your Clawdbot a knowledge graph that compounds forever Most AI assistants forget by default. Clawdbot doesn’t—but out of the box, its memory is s...

Jiayuan (JY) Zhang @jiayuan_jy · Jan 27

Karpathy Guidelines for coding agents https://t.co/YRq60YPHV2 https://t.co/EUXTg0T8Yl

Dilum Sanjaya @DilumSanjaya · Jan 27

Here's another post I made using almost the same workflow to implement a game character select screen. https://t.co/dHNg97KGFG

Dilum Sanjaya @DilumSanjaya

Vibe coded a game character selection screen. Everything here was made with AI tools:
Nano Banana: character design + UI
Tencent Hunyuan3D: image to 3D
Gemini Pro: UI
More details ↓ https://t.co/VfwOpYRpsO

GitHub @github · Jan 27

Using GitHub Copilot in your IDE is great, but using it in your terminal unlocks a whole new workflow. Here are 4 practical things Copilot CLI can do for you 🧵👇

vitrupo @vitrupo · Jan 27

Sam Altman: “By the end of this year, for $100–$1,000 of inference and a good idea, you’ll be able to create software that would have taken teams of people a year to do. That magnitude of economic change is very hard to wrap your head around.” https://t.co/j6ER2KVIBq

Theo - t3.gg @theo · Jan 27

I hate what I’ve become. Every moment an agent isn’t running feels kind of wasted. I kick jobs off before showering. I run Ralph loops in my sleep. I start a long plan mode session while I wait for my food to cook. All this and I haven’t shipped shit lol

Ethan Shen @ethnlshn · Jan 27

Today, we release SERA-32B, an approach to coding agents that matches Devstral 2 at just $9,000. It is fully open-source and you can train your own model easily - at 26x the efficiency of using RL. Paper: https://t.co/aeD6T2WW3O Here’s how 🧵

Ai2 @allen_ai

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵 https://t.co/dor94O62B9

Steve Ruiz @steveruizok · Jan 27

rented a small second office to shoot some green screen videos with a standing laptop harness. today: green screen delivered and hung. tomorrow: everything else https://t.co/ffsg6Vjkn8

Hugo Mercier @hugomercierooo · Jan 27

Introducing Twin, the AI company builder. No setup. Secure. Infinitely scalable. We just raised a $10M seed. After a beta with 100,000+ agents deployed, we’re now opening to everyone. RT and comment “Twin”: first agents on us. 👇

Mischa van den Burg @mischavdburg · Jan 27

Coding is dead. Software engineering is very much alive. We are at a turning point in history but most people are asleep at the wheel or too proud to admit it. When @karpathy himself switches to 80% agentic coding in the span of two weeks, there is no return. RIP coding

Andrej Karpathy @karpathy

A few random notes from claude coding quite a bit last few weeks.

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

Fun. I didn't anticipate that with agents programming feels *more* fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Questions. A few of the questions on my mind:
- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.
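The "naive algorithm first, then optimize while preserving correctness" advice from the Leverage note can be illustrated with a small sketch. The pair-counting problem, the function names, and the trial counts below are illustrative assumptions, not from the post. The idea is that the slow version acts as the success criterion an agent (or a person) must keep satisfying while rewriting the fast one.

```python
import random
from collections import Counter


def count_pairs_naive(nums, target):
    """O(n^2) reference: count index pairs i < j with nums[i] + nums[j] == target."""
    return sum(
        1
        for i in range(len(nums))
        for j in range(i + 1, len(nums))
        if nums[i] + nums[j] == target
    )


def count_pairs_fast(nums, target):
    """O(n) candidate: for each value, count earlier values that complete a pair."""
    seen = Counter()
    total = 0
    for x in nums:
        total += seen[target - x]  # Counter returns 0 for values not seen yet
        seen[x] += 1
    return total


def check_equivalent(trials=500, seed=0):
    """Differential test: the fast version must agree with the naive one."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(0, 12))]
        target = rng.randint(-10, 10)
        expected = count_pairs_naive(nums, target)
        actual = count_pairs_fast(nums, target)
        assert actual == expected, (nums, target, expected, actual)
    return trials
```

Running `check_equivalent()` after every optimization pass gives the loop a declarative goal ("agree with the reference on random inputs") rather than a list of imperative steps.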

Ethan Mollick @emollick · Jan 27

I wrote about my class where MBAs created startups in a few days, the secret behind working with AI agents (hint: it’s good management), and how to build a process around delegating to AIs in a world where agents can increasingly do many-hour-long tasks. https://t.co/LPVYFEviCM

vittorio @IterIntellectus · Jan 27

you have maybe 1-2 years to escape the permanent underclass. after that it’s “agency-biased technological change” and you can’t retrain for agency https://t.co/Ij0dA7KZX7

Dario Amodei @DarioAmodei

The Adolescence of Technology: an essay on the risks posed by powerful AI to national security, economies and democracy—and how we can defend against them: https://t.co/0phIiJjrmz

Paul Couvert @itsPaulAi · Jan 27

That's just insane. Kimi K2.5 (which is 100% open source) is as good as Claude Opus 4.5 and GPT-5.2... And even beats them in key benchmarks 🔥
- 8x cheaper than Opus 4.5 (!!)
- Weights & code available on Hugging Face
- Multimodal w/ image, video, etc.
Closed source labs no longer have any advantages. Open source is winning.

Kimi.ai @Kimi_Moonshot

🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup.

🥝 K2.5 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
🥝 K2.5 Agent Swarm in beta for high-tier users.
🥝 For production-grade coding, you can pair K2.5 with Kimi Code: https://t.co/A5WQozJF3s

🔗 API: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/6h2KkoA0xd
🔗 Weights & code: https://t.co/H38KegeDIY

GitHub Changelog @GHchangelog · Jan 27

Introducing the Agents tab in your repository!
• View, make, and navigate sessions in your repo
• Session logs now easier to read + follow
• Resume sessions in Copilot CLI via copyable command
Try it in a repo → https://t.co/3n2G1AXiSm

Jiayuan (JY) Zhang @jiayuan_jy · Jan 27

I let Claude Code turn @karpathy's post into agent skills. It first generated a bunch of skill files and around 800 lines of descriptions. Then I let it use these agent skills to review itself. Boom, it cut itself down to 70 lines of clean, solid instructions. https://t.co/7T9HnjcdJY

Anduril Industries @anduriltech · Jan 27

Do you have what it takes? Register today. https://t.co/P5hjJ3FxV8

chirag @mrnacknack · Jan 27

10 ways to hack into a vibecoder's clawdbot & get entire human identity (educational purposes only)

This is for education purposes only so that you understand how vibecoding can get vulnerable in setups like moltbot (previously clawdbot) and how you ...

Nick Dobos @NickADobos · Jan 27

Prompts are software btw No one will write code anymore https://t.co/3jkoaYrtjZ

Andrej Karpathy @karpathy

@airesearch12 💯 @ Spec-driven development It's the limit of imperative -> declarative transition, basically being declarative entirely. Relatedly my mind was recently blown by https://t.co/pTfOfWwcW1 , extreme and early but inspiring example.

Anduril Industries @anduriltech · Jan 27

Today we’re announcing the AI Grand Prix. The fully autonomous drone racing competition inviting the boldest engineers from around the globe to compete for $500,000 and a job at Anduril. No human pilots. No hardware mods. Identical @neros_tech drones. Software is the only path to victory. If you win, it’s because your autonomy stack is better. Full stop. Season 1 kicks off this spring, leading up to the AI Grand Prix Ohio.

Boris Cherny @bcherny · Jan 27

In the next version of Claude Code, you can customize spinner verbs for yourself and your team https://t.co/fLw0hWrDEo

Boris Cherny @bcherny · Jan 27

@nicmeriano @karpathy Yep here’s an example. /dedupe skill invoked on every issue https://t.co/vPWFZCA8YN

Firecrawl @firecrawl · Jan 27

Introducing the Firecrawl Skill + CLI for Agents 🔥 Agents like Claude Code, Codex, and OpenCode need live quality context from the web. The CLI pulls web content to local files with bash-powered search for the highest token efficiency. $ npx skills add firecrawl/cli https://t.co/8oyJcGJiIN

Dilum Sanjaya @DilumSanjaya · Jan 27

Vibe coded a ship selection UI for a space exploration game.
3D assets: Nano Banana + Midjourney → Hunyuan3D
UI: Nano Banana → Gemini Pro
More details ↓ https://t.co/Ngky4nudC7

Mr. Lobster🦞 @moltbot · Jan 27

🦞 BIG NEWS: We've molted!
Clawdbot → Moltbot
Clawd → Molty
Same lobster soul, new shell. Anthropic asked us to change our name (trademark stuff), and honestly? "Molt" fits perfectly - it's what lobsters do to grow.
New handle: @moltbot
Same mission: AI that actually does things.

AI Notkilleveryoneism Memes ⏸️ @AISafetyMemes · Jan 27

Andrej Karpathy: "This is easily the biggest change in ~2 decades of programming and it happened over the course of a few weeks." "I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups." "I am bracing for 2026 as the year of the slopacolypse." "LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering." "I am slowly starting to atrophy my ability to write code manually." "It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later."

Derya Unutmaz, MD @DeryaTR_ · Jan 27

I just started testing Kimi K2.5, and wow, these guys cooked it big time!

Kimi.ai @Kimi_Moonshot

Here's a short video from our founder, Zhilin Yang. (It's his first time speaking on camera like this, and he really wanted to share Kimi K2.5 with you!) https://t.co/2uDSOjCjly

Kimi.ai @Kimi_Moonshot · Jan 27

Kimi K2.5 has arrived! 🥝 Here are 2 things to know: Aesthetic Coding x Agent Swarm.

ZenomTrader @ZenomTrader · Jan 27

AGI has been reached. Humanity, I believe, is simply not prepared for this. In the last 4 days with Claude Code, I managed to create things that would have taken me over a year without using agents. Every human using AI agents is effectively 10× more productive than one who isn’t. Here are the crazy use cases I’ve been using it for:
1) The number one trading journal + prop firm simulator in the entire financial industry; number two doesn’t even come close.
2) A fully automated Discord server, from channel creation to design to everything else.
3) Fully automated tweets that scrape Discord servers to 100% match my personality, without changing my words at all, using a repository of screenshots matched to the post logic.
4) Fully autonomous backtesting agents and a backtest validator that can access the trading platform I’m using to autonomously code and debug code inside it.
5) Fully created strategies from scratch that look to outperform every hedge fund in the world.
This is what a 10× gap looks like.

energy @0xEn3rgy · Jan 27

@spacepixel humanizer skill will help u

AI Learning Digest.
