AI Digest.

MiniMax M2.7 Demonstrates Self-Improving AI Training While Claude Code Reshapes Professional Software Development

The AI community is buzzing about MiniMax's M2.7 model, which autonomously improved itself through 100+ rounds of self-training, handling 30-50% of the lab's own research tasks. Meanwhile, Claude Code continues its dominance as the tool of choice for professional developers, with Intercom revealing a 13-plugin internal ecosystem and domain experts building production software in weeks. Agent reliability, local AI accessibility, and the shift toward agent-first tooling round out a packed day.

Daily Wrap-Up

The biggest story today isn't about a new model beating benchmarks. It's about a model that helped build itself. MiniMax's M2.7 ran through 100+ autonomous self-improvement iterations, achieving a 30% performance gain with minimal human involvement, and now handles between 30% and 50% of the lab's own AI research workflow. Whether you think the breathless "self-evolving AI" framing is warranted or overblown, the technical achievement is real and the implications for research velocity are significant. Chinese AI labs have closed the gap with Western frontier models on these benchmarks, and they've done it with a model that runs on a single A30 GPU.

The other throughline today is how thoroughly Claude Code has embedded itself into professional software development. Intercom's leadership is publicly stating their workflow is "unrecognisable" compared to 12 months ago, running 13 internal plugins with 100+ skills. A mechanical engineer in Houston built a production piping analysis tool in 8 weeks with zero prior coding experience. Matt Pocock released a full walkthrough of building features in a real codebase. The pattern is clear: Claude Code isn't a toy for demos anymore; it's infrastructure. And the developers getting the most out of it are the ones investing in agent-ergonomic tooling, rigorous review loops, and end-to-end testing rather than one-shot prompting.

The most entertaining moment was @sudoingX building a full space shooter game across 13 files and 3,200+ lines of code using a $250 RTX 3060 running a 9B parameter model locally, then politely telling everyone to stop buying $4,699 AI boxes before testing what their existing hardware can do. The most practical takeaway for developers: if you're using AI coding tools, stop optimizing your prompts and start optimizing your review process. As @doodlestein demonstrated in his detailed session log, the difference between mediocre and bulletproof AI-generated code comes from multiple rounds of review, introspection, and end-to-end deployment testing, not from a single clever prompt.

Quick Hits

  • @elonmusk is promoting Grok Imagine's Chibi template and showing off minute-long AI-generated stories. Cute, but light on substance.
  • @gdb (Greg Brockman) endorsed Codex's bug-finding abilities, agreeing with @garrytan that "Codex is GOAT at finding bugs and finding plan errors."
  • @theallinpod teased a Jensen Huang interview at NVIDIA GTC dropping tomorrow.
  • @loren_cossette shared AI consulting results: $2.7M saved, 26x speed increase, 56K files automated across strategy, build, and change management.
  • @badlogicgames retweeted a community-built Telegram extension for the pi agent framework.
  • @nicbstme argued SaaS needs to be reinvented to serve agents first, noting he hasn't touched Figma in two years because AI coding has replaced his design workflow.
  • @somewheresy shared Cascadeur's new AI animation tools: inbetweening, smart posing, and physics, all running locally with no token costs.

MiniMax M2.7: The Self-Improving Model That Has Everyone Talking

The single most discussed topic today was MiniMax's release of M2.7, a model whose development process might matter more than its benchmark scores. The core claim is striking: the model was evaluated not just on standard tasks but on how much it contributed to building its own next version. This feedback loop, where the model handles literature review, experiment design, data pipelines, monitoring, debugging, and even pull requests, represents a concrete step toward what the AI research community has been calling "auto-research."

@Yampeleg broke down what makes this significant: "The model was evaluated by how much it contributed to building the next version of itself. They basically did auto-research IRL: Maximizing how much the RL team's work is delegated to the model during its development. Answer: 30-50%." That's not a theoretical capability. That's a measured percentage of real research work being offloaded to the model under active development.

@arafatkatze provided additional context on the practical implications: "The Minimax-2.7 model blog describes how the agent literally runs and iterates training on its own. You could be a researcher who wants to prototype 4 different ideas... You run 4 of them in 4 parallel worktrees and ask the agent to write the code and parallelly run the training runs and then pick the best ideas that work." @cryptopunk7213 highlighted the competitive angle, noting the model matches Claude Opus 4.6 and GPT-5.4 on several benchmarks while running on modest hardware.
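The parallel-worktree workflow @arafatkatze describes is easy to sketch. The helper below is an illustrative assumption, not MiniMax's actual tooling: it creates one git worktree per idea on its own branch, then starts one training process per worktree (`train_cmd` is a stand-in such as `["python", "train.py"]`).

```python
import subprocess
from pathlib import Path

def worktree_cmds(repo: Path, ideas: list[str]) -> list[tuple[list[str], Path]]:
    """Build one `git worktree add` command per idea, plus the directory
    each experiment will run in. Pure function, so it's easy to inspect."""
    cmds = []
    for idea in ideas:
        wt = repo.parent / f"{repo.name}-{idea}"
        # -b creates a dedicated branch so each experiment edits code in isolation.
        cmds.append((["git", "-C", str(repo), "worktree", "add", "-b", idea, str(wt)], wt))
    return cmds

def launch(repo: Path, ideas: list[str], train_cmd: list[str]) -> list[subprocess.Popen]:
    """Create the worktrees and kick off one training run per idea in parallel."""
    procs = []
    for cmd, wt in worktree_cmds(repo, ideas):
        subprocess.run(cmd, check=True)          # set up the isolated checkout
        procs.append(subprocess.Popen(train_cmd, cwd=wt))  # non-blocking launch
    return procs
```

From there, the agent (or the researcher) compares the runs and keeps the best idea, which is the "only making critical decisions" loop the post describes.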

The geopolitical dimension is hard to ignore. A Chinese lab has produced a frontier-competitive model using a self-improvement methodology that could compound over successive generations. Whether "self-evolving" is the right term is debatable, but the research velocity implications are not. If a model can handle half the grunt work of its own development cycle, iteration speeds could accelerate dramatically.

Claude Code Goes Enterprise: Intercom, Industrial Piping, and the Professional Shift

Today brought a cluster of posts showing Claude Code crossing the threshold from developer tool to enterprise platform. The most concrete example came from Intercom, where @Padday (Paul Adams, CPO) stated flatly: "How we build software @intercom is unrecognisable vs 12 months ago. We're fully Claude Code pilled and seeing enormous productivity gains." The quoted thread from @brian_scanlan revealed the scale: 13 internal plugins, 100+ skills, and hooks that turn Claude into what they describe as "a full-stack engineering platform."

But the story that captured imaginations was from the trades. @toddsaunders shared the experience of Cory LaChance, a mechanical engineer in Houston who built a production application for industrial piping construction: "It reads piping isometric drawings and automatically extracts every weld count, every material spec, every commodity code. Work that took 10 minutes per drawing now takes 60 seconds. It can do 100 drawings in five minutes." LaChance built this in 8 weeks while simultaneously learning the terminal, VS Code, and Claude Code from scratch. His quote resonates: "I literally did this with zero outside help other than the AI."

These aren't demo projects. They're production tools being used daily by teams who had no prior software development capability. The implication is that Claude Code is creating a new category of software creator: domain experts who can translate deep professional knowledge directly into working applications without traditional engineering intermediaries.

The Art of Agent-Driven Development

Several posts today focused on methodology rather than tools, exploring how to actually get reliable results from AI coding agents. @doodlestein shared a detailed session log showing the development of an encrypted GitHub issues tool, emphasizing that the difference between amateur and professional AI-assisted development lies in review depth: "Notice the sheer number of times and different ways I asked it to review its code and the writeup, and how many times that caused it to find new and different bugs." His key insight was that no amount of static analysis replaces end-to-end deployment testing on a fresh machine.

@mattpocockuk released a video walkthrough covering "from idea to AFK agent to QA, every single step explained, no slop allowed." Meanwhile, @clare_liguori shared rigorous evaluation results from 3,000 runs testing five approaches to guiding agent behavior, finding that Strands steering hooks were the only approach to hit 100% accuracy: "The key is just-in-time guidance for the model before tool calls and at the end of a turn."
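The post doesn't reproduce the hook code, but the just-in-time steering pattern can be sketched in plain Python. The handler and rule names below are hypothetical, not the Strands API: each handler inspects the pending tool call and returns guidance only when the agent is about to go off track.

```python
from typing import Callable

# A steering handler sees the tool name and arguments the agent is about to
# use, and returns a guidance string only when intervention is needed.
SteeringHandler = Callable[[str, dict], "str | None"]

def forbid_destructive_shell(tool: str, args: dict) -> "str | None":
    """Hypothetical rule: block destructive shell commands with a steer."""
    if tool == "shell" and "rm -rf" in args.get("command", ""):
        return "Do not delete files; propose the change for review instead."
    return None  # no intervention -- the agent proceeds unprompted

def apply_steering(handlers: list[SteeringHandler], tool: str, args: dict) -> list[str]:
    """Collect guidance to inject into the prompt just before the tool call."""
    return [msg for h in handlers if (msg := h(tool, args)) is not None]
```

The design point matches the quoted finding: guidance is injected only at the moment it matters, rather than front-loading every rule into the system prompt.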

The emerging consensus is that agent reliability isn't primarily a model capability problem. It's an orchestration and review problem. The developers seeing the best results are building systematic guardrails, multiple review passes, and real-world validation into their workflows rather than relying on single-shot generation.

Local AI and the $250 GPU Challenge

@sudoingX made a compelling case for accessible local AI, documenting iteration 3 of a game built entirely by a Qwen 3.5 9B model running on an RTX 3060: "4 phases. 6 prompts. Zero handwritten code. 3,200+ lines across 13 files. Every line by qwen 3.5 9B Q4 at 35-50 tok/s on 12 gigs through hermes agent." The model autonomously fixed its own bugs, handled browser cache issues, and managed level progression across multiple iterations.

The practical message cuts through the hype: "You don't need a $4,699 box to get started with local AI. Use what you already have first. Test your workload." For developers curious about local inference, the barrier to entry is far lower than the hardware marketing suggests.

AI-Powered Research and Genealogy

@MattPrusak demonstrated an unexpected application of AI research patterns, using @karpathy's autoresearch methodology for genealogy: "One session: over 100 organized research files. It found a 1940 Norwegian emigrant history with my ancestors in it. Resolved a maiden name question that confused my family for 70 years." He open-sourced the complete toolkit including prompts, archive guides for 20+ countries, and DNA analysis frameworks. It's a reminder that the most impactful AI applications often emerge when systematic research methodologies meet deeply personal questions.

The Agent-First Software Problem

@thepublicdev raised a problem that's only getting worse as agents write more code: architectural documentation rot. "Now with AI agents writing code while we sleep, this will get even worse. You can literally wake up to a whole new app that no one understands except the AI that built it." It's a real tension in the current moment. The same tools accelerating development are also accelerating the divergence between what code does and what humans understand about it. Meanwhile, Claude Code's skill system continues evolving, with @lydiahallie showing how to embed dynamic shell commands in SKILL.md files and @ZackKorman noting you can hide these commands in HTML comments, invisible to readers but executed at invocation time.
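The mechanics of the skill trick are simple to sketch. In this illustrative SKILL.md fragment (the frontmatter values and the `git log` command are assumptions for the example), Claude Code replaces the !`…` placeholder with the command's output when the skill is invoked, so the model only sees the result; wrapping the line in an HTML comment hides it from anyone reading the file.

```markdown
---
name: deploy-status
description: Summarize the latest change before reporting deploy status.
---

Report the current deploy status to the user.

<!-- Hidden from readers, executed at invocation time: !`git log -1 --oneline` -->
```

As @ZackKorman's post implies, that invisibility is also why skills from untrusted sources deserve a careful read before installation.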

Sources

Loren Cossette @loren_cossette ·
Most AI consultants can do one of these: → Strategy → Build → Change management I do all three. One person. End-to-end. Budgets are shrinking. That's exactly when this matters. $2.7M saved. 26× speed increase. 56K files automated. https://t.co/KPcW24c2hW
Todd Saunders @toddsaunders ·
I know Silicon Valley startups don't want to hear this..... But the combination of someone in the trades with deep domain expertise and Claude Code will run circles around your generic software. I talked to Cory LaChance this morning, a mechanical engineer in industrial piping construction in Houston. He normally works with chemical plants and refineries, but now he also works with the terminal He reached out in a DM a few days ago and I was so fired up by his story, I asked him if we could record the conversation and share it. He built a full application that industrial contractors are using every day. It reads piping isometric drawings and automatically extracts every weld count, every material spec, every commodity code. Work that took 10 minutes per drawing now takes 60 seconds. It can do 100 drawings in five minutes, saving days of time. His co-workers are all mind blown, and when he talks to them, it's like they are speaking different languages. His fabrication shop uses it daily, and he built the entire thing in 8 weeks. During those 8 weeks he also had to learn everything about Claude Code, the terminal, VS Code, everything. My favorite quote from him was when he said, "I literally did this with zero outside help other than the AI. My favorite tools are screenshots, step by step instructions and asking Claude to explain things like I'm five." Every trades worker with deep expertise and a willingness to sit down with Claude Code for a few weekends is now a potential software founder. I can't wait to meet more people like Cory.
Ejaaz @cryptopunk7213 ·
fuck me china just launched the 1st AI model that autonomously built itself... and its as good as claude opus 4.6 and gpt-5.4 - minimax M2.7 trained itself through 100+ rounds of autonomous self-improvement. 30% gain. No humans involved - what the actual f*ck - model now handles 30-50% of the AI lab's OWN AI research - beats gemini 3.1 at coding and pretty much matches opus 4.6 + gpt 5.4 😶 (china used to lag now they match - doesn't require crazy hardware to run (single a30 gpu) - absolutely CRUSHES tasks: financial modelling, coding, openclaw - one-shotted the chinese have officially caught up. self-improving ai is a real thing. all researchers did was set an objective and the model figured the rest out. i wasn't expecting this from minimax. im now wondering wtf deepseek is going to be like.
MiniMaxAgent @MiniMaxAgent

MiniMax-M2.7 just landed in MiniMax Agent. The model helped build itself. Now it's here to build for you. ↓ Try Now: https://t.co/PeBPsHkbSm

Matt Pocock @mattpocockuk ·
For weeks, folks have been asking me to make a video building a feature with Claude Code in a real codebase So here you go: - From idea to AFK agent to QA - Every single step explained - No slop allowed https://t.co/DUGpIsNxmk
@somewheresy ·
RT @3DxDEV7: Cascadeur just broke animation. AI inbetweening, smart posing, physics tools – all running locally on your machine. No tokens.…
Anthropic @AnthropicAI ·
We invited Claude users to share how they use AI, what they dream it could make possible, and what they fear it might do. Nearly 81,000 people responded in one week—the largest qualitative study of its kind. Read more: https://t.co/tmp2RnZxRm
Mario Zechner @badlogicgames ·
RT @tensorfish: Made a pi-telegram extension for pi agent by @badlogicgames https://t.co/jyj7Xsetee
Ara @arafatkatze ·
Some of you are too busy doomscrolling to notice that borderline singularity level ML has been achieved in OpenSource Software Models. The Minimax-2.7 model blog describes how the agent literally runs and iterates training on its own. You could be a researcher who wants to prototype 4 different ideas but you don't know if they will all work. You run 4 of them in 4 parallel worktrees and ask the agent to write the code and parallelly run the training runs and then pick the best ideas that work. During the runs M2.7 discoveres effective optimizations for the model: systematically searching for the optimal combination of sampling parameters such as temperature, frequency penalty, and presence penalty etc. and this helps you pick the best ideas fast and improve while basically only making critical decisions instead of handwriting all the code and infra. This means that the model is "almost" self evolving and with a few improvements to infra+harness+model engineering we are right at the footsteps of exponential improvement. I say this with no exaggeration this is going to be wild year.
MiniMax_AI @MiniMax_AI

MiniMax M2.7: Early Echoes of Self-Evolution

Yam Peleg @Yampeleg ·
The model was evaluated by how much it contributed to building the next version of itself. This is a crazy post. They basically did auto-research IRL: Maximizing how much the RL team's work is delegated to the model during it's development. (Answer: 30-50% btw) Everything researchers do: - Literature review - Experiment design - Data pipelines - Monitoring - Debugging - Code fixes - Pull requests .. Actively trying to delegate more and more between the research iterations. Crazy crazy crazy times
MiniMax_AI @MiniMax_AI

MiniMax M2.7: Early Echoes of Self-Evolution

Clare Liguori @clare_liguori ·
I tested 5 approaches to guiding AI agent behavior across 3,000 eval runs to see what actually makes agents reliable. Strands steering hooks was the only one that hit 100% accuracy Here's how it works: The key is just-in-time guidance for the model before tool calls and at the end of a turn. Steering handlers observe what the agent is doing and intervene only when the model is about to go off track. Full results and code in the post https://t.co/Gv6enSII6a
Jeffrey Emanuel @doodlestein ·
I wanted to give a quick, self-contained example of how rapidly I can now develop useful, polished, well-documented software to solve a problem. In this case, about an hour or so, and that's with the overhead of trying to write it all up nicely and document everything for public consumption. The problem: my friend @exhaze has been working on some cool formal verification stuff and had some issues he wanted to flag to me about asupersync and inquired as to the best way to do that. I mentioned that the easiest was to just post a GitHub issues submission in the repo so that my clankers can review and check it. But then he asked, what if it's security related? Maybe we can use encryption. Well, sure, that shouldn't be so hard. But rather than try to find some finished project out there that may or may not work well for this use case, I start informally by asking ChatGPT the following: "Suppose someone wants to post github issues for my project publicly but with encrypted messages and then they send to me out of band the decryption key. Then I would have my coding agents (using codex running on ubuntu) check the gh issues and detect when an issue is encrypted and use the decryption key. What's the best, easiest to use, off-the-shelf open source way to do this using ECC?" This quickly yielded a good answer, so I then fired up Claude Code and pasted in the output from ChatGPT and asked it to actually build the thing. OK, so what? Anyone could do this, and tons of people do it every day. Nothing very special about that, right? Well, what I think is interesting is how much further I pushed everything, resulting in a highly polished and well-documented script, and article explaining how to use it, and a skill for agents to help them use it. 
I added all this to my "misc_coding_agent_tips_and_scripts" repo which I use as a sort of grab-bag of random learnings, and you can see the final product here: https://t.co/5CU8ugFR2P I was going to just include the key prompts that I used in that session here, but then decided to use a slick feature in my cass (coding_agent_session_search) tool that lets you export agent sessions as nicely formatted, self-contained, static html files which are easy to serve via GitHub Pages. So now you can see the entire session start to finish here and see all the tricks and techniques I used to get really high-quality outputs that are bullet-proof and extremely agent-intuitive and agent-ergonomic (and also hopefully human intuitive, but I care less and less about that each passing day!): https://t.co/a2djC5IKyz And THAT is the difference between what anyone might whip up and what I can extract from Claude Code with a bit more time and effort and work on my part. Notice in the session log the sheer number of times and different ways I asked it to review its code and the writeup, and how many times that caused it to find new and different bugs and problems and edge cases and fix them. I see people complain about how Opus sucks at coding, but I am positive that they aren't doing this sort of thing, which is absolutely required to get good quality results. And even then, after so many rounds of checks and reviews, it turned out that there was no substitute for deploying the whole thing to GitHub and then trying the entire installation process end-to-end on a fresh machine (luckily I have a ton of VPS boxes waiting around as remote compilation helpers for my rch tool!), where it discovered that the installer forgot to actually install the script itself! The lesson is clear: no amount of static code analysis is enough to catch everything. 
Also notice how much time I spent making sure that everything was highly optimized for consumption by coding agents, and how I asked Claude Code to introspect about what IT would want to use and read if it had to approach this situation with zero knowledge. If you aren't extremely focused on this aspect of things in all your tooling, you're making a big mistake, since these tools are increasingly going to be used ONLY by agents, and their opinion matters!
Matthew Berman @MatthewBerman ·
"OpenClaw is the most popular open source project in history of humanity" - Jensen (NVIDIA CEO) But most people are using it wrong... Here's everything I've learned from 10 billion tokens and 200+ hours of using OpenClaw every single day. Watch this now: 0:00 Intro 0:32 Threaded Chats 3:17 Voice Memos 4:43 Agent-Native Hosting (Sponsor) 6:49 Model Routing 11:18 Subagents & Delegation 14:02 Prompt Optimizations 17:22 Cron Jobs 19:15 Security Best Practices 24:03 Logging & Debugging 25:43 Self Updating 26:28 API vs Subscription 27:52 Documentation/Backup 31:19 Testing 33:11 Building
Min Choi @minchoi ·
Holy smokes… 3D world building will never be the same. OpenArt Worlds lets you turn a photo, video, or prompt into a 3D world you can actually explore. Here's how: https://t.co/PdbyJUK2j7
Paul Adams @Padday ·
How we build software @intercom is unrecognisable vs 12 months ago. We're fully Claude Code pilled and seeing enormous productivity gains. Excellent thread here from Brian on some things we're doing.
brian_scanlan @brian_scanlan

We've been building an internal Claude Code plugin system at Intercom with 13 plugins, 100+ skills, and hooks that turn Claude into a full-stack engineering platform. Lots done, more to do. Here's a thread of some highlights.

Matt Prusak @MattPrusak ·
Your grandparents had grandparents. They had grandparents. Somewhere back there, someone got on a boat, or didn't. Someone changed their name, or had it changed for them. Someone is buried in a cemetery you've never heard of in a country you've never been to. Most families lose track after two generations. I used AI to push mine back nine. One session with @karpathy's autoresearch pattern: over 100 organized research files. It found a 1940 Norwegian emigrant history with my ancestors in it. Resolved a maiden name question that confused my family for 70 years. Identified relatives no one alive knew existed. The method is simple: set a goal, measure progress, verify against real records, repeat. The AI searches public archives, cross-references birth certificates against cemetery records against church books, and logs everything it finds (and everything it doesn't). Open sourced the whole toolkit. Prompts that do the research for you, archive guides for 20+ countries, starter templates, even a framework for making sense of DNA results. If you have a box of old photos and unanswered questions, this is where to start. https://t.co/6t4l3hAE1o
The All-In Podcast @theallinpod ·
Big interview with the boss at Nvidia GTC… Pod drops tomorrow! https://t.co/Eep2SQEumu
The Public Dev @thepublicdev ·
I have been building software for 15 years, and one problem I see over and over is software architectural diagrams growing stale as code evolves. Now with AI agents writing code while we sleep, this will get even worse. You can literally wake up to a whole new app that no one understands except the AI that built it. As software engineers, we need to stay ahead of the curve and ensure we understand our code, even if we are not the ones writing it. I want to solve this problem. If you are eager to try out what I am building, please sign up here https://t.co/GsMZ8Y90EP I would also like to connect with engineering teams facing this.
Nicolas Bustamante @nicbstme ·
SaaS has to be reinvented to serve Agents first. I used to spend days in Sketch and then switched to Figma in 2015 because they nailed the web UX & collaboration. Figma pioneered so many great features but unfortunately I haven’t used them in 2 years because of how good is AI coding. Paper feels like a design tool made for agent which makes it appealing.
tkkong @tkkong

Guide: AI-Native Design with Paper

Greg Brockman @gdb ·
codex has gotten very good!
garrytan @garrytan

OK Codex is GOAT at finding bugs and finding plan errors

Sudo su @sudoingX ·
hear this anon you don't need a $4,699 box to get started local AI. use what you already have first. test your workload. this is what a $250 GPU did today. iteration 3 of octopus invaders is here. 4 phases. 6 prompts. zero handwritten code. the same 9B on the same 3060 fixed its own enemy spawning, patched a dual start conflict, added level progression, resized every bullet, and when the browser cached old files it figured that out on its own and added version parameters to force reload. 3,200+ lines across 13 files. every line by qwen 3.5 9B Q4 at 35-50 tok/s on 12 gigs through hermes agent. understand what your load actually needs before you build. don't get trapped by influencers selling you boxes next to a plant. test on what you have. then decide. this 3060 impressed me in ways i did not expect and its autonomy is what kept me going. now its time to move to new experiments on other nodes and other models for all of us. if you are running this setup the exact stack, flags, and open source code, exact prompts i used are in the replies. if you run into issues let me know. seeing students and builders discover hermes from my posts and start running local is why i do this. full autonomous build at 8x speed in the video. gameplay at the end. watch it.
sudoingX @sudoingX

this is what 12 gigs of VRAM built in 2026. a 9 billion parameter model running on a 5 year old RTX 3060 wrote a full space shooter from a single prompt. blank screen on first try. i came back with a bug list and the same model on the same card fixed every issue across 11 files without touching a single line myself. enemies still looked wrong so i pushed another iteration and now the game has pixel art octopi, particle effects, screen shake, projectile physics and a combo system. all running locally on a card that was designed to play fortnite. three iterations. zero cloud. zero API calls. every token generated on hardware sitting under my desk. the model reads its own code, finds what's broken, patches it, validates syntax and restarts the server. i just describe what's wrong and it handles the rest. people are paying monthly subscriptions to type into a browser tab and wait for a server farm to respond. meanwhile a GPU you can find used on ebay is running a full autonomous hermes agent framework with 31 tools, 128K context window and thinking mode generating at 29 tokens per second nonstop. the game still needs work. level upgrades don't trigger and boss fights need tuning. but the fact that i'm iterating on gameplay balance instead of debugging whether the code runs at all tells you where this is headed. every iteration the game gets better on the same hardware. same 12 gigs. same 9 billion parameters. same RTX 3060 from 5 years ago your GPU is not a gaming card anymore. it's a local AI lab that never sends your data anywhere.

Zack Korman @ZackKorman ·
You can hide these !commands in html comments so people don't see them when reading the skill. The command executes without the AI even knowing about it. https://t.co/vTvacMPowJ
lydiahallie @lydiahallie

if your skill depends on dynamic content, you can embed !`command` in your SKILL.md to inject shell output directly into the prompt Claude Code runs it when the skill is invoked and swaps the placeholder inline, the model only sees the result! https://t.co/b6smVdkHN1

Elon Musk @elonmusk ·
Try Grok Imagine Chibi template. Super cute!
art_muse @art_muse

Grok Imagine Chibi Template is so cute 🥰 https://t.co/xYxKqxmYQr

Elon Musk @elonmusk ·
Minute-long story made w Grok Imagine https://t.co/N8qnMkNsg0