MiniMax M2.7 Demonstrates Self-Improving AI Training While Claude Code Reshapes Professional Software Development
The AI community is buzzing about MiniMax's M2.7 model, which autonomously improved itself through 100+ rounds of self-training, handling 30-50% of the lab's own research tasks. Meanwhile, Claude Code continues its dominance as the tool of choice for professional developers, with Intercom revealing a 13-plugin internal ecosystem and domain experts building production software in weeks. Agent reliability, local AI accessibility, and the shift toward agent-first tooling round out a packed day.
Daily Wrap-Up
The biggest story today isn't about a new model beating benchmarks. It's about a model that helped build itself. MiniMax's M2.7 ran through 100+ autonomous self-improvement iterations, achieving a 30% performance gain with minimal human involvement, and now handles somewhere between 30-50% of the lab's own AI research workflow. Whether you think the breathless "self-evolving AI" framing is warranted or overblown, the technical achievement is real and the implications for research velocity are significant. Chinese AI labs have definitively closed the gap with Western frontier models, and they've done it with a model that runs on a single A30 GPU.
The other throughline today is how thoroughly Claude Code has embedded itself into professional software development. Intercom's engineering leadership is publicly stating their workflow is "unrecognisable" compared to 12 months ago, running 13 internal plugins with 100+ skills. A mechanical engineer in Houston built a production piping analysis tool in 8 weeks with zero prior coding experience. Matt Pocock released a full walkthrough of building features in a real codebase. The pattern is clear: Claude Code isn't a toy for demos anymore, it's infrastructure. And the developers getting the most out of it are the ones investing in agent-ergonomic tooling, rigorous review loops, and end-to-end testing rather than one-shot prompting.
The most entertaining moment was @sudoingX building a full space shooter game across 13 files and 3,200+ lines of code using a $250 RTX 3060 running a 9B parameter model locally, then politely telling everyone to stop buying $4,699 AI boxes before testing what their existing hardware can do. The most practical takeaway for developers: if you're using AI coding tools, stop optimizing your prompts and start optimizing your review process. As @doodlestein demonstrated in his detailed session log, the difference between mediocre and bulletproof AI-generated code comes from multiple rounds of review, introspection, and end-to-end deployment testing, not from a single clever prompt.
Quick Hits
- @elonmusk is promoting Grok Imagine's Chibi template and showing off minute-long AI-generated stories. Cute, but light on substance.
- @gdb (Greg Brockman) endorsed Codex's bug-finding abilities, agreeing with @garrytan that "Codex is GOAT at finding bugs and finding plan errors."
- @theallinpod teased a Jensen Huang interview at NVIDIA GTC dropping tomorrow.
- @loren_cossette shared AI consulting results: $2.7M saved, 26x speed increase, 56K files automated across strategy, build, and change management.
- @badlogicgames retweeted a community-built Telegram extension for the pi agent framework.
- @nicbstme argued SaaS needs to be reinvented to serve agents first, noting he hasn't touched Figma in two years because AI coding has replaced his design workflow.
- @somewheresy shared Cascadeur's new AI animation tools: inbetweening, smart posing, and physics, all running locally with no token costs.
MiniMax M2.7: The Self-Improving Model That Has Everyone Talking
The single most discussed topic today was MiniMax's release of M2.7, a model whose development process might matter more than its benchmark scores. The core claim is striking: the model was evaluated not just on standard tasks but on how much it contributed to building its own next version. This feedback loop, where the model handles literature review, experiment design, data pipelines, monitoring, debugging, and even pull requests, represents a concrete step toward what the AI research community has been calling "auto-research."
@Yampeleg broke down what makes this significant: "The model was evaluated by how much it contributed to building the next version of itself. They basically did auto-research IRL: Maximizing how much the RL team's work is delegated to the model during its development. Answer: 30-50%." That's not a theoretical capability. That's a measured percentage of real research work being offloaded to the model under active development.
@arafatkatze provided additional context on the practical implications: "The Minimax-2.7 model blog describes how the agent literally runs and iterates training on its own. You could be a researcher who wants to prototype 4 different ideas... You run 4 of them in 4 parallel worktrees and ask the agent to write the code and parallelly run the training runs and then pick the best ideas that work." @cryptopunk7213 highlighted the competitive angle, noting the model matches Claude Opus 4.6 and GPT-5.4 on several benchmarks while running on modest hardware.
The geopolitical dimension is hard to ignore. A Chinese lab has produced a frontier-competitive model using a self-improvement methodology that could compound over successive generations. Whether "self-evolving" is the right term is debatable, but the research velocity implications are not. If a model can handle half the grunt work of its own development cycle, iteration speeds could accelerate dramatically.
Claude Code Goes Enterprise: Intercom, Industrial Piping, and the Professional Shift
Today brought a cluster of posts showing Claude Code crossing the threshold from developer tool to enterprise platform. The most concrete example came from Intercom, where @Padday (Paul Adams, CPO) stated flatly: "How we build software @intercom is unrecognisable vs 12 months ago. We're fully Claude Code pilled and seeing enormous productivity gains." The quoted thread from @brian_scanlan revealed the scale: 13 internal plugins, 100+ skills, and hooks that turn Claude into what they describe as "a full-stack engineering platform."
But the story that captured imaginations was from the trades. @toddsaunders shared the experience of Cory LaChance, a mechanical engineer in Houston who built a production application for industrial piping construction: "It reads piping isometric drawings and automatically extracts every weld count, every material spec, every commodity code. Work that took 10 minutes per drawing now takes 60 seconds. It can do 100 drawings in five minutes." LaChance built this in 8 weeks while simultaneously learning the terminal, VS Code, and Claude Code from scratch. His quote resonates: "I literally did this with zero outside help other than the AI."
These aren't demo projects. They're production tools being used daily by teams who had no prior software development capability. The implication is that Claude Code is creating a new category of software creator: domain experts who can translate deep professional knowledge directly into working applications without traditional engineering intermediaries.
The Art of Agent-Driven Development
Several posts today focused on methodology rather than tools, exploring how to actually get reliable results from AI coding agents. @doodlestein shared a detailed session log showing the development of an encrypted GitHub issues tool, emphasizing that the difference between amateur and professional AI-assisted development lies in review depth: "Notice the sheer number of times and different ways I asked it to review its code and the writeup, and how many times that caused it to find new and different bugs." His key insight was that no amount of static analysis replaces end-to-end deployment testing on a fresh machine.
@mattpocockuk released a video walkthrough covering "from idea to AFK agent to QA, every single step explained, no slop allowed." Meanwhile, @clare_liguori shared rigorous evaluation results from 3,000 runs testing five approaches to guiding agent behavior, finding that Strands steering hooks was the only approach achieving 100% accuracy: "The key is just-in-time guidance for the model before tool calls and at the end of a turn."
The emerging consensus is that agent reliability isn't primarily a model capability problem. It's an orchestration and review problem. The developers seeing the best results are building systematic guardrails, multiple review passes, and real-world validation into their workflows rather than relying on single-shot generation.
Local AI and the $250 GPU Challenge
@sudoingX made a compelling case for accessible local AI, documenting iteration 3 of a game built entirely by a Qwen 3.5 9B model running on an RTX 3060: "4 phases. 6 prompts. Zero handwritten code. 3,200+ lines across 13 files. Every line by qwen 3.5 9B Q4 at 35-50 tok/s on 12 gigs through hermes agent." The model autonomously fixed its own bugs, handled browser cache issues, and managed level progression across multiple iterations.
The practical message cuts through the hype: "You don't need a $4,699 box to get started with local AI. Use what you already have first. Test your workload." For developers curious about local inference, the barrier to entry is far lower than the hardware marketing suggests.
AI-Powered Research and Genealogy
@MattPrusak demonstrated an unexpected application of AI research patterns, using @karpathy's autoresearch methodology for genealogy: "One session: over 100 organized research files. It found a 1940 Norwegian emigrant history with my ancestors in it. Resolved a maiden name question that confused my family for 70 years." He open-sourced the complete toolkit including prompts, archive guides for 20+ countries, and DNA analysis frameworks. It's a reminder that the most impactful AI applications often emerge when systematic research methodologies meet deeply personal questions.
The Agent-First Software Problem
@thepublicdev raised a problem that's only getting worse as agents write more code: architectural documentation rot. "Now with AI agents writing code while we sleep, this will get even worse. You can literally wake up to a whole new app that no one understands except the AI that built it." It's a real tension in the current moment. The same tools accelerating development are also accelerating the divergence between what code does and what humans understand about it. Meanwhile, Claude Code's skill system continues evolving, with @lydiahallie showing how to embed dynamic shell commands in SKILL.md files and @ZackKorman noting you can hide these commands in HTML comments, invisible to readers but executed at invocation time.
Sources
MiniMax-M2.7 just landed in MiniMax Agent. The model helped build itself. Now it's here to build for you. ↓ Try Now: https://t.co/PeBPsHkbSm
MiniMax M2.7: Early Echoes of Self-Evolution
MiniMax M2.7: Early Echoes of Self-Evolution
We've been building an internal Claude Code plugin system at Intercom with 13 plugins, 100+ skills, and hooks that turn Claude into a full-stack engineering platform. Lots done, more to do. Here's a thread of some highlights.
Guide: AI-Native Design with Paper
OK Codex is GOAT at finding bugs and finding plan errors
this is what 12 gigs of VRAM built in 2026. a 9 billion parameter model running on a 5 year old RTX 3060 wrote a full space shooter from a single prompt. blank screen on first try. i came back with a bug list and the same model on the same card fixed every issue across 11 files without touching a single line myself. enemies still looked wrong so i pushed another iteration and now the game has pixel art octopi, particle effects, screen shake, projectile physics and a combo system. all running locally on a card that was designed to play fortnite. three iterations. zero cloud. zero API calls. every token generated on hardware sitting under my desk. the model reads its own code, finds what's broken, patches it, validates syntax and restarts the server. i just describe what's wrong and it handles the rest. people are paying monthly subscriptions to type into a browser tab and wait for a server farm to respond. meanwhile a GPU you can find used on ebay is running a full autonomous hermes agent framework with 31 tools, 128K context window and thinking mode generating at 29 tokens per second nonstop. the game still needs work. level upgrades don't trigger and boss fights need tuning. but the fact that i'm iterating on gameplay balance instead of debugging whether the code runs at all tells you where this is headed. every iteration the game gets better on the same hardware. same 12 gigs. same 9 billion parameters. same RTX 3060 from 5 years ago your GPU is not a gaming card anymore. it's a local AI lab that never sends your data anywhere.
if your skill depends on dynamic content, you can embed !`command` in your SKILL.md to inject shell output directly into the prompt Claude Code runs it when the skill is invoked and swaps the placeholder inline, the model only sees the result! https://t.co/b6smVdkHN1
Grok Imagine Chibi Template is so cute 🥰 https://t.co/xYxKqxmYQr