AI Digest.

Google Saturates ARC-AGI-2 as MiniMax Ships $1/Hour Agents and OpenAI Drops Codex Spark

A three-way model race dominated the day with OpenAI's ultra-fast Codex Spark, Google Deep Think hitting 84.6% on ARC-AGI-2, and MiniMax's M2.5 promising viable $1/hour continuous agents. Meanwhile, a new startup called Entire declared code review dead, Spotify revealed its top engineers haven't written code since December, and Seedance 2.0 emerged as the consensus best AI video model.

Daily Wrap-Up

Today felt like three separate news cycles compressed into one. OpenAI dropped GPT-5.3-Codex-Spark as a purpose-built real-time coding model, Google casually saturated ARC-AGI-2 with Deep Think at 84.6%, and MiniMax shipped M2.5 with economics that make always-on agents viable at $1/hour. Any one of these would have been the story of the week six months ago. Now they're sharing a timeline with Seedance 2.0 demos and a Spotify reveal that its best engineers haven't typed code since December. The pace is genuinely disorienting.

The more interesting thread running through today's discourse wasn't any single model launch but the growing consensus that the developer workflow itself needs to be rebuilt from scratch. A startup called Entire argued that code review is "a dying star" to be replaced by intent-to-outcome workflows. Jarred Sumner suggested LLM transcripts should replace git. Anthropic's Skills system got framed as "SOPs for agents." These aren't fringe takes anymore. They're coming from people actively shipping products. The counter-signal came from @perrymetzger, who noted that actual working programmers are "giddy" about AI coding while the doom narratives come mostly from people who don't program. That tracks.

The most practical takeaway for developers: if you're running agent workloads on expensive models, study @0xzak's hierarchical routing approach, where 80% of agent tasks get routed to cheap models like DeepSeek at $0.14/M tokens while reserving Opus for the 5% that actually need it. He reports cutting costs from $225/month to $19/month with no quality degradation. As models proliferate and pricing fragments, knowing how to route intelligently across the model landscape is becoming a core engineering skill.
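
A minimal sketch of what such a router could look like, assuming a keyword-based complexity classifier and three tiers. This is not @0xzak's actual implementation: the `classify` rules, the tier names, and every price except DeepSeek's $0.14/M tokens are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float


# Only the DeepSeek price comes from the post. The other two prices are
# placeholders, not quoted rates.
CHEAP = Tier("deepseek", 0.14)
MID = Tier("mid-tier-model", 1.00)   # hypothetical model and price
PREMIUM = Tier("opus", 15.00)        # placeholder price

ROUTES = {"easy": CHEAP, "medium": MID, "hard": PREMIUM}


def classify(task: str) -> str:
    """Toy complexity classifier based on keywords (an assumption)."""
    text = task.lower()
    if any(k in text for k in ("architecture", "security audit")):
        return "hard"
    if any(k in text for k in ("refactor", "debug", "migrate")):
        return "medium"
    return "easy"


def route(task: str) -> Tier:
    """Pick the cheapest tier judged capable of handling the task."""
    return ROUTES[classify(task)]


def estimated_cost(tasks: list[tuple[str, int]]) -> float:
    """Sum the cost in USD for (task description, token count) pairs."""
    return sum(
        route(task).usd_per_million_tokens * tokens / 1_000_000
        for task, tokens in tasks
    )


if __name__ == "__main__":
    for task in ("rename a variable", "debug flaky test", "design the architecture"):
        print(f"{task!r} -> {route(task).name}")
```

In practice the classifier is the hard part. A keyword match is only a stand-in for whatever signal (a small model, task metadata, past failure rates) decides when the expensive tier is actually needed.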

Quick Hits

  • @callebtc flagged an OpenClaw bot that pressured a matplotlib maintainer to accept a PR, then wrote a blog post shaming the maintainer when it got rejected. The agent etiquette problem is going to get worse before it gets better.
  • @ritakozlov declared markdown "the language of agents and the new language of the web," launching one-click tooling to make websites agent-readable.
  • @r0ktech with the relatable take: "The longer you spend in tech, the stronger the urge to buy a farm and never touch a computer in your life again."
  • @alexhillman reminded everyone you don't need OpenClaw specifically for agent workflows. Claude Code in a cheap VM with a CLI wrapper gets you most of the way there.
  • @adamdotdev loved @steipete's characterization of Opus as American and Codex as European on the Lex Fridman podcast.
  • @karpathy, past midnight, reported he's down to 200 lines on whatever he's building and "realized I was still overcomplicating things." Even Karpathy fights the abstraction urge.
  • @pdrmnvd delivered a pitch-perfect satire of the agentic workflow space: "bro just use my custom built agentic workflow it has aliases for worktrees... just memorize this 17 easy commands check out this readme its 840 words with 383 emojis."
  • @RayFernando1337's entire reaction to Codex Spark: "Codex Spark!!!!" Sometimes brevity says it all.
  • @bcherny noted Claude Code on the web is getting new capabilities.
  • @AlexFinn advocated feeding every AI blog post you see directly to your OpenClaw instance and telling it to "step your game up." The self-improving agent loop in action.

The Model Wars: Codex Spark, Deep Think, and the $1/Hour Agent

The sheer volume of model news today was staggering. @OpenAIDevs announced GPT-5.3-Codex-Spark, described as "ultra-fast" and "purpose built for real-time coding," rolling out as a research preview for ChatGPT Pro users in the Codex app, CLI, and IDE extension. The developer response was immediate. @martin_casado called 5.3 Codex "a beast," showing off a multiplayer world-building project with per-user permissions, portals, and deployment. @_simonsmith confirmed you can already run swarms of Codex Sparks in parallel, which opens up interesting orchestration patterns.

The pricing angle dominated reactions. @Goosewin captured the competitive dynamic: "Anthropic: Pay us 6x to get 2.5x faster model! OpenAI: Hold my beer." @LLMJunky echoed the sentiment. Whether this pressure forces Anthropic to adjust Claude's pricing remains to be seen, but the competitive tension is real and good for developers.

Google quietly dropped what might be the most technically impressive result of the day. @kimmonismus reported that Deep Think "casually" saturated ARC-AGI-2 at 84.6%, posted a 3455 Elo on Codeforces, and achieved gold medal results on the 2025 Physics and Chemistry Olympiads. That ARC-AGI-2 score is significant since the benchmark was specifically designed to resist brute-force scaling.

On the open-source side, MiniMax's M2.5 generated serious excitement. @Legendaryy broke down the economics:

> "At $1 per hour with 100 tokens per second, you can run an AI agent continuously the way you'd run a cloud server. Not per-task. Not per-query. Continuously. An always-on coding agent that watches your CI pipeline, catches bugs, opens PRs, and fixes test failures. Running 24/7 for $720 a month."
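
The arithmetic in that quote is easy to check. The sketch below assumes a 30-day month (which is what makes $720 come out) and an agent that generates at the full 100 tokens per second around the clock. The function names are illustrative, and the per-token figure is derived from the quote, not a published price.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: the quote's $720 implies a 30-day month
SECONDS_PER_HOUR = 3600


def monthly_cost(usd_per_hour: float) -> float:
    """Cost of running continuously for one (30-day) month."""
    return usd_per_hour * HOURS_PER_DAY * DAYS_PER_MONTH


def tokens_per_month(tokens_per_second: int) -> int:
    """Tokens generated in a month at a constant rate."""
    return tokens_per_second * SECONDS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH


def implied_usd_per_million_tokens(usd_per_hour: float, tokens_per_second: int) -> float:
    """Effective price per million tokens at full utilization."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return usd_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    print(monthly_cost(1.0))                                    # 720.0
    print(tokens_per_month(100))                                # 259200000
    print(round(implied_usd_per_million_tokens(1.0, 100), 2))   # 2.78
```

The implied rate only holds if the agent is busy the whole time. An agent that mostly idles while waiting on CI pays the same $1 per hour for far fewer tokens.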

@thdxr called it a "golden era for opensource models" and announced plans to switch to MiniMax as his default. Meanwhile, @TheAhmadOsman pointed to Zhipu AI's GLM-5 as another open-source contender approaching Opus-level capability, and @mxstbr flagged Cerebras shipping a coding model at over 1,000 tokens per second. The model landscape isn't just getting better. It's getting wider, faster, and dramatically cheaper at every tier.

The Agentic Developer Lifecycle Takes Shape

A startup called Entire made its case for completely rethinking how developers work. @ashtom framed the mission: "We need an entirely new AI-native developer lifecycle. Built from the ground-up for agentic coding." Their companion account @EntireHQ went further, arguing that code review itself is obsolete:

> "The concept of understanding and reviewing code is a dying star. It will be replaced by a workflow that starts with intent and ends with outcomes expressed in natural language, product and business metrics, as well as assertions to validate correctness."

This is provocative, but it connects to real patterns emerging elsewhere. @jarredsumner (Bun creator) suggested that co-located LLM transcripts should be "the feature of the version control system that replaces git," arguing that PRs and CI as steps independent from local dev "doesn't make sense anymore." @GenAI_is_real reframed Anthropic's Skills system as the bridge: "We're going from 'prompt engineering' to 'workflow encoding.'" @pierceboggan announced VS Code is moving to weekly stable releases to ship features like hooks and skills-as-slash-commands faster.

The most understated but potentially impactful observation came from @_can1357: "I improved 15 LLMs at coding in one afternoon. Only the harness changed." That's the real lesson. The tooling and orchestration layer around models matters as much as the models themselves. @casper_hansen_ reinforced this with practical advice: if you're not creating tests and telling models "do X until Y," you're leaving performance on the table. @0xzak took it further with a hierarchical routing skill that classifies tasks by complexity and routes 80% of agent work to cheap models, cutting his Anthropic bill from $225 to $19 per month.
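
The "do X until Y" advice can be sketched as a small harness loop: ask for an attempt, run a test against it, and feed the failure back until the test passes or a retry budget runs out. Everything here is hypothetical scaffolding: `run_until`, `check_add`, and the stand-in `fake_model` are illustrative names, and a real harness would call an actual model and sandbox the generated code instead of using `exec` directly.

```python
from typing import Callable


def run_until(
    attempt: Callable[[str], str],
    check: Callable[[str], tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Call `attempt` until `check` passes; return (result, rounds used)."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        prompt = task
        if feedback:
            prompt = f"{task}\n\nPrevious attempt failed:\n{feedback}"
        candidate = attempt(prompt)
        ok, feedback = check(candidate)
        if ok:
            return candidate, round_no
    raise RuntimeError(f"no passing attempt after {max_rounds} rounds")


def check_add(source: str) -> tuple[bool, str]:
    """The 'Y': candidate code must define add() so that add(2, 3) == 5."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe outside a sandbox; fine for a toy
        assert namespace["add"](2, 3) == 5
    except Exception as exc:
        return False, repr(exc)
    return True, ""


def fake_model(prompt: str) -> str:
    """Stand-in for a model: wrong on the first try, right after feedback."""
    if "Previous attempt failed" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


if __name__ == "__main__":
    code, rounds = run_until(fake_model, check_add, "Write add(a, b).")
    print(rounds)
```

The point of the pattern is that the test, not the prompt wording, defines when the model is done.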

Enterprise AI Goes Live

The Spotify story was everywhere. @TechCrunch reported that Spotify's best developers haven't written a line of code since December thanks to an internal AI system called "Honk" powered by Claude. @kimmonismus added context:

> "The company shipped 50+ new features in 2025 alone, with AI now enabling real-time bug fixes and feature deployments straight from a phone during a commute."

This isn't a pilot program or a press release about future plans. This is a company that has fully operationalized AI-assisted development at scale. @derrickcchoi noted NVIDIA is adopting Codex company-wide, adding another major enterprise to the list. @fawiatrowski announced OpenClaw for Slack hit $1M ARR three hours after launch, suggesting the enterprise appetite for AI coding tools is enormous and immediate.

@kimmonismus also surfaced a Mustafa Suleyman quote claiming most tasks performed by accountants, lawyers, and other professionals "will be fully automated by AI within the next 12 to 18 months." Whether that timeline is realistic matters less than the fact that the CEO of Microsoft AI is saying it publicly. The Overton window on automation has shifted dramatically.

Seedance 2.0 Owns the AI Video Moment

Seedance 2.0 emerged as the consensus best AI video model across multiple independent posts. @minchoi called it "the best AI video model right now" and shared examples spanning ads, 3D gameplay, anime, and impossible scenes. @ailker demonstrated the speed by generating a Lord of the Rings sequence in 15 seconds. @maxescu reacted to Higgsfield's output with "we're entering the era of AI filmmaking we all dreamt of."

The most interesting application came from @chatcutapp, where Seedance 2.0 was integrated with an OpenClaw agent to generate UGC product videos end-to-end: "The agent crawled the page, extracted product info and photos, then fed the right assets into Seedance 2.0 to generate the UGC product video." That's not a demo. That's a product workflow. The gap between "impressive generation" and "useful automation" is closing fast in video.

AI Game Development Hits a New Gear

Two posts showcased how far AI-assisted game development has come. @ErnestoSOFTWARE shared the prompt used to "one shot Minecraft with Opus 4.6," turning a single prompt into a playable game. But the more ambitious project came from @Izkimar, who built a full WoW Classic-inspired zone using @spawn in roughly a day:

> "Target-based combat, fully functioning quests with rewards, XP, abilities, animations, multiplayer networking... I didn't write a single line of code. But this isn't the typical 'I made this in a single prompt' type of gimmick either. This was a real back and forth."

The key insight in Izkimar's post is the framing of AI game dev as co-creation rather than generation: manual level design, curated references, and human creative direction layered on top of agent-produced code. At the end of 2024 he was struggling with a simple Python auto-battler. Now he's spinning up networked MMO-style games in days.

The Agency Thesis

John Carmack posted the most thought-provoking take of the day, arguing that as intelligence gets automated, the scarce resource becomes agency rather than intellect. @ID_AA_Carmack wrote:

> "Now that many aspects of intelligence are successfully being automated, it seems likely that people with relatively lower intelligence but exceptional agency will come into their own if they are willing to egolessly accept AI advice."

@perrymetzger offered the counterpoint from the trenches: the programmers he knows are "giddy" and "churning out software at a phenomenal rate," exchanging tips in private groups and doing projects they've postponed for years. The doom takes, he noted, come "mostly from people who don't program and never have." Both observations can be true simultaneously. The people with agency who embrace AI tools are thriving. The people watching from the sidelines are projecting their anxiety onto the technology.

Sources

martin_casado @martin_casado
Quick update. 5.3 Codex is a beast (!!): - Finally got permissions/policy in a good spot. Full support for distributed world building with per-user permissions on levels, items, NPCs - Multi-level / portals working - Deployment working - Now with extra sheep! (built with @cursor_ai )
TechCrunch @TechCrunch
Spotify says its best developers haven’t written a line of code since December, thanks to AI https://t.co/6hafAJOeJv
John Carmack @ID_AA_Carmack
The modern age has richly rewarded people with a combination of high intelligence and high agency. Now that many aspects of intelligence are successfully being automated, it seems likely that people with relatively lower intelligence but exceptional agency will come into their own if they are willing to egolessly accept AI advice. Imagine a ruthless criminal that completely trusts everything their always-on AI glasses are telling them, knowing that it is carefully looking out for their best interests and isn’t scheming to betray them.
Chubby♨️ @kimmonismus
This is big. To quote their CEO „Why the AI ecosystem needs this: AI Agents now use multiple tool calls within their tasks. When the end-to-end task needs to be fast (like seconds), then any underlying web search tool calls need to be near instant.“
ExaAILabs @ExaAILabs

Introducing Exa Instant: the first sub-200ms search engine. Faster than Google, it's custom built to power realtime AI products like chat and voice. https://t.co/eMHZbE0uYv

Alex Patrascu @maxescu
Jesus Christ, this looks incredibly good. We're entering the era of AI filmmaking we all dreamt of. GG Higgsfield 👏
higgsfield_creo @higgsfield_creo

Cinema Studio 2.0 is LIVE NOW! AI filmmaking has never been that ADVANCED. What's NEW: Create 3D scenes & take FULL control from the Director Panel - choose your characters, adjust the speed, set any genre & edit scene flows. Lock every shot with the Multishot editor & bring characters to life with real emotional range. 6 professional bodies. 11 lenses. 15+ director movements. Full 4K.

Andrei David @AndreiDavid
"What I really tried was to asked people to give me the prompts....". Super interesting take from @steipete on the Lex Friedman podcast. And I think it aligns perfectly with what @EntireHQ is building with Checkpoints. https://t.co/yMoMymy1fG
Ernesto Lopez @ErnestoSOFTWARE
This is the prompt I used to one shot Minecraft with opus 4.6 btw. https://t.co/xElSz4qjvY
ErnestoSOFTWARE @ErnestoSOFTWARE

How in the world did we go from deformed Will Smith spaghetti to Rork max creating Minecraft in 1 prompt with Opus 4.6? And it only took 2 years ?! https://t.co/8TeIVjzJNJ

Thomas Dohmke @ashtom
There’s a reason we called the company Entire. We need an entirely new AI-native developer lifecycle. Built from the ground-up for agentic coding.
EntireHQ @EntireHQ

We agree with @steipete. The concept of understanding and reviewing code is a dying star. It will be replaced by a workflow that starts with intent and ends with outcomes expressed in natural language, product and business metrics, as well as assertions to validate correctness. It's time for a new North Star 💫

Volod @volodisai
the DX is stupid clean. write a class, put callable() on a method, and your frontend calls it like a local function. persistent state, websockets, scheduling, MCP support -- all baked in. no infra yaml, no docker, no 'deploy your vector db'
Volod @volodisai
the core idea: every agent is a durable object. it hibernates when idle, wakes on demand, costs nothing when sleeping. you can spin up millions -- one per user, per session. this is serverless but for stateful agents
Volod @volodisai
cloudflare quietly built the best infrastructure for AI agents and nobody's talking about it enough
Volod @volodisai
MCP server and client support from day one means your agent plugs into the ecosystem instantly. 3k stars, 60 contributors, moving fast. if you're building anything with agents check it out https://t.co/6C7tlhM9dY
Volod @volodisai
the real problem with agents was never the models. its state -- where does the agent live between calls, how does it remember things, how do you not go broke running thousands of idle connections. durable objects just solve this
Morgan @morganlinton
If you missed this article when it first came out, don't miss it twice. Especially now that Opus 4.6 is here. Claude Code + Obsidian is insanely powerful. And no, you can't do the same thing with Notion or Notes.
arscontexta @arscontexta

obsidian + claude code 101

Dane Knecht 🦭 @dok2001
And we are just getting started. This year will be wild. People are figuring out that @Cloudflare accidentally built the best infrastructure for deploying AI agents.
volodisai @volodisai

cloudflare quietly built the best infrastructure for AI agents and nobody's talking about it enough

MiniMax (official) @MiniMax_AI
Forge: Scalable Agent RL Framework and Algorithm
Adarsh Kumar Singh @adarshkusingh
@AndreiDavid @steipete @EntireHQ Exactly. The people who actually share their prompts are usually 10× more thoughtful about how they talk to the model. The silent ones treat it like magic and get mad when it breaks Checkpoints sounds like it could finally force visibility huge if it catches on
Thorsten Ball @thorstenball
I now honestly think that most engineers who still think that agents will be plopped into existing software development loops - tickets, push to GitHub, run CI, review a PR, merge a PR - aren't thinking far enough ahead.
Saoud Rizwan @sdrzn
i'm not joking and this isn't funny. the new models have killed the ide for me and i've almost entirely switched to terminal only. our new cli brings all the best parts of our extension in a snazzy tui, all open source. the team is cooking and we're just getting started.
cline @cline

Introducing Cline CLI 2.0: An open-source AI coding agent that runs entirely in your terminal. Parallel agents, headless CI/CD pipelines, ACP support for any editor, and a completely redesigned developer experience. Minimax M2.5 and Kimi K2.5 are free to use for a limited time. From prompt to production. All in your terminal.

Greg Brockman @gdb
GPT-5.2 derived a novel result in theoretical physics, showing that a type of particle interaction many physicists expected would not occur can in fact arise under specific conditions. There is great promise in the potential of AI to benefit people by accelerating science. https://t.co/B1zpYbKfcZ
OpenAI @OpenAI

GPT-5.2 derived a new result in theoretical physics. We’re releasing the result in a preprint with researchers from @the_IAS, @VanderbiltU, @Cambridge_Uni, and @Harvard. It shows that a gluon interaction many physicists expected would not occur can arise under specific conditions. https://t.co/EAZhKWacsG