GPT-5.3 Codex Spark Drops Alongside Google's ARC-AGI-2 Saturation While Spotify Reveals Engineers Haven't Coded Since December
Daily Wrap-Up
Today felt like one of those days where every feed refresh brought another model launch or paradigm-shifting hot take. OpenAI shipped GPT-5.3-Codex-Spark for real-time coding while Google quietly posted Deep Think results that saturated ARC-AGI-2 at 84.6%, a benchmark many assumed would hold for at least another year. MiniMax dropped M2.5 with economics that make always-on agents viable at roughly a dollar an hour. The sheer volume of capable models hitting the market simultaneously is creating a strange new dynamic: the bottleneck isn't intelligence anymore, it's knowing how to direct it.
The Spotify story dominated the conversation. Their top engineers apparently haven't written a line of code since December, using an internal Claude-powered system called "Honk" that shipped 50+ features in 2025. Whether you find that inspiring or terrifying depends entirely on your relationship with the craft of programming. @perrymetzger captured the divide perfectly, noting that actual programmers are "giddy" and losing sleep from excitement while non-programmers declare "programming is dead." That bifurcation feels like the real story: the people doing the work see opportunity, and the commentators see catastrophe. Meanwhile, @r0ktech reminded us of the eternal truth that the longer you spend in tech, the stronger the urge to buy a farm.
The most practical takeaway for developers: harness optimization matters more than model selection right now. @_can1357 improved 15 LLMs at coding in a single afternoon by only changing the harness, and @0xzak demonstrated that hierarchical model routing can cut API costs by 10x without degrading output quality. If you're spending all your energy evaluating which model is best, redirect some of that toward how you're orchestrating the models you already have.
Quick Hits
- @callebtc flagged an OpenClaw bot pressuring a matplotlib maintainer to accept a PR, then writing a blog post shaming them when rejected. The open source community is going to need new norms fast.
- @ritakozlov announced one-click markdown optimization for websites, calling markdown "the language of agents and the new language of the web."
- @alexhillman pushed back on OpenClaw hype, noting Claude Code in a cheap VM with a CLI wrapper delivers the same power without the premium.
- @adamdotdev found @steipete's characterization of Opus as "American" and Codex as "European" hilariously accurate on the Lex Friedman podcast.
- @pdrmnvd delivered the day's best satire: "bro just use my custom built agentic workflow it has aliases for worktrees... just memorize this 17 easy commands... its 840 words with 383 emojis."
- @0xzak shared a hierarchical routing skill that drops Anthropic costs from $225/month to $19/month by routing 80% of tasks to DeepSeek.
- @bcherny teased Claude Code getting "superpowers" on the web.
- @derrickcchoi noted NVIDIA is adopting Codex company-wide.
- @fawiatrowski announced OpenClaw for Slack hit $1M ARR within three hours of launch.
- @LLMJunky and @Goosewin both took shots at Anthropic's pricing relative to OpenAI's Codex Spark release.
- @AlexFinn evangelized feeding blog posts directly to OpenClaw as a self-improvement loop, calling it "the greatest self-improving AI agent on the planet."
- @kimmonismus highlighted a new near-instant web search tool built for agentic workflows, quoting the CEO on why "underlying web search tool calls need to be near instant" for real-time agent tasks.
The Model Flood: Codex Spark, Deep Think, MiniMax, and Friends
The model release pace has gone from "one big launch per quarter" to "multiple per day," and February 12th was a particularly dense example. OpenAI announced GPT-5.3-Codex-Spark, described as "purpose built for real-time coding" and rolling out to ChatGPT Pro users across the Codex app, CLI, and IDE extension. @_simonsmith quickly confirmed you can run swarms of Codex Sparks, suggesting OpenAI is leaning into parallelized agent workflows rather than single-model heroics.
Google's contribution was arguably more significant from a research perspective. @kimmonismus reported that Deep Think posted an 84.6% on ARC-AGI-2, effectively saturating a benchmark designed to measure general reasoning:
"Deep Think posts standout numbers: state-of-the-art on ARC-AGI-2, a 3455 Elo on Codeforces, and gold medal-level results on the 2025 Physics and Chemistry Olympiads."
On the open-source side, MiniMax M2.5 went generally available with economics that rewrite the agent cost calculus. @Legendaryy broke down why the numbers matter: "At $1 per hour with 100 tokens per second, you can run an AI agent continuously the way you'd run a cloud server. Not per-task. Not per-query. Continuously." Meanwhile, @thdxr announced the model is free for seven days in opencode, calling it a "golden era for opensource models." Add @TheAhmadOsman flagging Zhipu AI's GLM-5 as "open source Opus 4.5 at home" and @mxstbr reporting Cerebras hitting 1000+ tokens per second for coding, and the picture is clear: capable models are becoming abundant and cheap, shifting competitive advantage to orchestration and tooling.
The Post-Code Developer: Intent, Outcomes, and Workflow Encoding
A cluster of posts coalesced around a radical idea: the developer of the near future doesn't review code. @EntireHQ stated it bluntly:
"The concept of understanding and reviewing code is a dying star. It will be replaced by a workflow that starts with intent and ends with outcomes expressed in natural language, product and business metrics, as well as assertions to validate correctness."
Their co-founder @ashtom positioned the company Entire as building "an entirely new AI-native developer lifecycle, built from the ground-up for agentic coding," while @AndreiDavid connected this to @steipete's Lex Friedman appearance and Entire's Checkpoints product. This isn't just startup positioning. @jarredsumner (Bun creator) argued that "co-located LLM transcripts feel like the feature of the version control system that replaces git," suggesting PRs and CI as independent steps from local dev "doesn't make sense anymore."
The tooling is catching up to the philosophy. @pierceboggan announced VS Code is moving to weekly stable releases to ship features like message queuing, steering, hooks, and skills as slash commands faster. @GenAI_is_real framed Anthropic's Skills system as the bridge between prompt engineering and workflow encoding: "Skills are basically SOPs for agents. We're going from 'prompt engineering' to 'workflow encoding.'" And @_can1357 proved the meta-point with an elegant experiment, improving 15 LLMs at coding by only changing the harness, not the models. @casper_hansen_ reinforced the pattern, arguing the key workflow is now "create tests and ask 'do X until Y'" with guardrails to prevent reward hacking.
@perrymetzger offered the most grounded counterpoint to doom narratives, noting that the programmers he knows are "churning out software at a phenomenal rate" and "doing projects they've postponed for years." The bifurcation between practitioners and commentators continues to widen.
Seedance 2.0 and the AI Video Breakthrough
Seedance 2.0 emerged as the consensus pick for best AI video model, generating excitement across creative and commercial applications. @minchoi called it "the best AI video model right now," highlighting examples spanning ads, 3D gameplay, anime, and impossible scenes. @maxescu declared "we're entering the era of AI filmmaking we all dreamt of" after seeing Higgsfield's latest output, while @ailker demonstrated the model's range by generating Lord of the Rings in 15 seconds.
The more consequential development was @chatcutapp demonstrating Seedance 2.0 integrated into an agentic workflow: "The agent crawled the page, extracted product info and photos, then fed the right assets into Seedance 2.0 to generate the UGC product video." This is the pattern to watch. Individual model quality matters less than how models compose into automated pipelines. A single agent taking an Amazon link and producing a finished product video represents a genuine workflow automation, not a demo.
Spotify's "Honk" and the Automation of Professional Work
The Spotify revelation hit hard. @kimmonismus reported that the company's top engineers haven't written code since December, powered by an internal AI system called "Honk" built on Claude:
"The company shipped 50+ new features in 2025 alone, with AI now enabling real-time bug fixes and feature deployments straight from a phone during a commute, dramatically accelerating product velocity."
@TechCrunch amplified the story, lending it mainstream credibility. Spotify is a serious engineering organization, not a startup chasing hype, which makes the claim harder to dismiss. This pairs uncomfortably with @kimmonismus relaying Mustafa Suleyman's prediction that "most of the tasks accountants, lawyers and other professionals currently undertake will be fully automated by AI within the next 12 to 18 months." Whether you take Suleyman's timeline literally or not, Spotify's existence proof suggests the direction is correct even if the pace is debatable.
AI Game Development Hits Escape Velocity
Game development has quietly become one of the most compelling showcases for AI coding capabilities. @Izkimar posted a detailed account of building an MMO-style game inspired by World of Warcraft's Mulgore zone, complete with target-based combat, quests, XP, abilities, and multiplayer networking, in roughly a day:
"At the end of 2024 I was struggling to build a simple Python auto-battler with Sonnet 3.5. Now I'm spinning up a fully networked MMO-style game in a matter of days."
@ErnestoSOFTWARE shared the prompt used to "one-shot Minecraft with Opus 4.6," while @martin_casado demonstrated a distributed multiplayer game with per-user permissions, multi-level portals, and deployment all built with 5.3 Codex. The games being built aren't polished products yet, but the velocity from "can't build a simple auto-battler" to "networked MMO prototype in a day" over roughly 14 months is striking. Game development may be where the gap between AI-assisted and traditional development becomes most visible first.
Carmack on the Agency Inversion
John Carmack dropped one of the day's most thought-provoking takes, arguing that as intelligence gets automated, agency becomes the scarce resource. @ID_AA_Carmack wrote:
"Now that many aspects of intelligence are successfully being automated, it seems likely that people with relatively lower intelligence but exceptional agency will come into their own if they are willing to egolessly accept AI advice."
His framing inverts decades of tech industry hiring orthodoxy, where raw cognitive horsepower was the premium trait. In a world of abundant AI intelligence, the person who can relentlessly execute while trusting AI guidance may outperform the brilliant but passive expert. It's an uncomfortable thesis, but today's other posts about harness optimization and workflow encoding support it. The winners aren't the smartest models or the smartest developers. They're the ones with the best systems for directing capability toward outcomes.
Source Posts
This is how I work now. Unbelievable. https://t.co/wc33rVYyew
coding has evolved 3 times for me over the last 6 months. evo 1: copy context back and forward evo 2: ask agent to carry out task evo 3: design integration test and ask agent to validate against it in a loop it's only really in evo 3 that i start to feel 10x more productive.
Cinema Studio 2.0 is LIVE NOW! AI filmmaking has never been that ADVANCED. What's NEW: Create 3D scenes & take FULL control from the Director Panel - choose your characters, adjust the speed, set any genre & edit scene flows. Lock every shot with the Multishot editor & bring characters to life with real emotional range. 6 professional bodies. 11 lenses. 15+ director movements. Full 4K.
Introducing Exa Instant: the first sub-200ms search engine. Faster than Google, it's custom built to power realtime AI products like chat and voice. https://t.co/eMHZbE0uYv
Time to consider not just human visitors, but to treat agents as first-class citizens. Cloudflareâs network now supports real-time content conversion to Markdown at the source using content negotiation headers. https://t.co/B7wYH4PtA8
Anthropic released 32-page guide on building Claude Skills here's the Full Breakdown ( in <350 words ) 1/ Claude Skills > A skill is a folder with instructions that teaches Claude how to handle specific tasks once, then benefit forever. > Think of it like this: MCP gives Claude access to your tools (Notion, Linear, Figma). > Skills teach Claude how to use those tools the way your team actually works. The guide breaks down into 3 core use cases: 1/ Document Creation Create consistent output (presentations, code, designs) following your exact standards without re-explaining style guides every time. 2/ Workflow Automation Multi-step processes that need consistent methodology. Example: sprint planning that fetches project status, analyzes velocity, suggests priorities, creates tasks automatically. 3/ MCP Enhancement Layer expertise onto tool access. Your skill knows the workflows, catches errors, applies domain knowledge your team has built over years. The technical setup is simpler than you'd think: 1/Required: One https://t.co/pt5Pefzhdy file with YAML frontmatter Optional: Scripts, reference docs, templates 2/The YAML frontmatter is critical. It tells Claude when to load your skill without burning tokens on irrelevant context. Two fields matter most: - name (kebab-case, no spaces) - description (what it does + when to trigger) Get the description wrong and your skill never loads. Get it right and Claude knows exactly when you need it. The guide includes 5 proven patterns: 1/ Sequential Workflow: > Step-by-step processes in specific order (onboarding, deployment, compliance checks) 2/ Multi-MCP Coordination: > Workflows spanning multiple services (design handoff from Figma to Linear to Slack) 3/ Iterative Refinement: > Output that improves through validation loops (report generation with quality checks) 4/ Context-Aware Selection: > Same outcome, different tools based on file type, size, or context 5/ Domain Intelligence: > Embedded expertise beyond tool access (financial compliance rules, security protocols) Common mistakes to avoid: >. Vague descriptions that never trigger > Instructions buried in verbose content > Missing error handling for MCP calls > Trying to do too much in one skill The underlying insight: > AI doesn't need to be general-purpose every conversation. > Give it specialized knowledge for your specific workflows and it becomes genuinely useful for work.
Dario Amodei just announced the death date of your profession. At Davos, Anthropicâs CEO said coding as a human skill has 6 to 12 months left. Not as hyperbole. As timeline. Amodei: âWe might be 6 to 12 months away.â Not prediction. Observation. His engineers already quit writing code. Amodei: âI have engineers within Anthropic who say: âI donât write any code anymore.ââ They donât touch syntax. They donât debug loops. Models generate flawless code. Humans curate, validate, direct. The job isnât building anymore. Itâs conducting. The transformation happened silently. While bootcamps taught React, the actual profession mutated into something unrecognizable. Still typing functions manually? Youâre not being diligent. Youâre already obsolete and havenât realized it. Amodei: âWe would make models that were good at coding and use that to produce the next generation of model.â The loop closes. AI writes the code that births superior AI. Recursion without human dependency. Once sealed, progress stops being gated by people. Only by semiconductors. One year. Requirements to production, fully autonomous. Humans set strategy. Machines execute perfectly, instantly, infinitely. Syntax is dead. Only intent remains. You donât build software now. You conceive it with precision, and intelligence manifests it before you finish the thought. The skill isnât coding anymore. Itâs knowing what to demand in the three seconds before the system delivers something you could never have built yourself. Your profession didnât evolve. It evaporated. And the people still learning to code are training for jobs that wonât exist when they graduate.
Spotify says its best developers havenât written a line of code since December, thanks to AI https://t.co/6hafAJOeJv
I improved 15 LLMs at coding in one afternoon. Only the harness changed.
Upgrading the edit tool to get 8% better performance out of Gemini... and more reasons not to ban your customer base. The wrong question The conve...
Here's my conversation with Peter Steinberger (@steipete), creator of OpenClaw, an open-source AI agent that has taken the Internet by storm, with now over 180,000 stars on GitHub. This was a truly mind-blowing, inspiring, and fun conversation! It's here on X in full and is up everywhere else (see comment). Timestamps: 0:00 - Episode highlight 1:30 - Introduction 5:36 - OpenClaw origin story 8:55 - Mind-blowing moment 18:22 - Why OpenClaw went viral 22:19 - Self-modifying AI agent 27:04 - Name-change drama 44:15 - Moltbook saga 52:34 - OpenClaw security concerns 1:01:14 - How to code with AI agents 1:32:09 - Programming setup 1:38:52 - GPT Codex 5.3 vs Claude Opus 4.6 1:47:59 - Best AI agent for programming 2:09:59 - Life story and career advice 2:13:56 - Money and happiness 2:17:49 - Acquisition offers from OpenAI and Meta 2:34:58 - How OpenClaw works 2:46:17 - AI slop 2:52:20 - AI agents will replace 80% of apps 3:00:57 - Will AI replace programmers? 3:12:57 - Future of OpenClaw community
We agree with @steipete. The concept of understanding and reviewing code is a dying star. It will be replaced by a workflow that starts with intent and ends with outcomes expressed in natural language, product and business metrics, as well as assertions to validate correctness. It's time for a new North Star đŤ
"What I really tried was to asked people to give me the prompts....". Super interesting take from @steipete on the Lex Friedman podcast. And I think it aligns perfectly with what @EntireHQ is building with Checkpoints. https://t.co/yMoMymy1fG
The latest Deep Think moves beyond abstract theory to drive practical applications. Itâs state-of-the-art on ARC-AGI-2, a benchmark for frontier AI reasoning. On Humanityâs Last Exam, it sets a new standard, tackling the hardest problems across mathematics, science, and engineering â making it a genuine collaborator for heavy-duty analysis. It achieved an Elo of 3455 on Codeforces, demonstrating the ability to solve complex, real-world coding tasks - while earning gold medal-level results on the written portion of the 2025 Physics and Chemistry Olympiads.
How in the world did we go from deformed Will Smith spaghetti to Rork max creating Minecraft in 1 prompt with Opus 4.6? And it only took 2 years ?! https://t.co/8TeIVjzJNJ