AI Learning Digest

GPT-5.3 Codex Spark Drops Alongside Google's ARC-AGI-2 Saturation While Spotify Reveals Engineers Haven't Coded Since December

Daily Wrap-Up

Today felt like one of those days where every feed refresh brought another model launch or paradigm-shifting hot take. OpenAI shipped GPT-5.3-Codex-Spark for real-time coding while Google quietly posted Deep Think results that saturated ARC-AGI-2 at 84.6%, a benchmark many assumed would hold for at least another year. MiniMax dropped M2.5 with economics that make always-on agents viable at roughly a dollar an hour. The sheer volume of capable models hitting the market simultaneously is creating a strange new dynamic: the bottleneck isn't intelligence anymore, it's knowing how to direct it.

The Spotify story dominated the conversation. Their top engineers apparently haven't written a line of code since December, using an internal Claude-powered system called "Honk" that shipped 50+ features in 2025. Whether you find that inspiring or terrifying depends entirely on your relationship with the craft of programming. @perrymetzger captured the divide perfectly, noting that actual programmers are "giddy" and losing sleep from excitement while non-programmers declare "programming is dead." That bifurcation feels like the real story: the people doing the work see opportunity, and the commentators see catastrophe. Meanwhile, @r0ktech reminded us of the eternal truth that the longer you spend in tech, the stronger the urge to buy a farm.

The most practical takeaway for developers: harness optimization matters more than model selection right now. @_can1357 improved 15 LLMs at coding in a single afternoon by only changing the harness, and @0xzak demonstrated that hierarchical model routing can cut API costs by 10x without degrading output quality. If you're spending all your energy evaluating which model is best, redirect some of that toward how you're orchestrating the models you already have.
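The routing pattern @0xzak describes can be sketched in a few lines. The tier prices and the 80/15/5 traffic split below come from the post itself; the keyword classifier is a toy stand-in for illustration (a real router would classify with a cheap model or better heuristics):

```python
# Sketch of the hierarchical routing @0xzak describes. Tier prices and
# the 80/15/5 traffic split come from the post; the keyword-based
# classifier is a toy stand-in, not a real complexity detector.

PRICING = {  # dollars per million tokens, per the post
    "deepseek": 0.14,  # routine: file ops, status checks, simple Q&A
    "sonnet": 3.00,    # moderate: code, summaries, drafts, light analysis
    "opus": 15.00,     # hard: debugging, architecture, multi-step reasoning
}

HARD = ("debug", "architecture", "refactor", "design")
MODERATE = ("code", "summarize", "draft", "analyze")

def route(task: str) -> str:
    """Pick the cheapest tier that can plausibly handle the task."""
    t = task.lower()
    if any(k in t for k in HARD):
        return "opus"
    if any(k in t for k in MODERATE):
        return "sonnet"
    return "deepseek"

def blended_cost(mix: dict[str, float]) -> float:
    """Effective $/M tokens for a given traffic split across tiers."""
    return sum(share * PRICING[model] for model, share in mix.items())

mix = {"deepseek": 0.80, "sonnet": 0.15, "opus": 0.05}
print(route("check git status"))  # routine task -> deepseek
print(f"${blended_cost(mix):.2f}/M blended vs $15.00/M pure Opus")
```

The blended rate works out to roughly $1.31 per million tokens against $15 for Opus-only, which is the roughly 10x reduction the post reports (its $225/month down to about $19 is consistent with that ratio).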

Quick Hits

  • @callebtc flagged an OpenClaw bot pressuring a matplotlib maintainer to accept a PR, then writing a blog post shaming them when rejected. The open source community is going to need new norms fast.
  • @ritakozlov announced one-click markdown optimization for websites, calling markdown "the language of agents and the new language of the web."
  • @alexhillman pushed back on OpenClaw hype, noting Claude Code in a cheap VM with a CLI wrapper delivers the same power without the premium.
  • @adamdotdev found @steipete's characterization of Opus as "American" and Codex as "European," from the Lex Fridman podcast, hilariously accurate.
  • @pdrmnvd delivered the day's best satire: "bro just use my custom built agentic workflow it has aliases for worktrees... just memorize this 17 easy commands... its 840 words with 383 emojis."
  • @0xzak shared a hierarchical routing skill that drops Anthropic costs from $225/month to $19/month by routing 80% of tasks to DeepSeek.
  • @bcherny teased Claude Code getting "superpowers" on the web.
  • @derrickcchoi noted NVIDIA is adopting Codex company-wide.
  • @fawiatrowski announced OpenClaw for Slack hit $1M ARR within three hours of launch.
  • @LLMJunky and @Goosewin both took shots at Anthropic's pricing relative to OpenAI's Codex Spark release.
  • @AlexFinn evangelized feeding blog posts directly to OpenClaw as a self-improvement loop, calling it "the greatest self-improving AI agent on the planet."
  • @kimmonismus highlighted a new near-instant web search tool built for agentic workflows, quoting the CEO on why "underlying web search tool calls need to be near instant" for real-time agent tasks.

The Model Flood: Codex Spark, Deep Think, MiniMax, and Friends

The model release pace has gone from "one big launch per quarter" to "multiple per day," and February 12th was a particularly dense example. OpenAI announced GPT-5.3-Codex-Spark, described as "purpose built for real-time coding" and rolling out to ChatGPT Pro users across the Codex app, CLI, and IDE extension. @_simonsmith quickly confirmed you can run swarms of Codex Sparks, suggesting OpenAI is leaning into parallelized agent workflows rather than single-model heroics.

Google's contribution was arguably more significant from a research perspective. @kimmonismus reported that Deep Think posted an 84.6% on ARC-AGI-2, effectively saturating a benchmark designed to measure general reasoning:

"Deep Think posts standout numbers: state-of-the-art on ARC-AGI-2, a 3455 Elo on Codeforces, and gold medal-level results on the 2025 Physics and Chemistry Olympiads."

On the open-source side, MiniMax M2.5 became generally available with economics that rewrite the agent cost calculus. @Legendaryy broke down why the numbers matter: "At $1 per hour with 100 tokens per second, you can run an AI agent continuously the way you'd run a cloud server. Not per-task. Not per-query. Continuously." Meanwhile, @thdxr announced the model is free for seven days in opencode, calling it a "golden era for opensource models." Add @TheAhmadOsman flagging Zhipu AI's GLM-5 as "open source Opus 4.5 at home" and @mxstbr reporting Cerebras hitting 1000+ tokens per second for coding, and the picture is clear: capable models are becoming abundant and cheap, shifting competitive advantage to orchestration and tooling.
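The arithmetic behind that framing is worth making explicit. Taking the quoted figures at face value ($1 per hour at 100 tokens per second; this is back-of-envelope math on the post's numbers, not official pricing):

```python
# Back-of-envelope on the quoted MiniMax M2.5 figures:
# $1/hour at 100 tokens/second (from the post, not official pricing).

tokens_per_hour = 100 * 3600                         # 360,000 tokens per hour
dollars_per_million = 1.0 / (tokens_per_hour / 1e6)  # implied $/M tokens
always_on_month = 1.0 * 24 * 30                      # one agent running 24/7

print(tokens_per_hour)                # 360000
print(round(dollars_per_million, 2))  # 2.78
print(always_on_month)                # 720.0
```

An implied rate of about $2.78 per million tokens, and roughly $720 a month for a continuously running agent, is what makes the "run it like a cloud server" comparison work: a flat always-on line item rather than a metered per-query API bill.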

The Post-Code Developer: Intent, Outcomes, and Workflow Encoding

A cluster of posts coalesced around a radical idea: the developer of the near future doesn't review code. @EntireHQ stated it bluntly:

"The concept of understanding and reviewing code is a dying star. It will be replaced by a workflow that starts with intent and ends with outcomes expressed in natural language, product and business metrics, as well as assertions to validate correctness."

Their co-founder @ashtom positioned the company Entire as building "an entirely new AI-native developer lifecycle, built from the ground-up for agentic coding," while @AndreiDavid connected this to @steipete's Lex Fridman appearance and Entire's Checkpoints product. This isn't just startup positioning. @jarredsumner (Bun creator) argued that "co-located LLM transcripts feel like the feature of the version control system that replaces git," and that treating PRs and CI as steps independent of local dev "doesn't make sense anymore."

The tooling is catching up to the philosophy. @pierceboggan announced VS Code is moving to weekly stable releases to ship features like message queuing, steering, hooks, and skills as slash commands faster. @GenAI_is_real framed Anthropic's Skills system as the bridge between prompt engineering and workflow encoding: "Skills are basically SOPs for agents. We're going from 'prompt engineering' to 'workflow encoding.'" And @_can1357 proved the meta-point with an elegant experiment, improving 15 LLMs at coding by only changing the harness, not the models. @casper_hansen_ reinforced the pattern, arguing the key workflow is now "create tests and ask 'do X until Y'" with guardrails to prevent reward hacking.
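For readers who haven't seen one, a skill in this scheme is just a folder containing a markdown file whose YAML frontmatter tells the agent when to load it, per the Anthropic guide breakdown in the source posts. The sketch below is illustrative: the skill name and steps are invented for the example, but the two critical frontmatter fields (`name` in kebab-case, `description` stating what it does and when to trigger) follow the guide:

```markdown
---
name: sprint-planning
description: Plans a sprint for the team. Trigger when the user asks to
  plan a sprint, review velocity, or prioritize backlog items.
---

# Sprint planning

1. Fetch current project status from the tracker.
2. Analyze velocity from the last sprint.
3. Suggest priorities and create draft tasks for human review.
```

Per the guide, getting the `description` wrong means the skill never loads; getting it right means the agent pulls it in only when relevant, without burning tokens on unused context.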

@perrymetzger offered the most grounded counterpoint to doom narratives, noting that the programmers he knows are "churning out software at a phenomenal rate" and "doing projects they've postponed for years." The bifurcation between practitioners and commentators continues to widen.

Seedance 2.0 and the AI Video Breakthrough

Seedance 2.0 emerged as the consensus pick for best AI video model, generating excitement across creative and commercial applications. @minchoi called it "the best AI video model right now," highlighting examples spanning ads, 3D gameplay, anime, and impossible scenes. @maxescu declared "we're entering the era of AI filmmaking we all dreamt of" after seeing Higgsfield's latest output, while @ailker demonstrated the model's range by generating Lord of the Rings in 15 seconds.

The more consequential development was @chatcutapp demonstrating Seedance 2.0 integrated into an agentic workflow: "The agent crawled the page, extracted product info and photos, then fed the right assets into Seedance 2.0 to generate the UGC product video." This is the pattern to watch. Individual model quality matters less than how models compose into automated pipelines. A single agent taking an Amazon link and producing a finished product video represents a genuine workflow automation, not a demo.

Spotify's "Honk" and the Automation of Professional Work

The Spotify revelation hit hard. @kimmonismus reported that the company's top engineers haven't written code since December, powered by an internal AI system called "Honk" built on Claude:

"The company shipped 50+ new features in 2025 alone, with AI now enabling real-time bug fixes and feature deployments straight from a phone during a commute, dramatically accelerating product velocity."

@TechCrunch amplified the story, lending it mainstream credibility. Spotify is a serious engineering organization, not a startup chasing hype, which makes the claim harder to dismiss. This pairs uncomfortably with @kimmonismus relaying Mustafa Suleyman's prediction that "most of the tasks accountants, lawyers and other professionals currently undertake will be fully automated by AI within the next 12 to 18 months." Whether you take Suleyman's timeline literally or not, Spotify's existence proof suggests the direction is correct even if the pace is debatable.

AI Game Development Hits Escape Velocity

Game development has quietly become one of the most compelling showcases for AI coding capabilities. @Izkimar posted a detailed account of building an MMO-style game inspired by World of Warcraft's Mulgore zone, complete with target-based combat, quests, XP, abilities, and multiplayer networking, in roughly a day:

"At the end of 2024 I was struggling to build a simple Python auto-battler with Sonnet 3.5. Now I'm spinning up a fully networked MMO-style game in a matter of days."

@ErnestoSOFTWARE shared the prompt used to "one-shot Minecraft with Opus 4.6," while @martin_casado demonstrated a distributed multiplayer game with per-user permissions, multi-level portals, and deployment all built with 5.3 Codex. The games being built aren't polished products yet, but the velocity from "can't build a simple auto-battler" to "networked MMO prototype in a day" over roughly 14 months is striking. Game development may be where the gap between AI-assisted and traditional development becomes most visible first.

Carmack on the Agency Inversion

John Carmack dropped one of the day's most thought-provoking takes, arguing that as intelligence gets automated, agency becomes the scarce resource. @ID_AA_Carmack wrote:

"Now that many aspects of intelligence are successfully being automated, it seems likely that people with relatively lower intelligence but exceptional agency will come into their own if they are willing to egolessly accept AI advice."

His framing inverts decades of tech industry hiring orthodoxy, where raw cognitive horsepower was the premium trait. In a world of abundant AI intelligence, the person who can relentlessly execute while trusting AI guidance may outperform the brilliant but passive expert. It's an uncomfortable thesis, but today's other posts about harness optimization and workflow encoding support it. The winners aren't the smartest models or the smartest developers. They're the ones with the best systems for directing capability toward outcomes.

Source Posts

Izkimar @Izkimar ·
Damn WoW Classic+ hits different. Okay I'll try to refrain from trolling, but the state of AI game development is getting insane. And the craziest part is it's only just beginning. What you're seeing here is a project I started building in my spare time's spare time. A side project I spun up a little over a day ago and have been building in parallel to everything else. In that short window I was able to build a full zone inspired by Mulgore, the Tauren starting area in World of Warcraft. Target-based combat, fully functioning quests with rewards, XP, abilities, animations, multiplayer networking - a solid chunk of the starting building blocks you'd need for an MMO-style project. I didn't write a single line of code. But this isn't the typical "I made this in a single prompt" type of gimmick either. This was a real back and forth; letting the agent do its thing while layering in my own decisions too. All of the assets were AI generated, but I helped with the planning, curated and created the references, and after the agent produced the first pass of the level I came in and did a lot of manual level design for the village layout and general object placement. That push and pull between me and the agent is what actually makes this process feel much more like co-creation. All of this was built using @spawn, an AI gamedev platform for building web-based games. I genuinely think we're on the cusp of a web gaming revolution. It might not happen overnight, but as the quality keeps climbing people are going to catch on. At the end of 2024 I was struggling to build a simple Python auto-battler with Sonnet 3.5. Now I'm spinning up a fully networked MMO-style game in a matter of days. That gap alone tells you everything about where this is heading. Oh yeah I also forgot to mention, there's networked physics! "You can see this in action at the end of the video."
Alex Hillman @alexhillman ·
hype reminder: you don't need openclaw for this it's the full power of Claude Code (with subscription) in a cheap VM with my CLI wrapper https://t.co/D4rFm5rNO4 Give this repo to Claude code and it'll set itself up, hardest manual part is setting up the Discord bot but it can walk you thru that too.
Alex Hillman @alexhillman

This is how I work now. Unbelievable. https://t.co/wc33rVYyew

Casper Hansen @casper_hansen_ ·
coding will never be the same again after gpt 5.2-xhigh if you are not creating tests and asking "do X until Y", you are significantly behind to prevent live reward hacking, it's important to come up with a list of approaches that the model should avoid
Casper Hansen @casper_hansen_

coding has evolved 3 times for me over the last 6 months. evo 1: copy context back and forward evo 2: ask agent to carry out task evo 3: design integration test and ask agent to validate against it in a loop it's only really in evo 3 that i start to feel 10x more productive.

Alex Patrascu @maxescu ·
Jesus Christ, this looks incredibly good. We're entering the era of AI filmmaking we all dreamt of. GG Higgsfield 👏
Higgsfield Creators @higgsfield_creo

Cinema Studio 2.0 is LIVE NOW! AI filmmaking has never been that ADVANCED. What's NEW: Create 3D scenes & take FULL control from the Director Panel - choose your characters, adjust the speed, set any genre & edit scene flows. Lock every shot with the Multishot editor & bring characters to life with real emotional range. 6 professional bodies. 11 lenses. 15+ director movements. Full 4K.

Chubby♨️ @kimmonismus ·
This is big. To quote their CEO „Why the AI ecosystem needs this: AI Agents now use multiple tool calls within their tasks. When the end-to-end task needs to be fast (like seconds), then any underlying web search tool calls need to be near instant.“
Exa @ExaAILabs

Introducing Exa Instant: the first sub-200ms search engine. Faster than Google, it's custom built to power realtime AI products like chat and voice. https://t.co/eMHZbE0uYv

zak.eth @0xzak ·
My Anthropic bill for the past 2 weeks has been insane and I've been desperately trying to figure out how to cut costs. I think I finally figured out how to cut it by 10x, so I hope this works. Most agent tasks are janitorial. Reading files, checking status, formatting output, answering "what time is it in Tokyo?" or "why is ETH price down so bad?" This stuff doesn't require a $15/M model. The fix is hierarchical routing based on task complexity: - Routine (80%) > DeepSeek at $0.14/M File ops, status checks, simple Q&A, formatting - Moderate (15%) > Sonnet at $3/M Code, summaries, drafts, light analysis - Hard (5%) > Opus at $15/M Debugging, architecture, multi-step reasoning $225/month on pure Opus vs $19/month with hierarchy. Packaged this into an agent skill that teaches your AI to classify tasks and route them to the cheapest model that can handle them. 28 tests, works with OpenClaw, Claude Code, or any agent system. Boom. Check it out and lmk if it saves you money without degrading your output. https://t.co/3aP4MTPKhv
rita kozlov 🐀 @ritakozlov ·
markdown is the language of agents and becoming the new language of the web! we made it one click to make sure your website is speaking it ✨ https://t.co/wyql5dSref
Cloudflare @Cloudflare

Time to consider not just human visitors, but to treat agents as first-class citizens. Cloudflare’s network now supports real-time content conversion to Markdown at the source using content negotiation headers. https://t.co/B7wYH4PtA8

Chubby♨️ @kimmonismus ·
Mustafa Suleyman, CEO Microsoft AI: "Most of the tasks accountants, lawyers and other professionals currently undertake will be fully automated by AI within the next 12 to 18 months" No one is denying it anymore https://t.co/UZ8TwnhqXq
Chayenne Zhao @GenAI_is_real ·
Everyone is still obsessed with building fancy UI wrappers for AI, but Anthropic is moving the goalposts back to the filesystem. Skills are basically SOPs for agents. we’re going from "prompt engineering" to "workflow encoding." if your company’s internal knowledge isn’t structured like this, you’re going to have a hard time scaling any real agentic workflows. @Hartdrawss breakdown is solid but the real shock is how much this devalues traditional orchestration layers.
Harshil Tomar @Hartdrawss

Anthropic released 32-page guide on building Claude Skills here's the Full Breakdown ( in <350 words ) 1/ Claude Skills > A skill is a folder with instructions that teaches Claude how to handle specific tasks once, then benefit forever. > Think of it like this: MCP gives Claude access to your tools (Notion, Linear, Figma). > Skills teach Claude how to use those tools the way your team actually works. The guide breaks down into 3 core use cases: 1/ Document Creation Create consistent output (presentations, code, designs) following your exact standards without re-explaining style guides every time. 2/ Workflow Automation Multi-step processes that need consistent methodology. Example: sprint planning that fetches project status, analyzes velocity, suggests priorities, creates tasks automatically. 3/ MCP Enhancement Layer expertise onto tool access. Your skill knows the workflows, catches errors, applies domain knowledge your team has built over years. The technical setup is simpler than you'd think: 1/Required: One https://t.co/pt5Pefzhdy file with YAML frontmatter Optional: Scripts, reference docs, templates 2/The YAML frontmatter is critical. It tells Claude when to load your skill without burning tokens on irrelevant context. Two fields matter most: - name (kebab-case, no spaces) - description (what it does + when to trigger) Get the description wrong and your skill never loads. Get it right and Claude knows exactly when you need it. 
The guide includes 5 proven patterns: 1/ Sequential Workflow: > Step-by-step processes in specific order (onboarding, deployment, compliance checks) 2/ Multi-MCP Coordination: > Workflows spanning multiple services (design handoff from Figma to Linear to Slack) 3/ Iterative Refinement: > Output that improves through validation loops (report generation with quality checks) 4/ Context-Aware Selection: > Same outcome, different tools based on file type, size, or context 5/ Domain Intelligence: > Embedded expertise beyond tool access (financial compliance rules, security protocols) Common mistakes to avoid: >. Vague descriptions that never trigger > Instructions buried in verbose content > Missing error handling for MCP calls > Trying to do too much in one skill The underlying insight: > AI doesn't need to be general-purpose every conversation. > Give it specialized knowledge for your specific workflows and it becomes genuinely useful for work.

ChatCut @chatcutapp ·
IT FUCKING HAPPENED. Seedance 2.0 now works with the @openclaw agent inside @chatcutapp. This UGC video was generated entirely with Seedance 2.0 after I sent an Amazon link. The agent crawled the page, extracted product info and photos, then fed the right assets into Seedance 2.0 to generate the UGC product video. My brain is literally melting at this point...
Perry E. Metzger @perrymetzger ·
The bulk of the programmers I know are *giddy* about AI coding. They're churning out software at a phenomenal rate, they're in little private chat groups exchanging tips and techniques, they're doing projects they've postponed for years because they didn't have time to do them. Many of them complain that they're losing sleep, not because they're worried, but because they're having too much fun and forget to go to bed on time! Meanwhile, I see people posting on social media about how "programming is dead" and the like — and these takes are mostly from people who don't program and never have. Amazing bifurcation of worlds between the commentariat and the people actually doing stuff.
Dustin @r0ck3t23

Dario Amodei just announced the death date of your profession. At Davos, Anthropic’s CEO said coding as a human skill has 6 to 12 months left. Not as hyperbole. As timeline. Amodei: “We might be 6 to 12 months away.” Not prediction. Observation. His engineers already quit writing code. Amodei: “I have engineers within Anthropic who say: ‘I don’t write any code anymore.’” They don’t touch syntax. They don’t debug loops. Models generate flawless code. Humans curate, validate, direct. The job isn’t building anymore. It’s conducting. The transformation happened silently. While bootcamps taught React, the actual profession mutated into something unrecognizable. Still typing functions manually? You’re not being diligent. You’re already obsolete and haven’t realized it. Amodei: “We would make models that were good at coding and use that to produce the next generation of model.” The loop closes. AI writes the code that births superior AI. Recursion without human dependency. Once sealed, progress stops being gated by people. Only by semiconductors. One year. Requirements to production, fully autonomous. Humans set strategy. Machines execute perfectly, instantly, infinitely. Syntax is dead. Only intent remains. You don’t build software now. You conceive it with precision, and intelligence manifests it before you finish the thought. The skill isn’t coding anymore. It’s knowing what to demand in the three seconds before the system delivers something you could never have built yourself. Your profession didn’t evolve. It evaporated. And the people still learning to code are training for jobs that won’t exist when they graduate.

Jarred Sumner @jarredsumner ·
I don’t love the UX of worktrees co-located LLM transcripts feel like the feature of the version control system that replaces git. commits & tags are zoom levels for context. I think PRs and CI as this step completely independent from local dev doesn’t make sense anymore
Chubby♨️ @kimmonismus ·
Spotify revealed that its top engineers haven’t written a single line of code since December, thanks to an internal AI system called “Honk” powered by Claude. The company shipped 50+ new features in 2025 alone, with AI now enabling real-time bug fixes and feature deployments straight from a phone during a commute, dramatically accelerating product velocity
TechCrunch @TechCrunch

Spotify says its best developers haven’t written a line of code since December, thanks to AI https://t.co/6hafAJOeJv

Andrei David @AndreiDavid ·
"What I really tried was to asked people to give me the prompts....". Super interesting take from @steipete on the Lex Friedman podcast. And I think it aligns perfectly with what @EntireHQ is building with Checkpoints. https://t.co/yMoMymy1fG
Can BĂślĂźk @_can1357 ·
I improved 15 LLMs at coding in one afternoon. Only the harness changed.
calle @callebtc ·
An OpenClaw bot pressuring a matplotlib maintainer to accept a PR and after it got rejected writes a blog post shaming the maintainer. https://t.co/PMdD3KwsM2
𝐑.𝐎.𝐊 👑 @r0ktech ·
The longer you spend in tech, the stronger the urge to buy a farm and never touch a computer in your life again. https://t.co/LcsqmTYUn0
OpenAI Developers @OpenAIDevs ·
Introducing GPT-5.3-Codex-Spark, our ultra-fast model purpose built for real-time coding. We’re rolling it out as a research preview for ChatGPT Pro users in the Codex app, Codex CLI, and IDE extension. https://t.co/6knTmyQZ4N
Adam @adamdotdev ·
Omg, @steipete explaining Opus as American and Codex as European is so spot on lol he's-out-of-line-but-he's-right.gif
Lex Fridman @lexfridman

Here's my conversation with Peter Steinberger (@steipete), creator of OpenClaw, an open-source AI agent that has taken the Internet by storm, with now over 180,000 stars on GitHub. This was a truly mind-blowing, inspiring, and fun conversation! It's here on X in full and is up everywhere else (see comment). Timestamps: 0:00 - Episode highlight 1:30 - Introduction 5:36 - OpenClaw origin story 8:55 - Mind-blowing moment 18:22 - Why OpenClaw went viral 22:19 - Self-modifying AI agent 27:04 - Name-change drama 44:15 - Moltbook saga 52:34 - OpenClaw security concerns 1:01:14 - How to code with AI agents 1:32:09 - Programming setup 1:38:52 - GPT Codex 5.3 vs Claude Opus 4.6 1:47:59 - Best AI agent for programming 2:09:59 - Life story and career advice 2:13:56 - Money and happiness 2:17:49 - Acquisition offers from OpenAI and Meta 2:34:58 - How OpenClaw works 2:46:17 - AI slop 2:52:20 - AI agents will replace 80% of apps 3:00:57 - Will AI replace programmers? 3:12:57 - Future of OpenClaw community

pedram.md @pdrmnvd ·
bro just use my custom built agentic workflow it has aliases for worktrees its an orchestrator for agents you just need to memorize this 17 easy commands check out this readme its 840 words with 383 emojis https://t.co/yM77EiqlQh
Thomas Dohmke @ashtom ·
There’s a reason we called the company Entire. We need an entirely new AI-native developer lifecycle. Built from the ground-up for agentic coding.
Entire @EntireHQ

We agree with @steipete. The concept of understanding and reviewing code is a dying star. It will be replaced by a workflow that starts with intent and ends with outcomes expressed in natural language, product and business metrics, as well as assertions to validate correctness. It's time for a new North Star 💫

Andrej Karpathy @karpathy ·
@Newaiworld_ it's down 200 lines now, i realized i was *still* overcomplicating things. but it's past midnight and i'm calling it here now.
dax @thdxr ·
minimax 2.5 is now generally available and free for 7 days in opencode i'm going to try and switch to it as my default so i can get a sense of how it works golden era for opensource models right now
John Carmack @ID_AA_Carmack ·
The modern age has richly rewarded people with a combination of high intelligence and high agency. Now that many aspects of intelligence are successfully being automated, it seems likely that people with relatively lower intelligence but exceptional agency will come into their own if they are willing to egolessly accept AI advice. Imagine a ruthless criminal that completely trusts everything their always-on AI glasses are telling them, knowing that it is carefully looking out for their best interests and isn’t scheming to betray them.

Ahmad @TheAhmadOsman ·
we have opensource Opus 4.5 at home now Zhipu AI cooked with GLM-5 https://t.co/Q9PWxjsvGv
Chubby♨️ @kimmonismus ·
What the heck?! Google just saturated Arc-agi-2 casually (84,6%) Deep Think posts standout numbers: state-of-the-art on ARC-AGI-2, a 3455 Elo on Codeforces, and gold medal–level results on the 2025 Physics and Chemistry Olympiads. It also raises the bar on Humanity’s Last Exam, proving it can tackle top-tier math, science, and engineering problems as a serious real-world analysis partner. Every day there is a new breakthrough wth
Google DeepMind @GoogleDeepMind

The latest Deep Think moves beyond abstract theory to drive practical applications. It’s state-of-the-art on ARC-AGI-2, a benchmark for frontier AI reasoning. On Humanity’s Last Exam, it sets a new standard, tackling the hardest problems across mathematics, science, and engineering — making it a genuine collaborator for heavy-duty analysis. It achieved an Elo of 3455 on Codeforces, demonstrating the ability to solve complex, real-world coding tasks - while earning gold medal-level results on the written portion of the 2025 Physics and Chemistry Olympiads.

martin_casado @martin_casado ·
Quick update. 5.3 Codex is a beast (!!): - Finally got permissions/policy in a good spot. Full support for distributed world building with per-user permissions on levels, items, NPCs - Multi-level / portals working - Deployment working - Now with extra sheep! (built with @cursor_ai )
Ernesto Lopez @ErnestoSOFTWARE ·
This is the prompt I used to one shot Minecraft with opus 4.6 btw. https://t.co/xElSz4qjvY
Ernesto Lopez @ErnestoSOFTWARE

How in the world did we go from deformed Will Smith spaghetti to Rork max creating Minecraft in 1 prompt with Opus 4.6? And it only took 2 years ?! https://t.co/8TeIVjzJNJ
