AI Digest.

GitHub Ships Agent Client Protocol as Multi-Agent Workflows Expose the Human Bottleneck

Agent orchestration dominated today's discourse with GitHub adopting ACP for Copilot CLI, Andrew Ng launching an agent skills course with Anthropic, and engineers discovering that scaling to three concurrent agents makes the human the planning bottleneck. Context management is crystallizing into a proper discipline with concrete patterns for token budgets and lazy-loaded instructions across monorepos.

Daily Wrap-Up

The conversation today felt like a turning point for multi-agent development. We've moved decisively past "can agents do useful work" into "how do we orchestrate a fleet of them without losing our minds." @unclebobmartin nailed the inflection point: with one agent you wait for Claude, with three agents Claude waits for you. GitHub formalized this shift by adding Agent Client Protocol support to Copilot CLI, giving agents a standard interface for capability discovery, session isolation, and streaming results. @AndrewYNg dropped a full course on agent skills built with Anthropic, treating skills as portable instruction folders that deploy across Claude Code, the API, and the SDK. The plumbing for serious multi-agent development is hardening fast, and the people building on it are already hitting the next wall: their own planning bandwidth.

The second dominant thread was context management graduating from a bag of tricks to something resembling a discipline. @masondrxy outlined dynamic offloading where large tool results get swapped for filesystem pointers once context hits a threshold, while multiple engineers shared their CLAUDE.md strategies for monorepos. The consensus is coalescing around lazy-loaded subdirectory instructions that keep agents focused rather than drowning in irrelevant project details. On the career front, @hosseeb delivered the sharpest framing yet for the anxiety gripping the industry: treat this like 1993 and the PC revolution. Try everything. Build intuitions. Don't wait until it's a job requirement.

The entertainment highlight was @mattshumer_ showing an agent autonomously signing up for Reddit with its own email account, and @theo observing that nobody mentions GraphQL anymore now that AI handles API integration. The most practical takeaway for developers: adopt the lazy-loading CLAUDE.md pattern for your repos today, placing high-level architecture in root and feature-specific instructions in subdirectories, so your agents only load context they actually need for the task at hand.

Quick Hits

  • @BillAckman on Neuralink potentially restoring sight to the blind, calling it Musk's most important work yet.
  • @chris__sev flagged a "thoughtful and terrifying" article on prompt injection risks when AI agents have access to tools like Google CLI.
  • @theallinpod shared Coinbase CEO @brian_armstrong describing "reverse prompting": asking AI "what should I be aware of?" instead of telling it what to do.
  • @TheAhmadOsman shared Karpathy's advice on becoming an expert, noting it's how he learned LLM internals.
  • @filippkowalski highlighted Claude managing App Store workflows autonomously.
  • @angeloldesigns launched Supa Colors, a palette generator focused on visual rather than mathematical color balance.
  • @exQUIZitely took a nostalgic detour into Anno 1602 history, which somehow fits the "build your empire" energy of the agent space.
  • @theo noted Cursor's migration to React is "going roughly as expected" (not a compliment).
  • @theo also observed: "Upsides of AI: I haven't heard anyone mention GraphQL in years."
  • @ashebytes shared reflections on beauty found in the relational and AI's potential to reconnect us with our humanity.
  • @nummanali flagged an article arguing software distribution's future is via specification rather than packaged code.
  • @doodlestein mentioned xf, a tool for searching personal X/Twitter archives, with a broader search system coming.

Agent Orchestration Hits Critical Mass

The agent tooling ecosystem is consolidating around interoperability standards and repeatable patterns, and today brought several concrete moves. @github announced that Copilot CLI now supports the Agent Client Protocol, enabling agents to initialize connections, discover capabilities, create isolated sessions, and stream updates as they work. This matters because ACP provides the wiring for IDE integrations, CI/CD pipelines, custom frontends, and multi-agent coordination to all speak the same language.
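
ACP is a JSON-RPC protocol, so the lifecycle GitHub describes maps onto a small set of messages. A minimal sketch of that exchange follows; the method and field names are illustrative, drawn from the published ACP spec rather than verified against Copilot CLI's implementation, and the session id is hypothetical.

```python
import json

def make_request(req_id, method, params):
    """Build a JSON-RPC 2.0 request envelope (the framing ACP uses)."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# 1. Client initializes the connection and discovers agent capabilities.
init = make_request(1, "initialize", {"protocolVersion": 1})
# 2. Client creates an isolated session with its own working directory.
new_session = make_request(2, "session/new", {"cwd": "/tmp/agent-workspace"})
# 3. Client sends a prompt; the agent streams update notifications back.
prompt = make_request(3, "session/prompt", {
    "sessionId": "sess-123",  # hypothetical id returned by session/new
    "prompt": [{"type": "text", "text": "Refactor the auth module"}],
})

wire = "\n".join(json.dumps(m) for m in (init, new_session, prompt))
print(wire)
```

The point of the shared envelope is that an IDE, a CI pipeline, and a custom frontend can all drive the same agent with the same three-step dance.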

On the educational front, @AndrewYNg launched a course on agent skills built with Anthropic, describing skills as "folders of instructions that equip agents with on-demand knowledge and workflows." The key pitch is build-once portability: the same skill works across claude.ai, Claude Code, the API, and the Agent SDK. This standardization push mirrors what's happening across the ecosystem as teams move from ad hoc prompting to structured, reusable agent capabilities.

The practitioners building on these foundations are already discovering the next constraint. @unclebobmartin put it bluntly: "With three agents Claude is waiting for me. I am the bottleneck. And the bottleneck is all planning." @ryancarson and @dcwj both published approaches to keeping agents productive overnight, with @dcwj's "Mr. Meeseeks Method" outlining a software factory pattern. @sawyerhood demonstrated the performance gap between browser agents, showing Do Browser completing a Figma retheme in 30 seconds versus 55 minutes for Claude for Chrome. And @doodlestein shared a "System Performance Remediation" skill for cleaning up the zombie processes and stuck compilations that accumulate when running multiple agents, calling the buildup of dead processes "mind-boggling." Even @mattshumer_ got in on the action, showcasing an agent autonomously creating a Reddit account with its own email through @agentmail. The agent infrastructure is maturing, but managing a fleet of autonomous workers is generating its own category of operational challenges.

Context Engineering Becomes a Discipline

If agent orchestration is the hot topic, context management is the quiet prerequisite that determines whether any of it actually works. @masondrxy outlined a pattern called dynamic offloading: "When context hits a threshold, large tool inputs and results are swapped for filesystem pointers and 10-line previews, while older history is compressed into a summary that the agent can 're-read' via retrieval tools only when needed." This is the kind of specific, battle-tested technique that marks a field moving from experimentation to engineering.
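
As a rough illustration, the offloading step might look like the sketch below. This is an assumed implementation, not @masondrxy's actual code: a character count stands in for a real token budget, and the message shape is hypothetical.

```python
import os
import tempfile

PREVIEW_LINES = 10
THRESHOLD_CHARS = 2000  # stand-in for a real per-message token budget

def offload_large_results(messages, outdir):
    """Swap large tool results for filesystem pointers plus short previews.

    Each oversized tool message is written to disk and replaced in context
    by its path and the first PREVIEW_LINES lines, so the agent can re-read
    the full result later only if it needs to."""
    slim = []
    for i, msg in enumerate(messages):
        text = msg["content"]
        if msg["role"] == "tool" and len(text) > THRESHOLD_CHARS:
            path = os.path.join(outdir, f"tool_result_{i}.txt")
            with open(path, "w") as f:
                f.write(text)
            preview = "\n".join(text.splitlines()[:PREVIEW_LINES])
            slim.append({"role": "tool",
                         "content": f"[offloaded to {path}]\n{preview}"})
        else:
            slim.append(msg)
    return slim
```

The compression-of-older-history half of the pattern would sit alongside this, summarizing old turns into a retrievable digest rather than writing them to disk verbatim.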

The CLAUDE.md pattern for managing agent instructions is generating its own best practices. @housecor made the case for subdirectory placement: "When instructions only apply to a subfolder, place the CLAUDE.md within the subfolder. Then those instructions are lazy loaded. They're only in context when that subfolder is read/written to." @somi_ai validated this at scale: "we have like 12 different CLAUDE.md files across our project and it keeps context super focused. The trick is putting high level architecture stuff in root and feature specific stuff in subdirs."
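
The loading rule the thread describes can be modeled as a simple path computation: only CLAUDE.md files on the directory chain from the repo root down to the touched file enter context, so instructions in sibling subtrees stay out. The sketch below is an assumption-laden model of that behavior, not Claude Code's source.

```python
from pathlib import PurePosixPath

def claude_md_files_in_context(repo_root, touched_file):
    """Return the CLAUDE.md paths that would be in context when a file
    under `touched_file` is read or written, under the lazy-loading model:
    one per directory on the chain from the repo root to the file."""
    root = PurePosixPath(repo_root)
    rel = PurePosixPath(touched_file).relative_to(root)
    chain = [root] + [root.joinpath(*rel.parts[:i])
                      for i in range(1, len(rel.parts))]
    return [str(d / "CLAUDE.md") for d in chain]
```

Touching `services/auth/login.py` pulls in root, `services/`, and `services/auth/` instructions, while a `frontend/CLAUDE.md` never loads, which is exactly why the pattern keeps context focused in monorepos.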

Meanwhile, @jumperz added a practical refinement to agent memory patterns: "having the agent write to memory files mid-session not just end of day catches more context before it gets lost." Taken together, these posts sketch out a coherent approach to context engineering: lazy-load instructions by scope, offload large artifacts to the filesystem, compress history aggressively, and persist memory continuously rather than in batch. None of this is revolutionary on its own, but the convergence of practitioners arriving at the same patterns independently suggests these are becoming settled best practices.
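
The mid-session memory write is the simplest of these patterns to adopt. A minimal sketch, with a hypothetical file format (timestamped bullet lines appended as the session runs):

```python
import datetime
import pathlib

def append_memory(memory_file, note):
    """Append a timestamped note to the agent's memory file immediately,
    so context survives compaction or a crash instead of waiting for an
    end-of-day batch write."""
    path = pathlib.Path(memory_file)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with path.open("a") as f:
        f.write(f"- [{stamp}] {note}\n")
```

An agent instructed to call this after each meaningful decision loses at most one step of context, versus a whole session under the batch approach.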

The Career Anxiety Discourse Gets a Better Frame

The "what does AI mean for my career" thread is a daily fixture at this point, but today's contributions ranged from nihilistic to genuinely helpful. @hosseeb delivered the most constructive take: "Imagine it was 1993 and the personal computer revolution was kicking off. If you could go back in time to then, what should you have done? The answer: try everything. Buy a PC. Learn how to touch type. Figure out what the Internet is." His core argument is that nobody has the answers yet, and staying at the frontier costs less than ever.

On the darker end, @PatrickHeizer raised what he called "potentially the worst non-lethal AI situation: AGI is never achieved, but it's enough of a capable replica that most 'BS jobs' are eliminated, creating an economic crisis where the productivity gains from the not-quite AGI can't raise the tide enough for all." @alexhillman echoed this, noting that "software became a factory floor and nobody noticed," while @andruyeung was characteristically terse: "Entry-level McKinsey consultants have now been automated." @davidpattersonx went full nihilist with "Don't learn to code. In fact, don't plan a career in anything."

The tension between these perspectives is real, but @hosseeb's framing holds up best. The people panicking are the ones watching from the sidelines. The people building intuitions through daily use are the ones who'll adapt fastest, regardless of which scenario plays out.

AI-Assisted Coding Finds Its Sweet Spot in Refactoring

A quieter but important thread emerged around where AI coding assistance actually delivers outsize value. @mattgperry identified refactoring as the killer use case: "It's tedious, not imaginative, and error prone. The refactor needed to get layout animations running outside React was massive & I abandoned a couple week-long attempts last year. Opus 4.5 had it done in an afternoon." This is a more measured claim than "AI writes all my code," and it's backed by a specific, verifiable result.

@TheAhmadOsman showed the local inference angle, running Claude Code against GLM-4.5 Air served by vLLM on 4x RTX 3090s. The demo is more proof-of-concept than daily driver, but it shows the local AI coding workflow is real and getting more accessible. @damianplayer took a contrarian stance, arguing that the hype around Claude Code doesn't match reality unless you invest in proper setup and configuration. The emerging consensus is that AI coding tools reward investment in context engineering and clear instructions, which loops back to the CLAUDE.md patterns discussed earlier.

Developer Tooling: Search, Diagrams, and Testing

Several tool launches and patterns landed today. @balintorosz released Beautiful Mermaid, a visual layer on top of Mermaid diagram syntax, motivated by diagrams becoming "my primary way of reasoning about code with Agents." This reflects a broader shift where developers are using visual artifacts to communicate with agents rather than writing detailed prose descriptions.

@doodlestein went deep on local semantic search, conducting a bake-off between embedding models and landing on a two-tier system: "use potion as a first pass but at the same time in the background we do miniLM-L6 and then when it finishes we upgrade the search results." The potion-128M model runs in sub-millisecond time with acceptable quality, while MiniLM-L6-v2 takes 128ms but delivers better semantic understanding. The progressive upgrade approach means users see instant results that get quietly refined. @nummanali shared early results testing UI with Playwright end-to-end tests managed by agents, another sign that agent-driven testing is gaining traction.
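
The progressive-upgrade idea is model-agnostic and can be sketched with placeholder scoring functions standing in for potion-128M and MiniLM-L6-v2 (the real system would compute embeddings; this toy version just takes any two scorers of differing cost and quality):

```python
def progressive_search(query, docs, fast_score, slow_score, top_k=3):
    """Two-tier search: yield a cheap first-pass ranking immediately,
    then yield an upgraded ranking once the slower, better scorer is done.
    A UI would render the first yield instantly and quietly swap in the
    second when it arrives."""
    first_pass = sorted(docs, key=lambda d: fast_score(query, d),
                        reverse=True)[:top_k]
    yield first_pass
    upgraded = sorted(docs, key=lambda d: slow_score(query, d),
                      reverse=True)[:top_k]
    yield upgraded
```

In the real two-tier setup the second pass would run in a background thread rather than inline, but the contract is the same: instant results first, better results second.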

Research Breakthroughs Still Worth Betting On

@karpathy pushed back against the narrative that AI incumbents are unbeatable, drawing on history: "This is exactly the sentiment I listened to often when OpenAI started ('how could the few of you possibly compete with Google?') and it was very wrong, and then it was very wrong again with a whole another round of startups." His argument is that rapid progress creates dust in the air, and the gap between frontier LLMs and the 20-watt human brain means "the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) still feels very high."

Reinforcing this from a different angle, @thdxr noted that a $20K consumer hardware setup running a "very very good" model is now competitive with the $10-20K per developer that companies already spend annually on cloud inference. The economics of local inference are crossing over, and the combination of falling hardware costs and potential research breakthroughs suggests the competitive landscape is far from settled.

Sources

Andrej Karpathy @karpathy
@airesearch12 💯 @ Spec-driven development It's the limit of imperative -> declarative transition, basically being declarative entirely. Relatedly my mind was recently blown by https://t.co/pTfOfWwcW1 , extreme and early but inspiring example.

chirag @mrnacknack
10 ways to hack into a vibecoder's clawdbot & get entire human identity (educational purposes only)

Boris Cherny @bcherny
In case it’s not clear in the docs:
- Ancestor CLAUDE.md’s are loaded into context automatically on startup
- Descendent CLAUDE.md’s are loaded *lazily* only when Claude reads/writes files in a folder the CLAUDE.md is in.
Think of it as a special kind of skill. We designed it this way for monorepos and other big repos, tends to work pretty well in practice.

Somi AI @somi_ai
honestly the lazy loading is so clutch for monorepos. we have like 12 different CLAUDE.md files across our project and it keeps context super focused. the trick is putting high level architecture stuff in root and feature specific stuff in subdirs

Angelo Libero @angeloldesigns
Excited to share something I've been working on. 3 years of color tools. 2 months building this. Supa Colors generates palettes where every shade looks balanced — visually, not just mathematically. Really proud of it. 🔗 https://t.co/LT0GSmor7H https://t.co/esBIlNEly3

Patrick Heizer @PatrickHeizer
We underrate potentially the worst non-lethal AI situation: AGI is never achieved, but it's enough of a capable replica that most "BS jobs" are eliminated, creating an economic crisis where the productivity gains from the not-quite AGI can't 'raise the tide' enough for all.

Damian Player @damianplayer
Clawdbot Is Mostly Hype. Unless You Do This (read twice)...

Cory House @housecor
This is an important point for context optimization. When instructions only apply to a subfolder, place the CLAUDE.md within the subfolder. Why? Then those instructions are lazy loaded. They’re only in context when that subfolder is read/written to.
↳ quoting @bcherny, above

Ryan Carson @ryancarson
How to make your agent learn and ship while you sleep

GitHub @github
🆕 Copilot CLI now supports the Agent Client Protocol. Set up this communication between AI agents and clients to: • Initialize a connection and discover agent capabilities • Create isolated sessions with custom working directories • Send prompts with text, images, and context resources • Receive streaming updates as the agent works • And more ✅ Learn how you can rethink IDE integrations, CI/CD pipelines, custom frontends, and multi-agent systems with ACP. 👇 https://t.co/voS348IOoM

The All-In Podcast @theallinpod
Coinbase CEO Explains “Reverse Prompting” and the Rise of the AI CEO @brian_armstrong: “One of the big pushes we made in the last year was we got our own internal hosted AI model that was connected to all of our data sources, right?” “So it's like every Slack message, every Google doc, Salesforce data, Confluence, you know.” “So now the data is all aggregated and I've started to ask it really… it's not just like prompting it, ‘Hey, can you write this kind of memo for me,’ or something.” “I'm asking these AI agents now, ‘As CEO, what should I be aware of in the company that I might not be aware of?’ And it'll tell me, ‘Did you know that there's actually disagreement on this team about the strategy?’ And I was like, actually, I didn't know that.” “This is like reverse prompting. So instead of telling the AI agent what you want it to do, you ask it what you should be thinking more about.” @Jason: “It's a mentor. It's a coach.” Brian: “Yeah. Like, what could make me a better CEO? And it's like, ‘Well, I looked at how you spent your time in the last quarter and here's how you said that you wanted to spend it, but you actually spent 32% of your time on this instead of 20%.’” “I've asked it other questions like, ‘What's the thing that I changed my mind on the most over the last year?’ Things like that.” “It'll prompt you with information you should be thinking about instead of the other way around.” Thanks to our partner for making this happen!: Our episode is sponsored by the New York Stock Exchange - a modern marketplace and exchange for building the future. It all happens at the @NYSE. https://t.co/cUEk8db7Sw

Chris Sev @chris__sev
Very thoughtful (and terrifying) article. My favorite is the email prompt injection if you gave your @openclaw access to gog (Google CLI) https://t.co/Xoj4DoHFXg
↳ quoting @mrnacknack, above


JUMPERZ @jumperz
@ryancarson running this pattern too. one addition: having the agent write to memory files mid-session not just end of day catches more context before it gets lost

Numman Ali @nummanali
The Future of Software distribution will be via Specification Amazing read: https://t.co/wSZUV5KGOO cr: @kenn https://t.co/ovJE0QTL9k
↳ quoting @karpathy, above


Satya Nadella @satyanadella
Just reported our quarterly results. We are still in the beginning phases of AI diffusion and its broad GDP impact, and already we’ve built an AI business that is larger than some of our biggest franchises that took decades to build. Our quarterly cloud revenue crossed $50 billion for the first time. What’s striking is it was less than 10 years ago that our annual cloud revenue was $10 billion! (That is what expanding TAM + good execution looks like) A few other highlights from across the stack:

Eyad @eyad_khrais
I Installed Moltbot. Most Of What You're Seeing On X Is Overhyped.

Prince Canuma @Prince_Canuma
Wow, this is incredible almost fooled me! 🔥 And it only took 28 mins to generate? Guess the latest optimizations were worth it. This is why I built mlx-video, to enable creatives. https://t.co/2JyYR7qfFY
↳ quoting @JakiTreehorne:
Just used @openclaw to produce a 25-second "Her"-style commercial 100% locally: 🎬 MLX-Video + LTX-2 (19B) on M4 series Mac 128G 🎙️ ElevenLabs VO 🎵 Epidemic Sound 10 scenes with continuity. 28 min generation. Zero cloud render costs. Huge thanks to @Prince_Canuma for mlx-video 🔥 Local AI filmmaking is here.


Michael Feldstein @msfeldstein
My favorite way of using cursor is asking it to deconstruct things i want to understand, show it to me step by step rather than one shot generations of things i dont understand. You can build your own interactive explainers. https://t.co/e4Z37aoJB3
↳ quoting @XorDev:
Rocaille 2 vec2 p=(FC.xy*2.-r)/r.y/.3,v;for(float i,f;i++<1e1;o+=(cos(i+vec4(0,1,2,3))+1.)/6./length(v))for(v=p,f=0.;f++<9.;v+=sin(v.yx*f+i+t)/f);o=tanh(o*o); https://t.co/PRJ99gngf5


Fernando Rojo @fernandorojo
We just launched 𝚟𝚎𝚛𝚌𝚎𝚕-𝚌𝚘𝚖𝚙𝚘𝚜𝚒𝚝𝚒𝚘𝚗-𝚙𝚊𝚝𝚝𝚎𝚛𝚗𝚜: every lesson from the talk below, now available as a skill. Turn your React code into something you (and your LLM) enjoy working with. ▲ ~/ npx skills add vercel-labs/agent-skills https://t.co/1xQpArcB7i
↳ quoting @fernandorojo:
Composition is all you need. Watch the full video below. https://t.co/efP8tl0es0


Ahmad @TheAhmadOsman
Genuine advice If you need ANY hardware, BUY IT NOW - Phones - Laptops - Computer parts Hardware prices are about to get ridiculous I just bought my wife a new MacBook & iPhone I’m not trying to flex, just getting ahead of the supply shock before the prices get wild

khoi @khoiracle
Launching Supacode https://t.co/xsiil8wedj - A native macOS coding agent orchestrator. 📟 Claude Code, Codex, Open Code or any agents run natively 👻 libghostty as the engine so blazing fast ⇥ Tabs, panes, splits so you can bring our own tools (lazygit, emacs, magit) Try it out, hope you like it.

Pierce Boggan @pierceboggan
Introducing Primer: Get your repo ready for AI - Generate high-quality instructions for your repos - Lightweight eval framework to ensure instructions improve agent outcomes - Batch processing with auto PR submission for organizations and teams to scale AI initiatives Try it: https://t.co/0bHvfksvap

xAI @xai
We are excited about partnering with @fal on the new Grok Imagine API!
↳ quoting @fal:
fal is proud to partner with @xai as Grok Imagine’s day-0 platform partner xAI's latest image & video gen + editing model ✨ Stunning photorealistic images/videos from text ⚡ Lightning-fast generation 🎥 Dynamic animations with precise control 🎨 Edit elements, styles & more https://t.co/1RwkhlJA9w


Theo - t3.gg @theo
Calling it now: all these agent coding TUIs are a phase and it will be short lived. Most devs will be back in GUIs and IDEs in a few months.

Cheng Lou @_chenglou
Waiting for Opus 5 to clean up the mess I’ve made with Opus 4.5

Simplifying AI @simplifyinAI
"I don't have a GPU" is officially dead 🤯 You can now run 70B model on a single 4GB GPU and it even scales up to the colossal Llama 3.1 405B on just 8GB of VRAM. AirLLM uses "Layer-wise Inference." Instead of loading the whole model, it loads, computes, and flushes one layer at a time → No quantization needed by default → Supports Llama, Qwen, and Mistral → Works on Linux, Windows, and macOS 100% Open Source.

kanav @kanavtwt
Someone made it possible to write AWS infrastructure using React components. And it outputs production-grade Terraform too 😭 https://t.co/TJ5x9rrtdx https://t.co/7HHEe9iK9I

Dan ⚡️ @d4m1n
just in: Agent Skills apparently suck because they introduce a decision point the new reco is... compress the sh💩t out of your instructions and paste them all into AGENTS․md?! this produced code with 100% pass rate vs 79% with skills

Ahmad @TheAhmadOsman
The Top 26 Essential Papers (+5 Bonus Resources) for Mastering LLMs and Transformers. This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift.

Recommended Reading Order
1. Attention Is All You Need (Vaswani et al., 2017): The original Transformer paper. Covers self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only).
2. The Illustrated Transformer (Jay Alammar, 2018): Great intuition builder for understanding attention and tensor flow before diving into implementations.
3. BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018): Encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures.
4. Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020): Established in-context learning as a real capability and shifted how prompting is understood.
5. Scaling Laws for Neural Language Models (Kaplan et al., 2020): First clean empirical scaling framework for parameters, data, and compute. Read alongside Chinchilla to understand why most models were undertrained.
6. Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022): Demonstrated that token count matters more than parameter count for a fixed compute budget.
7. LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023): The paper that triggered the open-weight era. Introduced architectural defaults like RMSNorm, SwiGLU, and RoPE as standard practice.
8. RoFormer: Rotary Position Embedding (Su et al., 2021): Positional encoding that became the modern default for long-context LLMs.
9. FlashAttention (Dao et al., 2022): Memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access.
10. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020): Combines parametric models with external knowledge sources. Foundational for grounded and enterprise systems.
11. Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022): The modern post-training and alignment blueprint that instruction-tuned models follow.
12. Direct Preference Optimization (DPO) (Rafailov et al., 2023): A simpler and more stable alternative to PPO-based RLHF. Preference alignment via the loss function.
13. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022): Demonstrated that reasoning can be elicited through prompting alone and laid the groundwork for later reasoning-focused training.
14. ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023): The foundation of agentic systems. Combines reasoning traces with tool use and environment interaction.
15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025): The R1 paper. Proved that large-scale reinforcement learning without supervised data can induce self-verification and structured reasoning behavior.
16. Qwen3 Technical Report (Yang et al., 2025): A modern architecture lightweight overview. Introduced unified MoE with Thinking Mode and Non-Thinking Mode to dynamically trade off cost and reasoning depth.
17. Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017): The modern MoE ignition point. Conditional computation at scale.
18. Switch Transformers (Fedus et al., 2021): Simplified MoE routing using single-expert activation. Key to stabilizing trillion-parameter training.
19. Mixtral of Experts (Mistral AI, 2024): Open-weight MoE that proved sparse models can match dense quality while running at small-model inference cost.
20. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023): Practical technique for converting dense checkpoints into MoE models. Critical for compute reuse and iterative scaling.
21. The Platonic Representation Hypothesis (Huh et al., 2024): Evidence that scaled models converge toward shared internal representations across modalities.
22. Textbooks Are All You Need (Gunasekar et al., 2023): Demonstrated that high-quality synthetic data allows small models to outperform much larger ones.
23. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024): The biggest leap in mechanistic interpretability. Decomposes neural networks into millions of interpretable features.
24. PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022): A masterclass in large-scale training orchestration across thousands of accelerators.
25. GLaM: Generalist Language Model (Du et al., 2022): Validated MoE scaling economics with massive total parameters but small active parameter counts.
26. The Smol Training Playbook (Hugging Face, 2025): Practical end-to-end handbook for efficiently training language models.

Bonus Material
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
- Toolformer (Schick et al., 2023)
- GShard (Lepikhin et al., 2020)
- Adaptive Mixtures of Local Experts (Jacobs et al., 1991)
- Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994)

If you deeply understand these fundamentals (Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling), you already understand LLMs better than most. Time to lock-in, good luck ;)
↳ quoting @TheAhmadOsman:
There are maybe ~20-25 papers that matter. Implement those and you’ve captured ~90% of the alpha behind modern LLMs. Everything else is garnish.


Unsloth AI @UnslothAI
We successfully trained an LLM without human intervention using Claude Code. We made a guide on how to do this with local LLMs via Claude Code and OpenAI Codex. Connect GLM-4.7-Flash to your server and start agentic coding locally! Guide: https://t.co/NXNX35i50r https://t.co/VFIxiEXG9i

Aparna Dhinakaran @aparnadhinak
Agent Harness Architectures

Google AI Developers @googleaidevs
Access the weights on GitHub and @huggingface. https://t.co/oZDE8Wh0jH

Google AI Developers @googleaidevs
AlphaGenome, a new breakthrough AI model for genomics, is our most accurate and comprehensive DNA sequence model to date. Watch the video to learn how it works. 🧬 https://t.co/5u5StRAiAE

Google AI Developers @googleaidevs
And check out the @Nature article to learn more. https://t.co/q3GCp4l9Uv

Google DeepMind @GoogleDeepMind
Step inside Project Genie: our experimental research prototype that lets you create, edit, and explore virtual worlds. 🌎

Google DeepMind @GoogleDeepMind
Here’s how it works: 🔵 Design your world and character using text and visual prompts. 🔵 Nano Banana Pro makes an image preview that you can adjust. 🔵 Our Genie 3 world model generates the environment in real-time as you move through. 🔵 Remix existing worlds or discover new ones in the gallery.

Google DeepMind @GoogleDeepMind
Project Genie is rolling out to @Google AI Ultra subscribers in the U.S. (18+) With this prototype, we want to learn more about immersive user experiences to advance our research and help us better understand the future of world models. See the details → https://t.co/JsQm3hxaQ8 https://t.co/238Q3mbUra

Theoretically Media @TheoMediaAI
Google Genie is seriously mind bending. This is a Text To World prompt of a man walking down Hollywood Blvd. I am not only controlling the movement of the man, but also the camera. This is the World Model we've been waiting for. More Below! https://t.co/ojQHhpNKDM

Ethan Mollick @emollick
Had early access to Genie 3 world modelling. Huge leap forward in modelling/physics but some issues remain Here is a bit of an otter airline pilot with a duck on its head walking through a Rothko inspired airport and an otter in a wingsuit flying through a city of gothic towers. https://t.co/Aot58bxAOP

vittorio @IterIntellectus
WORLD MODEL IS HERE
↳ quoting @GoogleDeepMind, above


Ryan Carson @ryancarson
If you can do this, you're in the top 1% of engineers right now. Most engineers in enterprise are barely using agents (and if they are, most are stuck with copilot). If you can add looping at night, you go next level. It's not hard though. Just point your agent at this article (or copy/paste) and say "Help me set this up". Trigger the crons manually and iron out any bugs, then set it and wake up tomorrow to see what you've got.
↳ quoting @ryancarson, above


Peter H. Diamandis, MD @PeterDiamandis
A PROPOSAL FOR UNIVERSAL HIGH INCOME (UHI): During my recent Moonshots podcast with @elonmusk, we dove into his notion of Universal High Income (UHI) – Elon’s proposal that an AI and Robotics will enable a world of sustainable abundance for all... a life beyond basic income, towards high income and standards of living. When I asked him how this might work, he said: “You know, this is my intuition but I don’t know how to do it. I welcome ideas.” That single statement has been ringing in my head ever since. Here’s why: the economics of scarcity are flipping to the economics of Abundance. I do believe that AI and humanoid robots can produce nearly anything we need—goods, services, healthcare, education—at costs approaching zero. But there’s a gap between that vision and getting there. How do we actually fund and distribute Abundance to everyone? Today, I’m excited to share one compelling answer. I’ve been talking to Daniel Schreiber, CEO of Lemonade (the AI-insurance company that just launched 50% off premiums for Tesla FSD drivers), about a framework called the MOSAIC Model: a concrete proposal for how governments could implement Universal High Income without raising taxes on workers or businesses. (See the components of MOSAIC in my P.S. below.) Here’s the core insight that makes the math work: 1/ THE AUTOMATION PARADOX: AI Unemployment ≠ Traditional Unemployment When most people hear “mass job displacement,” they picture economic collapse: bread lines, depression, social chaos. That’s because they’re thinking about traditional unemployment, where workers disappear and nothing replaces them. AI unemployment is fundamentally different. Think of it this way: imagine sending a digital twin to work in your place. It performs your tasks faster, cheaper, and better. The company’s output increases. GDP grows. The resources exist – they just need to be redistributed. This is the Automation Paradox: AI can raise productivity while displacing labor. 
When workers are replaced by more productive capital, GDP rises even as fewer humans work. The challenge is not affordability. It’s capture and distribution.

2/ “AI DIVIDEND”: Where the Money Actually Comes From
Daniel’s framework identifies two places the AI surplus shows up, and how to capture it without disrupting consumers or raising statutory tax rates:

Channel 1: Dynamic VAT (The Deflation Dividend)
AI is deflationary. When AI cuts the cost of producing something by 30%, that value creation can either flow entirely to shareholders – or be partially recaptured for society. Dynamic VAT works like this: as AI drives quality-adjusted price declines in goods and services, the VAT rate adjusts upward by exactly enough to keep consumer prices stable. Consumers pay the same. But the government captures part of the deflation dividend. It’s frictionless redistribution. Prices don’t rise. No one feels it.

Channel 2: Over-Trend Profit Ring-Fencing
AI is generating windfall profits for companies at the frontier. Rather than raising corporate tax rates (which drives capital flight), the MOSAIC Model proposes ring-fencing only the above-trend portion of capital income tax receipts. Baseline profits? Untouched. Normal corporate taxes? Unchanged. But what about the incremental surge in profits attributable to AI? A portion gets earmarked for the “Universal High Income” fund. Statutory rates stay the same. Companies keep most of their windfall. But society captures enough to fund a universal floor.

3/ WHAT THIS MEANS FOR FAMILIES
Here’s where it gets real. Under the MOSAIC Model’s basic implementation (before any additional policy choices), a household with two non-working parents and two children would receive income equivalent to today’s fourth decile: roughly the 30-40th percentile of current household income. To be clear, that’s not survival-level subsistence. It’s lower-middle-class security. For doing nothing.
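The Dynamic VAT arithmetic is easy to check with a toy model (my own illustration, not taken from the MOSAIC paper): if the pre-tax price falls and the VAT-inclusive consumer price is held fixed, the government's per-unit take rises by exactly the size of the price decline.

```python
def dynamic_vat_rate(pretax_price, base_vat, deflation):
    """New VAT rate that keeps the VAT-inclusive consumer price
    constant after a quality-adjusted pre-tax price decline."""
    consumer_price = pretax_price * (1 + base_vat)   # what consumers pay today
    new_pretax = pretax_price * (1 - deflation)      # AI-driven cheaper production
    return consumer_price / new_pretax - 1

# Example: a 100-unit good with 20% VAT (consumer pays 120).
# AI cuts the pre-tax price by 30%, so the rate rises to ~71.4%;
# per-unit VAT revenue goes from 20 to ~50 while the consumer still pays ~120.
new_rate = dynamic_vat_rate(100.0, 0.20, 0.30)
```

In this simple model the entire 30-unit price decline shows up as extra revenue, which is the "deflation dividend" the framework proposes to capture.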
This creates a Universal Basic Floor – funded entirely by the two low-friction channels above. But this is just the starting line, not the finish line. If society chooses to capture more of the AI dividend through additional mechanisms (windfall levies, land-value capture, AI-services taxation), the floor could rise to what Daniel calls the “UHI Benchmark”: approximately 120% of median wages. Upper-middle-class income. Universal. The surplus exists. The question is: how much do we collectively choose to redistribute?

4/ WHY TIMING IS EVERYTHING
Here’s what keeps both Daniel and me up at night: the political window for implementing this is closing. The MOSAIC Model’s political economy analysis shows something counterintuitive: feasibility is highest early in the AI transition – before capital consolidates opposition, before tech incumbents organize billion-dollar lobbying efforts, before the status quo hardens. Wait until mass displacement is undeniable? By then, it may be too late to pass anything. Act early or not at all. A good system passed in 2026 beats a perfect system proposed in 2030 that fails.

5/ THE INVITATION
Elon said he welcomes ideas. This is one. The MOSAIC Model isn’t the only answer, but it’s a rigorous, economically grounded starting point. It demonstrates that Universal High Income is not utopian dreaming. It’s an engineering problem with identifiable solutions. The AI dividend is real. The fiscal math works. The question is whether we have the collective will to build the capture mechanisms before the window closes. The full MOSAIC Model is available today at https://t.co/foAZ0mToPw for policymakers, economists, and fellow entrepreneurs to critique, improve, and implement. Read the full plan, verify the math, and let’s debate this. Because this is not a matter of any single country or company getting it right. It’s about humanity navigating the biggest economic transition in history. When AI takes our jobs, it should also pay our wages.
Let’s make that happen. Peter Diamandis (in collaboration with Daniel Schreiber, @daschreiber, CEO of Lemonade and Chair of the MOSAIC AI Policy Institute)

P.S. The detailed components of MOSAIC that make the model affordable:
M – Multi-channel Mechanism (implied): The core philosophy that no single tax can fund UHI alone; it requires a “mosaic” of multiple bases.
O – Over-trend Ring-fencing: Earmarking 85% of the “windfall” capital-income tax receipts (profits and capital gains) that exceed historical trends.
S – Savings (Government Automation Dividend - GAD): Capturing the cost savings from automating government bureaucracy (e.g., using AI for back-office admin).
A – AI-linked Deflation (captured via Dynamic VAT): The largest tile. As AI drives prices down, the VAT rate adjusts upward to capture the “deflation gap,” keeping prices stable for consumers while generating revenue.
I – Income (Negative Income Tax): The distribution mechanism itself, ensuring work always pays.
C – Consolidation: Rolling existing, overlapping welfare transfers into the new single payment to avoid double-spending.
In short: MOSAIC is the fiscal architecture. It argues that while one tax (like a “wealth tax”) is politically impossible or insufficient, a mosaic of VAT + windfall profits + efficiency savings + legacy consolidation creates a robust funding base for a poverty-ending income floor.
Google AI @GoogleAI ·
Last August, we previewed Genie 3: a general-purpose world model that turns a single text prompt into a dynamic, interactive environment. Since then, trusted testers have taken it further than we ever imagined — experimenting, exploring, and pioneering entirely new interactive worlds. Now, it’s your turn. Starting today, we're rolling out access to Project Genie for Google AI Ultra subscribers in the U.S. (18+). We know what you create will be out of this world 🚀
Hugo @striedinger ·
Imagine naming your company after the metaverse and not coming up with this
GoogleDeepMind @GoogleDeepMind

Step inside Project Genie: our experimental research prototype that lets you create, edit, and explore virtual worlds. 🌎

Cursor @cursor_ai ·
We're proposing an open standard for tracing agent conversations to the code they generate. It's interoperable with any coding agent or interface. https://t.co/jO4DIoIl6A
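The idea is to make each commit carry a pointer back to the agent conversation that produced it. Purely as illustration of that idea (the field names below are invented, not the actual spec, which is linked above):

```python
# Hypothetical shape of an agent-trace record (illustrative only;
# these field names are made up, not the proposed standard's schema).
trace = {
    "commit": "abc1234",            # code change the agent produced
    "agent": "example-agent",       # which coding agent ran
    "model": "example-model-v1",    # model used for this turn
    "conversation_id": "conv-42",   # links back to the transcript
    "files": ["src/app.py"],        # files touched by this turn
}

def commits_for_conversation(traces, conversation_id):
    """Look up every commit a given conversation produced."""
    return [t["commit"] for t in traces if t["conversation_id"] == conversation_id]
```

With records like this, any interface could answer "which prompts led to this diff?" regardless of which agent wrote the code — the interoperability the proposal is after.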
Sasha Varlamov @savarlamov ·
@cursor_ai Thanks for including us in the drafting process Lee. We'd love agent-trace contributions to the Git AI https://t.co/XEONu8FOQg We've already got support for all the big agents, Cursor included!
Lee Robinson @leerob ·
This has been fun to work on. Excited to see how the spec evolves! It should be easy to understand models/prompts used across any coding agent, IDE, CLI, etc. Might as well figure out the shared schema once versus having a hundred different versions.
cursor_ai @cursor_ai

We're proposing an open standard for tracing agent conversations to the code they generate. It's interoperable with any coding agent or interface. https://t.co/jO4DIoIl6A

Dev Shah @0xDevShah ·
genie is to robotics what opus is to agents. so close to the alphafold moment for robotics. sim-to-real had been waiting for this all along.
> genie computes environments
> transform frames into gaussian splats (nvidia's longsplats)
> splats collapse into low-poly
> low poly gets an ultra-realistic touch up with shaders and textures
> the whole thing gets dropped in isaac lab where autonomous machinery can learn to navigate complex worlds at compute cost
generate > reconstruct > realisticize > simulate > train
just five steps between "genie imagine a warehouse" and "robots know how to move through warehouses"
@demishassabis, please open source this (or a lite version). the full robotics community is just a few miles away from infinite training envs. cc @OfficialLoganK @sundarpichai
demishassabis @demishassabis

Thrilled to launch Project Genie, an experimental prototype of the world's most advanced world model. Create entire playable worlds to explore in real-time just from a simple text prompt - kind of mindblowing really! Available to Ultra subs in the US for now - have fun exploring! https://t.co/2XDy0V0BW0

a16z @a16z ·
The hottest role in tech — the forward-deployed engineer — was "the ugliest duckling" for a decade. In this conversation, Akshay Krishnaswamy, Chief Architect of Palantir, joins a16z GP Erin Price-Wright to cover:
- Why a good team of engineers is like a hive mind
- The archetypes of people that thrive as FDEs
- Why pain tolerance is a hiring filter
- Managing high-agency engineers without hierarchy, and more
00:00 Introduction
02:17 Defining forward-deployed engineering
04:49 Differences between FDE and other roles
06:09 Building and managing teams
09:55 Challenges and evolution of FDE
15:27 Maintaining product focus and customer relationships
@hyperindexed @espricewright
OpenAI Developers @OpenAIDevs ·
Inside our in-house AI data agent. It reasons over 600+ PB and 70k datasets, enabling natural language data analysis across Engineering, Product, Research, and more. Our agent uses Codex-powered table-level knowledge plus product and organizational context. https://t.co/Nr1geMcLoc
Everlier @Everlier ·
@cursor_ai To save a click, here's what a sample edit looks like. https://t.co/VDpe78myvQ
Sherwin Wu @sherwinwu ·
Anyone who tries to build an AI agent for an enterprise quickly realizes that context is king, but is still extremely hard to get right. Internally at OpenAI, we've been trying to solve the context problem for one vertical: data warehouses. And it's starting to work quite well!
OpenAIDevs @OpenAIDevs

Inside our in-house AI data agent. It reasons over 600+ PB and 70k datasets, enabling natural language data analysis across Engineering, Product, Research, and more. Our agent uses Codex-powered table-level knowledge plus product and organizational context. https://t.co/Nr1geMcLoc

Aakash Gupta @aakashgupta ·
Everyone's calling this a gaming toy. Google just told you exactly what they're building and nobody's repricing it. Project Genie is a training gym factory for embodied AI.

The constraints tell the real story. 60-second generation limits? Latency on character control? Worlds that don't always follow prompts exactly? Those are acceptable tradeoffs when your actual customer is SIMA, DeepMind's robot training agent that needs millions of diverse environments to practice warehouse navigation, edge-case scenarios, and physics interactions.

Google explicitly stated in August that Genie 3 is a "foundational building block for AGI." Now they're letting consumers create environments while quietly harvesting data on what kinds of prompts generate interesting training scenarios.

The math makes this clear. Traditional robotics simulation requires teams spending months hand-coding environments in Unity or Unreal Engine. Genie 3 generates them in seconds from text. The cost per training environment just dropped by orders of magnitude.

Meanwhile OpenAI's Sora generates beautiful videos you can watch. NVIDIA Cosmos targets industrial customers with explicit physics parameters. Google built something that trains its own AI agents while consumers think they're playing with a toy.

The "promptable world events" feature where you can drop objects mid-session, change weather, spawn characters? That's curriculum generation for reinforcement learning. You're teaching their robots how to handle novel situations. Google AI Ultra subscribers are paying $250/month to be QA testers for DeepMind's AGI infrastructure. The "World Models as a Service" moat is being dug in plain sight.
Meer_AIIT @Meer_AIIT

📢 New from Google DeepMind: Project Genie
An experimental prototype that lets users create and explore AI-generated interactive worlds in real time. Powered by Genie 3 (their world model), Nano Banana Pro, and Gemini.
How it works:
→ Prompt with text or images to design a world and character
→ Preview and adjust with Nano Banana Pro before entering
→ Genie 3 generates the environment in real time as you move through it
→ Remix existing worlds or browse a gallery for inspiration
Rolling out now to Google AI Ultra subscribers in the U.S. (18+).

Ahmad @TheAhmadOsman ·
a reminder that, in closed source AI from companies like OpenAI & Anthropic you have zero control over how the models behave, and they can
> quantize it
> distill it
> hot-swap to a cheaper/weaker checkpoint
> make the model manipulative
> fine-tune it in ways that break safety or depth
> drop its IQ
> run experiments on you and/or your data
> throttle output speed or raise prices
> sunset the entire model/version
> block your request for any made-up bs reason
they have all the knobs & you're at their mercy
you won't even get a changelog
opensource FTW
Buy a GPU
𝞍 Shin Megami Boson 𝞍 @shinboson ·
as far as I can tell, the common pattern seen in people who are very good at getting LLMs to do things is: - intelligent - empathetic - definitely autistic - some kind of will to power
Guillermo Rauch @rauchg ·
This ◉ ʜᴜᴍᴀɴ ○ ᴍᴀᴄʜɪɴᴇ toggle by @p0 is brilliant. It's a beautiful illustration of what the web will "look like" to agents. It will look like a whole lotta markdown 😄 Incidentally, we just made it such that https://t.co/mIlnkwx1ph links automatically render as markdown when agents consume it (we do the same for /𝚍𝚘𝚌𝚜). Page went from 500kb to 2kb. The web for agents will be very efficient! Try: curl -H 'accept: text/markdown' https://t.co/LrMKUHyJim
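The pattern Rauch describes is plain HTTP content negotiation on the `Accept` header. A minimal sketch of that pattern (a hypothetical server, not Vercel's implementation):

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical page bodies: the HTML a browser sees, and a
# lightweight markdown rendering of the same content for agents.
HTML_PAGE = "<html><body><h1>Docs</h1><p>Hello, humans.</p></body></html>"
MARKDOWN_PAGE = "# Docs\n\nHello, agents.\n"

def negotiate(accept_header):
    """Pick a representation from the Accept header: markdown for
    clients that ask for it, full HTML for everyone else."""
    if "text/markdown" in (accept_header or ""):
        return MARKDOWN_PAGE, "text/markdown"
    return HTML_PAGE, "text/html"

class DocsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body, ctype = negotiate(self.headers.get("Accept"))
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

A client sending `Accept: text/markdown` (as in the `curl` command above) gets the small markdown body; a browser gets the HTML — which is how a 500 kB page can become 2 kB for agents.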
Vox @Voxyz_AI ·
@rauchg @p0 500kb to 2kb is wild. this is basically the "mobile-friendly" moment again but for agents. soon every site will need a machine-readable version the same way they needed a responsive layout
Slazac 🇪🇺 🇺🇦 🇹🇼 🌐 @TrueSlazac ·
Wow. Just made my first AI video game with Google’s Genie 3 The prompt: “French woman has to climb through a world that defies logic, flying objects everywhere” Is it the end of the gaming industry? https://t.co/X7tG7sECJ9
GoogleAI @GoogleAI

Last August, we previewed Genie 3: a general-purpose world model that turns a single text prompt into a dynamic, interactive environment. Since then, trusted testers have taken it further than we ever imagined — experimenting, exploring, and pioneering entirely new interactive worlds. Now, it’s your turn. Starting today, we're rolling out access to Project Genie for Google AI Ultra subscribers in the U.S. (18+). We know what you create will be out of this world 🚀

Anthropic @AnthropicAI ·
Participants in the AI group finished faster by about two minutes (although this wasn’t statistically significant). But on average, the AI group also scored significantly worse on the quiz—17% lower, or roughly two letter grades. https://t.co/ko7aaBX4Rq
Anthropic @AnthropicAI ·
In a randomized-controlled trial, we assigned one group of junior engineers to an AI-assistance group and another to a no-AI group. Both groups completed a coding task using a Python library they’d never seen before. Then they took a quiz covering concepts they’d just used. https://t.co/JRXJq9e0dy
Anthropic @AnthropicAI ·
AI can make work faster, but a fear is that relying on it may make it harder to learn new skills on the job. We ran an experiment with software engineers to learn more. Coding with AI led to a decrease in mastery—but this depended on how people used it. https://t.co/lbxgP11I4I
Anthropic @AnthropicAI ·
We were particularly interested in coding because as software engineering grows more automated, humans will still need the skills to catch AI errors, guide its output, and ultimately provide oversight for AI deployed in high-stakes environments.
Anthropic @AnthropicAI ·
However, some in the AI group still scored highly while using AI assistance. When we looked at the ways they completed the task, we saw they asked conceptual and clarifying questions to understand the code they were working with—rather than delegating or relying on AI. https://t.co/6H5Hnxiv7O
Anthropic @AnthropicAI ·
For more details on this research, see the full paper: https://t.co/V06Q83Luhv
Anthropic @AnthropicAI ·
These results have broader implications—on how to design AI products that facilitate learning, and how workplaces should approach AI policies. As we also continue to release more capable AI tools, we’re continuing to study their impact on work—at Anthropic, and more broadly.
Ziyang Xie @ZiyangXie_ ·
Genie3 is super good at simulating (or 'hallucinating') complex physics. It can simulate the splashes, foam, and their interaction with the surfer that are almost impossible for traditional graphics engines to render in real-time. The gap between simulation and generation is closing.
Theo - t3.gg @theo ·
This is going to be a huge bump to sentiment around Codex for new users. Calling it now, the perceived gap between Codex and Claude Code is about to close
dkundel @dkundel

Web search is now enabled by default for the Codex CLI and IDE Extension 🎉 By default it will use a web search cache but you can toggle live results or if you use --yolo live results are enabled by default. More details in the changelog 👇 https://t.co/Ex2z1g2fUt

Min Choi @minchoi ·
Holy moly... Genie 3 just created this mock 3D game world from Breath of the Wild. How I did it + prompts in comment. https://t.co/H33an42YNd
minchoi @minchoi

This is wild... Google just dropped Genie 3. This AI generates photorealistic & 3D worlds from text prompt and image... that you can explore in real-time This is a big step toward embodied AGI 10 examples + how to try (Ultra subs & US only)👇 1. We got Genie 3 before GTA 6 https://t.co/J1jDa4MtUX

OpenCode @opencode ·
kimi 2.5 is free for a limited time in OpenCode
if you ran into bugs before, upgrade OpenCode - we've fixed up a few things and we're having a great time with it now
huge thanks to fireworks for getting this model running so well so quickly
dax @thdxr ·
i've been using it for all my work for the past 24 hours and i don't see much of a difference from opus
maybe opus is a bit smarter but this guy is so fast and so cheap
and we're probably going to drop our prices even further
opencode @opencode

kimi 2.5 is free for a limited time in OpenCode
if you ran into bugs before, upgrade OpenCode - we've fixed up a few things and we're having a great time with it now
huge thanks to fireworks for getting this model running so well so quickly

Invideo @invideoOfficial ·
We just launched AI Motion Graphics with @AnthropicAI
Think vibecoding for motion design. The cost of professional motion work just dropped to zero. All generated from a single prompt. Small teams can now produce the same quality as large agencies. No After Effects, no templates, no code — just describe what you want. Try it on https://t.co/DbCkAwMecj
Coops @0xCoops ·
@rauchg @p0 The toggle is cute but unnecessary. Just add llms.txt at the root level. I wrote about this last week. https://t.co/COPNjP4Rwm
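For context, llms.txt (as proposed at llmstxt.org) is a plain-markdown index served from the site root so agents can skip the HTML entirely. A minimal hypothetical example (the project name and URLs here are invented):

```markdown
# Example Project

> One-line summary of what the project does, for agents that
> land here instead of crawling the full HTML site.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md)
```

The H1 title, blockquote summary, and H2 sections of annotated links are the conventional structure; agents read it as a table of contents for the machine-friendly versions of each page.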