Liquid AI Drops 8B Model Trained on 38T Tokens as On-Device Inference Challenges Cloud Economics
The clearest signal from today's posts is a decisive shift toward small, efficient models running locally. Liquid AI released an 8B MoE model trained on a staggering 38 trillion tokens, consumer AMD GPUs are hitting 87 tokens per second with Qwen3.6, and OpenJarvisAI launched a fully on-device personal assistant. Meanwhile, the agent tooling ecosystem matured with dynamic subagent workflows and MCP server instructions landing in ChatGPT and Codex.
Daily Wrap-Up
There is a quiet revolution happening in AI, and it is not coming from the billion-dollar training clusters. Today's posts collectively paint a picture of an industry pivoting hard toward efficiency: smaller models, local inference, and tooling that makes agents genuinely useful rather than just impressive demos. Liquid AI's LFM2.5-8B-A1B, an 8-billion-parameter mixture-of-experts model trained on 38 trillion tokens (more than DeepSeek V4 Pro's 32 trillion), stole the show. The claim that an 8B model can punch above its weight class, performing comparably to models up to four times its size while being customizable on a single GPU, represents a fundamental shift in what "frontier" means.
The developer tooling story is equally compelling. Dynamic subagent workflows are moving from research curiosities to production tooling, MCP just got native support in ChatGPT and Codex, and coding agents are generating enough real-world data (Cursor published new metrics) that we can start having serious conversations about how software engineering is actually changing. The gap between "vibe coding" and professional AI-assisted development is narrowing fast, and the posts from @doodlestein, @odysseus0z, and @leerob show three different facets of that maturation.
Perhaps the most entertaining moment was @w1nklerr describing Nvidia-backed startup Span packing 16 Blackwell GPUs into boxes that look like residential AC units and bolting them onto suburban homes, paying homeowners an estimated $1,000/month to host mini AI data centers. The AI boom literally moving into backyards is a perfect metaphor for the decentralization theme running through today's feed. The most practical takeaway for developers: spend time this week getting a local model running on your own hardware. Whether it is Qwen3.6 on an AMD card or Liquid AI's new 8B release, the tooling has matured enough that local inference is no longer a novelty but a legitimate development environment you should have in your toolkit.
Quick Hits
- @jxnlco highlighted that ChatGPT and Codex now support MCP server instructions, letting servers return guidance like rate limits and workflow rules directly to the model. The first 512 characters of instructions get passed when the model decides whether to use an MCP server.
- @johnny_makes reported accidentally setting a new frontier in AI memory at 96.4% using a smaller, cheaper model, tackling the core problem where AI memory degrades as context grows.
- @royvanrijn built "The Anatomy of an LLM," an interactive explainer walking through how text becomes tokens, vectors, attention, transformer blocks, and generated output.
- @justsisyphus retweeted a thread about Anthropic's aggressive API banning practices, noting that random developers shipping tools like oh-my-opencode have faced sudden bans.
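The MCP item above has a practical consequence for server authors: if only the first 512 characters of the instructions are visible when the model decides whether to use a server, the most important guidance should come first. Here is a minimal sketch of that idea, assuming the cutoff is a plain character slice (the post does not specify the exact truncation behavior). The helper `selection_preview` and the padded example text are hypothetical.

```python
# Hypothetical helper modeling the "first 512 characters" rule from the post.
# Assumption: truncation is a simple character slice; real behavior may differ.
PREVIEW_LIMIT = 512


def selection_preview(instructions: str, limit: int = PREVIEW_LIMIT) -> str:
    """Return the part of the instructions visible at server-selection time."""
    return instructions[:limit]


# Lead with the rules that matter most so they survive truncation.
instructions = (
    "Always use validate_schema -> migrate_schema for safe db migrations. "
    "Db connection tools are rate limited to 10 req/min. "
    + "Additional background detail. " * 40  # filler that pushes past the limit
)

preview = selection_preview(instructions)
print(len(instructions), len(preview))
print("rate limited" in preview)
```

Anything past the cutoff can still carry detail, but it should not be the only place a critical rule appears.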
The On-Device AI Revolution Gets Real
For months, the narrative in AI has been dominated by scale: bigger clusters, more GPUs, gigawatts of power consumption. Today, a counter-narrative emerged with remarkable specificity. Jon Saad-Falcon launched @OpenJarvisAI v1.0, framing it explicitly as a bet against the cloud-heavy status quo. "The dominant story in AI has been the growing cloud: bigger clusters, larger models, more gigawatts," he wrote. "We believe the future is in the opposite direction: on-device inference, smaller models, watts instead of gigawatts." That framing is no longer aspirational. It is backed by shipping hardware numbers.
Nicolás Schürmann (@_nasch_) demonstrated Qwen3.6 27B running at 87 tokens per second on a consumer AMD graphics card, noting that "the best local model runs faster than paid cloud models." That claim would have been laughable twelve months ago. Today, with efficient MoE architectures and consumer hardware catching up, it is simply a data point. The Spanish-language post carried an extra punch: "El futuro de las empresas de AI no se ve tan dominante" (The future of AI companies doesn't look so dominant).
The capstone came from Liquid AI, whose LFM2.5-8B-A1B was highlighted by @Snixtp with a simple reaction: "An 8B model trained on 38T tokens. Holy." The specs tell the story: 8B total parameters with only 1.5B active via MoE, a 128K context window, and training on 38 trillion tokens with large-scale reinforcement learning. That token count exceeds DeepSeek V4 Pro's 32 trillion. The model is designed for phones, laptops, robots, and lightweight server use. Meanwhile, @neural_avb offered a complementary educational resource: a 45-minute video on training tiny 100M-parameter local models for narrow tasks, complete with code, datasets, and training harnesses. The convergence is clear. Local inference is no longer the province of hobbyists. It is becoming the default for a growing class of real-world applications.
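To put those specs in proportion, here is a quick back-of-the-envelope calculation using only the figures quoted above. The derived ratios are simple arithmetic for illustration, not metrics published by Liquid AI.

```python
# Figures as quoted in this article; the ratios below are derived, not published.
total_params = 8e9        # 8B total parameters
active_params = 1.5e9     # 1.5B active per token via MoE
lfm_tokens = 38e12        # 38T training tokens
deepseek_tokens = 32e12   # 32T, the DeepSeek V4 Pro figure cited above

active_fraction = active_params / total_params   # share of weights used per token
tokens_per_param = lfm_tokens / total_params     # training tokens per total parameter
token_ratio = lfm_tokens / deepseek_tokens       # relative training-set size

print(f"active fraction:  {active_fraction:.2%}")
print(f"tokens per param: {tokens_per_param:,.0f}")
print(f"vs. 32T tokens:   {token_ratio:.2f}x")
```

Under these numbers, fewer than a fifth of the weights are active for any given token, and the model saw several thousand training tokens per parameter.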
Agent Tooling Levels Up
The agent ecosystem took several meaningful steps forward today, moving beyond single-turn prompts toward genuinely composable workflows. @odysseus0z highlighted the release of pi-dynamic-workflows by @micLivs, calling Michael "a hidden gym" (presumably "gem") for his habit of decomposing complex features into clean implementations. The tool introduces a JavaScript-based workflow DSL with primitives like agent(), parallel(), pipeline(), phase(), and log() that let agents write their own orchestration code, which the engine then parses and runs. It is, as the release post put it, "code mode" for subagents, and it addresses one of the persistent complaints about agent frameworks: that they either abstract too much or require too much manual wiring.
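To make the composition idea concrete, here is a rough Python analogue of primitives with those names. This is not the pi-dynamic-workflows API, which is a JavaScript DSL whose signatures are not shown in the post. Every function body below is a hypothetical stand-in, and `agent` simply formats a string instead of calling a model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A step takes a task description and returns a result string.
Step = Callable[[str], str]


def agent(name: str) -> Step:
    """Hypothetical stand-in for a subagent; a real one would call a model."""
    def run(task: str) -> str:
        return f"{name}({task})"
    return run


def parallel(*steps: Step) -> Step:
    """Run all steps on the same input concurrently; join results in order."""
    def run(task: str) -> str:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda step: step(task), steps))
        return " | ".join(results)
    return run


def pipeline(*steps: Step) -> Step:
    """Feed each step's output into the next step."""
    def run(task: str) -> str:
        for step in steps:
            task = step(task)
        return task
    return run


def phase(label: str, step: Step) -> Step:
    """Wrap a step with a log line marking the start of a named phase."""
    def run(task: str) -> str:
        print(f"[phase] {label}")
        return step(task)
    return run


workflow = pipeline(
    phase("research", parallel(agent("search"), agent("read_docs"))),
    phase("write", agent("summarize")),
)
result = workflow("topic")
print(result)
```

The appeal of this style is that the orchestration is ordinary code: an agent can emit a workflow like the one above, and an engine can inspect or run it without a separate configuration format.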
Jeffrey Emanuel (@doodlestein) delivered what hundreds of people had been asking for: a screencast of his day-to-day Agent Flywheel development workflow. The video covers his actual setup and tooling in real conditions, complete with bugs and meandering. That authenticity matters. As @lateinteraction's retweet about DSPy noted, frameworks like DSPy "require more up front learning than just writing natural language instructions. But once you get it, it makes building" far more systematic. The throughline is that agent development is graduating from prompt engineering into something closer to real software engineering, with reusable abstractions, testable components, and debuggable workflows. @theo's playful contribution, slotslop, captures the current chaos: an npx tool that randomizes your choice of agent, model, and effort level, mimicking the "slot machine feel" of Claude Code when using other tools. It is a joke that lands because it is true.
AI-Powered Sales and Finance
The most immediately monetizable AI applications today are not in research labs but on sales floors and trading desks. @chrispisarski shared a detailed playbook for running daily sales war rooms powered by Claude Code and the Crustdata MCP. The workflow is strikingly concrete: export every team member's LinkedIn connections, feed the CSVs into Claude as context, enrich them through Crustdata with full work histories and current roles, then ask Claude to surface the warmest introduction path to any decision-maker at a target company. "For any open deal you just ask: find me the warmest connection to the CFO of [target company]," he explained. Half of their stuck-deal wins came from backchannel introductions surfaced this way.
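The core of that lookup is a filter-and-rank over the team's combined connections. The toy sketch below illustrates only that ranking step. It does not use Claude or Crustdata, and the column names, rows, and scoring rule are all invented for the example.

```python
import csv
import io

# Hypothetical merged export; columns and rows are invented for illustration.
CONNECTIONS_CSV = """owner,contact,company,title
alice,Dana Reyes,Acme Corp,VP Finance
bob,Lee Chan,Acme Corp,CFO
alice,Sam Ortiz,Globex,CFO
carol,Pat Moore,Acme Corp,Engineer
"""


def warm_paths(csv_text: str, company: str, title_keyword: str) -> list:
    """Rank teammates' contacts at a company; exact-role matches come first."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["company"].lower() != company.lower():
            continue
        # Assumed scoring: 2 = contact holds the target role, 1 = same company.
        score = 2 if title_keyword.lower() in row["title"].lower() else 1
        matches.append((score, row["owner"], row["contact"], row["title"]))
    # sorted() is stable, so ties keep their original CSV order.
    return sorted(matches, key=lambda m: -m[0])


paths = warm_paths(CONNECTIONS_CSV, "Acme Corp", "CFO")
for score, owner, contact, title in paths:
    print(f"{score}: ask {owner} about {contact} ({title})")
```

In the workflow described in the post, the enrichment and the ranking are delegated to the model and the MCP server. The sketch only shows why having everyone's connections in one place makes the question answerable at all.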
On the quantitative finance side, @antpalkin described Horizon, a tool that collapses trading strategy development from weeks of Python and API wrangling into 90 seconds of plain English. The system parses a thesis, compiles it, backtests five years of data in 12 seconds, runs Monte Carlo and walk-forward analysis, and deploys live with one click. The framing is aggressive, but the underlying point about democratization is sound: "Jane Street spent $6 billion and 4,032 GPUs just to test faster than you. The moat was never the math. They got a thousand tries. You got one." Tools like Horizon aim to close that gap.
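For readers unfamiliar with walk-forward analysis, the idea is to fit a strategy on one window of history, evaluate it on the period immediately after, then slide both windows forward and repeat. Here is a minimal sketch of the window generation, assuming fixed-size rolling windows. How Horizon actually implements this is not described in the post, and the function name and parameters are hypothetical.

```python
def walk_forward_windows(n_periods: int, train: int, test: int) -> list:
    """Return ((train_start, train_end), (test_start, test_end)) index pairs.

    Ranges are half-open, and each step advances by the test length so the
    test windows never overlap.
    """
    windows = []
    start = 0
    while start + train + test <= n_periods:
        train_range = (start, start + train)
        test_range = (start + train, start + train + test)
        windows.append((train_range, test_range))
        start += test
    return windows


# Example: 10 periods of data, fit on 4, evaluate on the next 2.
for train_range, test_range in walk_forward_windows(10, train=4, test=2):
    print(f"train {train_range} -> test {test_range}")
```

Because every test window lies strictly after its training window, the evaluation never sees future data, which is the main reason walk-forward results are harder to overfit than a single full-history backtest.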
From the investor perspective, @rodriscoll posed a question that cuts to the heart of the AI business model debate: will Corporate America buy a trillion dollars of token value directly from frontier labs, or through vertical AI applications acting as intermediaries? His firm is betting on the app layer, arguing that "the AI Apps business will be just as vibrant as the prior SaaS apps business." The sales and finance examples above are early evidence that he might be right.
Coding Agents and the New Software Engineering
Lee Robinson (@leerob) shared a 15-minute talk unpacking new Cursor data on how coding agents are reshaping software engineering. His three key points deserve attention: lines of code is an imperfect measure of AI progress, there is a real tradeoff between intelligence, cost, and speed when selecting models, and the industry is now grappling with "Mega PRs" exceeding 1,000 lines that challenge traditional code review processes. These are no longer theoretical concerns. They are the daily reality of teams shipping with AI assistance.
The hiring side is adapting too. @steipete announced that Vince (@vincent_koc) has joined the OpenClaw Foundation as Chief Architect, noting that "very few people understand the new ways how software is built. He gets it." The role is explicitly focused on agentic computing and the post-claw era where AI moves beyond coding into personal life, with announcements planned at Nvidia Computex and Microsoft Build. The message is clear: understanding how to build with and alongside AI agents is becoming a first-class engineering skill, not a specialty.
AI Infrastructure Moves to the Suburbs
The most surreal infrastructure story of the day came from @w1nklerr, describing Nvidia-backed startup Span building residential AI data centers that look like standard AC units. Each unit contains 16 Blackwell GPUs and Dell servers, bolts onto a home, and pays the homeowner for power and Wi-Fi. Some estimates put the hosting income at $1,000 per month. Span claims deployment is dramatically faster and cheaper than traditional data center construction. "The AI boom is literally moving into the suburbs," as the post put it. Whether this is a genuine infrastructure innovation or a sign of unsustainable demand for compute, it illustrates how acute the data center capacity crunch has become.
Meanwhile, @JonMSchwartz reported being "honestly shocked" by demand for his company's robots, with customers signing contracts after 30-minute intro calls. His takeaway is worth noting: "Seeing (in the real world) is believing. The more you can show, the more trust you'll be given." That principle applies equally to AI demos, agent workflows, and hardware. Tangible demonstration beats theoretical capability every time.
Sources
We accidentally set a new frontier in AI memory. 96.4% using a smaller, cheaper model.
[ Technical report linked at the end ] Context: AI memory gets worse the more it remembers (almost always). Retrieval falls apart, context bloats, ...
The App Layer is Dead. Long Live the App Layer
Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases.
> 8B MoE, 1.5B active
> Expanded 128K context
> LFM2.5 flagship hybrid MoE architecture
> Trained on 38T tokens + large-scale RL
> fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size
> customizable on a single GPU for any specialized task
> LFM2 open-weight license 🧵
Jane Street made $39.6 billion last year with just 3,500 people. That's $11.3M per employee - 7x Goldman. They didn't do it with more humans. They spent $6 billion on AI and 4,032 GPUs in a Texas data center to make each quant 10x faster. There's now a tool that lets anyone test same tradings strategys in 90 seconds - no coding, no finance degree, just plain English -> https://t.co/pDDYFGfVga Here's what's happening inside the top firms: Man Group ($150B) built an AI on Claude that writes and tests strategies on its own - hundreds a week. A human team tests 20 in a quarter. Bridgewater runs a $2B AI fund making "alpha uncorrelated to what our humans do" The edge was never the idea. It was speed. They test 100 strategies for every 1 you test by hand. Horizon hands that exact speed to regular people. Plain English in. Tested system out. Save this. Re-read it when the waitlist is closed.
one of the best sales advice we got back in YC was the "daily war room": every day for 15 minutes, the CEO + the entire sales + growth team come together in one room they go through every open deal and ask one question: "what was the last touch, and what do we do next?" there are no stupid questions or status updates, just going through the top deals on that day and answering these 2 questions / figuring out what the next move is even if you are a solo founder, you should probably have a daily war room a lot of deals closed because someone in the room said "wait, have you tried looping in their CFO? I know x that can intro us here to push the deal forward" and you did it that afternoon
HOW ONE $2,999 NVIDIA BOX MADE ME $22,000 IN A YEAR
🚢 ChatGPT and Codex now support MCP server instructions! 🎉 MCP servers can return the standard `instructions` field to give Codex/ChatGPT server-wide/cross-tool guidance like:
- "Always use validate_schema → migrate_schema for safe db migrations"
- "Db connection tools are rate limited to 10 req/min"
We pass the first 512 characters of your instructions to the model when it's deciding to use the MCP server. Happy building!
introducing pi-dynamic-workflows This is probably going to be a bigger token burner than pi-goal, BUT, dynamic workflows is the first implementation of subagents that i don't hate, mainly because it's "code mode" for subagents. agent writes a js-based workflow DSL into a dedicated tool, engine parses the workflow code and runs it. the dsl implements some primitives for the agent (agent(), parallel(), pipeline(), phase() and log()) to keep it as simple as possible. now available in @badlogicgames pi! pi install npm:pi-dynamic-workflows
I’ve joined the🦞@openclaw Foundation as Chief Architect! Excited to propel the future of agentic computing with @steipete and a world-class team. In the post-claw era, AI is moving beyond coding into our personal lives. Big announcements at @nvidia Computex & @Microsoft Build! https://t.co/6gVJQWKfmh