Liquid AI Drops 8B Model Trained on 38T Tokens as On-Device Inference Challenges Cloud Economics

May 29, 2026 · 19 sources

The clearest signal from today's posts is a decisive shift toward small, efficient models running locally. Liquid AI released an 8B MoE model trained on a staggering 38 trillion tokens, consumer AMD GPUs are hitting 87 tokens per second with Qwen3.6, and OpenJarvisAI launched a fully on-device personal assistant. Meanwhile, the agent tooling ecosystem matured with dynamic subagent workflows and MCP server instructions landing in ChatGPT and Codex.

Daily Wrap-Up

There is a quiet revolution happening in AI, and it is not coming from the billion-dollar training clusters. Today's posts collectively paint a picture of an industry pivoting hard toward efficiency: smaller models, local inference, and tooling that makes agents genuinely useful rather than just impressive demos. Liquid AI's LFM2.5-8B-A1B, an 8-billion-parameter mixture-of-experts model trained on 38 trillion tokens (more than the 32 trillion cited for DeepSeek V4 Pro), stole the show. Liquid AI claims it is comparable to models up to four times its size while remaining customizable on a single GPU. If that holds up, it represents a fundamental shift in what "frontier" means.

The developer tooling story is equally compelling. Dynamic subagent workflows are moving from research curiosities to production tooling, ChatGPT and Codex now honor MCP server instructions, and coding agents are generating enough real-world data (Cursor published new metrics) that we can start having serious conversations about how software engineering is actually changing. The gap between "vibe coding" and professional AI-assisted development is narrowing fast, and the posts from @doodlestein, @odysseus0z, and @leerob show three different facets of that maturation.

Perhaps the most entertaining moment was @w1nklerr describing Nvidia-backed startup Span bolting mini AI data centers onto suburban homes: boxes that look like ordinary AC units but hold 16 Blackwell GPUs, with homeowners paid an estimated $1,000/month to host them. The AI boom literally moving into backyards is a perfect metaphor for the decentralization theme running through today's feed. The most practical takeaway for developers: spend time this week getting a local model running on your own hardware. Whether it is Qwen3.6 on an AMD card or Liquid AI's new 8B release, the tooling has matured enough that local inference is no longer a novelty but a legitimate development environment you should have in your toolkit.

Quick Hits

@jxnlco highlighted that ChatGPT and Codex now support MCP server instructions, letting servers return guidance like rate limits and workflow rules directly to the model. The first 512 characters of instructions get passed when the model decides whether to use an MCP server.
@johnny_makes reported accidentally setting a new frontier in AI memory, scoring 96.4% with a smaller, cheaper model (the excerpt does not name the benchmark). The work targets the core problem that AI memory tends to get worse the more it has to remember.
@royvanrijn built "The Anatomy of an LLM," an interactive explainer walking through how text becomes tokens, vectors, attention, transformer blocks, and generated output.
@justsisyphus retweeted a post criticizing Anthropic for banning developers, pointing to the team that shipped oh-my-opencode in January. The retweeted text is cut off, so the circumstances of the ban are not shown.
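
Because only the first 512 characters of an MCP server's instructions are passed along when the model decides whether to use the server, authors may want to check which rules survive the cut. The following is a minimal sketch in Python, assuming a plain character-count truncation (the announcement does not say how the cut is made). The function names are hypothetical; the first two sample rules come from the announcement quoted in the sources, and the third is filler.

```python
# Assumption: the client keeps the first 512 characters verbatim.
INSTRUCTION_BUDGET = 512


def visible_instructions(instructions: str, budget: int = INSTRUCTION_BUDGET) -> str:
    """Return the part of the instructions assumed to reach the model."""
    return instructions[:budget]


def rules_cut_off(rules: list[str], budget: int = INSTRUCTION_BUDGET) -> list[str]:
    """List the rules that do not fit entirely inside the budget.

    Rules are assumed to be joined with a single newline, in order.
    """
    cut = []
    offset = 0
    for rule in rules:
        end = offset + len(rule)  # index just past this rule's last character
        if end > budget:
            cut.append(rule)
        offset = end + 1  # +1 for the newline separator
    return cut


rules = [
    "Always use validate_schema -> migrate_schema for safe db migrations",
    "Db connection tools are rate limited to 10 req/min",
    "Background notes: " + "x" * 500,  # long, low-priority filler
]
for rule in rules_cut_off(rules):
    print("Would be truncated:", rule[:40], "...")
```

Under this assumption, the practical design choice is to put the rules the model most needs when selecting a server first, and to leave long background material for the end.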

The On-Device AI Revolution Gets Real

For months, the narrative in AI has been dominated by scale: bigger clusters, more GPUs, gigawatts of power consumption. Today, a counter-narrative emerged with remarkable specificity. Jon Saad-Falcon launched @OpenJarvisAI v1.0, framing it explicitly as a bet against the cloud-heavy status quo. "The dominant story in AI has been the growing cloud: bigger clusters, larger models, more gigawatts," he wrote. "We believe the future is in the opposite direction: on-device inference, smaller models, watts instead of gigawatts." That framing is no longer aspirational. It is backed by shipping hardware numbers.

Nicolás Schürmann (@_nasch_) demonstrated Qwen3.6 27B running at 87 tokens per second on a consumer AMD graphics card, noting (in translation) that "the best local model runs faster than paid cloud models." That claim would have been laughable twelve months ago. Today, with more efficient models and consumer hardware catching up, it is simply a data point. The Spanish-language post carried an extra punch: "El futuro de las empresas de AI no se ve tan dominante" (The future of AI companies doesn't look so dominant).

The capstone came from Liquid AI, whose LFM2.5-8B-A1B was highlighted by @Snixtp with a simple reaction: "An 8B model trained on 38T tokens. Holy." The specs tell the story: 8B total parameters with only 1.5B active via MoE, a 128K context window, and training on 38 trillion tokens with large-scale reinforcement learning. That token count exceeds DeepSeek V4 Pro's 32 trillion. The model is designed for phones, laptops, robots, and lightweight server use. Meanwhile, @neural_avb offered a complementary educational resource: a 45-minute video on training tiny 100M-parameter local models for narrow tasks, complete with code, datasets, and training harnesses. The convergence is clear. Local inference is no longer the province of hobbyists. It is becoming the default for a growing class of real-world applications.
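
The headline numbers are easier to judge as ratios. Here is a quick back-of-the-envelope calculation in Python, using only the figures quoted above (8B total and 1.5B active parameters, 38T versus 32T training tokens). The variable names are illustrative.

```python
# Figures as quoted in the posts; nothing here is measured independently.
total_params = 8e9       # total parameters in the MoE model
active_params = 1.5e9    # parameters active per token
lfm_tokens = 38e12       # training tokens reported for LFM2.5-8B-A1B
deepseek_tokens = 32e12  # training tokens cited for DeepSeek V4 Pro

active_fraction = active_params / total_params  # share of weights used per token
tokens_per_param = lfm_tokens / total_params    # training tokens per total parameter
token_ratio = lfm_tokens / deepseek_tokens      # relative size of the training run

print(f"Active share of weights per token: {active_fraction:.2%}")
print(f"Training tokens per total parameter: {tokens_per_param:,.0f}")
print(f"Token count relative to DeepSeek V4 Pro: {token_ratio:.4f}x")
```

Roughly 4,750 training tokens per parameter is the striking figure: the model is small, but the data budget behind it is not.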

Agent Tooling Levels Up

The agent ecosystem took several meaningful steps forward today, moving beyond single-turn prompts toward genuinely composable workflows. George (@odysseus0z) highlighted the release of pi-dynamic-workflows by @micLivs, calling Michael "a hidden gym" (presumably meaning gem) for his habit of decomposing complex features into clean implementations. The tool introduces a JavaScript-based workflow DSL with primitives such as agent(), parallel(), pipeline(), phase(), and log(): the agent writes workflow code into a dedicated tool, and an engine parses and runs it. It is, in @micLivs's words, "code mode" for subagents, and it addresses one of the persistent complaints about agent frameworks: that they either abstract too much or require too much manual wiring.
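
To make the shape of those primitives concrete, here is a rough Python analog built on asyncio. It is not the pi-dynamic-workflows API, which is a JavaScript DSL whose signatures the post does not show. The function bodies, argument shapes, and the stubbed agent() are all assumptions made for illustration.

```python
import asyncio


async def agent(task: str) -> str:
    """Stand-in for a subagent call; a real one would invoke a model."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"done: {task}"


async def parallel(*tasks: str) -> list[str]:
    """Run independent subagents concurrently; results keep argument order."""
    return list(await asyncio.gather(*(agent(t) for t in tasks)))


async def pipeline(first_task: str, *steps: str) -> str:
    """Run subagents in sequence, feeding each result into the next task."""
    result = await agent(first_task)
    for step in steps:
        result = await agent(f"{step} <- {result}")
    return result


async def phase(name: str, work):
    """Label a group of work so progress is visible in the log."""
    print(f"[phase] {name}")
    return await work


async def workflow() -> dict:
    research = await phase("research", parallel("read docs", "scan repo"))
    summary = await phase("plan", pipeline("draft plan", "review plan"))
    return {"research": research, "summary": summary}


result = asyncio.run(workflow())
print(result)
```

The appeal of the "code mode" approach is visible even in this toy: fan-out, sequencing, and phase boundaries are ordinary code that an agent can write and an engine can execute, rather than configuration to be wired by hand.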

Jeffrey Emanuel (@doodlestein) delivered what more than 100 people had been asking for: a screencast of his day-to-day Agent Flywheel development workflow. The video shows his actual setup and tooling in real conditions, complete with bugs and meandering. That authenticity matters. A post retweeted by @lateinteraction makes a related point about DSPy: it "requires more up front learning than just writing natural language instructions," but pays off once the approach clicks (the retweeted text is cut off before it says how). The throughline is that agent development is graduating from prompt engineering into something closer to real software engineering, with reusable abstractions, testable components, and debuggable workflows.

@theo's playful contribution, slotslop, captures the current chaos: an npx tool that randomizes your choice of agent, model, and effort level, recreating the "slot machine" feel of Claude Code for people using other tools. It is a joke that lands because it is true.

AI-Powered Sales and Finance

The most immediately monetizable AI applications today are not in research labs but in sales floors and trading desks. @chrispisarski shared a detailed playbook for running daily sales war rooms powered by Claude Code and the Crustdata MCP. The workflow is strikingly concrete: export every team member's LinkedIn connections, feed the CSVs into Claude as context, enrich them through Crustdata with full work histories and current roles, then ask Claude to surface the warmest introduction path to any decision-maker at a target company. "For any open deal you just ask: find me the warmest connection to the CFO of [target company]," he explained. Half of their stuck-deal wins came from backchannel introductions surfaced this way.
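
The cross-referencing step can be approximated without any enrichment service. The following is a minimal sketch, assuming each export is a CSV with `First Name`, `Last Name`, `Company`, and `Position` columns (an assumption about the LinkedIn export layout, which may include extra preamble lines or change over time). The function name, teammate labels, and sample rows are invented for illustration, and the Crustdata enrichment is not modeled.

```python
import csv
import io


def find_intro_paths(exports: dict[str, str], company: str, title_keyword: str) -> list[dict[str, str]]:
    """Return connections at `company` whose position mentions `title_keyword`.

    `exports` maps a teammate's name to the text of their connections CSV.
    """
    matches = []
    for owner, csv_text in exports.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            same_company = (row.get("Company") or "").strip().lower() == company.lower()
            title_hit = title_keyword.lower() in (row.get("Position") or "").lower()
            if same_company and title_hit:
                matches.append({
                    "via": owner,
                    "contact": f"{row['First Name']} {row['Last Name']}",
                    "position": row["Position"],
                })
    return matches


# Hypothetical sample data standing in for two teammates' exports.
exports = {
    "alice": "First Name,Last Name,Company,Position\n"
             "Dana,Reyes,Acme Corp,CFO\n"
             "Sam,Lee,Other Inc,Engineer\n",
    "bob": "First Name,Last Name,Company,Position\n"
           "Pat,Kim,Acme Corp,VP Engineering\n",
}
paths = find_intro_paths(exports, "Acme Corp", "cfo")
print(paths)
```

An exact-match lookup like this only finds current, first-degree connections. The enrichment step described in the post is what would add work history and make former colleagues of the target discoverable.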

On the quantitative finance side, @antpalkin promoted Horizon, a tool pitched as collapsing trading strategy development from weeks of Python and API wrangling into 90 seconds of plain English. According to the post, the system parses a thesis, compiles it, backtests five years of data in 12 seconds, runs Monte Carlo and walk-forward analysis, and deploys live with one click. The framing is aggressive and the figures are the post's own, but the underlying point about democratization is sound: "Jane Street spent $6 billion and 4,032 GPUs just to test faster than you. The moat was never the math. They got a thousand tries. You got one." Tools like Horizon aim to close that gap.
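
Walk-forward analysis, one of the checks named above, is simple to state: fit on one window, evaluate on the period right after it, then slide forward and repeat. Below is a minimal sketch of the window generation in Python. This is the generic technique, not Horizon's implementation, and the function name and window sizes are arbitrary.

```python
def walk_forward_windows(n_obs: int, train_size: int, test_size: int):
    """Yield (train_range, test_range) index pairs that roll forward in time.

    Each test range starts immediately after its training range, so a
    strategy is always evaluated on observations it was not fitted on.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


windows = list(walk_forward_windows(n_obs=10, train_size=4, test_size=2))
for train, test in windows:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

A strategy that only looks good on a single backtest but degrades across these rolling test windows is likely overfit, which is why speed of iteration alone does not guarantee a usable result.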

From the investor perspective, @rodriscoll posed a question that cuts to the heart of the AI business model debate: will Corporate America buy a trillion dollars of token value direct from frontier labs, or intermediated through vertical AI applications? His firm is betting on the app layer, arguing that "the AI Apps business will be just as vibrant as the prior SaaS apps business." The sales and finance examples above are early evidence that he might be right.

Coding Agents and the New Software Engineering

Lee Robinson (@leerob) shared a 15-minute talk unpacking new Cursor data on how coding agents are reshaping software engineering. His three key points deserve attention: lines of code is an imperfect measure of AI progress, there is a real tradeoff between intelligence, cost, and speed when selecting models, and the industry is now grappling with "Mega PRs" exceeding 1,000 lines that challenge traditional code review processes. These are no longer theoretical concerns. They are the daily reality of teams shipping with AI assistance.
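
A team that wants to track this can flag oversized changes mechanically. The following minimal sketch sums added and deleted lines from `git diff --numstat` output and compares the total against the 1,000-line mark mentioned in the talk summary. The function names and the choice to count additions plus deletions are assumptions, not Cursor's methodology.

```python
MEGA_PR_THRESHOLD = 1000  # lines changed, per the "Mega PR" description above


def changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>"; binary files report
    "-" for both counts and are skipped.
    """
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def is_mega_pr(numstat_output: str, threshold: int = MEGA_PR_THRESHOLD) -> bool:
    """Return True when the change exceeds the review-size threshold."""
    return changed_lines(numstat_output) > threshold


# Hypothetical sample output for a three-file change.
sample = "900\t50\tsrc/app.py\n120\t10\tsrc/util.py\n-\t-\tassets/logo.png\n"
print(changed_lines(sample), is_mega_pr(sample))
```

In practice the input would come from something like `git diff --numstat main...HEAD`. Consistent with the first point in the talk, line counts are a blunt signal, so a flag like this is better used to route a change to a different review process than to judge its quality.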

The hiring side is adapting too. @steipete announced that Vince (@vincent_koc) has joined the OpenClaw Foundation as Chief Architect, noting that "very few people understand the new ways, how software is built. He gets it." The role is explicitly focused on agentic computing and what Vince calls the "post-claw era," in which AI moves beyond coding into personal life, with announcements planned at Nvidia Computex and Microsoft Build. The message is clear: understanding how to build with and alongside AI agents is becoming a first-class engineering skill, not a specialty.

AI Infrastructure Moves to the Suburbs

The most surreal infrastructure story of the day came from @w1nklerr, describing Nvidia-backed startup Span building residential AI data centers that look like standard AC units. Each unit contains 16 Blackwell GPUs and Dell servers, bolts onto a home, and pays the homeowner for power and Wi-Fi. Some estimates put the hosting income at $1,000 per month. Span claims deployment is dramatically faster and cheaper than traditional data center construction. "The AI boom is literally moving into the suburbs," as the post put it. Whether this is a genuine infrastructure innovation or a sign of unsustainable demand for compute, it illustrates how acute the data center capacity crunch has become.

Meanwhile, @JonMSchwartz reported being "honestly shocked" by demand for his company's robots, with customers signing contracts after 30-minute intro calls. His takeaway is worth noting: "Seeing (in the real world) is believing. The more you can show, the more trust you'll be given." That principle applies equally to AI demos, agent workflows, and hardware. Tangible demonstration beats theoretical capability every time.

Sources

Roy van Rijn @royvanrijn · May 28

For curious developers 🧠 I built "The Anatomy of an LLM", an interactive explainer showing how text becomes tokens, vectors, attention, transformer blocks, and finally generated text. https://t.co/fgCeZuQwJf

Johnny is building 🌐 Fabric @johnny_makes · May 28

We accidentally set a new frontier in AI memory. 96.4% using a smaller, cheaper model.

[ Technical report linked at the end ] Context: AI memory gets worse the more it remembers (almost always). Retrieval falls apart, context bloats, ...

AVB @neural_avb · May 28

Watch this 45 min video to learn how to create synthetic datasets and train tiny (100M params) local language models that expertise on narrow tasks. Code, datasets, models, harnesses all in comments. https://t.co/JFpVB1MOMK

Rory O'Driscoll @rodriscoll · May 28

Does Corporate America want to buy 1Trillion of token value on a wholesale basis, direct from the Frontier labs or intermediated through AI Apps built for specific vertical and horizontal use cases? There will be some build-your-own, but as we are seeing in real time, that is harder and more expensive than it looks. Our belief is that the AI Apps business will be just as vibrant as the prior SaaS apps business, and we are investing accordingly. Thoughts from @siddharthvader_

siddharthvader_ @siddharthvader_

The App Layer is Dead. Long Live the App Layer

Jon Saad-Falcon @JonSaadFalcon · May 28

The dominant story in AI has been the growing cloud: bigger clusters, larger models, more gigawatts. We believe the future is in the opposite direction: on-device inference, smaller models, watts instead of gigawatts. Today we're releasing @OpenJarvisAI v1.0: a personal AI assistant that lives, learns, and works on your device.

Espen JD @Snixtp · May 28

An 8B model trained on 38T tokens. Holy. For reference, DeepSeek V4 Pro was trained on 32T tokens.

liquidai @liquidai

Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases.
> 8B MoE, 1.5B active
> Expanded 128K context
> LFM2.5 flagship hybrid MoE architecture
> Trained on 38T tokens + large-scale RL
> fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size
> customizable on a single GPU for any specialized task
> LFM2 open-weight license
🧵

Lee Robinson @leerob · May 28

How are coding agents changing software engineering? Yapped for 15 minutes about new Cursor data we published, including:
1. Why lines of code is an imperfect measure of AI progress
2. Balancing intelligence/cost/speed for models
3. Code reviews with "Mega PRs" (1000+ lines)
https://t.co/qWRYVIfPvX

cvxv666 @antpalkin · May 28

do you understand what just happened to wall street quants today… for two decades the rule was: want to test a trading idea? learn python. learn pine script. wire the broker apis. burn $87,500 in salary time. wait 7 weeks. get one strategy. millions of retail traders.. fenced out by "you need a research team." today someone collapsed it into 90 seconds. one english sentence. type the idea. it parses. it compiles. backtests 5 years in 12 seconds. monte carlo, walk-forward, robustness. one click, it's live on your broker. running 24/7. jane street spent $6 billion and 4,032 gpus just to test faster than you. the moat was never the math. they got a thousand tries. you got one. horizon just handed you the thousand. post below is where you get yours.

antpalkin @antpalkin

Jane Street made $39.6 billion last year with just 3,500 people. That's $11.3M per employee - 7x Goldman. They didn't do it with more humans. They spent $6 billion on AI and 4,032 GPUs in a Texas data center to make each quant 10x faster. There's now a tool that lets anyone test same tradings strategys in 90 seconds - no coding, no finance degree, just plain English -> https://t.co/pDDYFGfVga Here's what's happening inside the top firms: Man Group ($150B) built an AI on Claude that writes and tests strategies on its own - hundreds a week. A human team tests 20 in a quarter. Bridgewater runs a $2B AI fund making "alpha uncorrelated to what our humans do" The edge was never the idea. It was speed. They test 100 strategies for every 1 you test by hand. Horizon hands that exact speed to regular people. Plain English in. Tested system out. Save this. Re-read it when the waitlist is closed.

Chris Pisarski @chrispisarski · May 28

this is how we run our daily war room:
1) get everyone in one room. it should be in-person, "remote war rooms" never hit the same intensity/energy imo
2) for every open deal, find a backchannel
this is the single most useful thing we invested time into. half of the "stuck deal" wins came from a warm intro/backchannel
here's how to set this up at your own company:
1. ask your entire growth and sales team to export their linkedin connections. linkedin → settings → data privacy → get a copy of your data → "connections only"
2) everyone gets a csv
3) add all of the files to claude code as context and connect the Crustdata MCP to it
4) ask claude to enrich every single connection through Crustdata, it will pull their full work history, education, current role, recent posts, everything
you now have an internal database of your entire team's extended network, fully enriched and fully searchable through claude
in war room, for any open deal you just ask: "find me the warmest connection to the cfo of [target company]"
claude will then enrich the target account, identify the champion and decision-makers, then cross-reference against your internal database and surface the warmest intro path
the person with the best connection sends the intro request that same afternoon

chrispisarski @chrispisarski

one of the best sales advice we got back in YC was the "daily war room": every day for 15 minutes, the CEO + the entire sales + growth team come together in one room they go through every open deal and ask one question: "what was the last touch, and what do we do next?" there are no stupid questions or status updates, just going through the top deals on that day and answering these 2 questions / figuring out what the next move is even if you are a solo founder, you should probably have a daily war room a lot of deals closed because someone in the room said "wait, have you tried looping in their CFO? I know x that can intro us here to push the deal forward" and you did it that afternoon

winkle. @w1nklerr · May 28

Nvidia will now pay you to put a mini AI data center on your house It looks like a normal AC unit in the yard. But inside sits 16 Nvidia Blackwell GPUs and Dell servers. A startup called Span builds them, backed by Nvidia. They bolt onto your home and you get paid for the power and Wi-Fi. Some estimates put that around $1,000 a month in your pocket. That is rent money just for hosting a box outside. Span says it deploys way faster and cheaper than a real data center. The AI boom is literally moving into the suburbs. Save this, the grid is getting rebuilt in real time.

w1nklerr @w1nklerr

HOW ONE $2,999 NVIDIA BOX MADE ME $22,000 IN A YEAR

Jon Miller Schwartz @JonMSchwartz · May 28

I'm honestly shocked by the demand we're seeing for our robots. We're now having 30min intro calls with customers, and signing contracts right after. Soon, it'll happen directly though our website. Two thoughts: 1) Seeing (in the real world) is believing. The more you can show, the more trust you'll be given. 2) There are going to be so many robots.

Nicolás Schürmann @_nasch_ · May 28

(Translated from Spanish) 87 tokens per second with Qwen3.6 27B. On a consumer AMD graphics card, not a server one! The best local model running faster than paid cloud models. The future of AI companies doesn't look so dominant https://t.co/zCkxbPhqdo

Jeffrey Emanuel @doodlestein · May 29

I've probably been asked by 100+ people over the past few months to record a screencast showing how I use my Agent Flywheel tooling and skills and other workflows in my day-to-day development work. I really hate making these things and find it stressful, but decided to do it anyway. This video doesn't get into the super nitty gritty, but shows enough to get a sense for my setup and tooling and general approach. Apologies for some choppy audio at parts; it should be pretty comprehensible despite that. I can't justify spending any real time scripting or editing this kind of thing because I'm too busy with real work, so you'll have to excuse some pauses and meandering here and there (plus I encountered a few bugs, the bane of all truly live demos). PS: If you prefer watching on YouTube, I'll post the link as a reply below.

Omar Khattab @lateinteraction · May 29

RT @dbreunig: DSPy requires more up front learning than just writing natural language instructions. But once you get it, it makes building,…

Theo - t3.gg @theo · May 29

Struggling to pick what agent, model, and effort levels to use? Miss the "slot machine" feel of Claude Code when using other tools? `npx slotslop "[prompt]"` https://t.co/oSEmA6J5YW

Sisyphus Labs @justsisyphus · May 29

RT @realsigridjin: anthropic is insane random korean kids shipped oh-my-opencode with ultrawork harness in january then anthropic banned…

jason @jxnlco · May 29

exciting news for mcp builders

mxstbr @mxstbr

🚢 ChatGPT and Codex now support MCP server instructions! 🎉
MCP servers can return the standard `instructions` field to give Codex/ChatGPT server-wide/cross-tool guidance like:
- "Always use validate_schema → migrate_schema for safe db migrations"
- "Db connection tools are rate limited to 10 req/min"
We pass the first 512 characters of your instructions to the model when it's deciding to use the MCP server. Happy building!

George @odysseus0z · May 29

Michael is such a hidden gym. He decomposes fancy new feature you just read, like dynamic workflow, into clean elements, and then implement it in pi and OSS it! I always learn a lot reading his work.

micLivs @micLivs

introducing pi-dynamic-workflows This is probably going to be a bigger token burner than pi-goal, BUT, dynamic workflows is the first implementation of subagents that i don't hate, mainly because it's "code mode" for subagents. agent writes a js-based workflow DSL into a dedicated tool, engine parses the workflow code and runs it. the dsl implements some primitives for the agent (agent(), parallel(), pipeline(), phase() and log()) to keep it as simple as possible. now available in @badlogicgames pi! pi install npm:pi-dynamic-workflows

Peter Steinberger 🦞 @steipete · May 29

Couldn’t be more excited to have Vince on board. 🦞 Very few people understand the new ways, how software is built. He gets it.

vincent_koc @vincent_koc

I’ve joined the🦞@openclaw Foundation as Chief Architect! Excited to propel the future of agentic computing with @steipete and a world-class team. In the post-claw era, AI is moving beyond coding into our personal lives. Big announcements at @nvidia Computex & @Microsoft Build! https://t.co/6gVJQWKfmh