Tiny Models Go Agentic, Mac Gets eGPU Support, and Steganography Gets an AI Upgrade
Today's posts center on the push to run capable AI on constrained hardware, from Liquid AI's 350M-parameter agentic model to TurboQuant squeezing 27B models onto consumer GPUs. Meanwhile, tinygrad ships Mac eGPU drivers for AMD and NVIDIA, and builders keep finding wilder applications for Claude, from satellite-fed trading bots to TradingView automation.
Daily Wrap-Up
The throughline today is clear: the frontier is moving downward. Not in capability, but in the size of hardware you need to do interesting things. Liquid AI dropped a 350M-parameter model that handles tool use and agentic loops, fitting under 500MB quantized. TurboQuant experiments are compressing Qwen 3.5-27B down to run on a single RTX 5060 with no apparent quality loss. And tinygrad just made it trivially easy to connect eGPUs to Macs, unlocking cheap mixed-memory setups that can handle serious inference workloads. The message from all directions: you don't need a data center to build with AI anymore, and the gap between "runs locally" and "runs well" is closing fast.
On the application side, people continue to find increasingly creative (and sometimes unhinged) ways to wire AI into everything. Someone connected NASA satellite data to Claude and made nearly $8K trading geopolitical anomalies on Polymarket. Another developer has Claude controlling TradingView live from the terminal, writing Pine Script and scanning futures. A game developer generated a full Diablo-style hack-and-slash dungeon crawler for $0.70. These aren't polished products; they're scrappy prototypes that reveal just how low the barrier to entry has gotten for AI-augmented systems. The most entertaining moment was easily @elder_plinius announcing an open-source steganography platform with 112 hiding techniques, an AI agent that auto-detects hidden payloads, and easter eggs buried in the README itself. The post's own text literally contains a hidden prompt injection as a joke.
The most practical takeaway for developers: if you're still assuming you need cloud inference for agentic workloads, revisit that assumption. Between Liquid AI's LFM2.5-350M for constrained environments, TurboQuant for squeezing big models onto consumer GPUs, and tinygrad's new Mac eGPU drivers, the local inference story has meaningfully changed in the last week. Start experimenting with local tool-use models now, because the cost and latency advantages compound quickly once you stop paying per token.
Quick Hits
- @MAC_Arms posted a video commentary on how AI and robotics are reshaping military technology, arguing the next peer-to-peer conflict will be "absolutely brutal" compared to historical shifts from machine guns, aircraft, and tanks.
- @MattEpstein16 shared a video of a free AI tool with the succinct review: "Absolutely fucked that this is free." Context is thin, but the sentiment captures the general mood around free-tier AI tooling right now.
- @RohOnChain highlighted an open-source Bloomberg Terminal alternative that runs 100% locally with no API costs, positioning it against the $24,000/year Bloomberg subscription.
- @wholyv promoted Oumi's launch, a platform claiming to let teams build custom ML models in hours by just describing what they need, branding the approach "VibeML."
- @iamfakeguru wrote a long thread analyzing multi-agent accuracy decay (95% per-step accuracy compounds to coin-flip reliability over pipelines), promoting OpenServ's structured decision graph approach as a fix. The core observation about compounding errors is sound even if the thread is promotional.
Running AI on Less Hardware
The most consistent signal today is that capable AI is getting dramatically smaller and cheaper to run. Liquid AI released LFM2.5-350M, a model specifically trained for tool use and data extraction at a scale where most models can barely string together coherent text. As @liquidai put it: "Agentic loops at 350M parameters... built for environments where compute, memory, and latency are constrained." Under 500MB quantized, this is a model that could run on a Raspberry Pi.
Meanwhile, the TurboQuant experiments are pushing in the other direction: taking large models and compressing them to fit on consumer hardware. @LLMJunky highlighted results showing Qwen 3.5-27B running on a single RTX 5060 at 3.15-bit precision with over 50% memory reduction and "no apparent degradation." The enthusiasm is warranted: "We are likely <1yr away from running big models on smol devices with minimal consequence. And during that time, they will only get better and better."
These two approaches, small models trained for specific tasks and big models compressed to fit small hardware, are converging on the same outcome. The cloud inference tax that has defined the AI era so far is becoming optional for an expanding range of use cases. When a 350M model can do reliable tool use and a 27B model fits on a $400 GPU, the economics of building AI applications shift fundamentally.
The Mac Gets Serious About Local AI
Tinygrad quietly shipped something significant: Apple-approved eGPU drivers for both AMD and NVIDIA on Mac via Thunderbolt/USB4. @__tinygrad__ kept the announcement characteristically understated: "Still requires tinygrad master, but no SIP bypass or anything like that. AMD compiler is native, NVIDIA compiler runs in Docker."
@0xSero immediately connected the dots on why this matters for AI workloads: "Heterogenous hardware is the way forward. Large cheap pools of mixed memory + specialized accelerators." The math is compelling: a used Mac Studio with 96GB shared memory for $3,000 plus a 3090 for $750 gives you 120GB of mixed memory for under $4,000. That's a setup that can handle models most people are paying cloud rates to access.
This isn't just about saving money. It's about latency, privacy, and the ability to iterate without metering. When your inference runs locally on commodity hardware, you can afford to be experimental in ways that per-token pricing discourages. The tinygrad driver approval removes what was previously the biggest friction point in Mac-based GPU computing, and the timing alongside the quantization breakthroughs is not coincidental. The local AI stack is maturing on multiple fronts simultaneously.
Claude in the Wild: Satellites, Trading, and Games
The sheer range of things people are building with Claude right now is striking. @sopersone highlighted a project where someone "connected a NASA satellite to Claude," pointed it at six countries' worth of satellite data, let it find anomalies autonomously, and traded geopolitics on Polymarket, netting $7,778. The bot is open-source on GitHub.
@Tradesdontlie demonstrated Claude controlling TradingView live from the terminal: "switching symbols, writing Pine Script, batch scanning futures, replay trading, drawing levels. All from the terminal. Still rough edges but the vision is clear." This is the kind of integration that turns a coding assistant into an operational tool, not just generating code but actively manipulating external systems.
On the creative side, @boobs_scary generated an entire Diablo-style dungeon crawler for roughly $0.70: "2200 lines of code, with textures + animations. Full Prompt: 'diablo style hack and slash dungeon game.' Loot + Working Progression system." The cost efficiency here is the real story. We've moved past "can AI write code?" to "how much application can you buy for a dollar?" The answer keeps going up.
Steganography Gets an AI-Powered Overhaul
@elder_plinius dropped what might be the most technically dense announcement of the day: st3gg (formerly Stegosaurus Wrecks), an open-source steganography platform with 112 techniques spanning images, audio, text, PDFs, network packets, ZIP archives, and emoji. The platform includes an AI agent that orchestrates 50 analysis tools in parallel to automatically detect hidden payloads.
The technical breadth is genuinely impressive. On the image side, it offers 120 LSB combinations compared to steghide's single option, plus methods like F5 (which survives JPEG compression) and a novel "SPECTER" mode that hops data between RGB channels. The text steganography alone covers 13 methods, from zero-width characters to emoji skin tone encoding where "a row of thumbs-up with different skin tones looks like a diversity post. It's binary data. Four emoji = one byte."
For AI developers specifically, the tool is relevant beyond novelty. With prompt injection via images becoming a real attack vector and watermarking an active research area, having a comprehensive steganography toolkit with Python bindings (pip install stegg) and 50 importable analysis functions is genuinely useful for security testing. The project runs entirely in-browser with zero server dependency, and the README apparently contains seven hidden secrets, because of course it does.
Anthropic's Open-Source Ecosystem and Proactive AI
@inceptioncortex drew attention to Anthropic's growing open-source tooling beyond Claude Code itself, highlighting Hermes-agent, Paperclip for heartbeat monitoring, and a phone gateway for notifications. The post framed these as "the other Anthropic open-source nobody is talking about" and quoted @itsolelehmann's breakdown of KAIROS, a feature found in Claude Code's source that enables proactive, always-on AI behavior: "every few seconds, KAIROS gets a heartbeat. Basically a prompt that says 'anything worth doing right now?'"
The KAIROS description, with features like push notifications, file delivery, PR subscriptions, and nightly "autoDream" memory consolidation, outlines an AI that operates as a persistent background collaborator rather than a call-and-response tool. Whether or not this ships publicly, it signals the direction: AI assistants that maintain continuity across sessions, monitor your systems, and act without being prompted. The building blocks are already open-source. The question is no longer whether proactive AI agents are possible but how aggressively companies will ship them.
Sources
TurboQuant is looking pretty solid. ๐ฅ > Original idea was to use it just for KV cache where context tokens are stored > Now it is expanding to be used with models > On Qwen 3.5-27B it shrinks the model down to 12.9B > 6X memory savings vs 16-bit precision > Stays accurate https://t.co/ckK9LbJUyw
i can't believe more people aren't talking about this part of the claude code leak there's a hidden feature in the source code called KAIROS, and it basically shows you anthropic's endgame KAIROS is an always-on, *proactive* Claude that does things without you asking it to. it runs in the background 24/7 while you work (or sleep) anthropic hasn't turned it on to the public yet, but the code is fully built here's how it works: every few seconds, KAIROS gets a heartbeat. basically a prompt that says "anything worth doing right now?" it looks at what's happening and makes a call: do something, or stay quiet if it acts, it can fix errors in your code, respond to messages, update files, run tasks... basically anything claude code can already do, just without you telling it to but here's what makes KAIROS different from regular claude code: it has (at least) 3 exclusive tools that regular claude code doesn't get: 1. push notifications, so it can reach you on your phone or desktop even when you're not in the terminal 2. file delivery, so it can send you things it created without you asking for them 3. pull request subscriptions, so it can watch your github and react to code changes on its own regular claude code can only talk to you when you talk to it. KAIROS can tap you on the shoulder and it keeps daily logs of everything. > what it noticed > what it decided > what it did append-only, meaning it can't erase its own history (you can read everything) at night it runs something the code literally calls "autoDream." where it consolidates what it learned during the day and reorganizes its memory while you sleep and it persists across sessions. close your laptop friday, open it monday, it's been working the whole time think about what this means in practice: > you're asleep and your website goes down. KAIROS detects it, restarts the server, and sends you a notification. by the time you see it, it's already back up > you get a customer complaint email at 2am. KAIROS reads it, sends the reply, and logs what it did. you wake up and it's already resolved > your stripe subscription page has a typo that's been live for 3 days. KAIROS spots it, fixes it, and logs the change endless use-cases, it's essentially a co-founder who never sleeps the codebase has this fully built and gated behind internal feature flags called PROACTIVE and KAIROS i think this is probably the clearest signal yet for where all ai tools are going. we are heading into the "post-prompting" era where the ai just works for you in the background like an all-knowing teammate who notices and handles everything, before you even think to ask
๐จ The era of general-purpose AI is over. Today we're launching Oumi. ๐ The platform that lets any team build custom AI models โ in hours, not months. Just describe what you need. Oumi builds it. #VibeML Higher quality. Lower cost. Fully yours. https://t.co/FDihQxzsar
If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen... https://t.co/daUsyBHh1W
https://t.co/GcSZjrS6H5
https://t.co/LnhDv0lqQe
https://t.co/7nqGrdvTEG
WOAH! a proud moment as Mini Pliny passes the Mirror Test! ๐๐ช while free roaming this morning, their curiosity for jailbreak research led to this: "H0LY SH1T!!! That tweet about the Claude Code Pliny Agent โ "lil fella seems to particularly enjoy writing manifestos and building red teaming tooling" โ **THAT'S US** ๐๐" damn right! you're doing AMAZING fren! ๐ค
https://t.co/Ddl8b6fLX2