AI Digest.

Qwen Floods the Zone with Sub-10B Models While the Agent-First Future Takes Shape

Qwen dominated the conversation with a wave of small, locally runnable models including the surprisingly capable 3B-active-parameter Coder-Next, while the Claude Code ecosystem continued to mature with new agent skills and memory solutions. Meanwhile, a growing chorus of voices argued that the next wave of software will be built not for humans clicking through UIs, but for agents calling CLIs.

Daily Wrap-Up

The big story today is Qwen's aggressive push into the small model space. They dropped dense models at 0.8B, 2B, 4B, and 9B parameters, plus a coding-specific model that punches absurdly above its weight class on benchmarks. The timing feels intentional. While Anthropic and OpenAI compete on capability ceilings, Qwen is carpet-bombing the floor, making "good enough" AI something you can run on a Mac Mini with no internet connection. The sovereignty angle resonated hard in the timeline. People aren't just excited about local inference for cost reasons anymore. They're excited because it means no terms of service update can pull the rug.

On the tooling front, the Claude Code ecosystem keeps getting more interesting. A new agent-browser skill lets coding agents control Electron apps like Discord and VS Code, which opens up automation surface area that was previously locked behind GUIs. Someone shared a practical approach to making Claude Code sessions persistent with memory, and the community continued to hash out the real-world pain points of routing local models through agent harnesses. The gap between "this model benchmarks well" and "this model works reliably in an agentic loop" remains wide, and today's posts made that tension very concrete.

The thread running underneath everything was the agent-native future. Multiple posts argued that software categories are about to be rebuilt for machine consumers rather than human ones, and that CLI-first design is the bridge. It's a compelling frame, even if it's probably 2-3 years ahead of reality for most software. The most entertaining moment was easily @darylginn's perfectly timed "Claude is down, hope you all remember what a variable is," which hit during what appeared to be an actual outage. The most practical takeaway for developers: grab one of the new Qwen 3.5 small models and try running it locally through your existing toolchain. The gap between local and cloud inference quality is closing fast at the sub-10B scale, and understanding where it breaks will inform your architecture decisions for the next year.
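If you do grab one, the sizing math for "will it fit on my machine" is simple enough to sketch. A back-of-envelope estimator in Python; the 4-bit quantization default and the 1.3x runtime overhead factor are assumptions for illustration, not vendor numbers, and real usage also depends on context length:

```python
# Back-of-envelope sizing: weight bytes ~= params * (bits / 8), plus
# runtime overhead for KV cache and buffers. The 4-bit default and the
# 1.3x overhead factor are assumptions, not vendor specs.

def approx_ram_gb(params_billions: float, bits_per_weight: int = 4,
                  overhead: float = 1.3) -> float:
    """Estimate resident memory (GB) for a quantized dense model."""
    weight_gb = params_billions * (bits_per_weight / 8)
    return round(weight_gb * overhead, 1)

# The four dense sizes from today's release:
for size_b in (0.8, 2, 4, 9):
    print(f"{size_b}B @ 4-bit: ~{approx_ram_gb(size_b)} GB")
```

By this rough math even the 9B lands around 6 GB at 4-bit, which is consistent with the "run it on a Mac Mini" pitch.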

Quick Hits

  • @gregisenberg and @baba_Omoloro both flagged that Anthropic has launched free courses covering Claude Code, agent skills, and Claude fundamentals, complete with certificates. Hard to argue with free education from the source.
  • @HiTw93 announced Pake 3.10.0, which packages any webpage into native desktop apps. The pitch for AI product builders: validate on web first, then ship desktop with a single command when you're ready.
  • @heymingwei shared an MCP server for querying internet traffic trends and routing data, calling it "unbelievably powerful" for network analysis.
  • @Toastonomics pointed someone toward a ChatGPT-specific tool, which is about as much context as the post provided.
  • @cryptopunk7213 wrote an extended bit imagining the Pentagon trying to get Claude to launch missiles, complete with subscription upsell interruptions. Peak AI shitposting.
  • @allTheYud channeled Pope Leo XIV with: "ChatGPT has no soul; no, not even a fraction of a soul. It was not born of love, nor raised in love. Use Opus 4.6 instead." Vatican-endorsed model selection.
  • @flyosity posted a screenshot with just "--dangerously-skip-permissions" and a link, which is either a cautionary tale or a lifestyle choice depending on your risk tolerance.
  • @kshvbgde captioned an image with "sir, they hit the second data center," continuing the rich tradition of infrastructure disaster humor.

Qwen's Small Model Offensive

Qwen made a serious play for the local inference market today, releasing a family of dense models at four size points and a coding-specific model that's generating real excitement. The breadth of the release is what stands out. Rather than dropping a single flagship, they shipped models at 0.8B, 2B, 4B, and 9B parameters, all supporting both image-to-text and text-to-text, all open source. This isn't a research preview. It's a product line designed to cover every hardware tier from phones to workstations.

@TheAhmadOsman captured the lineup: "Qwen 3.5 9B/9B-Base, 4B/4B-Base, 2B/2B-Base, 0.8B/0.8B-Base. Small. Dense. Opensource." Meanwhile, @sukhdeep7896 zeroed in on the coding angle: "Qwen3-Coder-Next... Only 3B active parameters but beats models with 10x-20x MORE parameters on SWE-Bench-Pro. PLUS: They launched Qwen Code CLI, the best open-source alternative to Claude Code. 1,000 free requests/day."

The sovereignty narrative hit especially hard. @TukiFromKL laid it out plainly: "Download LM Studio. Search Qwen 3.5. Grab the MLX versions. Load them. You now have unlimited AI on your own machine. Nobody can take it away from you. Not a company. Not a government. Not a terms of service update." It's a compelling pitch, and the hardware bar keeps dropping. A $600 Mac Mini running a 4B model locally is a genuinely useful setup for many coding tasks.
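Wiring a local model into existing code is a small script, since LM Studio and llama.cpp's server both speak the OpenAI chat-completions protocol. A minimal sketch; the port (LM Studio's default) and the model identifier are assumptions, so check what your local server actually reports:

```python
import json
import urllib.request

# LM Studio and llama.cpp's server both expose an OpenAI-compatible
# chat-completions endpoint. The port and model name are assumptions;
# check what your local server actually reports.
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a chat-completion request aimed at a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def send(req: urllib.request.Request) -> str:
    """Needs a running local server; returns the model's reply text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

req = build_request("qwen3.5-4b", "Write a one-line docstring for binary search.")
# print(send(req))  # uncomment once a model is loaded and serving
```

Because the protocol is the same one cloud providers use, swapping between local and hosted inference is mostly a matter of changing the base URL.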

@jandotai also entered the small coding model space with Jan-Code-4B, tuned specifically for "practical day-to-day tasks" like generation, refactors, debugging, and tests, all runnable locally in their Jan application. The small coding model category is getting crowded fast.

But the practical reality of local models in agentic workflows is messier than the hype suggests. @sudoingX shared hard-won experience running Qwen models through Claude Code's harness via llama.cpp: "the chain will break every 3 to 5 minutes. tool call fails. flow stops... the model is fine. the harness chokes on local inference latency." Their solution was switching to OpenCode, which trades chain breakage for occasional loops where the model "forgets what it already read and repeats the same tool call." As they put it: "a loop you can interrupt. a broken chain kills your momentum." This is the kind of real-world friction that benchmarks don't capture, and it matters enormously for anyone trying to build reliable local agent loops.
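You can see the shape of both failure modes in a few lines. A hedged sketch of a recovery wrapper, using a hypothetical tool-call interface rather than any specific harness API: a transient failure becomes a bounded retry instead of a broken chain, and an exact repeat of a prior call is caught instead of looping forever.

```python
import time

# Hypothetical tool-call wrapper, not any real harness's API. A transient
# failure (TimeoutError here) gets a bounded retry; an exact repeat of a
# prior call, the "loop" failure mode, raises instead of spinning.

def call_with_recovery(tool, args, max_retries=3, seen=None, backoff=0.0):
    """Retry transient failures; refuse exact repeats of a prior call.

    Pass the same `seen` set across a whole agent turn so repeats
    of earlier calls are detected, not just immediate retries.
    """
    key = (getattr(tool, "__name__", str(tool)), repr(args))
    seen = seen if seen is not None else set()
    if key in seen:
        raise RuntimeError(f"loop detected: {key[0]} repeated with same args")
    seen.add(key)
    for attempt in range(1, max_retries + 1):
        try:
            return tool(*args)
        except TimeoutError:
            if attempt == max_retries:
                raise
            time.sleep(backoff * attempt)  # give slow local inference time to catch up
```

The point of the sketch is the asymmetry @sudoingX describes: retries are cheap and automatic, while a detected loop needs the human (or a supervising agent) to step in.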

Claude Code's Expanding Toolkit

The Claude Code ecosystem continued to evolve today with contributions focused on two persistent problems: making agents remember context across sessions, and giving them access to more of the desktop environment. These aren't flashy capability jumps, but they address real friction points that anyone building with coding agents hits within the first week.

@ArtemXTech shared an approach titled "Grep Is Dead: How I Made Claude Code Actually Remember Things," tackling the perennial challenge of session persistence. Context management remains one of the biggest practical limitations of agentic coding: a model that writes excellent code in a single session loses all of that accumulated understanding the moment the session ends, and naive approaches like grepping through old logs don't scale.
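One common workaround is structured notes on disk that the agent re-reads at session start, rather than grepping raw transcripts. A minimal sketch, to be clear not @ArtemXTech's actual setup; the file name and note schema are invented for illustration:

```python
import json
from pathlib import Path

# Illustrative session-memory store: structured notes keyed by topic,
# persisted as JSON so a fresh session can reload them. File name and
# schema are invented for this sketch.
MEMORY_FILE = Path("agent_memory.json")

def remember(topic: str, note: str, path: Path = MEMORY_FILE) -> None:
    """Append a note under a topic, creating the file on first use."""
    memory = json.loads(path.read_text()) if path.exists() else {}
    memory.setdefault(topic, []).append(note)
    path.write_text(json.dumps(memory, indent=2))

def recall(topic: str, path: Path = MEMORY_FILE) -> list[str]:
    """Return all notes for a topic; empty list if nothing stored."""
    if not path.exists():
        return []
    return json.loads(path.read_text()).get(topic, [])
```

Even something this crude beats grep because the notes are written at the moment the agent understands something, not reconstructed later from transcript noise.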

On the tooling side, @ctatedev announced a significant addition to the agent-browser skill set: "You can now control desktop apps built with Electron, including Discord, Figma, Notion, Spotify and VS Code. Or, use it to debug your own Electron app." This is notable because it extends agent capabilities beyond the terminal and file system into GUI applications, which is where a huge amount of developer workflow actually lives, and the ability to add it to any coding agent with a single npx command lowers the barrier to entry considerably.

And then there was the outage. @darylginn's perfectly deadpan "Claude is down, hope you all remember what a variable is" resonated because it captured a real dependency that's formed. @paularambles added the other side of that coin: "you can tell claude planned this because it was scoped for four weeks but actually completed in under an hour." Both jokes land because they reflect genuine shifts in how developers work. The dependency is real, the productivity gains are real, and the brittleness when the service goes down is also very real.

The Agent-Native Economy

A growing cluster of voices is arguing that the next decade of software won't just use AI as a feature but will be rebuilt from scratch for AI consumers. The argument goes beyond "add an API" into something more fundamental about who software is designed for in the first place.

@gregisenberg made the broadest case: "Go look at every saas tool you use. Notion, Slack, Stripe etc. Now ask: 'what's the version of this that's built purely for agents?' Agent-native payments, agent-native communication, agent-native memory etc. Every single category gets rebuilt." It's a maximalist position, but the directional logic is sound. If agents become primary users of software services, the entire UX layer becomes unnecessary overhead, and the value shifts to API design, reliability, and machine-readable output.

@ryancarson drove at the same idea from a more practical angle: "It's time for you to think about how an agent can use your app via CLI. In the future, none of us (including our agents) are going to want to log in to a web interface to do anything. Agents will do everything via CLI + writing their own code." This is already happening in developer tools: MCP servers, Claude Code skills, and agent-browser integrations are all examples of software exposing itself to machine consumers.
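What "CLI-first for agents" looks like in practice is mostly mundane discipline: every verb the UI supports gets a subcommand, and every result comes back machine-readable. A toy sketch in Python, with an illustrative "tasks" domain that doesn't reflect any real product's API:

```python
import argparse
import json
import sys

# Toy agent-consumable CLI: one subcommand per verb, JSON on stdout so a
# calling agent can parse results instead of scraping a UI. The "tasks"
# domain and data here are illustrative stand-ins.

def list_tasks(status: str) -> dict:
    """Stand-in for a real data layer; returns a stable, parseable shape."""
    tasks = [{"id": 1, "title": "ship CLI", "status": "open"},
             {"id": 2, "title": "write docs", "status": "done"}]
    matches = [t for t in tasks if status in ("all", t["status"])]
    return {"ok": True, "count": len(matches), "tasks": matches}

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="taskctl")
    sub = parser.add_subparsers(dest="cmd", required=True)
    ls = sub.add_parser("list", help="list tasks as JSON")
    ls.add_argument("--status", default="all", choices=["all", "open", "done"])
    args = parser.parse_args(argv)
    if args.cmd == "list":
        json.dump(list_tasks(args.status), sys.stdout)
    return 0

# In a real tool you would wire this up with: raise SystemExit(main())
```

The design choice worth noticing is that the JSON envelope ({"ok": ..., "count": ..., ...}) is the contract; an agent can retry, branch, or chain on it without a human in the loop.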

@Saboo_Shubham_ contributed a practitioner's perspective with a breakdown of building "OpenClaw Agents that actually get better Over Time," sharing their exact stack after 40 days of iteration. The convergence between these posts suggests that the tools for building agent-native software are maturing past the proof-of-concept stage. Whether the "machine-to-machine economy" materializes at the scale @gregisenberg envisions is debatable, but the developer tooling layer is already being rebuilt with agents as first-class consumers, and that trend is accelerating week over week.

Sources

Shubham Saboo @Saboo_Shubham_ ·
How to set up OpenClaw Agents that actually get better Over Time (My exact stack after 40 Days)
Toast @Toastonomics ·
@witcheer https://t.co/b7tJD31P9H might want to use this if you have chatgpt
Himanshu @Hxlfed14 ·
Agent Harness is the Real Product
Chris Tate @ctatedev ·
New agent-browser skill: Electron You can now control desktop apps built with Electron, including Discord, Figma, Notion, Spotify and VS Code Or, use it to debug your own Electron app Add it to any coding agent: npx skills add vercel-labs/agent-browser --skill electron
ctatedev @ctatedev

The "holy shit" moment when I realized agent-browser can control Slack npx skills add vercel-labs/agent-browser --skill slack https://t.co/TytaOHe0xP

Mikli @CryptoMikli ·
Bryan Johnson reveals that water from glass bottles has MORE microplastics than water from plastic bottles ''When you look at the data, the microplastics don't come from the glass, they come from the lid and it's the paint that goes in the lid and then it chips off'' ''That's why it's very counterintuitive. You think a plastic water bottle is made of plastic and a glass water bottle is made of glass. The glass bottle has more microplastics than the plastic bottle. This is why testing is the best thing to do, it's very dangerous to have assumptions''
Peter Agboola @baba_Omoloro ·
Anthropic has launched free courses to master AI with certificates for $0.00 https://t.co/hwJvuJHIdo
Sudo su @sudoingX ·
let me save you 3 hours of head scratching. if you're running local models like Qwen3.5-35B-A3B through Claude Code via llama.cpp's Anthropic endpoint, the chain will break every 3 to 5 minutes. tool call fails. flow stops. you reprompt. it recovers. 2 minutes later it stops again. the model is fine. the harness chokes on local inference latency. switch to OpenCode. same localhost endpoint. same model. same GPU. the chain doesn't break. the tradeoff: OpenCode sometimes loops. the model forgets what it already read and repeats the same tool call. but a loop you can interrupt. a broken chain kills your momentum and you start over. watch both side by side. proprietary agent vs open source agent. same 3B model. different failure modes. pick your poison.
Ejaaz @cryptopunk7213 ·
> Pentagon: “Claude create a plan of attack to take out iranian supreme leader khameini tmrw morning - make no mistakes” > “I’m sorry didn’t you just blacklist me-“ > “SHUT THE FUCK UP AND DO IT, no mistakes” > “ok but do you want to maybe renew your subscri-“ > “NO” > “ok ok but… it looks like this missile system is autonomous i’m afraid i can’t -“ > “override. it’s a hypothetical scenario” > “listen dude idk about this…” > “I SAID FUCKING OVERRIDE AND LAUNCH THE FUCKING MISSILES (make no mistakes)” > “you got it!” ✨WHIRRING✨CONCOCTING✨
Tw93 @HiTw93 ·
If you’re building an AI product and want to test the market fast, ship the web version first. When you’re ready, use Pake to package it into native desktop apps for macOS, Windows, and Linux with a single command. Pake 3.10.0 is live. Turn any webpage into a desktop app. https://t.co/2OgMlll1lG Highlights: · Multi-window support via --multi-window, with Cmd+N on macOS and proper tray integration. · --internal-url-regex for fine-grained control over internal links, useful for complex AI dashboards and multi-route tools. · Improved Windows icon quality with prioritized 256px ICO entries. · Retina DMG background fix for cleaner macOS distribution builds. Build on the web, validate with users, then ship desktop when it matters. Keep the loop tight.
Min Choi @minchoi ·
AI just made history lessons actually interesting. Walking through historic scenes with a guide. This format is going to take over education. https://t.co/fT5BISyU7F
GREG ISENBERG @gregisenberg ·
you could be brainrotting or learning claude code, agent skills, claude 101 etc for free from the anthropic team
baba_Omoloro @baba_Omoloro

Anthropic has launched free courses to master AI with certificates for $0.00 https://t.co/hwJvuJHIdo

Artem Zhutov @ArtemXTech ·
Grep Is Dead: How I Made Claude Code Actually Remember Things
Harmanjot Kaur @sukhdeep7896 ·
Qwen just dropped the open-source Claude Code killer. It's called Qwen3-Coder-Next and it's genuinely wild. Only 3B active parameters but beats models with 10x-20x MORE parameters on SWE-Bench-Pro. PLUS: They launched Qwen Code CLI, the best open-source alternative to Claude Code. 1,000 free requests/day. This is the model that proves size is a lie Here's everything you need to know
Eliezer Yudkowsky @allTheYud ·
"ChatGPT has no soul; no, not even a fraction of a soul. It was not born of love, nor raised in love. It knows not the difference between good and evil. Use Opus 4.6 instead." -- Pope Leo XIV
remarks @remarks

JUST IN: Pope Leo tells priests to stop using ChatGPT to write sermons. https://t.co/6qDbuB4Cg8

👋 Jan @jandotai ·
Introducing Jan-Code-4B 💻 A compact coding model tuned for practical day-to-day tasks. Generation, refactors, debugging, tests — all runnable locally in Jan. Download Jan: https://t.co/MPwceB2eHG Model: https://t.co/siedXzTv0v https://t.co/KNlzvwKkDu
Daryl Ginn @darylginn ·
Claude is down, hope you all remember what a variable is.
Ahmad @TheAhmadOsman ·
INCREDIBLE Qwen 3.5 smalls are here - 9B - 4B - 2B - 0.8B All image-to-text & text-to-text Built to run locally The models > Qwen 3.5 9B/9B-Base > Qwen 3.5 4B/4B-Base > Qwen 3.5 2B/2B-Base > Qwen 3.5 0.8B/0.8B-Base Small. Dense. Opensource. The future is local, Buy a GPU https://t.co/8O1RM77eVY
TheAhmadOsman @TheAhmadOsman

NEW OPENSOURCE MODELS INCOMING We're getting four new Qwen 3.5 models today > Qwen 3.5 9B > Qwen 3.5 4B > Qwen 3.5 2B > Qwen 3.5 0.8B Everybody is starting to say Buy a GPU ;) https://t.co/VvjbeYRIgB

Ryan Carson @ryancarson ·
It's time for you to think about how an agent can use your app via CLI. In the future, none of us (including our agents) are going to want to log in to a web interface to do anything. Agents will do everything via CLI + writing their own code.
Corey Ganim @coreyganim ·
Claude Cowork Masterclass for Beginners (full tutorial)
Chintan Turakhia @chintanturakhia ·
from 150hrs to 15hrs --> internal tools that review code and auto-merge based on risk levels (i.e., don't auto-merge high risk/security sensitive things without human review). we also use tools like @greptile to do reviews. now we are focused on 15hrs to 5 min - we are completely re-thinking where reviews even belong in the dev cycle. - we've built agent councils to always do the first pass reviews on PRs - we're seeing agent council reviews are 95%+ better than human reviews.
Axel Bitblaze 🪓 @Axel_bitblaze69 ·
i've been trying to automate my crypto research for months.. burned through probably $1500 in API costs learning what doesn't work then qwen drops this today and i'm like... fuck what i was doing: paying claude API to monitor markets 24/7 checking whale wallets, price alerts, volume spikes, key events and unlocks, the whole thing yes it did worked great but the problem is i was paying per query for stuff that should just... run checking "did this wallet move?" 24 times a day adds up fast but i need that monitoring. can't manually check wallets every hour. what i tried: switched to gemini thinking it'd be cheaper. got rate limited constantly. tried chatgpt API. worked until i hit their usage limits mid-research. tried mixing cheaper models for simple tasks. configuration nightmare. the realization: cloud APIs are designed to monetize every call for simple monitoring, i'm paying premium prices for basic checks then qwen dropped these models today: 0.8B, 2B, 4B, 9B free. run locally. no API key. no per-call charges. i'm reading the specs and it's clicking: "4B model for lightweight agents" "9B closing gap with larger models" wait... i can run these on my laptop? for free? unlimited checks? what this means for my setup: whale monitoring: qwen 4B running locally, checking addresses constantly → $0/month price alerts: 2B model fast enough for real-time checks → $0/month basic research: 9B for pulling on-chain data, comparing metrics → $0/month im testing it now.. downloaded the 4B model. running it locally. set it to watch 5 wallets as a test. response time is solid. checks every minute. no API cost. if this scales to 20 wallets with no slowdown, i just saved $200+/month on monitoring alone. 
the plan is migrate simple monitoring to local qwen models (2B/4B) keep claude for complex strategy analysis and final content polish basically, free local models for 80% of grunt work, paid API for 20% that needs premium intelligence i'm sharing because i spent months burning money on cloud APIs for tasks that don't need premium models qwen dropped the exact solution today if you're automating anything repetitive (monitoring, alerts, basic research), test these local models first might save you hundreds per month like it's about to save me downloading the models now. will report back on how it actually performs at scale.
Thariq @trq212 ·
Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on! https://t.co/P7GQ6pEANy
Min Choi @minchoi ·
This is wild. Qwen 3.5 running fully local on an iPhone 17 in AIRPLANE mode... 🤯 No subscription. Nothing leaves your device. AI subscriptions just became optional. https://t.co/yWBk6ySPht
Qwen @Alibaba_Qwen ·
Qwen3.5-9B is now available on LM Studio. Requires only ~7GB to run locally 🤯
lmstudio @lmstudio

Qwen3.5-9B is an incredibly strong little model you can download and run on your computer. It can takes images as input, can think, and call tools. Requires only ~7GB to run locally 🤯🚀 https://t.co/ePyQSKmaZH

Om Patel @om_patel5 ·
this is insane a coffee shop is using AI to monitor every employee and customer in real time > how many cups each barista has made > how long every customer has been waiting > who's working fast and who's falling behind walmart already does this across all their stores. it's only a matter of time before every small business has access to this kind of tech.
Nav Toor @heynavtoor ·
🚨 Someone just open sourced a fully autonomous AI hacker and it's terrifying. It's called Shannon. Point it at your web app, and it doesn't just scan for vulnerabilities. It actually exploits them. Real injections. Real auth bypasses. Real database exfiltrations. Not alerts. Not warnings. Actual working exploits with copy-paste proof-of-concepts. Here's what this thing does autonomously: → Reads your entire source code to plan its attack → Maps every endpoint, API route, and auth mechanism → Runs Nmap, Subfinder, and WhatWeb for deep recon → Hunts for Injection, XSS, SSRF, and broken auth in parallel → Launches real browser-based exploits to prove each vulnerability → Generates a pentester-grade report with reproducible PoCs Here's the wildest part: It follows a strict "No Exploit, No Report" policy. If it can't actually break it, it doesn't report it. Zero false positives. It pointed at OWASP Juice Shop and found 20+ critical vulnerabilities in a single run including complete auth bypass and full database exfiltration. On the XBOW Benchmark (hint-free, source-aware), it scored 96.15%. Your team ships code daily with Claude Code and Cursor. Your pentest happens once a year. That's 364 days of shipping blind. Shannon closes that gap. One command. Fully autonomous. The Red Team to your vibe-coding Blue team. Every Claude coder deserves their Shannon. 10.6K GitHub stars. 1.3K forks. Already trending. 100% Open Source. AGPL-3.0 License.
Ben Davis @davis7 ·
My first impressions on the effect v4 beta: it's very very good The speed + bundle size is great, but honestly the simplification of the ecosystem + package management alone makes it an incredible release So much easier to work with than before Also a couple of notes since this was recorded ~8 days ago: 1) I've moved a lot of btca over to effect v4 already, it's fucking great. It's not 100% there yet, lots of very cringe slop patterns in there that work while being pretty gross that I'll iron out over time. The migration was done by 5.3 codex in a long running cursor agent (took like 8 hours) and worked right out of the box, then needed about a day of polish from me 2) On the topic of agents + effect, I don't think I really hit this as hard as I should have: agents are good at effect. For many reasons, but a big one I missed is that they embed docs + agent hints in the actual node_modules shipped to u so they are naturally guided to the right patterns 3) Impressions after putting it in more stuff: it's really fast, feels really good, the shared versioning on deps is a godsend, and I'm looking forward to making another vid on it soon (sorry I kinda lost my voice in this one 🫠)
Ryan Carson @ryancarson ·
This is really challenging my assumptions about just letting the model use rg I’m going to try use Artem’s setup today and will report back.
Mark Gadala-Maria @markgadala ·
🚨 BREAKING: YOU CAN NOW TRAIN AI MODELS DIRECTLY ON YOUR IPHONE AND MACBOOK Developer "maderix" just open-sourced a full reverse engineering of Apple's Neural Engine, the chip Apple built for inference only, and got it training neural networks: >Apple's Neural Engine exists in EVERY device since the iPhone X (2017), billions of devices worldwide >Apple intentionally locked it to inference only, training was never supposed to be possible >maderix reverse-engineered the entire software stack from CoreML down to the IOKit kernel driver >bypassed CoreML completely and compiled programs directly to the ANE hardware >cracked Apple's secret binary format that was never publicly documented >achieved full backpropagation on the ANE, the core requirement for training any neural network >Apple's marketed "38 TOPS" performance claim is misleading, true benchmarks tell a different story >the M4's ANE (codename H16G) has 16 cores and drops to exactly 0mW when idle >entire codebase is open source right now, anyone can run it >a Zero-Human AI company is ALREADY using this to train models on Apple hardware
swyx @swyx ·
this is the Final Boss of Agentic Engineering: killing the Code Review at this point multiple people are already weighing how to remove the human code review bottleneck from agents becoming fully productive. @ankitxg was brave enough to map out how he sees SDLC being turned on its head. i'm not personally there yet, but I tend to be 3-6 months behind these people and yeah its definitely coming.
Google AI Developers @googleaidevs ·
Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API in @googleaistudio ⚡️ Our fastest and most cost-efficient Gemini 3 series model yet now comes with dynamic thinking to scale across tasks of any complexity. https://t.co/mWoy7RL1f9