Claude Code Goes Remote, Cursor Ships Video Demos, and Qwen 3.5 Proves Smaller Models Can Win
Daily Wrap-Up
February 24th was one of those days where the developer tooling space moved so fast it felt like three separate news cycles compressed into one. Both Anthropic and Cursor shipped major features that push coding agents further from "fancy autocomplete" toward genuinely autonomous workflows. Claude Code now lets you kick off a task in your terminal and monitor it from your phone, while Cursor's agents can spin up cloud computers, build your feature, and send you a video demo of the finished work. The common thread is unmistakable: both companies are betting that developers want to supervise agents, not babysit them.
On the model side, Alibaba's Qwen team delivered a compelling proof point for the "smaller and smarter" thesis. Their Qwen3.5-35B-A3B model now surpasses the previous generation's 235B parameter flagship, running locally on consumer hardware at 72 tokens per second. That's a 6.7x reduction in model size with better performance across benchmarks. For anyone running local inference, this is the kind of generational jump that changes what's practical on a single GPU. The broader conversation around agents as a distribution channel also reached a crescendo, with takes from @rauchg and @aakashgupta arguing that CLIs and MCP servers are becoming the new front door for software products, not marketing sites.
The most entertaining moment belonged to @Johnie36149708, who claims to have asked his plumber about RAG vector databases and was met with the blank stare that joke deserved. The "we're so early" genre of AI Twitter posts continues to thrive, but @damianplayer's more grounded observation that executives managing eight-figure budgets still think AI is a fad hits harder. The most practical takeaway for developers: if you're building any kind of SaaS product, start thinking about your agent-accessible surface area now. Ship a CLI, expose an MCP server, make your docs machine-readable. The companies that treat agent integration as an afterthought will find themselves invisible to the fastest-growing distribution channel in software.
Quick Hits
- @AlRaion shared a Claude screenshot without commentary, letting the vibes speak for themselves.
- @jessegenet showed how they use OpenClaw to plan hands-on Montessori lessons for their kids, proving AI in education doesn't have to mean more screen time.
- @AtlasForgeAI published a guide on building nine meta-learning loops for OpenClaw agents.
- @_ashleypeacock broke down Cloudflare Sandboxes' new R2 backup and restore feature, with a smart reminder to set lifecycle rules so you don't pay for storage you don't need.
- @addyosmani dropped solid advice on AGENTS.md files: treat them as a living list of codebase smells, not a permanent config. Auto-generated ones hurt agent performance by duplicating what agents can already discover.
- @dani_avila7 replaced Claude Code's default worktree command with a custom setup using Ghostty, Lazygit, and Yazi that keeps worktrees as sibling directories instead of nesting them inside the project.
- @Hesamation captured the universal experience of starting a new AI side project: pure dopamine followed by existential dread and the dead idea graveyard.
- @d4m1n noted that dev friends from big corps tried agent-driven workflows and immediately understood what "being in the top 1%" means.
- @BraydenWilmoth reported a NextJS rebuild costing $1,100 with AI assistance, resulting in 4.4x faster performance and 57% smaller bundle size.
- @nbaschez praised Vercel's open source output as being on "a generational run."
- @ashtom and @EntireHQ announced that Checkpoints are now available for all opencode users, capturing context automatically on every git push.
- @devops_nk and @zivdotcat both posted memes about Claude updates and usage limits, respectively, capturing the daily emotional range of the Claude power user.
- @Av1dlive predicted solo founder billionaires are coming, pointing to a workflow article by @elvissun.
- @Clad3815 updated the GPT Plays Pokemon FireRed harness, stripping away pathfinding tools to test GPT-5.2's raw navigation abilities. Slowly approaching a vision-only harness.
- @steipete clarified OpenClaw's security model after processing 20 reports: it's designed as a personal assistant (one user, many agents), not a multi-tenant bus. Stop trying to force adversarial multi-user scenarios onto it.
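The sibling-worktree layout from @dani_avila7's item above can be reproduced with plain `git worktree`, no custom tooling required. A hedged sketch (repo paths, the helper name, and branch names are all illustrative, and it assumes `git` is on your PATH):

```python
# Sketch: create a git worktree as a *sibling* of the main checkout
# (e.g. ../myproject-feature) instead of nesting it inside the repo,
# which keeps the project directory clean for agents and file watchers.
import subprocess
from pathlib import Path


def add_sibling_worktree(repo: Path, branch: str) -> Path:
    """Create ../<repo-name>-<branch> as a worktree on a new branch."""
    target = repo.parent / f"{repo.name}-{branch}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)],
        check=True,
    )
    return target
```

Calling `add_sibling_worktree(Path("~/code/myproject").expanduser(), "feature")` would yield `~/code/myproject-feature` checked out on a fresh `feature` branch, next to rather than inside the main checkout.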
Agents as the New Distribution Channel
The loudest signal from today's posts wasn't any single product launch but a converging argument about how software gets discovered and used in an agent-driven world. @aakashgupta laid out the case most explicitly, building on Karpathy's framing: agents don't browse your marketing site or click through onboarding flows. They call your CLI, hit your MCP server, and read your docs programmatically. MCP went from zero to 97 million monthly SDK downloads in twelve months, and the standard has effectively won. If your product doesn't have an agent-accessible surface, it's invisible to the fastest-growing class of software consumers.
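The "ship a CLI" half of that argument can be made concrete. What agents want from a surface is predictable subcommands and machine-readable output rather than a browser flow. A minimal sketch, assuming a hypothetical product called `acme` (the command names, flags, and stubbed data here are all illustrative, not any real tool's interface):

```python
# Minimal agent-friendly CLI sketch for a hypothetical "acme" product.
# Every command emits JSON on stdout so an agent can parse results
# without scraping a web UI or onboarding flow.
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="acme",
        description="Agent-accessible surface for the (hypothetical) Acme API.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    list_cmd = sub.add_parser("list-invoices", help="List invoices as JSON.")
    list_cmd.add_argument("--status", choices=["open", "paid"], default="open")
    return parser


def run(argv: list[str]) -> dict:
    """Parse argv and return a JSON-serializable result."""
    args = build_parser().parse_args(argv)
    if args.command == "list-invoices":
        # A real tool would call the product API here; stubbed for the sketch.
        return {"command": args.command, "status": args.status, "invoices": []}
    return {"error": "unknown command"}


if __name__ == "__main__":
    json.dump(run(sys.argv[1:]), sys.stdout, indent=2)
```

The same command inventory maps naturally onto MCP tool definitions later, which is why a JSON-first CLI is a cheap first step toward the agent-accessible surface these posts describe.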
@rauchg reinforced this from the Vercel perspective:
"Every company will have an agentic interface. But it won't just be on your turf, your .com. It'll also be on Slack, Discord, Teams, Google Workspace, and more. I was at a hackathon in SF the other day and I watched this unfold IRL. Many startups just presented their agents as Slack @mentions."
Google jumped into the agent builder space too, with @itsPaulAi noting that Google Opal now lets you add agent blocks and "program" them in plain English, complete with tool calls, memory, and conditional logic. @kurtinc surfaced a detail from Shopify's partner briefing that makes the distribution shift concrete: AI agents pull the first 6,000 characters of your product descriptions as their source of truth, ignoring meta descriptions and SEO titles entirely. Meanwhile @alexhillman shared his api2cli skill for Claude Code, which walks through API discovery, designs a CLI, and wraps it with a skill, calling it "the easiest way to give your agent access to nearly any API." The direction is clear: agent-first interfaces are becoming table stakes, and @shiri_shh's observation that "agent writes the code, agent reviews the PR, agent runs tests, agent sends demo video" is less joke than roadmap.
Claude Code Goes Mobile and Anthropic Draws Safety Lines
Anthropic had a two-front day, shipping developer-facing features while simultaneously publishing updated safety commitments. The headline feature is Claude Code Remote Control, which @claudeai described as the ability to "kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting." Claude keeps running on your machine while you supervise from the Claude app or web interface. @minchoi's reaction captured the mood: "It's over... for touching grass."
@ryancarson connected this to the broader trajectory:
"This is exactly what I'm talking about. We're going to start to see something more like an ADE versus an IDE where the iteration loop is closed more and more by the agent. We're getting closer to real code factories here."
On the safety side, @AnthropicAI announced they're separating unilateral safety commitments from industry-wide recommendations, and committing to publish Frontier Safety Roadmaps with detailed goals alongside Risk Reports that quantify risk across deployed models. @trq212 also noted that Claude in Chrome is "significantly faster" with the Quick Mode experiment, and Anthropic launched Cowork and plugin updates aimed at helping enterprise teams customize Claude for better collaboration.
Cursor Ships Cloud Computers and Video Demos
Cursor's launch was arguably the most visually impressive announcement of the day. @cursor_ai summed it up as "Cursor now shows you demos, not diffs," with agents able to use the software they build and send video recordings of their work. @leerob provided the technical details across multiple posts: agents can onboard to your codebase, use a cloud computer to make changes, and deliver a video demo of finished work, with the remote desktop latency being "smooooth."
"Local agents (and modifying files on your machine) are still sometimes preferred, but I'm excited to make cloud computers easier. You get a secure sandbox + Linux VM you can control, and you can kick off these agents from web/mobile/desktop/Slack/API/more!" - @leerob
@benln called it a "huge launch" and @karankendre captured the developer reaction: "So you're telling me a vscode clone can not only review my code but also test the feature on a cloud computer and send me a demo video of the whole process." @stephenhaney also launched Paper Desktop on the same day, positioning it as "a canvas for Cursor, Claude Code, Codex" where any agent can read and write HTML. The dev tools ecosystem is rapidly moving toward agents that don't just write code but verify their own work.
Qwen 3.5: The Smaller-is-Better Thesis Gets Its Best Evidence
Alibaba's Qwen team released four new models that collectively make the strongest case yet for efficient architecture over raw parameter counts. The headline number: Qwen3.5-35B-A3B now surpasses the previous Qwen3-235B-A22B in benchmarks while being 6.7x smaller. @itsPaulAi put the trajectory in perspective, noting that "at some point, we'll have an Opus 4.6 intelligence running on a phone."
@mkurman88 provided the practical data point that matters most for local inference enthusiasts:
"Running Qwen 3.5 35B A3B locally on an RTX 3090 24GB, with 72 TPS. Amazing times."
@TheAhmadOsman highlighted that the models beat Sonnet 4.5 in many benchmarks while running on consumer hardware, declaring "the future is open source." The Qwen3.5-Flash variant ships with 1M context length by default and built-in tools, positioning it as a serious production option. For anyone building local AI infrastructure, these models represent a meaningful inflection point where the gap between local and cloud-hosted intelligence continues to narrow.
The AI Adoption Chasm Nobody Talks About
A thread from @damianplayer struck a nerve by pointing out what AI Twitter systematically ignores: the vast majority of the economy hasn't adopted AI tools at all. He described meeting executives managing 50+ employees and eight-figure budgets who think AI is a fad, with zero AI tools in their workflow. "Nobody outside of this app understands how fast this is moving. And most of them won't until it's too late."
The follow-up was equally pointed: "I'm not talking about tech companies. I'm talking about boring. Construction, insurance and property management. The businesses that make up most of the economy and none of AI Twitter." @chriswiser added that "half the world doesn't know Claude exists and the other half is terrified of it," while @lucky_strikes_x argued "we are in a mega bubble." Whether you read this as an opportunity or a warning depends on where you sit, but the gap between the AI-native developer bubble and the broader business world has never been more starkly illustrated.
When AI Makes You Worse at Thinking
@aakashgupta surfaced research from Anthropic itself showing that polished AI outputs make users measurably worse at critical evaluation. Tracking 9,830 conversations, Anthropic found that when Claude produces finished-looking artifacts, users are 5.2 percentage points less likely to catch missing context and 3.1 points less likely to question the reasoning. The psychology is straightforward: presentation quality triggers cognitive shortcuts that bypass accuracy assessment.
The flip side is encouraging. Users who iterated on Claude's responses showed 2.67 additional fluency behaviors versus 1.33 for those who accepted the first output, questioned reasoning 5.6x more often, and flagged missing context 4x more frequently. As @aakashgupta put it, "the most valuable AI skill in 2026 is knowing when to push back on a confident-sounding answer." Separately, @kimmonismus flagged Anthropic's timeline prediction that AI systems could "fully automate or otherwise dramatically accelerate" top-tier research teams as early as 2027, a claim that lands differently when paired with data showing humans already struggle to evaluate AI output critically.
Source Posts
How to Build Nine Meta-Learning Loops for Your OpenClaw Agent
TL;DR: Agents are smart within sessions and stupid across them. The fix is structural feedback loops in your agent's files: failures become guardrails...
Cursor now shows you demos, not diffs. Agents can use the software they build and send you videos of their work. https://t.co/gBRJXWR7Vi
talked to a few execs at a mid-size company last week. no AI tools in their workflow. zero. still running everything through email chains + manual reports. one of them didn’t know what Claude was. only messed around with ChatGPT. these are people managing teams of 50+ employees and eight-figure budgets. and they think this is a fad. nobody outside of this app understands how fast this is moving. and most of them won’t until it’s too late.
We’re now separating the safety commitments we’ll make unilaterally and our recommendations for the industry. We’re also committing to publish new Frontier Safety Roadmaps with detailed safety goals, and Risk Reports that quantify risk across all our deployed models.
🚀 Introducing the Qwen 3.5 Medium Model Series
Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B
✨ More intelligence, less compute.
• Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B — a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts.
• Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models — especially in more complex agent scenarios.
• Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring:
– 1M context length by default
– Official built-in tools
🔗 Hugging Face: https://t.co/wFMdX5pDjU
🔗 ModelScope: https://t.co/9NGXcIdCWI
🔗 Qwen3.5-Flash API: https://t.co/82ESSpaqAF
Try in Qwen Chat 👇
Flash: https://t.co/UkTL3JZxIK
27B: https://t.co/haKxG4lETy
35B-A3B: https://t.co/Oc1lYSTbwh
122B-A10B: https://t.co/hBMODXmh1o
Would love to hear what you build with it.
Opal, our no-code visual builder for AI workflows, just got a major upgrade. 🧠💎
We’ve added a new agent step that analyzes your goal, determines the best approach, and automatically calls the right tools — such as Veo for video or web search for research — to complete the task.
We’re also adding new tools to make the agent even more capable:
💾 Memory – Remember info, like a user’s name or your style preferences across sessions.
🚀 Dynamic Routing – Let the agent choose the next best step using the “@ Go to” tool.
💬 Interactive Chat – Initiate user interactions to gather missing information or present options before moving on.
Try it now → https://t.co/6DjWPHJK6x
Cursor just got a major upgrade! Agents can onboard to your codebase, use a cloud computer to make changes, and send you a video demo of their finished work. The latency of using the remote desktop is smooooth. https://t.co/QYUpL5vbXO
You should delete your CLAUDE.md/AGENTS.md file. I have a study to prove it. https://t.co/jOUNE53y7m
New research: The AI Fluency Index. We tracked 11 behaviors across thousands of conversations (https://t.co/RxKnLNNcNR)—for example, how often people iterate and refine their work with Claude—to measure how well people collaborate with AI. Read more: https://t.co/g65nGQFmjG
Announcing a new Claude Code feature: Remote Control. It's rolling out now to Max users in research preview. Try it with /remote-control Start local sessions from the terminal, then continue them from your phone. Take a walk, see the sun, walk your dog without losing your flow.