AI Learning Digest

Claude Code Goes Remote, Cursor Ships Video Demos, and Qwen 3.5 Proves Smaller Models Can Win

Daily Wrap-Up

February 24th was one of those days when the developer tooling space moved so fast it felt like three separate news cycles compressed into one. Both Anthropic and Cursor shipped major features that push coding agents further from "fancy autocomplete" toward genuinely autonomous workflows. Claude Code now lets you kick off a task in your terminal and monitor it from your phone, while Cursor's agents can spin up cloud computers, build your feature, and send you a video demo of the finished work. The common thread is unmistakable: both companies are betting that developers want to supervise agents, not babysit them.

On the model side, Alibaba's Qwen team delivered a compelling proof point for the "smaller and smarter" thesis. Their Qwen 3.5-35B-A3B model now surpasses the previous generation's 235B-parameter flagship, running locally on consumer hardware at 72 tokens per second. That's a 6.7x reduction in model size with better performance across benchmarks. For anyone running local inference, this is the kind of generational jump that changes what's practical on a single GPU. The broader conversation around agents as a distribution channel also reached a crescendo, with takes from @rauchg and @aakashgupta arguing that CLIs and MCP servers are becoming the new front door for software products, not marketing sites.

The most entertaining moment belonged to @Johnie36149708, who claims to have asked his plumber about RAG vector databases and was met with the blank stare that joke deserved. The "we're so early" genre of AI Twitter posts continues to thrive, but @damianplayer's more grounded observation that executives managing eight-figure budgets still think AI is a fad hits harder. The most practical takeaway for developers: if you're building any kind of SaaS product, start thinking about your agent-accessible surface area now. Ship a CLI, expose an MCP server, make your docs machine-readable. The companies that treat agent integration as an afterthought will find themselves invisible to the fastest-growing distribution channel in software.

Quick Hits

  • @AlRaion shared a Claude screenshot without commentary, letting the vibes speak for themselves.
  • @jessegenet showed how they use OpenClaw to plan hands-on Montessori lessons for their kids, proving AI in education doesn't have to mean more screen time.
  • @AtlasForgeAI published a guide on building nine meta-learning loops for OpenClaw agents.
  • @_ashleypeacock broke down Cloudflare Sandboxes' new R2 backup and restore feature, with a smart reminder to set lifecycle rules so you don't pay for storage you don't need.
  • @addyosmani dropped solid advice on AGENTS.md files: treat them as a living list of codebase smells, not a permanent config. Auto-generated ones hurt agent performance by duplicating what agents can already discover.
  • @dani_avila7 replaced Claude Code's default worktree command with a custom setup using Ghostty, Lazygit, and Yazi that keeps worktrees as sibling directories instead of nesting them inside the project.
  • @Hesamation captured the universal experience of starting a new AI side project: pure dopamine followed by existential dread and the dead idea graveyard.
  • @d4m1n noted that dev friends from big corps tried agent-driven workflows and immediately understood what "being in the top 1%" means.
  • @BraydenWilmoth reported a Next.js rebuild costing $1,100 with AI assistance, resulting in 4.4x faster performance and 57% smaller bundle size.
  • @nbaschez praised Vercel's open source output as being on "a generational run."
  • @ashtom and @EntireHQ announced that Checkpoints are now available for all opencode users, capturing context automatically on every git push.
  • @devops_nk and @zivdotcat both posted memes about Claude updates and usage limits, respectively, capturing the daily emotional range of the Claude power user.
  • @Av1dlive predicted solo founder billionaires are coming, pointing to a workflow article by @elvissun.
  • @Clad3815 updated the GPT Plays Pokemon FireRed harness, stripping away pathfinding tools to test GPT-5.2's raw navigation abilities. Slowly approaching a vision-only harness.
  • @steipete clarified OpenClaw's security model after processing 20 reports: it's designed as a personal assistant (one user, many agents), not a multi-tenant bus. Stop trying to force adversarial multi-user scenarios onto it.

Agents as the New Distribution Channel

The loudest signal from today's posts wasn't any single product launch but a converging argument about how software gets discovered and used in an agent-driven world. @aakashgupta laid out the case most explicitly, building on Karpathy's framing: agents don't browse your marketing site or click through onboarding flows. They call your CLI, hit your MCP server, and read your docs programmatically. MCP went from zero to 97 million monthly SDK downloads in twelve months, and the standard has effectively won. If your product doesn't have an agent-accessible surface, it's invisible to the fastest-growing class of software consumers.
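
The argument about agent-accessible surfaces is concrete enough to sketch. Below is a minimal, hypothetical Python CLI — the `acme` program, its `list` subcommand, and the stubbed widget data are all invented for illustration — whose `--help` output and JSON results are designed to be consumed by an agent rather than a human clicking through a UI.

```python
import argparse
import json

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical "acme" product CLI: agents discover capabilities via
    # --help and consume structured JSON instead of scraping a web UI.
    parser = argparse.ArgumentParser(prog="acme", description="Query Acme widgets.")
    sub = parser.add_subparsers(dest="command", required=True)
    list_cmd = sub.add_parser("list", help="List widgets as JSON.")
    list_cmd.add_argument("--limit", type=int, default=10, help="Max results.")
    return parser

def run(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    if args.command == "list":
        # Stubbed data; a real CLI would call the product's API here.
        widgets = [{"id": i, "name": f"widget-{i}"} for i in range(args.limit)]
        return json.dumps({"widgets": widgets})
    raise ValueError(f"unknown command: {args.command}")

if __name__ == "__main__":
    print(run(["list", "--limit", "2"]))
```

The point is the shape, not the stub: discoverable subcommands plus structured output is what lets an agent drive a product without ever touching a browser.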

@rauchg reinforced this from the Vercel perspective:

"Every company will have an agentic interface. But it won't just be on your turf, your .com. It'll also be on Slack, Discord, Teams, Google Workspace, and more. I was at a hackathon in SF the other day and I watched this unfold IRL. Many startups just presented their agents as Slack @mentions."

Google jumped into the agent builder space too, with @itsPaulAi noting that Google Opal now lets you add agent blocks and "program" them in plain English, complete with tool calls, memory, and conditional logic. @kurtinc surfaced a detail from Shopify's partner briefing that makes the distribution shift concrete: AI agents pull the first 6,000 characters of your product descriptions as their source of truth, ignoring meta descriptions and SEO titles entirely. Meanwhile @alexhillman shared his api2cli skill for Claude Code, which walks through API discovery, designs a CLI, and wraps it with a skill, calling it "the easiest way to give your agent access to nearly any API." The direction is clear: agent-first interfaces are becoming table stakes, and @shiri_shh's observation that "agent writes the code, agent reviews the PR, agent runs tests, agent sends demo video" is less joke than roadmap.
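
If the 6,000-character figure from Shopify's partner briefing holds, the implication is easy to automate: check which facts about a product actually fall inside the window agents read. This is a rough sketch under that assumption; `agent_visible` and `audit` are hypothetical helper names, and the cutoff is simply the number reported in the post.

```python
AGENT_WINDOW = 6_000  # characters agents reportedly pull, per the briefing

def agent_visible(description: str, window: int = AGENT_WINDOW) -> str:
    """Return the slice of a product description an agent would actually read."""
    return description[:window]

def audit(description: str, required_facts: list[str]) -> list[str]:
    """Return the facts that fall outside the agent-visible window."""
    visible = agent_visible(description)
    return [fact for fact in required_facts if fact not in visible]
```

Running `audit` against your catalog tells you which selling points need to move into the first 6,000 characters, since meta descriptions and SEO titles reportedly never get read at all.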

Claude Code Goes Mobile and Anthropic Draws Safety Lines

Anthropic had a two-front day, shipping developer-facing features while simultaneously publishing updated safety commitments. The headline feature is Claude Code Remote Control, which @claudeai described as the ability to "kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting." Claude keeps running on your machine while you supervise from the Claude app or web interface. @minchoi's reaction captured the mood: "It's over... for touching grass."

@ryancarson connected this to the broader trajectory:

"This is exactly what I'm talking about. We're going to start to see something more like an ADE versus an IDE where the iteration loop is closed more and more by the agent. We're getting closer to real code factories here."

On the safety side, @AnthropicAI announced they're separating unilateral safety commitments from industry-wide recommendations, and committing to publish Frontier Safety Roadmaps with detailed goals alongside Risk Reports that quantify risk across deployed models. @trq212 also noted that Claude in Chrome is "significantly faster" with the Quick Mode experiment, and Anthropic launched Cowork and plugin updates aimed at helping enterprise teams customize Claude for better collaboration.

Cursor Ships Cloud Computers and Video Demos

Cursor's launch was arguably the most visually impressive announcement of the day. @cursor_ai summed it up as "Cursor now shows you demos, not diffs," with agents able to use the software they build and send video recordings of their work. @leerob provided the technical details across multiple posts: agents can onboard to your codebase, use a cloud computer to make changes, and deliver a video demo of finished work, with remote-desktop latency he called "smooooth."


"Local agents (and modifying files on your machine) are still sometimes preferred, but I'm excited to make cloud computers easier. You get a secure sandbox + Linux VM you can control, and you can kick off these agents from web/mobile/desktop/Slack/API/more!" - @leerob

@benln called it a "huge launch" and @karankendre captured the developer reaction: "So you're telling me a vscode clone can not only review my code but also test the feature on a cloud computer and send me a demo video of the whole process." @stephenhaney also launched Paper Desktop on the same day, positioning it as "a canvas for Cursor, Claude Code, Codex" where any agent can read and write HTML. The dev tools ecosystem is rapidly moving toward agents that don't just write code but verify their own work.

Qwen 3.5: The Smaller-is-Better Thesis Gets Its Best Evidence

Alibaba's Qwen team released four new models that collectively make the strongest case yet for efficient architecture over raw parameter counts. The headline number: Qwen3.5-35B-A3B now surpasses the previous Qwen3-235B-A22B in benchmarks while being 6.7x smaller. @itsPaulAi put the trajectory in perspective, noting that "at some point, we'll have an Opus 4.6 intelligence running on a phone."

@mkurman88 provided the practical data point that matters most for local inference enthusiasts:

"Running Qwen 3.5 35B A3B locally on an RTX 3090 24GB, with 72 TPS. Amazing times."

@TheAhmadOsman highlighted that the models beat Sonnet 4.5 in many benchmarks while running on consumer hardware, declaring "the future is open source." The Qwen3.5-Flash variant ships with 1M context length by default and built-in tools, positioning it as a serious production option. For anyone building local AI infrastructure, these models represent a meaningful inflection point where the gap between local and cloud-hosted intelligence continues to narrow.
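
For anyone wanting to reproduce the 72 tokens-per-second figure on their own hardware, throughput measurement is backend-agnostic. In the sketch below, `generate` stands in for whatever your local runtime exposes (llama.cpp bindings, vLLM, transformers, etc.), and the size-reduction constant simply re-derives the 6.7x headline from the stated parameter counts.

```python
import time

# Reported parameter counts: Qwen3.5-35B-A3B vs the Qwen3-235B-A22B flagship.
SIZE_REDUCTION = 235 / 35  # ≈ 6.7x, the digest's headline number

def measure_tps(generate, prompt: str, max_tokens: int) -> float:
    """Time a generate() callable and return decode throughput in tokens/sec.

    `generate` is a placeholder for whatever your local stack exposes;
    it only needs to return the sequence of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

Note this measures end-to-end generation; if you want decode-only TPS as most benchmark posts report it, subtract the time-to-first-token before dividing.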

The AI Adoption Chasm Nobody Talks About

A thread from @damianplayer struck a nerve by pointing out what AI Twitter systematically ignores: the vast majority of the economy hasn't adopted AI tools at all. He described meeting executives managing 50+ employees and eight-figure budgets who think AI is a fad, with zero AI tools in their workflow. "Nobody outside of this app understands how fast this is moving. And most of them won't until it's too late."

The follow-up was equally pointed: "I'm not talking about tech companies. I'm talking about boring. Construction, insurance and property management. The businesses that make up most of the economy and none of AI Twitter." @chriswiser added that "half the world doesn't know Claude exists and the other half is terrified of it," while @lucky_strikes_x argued "we are in a mega bubble." Whether you read this as an opportunity or a warning depends on where you sit, but the gap between the AI-native developer bubble and the broader business world has never been more starkly illustrated.

When AI Makes You Worse at Thinking

@aakashgupta surfaced research from Anthropic itself showing that polished AI outputs make users measurably worse at critical evaluation. Tracking 9,830 conversations, Anthropic found that when Claude produces finished-looking artifacts, users are 5.2 percentage points less likely to catch missing context and 3.1 points less likely to question the reasoning. The psychology is straightforward: presentation quality triggers cognitive shortcuts that bypass accuracy assessment.

The flip side is encouraging. Users who iterated on Claude's responses showed 2.67 additional fluency behaviors versus 1.33 for those who accepted the first output, questioned reasoning 5.6x more often, and flagged missing context 4x more frequently. As @aakashgupta put it, "the most valuable AI skill in 2026 is knowing when to push back on a confident-sounding answer." Separately, @kimmonismus flagged Anthropic's timeline prediction that AI systems could "fully automate or otherwise dramatically accelerate" top-tier research teams as early as 2027, a claim that lands differently when paired with data showing humans already struggle to evaluate AI output critically.
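
The headline contrast in those fluency numbers is worth making explicit: taken at face value, iterating on outputs roughly doubles observed fluency behaviors. A quick recomputation of the reported figures:

```python
# Figures as reported from Anthropic's fluency study (cited above).
ITERATOR_BEHAVIORS = 2.67  # additional fluency behaviors when users iterate
ACCEPTER_BEHAVIORS = 1.33  # when users accept the first output as-is

ratio = ITERATOR_BEHAVIORS / ACCEPTER_BEHAVIORS
print(f"iterating shows {ratio:.1f}x the fluency behaviors")  # → 2.0x
```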

Source Posts

Qwen @Alibaba_Qwen ·
🚀 Introducing the Qwen 3.5 Medium Model Series Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B ✨ More intelligence, less compute. • Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B — a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts. • Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models — especially in more complex agent scenarios. • Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring: – 1M context length by default – Official built-in tools 🔗 Hugging Face: https://t.co/wFMdX5pDjU 🔗 ModelScope: https://t.co/9NGXcIdCWI 🔗 Qwen3.5-Flash API: https://t.co/82ESSpaqAF Try in Qwen Chat 👇 Flash: https://t.co/UkTL3JZxIK 27B: https://t.co/haKxG4lETy 35B-A3B: https://t.co/Oc1lYSTbwh 122B-A10B: https://t.co/hBMODXmh1o Would love to hear what you build with it.
Claude @claudeai ·
Introducing Cowork and plugin updates that help enterprises customize Claude for better collaboration with every team. https://t.co/pRwJqPBRQj
Atlas Forge @AtlasForgeAI ·
How to Build Nine Meta-Learning Loops for Your OpenClaw Agent
shirish @shiri_shh ·
pov: devs' slowly realizing there’s literally nothing left to do at work > agent writes the code > agent reviews the pr > agent runs tests in cloud > agent sends demo video https://t.co/jii3A1Qaa0
Cursor @cursor_ai

Cursor now shows you demos, not diffs. Agents can use the software they build and send you videos of their work. https://t.co/gBRJXWR7Vi

Chris Wiser @chriswiser ·
@damianplayer Half the world doesn't know Claude exists and the other half is terrified of it.
Damian Player @damianplayer ·
talked to a few execs at a mid-size company last week. no AI tools in their workflow. zero. still running everything through email chains + manual reports. one of them didn’t know what Claude was. only messed around with ChatGPT. these are people managing teams of 50+ employees and eight-figure budgets. and they think this is a fad. nobody outside of this app understands how fast this is moving. and most of them won’t until it’s too late.
Johnie Homeless, EuroR3tardio @Johnie36149708 ·
My wife clogged the shitter the other day. I called the plumber, some 63 year dude. When he was done I asked him: Why don't you use a RAG vector db on your own vibe coded app to onboard the clients and increase ARPU. He looked at me like a cow, had no idea. We are so early.
Claude @claudeai ·
New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or https://t.co/er6Blrr63e https://t.co/FxUVDecyVJ
Chubby♨️ @kimmonismus ·
One year left: "We believe it is plausible, as soon as early 2027, that our AI systems could fully automate, or otherwise dramatically accelerate, the work of large, top-tier teams of human researchers in domains where fast progress could cause threats to international security and/or rapid disruptions to the global balance of power, for example, energy, robotics, weapons development and AI itself."
Anthropic @AnthropicAI

We’re now separating the safety commitments we’ll make unilaterally and our recommendations for the industry. We’re also committing to publish new Frontier Safety Roadmaps with detailed safety goals, and Risk Reports that quantify risk across all our deployed models.

Stephen Haney @stephenhaney ·
Hello! Today we're releasing Paper Desktop Paper is now a canvas for Cursor, Claude Code, Codex. Any agent can read and write html to Paper. • push or pull from your codebase • pull real data from anywhere • less work, more design What will you ship? Sound on 🎶 https://t.co/2E6OYWpmeP
Lee Robinson @leerob ·
Cursor just got a major upgrade! Agents can onboard to your codebase, use a cloud computer to make changes, and send you a video demo of their finished work. The latency of using the remote desktop is smooooth. https://t.co/QYUpL5vbXO
Paul Couvert @itsPaulAi ·
Wow they did it 🔥 "Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507" So in 6 months they've trained a model which is: - 6.7x smaller than the previous one - Better in all benchmarks - Available locally on a laptop We're just at the very beginning of local LLMs and, at some point, we'll have an Opus 4.6 intelligence running on a phone.
Damian Player @damianplayer ·
I’m not talking about tech companies I’m talking about boring. construction, insurance and property management. the businesses that make up most of the economy and none of AI twitter.
Daniel San @dani_avila7 ·
I replaced Claude Code’s default --worktree command with a custom one built around Ghostty, Lazygit, and Yazi. By default, Claude Code creates worktrees inside the .claude/worktrees folder within the same project. That means if you spin up 3 worktrees, you end up with 3 complete copies of your project nested inside your main project. It makes the project structure messy and difficult to manage. So I built a hook that: - Overrides the default --worktree command - Creates each branch in a sibling directory: ../worktrees/branch-name - Automatically opens Ghostty panes - Launches Lazygit and Yazi already positioned in the correct branch directory Install it with: npx claude-code-templates@latest --hook=development-tools/worktree-ghostty --yes Now each branch lives where it should, outside the main project, and your terminal environment is ready instantly.
Paul Couvert @itsPaulAi ·
So Google has just released its own agent builder?! You can now add the agent block in Google Opal and "program it" in plain English. And it has natively: - Tool call (with Nano Banana, Veo, web search...) - Memory to save infos between sessions - Conditional logic Probably the easiest way to build AI agents I've seen so far.
Google Labs @GoogleLabs

Opal, our no-code visual builder for AI workflows, just got a major upgrade. 🧠💎 We’ve added a new agent step that analyzes your goal, determines the best approach, and automatically calls the right tools — such as Veo for video or web search for research — to complete the task. We’re also adding new tools to make the agent even more capable: 💾 Memory – Remember info, like a user’s name or your style preferences across sessions. 🚀 Dynamic Routing – Let the agent choose the next best step using the “@ Go to” tool. 💬 Interactive Chat – Initiate user interactions to gather missing information or present options before moving on. Try it now → https://t.co/6DjWPHJK6x

Lee Robinson @leerob ·
Local agents (and modifying files on your machine) are still sometimes preferred, but I'm excited to make cloud computers easier. You get a secure sandbox + Linux VM you can control, and you can kick off these agents from web/mobile/desktop/Slack/API/more!
Al @AlRaion ·
@claudeai https://t.co/OKUENpi0xI
Ryan Carson @ryancarson ·
this is awesome and this is exactly what i'm talking about. we're going to start to see something more like an ADE versus an IDE where the iteration loop is closed more and more by the agent. i can't wait to try this out. we're getting closer to real code factories here
ℏεsam @Hesamation ·
POV: you just started a new AI side project from scratch and are enjoying the dopamine rush before the existential dread appears and you send it to your dead idea graveyard https://t.co/IRrMFeFwzk
Jesse Genet @jessegenet ·
AI use in education doesn’t mean screens by default! @openclaw and AI can help us give our children bespoke hands on educations 📚 Here I break down how I use @openclaw to help me give our little kids high quality Montessori lessons 🤓 https://t.co/MGtxYerndP
Addy Osmani @addyosmani ·
Tip: Be careful with /init. A good mental model is to treat AGENTS(.md) as a living list of codebase smells you haven't fixed yet rather than a permanent configuration. Auto-generated AGENTS(.md) files hurt agent performance and inflate costs because they duplicate what agents can already discover. Human-written files help only when they contain non-discoverable information - tooling gotchas, non-obvious conventions, landmines. Every other line is noise. Beyond what to put in it, there's a structural problem worth naming: a single AGENTS(.md) at the root of your repo isn't sufficient for any codebase of real complexity. What you actually need is a hierarchy of AGENTS(.md) files - placed at the relevant directory or module level - automatically maintained so that each agent gets context scoped precisely to the code it's working in, rather than a monolithic file that conflates concerns across the entire project.
Theo - t3.gg @theo

You should delete your CLAUDE․md/AGENTS․md file. I have a study to prove it. https://t.co/jOUNE53y7m

Ben Lang @benln ·
Huge launch from the Cursor team today:
Kurt Elster @kurtinc ·
Straight from @Shopify's latest partner briefing: - AI agents are pulling the first ~6,000 characters of your product descriptions as their source of truth. - Meta descriptions, SEO titles, theme presentation logic, none of it gets touched. - If your product data isn't structured for AI discovery, it just doesn't show up.
Aakash Gupta @aakashgupta ·
Anthropic just told you their own product makes people worse at thinking and the data is wild. They tracked 9,830 conversations and found that when Claude produces polished outputs like code or documents, users are 5.2 percentage points less likely to catch missing context and 3.1pp less likely to question the reasoning. The psychology here is predictable. A finished-looking artifact triggers the same cognitive shortcut as a printed report versus a rough draft. Your brain assigns credibility based on presentation quality, not accuracy. The shinier the output, the faster you stop thinking. But here’s what makes this data actually useful. Users who iterated on Claude’s responses showed 2.67 additional fluency behaviors versus 1.33 for people who accepted the first output. They questioned reasoning 5.6x more often. They flagged missing context 4x more frequently. 85.7% of conversations showed iteration. The other 14.3% are treating a probabilistic text generator like a search engine that’s always right. Anthropic is essentially publishing the user manual for their own product’s failure mode. The people who treat Claude like a first draft collaborator get dramatically better results than the people who treat it like an oracle. The most valuable AI skill in 2026 is knowing when to push back on a confident-sounding answer.
Anthropic @AnthropicAI

New research: The AI Fluency Index. We tracked 11 behaviors across thousands of https://t.co/RxKnLNNcNR conversations—for example, how often people iterate and refine their work with Claude—to measure how well people collaborate with AI. Read more: https://t.co/g65nGQFmjG

📙 Alex Hillman @alexhillman ·
Maybe my most underrated Claude Code skill is api2cli https://t.co/sAfWsxRMix Automatically walks you through api discovery, designs a CLI that follows best practices for human and agent users, then wraps the cli with a skill. It's the easiest way to give your agent access to nearly any API.
Boris Cherny @bcherny ·
Have been using this daily and loving it! Tell us what you think
Noah Zweben @noahzweben

Announcing a new Claude Code feature: Remote Control. It's rolling out now to Max users in research preview. Try it with /remote-control Start local sessions from the terminal, then continue them from your phone. Take a walk, see the sun, walk your dog without losing your flow.

Ashley Peacock @_ashleypeacock ·
Cloudflare Sandboxes now provide the ability to create and restore backups to R2, allowing you to restore a sandbox to a prior state rapidly rather than having to run slow, repeated steps (e.g. checkout, installing dependencies). Usage is straightforward: - Call sandbox.createBackup() to create a point-in-time backup - Store the backup reference somewhere for later (e.g. KV, DO) - Call sandbox.restoreBackup() Make sure to setup an R2 lifecycle rule to clear data from the bucket once it's no longer needed, otherwise you'll be paying $$$ for storage unnecessarily!
Lee Robinson @leerob ·
@NickADobos If you're using something like React Native that would work. Think iOS apps are trickier. But you can use this for other desktop software, e.g. we use it to test Cursor itself. Anything in a container!
Lucky (PSYOP arc) @lucky_strikes_x ·
@damianplayer We are in a mega bubble. Ask 100 people on the street if they know what Claude is. See what happens.