NVIDIA's Nemotron Cascade Takes on Qwen While Claude Code Ships /schedule for 24/7 Automation

March 24, 2026 · 22 sources

The AI agent ecosystem is maturing fast, with new tools for game dev, coding workflows, and autonomous scheduling dominating the conversation. On the model front, NVIDIA's Nemotron Cascade and on-device Qwen optimizations signal a shift toward efficient inference, while the developer community debates best practices for agent-driven development.

Daily Wrap-Up

The big picture today is that AI agents are no longer a novelty but a workflow category with real tooling debates. Developers are arguing about memory management for multi-agent setups, whether agents should write TypeScript instead of making tool calls, and how to keep "vibe-coded" projects from turning into unmaintainable messes. The conversation has shifted from "can agents do X?" to "what's the right architecture for agents doing X at scale?" That's a meaningful inflection point.

On the model side, NVIDIA quietly dropped Nemotron Cascade, a 30B parameter model with only 3B active, using a hybrid Mamba MoE architecture that fits on a single RTX 3090. Meanwhile, @Alexintosh pushed Qwen 3.5 35B to 13.1 tokens per second on an iPhone 17 through a stack of optimization tricks. The race for efficient inference is heating up on both desktop GPUs and mobile devices, and the gap between cloud and local is narrowing faster than most people expected. Claude Code's new /schedule feature also deserves attention: the ability to set up recurring cloud-based jobs from your terminal turns an AI coding assistant into something closer to an autonomous DevOps teammate.

The most entertaining moment was @zostaff claiming Claude built three trading bots in 15 minutes that made $2,503 the next day, prompting an immediate job resignation. The timeline's skepticism meter should be pegged on that one. The most practical takeaway for developers: when your AI agent fixes a bug, immediately ask it to write comprehensive end-to-end integration tests that would catch that class of bug in the future, as @doodlestein suggests. It hardens your codebase and often flushes out additional issues you didn't know existed.

Quick Hits

@iruletheworldmo shared @PawelHuryn's guide on running Claude Dispatch from a phone for 48 hours straight. Worth a bookmark if you're exploring mobile-first agent workflows.
@theo posted a cryptic "I'm not scared of Anthropic" video that generated engagement without much substance to analyze.
@kshvbgde highlighted a guide on "vibecoding" your way to $10M ARR, capturing the indie hacker energy around AI-assisted rapid development.
@Tradesdontlie quote-tweeted a "learn while you sleep" post with bewilderment that anyone isn't using these tools yet. The adoption pressure is real.
@DavidGeorge83 published "There are only two paths left for software," arguing the comfortable middle ground for software companies is over as public markets reprice the sector.
@heygurisingh spotlighted Pascal Editor, an open-source, browser-based 3D building editor built with React Three Fiber and WebGPU, pitched as a free alternative to BIM software that costs $50K+ per seat.

AI Agents and Autonomous Workflows

The agent conversation today was less about whether agents work and more about how to architect them properly. The tooling layer is getting sophisticated, with developers building persistent memory systems, multi-agent coordination, and learning loops that convert one-off solutions into reusable skills.

@javierblez broke down how the Hermes agent by Nous Research built a fully playable Worms clone in 2.5 hours: "Hermes used 'Persistent Shell' mode, which ensured it didn't forget its current folder or active tools... To optimize the workflow, the agent moved beyond linear execution and parallelized the workload." The agent spawned isolated subagents, ran independent tool calls in parallel, used filesystem checkpoints for rollback, and attached to a live Chrome instance via CDP to fix rendering bugs in real time. By the end, it had autonomously converted the physics logic into a reusable plugin.
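The parallel step can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`, which the original thread names. This is a minimal sketch, not Hermes code: the two tool functions are hypothetical stand-ins, and the only point is that independent calls are dispatched together and their results collected in a predictable order.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for independent agent tool calls (not Hermes APIs).
def write_physics_script(name: str) -> str:
    return f"physics:{name}"


def generate_sprites(name: str) -> str:
    return f"sprites:{name}"


def run_independent_calls(calls):
    """Run (function, argument) pairs concurrently; return results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in calls]
        # Iterating the futures list (rather than as_completed) keeps input order.
        return [future.result() for future in futures]


results = run_independent_calls([
    (write_physics_script, "projectile"),
    (generate_sprites, "worm"),
])
```

Threads suit this pattern when the calls mostly wait on I/O (shell commands, network requests); CPU-bound work would need a different executor.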

On the memory front, @code_rams pushed back on a popular article about fixing agent memory, adding three layers the original missed: "LCM (Lossless Context Management) lets compaction summaries expand back to full detail. No information loss. Cron-driven nightly memory distillation... don't rely on manual saves. Multi-agent shared workspace... all my agents read the same memory files." This kind of systems thinking around agent infrastructure is exactly what separates toy demos from production setups. @davis7 is exploring a related frontier, letting agents write TypeScript to call MCPs and APIs instead of using normal tool calls, noting that @RhysSullivan's "executor" project might represent the future of agent-to-tool communication.
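One plausible reading of "compaction summaries expand back to full detail" is that the full text is stored alongside each summary and looked up by ID on demand. The sketch below assumes exactly that; the `LosslessMemory` class and its method names are hypothetical, and simple truncation stands in for a model-written summary.

```python
import hashlib


class LosslessMemory:
    """Keeps full entries so a compacted summary can be expanded later."""

    def __init__(self):
        self._full = {}      # entry id -> original text
        self.summaries = {}  # entry id -> short summary placed in context

    def compact(self, text: str, max_chars: int = 40) -> str:
        entry_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._full[entry_id] = text
        # Naive truncation stands in for an LLM-written summary.
        self.summaries[entry_id] = text[:max_chars]
        return entry_id

    def expand(self, entry_id: str) -> str:
        return self._full[entry_id]


shared = LosslessMemory()  # one store that every agent reads and writes
note = "Deploy failed because the staging database migration was skipped on Friday."
entry_id = shared.compact(note)
```

Calling `shared.expand(entry_id)` returns the original note unchanged. In a multi-agent setup the dictionaries would be replaced by files in a shared workspace, with a scheduled job doing the compaction.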

Claude Code Ecosystem

Claude's developer tooling got several notable updates and community contributions today. The headline feature is /schedule, which lets you create recurring cloud-based jobs directly from the terminal. @minchoi captured the excitement: "Claude just got /schedule. Now it can work for you 24/7 while you sleep." The original announcement from @noahzweben explained they use it internally to "automatically resolve CI failures, push doc updates, and generally power automations that you want to exist beyond a closed laptop."

@lydiahallie shared a practical tip for anyone building with the Claude Agent SDK: "If you're building a read-only tool, make sure to mark it with readOnlyHint: true. This tells Claude Code it has no side effects and is safe to parallelize. Otherwise no other tool can run alongside it, essentially creating a 'serializing barrier.'" Meanwhile, @dani_avila7 revealed the Claude Code team is testing a revamped /init command that interviews you, scans your codebase, and sets up CLAUDE.md, skills, and hooks automatically. Enable it with "CLAUDE_CODE_NEW_INIT": "1" in your settings.json.
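The "serializing barrier" effect can be illustrated with a toy scheduler. This is a sketch of the behavior described in the tip, not Claude Code's actual implementation: `ToolCall` and `plan_batches` are hypothetical names, and the `read_only` field merely mirrors the `readOnlyHint` annotation.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    read_only: bool = False  # mirrors the readOnlyHint annotation


def plan_batches(calls):
    """Group calls into batches; calls within one batch may run concurrently."""
    batches, current = [], []
    for call in calls:
        if call.read_only:
            current.append(call.name)
        else:
            if current:
                batches.append(current)
                current = []
            batches.append([call.name])  # runs alone: the serializing barrier
    if current:
        batches.append(current)
    return batches


calls = [
    ToolCall("search_docs", read_only=True),
    ToolCall("list_files"),  # read-only in practice, but not marked
    ToolCall("read_config", read_only=True),
]
unmarked = plan_batches(calls)
calls[1].read_only = True
marked = plan_batches(calls)
```

With `list_files` unmarked, the three calls fall into three sequential batches; once it is marked read-only, all three share a single batch.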

@coreyganim also pushed out a guide on the "anatomy of a perfect OpenClaw setup," emphasizing that the real value comes from configuring memory, skills, and custom behavior rather than just installing and chatting. Taken together, the /schedule, readOnlyHint, and /init posts paint a picture of Claude Code evolving from a coding assistant into a configurable development platform.

Model Performance and Efficient Inference

The model benchmarking community had a field day with two developments pushing the boundaries of what's possible on consumer hardware. @sudoingX is testing NVIDIA's Nemotron Cascade head-to-head against Qwen 3.5 on a single RTX 3090: "30B total, 3B active. Fits on a single RTX 3090. Hybrid mamba MoE. Gold medal on the international math olympiad with only 3 billion active parameters." The architecture comparison between Mamba and DeltaNet on identical hardware should produce genuinely useful benchmarks.
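A back-of-the-envelope estimate shows what "fits on a single RTX 3090" implies. The sketch assumes the card's 24 GB of VRAM and counts weight storage only; the bit widths are illustrative assumptions, since the post does not say which quantization was used, and KV cache and runtime overhead are ignored.

```python
def weight_gb(total_params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring runtime overhead."""
    return total_params_billions * bits_per_weight / 8


VRAM_GB = 24              # RTX 3090
TOTAL_B, ACTIVE_B = 30, 3  # total vs. active parameters, in billions

active_fraction = ACTIVE_B / TOTAL_B
fits = {bits: weight_gb(TOTAL_B, bits) <= VRAM_GB for bits in (16, 8, 4)}
```

Under these assumptions only the 4-bit case (about 15 GB of weights) fits. The 3B active figure governs compute per token, not memory: every expert still has to be stored.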

On the mobile side, @Alexintosh achieved a 2.3x speedup over baseline for Qwen 3.5 35B on an iPhone 17, hitting 13.1 tokens per second through a cocktail of optimizations: "Fused attention, CMD1+CMD2 Merge, Fused Expert Kernel, Expert Prefetch, I/O Fanout." Running a 19.5GB model on a device with 12GB of RAM was already impressive at 5.6 tok/s two days ago. The rapid iteration on mobile inference optimization suggests on-device AI is closer to practical usability than the benchmarks alone would indicate.
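The reported figures are easy to check. The short calculation below uses only numbers from the two posts; the variable names are illustrative.

```python
baseline_tps = 5.6    # tok/s reported two days earlier
optimized_tps = 13.1  # tok/s after the listed optimizations
model_gb = 19.5
device_ram_gb = 12.0

speedup = optimized_tps / baseline_tps
oversubscription = model_gb / device_ram_gb
ms_per_token = 1000.0 / optimized_tps

print(f"speedup: {speedup:.2f}x")
print(f"model size vs RAM: {oversubscription:.3f}x")
print(f"latency: {ms_per_token:.1f} ms/token")
```

The speedup works out to about 2.34x, consistent with the quoted 2.3x. The model is roughly 1.6 times larger than the phone's RAM, so the weights cannot all be resident at once, which is presumably why the optimization list includes Expert Prefetch and I/O Fanout.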

Game Development Meets AI

Game development emerged as a surprisingly active intersection point today. @ErickSky highlighted Unity-MCP, a repo that bridges LLMs directly to Unity Editor and game runtime: "With ONE SINGLE LINE you can convert ANY C# method into a tool that AI can use. AI inside the final game: intelligent NPCs, runtime debugging, a dropship flying on its own." The repo offers 100+ native tools and promises a two-minute CLI setup.

@jojodecayz demonstrated a different angle, using ComfyUI with local agent tools to recreate social media videos by simply dropping URLs. The post quotes @c__byrne's argument that ComfyUI's extensibility positions it uniquely to ride the agent wave, while more closed tools will struggle to integrate. Between Unity-MCP's game-engine integration and ComfyUI's creative pipeline automation, the pattern is clear: AI agents are moving beyond text and code into interactive media production.

Developer Best Practices

@doodlestein shared what amounts to the best agent coding tip of the day: whenever your agent fixes a bug, don't let it stop there. "Ask it to also create comprehensive end-to-end integration tests that would have caught that bug and all similar types of bugs in the future." The insight is that this approach doesn't just prevent regression; it actively surfaces latent bugs by forcing the agent to think about entire categories of failure modes.

@0xSero took a complementary angle on code quality, showcasing a /readiness-report feature in Droid that "compares your codebase against best standards and suggests ways to de-slop." As agent-generated code proliferates, the need for automated quality gates becomes critical. The community is starting to build the tooling to keep AI-assisted development from producing technical debt at machine speed.

OpenAI Infrastructure Update

@OpenAIDevs announced that containers for skills, shell, and code interpreter now spin up about 10x faster thanks to a new container pool in the Responses API: "Requests can reuse warm infrastructure instead of creating a full container creation each session." This is a meaningful infrastructure improvement for anyone building agent workflows on OpenAI's platform, because it reduces the cold-start penalty that makes agent interactions feel sluggish. Between Anthropic's Claude Code scheduling and OpenAI's container pooling, both companies are investing heavily in making agents feel instantaneous.
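The general idea of a warm pool can be sketched in a few lines. This is a toy model under stated assumptions, not OpenAI's implementation: the `ContainerPool` class is hypothetical, and integer IDs stand in for real containers.

```python
import itertools


class ContainerPool:
    """Toy warm pool: reuse idle containers instead of creating one per request."""

    def __init__(self):
        self._idle = []
        self._ids = itertools.count(1)
        self.cold_starts = 0

    def acquire(self) -> int:
        if self._idle:
            return self._idle.pop()  # warm path: no creation cost
        self.cold_starts += 1        # cold path: a new container is created
        return next(self._ids)

    def release(self, container_id: int) -> None:
        self._idle.append(container_id)


pool = ContainerPool()
first = pool.acquire()
pool.release(first)
second = pool.acquire()
```

The second request reuses the container released by the first, so only one cold start is recorded. A production pool would also need to reset or isolate state between sessions.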

Sources

OpenAI Developers @OpenAIDevs · Mar 21

Agent workflows got even faster. You can spin up containers for skills, shell and code interpreter about 10x faster. We added a container pool to the Responses API, so requests can reuse warm infrastructure instead of creating a full container creation each session. https://t.co/lmvwsaf5HN

Guri Singh @heygurisingh · Mar 22

🚨Architects are going to hate this. Someone just open sourced a full 3D building editor that runs entirely in your browser. No AutoCAD. No Revit. No $5,000/year licenses. It's called Pascal Editor. Built with React Three Fiber and WebGPU -- meaning it renders directly on your GPU at near-native speed. Here's what's inside this thing: → A full building/level/wall/zone hierarchy you can edit in real time → An ECS-style architecture where every object updates through GPU-powered systems → Zustand state management with full undo/redo built in → Next.js frontend so it deploys as a web app, not a desktop install → Dirty node tracking -- only re-renders what changed, not the whole scene Here's the wildest part: You can stack, explode, or solo individual building levels. Select a zone, drag a wall, reshape a slab -- all in 3D, all in the browser. Architecture firms pay $50K+ per seat for BIM software that does this workflow. This is free. 100% Open Source.

keshav @kshvbgde · Mar 23

indie hackers are toughest B2C users > as a founder, this is the only article you'll need to bookmark https://t.co/OMnRBTmxGV

growthsuck @growthsuck

the ULTIMATE 48h doomsday guide to vibecode & scale your app to $10M ARR

David George @DavidGeorge83 · Mar 23

There are only two paths left for software

To software CEOs, founders, boards, and the investor community: the comfortable middle is over. Public markets have already repriced the sector, and...

Javier @javierblez · Mar 23

This broke my mental model of game dev 💀 2.5 hours → fully playable ‘Worms’ clone. Built with Hermes agent by @NousResearch Here’s what made that speed possible: Hermes used ‘Persistent Shell’ mode, which ensured it didn't forget its current folder or active tools. This allowed it to work smoothly, without the distraction of constantly having to recall where it left off last time. To optimize the workflow, the agent moved beyond linear execution and parallelized the workload. It spawned isolated subagents while executing multiple independent tool calls via ThreadPoolExecutor. Like, one subagent wrote Python RPC scripts for the projectile physics while another utilized vision tools for character sprites. When the complex terrain logic required debugging, the agent used filesystem checkpoints and the /rollback command to instantly return to a stable state. To fix UI bugs, it attached to a live Chrome instance via CDP (/browser connect), fixing rendering issues in real-time. The agent’s built-in learning loop was active from the very beginning. By the time the game was finished, this continuous process allowed the agent to autonomously convert the physics logic into a custom skill. This logic is now a permanent plugin file in the agent's plugin architecture, making the physics engine a native capability that the agent can reuse for future projects. Follow @warv3finale for updates!

Ramya Chinnadurai 🚀 @code_rams · Mar 23

Just implemented all of this. But the article misses 3 things that actually matter: 1. LCM (Lossless Context Management) lets compaction summaries expand back to full detail. No information loss. 2. Cron-driven nightly memory distillation don't rely on manual saves. Automate it. 3. Multi-agent shared workspace all my agents (Vasi, Sana, Nila) read the same memory files. One save benefits everyone. The article is a solid starting point. But if you're running multiple agents, you need layers on top of this.

BentoBoiNFT @BentoBoiNFT

I Fixed OpenClaw's Biggest Problem (Memory)

Daniel San @dani_avila7 · Mar 23

The Claude Code team is testing a new /init built from community feedback If you wanna try it just add this to your settings.json "CLAUDE_CODE_NEW_INIT": "1" It interviews you, scans your codebase, and sets up CLAUDE.md, skills, and hooks automatically. I tested it on a my repo, here's what happened 👇

Corey Ganim @coreyganim · Mar 23

when I send this to all my friends and they finally understand how to perfectly set up OpenClaw not just "install and chat" a real workspace with memory, skills, and custom behavior https://t.co/L0dGnDu5q6

coreyganim @coreyganim

Anatomy of a perfect Openclaw setup

Trades Dont Lie @Tradesdontlie · Mar 23

imagine you aren’t running this tech right now what are you even doing writing down things on paper

FreakinFrick @FreakinFrick

What if You Could Learn While You Sleep?

0xSero @0xSero · Mar 23

How I am de-slopping my code-bases This feature has made my codebases so much less messy /readiness-report in Droid compares your codebase against best standards and suggests ways to de-slop. 2x speed. https://t.co/mUdhiF9NIk

zostaff @zostaff · Mar 23

CLAUDE MADE ME 3 TRADING BOTS IN 15 MINUTES +$2,503 in my wallet the next day, I quit my job that same day. I wrote one prompt and answered a few questions. Claude took three strategies - MACD, RSI + VWAP, CVD divergence. Assigned each bot its own. The first one catches momentum - sees when volume picks up and gets in before the crowd wakes up. The second one trades reversals - waits for everyone to panic and bets against them. The third one scans divergences - when price says one thing but money does another, it follows the money. Built the structure itself, wrote the backtest, ran each strategy on historical data, set up the risk manager, deployed - all on its own. Three bots, three accounts, each trades differently - they don't know about each other. Started with $1, then $5, then $10, then $50, then $2,503 in a day. Citadel, Jane Street, Two Sigma have been trading with bots for years - they don't feel fear, greed, or FOMO - they only listen to algorithms. The market rewards systems.

zostaff @zostaff

How to Quit a Job You Hate. How to Build Your Own Trading Bot.A Complete Guide.

Lydia Hallie ✨ @lydiahallie · Mar 23

If you're building a read-only tool with the Claude Agent SDK, make sure to mark it with `readOnlyHint: true` This tells Claude Code it has no side effects and is safe to parallelize. Otherwise no other tool can run alongside it, essentially creating a "serializing barrier"! https://t.co/sLMq5FbirJ

Min Choi @minchoi · Mar 23

Claude just got /schedule Now it can work for you 24/7 while you sleep 🤯 https://t.co/xu8Vun6fZx

noahzweben @noahzweben

Use /schedule to create recurring cloud-based jobs for Claude, directly from the terminal. We use these internally to automatically resolve CI failures, push doc updates, and generally power automations that you want to exists beyond a closed laptop https://t.co/uuDesRzSrg

alexintosh @Alexintosh · Mar 23

It's optimization time. Qwen3.5 35B at 13.1 tok/sec on iPhone 17 (no-pro). 2.3x from baseline of 2 days ago. Experimental features: --- Fused attention CMD1+CMD2 Merge Fused Expert Kernel Expert Prefetch I/O Fanout (credit @anemll) cc @danveloper @danpacary https://t.co/Kdd6eMfyPZ

Alexintosh @Alexintosh

I just ran Qwen3.5 35B on my iPhone at 5.6 tok/sec. Fully on-device. 4bit | 256 experts. Model: 19.5GB. iPhone: 12GB RAM. wild. https://t.co/gZErpMVdvO

Ben Davis @davis7 · Mar 23

Been going deeper into the "code mode" stuff. Basically letting the agents write typescript to call MCPs, APIs, and etc. instead of normal tool calls or bash commands. No clue what the final form of this is yet. Really like what @RhysSullivan is working on with executor. I think it or something like it is probably the future

Erick @ErickSky · Mar 24

🚨 This repo is upending game dev as we know it. Unity-MCP is the bridge that connects Claude, Cursor, Gemini, Copilot, or any LLM directly to your Unity Editor AND to the runtime of the compiled game. What it does is wild: - 100+ native tools: create assets, modify scenes, run code… all with AI. - With ONE SINGLE LINE you can convert ANY C# method into a tool that AI can use. - AI inside the final game: intelligent NPCs, runtime debugging, a dropship flying on its own (there's a real demo). - Full AI develop/test loop: the AI builds, tests, and fixes… while you watch TikToks. - Setup in 2 minutes with the CLI. REPO👇

Jeffrey Emanuel @doodlestein · Mar 24

Agent Coding Life Hack: Whenever your agent finds and fixes a bug in your project, don't let it just stop there. Ask it to also create comprehensive end-to-end integration tests that would have caught that bug and all similar types of bugs in the future: "Also I need you, once you've fixed and verified each of those problems is completely resolved and working properly, to create extremely in-depth e2e integration tests that would have caught each of those issues plus any other similar issues of the same basic kind so this sort of thing can't happen again in the future." Not only will this harden your project, but you might be surprised at just how many additional new bugs and problems it immediately flushes out into the open.

Theo - t3.gg @theo · Mar 24

I'm not scared of Anthropic. https://t.co/b0Ip0hFvHS

JO. Z @jojodecayz · Mar 24

People will gradually understand how good it is for ComfyUI to work with agent tools locally. I ask my agent to recreate any social media videos I saw and all I do is dropping the urls. This was a demo a month ago. I have a way better version now. https://t.co/xMzt5fm1dc

c__byrne @c__byrne

People are missing this: ComfyUI is the only tool in class positioned to ride the agent wave. Others are simply too closed off and unextensible. In a future where agents are integral to work, the artists who stuck with ComfyUI will be miles ahead of those who went elsewhere

Sudo su @sudoingX · Mar 24

the hype around this model settled fast. good. now i can test it without the noise. NVIDIA released nemotron cascade. 30B total, 3B active. fits on a single RTX 3090. hybrid mamba MoE. gold medal on the international math olympiad with only 3 billion active parameters. they say it beats qwen on math, code, and reasoning. i tested qwen 3.5 35B-A3B on a single 3090 at 112 tok/s. now same card, same tests, different architecture. mamba vs deltanet. nvidia vs alibaba. receipts incoming tonight.

sudoingX @sudoingX

testing cascade 2 on a single 3090 right now. same card i tested qwen 3.5 35B-A3B on at 112 tok/s. same active params, same VRAM tier, different hybrid architectures. mamba vs deltanet head to head. numbers coming tonight. if a spark lands on my desk next you'll get those numbers too.

🍓🍓🍓 @iruletheworldmo · Mar 24

if you haven’t tried dispatch yet you need to wake up this guide will see you free it’s a must read. must bookmark.

PawelHuryn @PawelHuryn

The Claude Dispatch Guide: 48 Hours Running AI From My Phone