Multi-Agent Teams Eclipse Single-Model Workflows as Anthropic Teases Claude Memory Files and Local Inference Heats Up

May 25, 2026 · 21 sources

The conversation around AI has shifted decisively from prompt engineering to agent architecture, with multiple high-profile voices arguing that single-agent workflows are already obsolete. Anthropic continues expanding its ecosystem with a new memory upgrade and a notable hire, while developers show that small local models like Qwen 3.5-4B can power sophisticated agent memory stacks on consumer hardware.

Daily Wrap-Up

If there was one unmistakable signal across today's feed, it is that the AI community has moved past the era of chatting with a single model and calling it productivity. From Boris Cherny's talk about Claude Code's multi-agent architecture to Garry Tan's metaphor of building the cerebellum before the prefrontal cortex, the consensus is clear: teams of specialized agents working in concert are the new baseline for serious AI work. This is not theoretical. People are shipping these systems today, and the posts that resonated most were the ones showing exactly how.

Anthropic sits at the center of this shift in multiple ways. TestingCatalog reported that Claude will soon get a file-based memory system that lets users browse and edit organized notes, a feature that looks like preparation for always-on agent experiences. Meanwhile, @sattyyouneed broke the news that @dudufolio has joined Anthropic, continuing the company's aggressive talent acquisition. And at the infrastructure level, developers like @Michaelzsguo are wiring TencentDB's layered memory architecture to Qwen 3.5-4B running locally on a MacBook Pro, proving that agent memory does not require cloud-scale infrastructure or massive models.

The most practical takeaway for developers: stop trying to find the perfect single prompt and start decomposing your workflows into discrete, repeatable steps that specialized agents or skills can own. Whether you use Karpathy's autonomous research setup, Vaibhav Srivastav's Codex workflow auditor, or simply break your next project into researcher/builder/reviewer roles, the compound advantage of multi-agent orchestration is already too large to ignore.

Quick Hits

@tpschmidt_ shares a pragmatic AWS rule: never build on fancy abstractions, only primitives. Six container services that tried to simplify ECS are now archived, EOL, or shut to new customers.
@reevo_ai pitches an AI-native CRM built from scratch, arguing that legacy CRMs predate AI entirely.
@zavudev warns that unofficial WhatsApp libraries can get expensive fast when banned numbers block messages for your users. Their platform offers the official Cloud API with automatic fallback to SMS and email.
@Teslahubs promotes ProGuard, a weather-sealing kit that claims up to 6x less cabin noise for Teslas with a 20-40 minute no-tools install.
@mumaren_2 shares a curated "Follow Builders, Not Influencers" list featuring Karpathy, swyx, Amanda Askell, and dozens of others worth tracking for substantive AI content.
@MichaelHyatt asks whether to install the Hermes agent locally or on a VPS, a sign that curiosity about always-on personal agents is going mainstream.

The Multi-Agent Imperative: Specialized Teams Beat Single Models

The most striking pattern today is how many independent voices converged on the same conclusion: single-agent workflows are a transitional phase, not the destination. @eng_khairallah1 summarized a talk by Boris Cherny, the creator of Claude Code at Anthropic, who argued that the future is teams of agents, not better prompts. The breakdown is surgical: one agent researching, one building, one reviewing, one orchestrating. Cherny also highlighted that CLAUDE.md alone consumes roughly 14% of context before you type a single word, which means context budgeting is already a critical architectural concern.
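To make the budgeting point concrete, here is a minimal arithmetic sketch. The 14% figure comes from the talk summary above; the 200,000-token window and the function name are illustrative assumptions, not figures from the source.

```python
def usable_context(window_tokens: int, overhead_percent: int = 14) -> int:
    """Tokens left for actual work after a fixed instruction-file overhead."""
    overhead = window_tokens * overhead_percent // 100
    return window_tokens - overhead


# Hypothetical 200,000-token window (an assumption, not a quoted figure).
WINDOW = 200_000
print(usable_context(WINDOW))  # 172000
```

Under these assumptions, 28,000 tokens are spent before the first message. Any per-agent overhead like this is paid once per context window, which is one reason to keep each agent's instructions narrow.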

This maps directly onto @garrytan's framing of the problem. "Everyone building AI agents is focusing on building the prefrontal cortex. Planning. Reasoning. Multi-step chains," he wrote. "But also, a reframe: there is value in building the cerebellum." His argument is that most agent frameworks will fail because they treat all cognition as high cognition. The winners will nail the boring, reflexive stuff first. Your mortgage gets paid by a standing order, not a committee. The same should be true for automated workflows.
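One way to read the cerebellum framing is as a routing rule: try a cheap deterministic handler first and escalate to a planner only when none exists. The sketch below is an illustration of that idea, not anything Tan published; the task kinds, `REFLEXES`, and `plan_with_model` are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical reflex table: task kinds that never need planning.
REFLEXES: Dict[str, Callable[[dict], str]] = {
    "pay_mortgage": lambda task: f"standing order executed for {task['amount']}",
    "rotate_logs": lambda task: "logs rotated",
}


def plan_with_model(task: dict) -> str:
    """Placeholder for the expensive reasoning path (an LLM call in practice)."""
    return f"escalated to planner: {task['kind']}"


def handle(task: dict) -> str:
    """Run the deterministic reflex if one exists; otherwise escalate."""
    reflex = REFLEXES.get(task["kind"])
    if reflex is not None:
        return reflex(task)
    return plan_with_model(task)


print(handle({"kind": "pay_mortgage", "amount": 1200}))
print(handle({"kind": "design_api"}))
```

The design choice is that the reflex table grows over time: every task the planner handles repeatedly is a candidate to be moved into it.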

@DanielMiessler took this further by sketching what companies will actually look like when this plays out: graphs of algorithms that are transparent and optimizable, with humans stepping in only as exceptions to be resolved. "Humans doing the main, anticipated work will be a failure case to be solved," he wrote. SOPs and clearly articulated workflows, similar to the /workflows feature he says Anthropic is about to release, become the core operational layer.

On the practical side, @rohit4verse shared a concrete setup based on Andrej Karpathy's open-sourced autonomous research agent. The process is elegantly simple: a coding agent edits one file, trains for five minutes, keeps the change if validation loss drops, and reverts if it does not. Git is the memory. The metric is the judge. You wake up to a staircase of validated improvements. Meanwhile, @reach_vb published a detailed Codex prompt that audits your last 30 days of work, identifies repeated manual workflows, and packages them as skills, subagents, or automations. The prompt is remarkably thorough, specifying evidence hierarchy, confidence thresholds, and explicit skip criteria to avoid over-automation.
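The keep-or-revert loop described above can be sketched in a few lines. This is a toy stand-in under stated assumptions, not Karpathy's code: `propose` replaces the agent's file edit, `evaluate` replaces the five-minute training run, and keeping or discarding a candidate replaces the git commit or revert. All names here are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def keep_or_revert(
    params: float,
    propose: Callable[[float], float],
    evaluate: Callable[[float], float],
    steps: int,
) -> Tuple[float, List[float]]:
    """Accept a candidate only if the metric drops; otherwise keep the old state."""
    best_loss = evaluate(params)
    history = [best_loss]
    for _ in range(steps):
        candidate = propose(params)      # stands in for the agent's edit
        loss = evaluate(candidate)       # stands in for the short training run
        if loss < best_loss:
            params, best_loss = candidate, loss  # "commit"
        # Otherwise the candidate is discarded, which is the "revert".
        history.append(best_loss)
    return params, history


rng = random.Random(0)
toy_loss = lambda x: (x - 3.0) ** 2  # toy metric with its minimum at x = 3
final, history = keep_or_revert(
    params=0.0,
    propose=lambda x: x + rng.uniform(-1.0, 1.0),
    evaluate=toy_loss,
    steps=50,
)
```

Because rejected candidates never change the state, `history` can only stay flat or step down, which is the "staircase" the post refers to.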

Tying this all together is @BetterCallMedhi's analysis of Ramp, which he describes as one of the most fascinating companies right now precisely because it is building specialized agent swarms trained on proprietary data rather than relying on generalist LLMs. Ramp's Fast Ask subagent, built with PrimeIntellect using reinforcement learning, scores 4% above Opus on exact match accuracy at Haiku latency. That is the multi-agent thesis in production: small, specialized models that outperform giants on vertical tasks at a fraction of the cost.

Anthropic's Expanding Ecosystem

Anthropic is positioning itself as the platform layer for the multi-agent future, and today's posts show multiple vectors of that expansion. @testingcatalog reported that Claude will soon receive a file-based memory upgrade called Memory Files, offering users a choice between organized notes and classic memory. The feature lets Claude write notes as you chat and read them when relevant, with full browse and edit capabilities. TestingCatalog notes this appears to be an evolution of the previously discovered "Knowledge Bases" feature and closely resembles memory systems in always-on agents like OpenClaw and Hermes. The implication is clear: Anthropic is building toward persistent, stateful agent experiences, and memory is the foundation.
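For readers unfamiliar with file-based memory, the general pattern is small: notes are plain files the agent appends to and later reads back when a conversation touches the same topic. The sketch below illustrates that pattern only. It is not Anthropic's implementation, and `NoteStore`, the one-file-per-topic layout, and the naive word matching are all assumptions.

```python
import tempfile
from pathlib import Path


class NoteStore:
    """Hypothetical file-backed memory: one Markdown file per topic."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, topic: str, fact: str) -> None:
        """Append a bullet to the topic's note file, creating it if needed."""
        path = self.root / f"{topic}.md"
        with path.open("a", encoding="utf-8") as f:
            f.write(f"- {fact}\n")

    def relevant(self, message: str) -> dict:
        """Return notes whose topic name appears as a word in the message."""
        words = set(message.lower().split())
        return {
            p.stem: p.read_text(encoding="utf-8")
            for p in sorted(self.root.glob("*.md"))
            if p.stem in words
        }


store = NoteStore(Path(tempfile.mkdtemp()) / "notes")
store.write("travel", "Prefers aisle seats")
store.write("editor", "Uses Vim keybindings")
found = store.relevant("book my travel for next week")
```

Because the notes are ordinary Markdown files, a user can browse and edit them with any text editor, which is the property the report highlights.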

The hiring front is equally active. @sattyyouneed reported that @dudufolio has joined Anthropic. Combined with Cherny's public talks about Claude Code architecture and Daniel Miessler's reference to the /workflows feature he says Anthropic is about to release, the picture is of a company systematically assembling both the people and the infrastructure for an agent-native platform.

Agent Memory and Local Inference

One of the most technically detailed posts today came from @Michaelzsguo, who wired TencentDB Agent Memory to Qwen 3.5-4B running locally through llama-server on a MacBook Pro. The architecture is a layered memory stack: L0 stores raw logs in SQLite and JSONL, L1 extracts typed JSON memories (persona facts, episodic events, and instructions), L2 organizes them into thematic Markdown narratives, and L3 synthesizes a coherent user profile with interaction protocols. Every layer above L0 needs an LLM. Each layer uses cursor-based checkpointing, so if the local model goes down, nothing is lost and the pipeline resumes from the last processed cursor. His choice of Qwen 3.5-4B is telling. It was the smallest model he found that could reliably handle both structured JSON extraction and multi-step tool use. Qwen 2.5-3B was too brittle. The local inference sweet spot is moving fast.
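Cursor-based checkpointing is easy to show in miniature. The sketch below is a generic illustration, not TencentDB Agent Memory's code: it keeps the cursor in a small text file and writes extracted records as JSONL. The file names, `process_with_cursor`, and the simulated crash are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def process_with_cursor(
    log: List[str],
    cursor_file: Path,
    out_file: Path,
    extract: Callable[[str], dict],
) -> int:
    """Process log entries after the saved cursor, checkpointing after each one."""
    cursor = int(cursor_file.read_text()) if cursor_file.exists() else 0
    for index in range(cursor, len(log)):
        record = extract(log[index])            # may raise if the model is down
        with out_file.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")  # one extracted memory per line
        cursor_file.write_text(str(index + 1))  # advance only after the write
    return len(log) - cursor


workdir = Path(tempfile.mkdtemp())
cursor_file, out_file = workdir / "l1.cursor", workdir / "l1.jsonl"
log = ["likes tea", "lives in Oslo", "prefers short replies"]

calls = {"n": 0}


def flaky_extract(line: str) -> dict:
    """Stand-in for the LLM extraction step; fails once on the third call."""
    calls["n"] += 1
    if calls["n"] == 3:
        raise RuntimeError("model unavailable")
    return {"type": "persona_fact", "text": line}


try:
    process_with_cursor(log, cursor_file, out_file, flaky_extract)
except RuntimeError:
    pass  # simulated crash: the first two entries are already checkpointed

resumed = process_with_cursor(log, cursor_file, out_file, flaky_extract)
```

The second call picks up at the third entry rather than starting over. Note that this sketch gives at-least-once semantics: a crash between the write and the cursor update would reprocess one entry, so a real pipeline would want the two steps to be atomic or the writes to be idempotent.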

This pairs naturally with @ItsmeAjayKV's appreciation post for Qwen 3.6, the 35B Mixture-of-Experts model he runs on a consumer 3060 GPU. He has used it daily for about a month and burned over a million tokens across Hermes, the Pi coding agent, and other tools, all locally. The open-source local model ecosystem has reached a point where daily-driver inference on consumer hardware is genuinely practical for everyday agent workloads.

Developer Tooling for the Agent Era

The tooling layer is catching up to the ambition. @AniC_dev introduced Box, a sandboxing solution built specifically for agents that promises to be both simple and affordable. Sandboxes are critical infrastructure for any multi-agent system that needs to execute arbitrary code safely, and the current options are either expensive or complex. Box appears to target that gap directly.

@BHolmesDev shared his experience with Matt Pocock's Skills library, using a grill-with-docs command for major feature work and a handoff command for side-quests he does not want to context-switch into. Instead of temporary markdown files, he tracks these in GitHub issues, which is a small but significant workflow improvement that keeps agent-generated tasks visible to the whole team.

@mitsuhiko, known for Flask and Sentry, wrote about what he has learned maintaining Pi as a junior maintainer alongside @badlogicgames. The post touches on agentic engineering from the maintainer's perspective, a viewpoint that is underrepresented in a conversation dominated by builders and users. As AI agents increasingly interact with open-source projects as contributors, the maintenance burden and workflow implications for project maintainers will become a first-class concern.

The AI Productivity Paradox

@thdxr captured a frustration that many developers feel but few articulate clearly: looking back at past projects, it seems obvious they would have been completed faster with AI, yet everything still feels as slow and difficult as ever. This is the productivity paradox of AI tooling. The tools are demonstrably faster at individual tasks, but the overall cadence of complex projects has not meaningfully accelerated. The bottleneck has shifted from execution to integration, decision-making, and the overhead of orchestrating AI assistance itself.

@addyosmani named a related concept: cognitive surrender, defined as the moment you stop thinking altogether and blindly accept the answer the AI gives you. It is the psychological mirror of the productivity paradox. When AI speeds up the easy parts of a task, the remaining hard parts feel harder by contrast, and the temptation to surrender critical judgment grows precisely when it matters most. Together, these two observations frame the real challenge of the agent era. The technology for multi-agent systems, local inference, and persistent memory is arriving faster than our workflows and habits can adapt. The developers who benefit most will be those who treat AI as a system to be architected, not a magic button to be pressed.

Sources

Teslahubs @Teslahubs · May 2

🔥 Make your Tesla quieter & more premium with ProGuard + Up to 6× less cabin noise + Cuts wind, road & rain noise + 20–40 min install, no tools + All-weather sealing (water & dust) + Better thermal insulation + Fewer rattles & vibrations Get your ProGuard now!

Reevo @reevo_ai · May 4

Your CRM was built before AI existed. Reevo wasn't. The AI-native CRM.

Zavu.dev @zavudev · May 11

Using unofficial WhatsApp libraries can get VERY expensive. A banned number doesn’t just break your integration: it blocks messages for your users. With Zavu, you get the official WhatsApp Cloud API + automatic fallback to SMS and Email.

Khairallah AL-Awady @eng_khairallah1 · May 23

Boris Cherny, the creator of Claude Code at Anthropic, just explained why single-agent workflows are already dead
in this talk he breaks down exactly how the future is teams of agents, not better prompts:
- the 14% you lose to CLAUDE.md before typing a word
- one agent researching. one building. one reviewing. one orchestrating
- the architecture that separates hobbyists from real builders
- the 3 properties every agent team needs to actually survive
if you've been using Claude for more than a month and never left the chat window, you've been using one agent when you could be running a team of them
instead of another show tonight, watch this
make sure to bookmark it before it gets lost in your feed
the guide is in the article below

eng_khairallah1 @eng_khairallah1

https://t.co/WO3VvIgJZ2

AJ @ItsmeAjayKV · May 23

Qwen 3.6 will forever hold a special place in my heart. The first local model (35b moe) that I ended up using every day for a months duration, still going strong, more than a million tokens burned, works with Hermes, pi coding agent, basically all tools i use. All on my 3060. I hope I can stop using it soon, where's qwen 3.7 35B A?E moe, and or qwen3.7-27b dense ? @Alibaba_Qwen

Addy Osmani @addyosmani · May 23

"Cognitive surrender is when you stop thinking altogether and blindly accept the answer the AI gives you" https://t.co/qfUVRH7LkB

Vaibhav (VB) Srivastav @reach_vb · May 24

UPDATE: Came up with an even better version of this prompt after the feedback
Ask Codex to look across your sessions, Memories, and Chronicle, identify patterns, reuse what already exists, and only create the smallest useful skill, subagent, or automation.
"Look back over my recent work from the last 30 days, or all available history if shorter, and identify repeated manual workflows worth packaging.
Use available evidence in this order:
- Recent Codex sessions and task summaries.
- Codex Memories and rollout summaries to find patterns repeated across sessions.
- Chronicle, if enabled, to spot repeated work outside Codex. Use Chronicle for discovery only; confirm important details in the relevant source system when possible.
- Existing skills, custom agents, and automations, so you reuse or extend what already exists instead of duplicating it.
Look broadly for work that is repeated, time-consuming, error-prone, context-heavy, or benefits from a consistent process. Include workflows across coding, research, writing, planning, communication, operations, analysis, and personal administration.
Only act on a candidate when it:
- occurred at least twice, or is clearly likely to recur and costly to repeat;
- has stable inputs, a repeatable procedure, and a clear output or stopping condition;
- would materially improve speed, quality, consistency, or reliability;
- is not already adequately covered.
Choose the smallest appropriate form:
- Skill: a reusable workflow or playbook.
- Custom subagent: a bounded specialist role or investigation task suitable for delegation.
- Automation: a scheduled or recurring check, report, reminder, or monitor.
- Skip: work that is too one-off, ambiguous, sensitive, or poorly evidenced to package.
First produce a compact shortlist with:
- repeated workflow
- supporting evidence and dates
- frequency/confidence
- recommended form: skill, subagent, automation, extend existing, or skip
- why it is or is not worth creating
Then create only the high-confidence missing items. Keep them narrow, practical, source-aware, and easy to validate. Do not create speculative, overlapping, or overly broad assets.
Finish with:
- what you created or extended
- what you deliberately skipped
- what needs more evidence before packaging"

reach_vb @reach_vb

Copy and paste this into your codex: “Look through my recent Codex sessions and identify repeated workflows or repeated asks. For anything I keep doing manually, suggest: 1. a skill if it is a reusable workflow 2. a custom subagent if it is a bounded role or investigation task Focus on practical things like CI failures, PR reviews, changelogs, docs updates, release prep, debugging, and test triage. Create the useful ones only. Keep them simple.”

🚨 AI News | TestingCatalog @testingcatalog · May 24

ANTHROPIC 🔥: Claude will soon receive a new file-based memory upgrade, offering users the option to choose between Memory Files and Classic memory. > Organized notes Claude writes as you chat and reads when they're relevant. Browse and edit them anytime. This feature appears to be a new iteration of the previously discovered "Knowledge Bases" and more closely resembles what memory works in always-on agents like OpenClaw and Hermes. Considering a potential future debut of Claude Conway, Memory Files feature is likely an important preparation step.

Armin Ronacher ⇌ @mitsuhiko · May 24

Has been a while since I wrote about agentic engineering, so this time around some learnings of maintaining Pi as a junior maintainer to @badlogicgames :) https://t.co/TbD9Jvqk3t

Satyam @sattyyouneed · May 24

BREAKING: @dudufolio has joined Anthropic https://t.co/aV4QGCuPtd

Garry Tan @garrytan · May 24

Everyone building AI agents is focusing on building the prefrontal cortex. Planning. Reasoning. Multi-step chains. There's value here. CEO-stuff. But also, a reframe: there is value in building the cerebellum. It's offloading boring tasks into reflex so the complex thought can focus. Your mortgage gets paid by a standing order, not a committee. The things that are not fun, not interesting, but have to be done? Done. Most agent frameworks will fail because they treat all cognition as high cognition. The winners will nail the boring stuff first.

Anicet @AniC_dev · May 24

introducing box📦 simple, powerful sandboxes for agents and the most affordable as well https://t.co/IV0ayOw4GC

Michael Hyatt @MichaelHyatt · May 24

I’m about to install @Hermes_agentAI. Should I install on local, dedicated machine or VPS? Why? cc: @gregisenberg, @AlexFinn, @NetworkChuck

Michael Guo @Michaelzsguo · May 24

Today I upgraded my Hermes agents with TencentDB Agent Memory. I did not connect it to a cloud LLM. Instead, I wired it to Qwen 3.5-4B running locally on my MacBook Pro through llama-server. It works great.
TencentDB Agent Memory is a very well-designed product. It uses a layered memory stack:
L0 raw logs in SQLite + JSONL
L1 typed JSON memory extraction
L2 scene blocks as Markdown narratives
L3 persona synthesis and interaction protocols
An LLM is needed to extract and synthesize information for each layer except L0.
At L1, the LLM parses conversations into typed memories as JSON: persona facts, episodic events, and instructions.
At L2, the LLM uses tool calls, such as read, write, and edit, to organize memories into thematic Markdown narratives.
At L3, the LLM reads all scenes and generates a coherent user profile with interaction protocols.
Each layer has cursor-based checkpointing, so if the local model goes down, nothing is lost. The pipeline resumes from the last processed cursor.
Why Qwen 3.5-4B? It was the smallest model I found that could reliably handle the two jobs this architecture needs: structured JSON extraction and multi-step tool use. Qwen 2.5-3B was too brittle for this setup. Qwen 3.5-4B hits the local inference sweet spot.

Mehdi (e/λ) @BetterCallMedhi · May 24

For me, @tryramp is probably one of the most fascinating companies right now. Starting from a simple corporate card management solution, they are turning into a genuine applied AI lab that completely rethinks how companies operate internally. While most startups settle for slapping the word AI on their pitch deck to scrape together a few million, Ramp publishes its research papers, does reinforcement fine-tuning with @PrimeIntellect, and integrates all of it into a product that already generates several hundred million in revenue. Their vision is evidently crystal clear and very few people are copying it. I think the future of B2B is not generalist LLMs accessed through a shared API but swarms of specialized agents trained on proprietary data, able to beat giant models on vertical tasks with 10 times less latency and 100 times less cost, and it is this patient accumulation of small, high-performing agents on every interaction that creates, over the long term, a moat that is impossible to replicate. And I believe those who saw this in 2026 will look at Ramp in 2035 the way we look today at Amazon in 2005: a boring company that quietly swallowed its industry.

RampLabs @RampLabs

We partnered with @PrimeIntellect to build Fast Ask, a small RL-trained subagent that helps our Sheets agent find answers in spreadsheets. It scores +4% over Opus on exact match accuracy at Haiku latency. https://t.co/GJQvHJjABl

Ben Holmes @BHolmesDev · May 24

Finally diving into @mattpocockuk's Skills library. Loving it so far. Now using /grill-with-docs for big feature work, Codex and CC Using /handoff whenever I notice a side-quest I don't want to work on right away. But instead of temp .md files, I'm tracking in GitHub issues https://t.co/5HcMzYzfan

ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ 🛡️ @DanielMiessler · May 24

Agree it's interesting, but I don't think it's correct. There will be people doing PM-like things of course, but that's turning into a skill vs. being a role. Same with design, or the ability to build/code. And being able to pitch the vision, sell it, etc. And ultimately this is what companies will look like: Graphs of Algorithms. Transparent. Optimizable. Harnesses are what humans use. The core of work being done in companies will be through the execution of SOPs, and they'll look more like the /workflows that Anthropic is about to release. A series of clearly articulated steps that do _________. Humans will definitely have harnesses, but a human using a harness to do something is like an issue that needs to be resolved. Why wasn't that work part of an existing SOP or process? Can it be added? Humans doing things will largely be optimizations of these workflows according to their keen human understanding of the problem and how they want it solved, and coming up with net-new things. But humans doing the main, anticipated work will be a failure case to be solved.

pmarca @pmarca

Interesting.

木马人2.0 @mumaren_2 · May 25

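This tweet keeps getting more valuable. Follow Builders, Not Influencers~
@karpathy: formerly OpenAI/Tesla AI, now Eureka Labs; AI education legend
@swyx: founder of the AI Engineer movement, host of the Latent Space podcast
@joshwoodward: Google Labs VP, responsible for the Gemini App and AI Studio
@kevinweil: former OpenAI CPO, former head of product at Instagram/Twitter
@petergyang: product leader at Roblox, author of Behind the Craft
@thenanyu: Linear Head of Product, front-line AI product builder
@realmadhuguru: Google Gemini product leader, driving a "build fast" culture
@AmandaAskell: Anthropic philosopher, shaping Claude's personality and character
@_catwu: Anthropic Claude Code product lead
@trq212: Anthropic Claude Code engineer, shares AI agent practice in depth
@GoogleLabs: Google's official AI experiments account
@amasad: Replit CEO, champion of AI coding tools
@rauchg: Vercel CEO, creator of Next.js
@alexalbert__: PM on Anthropic's Claude team
@levie: Box CEO, insights on enterprise AI and business trends
@ryolu_: Cursor Head of Design, formerly Notion/Stripe
@garrytan: Y Combinator CEO, AI startup ecosystem
@mattturck: AI investor, host of the MAD Podcast
@zarazhangrui: author of the follow-builders project, AI builder and curator
@nikunj: partner at FPV Ventures, thinking on SaaS in the AI era
@steipete: iOS/macOS development legend, now focused on AI developer tools
@danshipper: founder of Every, explores AI's impact on work and creativity
@adityaag: South Park Commons GP, former Dropbox CTO
@sama: OpenAI CEO
[handle missing]: Anthropic's official Claude account
@mumaren_2: 木马人, years of big-tech experience, focused on sharing AI knowledge and tools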

mumaren_2 @mumaren_2

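Fed up with junk on your timeline, scrolling forever without finding anything useful? Here is a small X trick that fixes your information sources in just 3 steps!
1. Pick a creator you like, click the top-right menu, and choose to add them from lists.
2. Choose to add a list and give it a custom name. For example, I have an "AI sources" list of all the AI creators I think are good, which I update whenever I spot one.
3. Configure your timeline: under Settings, Timeline, Home tabs, you can customize list position and content.
Once this is set up, add anyone interesting to the list as you find them, and just skim the list each day. If you found this useful, give it a like!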

dax @thdxr · May 25

think back to projects you've worked on in the past it's hard not to imagine they'd have been completed way faster now that we have ai but everything still feels as slow and as difficult as ever

Tobias Schmidt @tpschmidt_ · May 25

My rule for AWS: never build on the fancy abstractions. Only on the primitives. 6 services in the container space are either archived, EOL, or shut to new customers. All of them tried to make ECS easier. • ECS CLI, archived November 2025.

Rohit @rohit4verse · May 25

Every night you're not running an autonomous research agent, you're hand-running experiments someone else automated months ago. Most people are still hunting for the "right" setup. Frameworks, orchestration, glue code. You don't need any of it. Andrej Karpathy open-sourced his own version that runs its own ML research. One GPU. ~100 experiments overnight. You never touch the Python. Here's the exact setup (takes 2 minutes):
1. Clone it: (repo link in comments)
2. uv sync, then uv run prepare.py
3. uv run train.py once to confirm the baseline runs
4. Point your coding agent at program.md and walk away
The agent edits one file, trains 5 minutes, keeps the change if val_bpb drops, reverts it if it doesn't. Git is the memory. The metric is the judge. You wake up to a staircase of validated improvements, not a backlog of ideas you never tested.

cyrilXBT @cyrilXBT

How to Build a Claude Research Agent That Reads the Internet Every Morning and Briefs You in 5 Mins