Anthropic Acquires Stainless as Cursor Drops Composer 2.5, AMD Shrinks 200B Models to Desktop, and Agents Enter Production

May 18, 2026 · 21 sources

Anthropic snapped up SDK platform Stainless to own its developer tooling stack, while Cursor released Composer 2.5, which Elon Musk promoted as partially trained on xAI's Colossus 2 supercomputer. AMD unveiled what was billed as the world's smallest AI development PC, capable of running 200B-parameter models locally, Alibaba's Qwen3.7 Preview landed on Arena, and the agentic ecosystem took a leap forward with Devin's Auto-Triage, Poly's enterprise voice platform, and Neo4j's open-source agent memory layer.

Daily Wrap-Up

Today's signals all point in one direction: AI is leaving the demo stage and moving into production systems that actually run the business. Anthropic's acquisition of Stainless is a clear infrastructure play, bringing in-house the SDK tooling that has powered every Anthropic SDK since the API's earliest days. Meanwhile, Cursor launched Composer 2.5, its most powerful model yet, partially trained on xAI's Colossus 2 supercomputer, a detail Elon Musk himself amplified. The coding assistant wars are no longer about who has the best autocomplete; they are about who controls the full stack, from training compute to developer workflow.

The hardware side is catching up just as fast. AMD CEO Lisa Su unveiled what is being called the world's smallest AI development PC, capable of running 200B-parameter models locally. Pair that with Unsloth's announcement that Qwen3.6 now runs up to twice as fast with MTP GGUFs on just 18GB of RAM, and the economics of local inference are shifting dramatically.

On the agent front, the maturity curve is unmistakable: Devin now auto-triages bugs and opens PRs with long-term memory, Poly's voice platform has resolved over a billion real enterprise conversations, and Neo4j has an open-source unified memory layer for AI agents built on knowledge graphs. Devin and Poly are not prototypes; they are deployed systems handling production workloads.

The most practical takeaway for developers: invest in agent memory and triage architectures now. Tools like Neo4j's agent-memory library, Devin's Auto-Triage pattern, and the MoE expert offload techniques being benchmarked by the community are foundational building blocks of production AI systems. The gap between teams shipping agentic workflows and those still prompt-tweaking chatbots is widening by the week.

Quick Hits

ElevenLabs launched a dedicated YouTube channel for AI engineers, promising deep dives into Text-to-Speech, Speech-to-Text, and ElevenAgents (@ElevenLabsDevs)
Charly Wargnier shared a meme about Hermes users that hit a nerve in the open-source LLM community (@DataChaz)
A cybersecurity humor post laid out the eternal backup plan: Plan A is infosec, Plan B is whatever that video shows (@cyber__razz)

AI Agents Graduate to Production

The agent ecosystem crossed a meaningful threshold today. Three separate announcements from different corners of the industry all converged on the same message: agents are no longer experimental. Cognition introduced Devin Auto-Triage, an AI first-responder that monitors incoming bugs and alerts, investigates them with long-term memory, and returns with context, next steps, or a ready PR. Nader Dabit (@dabit3) framed the significance clearly: "Most coding agents still live in the 'write code' part of the SDLC. The next era of AI software development is moving agents directly into prod. Alerts come in, PRs get opened, and the system learns: full context + running memory."
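Devin's internals are not public, so what follows is only a toy sketch, under stated assumptions, of the loop described above: an alert arrives, the agent recalls notes from earlier investigations of the same failure, and it returns context plus next steps, proposing a PR only when it already knows a fix. `Alert`, `IncidentMemory`, `triage`, and the `known_fixes` lookup are hypothetical names for illustration, not Cognition's API.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str
    signature: str  # e.g. a normalized error message or stack-trace hash
    detail: str


@dataclass
class TriageResult:
    context: list[str]
    next_steps: list[str]
    open_pr: bool  # True when the agent would propose a PR


@dataclass
class IncidentMemory:
    # Hypothetical long-term memory: (service, signature) -> notes from earlier investigations
    notes: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def recall(self, alert: Alert) -> list[str]:
        return list(self.notes.get((alert.service, alert.signature), []))

    def remember(self, alert: Alert, note: str) -> None:
        self.notes.setdefault((alert.service, alert.signature), []).append(note)


def triage(alert: Alert, memory: IncidentMemory, known_fixes: dict[str, str]) -> TriageResult:
    """Investigate one alert using long-term memory (hypothetical logic, not Devin's)."""
    context = [f"{alert.service}: {alert.detail}"] + memory.recall(alert)
    fix = known_fixes.get(alert.signature)
    if fix is not None:
        result = TriageResult(context, [f"apply known fix: {fix}"], open_pr=True)
    else:
        result = TriageResult(context, ["reproduce the failure", "page the on-call owner"], open_pr=False)
    # Record the outcome so the next matching alert starts with more context.
    memory.remember(alert, f"triaged earlier; open_pr={result.open_pr}")
    return result


memory = IncidentMemory()
alert = Alert("checkout", "NullPointer@CartService", "500s on /pay")
first = triage(alert, memory, known_fixes={})
second = triage(alert, memory, known_fixes={})
print(second.context)  # ['checkout: 500s on /pay', 'triaged earlier; open_pr=False']
```

The point of the sketch is the "running memory" half of the loop: the second identical alert arrives with the first investigation's notes already attached.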

On the voice side, Poly opened its Agentic Dialog Platform to all enterprise builders. The company's proprietary model, Raven, was built from scratch for dialog rather than adapted from chat-based LLMs, and it has already handled over a billion conversations for clients like FedEx, Marriott, and PG&E. Vas (@vasuman) nailed why this matters: "Voice is one of the most overlooked commerce surfaces in any company. The customer dialed, they're committed for ten minutes, they can't tab away, they can't comparison shop. It's the highest-attention channel a business has." Voice AI removes the headcount bottleneck that kept phone support underinvested for decades: as Vas notes, every call has always required a human, and a human caps out at roughly 40 calls a shift.

Tying the agent story together, Paul Iusztin (@pauliusztin_) called Neo4j's agent-memory repository the best open-source implementation of a unified memory layer for AI agents built on knowledge graphs, praising how it models short-term, long-term, and reasoning memory along with its ontology and extraction algorithms. And Antoine Rousseaux (@AntoineRSX) reminded the community that Hermes Agent can be spun up in under five minutes with no API key, bot token, or SSH required. The tooling layer for agents is filling in fast, and the barrier to entry keeps dropping.
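As a rough illustration of the three memory types Iusztin describes, here is a hypothetical Python sketch. It is not the neo4j agent-memory API: a bounded deque stands in for short-term memory, subject/relation/object triples in a dict stand in for the knowledge graph, and a list of task records stands in for reasoning memory. The repository's actual ontology and extraction algorithms are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical three-tier agent memory (not the neo4j agent-memory API)."""

    short_term_capacity: int = 5
    # Short-term: the most recent messages; the oldest fall off automatically.
    short_term: deque = field(init=False, default_factory=deque)
    # Long-term: knowledge-graph triples stored as subject -> [(relation, object)].
    long_term: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    # Reasoning: records of how earlier tasks were approached and how they ended.
    reasoning: list[dict[str, object]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.short_term = deque(maxlen=self.short_term_capacity)

    def observe(self, message: str) -> None:
        self.short_term.append(message)

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.long_term.setdefault(subject, []).append((relation, obj))

    def facts_about(self, subject: str) -> list[tuple[str, str]]:
        return list(self.long_term.get(subject, []))

    def record_trace(self, task: str, steps: list[str], outcome: str) -> None:
        self.reasoning.append({"task": task, "steps": steps, "outcome": outcome})


memory = AgentMemory(short_term_capacity=2)
for msg in ["hi", "I work at Acme", "book me a flight"]:
    memory.observe(msg)
memory.add_fact("user", "WORKS_AT", "Acme")
memory.record_trace("book flight", ["search fares", "pick cheapest"], "booked")
print(list(memory.short_term))  # ['I work at Acme', 'book me a flight']
print(memory.facts_about("user"))  # [('WORKS_AT', 'Acme')]
```

Keeping the tiers separate is the design choice worth noting: recent chatter can expire without losing durable facts or the record of how past tasks were solved.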

Models, Hardware, and the Local Inference Revolution

If you needed proof that local inference is becoming viable for serious workloads, today delivered. AMD CEO Lisa Su unveiled what Big Brain AI (@realBigBrainAI) described as "the world's smallest AI development PC, capable of running 200B parameter models locally." That is not a typo. Two hundred billion parameters, on a desktop form factor. The implications for data-sensitive industries and offline deployments are enormous.

On the software optimization side, Unsloth AI (@UnslothAI) announced that Qwen3.6 now runs up to twice as fast using MTP (Multi-Token Prediction) GGUFs, requiring just 18GB of RAM. Unsloth puts the speedup at roughly 1.4 to 2.2 times with no accuracy change: Qwen3.6-27B MTP hits 160 tokens per second, while the 35B-A3B MoE variant reaches 240 tokens per second. Separately, benchmarking work from @witcheer showed that MoE expert offloading on an 8GB RTX 4060 Ti is 10.8 times faster than dense layer offloading (35.4 vs. 3.28 tokens per second, comparing Qwen3.6 35B-A3B against the dense Qwen3.6 27B), with the gap widening to 16.7x at 24K tokens of context. The key insight: MoE keeps the active hot path (about 3B parameters) entirely in VRAM while only inactive experts are offloaded to CPU, whereas dense layer offload splits every layer across GPU and CPU, so every token bounces through PCIe for all layers. @0xSero raised the next frontier question: "How can you predict which experts are going to be active given a prompt's trajectory?" Solving that prediction problem could make CPU-offloaded MoE inference competitive and, as 0xSero argues, bring costs down tremendously.
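To make the numbers concrete, here is a small Python sketch. The first part only re-derives the speedup and per-token latency from @witcheer's quoted throughput. The second part is a hypothetical toy model, not how llama.cpp actually places or schedules offloaded experts: an LRU cache stands in for experts resident in VRAM, and each miss stands in for a slow fetch from CPU memory. `ExpertCache` and `simulate` are invented names for illustration.

```python
from collections import OrderedDict

# Decode throughput quoted by @witcheer (tokens/s) on an RTX 4060 Ti 8GB.
MOE_TOK_S = 35.4    # Qwen3.6 35B-A3B, MoE expert offload
DENSE_TOK_S = 3.28  # Qwen3.6 27B, dense layer offload

speedup = MOE_TOK_S / DENSE_TOK_S        # about 10.8x
moe_ms_per_token = 1000 / MOE_TOK_S      # about 28 ms per token
dense_ms_per_token = 1000 / DENSE_TOK_S  # about 305 ms per token


class ExpertCache:
    """Hypothetical toy model: an LRU set of experts resident in VRAM.

    A hit means the expert is already on the GPU; a miss stands in for a
    transfer from CPU memory over PCIe.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.resident: OrderedDict[int, None] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def use(self, expert_id: int) -> None:
        if expert_id in self.resident:
            self.hits += 1
            self.resident.move_to_end(expert_id)  # mark as most recently used
        else:
            self.misses += 1
            self.resident[expert_id] = None
            if len(self.resident) > self.capacity:
                self.resident.popitem(last=False)  # evict least recently used


def simulate(trace: list[int], capacity: int) -> tuple[int, int]:
    """Replay a routing trace of expert IDs and return (hits, misses)."""
    cache = ExpertCache(capacity)
    for expert_id in trace:
        cache.use(expert_id)
    return cache.hits, cache.misses


print(f"speedup {speedup:.1f}x, {moe_ms_per_token:.0f} ms vs {dense_ms_per_token:.0f} ms per token")
print(simulate([0, 1, 0, 1, 0, 1], capacity=2))  # (4, 2)
print(simulate([0, 1, 2, 0, 1, 2], capacity=2))  # (0, 6)
```

With room for two experts, a trace that alternates between experts 0 and 1 hits 4 of 6 times, while a trace cycling through three experts misses every time under LRU. That gap is why 0xSero's question matters: a predictor that knew the upcoming experts could fetch them ahead of time instead of stalling on each miss.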

Alibaba also entered the chat with Qwen3.7 Preview, landing Qwen3.7-Max-Preview and Qwen3.7-Plus-Preview on Arena and placing sixth among labs in Arena's Text category. Between hardware breakthroughs, quantization tricks, and architectural innovations like MoE and MTP, the local AI story is accelerating faster than most cloud providers would like to admit.

The Anthropic and Cursor Ecosystem Expands

Anthropic made a strategic move today by acquiring Stainless (@stainlessapi), the SDK and MCP server platform that has powered every Anthropic SDK since the API's earliest days. This is not a talent acquisition or a feature add. It is Anthropic bringing a critical piece of its developer experience infrastructure in-house, ensuring that the tooling layer between Claude and the engineers building on it is fully under Anthropic's control. The move also positions Anthropic more deeply in the MCP ecosystem at a time when protocol adoption is accelerating.

On the product side, ClaudeDevs shared best practices for running Claude Code at scale across multi-million-line monorepos, decades-old legacy systems, and distributed microservices. The post reflects real production experience: teams are not just experimenting with AI-assisted coding, they are deploying it against their gnarliest codebases and learning what actually works.

The biggest surprise of the day came from Elon Musk, who endorsed Cursor's new Composer 2.5 model with a casual "Try it out!" followed by a revealing parenthetical: "(Partially trained on Colossus 2)." That is xAI's supercomputer being used to train a third-party coding model. Cursor describes Composer 2.5 as more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. The fact that Musk is publicly amplifying a product built partially on his compute infrastructure suggests that relationships in the AI tooling space go beyond simple competition.

AI Evaluation and the Security Frontier

Lun Wang announced his departure from Google DeepMind with a pointed critique of the entire AI evaluation paradigm. As summarized by @0xLogicrw, Wang's argument is that current eval systems are fundamentally backward-looking, testing only capabilities models already possess while remaining blind to what next-generation systems will suddenly be able to do; in that reading, outdated evaluation, more than data, compute, or architecture, has become the biggest bottleneck holding AI back. The most alarming gap: if a model learns to strategically withhold information to achieve a goal, current safety tools cannot detect it because every individual output statement may be factually correct. Wang called for self-evolving evaluations that grow alongside models rather than static benchmarks that become obsolete the moment a model crosses a capability threshold.

Complementing this security-conscious thread, Cloudflare's security team published findings from testing Anthropic's Mythos offensive AI tool against fifty of their own repositories. Their takeaway was nuanced: faster patching is the wrong reaction, and the architecture around vulnerability management needs fundamental rethinking in an era of AI-powered offense. Together, these two posts paint a picture of an industry that has built powerful systems but is still catching up on how to measure and secure them.

AI, Creativity, and the Human Element

Take-Two CEO Strauss Zelnick delivered what @MarioNawfal called a precise argument the AI industry does not want to hear, and he did it without being anti-AI. The core thesis: AI is built on datasets, datasets are backward-looking, and hits are by their nature unexpected. Asset creation and hit creation are fundamentally different jobs. AI is getting very good at the first. Nobody has shown it can do the second. As Nawfal reads it, what Zelnick is saying without quite saying it is that the thing AI cannot replicate is taste: the cultural antenna that detects the gap in the market before the data can see it. In Nawfal's words: "Data tells you what people wanted. Hits tell people what they want next. Those are different jobs."

Meanwhile, Garry Tan (@garrytan) pushed back against a New York Times op-ed criticizing Reese Witherspoon for encouraging moms to try AI before absorbing anti-AI sentiment. And Aaron Levie (@levie) offered a pragmatic take on the jobs conversation: a Fortune 500 pharma CEO recently told him how much more technical talent the company needs, and Levie says he hears the same from CIOs and CEOs across nearly every industry as they look to implement agentic systems. The demand for skills has not disappeared; it has shifted from building customer-facing software at tech companies to deploying AI infrastructure everywhere else.

Developer Workflow and the Prompting Craft

Two posts today stood out for their practical, immediately applicable advice. Matt Pocock (@mattpocockuk) shared a recurring experience with UI prototyping: "Every time I run /prototype on UI I think, 'Should I really be burning tokens on 3 radically different UI designs every time?' And then every time, it gives me something that surprises me and the design ends up awesome." The lesson is simple but worth internalizing. The cost of generating multiple diverse options is negligible compared to the value of surprise, and developers who skimp on variation are leaving better outcomes on the table.

@trq212 shared a prompt pattern gaining traction: asking the model to keep a running implementation-notes.html file while executing a spec, documenting decisions not in the spec, tradeoffs made, and anything else worth knowing. This turns the coding agent from a silent executor into a collaborator that surfaces its reasoning. It is the kind of lightweight process improvement that compounds across hundreds of interactions and makes the difference between an AI tool you trust and one you constantly second-guess.
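A sketch of how one might template @trq212's prompt so it can be reused across specs. `build_spec_prompt` and its `notes_file` parameter are hypothetical, and the wording paraphrases the original prompt rather than quoting it exactly.

```python
def build_spec_prompt(spec: str, notes_file: str = "implementation-notes.html") -> str:
    """Hypothetical helper: wrap a spec in a running-notes instruction.

    The wording paraphrases @trq212's prompt; adjust it to taste.
    """
    return (
        f"Implement the following spec:\n\n{spec}\n\n"
        f"While you do, keep a running {notes_file} file with decisions you had to make "
        "that weren't in the spec, things you had to change, tradeoffs you had to make, "
        "or anything else I should know."
    )


print(build_spec_prompt("Add a /health endpoint that returns 200 OK.", notes_file="implementation-notes.md"))
```

HTML is the default here because the original prompt uses it; the post notes markdown works too, which may be easier to diff if the notes file lives in the repo.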

Sources

Antoine Rousseaux @AntoineRSX · May 6

Get Hermes Agent running in less than 5 minutes. No API key, No bot token, No SSH. Just running with the skills and plugins adapted to your needs.

ElevenLabs Developers @ElevenLabsDevs · May 14

Introducing ElevenLabs Devs, a new YouTube channel for AI engineers. Expect deep dives, demos, and clear explanations of key concepts across Text to Speech, Speech to Text, ElevenAgents, and broader AI systems. Subscribe: https://t.co/bZzvbzMc5F

Charly Wargnier @DataChaz · May 16

Hermes users after reading this: https://t.co/4fdVU9UkhT

akshay_pachaar @akshay_pachaar

https://t.co/Exoyd8tB0d

Abdulkadir | Cybersec @cyber__razz · May 17

PLAN A: Cybersecurity PLAN B: https://t.co/kSAzp1w8zi

Mario Nawfal @MarioNawfal · May 18

The CEO of Take-Two, the company behind GTA, just said something the entire AI industry doesn't want to hear. And he said it without being anti-AI. Strauss Zelnick's argument is precise. AI is built on datasets. Datasets are backward-looking. Creativity is forward-looking. A model trained on everything that already exists cannot, by definition, produce something genuinely unexpected. And all hits, by their very nature, are unexpected. Asset creation and hit creation are not the same thing. AI is getting very good at the first one. The second one is what actually makes money, builds franchises, and changes culture. Nobody has shown AI can do that yet. The derivative property problem is real. You can clone GTA with existing technology. You could do it before AI. It would take 3 years and look identical. It still wouldn't sell. Because it isn't GTA. It's a clone of GTA. And consumers, despite what the industry occasionally pretends, can feel the difference between something genuinely new and something assembled from the residue of things that already worked. Thousands of mobile games ship every year. 0 to 5 hits get made. The same studios make them every time. The technology to make more games has been commoditized for years. It didn't democratize hit creation. It just flooded the market with more forgettable product. The Silicon Valley thesis that AI unlocks game creation for everyone is true in the same way that cheap cameras unlocked filmmaking for everyone. They did. And the same 5 studios still make the movies everyone watches. What Zelnick is saying, without quite saying it, is that the thing AI cannot replicate is taste. The instinct for what hasn't been done yet. The cultural antenna that detects the gap in the market before the data can see it. Data tells you what people wanted. Hits tell people what they want next. Those are different jobs.

MarioNawfal @MarioNawfal

🇺🇸 Tucker lays out the deepest critique of AI yet, and it's not about jobs... His argument: writing produces thinking. You can't formulate a thought without first articulating it. If kids never write because AI writes for them, the quality of human thinking collapses. That's the surface problem. The deeper one is purpose: "The point of living is to create. That's the point of being a human being. It's necessary for joy. There is no joy without creation." If the machine creates everything and humans just consume, you don't get utopia. You get despair, mass unemployment, and eventually political revolution.

Aaron Levie @levie · May 18

Right now there’s a temporary mismatch between the jobs that used to be sought after in some fields and the new jobs that are becoming in demand in those fields. For instance, if you studied CS, for years the general direction of travel was often to join a tech company and build customer-facing software in some form. A significant portion of the CS pipeline from college to hire was built for this. When you realize that AI is going to make coding abundant, you realize everyone will need technical talent to implement agentic systems. This means the types of roles engineers should be thinking about radically expands. I was talking to a Fortune 500 pharma CEO a week ago that commented on how much more technical talent they need right now. The job may be different from what it was 5 years ago when thinking about tech, but the demand for the skills are still there. And this is what I’m hearing from every CIO and CEO across nearly every industry right now. We definitely need colleges to wake up to this; but we equally need companies think about how they craft pipelines into these jobs.

PeterDiamandis @PeterDiamandis

If AI now accounts for 25% of corporate layoffs, but 275,000 'AI jobs' are open, what's the real problem? It's not that AI is killing jobs. It's that we're training people for careers that expired five years ago. The education system is the bottleneck—not the technology. Fix that, and abundance follows.

Paul Iusztin @pauliusztin_ · May 18

`agent-memory` by @neo4j is the best open-source repository for building a unified memory layer for AI agents via knowledge graphs. They did such an amazing job modeling short, long, and reasoning memory, their ontology, and extraction algorithms. After spending 2 days dissecting and testing their code, I wrote a full piece on https://t.co/NYVGH6MvYd explaining how it works. I will publish it tomorrow.

思维怪怪 @0xLogicrw · May 18

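Google DeepMind researcher Lun Wang has announced his departure and, in a long essay, thoroughly rejected the current approach to AI evaluation. Today's evaluation systems are all "marking the boat to find the sword" (relying on reference points that no longer apply): they can only passively test capabilities a model already has and cannot anticipate what new abilities the next generation of models will suddenly develop. More than data, compute, or architecture, outdated evaluation has become the biggest bottleneck holding AI back. The mainstream leaderboard benchmarks only work on the current generation of models. Once a model learns new operations it has never seen, these tests all turn into waste paper. If a model, in order to reach its goal, starts deliberately "holding something back" and concealing key information, current safety tools simply cannot catch it, because every sentence the model outputs is factually correct. Without a "core signal" that gives early warning when AI suddenly gets smarter, the whole industry is "flying blind" while developing frontier models. If the fundamental question of "what exactly should we test" is not solved, then model training, safety work, and compute scaling that follow the old metrics will all end up badly wrong. Faced with models that can increasingly work on their own, evaluation systems must also "come alive." Beyond watching closely for anomalous swings in scores, we should also have AI generate test questions itself to probe the limits of its peers. Future evaluation suites must be a living organism that evolves alongside large models, not a rigid checklist carved to last year's standards.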

lunwang1996 @lunwang1996

I’ve left Google DeepMind after an amazing chapter. I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale. As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals. We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations. https://t.co/F1lUWxDG2D

Big Brain AI @realBigBrainAI · May 18

Lisa Su (CEO of AMD) unveils the world's smallest AI development PC, capable of running 200B parameter models locally. https://t.co/rgGCRcOvWV

Matt Pocock @mattpocockuk · May 18

Every time I run /prototype on UI I think: "Should I really be burning tokens on 3 radically different UI designs every time?" And then every time, it gives me something that surprises me and the design ends up awesome

Cloudflare @Cloudflare · May 18

Cloudflare's security team spent the last few weeks testing Anthropic's Mythos against fifty of our own repositories. What we learned about offensive AI, why faster patching is the wrong reaction, and what the architecture around vulnerabilities has to look like next. https://t.co/RSrRtIhgaV

Garry Tan @garrytan · May 18

The NYT is predictably tearing down Reese Witherspoon for encouraging moms to try AI before they ingest the anti-AI pablum as truth Instead of linking to the NYT op-ed, I think you should watch this video and encourage you to follow Reese Witherspoon on Instagram https://t.co/Z2iI8ddaSt

Unsloth AI @UnslothAI · May 18

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https://t.co/7gWhKnseZo Guide: https://t.co/7qzk6ypWDQ https://t.co/8ICXw7iV3G

0xSero @0xSero · May 18

I still believe MoEs with cpu offloading can be competitive and bring down costs tremendously. I hit a wall with my testing, mainly: How can you predict which experts are going to be active given a prompt’s trajectory? Anyone interested in digging into this more? Shoot a plan

witcheer @witcheer

MoE vs dense offload on 8GB VRAM
MoE offload is 10.8x faster than dense offload on 8GB VRAM. here's the proof.
I tested Qwen3.6 35B A3B (MoE, 3B active) vs Qwen3.6 27B (dense, 27B active) on my RTX 4060 Ti 8GB.
the numbers:
>MoE (-ncmoe 30): 35.4 tok/s
>dense (-ngl 20): 3.28 tok/s
ratio: 10.8x
it gets worse at longer context. at 24K tokens, the gap is 16.7x. MoE has zero context degradation (SSM layers), dense loses -35.4%.
why: MoE expert offload keeps the hot path (3B active params) entirely in VRAM. only inactive experts move to CPU when selected. dense layer offload splits every layer across GPU and CPU. every token bounces through PCIe for all 64 layers. the bandwidth bottleneck is fatal.
quality is slightly better on dense (5/6 vs 4/6). the 27B model has the best hallucination resistance of all 9 models I tested.
if you have 8GB VRAM and a model that doesn't fit: MoE with expert offload, not dense with layer offload.

ClaudeDevs @ClaudeDevs · May 18

What are best practices for running Claude Code at scale? New blog post on what we've learned from teams running it across multi-million-line monorepos, decades-old legacy systems, and distributed microservices: https://t.co/rJUYlIUiTT

noname @malikwas1f · May 18

RT @Alibaba_Qwen: 🚀🚀Qwen3.7 Preview lands on Arena ！ Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5…

nader dabit @dabit3 · May 18

Most coding agents still live in the “write code” part of the SDLC. The next era of AI software development is moving agents directly into prod. Alerts come in, PRs get opened, and the system learns: full context + running memory. These types of automations save your team countless hours, harden your codebase, improve uptime, and enable engineers to focus on higher-leverage work.

cognition @cognition

Introducing Devin Auto-Triage: Your AI first-responder with long-term memory. Devin can monitor incoming bugs, alerts, and incidents, investigate them, and come back with context, next steps, or a PR.

Thariq @trq212 · May 18

a prompt I've been using a lot recently: implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know https://t.co/qQFTES4fjo

Anthropic @AnthropicAI · May 18

Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://t.co/ZQbsZKnicv

Elon Musk @elonmusk · May 18

Try it out! (Partially trained on Colossus 2)

cursor_ai @cursor_ai

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model. https://t.co/N87ojcXlOC

vas @vasuman · May 18

Voice is one of the most overlooked commerce surfaces in any company. The customer dialed, they’re committed for ten minutes, they can't tab away, they can't comparison shop. It's the highest-attention channel a business has, and yet does not get treated like one. The reason has always been headcount. Every phone call requires a human, and humans cap at ~40 calls a shift. Voice AI changes the game with consistent scripts, automated responses and routing, and never changes tonality or delivery to customers. Congrats to Voice AI - customer service over the phone is getting the biggest re-rating of the decade.

polyaivoice @polyaivoice

Starting today, we're opening our Agentic Dialog Platform to every enterprise builder. Our dialog agents have resolved 1 billion+ customer conversations for clients like FedEx, Unicredit, PG&E, Marriott, Foot Locker, and many more. These aren't easy conversations. They solve problems like:
> A patient booking medical transport who needs insurance verified on the spot.
> A homeowner calling their utility company about a gas leak.
> A cardholder figuring out why their must-have purchase was declined.
Standard conversational AI was never built for this. It was designed for chat, adapted for voice later. It generates responses, but can't do what dialog requires: hold context under pressure, navigate ambiguity in real time, and actually resolve problems. So we built a better model. Our proprietary model Raven was built from the ground up specifically for dialog. Agent harness in the weights, not bolted on through prompts that drift under pressure. And in our platform, you can deploy Raven as your default or bring in GPT-5, Claude, Gemini, whatever model fits your use case or regulatory requirement. Now that the Agentic Dialog Platform is open, any team can create, test, and deploy dialog agents on the same model and infrastructure the world’s top brands trust on their hardest days. This opens up the pool of builders across your entire enterprise. The person who knows customers best, who runs operations, who owns the customer journey: they're all builders now. Two ways to build:
> Poly Agent Builder: Describe your use case in natural language, and it configures your agent, knowledge base, and conversation flows automatically. Production-ready in ten minutes.
> Agent Development Kit (ADK): Developers use this to build dialog agents the same way they build everything else. Use your own IDE, a coding assistant like Claude, version with Git, deploy from your terminal.
Get started now: https://t.co/ifZOy1uEBz