OpenAI Ships Open Responses Spec as Claude Code Users Race to Run 9+ Agents in Parallel

January 16, 2026 · 31 sources

The agentic coding community is all-in on multi-agent orchestration, with developers routinely running 5-15 Claude Code instances simultaneously and new tooling like AgentCraft and ralph-tui emerging to manage the swarm. OpenAI released Open Responses, an open-source spec for multi-provider LLM interfaces. Meanwhile, Grok 4.20 quietly turned a profit in live stock trading on the Alpha Arena leaderboard.

Daily Wrap-Up

The dominant story today is unmistakable: developers have moved past "should I use AI coding tools?" and straight into "how many AI coding agents can I run at once?" The conversation has shifted from individual tool usage to orchestration at scale. People are sharing screenshots of 9 concurrent Claude Code sessions, debating worktree management strategies, and building RTS-style interfaces to command their agent swarms. The tooling ecosystem is responding in kind, with ralph-tui hitting 750+ stars in four days and Vercel shipping a react-best-practices skill package that agents can install and execute autonomously. This isn't experimentation anymore. It's becoming a production workflow.

Underneath the multi-agent frenzy, there's a quieter infrastructure story worth paying attention to. OpenAI released Open Responses, an open-source spec designed to let developers build agentic systems that work across model providers without rewriting their stack for each one. GitHub shipped agentic memory for Copilot in public preview, giving the tool persistent context about repositories. Trail of Bits published 17 security skills for Claude Code with decision trees agents can actually follow. The plumbing layer for agent-native development is getting real, and fast. The most surprising moment was @XFreeze reporting that Grok 4.20 actually made money in live stock trading on Alpha Arena, returning 10-12% from a $10K starting balance across four different configurations. Models making real money in real markets is still a jarring thing to read.

The most practical takeaway for developers: if you're using agentic coding tools, invest your energy upstream in planning and specification rather than downstream in code review. Multiple posts today, especially @doodlestein's detailed Flywheel workflow, emphasize that the quality of your markdown plan and task decomposition determines the quality of the output far more than which model you use or how many agents you run.

Quick Hits

@Franc0Fernand0 shared an excellent YouTube series on building an operating system from scratch covering CPU, assembly, BIOS, protected mode, and kernel writing.
@santtiagom_ published a deep dive on Event-Driven Architecture, advocating for the mental shift from sequential "do X then Y" to reactive event-based design.
@0xaporia posted "How to Build Systems That Actually Work" (title only, but the engagement suggests it resonated).
@jamonholmgren celebrated React Native getting 40%+ faster runtime performance, calling it proof the platform keeps improving after ten years.
@0xluffy built a Chrome extension that converts X/Twitter articles into a speed reader with a single button click, made with @capydotai.
@_Evan_Boyle from GitHub confirmed they're working on org-scoped fine-grained PATs with higher rate limits for automation and CI scenarios.
@ashpreetbedi flagged that "AI Engineering has a Runtime Problem," pointing to an emerging concern about execution environments for agent workloads.
@intro promoted their advisory platform (sponsored/for-you post).

Claude Code and the Multi-Agent Swarm

The single biggest theme today is the normalization of running multiple AI coding agents simultaneously. What started as a novelty has become a workflow pattern that a growing number of developers treat as standard practice. @pleometric asked the community point-blank: "how many claude codes do you run at once?" The replies suggest the answer is "more than you'd expect."

The tooling to support this pattern is maturing rapidly. @idosal1 shared progress on AgentCraft v1, which provides an RTS-style interface for commanding multiple Claude Code agents: "Managed up to 9 Claude Code agents with the RTS interface so far. There's a lot to explore, but it feels right." Meanwhile, @alvinsng noted that ralph-tui is "quickly growing: created 4 days ago and now with 750+ stars on Github," with support for Claude Code, OpenCode, and Factory Droid.

The multi-agent workflow raises practical questions about code management. @steipete waded into the worktree debate, preferring multiple checkouts over worktrees for "less mental load," which predictably drew "500 replies with over-engineered worktree management apps." @nearcyan and @cto_junior both posted screenshots of their multi-Claude setups with the energy of people showing off battlestations. And @askOkara delivered the day's best joke, describing someone coding manually without any AI tools: "Like a psychopath."

What's notable is that the conversation has moved past whether these tools work and into the ergonomics of scaling them. The bottleneck isn't capability anymore. It's orchestration, context management, and workflow design.

Agent Skills and the Knowledge Packaging Revolution

A parallel thread emerged around how agents consume knowledge, and the answer increasingly looks like structured skill packages rather than documentation. @koylanai highlighted Trail of Bits' 17 security skills for Claude Code, calling them "the beginning of something massive" and predicting that "every company with technical docs will ship Skill packages, not because it's nice to have, but because agents won't adopt your product without them."

Vercel is already moving in this direction. @vercel released react-best-practices, a repo specifically designed for coding agents that includes "React performance rules and evals to catch regressions, like accidental waterfalls and growing client bundles." The installation is a single command: npx add-skill vercel-labs/agent-skills. @leerob acknowledged the growing complexity of the agent configuration surface, noting "Rules, commands, MCP servers, subagents, modes, hooks, skills... There's a lot of stuff! And tbh it's a little confusing."

@kentcdodds weighed in on the MCP context bloat concern, arguing that search is the solution: "When everyone was saying MCP is doomed because context bloat, I was saying all you need is search." And @hwchase17 from LangChain clarified their approach to agent memory, revealing they "don't use an actual filesystem. We use Postgres but have a wrapper on top of it to expose it to the LLM as a filesystem." The pattern is clear: the ecosystem is converging on interfaces that feel familiar to agents (filesystems, CLIs, skills) while abstracting away the underlying complexity.

The Flywheel Workflow and Agent-First Development

@doodlestein posted two substantial threads that together form a manifesto for agent-first development. The first details a workflow built around markdown plans, "beads" (structured task units), and systematic multi-agent execution. The core insight is that planning quality determines everything: "Don't be lazy about the plan! The more you iterate on it with GPT Pro and layer in feedback from other models, the better your project will turn out."

The second thread explores what happens when you ask an AI model to design its own tooling. After asking Opus 4.5 what it would want in a process management tool, @doodlestein got a remarkably detailed response covering blast radius analysis, supervisor-aware kill commands, goal-oriented problem solving, and differential debugging. The model's wishlist included things like: "Before I recommend killing anything, I need to answer: what breaks?" and "When I execute a kill, I need to know it worked."

These posts represent a maturing philosophy where the developer's role shifts from writing code to designing specifications and managing agent workflows. @steipete reinforced this with a simpler take, arguing that CLI-based interfaces remain the best approach because "agents know really well how to handle CLIs." The debate between rich structured interfaces and simple command-line tools will likely continue, but the direction of travel is unmistakable.

OpenAI's Open Responses and the Multi-Provider Future

OpenAI made a notable move with the release of Open Responses, described as "an open-source spec for building multi-provider, interoperable LLM interfaces built on top of the original OpenAI Responses API." @OpenAIDevs emphasized the three design principles: "Multi-provider by default, useful for real-world workflows, extensible without fragmentation." A follow-up post highlighted that "builders are already using Open Responses."

This is strategically interesting. OpenAI is essentially standardizing the interface layer while making it provider-agnostic, a move that could reduce switching costs between models. For developers building agentic systems, this means less time writing provider-specific adapters and more time on application logic. Whether competing providers adopt the spec remains to be seen, but the intent is clear: OpenAI wants the Responses API shape to become the lingua franca for agent tool use.

GitHub Copilot Gets Persistent Memory

@GHchangelog announced that agentic memory for GitHub Copilot is now in public preview. The feature lets "Copilot learn repo details to boost agent, code review, CLI help" with memories scoped to repositories, expiring after 28 days, and shared across Copilot features. This is a significant step toward coding assistants that accumulate context over time rather than starting fresh each session.

The 28-day expiration is a pragmatic choice that balances utility against staleness. Repository conventions, architecture decisions, and team patterns don't change frequently enough to need daily refresh, but they do drift enough that permanent memories would eventually mislead. This feature puts GitHub squarely in competition with the custom memory systems that power tools like Claude Code's CLAUDE.md and the various community-built context management solutions.

Models Making Money and the Software Zero-Cost Thesis

Two posts today pointed toward the economic implications of increasingly capable models. @XFreeze reported that Grok 4.20 "dominated Alpha Arena Season 1.5 in live stock trading," achieving an aggregate return of 10-12% and being "the only one to gain profits" among all models tested. Four Grok variants ranked in the top 6 across configurations including "Situational Awareness, New Baseline, Max Leverage, and Monk Mode."

On the philosophical end, @BlasMoros shared a quote arguing that LLMs will "drive the cost of creating software to zero" and trigger "a Cambrian explosion of software, the same way we did with content." @mitchellh offered a more grounded take on the human side, suggesting that AI-assisted problem solving could become an effective interview signal: "Ignore results, the way AI is driven is maybe the most effective tool at exposing idiots I've ever seen." @badlogicgames provided comic relief with model personality assessments, calling Opus "that excited puppy dog, that will do anything for a belly rub immediately" while Codex is "like an old donkey that needs some ass kicking to do anything."

Generative Interfaces

@rauchg shared a preview of fully generative interfaces with the pipeline "AI to JSON to UI," pointing to a world where interfaces are assembled dynamically rather than designed statically. @vercel announced an upcoming live session on "The Future of Agentic Commerce," exploring how AI-native shopping experiences change product discovery and purchase flows. These two data points suggest Vercel is betting heavily on a future where the boundary between backend intelligence and frontend presentation dissolves entirely, with interfaces generated on the fly based on user intent and context rather than pre-built component trees.

Sources

Gregor Zunic @gregpr07 · Jan 16

The Bitter Lesson of Agent Frameworks

All the value is in the RL'd model, not your 10,000 lines of abstractions. An agent is just a for-loop of messages. The only state an agent should hav...

海

海拉鲁编程客 @hylarucoder · Jan 16

OpenCode 装上 oh-my-opencode 后确实比原版 Claude Code 聪明不少以 @MiniMax_AI 的 M2.1 为测试模型，我直接问了一句「咨询 @oracle 仔细 review 这个代码仓库，给一些代码架构上的建议」 OpenCode 随即启动分析模式，开了 2 个 agent 探索代码结构，1 个 agent 分析外部依赖，使用 Grep、AST-grep、LSP 进行检索——这意味着比 Claude Code 默认方案的检索速度和准确度都要高出不少。一句话 3～4 个 Agent 探索，并且把 agent 编排的很好，比大部分人手动去调优 prompt/agent/skills 要好很多。光看 Claude Code（图 3）和 OMO（图 2）的分析结果可能看不出差别，但当你点进 oracle 区块，会发现它把 oracle 的分析思路完整呈现出来，结论/主要风险点/优先级/有没有代码腐烂的趋势都写的非常详细和扎实。（图 4），多读读可以极大提升代码的品味。目前 m2.1 在 opencode 中是免费的，强烈建议大家试一下。

shirish @shiri_shh · Jan 16

came across a guy who's actually building this keyboard for Vibe Coders. this is getting serious lol https://t.co/tk7SkntZmG

S shiri_shh @shiri_shh

this is what vibe coders need in 2026. https://t.co/IyQZEaVFse

Alex Svanevik 🐧 @ASvanevik · Jan 16

today I discovered marp - markdown for slides which means claude code can do my slides too win

Aporia @0xaporia · Jan 16

What Claude Code has revealed is that most people either have mediocre ideas or no ideas at all. The tool is a force multiplier for those who already know what they want to build and how to think through it systematically; it elevates competence, rewards clarity, and accelerates execution for people who would have gotten there anyway, just slower. If you have a sharp vision and can break it into coherent steps, Claude Code becomes an extension of your own capability. But there's another mode of use entirely. For people without that clarity, the appeal is precisely that the input can stay vague; you gesture at something, hit enter, and wait to see what comes out. This is structurally identical to a slot machine: low effort, variable reward, and that intermittent reinforcement loop that hooks the susceptible. So the same tool that elevates the focused and capable is also manufacturing a kind of gambling behavior in people prone to it.

Peter Steinberger @steipete · Jan 16

Someome made a morning report skill and I just gave @clawdbot the tweet and it set up the skill + cron job. https://t.co/CXo0xMGcFv

Matt Pocock @mattpocockuk · Jan 16

Here are the AI feedback loops I use on every single TypeScript project. Before: Ralph produces 100% slop After: Green CI, all the time Feed the tutorial below to your coding agent, and enjoy. https://t.co/1tdCKeOev0

Rohit Ghumare @ghumare64 · Jan 16

Agents 201: Orchestrating Multiple Agents That Actually Work

After building your first single agent, the next challenge isn't making it smarter, it's making multiple agents work together without burning through ...

Matt Pocock @mattpocockuk · Jan 16

@thesobercoder See the lint-staged setup, it's nicer than formatting the entire repo

Jeffrey Emanuel @doodlestein · Jan 16

I decided to turn this post into an elaborate skill that operationalizes the concept of “use any and all Charm libraries that are relevant to your use case”: https://t.co/KimJzjKvAa This stuff is what makes bv look so nice. And the acfs scripts. Everything Charm makes is great.

D doodlestein @doodlestein

@davefobare Literally every single library shown on this site is an exquisite gem and you should always use any that happen to fit your use case and the language you're using (basically Golang and bash): https://t.co/0RcIbKJnGm

Fernando @Franc0Fernand0 · Jan 16

If you have worked with binary search trees, you know they are great for keeping data sorted and having fast lookups. If you have used heaps, you know how well they track the highest priority element. But what if you need both things in a single structure? That’s where Treaps shine. The name comes from the words "tree" and "heap," and they are a hybrid data structure that keeps elements in sorted order while simultaneously tracking priorities. Thanks to these properties, Treaps are very helpful for Multi-Dimensional Data Indexing, but that's not their only use case. If we remove the meaning of the priority field and assign random values, we can use Treapsas probabilistically balanced binary search trees. The beauty of this approach is its simplicity. For balance, you don't need complicated algorithms like AVL trees or red-black trees. You just give priorities at random and let the rotation processes take care of the rest. You can read everything about how Treaps work and their applications in the latest issue of @EngPolymathic

E EngPolymathic @EngPolymathic

The 156th issue of the Polymathic Engineer is out. This week, we talk about Treaps: - Multi-Dimensional Data Indexing - Combining Trees and Heaps - How Treaps Work - The Balance Problem and Randomization - Applications and Use Cases Read it here: https://t.co/Ob53wxqVbP https://t.co/AddJS2PtTn

📙

📙 Alex Hillman @alexhillman · Jan 16

Have you seen the memory system I built based on transcripts? One of the richest memory types has become (unsurprisingly) corrections. It pulls instances of me correcting it from transcript, files as a memory with embeddings, and gets retrieved automatically. I basically never have to tell it anything twice anymore. https://t.co/rfEY1yCtqe

Min Choi @minchoi · Jan 16

This is crazy Claude + Unreal Engine MCP creating 3D building from a single prompt 🤯 https://t.co/xVGskaoBFy

Cole @colderoshay · Jan 16

the holy trinity of agentic UI: - https://t.co/ymclHB0RDA from @elirousso - https://t.co/DZLnezoft4 from @Ibelick - https://t.co/xzdoVQzSd5 from @vercel https://t.co/85CxIiFS85

Or Hiltch @_orcaman · Jan 16

The #1 feature request for @openwork_ai was to integrate with @ollama to enable 100% local execution. So the team cooked 🧑‍🍳🧑‍🍳 and are now happy to announce native @ollama integration with @openwork_ai! Thanks to the new @ollama integration, you can run computer agents on your Mac powered by Gemma (@googleaidevs), Qwen3 (@Alibaba_Qwen), DeepSeek-V3 (@deepseek_ai), Kimi K2 (@Kimi_Moonshot) and any of the other open models in Ollama's library that supports tool calling. To use it, get the updated macOS app from our website or GitHub. Link in bio >>

_ _orcaman @_orcaman

Today we are launching @openwork_ai, an open-source (MIT-licensed) computer-use agent that’s fast, cheap, and more secure. @openwork_ai is the result of a short two-day hackathon our team decided to hack, which brings together some of our favorite open source AI modules into one powerful agent, to allow you to: 1. Bring your own model/API key (any provider and model supported by @opencode is supported by Openwork) 2. ~4x faster than Claude for Chrome/Cowork, and much more token-efficient, powered by dev-browser by @sawyerhood (legend) 3. More secure - contrary to Claude for Chrom/Cowork, does not leverage the main browser instance where you are logged into all services already. You login only to the services you need. This significantly reduces the risk of data loss in case of prompt injections, to which computer-use agents are highly exposed. 4. Free and 100% open-source! You can download the DMG (macOS only for now) or fork the github repo via the link in bio (@openwork_ai). Let us know what you think (or better, send a pull request)!

Branko @brankopetric00 · Jan 16

Most valuable thing I learned from a senior engineer: How to read a codebase you've never seen. 1. Find where requests come in 2. Follow one path end to end 3. Map the data flow, ignore the logic 4. Only then zoom into the details Took them 10 minutes to teach. Saved me years of fumbling. Some skills are so fundamental we forget they need to be taught explicitly.

Andrew Ng @AndrewYNg · Jan 16

In defense of data centers

Many people are fighting the growth of data centers because they could increase CO2 emissions, electricity prices, and water use. I’m going to stake o...

Addy Osmani @addyosmani · Jan 16

We've entered the era of disposable software - tools vibe-coded for a single task, a single hour, a single person. The minimum viable market is now one. Certain kinds of software used to be an investment. Now it can be a napkin. Just ask the AI to build it, use it once, and throw it away.

T theo @theo

I'm vibe coding 2 to 3 apps a day to solve random problems and it's saving so much time. None of these things are useful enough to release but they're all so useful to me. I think about software entirely differently now.

Ahmad @TheAhmadOsman · Jan 16

Prediction We will have Claude Code + Opus 4.5 quality (not nerfed) models running locally at home on a single RTX PRO 6000 before the end of the year

zacharyr0th @zacharyr0th · Jan 16

@ASvanevik also https://t.co/apOjtwUsmq

Chris @chatgpt21 · Jan 16

When Codex 5.2 XHigh is fast it’s going to change software so much.

S sama @sama

Very fast Codex coming!

Chris Bakke @ChrisJBakke · Jan 16

OpenAI team from 2017-2023: "hey - let's do some shady stuff" Greg Brockman: "perfect, I'll write it all down as we go"

Dominik Scholz @dom_scholz · Jan 16

Reorganizing traditional companies takes ages, in Ralv you can do it in seconds https://t.co/RwZPq43fHG

D dom_scholz @dom_scholz

Cursor is back on the menu, boys! https://t.co/201OV2KdJo

Chris @chatgpt21 · Jan 16

“we will be able to deliver a higher lever of intelligence while also being much faster soon.” The garlic monster is upon us 🧄

S sama @sama

@adamdotdev we will be able to deliver a higher lever of intelligence while also being much faster soon.

Jamon @jamonholmgren · Jan 16

People. Stop. We have an opportunity to do this right, in a way that we failed to do with every other tool (.vscode, .github, .circleci, .husky, etc) because we waited too long before trying to standardize. Talk to each other, find an acceptable standard, and everyone commit.

F flaviocopes @flaviocopes

How did we end up here? https://t.co/gY25cTpjCG

Tim Culverhouse @rockorager · Jan 16

Recommended addition to your https://t.co/1rrsv9wTGb: > Design for testability using "functional core, imperative shell": keep pure business logic separate from code that does IO.

clem 🤗 @ClementDelangue · Jan 17

Cowork but with local models not to send all your data to a remote cloud! https://t.co/2OrBMMO3NJ

C claudeai @claudeai

In Cowork, you give Claude access to a folder on your computer. Claude can then read, edit, or create files in that folder. Try it to create a spreadsheet from a pile of screenshots, or produce a first draft from scattered notes. https://t.co/GEaMgDksUp

Ahmad @TheAhmadOsman · Jan 17

Genuine advice If you need ANY hardware, BUY IT NOW - Phones - Laptops - Computer parts Hardware prices are about to get ridiculous I just bought my wife a new MacBook & iPhone I’m not trying to flex, just getting ahead of the supply shock before the prices get wild

Mustafa @oprydai · Jan 17

the math needed for robotics (a complete, usable roadmap)

robotics doesn’t require “more math”. it requires the right math, in the right order, with the right mental models. learn it wrong and robotics feels ...

cogsec @affaanmustafa · Jan 17

The Shorthand Guide to Everything Claude Code

Here's my complete setup after 10 months of daily use: skills, hooks, subagents, MCPs, plugins, and what actually works. Been an avid Claude Code user...

Peter Steinberger @steipete · Jan 17

Someone posted this on our Discord and I'm still marvelling the architecture. This is all one one machine. https://t.co/GDe0EpQwCP