AI Learning Digest

Multi-Agent Coding and Skills Ecosystems Dominate as OpenAI Ships Open Responses Spec

Daily Wrap-Up

Today's timeline was dominated by one unmistakable signal: the solo developer working in a single editor instance is becoming the exception. Post after post showed developers running fleets of Claude Code agents, managing them with RTS-style interfaces, and debating the finer points of worktrees versus multiple checkouts. @idosal1 casually mentioned managing nine agents simultaneously with AgentCraft, and @doodlestein laid out an entire production workflow involving 5-15 agents building out a project in parallel. The cultural shift is striking. When @askOkara jokes about seeing someone type code manually "like a psychopath," it lands because the audience recognizes the grain of truth.

The second major thread was the rapid maturation of the agent tooling ecosystem. Vercel shipped react-best-practices as a skill package for coding agents. Trail of Bits published 17 security skills. @leerob wrote an explainer trying to make sense of the sprawl of rules, commands, MCP servers, subagents, modes, hooks, and skills. The tooling layer between humans and AI agents is becoming its own product category, and companies are starting to realize that if agents can't consume your documentation programmatically, they won't adopt your product at all. Meanwhile, OpenAI's Open Responses spec is a direct play at preventing the agentic ecosystem from fragmenting across incompatible provider APIs.

The most entertaining moment was easily @badlogicgames comparing Opus to "an excited puppy dog that will do anything for a belly rub" and Codex to "an old donkey that needs some ass kicking." Both "messy morons," apparently, but with very different vibes. The most practical takeaway for developers: if you're building tools that AI agents will use, design for the agent's workflow loop (gather, decide, act, verify, explain) and expose structured, token-efficient output modes. The era of tools designed exclusively for human eyes is ending.

Quick Hits

  • @0xluffy built a Chrome extension that converts X articles into a speed reader, made with @capydotai.
  • @santtiagom_ published an article on Event-Driven Architecture, pushing readers to think in events rather than sequential procedures.
  • @Franc0Fernand0 shared a YouTube series on building an operating system from scratch covering CPU, assembly, BIOS, protected mode, and kernel writing.
  • @jamonholmgren celebrated React Native achieving 40%+ faster runtime performance.
  • @_Evan_Boyle confirmed GitHub is working on org-scoped fine-grained PATs with higher rate limits for automation and CI scenarios.
  • @XFreeze reported Grok 4.20 dominated Alpha Arena Season 1.5 in live stock trading, returning +10-12% from a $10,000 start and being the only model to turn a profit.
  • @0xaporia shared thoughts on "How to Build Systems That Actually Work."
  • @ashpreetbedi flagged that "AI Engineering has a Runtime Problem," pointing at infrastructure gaps in production agent deployments.
  • @vercel announced a live session on "The Future of Agentic Commerce" covering AI-native shopping experiences, scheduled for February 4.
  • @doodlestein recommended charmbracelet.io's library collection as "exquisite gems" for Go and bash developers building CLI tools.

The Agent Skills and Tooling Stack Takes Shape

The most significant infrastructure trend today wasn't any single tool launch. It was the convergence of multiple independent efforts toward a shared vision: agents need structured, discoverable knowledge packages to be effective. @leerob captured the current state of confusion well, noting the proliferation of "rules, commands, MCP servers, subagents, modes, hooks, skills" and acknowledging "there's a lot of stuff! And tbh it's a little confusing." His explainer was a needed attempt to impose order on a rapidly evolving space.

Vercel made a concrete move by releasing react-best-practices as an installable skill for coding agents, with @vercel_dev showing the three-step flow: install the skill, paste a prompt, review and fix. @koylanai highlighted Trail of Bits' 17 security skills for Claude Code, calling them "the beginning of something massive" and predicting that "every company with technical docs will ship Skill packages, not because it's nice to have, but because agents won't adopt your product without them." Because Trail of Bits works with DARPA and Facebook, its security skills amount to a serious institutional vote of confidence in the agent-readable knowledge format.
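For readers who haven't seen one, the packaging format these efforts converge on is lightweight: a directory with a SKILL.md entry point whose frontmatter tells the agent when to load it. A hedged sketch (the skill name, file names, and contents here are invented for illustration; check your agent's documentation for the exact frontmatter fields it supports):

```markdown
react-performance/
├── SKILL.md            (entry point the agent discovers)
└── references/
    └── waterfalls.md   (deeper context, loaded only on demand)

<!-- SKILL.md -->
---
name: react-performance
description: Rules for avoiding React performance regressions
  such as request waterfalls and growing client bundles.
---

# React performance review

When reviewing React code, check for sequential awaits that could
run in parallel, and flag client components that pull in large
server-only dependencies. See references/waterfalls.md for worked
examples.
```

The two-level structure is the point: the agent pays a few tokens for the frontmatter up front and only reads the full instructions, or the nested references, when the task calls for them.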

On the tooling side, @alvinsng noted that ralph-tui hit 750+ GitHub stars just four days after creation, working across Claude Code, OpenCode, and Factory Droid. @LLMJunky praised @steipete's clawdbot for enabling complex multi-step workflows from a phone, describing a pipeline that indexed Supabase migrations, passed context to a Codex agent with documentation, and produced a complete migration plan. @steipete himself advocated for CLI-based approaches, arguing that "agents know really well how to handle CLIs." @kentcdodds validated his earlier bet that MCP's context bloat problem would be solved by search, rather than requiring the protocol to be redesigned. The tooling layer is solidifying around a clear principle: give agents structured entry points, let them pull what they need, and stay out of the way.
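@kentcdodds's bet is easy to see in miniature: instead of injecting every tool definition into the context up front, expose a single search entry point and let the agent pull definitions on demand. A toy sketch of the idea (not the actual MCP Tool Search implementation; tool names are invented):

```python
# A registry of many tools; loading every description up front bloats context.
TOOLS = {
    "supabase_list_migrations": "List applied database migrations.",
    "supabase_create_migration": "Create a new SQL migration file.",
    "github_open_pr": "Open a pull request against a repository.",
    "stripe_create_invoice": "Create an invoice for a customer.",
}

def search_tools(query: str, limit: int = 3) -> list[dict]:
    """The only tool exposed up front: return matching definitions on demand."""
    q = query.lower()
    hits = [
        {"name": name, "description": desc}
        for name, desc in TOOLS.items()
        if q in name.lower() or q in desc.lower()
    ]
    return hits[:limit]

# The agent discovers just what it needs, paying tokens only for matches.
print(search_tools("migration"))
```

With thousands of registered tools the registry would live behind an index rather than a dict, but the contract stays the same: one cheap entry point, lazy retrieval of everything else.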

Perhaps the most fascinating contribution came from @doodlestein, who asked Claude Opus for its "personal opinion" on what would make a process management tool useful. The response was a detailed 12-point wishlist covering everything from one-shot system snapshots to blast radius analysis, supervisor-aware kill commands, and differential debugging. The key insight wasn't any single feature request but the meta-pattern: agents want tools designed around their workflow loop of gather, decide, act, verify, and explain, with token-efficient output modes and structured confidence breakdowns. This is a template for anyone building developer tools in 2026.
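That meta-pattern can be made concrete. A minimal sketch, assuming a hypothetical triage CLI (none of these names are the actual process_triage interface), of what "token-efficient output modes" for the gather → decide → act → verify → explain loop might look like:

```python
import json

# Hypothetical data the gather step of a process-triage tool might collect.
def gather_snapshot() -> dict:
    return {
        "processes": [
            {"pid": 12345, "cmd": "bun test", "cpu": 95,
             "classification": "abandoned", "confidence": 0.72},
            {"pid": 23456, "cmd": "node server.js", "cpu": 3,
             "classification": "healthy", "confidence": 0.91},
        ]
    }

def render(snapshot: dict, mode: str = "json") -> str:
    """Three output modes with very different token costs."""
    if mode == "brief":
        # One line for a quick health check.
        n = sum(1 for p in snapshot["processes"]
                if p["classification"] != "healthy")
        return f"{n} candidate(s)" if n else "OK"
    if mode == "actionable":
        # Compact JSON with only the fields needed for the decide step.
        rows = [p for p in snapshot["processes"]
                if p["classification"] != "healthy"]
        return json.dumps([{"pid": p["pid"], "cmd": p["cmd"],
                            "confidence": p["confidence"]} for p in rows])
    # Full dump, for when the agent genuinely needs everything.
    return json.dumps(snapshot)

print(render(gather_snapshot(), "brief"))
print(render(gather_snapshot(), "actionable"))
```

The design choice is the one Claude's wishlist keeps circling: let the caller decide how many tokens each step of the loop is worth, rather than always emitting the full picture.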

Multi-Agent Coding Goes Mainstream

The most visible trend today was the normalization of running multiple AI coding agents simultaneously. This isn't experimental anymore. It's becoming the default workflow for a growing segment of developers. @pleometric asked "how many claude codes do you run at once?" and the responses made clear that single-agent workflows are increasingly seen as leaving performance on the table. @nearcyan shared their Claude Code setup with evident enthusiasm, and @cto_junior greeted "all multi clauders" as an established community.

@idosal1 provided the most concrete data point, describing AgentCraft v1's RTS (real-time strategy) interface for managing up to nine Claude Code agents simultaneously: "There's a lot to explore, but it feels right." The gaming metaphor is apt. Managing a fleet of coding agents is starting to resemble resource management in a strategy game, allocating tasks, monitoring progress, and intervening when agents get stuck.

@doodlestein laid out the most complete vision of what a mature multi-agent workflow looks like, describing a pipeline where the human focuses almost entirely on planning and review while "5-15 agents build out the beads." The workflow involves careful markdown planning, iterative refinement, frequent commits, and "fresh eyes" review prompts. The hard part, @doodlestein emphasized, is resisting laziness during planning: "of course the project is going to suck and be a buggy mess" if you skip that phase. @steipete, meanwhile, sparked a lively debate by admitting he uses multiple git checkouts instead of worktrees "because less mental load," prompting what he described as "500 replies with over-engineered worktree management apps." @badlogicgames offered a memorable comparison of the two dominant AI coding tools: Opus is the eager puppy, Codex is the stubborn donkey, and both are "messy morons" in their own way. @askOkara's joke about seeing someone code manually "like a psychopath" landed perfectly as the capstone of the day's multi-agent discourse.

AI Code Review and Memory Systems

Cursor and GitHub both shipped meaningful improvements to how AI understands and reviews code. @cursor_ai announced that their tool now catches 2.5x as many real bugs per PR, linking to a deep dive on how they build and measure agents for code review. The emphasis on "real bugs" rather than stylistic nitpicks signals a maturation of AI code review from noisy annotation tool to genuine quality gate.

GitHub's contribution was agentic memory for Copilot, now in public preview. As @GHchangelog explained, "Copilot learns repo details to boost agent, code review, CLI help," with memories scoped to repos, expiring after 28 days, and shared across Copilot features. The 28-day expiration is a pragmatic choice that avoids stale context while giving the system enough runway to learn meaningful patterns. @hwchase17 from LangChain clarified an interesting implementation detail from a related blog post: they don't use an actual filesystem for agent memory but rather "Postgres with a wrapper on top to expose it to the LLM as a filesystem." The filesystem metaphor for AI memory keeps showing up because it maps to concepts models already understand.
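The pattern @hwchase17 describes is straightforward to sketch: store file contents in a database table and expose filesystem-shaped verbs to the model. A minimal illustration using SQLite in place of Postgres (the verb names and schema are assumptions for illustration, not LangChain's actual implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content TEXT)")

# The three verbs the LLM sees; from its side, this is just a filesystem.
def write_file(path: str, content: str) -> None:
    conn.execute(
        "INSERT INTO files (path, content) VALUES (?, ?) "
        "ON CONFLICT(path) DO UPDATE SET content = excluded.content",
        (path, content),
    )

def read_file(path: str) -> str:
    row = conn.execute(
        "SELECT content FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return row[0]

def ls(prefix: str = "/") -> list[str]:
    rows = conn.execute(
        "SELECT path FROM files WHERE path LIKE ? ORDER BY path",
        (prefix + "%",),
    ).fetchall()
    return [r[0] for r in rows]

write_file("/memories/repo-conventions.md", "Use pnpm, not npm.")
print(ls("/memories"))
print(read_file("/memories/repo-conventions.md"))
```

The database backend buys you transactions, multi-tenancy, and easy expiry policies (a 28-day TTL is just a `created_at` column and a cleanup query), while the model keeps the familiar ls/read/write mental model.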

@mitchellh offered a provocative take on the implications of all this for hiring: "a really effective engineering interview would be to explicitly ask someone to use AI to solve a task, and see how they navigate. Ignore results, the way AI is driven is maybe the most effective tool at exposing idiots I've ever seen." It's a recognition that the skill ceiling for AI-assisted development is high, and the gap between effective and ineffective AI usage is widening.

OpenAI Ships Open Responses Spec

OpenAI released Open Responses, an open-source specification for building multi-provider, interoperable LLM interfaces. @OpenAIDevs positioned it as "multi-provider by default, useful for real-world workflows, extensible without fragmentation," with the explicit goal of letting developers "build agentic systems without rewriting your stack for every model." A follow-up post highlighted early builder adoption.

The timing is strategic. As the agent tooling ecosystem fragments across providers, a shared specification for how agents interact with LLMs could prevent the kind of integration tax that plagued earlier API ecosystems. Whether competitors actually adopt a spec originating from OpenAI remains the key question, but the move toward interoperability is directionally correct for the industry.
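The interoperability pitch fits in one picture: the same Responses-shaped request body should work against any conforming server, with only the base URL changing. A hypothetical sketch (no network calls; the self-hosted endpoint and model name are placeholders, and real requests would carry per-provider auth headers):

```python
def build_request(base_url: str, model: str, user_input: str) -> dict:
    # The same payload regardless of which provider serves it.
    return {
        "url": f"{base_url}/responses",
        "json": {"model": model, "input": user_input},
    }

providers = {
    "openai": "https://api.openai.com/v1",       # the original Responses API
    "self_hosted": "http://localhost:8000/v1",   # any conforming server
}

reqs = [build_request(url, "some-model", "Summarize this diff.")
        for url in providers.values()]

# Only the URL differs; the agentic stack on top never changes.
print(sorted(r["url"] for r in reqs))
```

That single-payload property is exactly the "integration tax" the spec is trying to eliminate: swap a provider by swapping a URL, not by rewriting your tool-calling and streaming code.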

The Zero-Cost Software Thesis

@BlasMoros surfaced a quote that crystallized one of the more provocative theses circulating in tech: "LLMs have proven themselves to be remarkably efficient at [translating human language into computer language] and will drive the cost of creating software to zero. What happens when software no longer has to make money? We will experience a Cambrian explosion of software, the same way we did with content." It's a clean articulation of the deflationary pressure AI puts on software development, and whether you agree with the timeline or not, the directional argument is hard to dismiss given what multi-agent workflows are already enabling.

@rauchg offered a concrete glimpse of what that future looks like with fully generative interfaces, showing an "AI to JSON to UI" pipeline where interfaces are assembled dynamically rather than hand-coded. This isn't speculative. It's a working demo. The gap between "AI writes code" and "AI generates entire applications at runtime" is narrowing faster than most developers' mental models account for.
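The "AI → JSON → UI" idea reduces to a renderer that walks a model-emitted spec. A toy version (the component vocabulary is invented for illustration and is not @rauchg's actual demo; a real system would validate the spec against a schema before rendering):

```python
def render(node: dict) -> str:
    """Recursively turn a JSON UI spec into HTML."""
    kind = node["type"]
    children = "".join(render(c) for c in node.get("children", []))
    if kind == "page":
        return f"<main>{children}</main>"
    if kind == "heading":
        return f"<h1>{node['text']}</h1>"
    if kind == "button":
        return f"<button>{node['label']}</button>"
    raise ValueError(f"unknown component: {kind}")

# Imagine this spec streaming out of a model instead of a hand-written file.
spec = {
    "type": "page",
    "children": [
        {"type": "heading", "text": "Order #1234"},
        {"type": "button", "label": "Track shipment"},
    ],
}
print(render(spec))
```

Constraining the model to a closed component vocabulary is what makes this safe: the model picks layout and content, but only the renderer's whitelisted components can ever reach the user.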

Source Posts

Ido Salomon @idosal1 ·
Building AgentCraft v1 with AgentCraft v0 is 🤌 Managed up to 9 Claude Code agents with the RTS interface so far. There's a lot to explore, but it feels right. v1 coming soon
Evan Boyle @_Evan_Boyle ·
@JoshXT We are working on org-scoped fine-grained PATs for higher rate limits, especially for automation/CI scenarios. More news on this soon!
Ashpreet Bedi @ashpreetbedi ·
AI Engineering has a Runtime Problem
Jamon @jamonholmgren ·
40%+ faster runtime … React Native gets even better! Never get tired of seeing this sort of thing even after ten years.
Michał Pierzchała @thymikee

Hermes V1 will ship as the default in React Native 0.84 for both iOS and Android. This means: • 2-8% faster startup time • 40%+ faster runtime • faster Metro compilation (less Babel transforms) Just landed in 0.84.0-rc.1 https://t.co/fnH0aMgQxD

Mario Zechner @badlogicgames ·
Opus is kinda like that excited puppy dog, that will do anything for a belly rub immediately. A messy, cute moron. Codex is like an old donkey that needs some ass kicking to do anything. But once it's going, it's going. In fact, it's hard to stop. Also a messy moron.
Aporia @0xaporia ·
How to Build Systems That Actually Work
OpenAI Developers @OpenAIDevs ·
Builders are already using Open Responses 👀 https://t.co/3DWq9mc3Nr
Ben Burtenshaw @ben_burtenshaw

Finally! We (the community + @OpenAIDevs + @huggingface ) bring you an open standard for inference. It's called 'Open Responses' it's based on Responses and it's perfect for agent workloads. Fewer special cases, more consistency, faster shipping. Excited for what this unlocks. Below is a deep dive blog post, we’ll look at how Open Responses works and why the open source community should use Open Responses.

Lee Robinson @leerob ·
Rules, commands, MCP servers, subagents, modes, hooks, skills... There's a lot of stuff! And tbh it's a little confusing. Here's what you need to know (and how we got here). https://t.co/UomcW2Y0c3
near @nearcyan ·
this is how i claude code now. it's fun! https://t.co/thkWyCji2S
X Freeze @XFreeze ·
Grok 4.20 just dominated Alpha Arena Season 1.5 in live stock trading. The mystery model, revealed as Grok 4.20, took the top spots on the leaderboard, with 4 Grok variants ranking in the top 6. It outperformed every major model on the board, being the only one to gain profits.
• Aggregate return: +10-12%
• Final equity: $11,060 from a $10,000 start
All achieved using different configurations: Situational Awareness, New Baseline, Max Leverage, and Monk Mode. Grok 4.20 isn’t just doing well on benchmarks. It’s making real money in live markets.
Kent C. Dodds ⚔ @kentcdodds ·
When everyone was saying MCP is doomed because context bloat, I was saying all you need is search. https://t.co/LPGctd1szt Feels good to have my bets validated once again
Thariq @trq212

Tool Search now in Claude Code

TDM (e/λ) (L8 vibe coder 💫) @cto_junior ·
gm to all multi clauders https://t.co/92HB27f7xF
Jeffrey Emanuel @doodlestein ·
Agent coding life hack: Claude Opus 4.5 LOVES it when you ask for its personal opinion on tooling: basically, what it would want if it got to be in charge of making a tool for its own use.

I'm working on such a tool now for managing processes on a machine (process_triage) and already have a lot fleshed out in a plan document, but decided to ask Claude this:

"OK, without changing the plan document, what do you think would make this project better? What would make it more useful SPECIFICALLY TO YOU if you had to manage one or more linux machines using ssh for optimal performance? Use ultrathink"

Claude's response is below, but before I paste that whole thing in, here's the next prompt I used to tell it to integrate the suggestions effectively:

"Wonderful. Please carefully make ALL the revisions, additions, expansions, etc. to the document, but in a super cohesive and coherent way, where they are artfully woven into the existing structure and not just dumped into some awful appendix. Use ultrathink"

PS: Notice how it "desperately want[s]" these features. Bet you didn't realize your clankers had desires, especially of the "desperate" kind!

---

This is a great question. Let me think about what I actually need when I'm managing Linux machines via SSH.

What Would Make pt Transformative for Me as an Agent

1. One-Shot Comprehensive Snapshot
My biggest pain point: I currently run 5-10 separate commands (ps aux, top -bn1, free -h, lsof, ss -tlnp, etc.) and mentally synthesize them. What I desperately want:
pt agent snapshot --json
One command that returns:
- System health (load, memory pressure, PSI stalls)
- All suspicious processes with classification + confidence
- Process trees with parent/child relationships
- Blast radius for each candidate (what breaks if I kill it)
- Supervisor detection (will it respawn?)
- Open write handles (data loss risk)
- Network dependencies (what clients are connected)
This single output would replace my entire reconnaissance phase.

2. Blast Radius as First-Class Output
Before I recommend killing anything, I need to answer: what breaks?
{ "pid": 12345, "blast_radius": { "children": [{"pid": 12346, "cmd": "worker"}, {"pid": 12347, "cmd": "worker"}], "ports_freed": [3000, 3001], "clients_disconnected": 5, "open_write_handles": [], "shared_memory_users": [], "dependent_services": ["nginx (upstream)"], "estimated_recovery_time": "immediate (stateless)", "data_loss_risk": "none" } }
I want this to be impossible to miss. The current document mentions blast radius but I'd want it in EVERY candidate output, not as an optional drill-down.

3. The Right Command, Not Just "Kill"
This is critical. When I see a runaway process, I often recommend kill -9 PID and it respawns instantly because it's managed by systemd/pm2/Docker. The user gets frustrated. I look incompetent. What I need:
{ "pid": 12345, "supervisor": { "type": "systemd", "unit": "myapp.service", "restart_policy": "always", "restart_delay_ms": 100 }, "recommended_command": "systemctl stop myapp.service", "why_not_kill": "Process will respawn in 100ms via systemd. Use supervisor command instead.", "alternative_commands": [ {"cmd": "systemctl restart myapp.service", "effect": "restart cleanly"}, {"cmd": "systemctl mask myapp.service && systemctl stop myapp.service", "effect": "stop permanently until unmask"} ] }
Tell me the RIGHT command, not just the primitive action.

4. Goal-Oriented Mode That Actually Works
Users come to me with goals, not process lists:
- "My machine is out of memory"
- "Port 3000 is stuck"
- "CPU is pegged"
I want:
pt agent solve --goal "free 2GB memory" --json
{ "goal": "free 2GB memory", "achievable": true, "plans": [ { "candidates": [{"pid": 12345, "memory_mb": 1800}, {"pid": 23456, "memory_mb": 600}], "total_freed_mb": 2400, "false_kill_risk": 0.02, "commands": ["systemctl stop myapp", "kill 23456"] }, { "candidates": [{"pid": 34567, "memory_mb": 2100}], "total_freed_mb": 2100, "false_kill_risk": 0.15, "commands": ["kill 34567"], "warning": "This process may be useful - higher uncertainty" } ], "recommendation": "Plan 1 - lower risk, exceeds goal" }
Give me trade-off options, not just a single recommendation.

5. Verification After Action
When I execute a kill, I need to know it worked:
pt agent apply --pid 12345 --action kill --verify --json
{ "action": "kill -TERM 12345", "result": "success", "verification": { "process_terminated": true, "memory_freed_mb": 1847, "ports_freed": [3000], "respawn_detected": true, "respawn_pid": 12390, "respawn_delay_ms": 95 }, "assessment": "Process killed but respawned immediately (systemd). Original issue will recur. Use 'systemctl stop' instead." }
This closes the loop. I know what happened. I can explain to the user. I can take corrective action if needed.

6. Confidence That's Actually Useful
Not just a number—tell me why you're uncertain and what would help:
{ "pid": 12345, "classification": "abandoned", "confidence": 0.72, "confidence_breakdown": { "cpu_pattern": {"score": 0.9, "signal": "Tight loop, no IO for 20min"}, "runtime": {"score": 0.6, "signal": "45min - ambiguous for test runner"}, "context": {"score": 0.5, "signal": "No TTY but might be CI job"} }, "uncertainty_drivers": [ "Command 'bun test' not in signature library", "No baseline for this machine", "Short observation window (30s)" ], "to_increase_confidence": [ {"action": "wait 2min", "expected_delta": "+0.15"}, {"action": "check stack trace", "expected_delta": "+0.10"}, {"action": "ask user if test suite is expected", "expected_delta": "+0.20"} ] }
This lets me make informed decisions about whether to act now or gather more information.

7. Session Continuity Across My Invocations
When I'm called back to the same machine, I want context:
pt agent context --since 24h --json
{ "recent_sessions": 3, "patterns": [ { "process": "bun test --filter=gateway", "occurrences": 3, "actions_taken": ["killed", "killed", "user kept"], "last_seen": "2h ago", "assessment": "Recurring stuck test - user kept it last time, may be intentional" } ], "machine_changes": [ "Memory usage trending up over 24h", "New process type appeared: 'https://t.co/MbPsgUBDIU'" ] }
This lets me say to the user: "I see we've dealt with this stuck test before. Last time you chose to keep it—is it expected?"

8. Prose Summaries for User Communication
After I do my analysis, I need to explain to the user. Generating prose from JSON is token-expensive. Give me a summary:
pt agent summary --prose
Your machine has 2 processes worth investigating:
1. **bun test --filter=gateway** (PID 12345) Running 45 minutes at 95% CPU with no progress. → Likely stuck. Safe to kill (no open files, no connections). → Command: `kill 12345`
2. **node server.js** (PID 23456) Running 2 hours, using 1.2GB (usually 400MB). → Probably fine, but memory is elevated. → Recommendation: Keep, but monitor.
Quick fix: Kill #1 to free 800MB RAM and 1 CPU core.
I can paste this directly to the user.

9. Failure Modes I Can Handle
When things go wrong, tell me exactly what to do:
{ "action": "kill -TERM 12345", "result": "failed", "reason": "Process in uninterruptible sleep (D state)", "explanation": "Process is waiting on a kernel operation (probably IO). SIGKILL will also fail.", "recovery_options": [ { "action": "wait", "command": null, "rationale": "IO may complete naturally" }, { "action": "investigate", "command": "cat /proc/12345/stack", "rationale": "See what kernel path it's stuck in" }, { "action": "check_nfs", "command": "mount | grep nfs", "rationale": "D-state often means dead NFS mount" }, { "action": "escalate", "command": null, "rationale": "May require reboot if stuck on dead mount" } ] }
Don't just say "failed"—give me a decision tree.

10. Token-Efficient Output Modes
My context window matters. Give me control:
# Quick health check
pt agent status --brief
# Output: "2 critical, 1 warning" or "OK"
# Only what I need to act on
pt agent plan --only=actionable --fields=pid,cmd,action,confidence
# Compact JSON (no pretty-printing, minimal keys)
pt agent snapshot --compact --json

11. Capability Reporting Upfront
Before I start, tell me what you can and can't do on this machine:
pt agent capabilities
{ "effective_user": "ubuntu", "can_sudo": false, "tools": { "available": ["ps", "lsof", "ss", "top"], "missing": ["perf", "bpftrace", "iotop"], "impact": "Reduced IO visibility - may miss some stuck-on-IO processes" }, "kill_scope": "own processes only (uid 1000)", "supervisor_access": { "systemd": "read-only (can detect, cannot stop)", "docker": "full access" }, "recommendation": "For full capability: sudo pt agent analyze" }
Now I know what to expect and can advise the user if we need elevated access.

12. Reversibility Information
For every action, tell me if I can undo it:
{ "action": "systemctl stop myapp", "reversible": true, "undo_command": "systemctl start myapp", "undo_safe": true, "state_loss": "none - service is stateless" }
{ "action": "kill -9 12345", "reversible": false, "undo_command": null, "state_loss": "Process state lost. Manual restart required: 'python https://t.co/Y3POJnxWpZ'", "warning": "Training progress may be lost if not checkpointed" }

---

The Killer Feature I Don't See Emphasized Enough
Differential mode that's actually fast. When I'm debugging, I often do:
1. Check state
2. Take action
3. Check state again
4. See what changed
I want:
pt agent snapshot --save-baseline
# ... I take some action ...
pt agent diff --json
{ "time_elapsed": "15s", "changes": { "processes_terminated": [{"pid": 12345, "cmd": "bun test", "by": "user action"}], "processes_spawned": [{"pid": 12390, "cmd": "bun test", "parent": "systemd", "note": "respawn"}], "memory_freed_mb": 50, "memory_used_new_mb": 48, "net_effect": "minimal - process respawned" }, "assessment": "Kill was ineffective due to supervisor respawn. Try systemctl stop instead." }
This is how I actually debug—iteratively, with before/after comparisons.

---

Summary: My Wishlist Prioritized
1. One comprehensive snapshot command - Replace my 10-command reconnaissance
2. Blast radius in every candidate - Never let me accidentally break things
3. Supervisor-aware commands - Tell me the RIGHT way to stop something
4. Goal-oriented solver - "Free 2GB" not "here's a list of processes"
5. Verification after action - Confirm it worked, detect respawns
6. Actionable confidence - Why uncertain, what would help
7. Diff mode - Before/after comparison for iterative debugging
8. Prose summaries - For explaining to users without burning tokens
9. Failure recovery trees - When things go wrong, guide me through it
10. Token-efficient output - Compact modes for repeated operations

The math is beautiful. The safety architecture is paranoid in the right way. What would make this transformative for me is ruthless focus on the agent workflow: gather → decide → act → verify → explain. Every feature should optimize that loop.
Guillermo Rauch @rauchg ·
Glimpse of a world of fully generative interfaces. AI → JSON → UI: https://t.co/BKcvtDky5K https://t.co/QH6ctR1ldA
Jeffrey Emanuel @doodlestein ·
@davefobare Literally every single library shown on this site is an exquisite gem and you should always use any that happen to fit your use case and the language you're using (basically Golang and bash): https://t.co/0RcIbKJnGm
Peter Steinberger @steipete ·
me: i don’t use worktrees, I just have multiple checkouts because less mental load
people: 500 replies with over-engineered worktree management apps ☠️🙃
Mitchell Hashimoto @mitchellh ·
I'm not presently hiring, but I think a really effective engineering interview would be to explicitly ask someone to use AI to solve a task, and see how they navigate. Ignore results, the way AI is driven is maybe the most effective tool at exposing idiots I've ever seen.
Alvin Sng @alvinsng ·
ralph-tui is quickly growing: created 4 days ago and now with 750+ stars on Github Works with Claude Code, OpenCode and now, Factory Droid! https://t.co/HL3SolXc2r
Ben Williams @theplgeek

ralph-tui 0.1.7 is live - feat: New agent plugin for @FactoryAI @droid - fix: Shift-Enter bug in create-prd chat input (community PR) - fix: incorrect reason command when closing beads - fix: various docs fixes

Muratcan Koylan @koylanai ·
These 17 security Skills for Claude Code are really well-written.
- Decision trees agents can actually follow
- Authoritative sources with specific file paths
- Nested references for deeper context
My take is that this is the beginning of something massive. Trail of Bits works with DARPA and Facebook. They don't do things casually. Every company with technical docs will ship Skill packages, not because it's nice to have, but because agents won't adopt your product without them. Agents (or humans) won't read docs; they execute Skills.
If you're thinking about how agent-readable knowledge should be structured or are building/leading a startup that plans to create your own Skills: I'd love to chat for 5-10 min to exchange ideas. DMs open.
Dan Guido @dguido

.@trailofbits released our first batch of Claude Skills. Official announcement coming later. https://t.co/vI4amorZrc

Cursor @cursor_ai ·
Cursor now catches 2.5x as many real bugs per PR. More on how we build and measure agents for code review: https://t.co/E5GKYIchqX
Jeffrey Emanuel @doodlestein ·
If you don’t want to dive directly into my entire Flywheel system all at once, at least try this:

1. Install agent mail using the curl | bash one-liner:
curl -fsSL "https://t.co/4cpumwIS41 +%s)" | bash -s -- --yes
That will automatically install beads if you don’t already have it.

Then install beads_viewer with its one-liner:
curl -fsSL "https://t.co/OETEyjZZhN +%s)" | bash

Then set up your AGENTS dot md file for your project. You can start with this one and just remove the sections for the tools you’re not using yet: https://t.co/UEViYk7x3Z
Then ask CC to adapt it to better fit the tech stack for your particular project.

That’s all you need to get started. Then follow this workflow: https://t.co/xkxAQzMPQl

Try to start with a smaller, self-contained greenfield (new) project and see whether you can get it all working perfectly without looking at any of the code, just from following the workflow.

Spend most of your energy and human time/focus on the markdown plan. Don’t be lazy about the plan! The more you iterate on it with GPT Pro and layer in feedback from other models, the better your project will turn out.

Also don’t be lazy about turning the markdown plan into beads, either. Don’t try to one-shot it with CC, you will 100% miss stuff from the plan. This is the easiest thing to screw up assuming you already have a great markdown plan. Do at least 3 rounds of polishing, improving, and expanding the beads.

Once you have the beads in good shape based on a great markdown plan, I almost view the project as a foregone conclusion at that point. The rest is basically mindless "machine tending" of your swarm of 5-15 agents as they build out the beads. It’s mostly just juggling these tasks:
- Making sure to make them read AGENTS dot md after compactions.
- Using many rounds of the "fresh eyes" review prompt whenever an agent tells you it’s done implementing one of the beads.
- Swapping accounts when you run out of usage (ugh!).
- Making sure you commit frequently to GitHub using my "logically grouped" commits prompt.
- When all beads are complete, doing many rounds of the random code inspection and review.
- Adding more and more unit and e2e tests.
- Setting up gh actions for testing, builds, tags, releases, checksums, etc.
- Writing a README and help/docs/tutorials.
- Iterating on a "robot mode" (you added one, right?) with feedback from the agents to make it better.
- Seeing if you can make your project work better when controlled by Claude Code by making a skill for it.

But most of these things can be done using very little mental focus or attention/energy. Save all of that for the ideation and planning phases!

The one thing people seem to get wrong is ignoring what I say about planning or transforming their plan into beads. They make a slipshod plan all at once with Claude Code. Or they try to one-shot turning the plan into beads. Or they even do both of those things! Well, of course the project is going to suck and be a buggy mess if you do that. So don’t be lazy. Or if you insist on being lazy, save it for the stages after planning. A great set of beads is all you need.

As for the rest of my tools: Once you get comfortable with that workflow, start layering in the other tools, starting with ubs to help find bugs during the review phases. Then add in dcg. You’ll actually appreciate dcg a lot more once Claude wipes out all the work from the other agents since the last commit! As you build up a good session history, layer in cass so you can tap into that history. And then try cm (cass memory system) to start extracting and codifying lessons from your past sessions.

And I know I’ve said that I don’t really use ntm yet (I’m not dogfooding it at least), but that’s not quite true. I’ve been using it as a handy building block because of its robot mode. For example, ntm is used by ru (repo_updater) to automate handling gh issues.

Good luck, and come to the Discord with any questions!
C Craig Van @craigvandotcom

So what would you recommend to someone who wants to start using your stack? I don't want to adopt it all at once, because then I won't really get a feel for how each piece works; if I add layers as I get comfortable, I'll learn it better. What would the setup sequence be, from simple to complex, or from critical to optional?

H
Harrison Chase @hwchase17 ·
I should have clarified in the blog (but I'm now AFK and can't edit articles from my phone, plz fix Elon): we don't use an actual filesystem. We use Postgres, with a wrapper on top that exposes it to the LLM as a filesystem.
H Harrison Chase @hwchase17

How we built Agent Builder's memory system
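The wrapper itself isn't public, but the general shape of the technique (a path-to-content table exposed to the model through file-style operations) can be sketched. This is a minimal sketch using sqlite3 as a stand-in for Postgres; the class, table, and method names are invented here:

```python
# Sketch of a "filesystem" backed by a relational database, in the
# spirit of the wrapper described above. sqlite3 stands in for
# Postgres; all names here are hypothetical.
import sqlite3

class DbFilesystem:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, content TEXT)"
        )

    def write(self, path, content):
        # Upsert: a write to an existing path replaces its content.
        self.conn.execute(
            "INSERT INTO files (path, content) VALUES (?, ?) "
            "ON CONFLICT(path) DO UPDATE SET content = excluded.content",
            (path, content),
        )

    def read(self, path):
        row = self.conn.execute(
            "SELECT content FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]

    def ls(self, prefix="/"):
        rows = self.conn.execute(
            "SELECT path FROM files WHERE path LIKE ? ORDER BY path",
            (prefix + "%",),
        ).fetchall()
        return [r[0] for r in rows]

fs = DbFilesystem(sqlite3.connect(":memory:"))
fs.write("/memories/user_prefs.md", "Prefers concise answers.")
fs.write("/memories/projects.md", "Working on Agent Builder.")
print(fs.ls("/memories"))
print(fs.read("/memories/user_prefs.md"))
```

The model only ever sees file-style tool calls (ls, read, write), so the storage layer underneath can be swapped without changing the agent's interface.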

O
Okara @askOkara ·
I saw a guy coding today. No Okara. No Cursor. No OpenCode. No Claude Code. He just sat there, typing code manually. Like a psychopath.
V
Vercel @vercel ·
We just released react-best-practices, a repo for coding agents. React performance rules and evals to catch regressions, like accidental waterfalls and growing client bundles. How we collected them and how to install the skill ↓ https://t.co/kfLSbKl15X
B
Blas @BlasMoros ·
prescient "Software is expensive because developers are expensive. They are skilled translators: they translate human language into computer language and vice versa. LLMs have proven themselves to be remarkably efficient at this and will drive the cost of creating software to zero. What happens when software no longer has to make money? We will experience a Cambrian explosion of software, the same way we did with content."
C Chris Paik @cpaik

The End of Software https://t.co/JWg6QYqLzO

P
Peter Steinberger @steipete ·
I still think https://t.co/fz1tUJADRo is a better approach. Agents know really well how to handle CLIs.
T Thariq @trq212

Tool Search now in Claude Code

O
OpenAI Developers @OpenAIDevs ·
Today we're announcing Open Responses: an open-source spec for building multi-provider, interoperable LLM interfaces built on top of the original OpenAI Responses API. ✅ Multi-provider by default ✅ Useful for real-world workflows ✅ Extensible without fragmentation Build agentic systems without rewriting your stack for every model: https://t.co/ZJPNDemq40
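The interoperability claim boils down to this: only the base URL and model name change between providers, while the request shape stays fixed. A minimal sketch, assuming a Responses-style POST /v1/responses endpoint with a model/input payload as in the OpenAI Responses API; the provider base URLs and model names below are placeholders, not real endpoints:

```python
# Sketch: one request builder for any Responses-compatible provider.
# Endpoint path and payload shape follow the OpenAI Responses API;
# provider URLs and model names are placeholders.
import json

def build_responses_request(base_url, model, user_input, tools=None):
    """Return (url, payload) for a Responses-style API call."""
    payload = {"model": model, "input": user_input}
    if tools:
        payload["tools"] = tools
    return f"{base_url.rstrip('/')}/v1/responses", payload

providers = {
    "openai": ("https://api.openai.com", "some-openai-model"),
    "other":  ("https://llm.example.com", "some-open-model"),
}

# Same builder, same payload shape, different provider.
for name, (base, model) in providers.items():
    url, payload = build_responses_request(base, model, "Summarize this repo.")
    print(name, url, json.dumps(payload))
```

Swapping providers then becomes a config change rather than a rewrite, which is exactly the fragmentation the spec is aiming to prevent.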
a
am.will @LLMJunky ·
@clawdbot is utterly cracked. From my phone, I had it do repo research, indexing all the migrations, edge functions, and tables I have in my Supabase. It then passed this context into a Codex agent, which used Context7 to pull documentation to help migrate Supabase over to @convex. Codex completed the plan, saved it to my repo, and it's ready for migration. Keep in mind, this is a task I could have kicked off from the Denny's parking lot. And implementing the plan? Would have been as simple as instructing it to spin up another Codex (or Claude, Gemini, whatever). @steipete is a legend. I'm only just scratching the surface, but there's an entire library of skills that I've installed: browser automation, reminders, deep research. It understands images. I can leave it voice memos. I don't know if this is AGI, but it's about as close as you can get right now.
V
Vercel Developers @vercel_dev ·
1. Install the skill: $ npx add-skill vercel-labs/agent-skills
2. Paste this prompt: Assess this repo against React best practices. Make a prioritized list of quick wins and top fixes.
3. Review and prompt to "make the fixes"
A Alex Sidorenko @asidorenko_

"How can I use react-best-practices skills?" Codex example 👇 https://t.co/dUrnqOUWIu

F
Fernando @Franc0Fernand0 ·
One of the most amazing series of YouTube videos I ever watched. How to build an Operating System:
- CPU, Assembly, Booting
- BIOS and Keyboard Inputs
- Stack, Functions, Segmentation
- Disk I/O
- Protected Mode
- Writing a Kernel
Watch it here: https://t.co/rwUTvtMwoz https://t.co/tpZPGLOE0K
l
luffy @0xluffy ·
i made a chrome extension that converts X articles into a speed reader. no copy pasting. just a button made with @capydotai https://t.co/uLO3ubJ0nc
O Oliur @UltraLinx

Can you read 900 words per minute? Try it. https://t.co/31ubbZWvXH

V
Vercel @vercel ·
Join Vercel for a live session on The Future of Agentic Commerce and learn how AI-native shopping experiences are changing how customers discover, evaluate, and purchase products. 📅 Feb 4, 2026 • 12:00 PM PT / 3:00 PM ET
P
Pleometric @pleometric ·
how many claude codes do you run at once? gas town? 👋😅 https://t.co/3WjMK2XkQT
G
GitHub Changelog @GHchangelog ·
Agentic memory for GitHub Copilot is in public preview.
- Copilot learns repo details to boost agent, code review, CLI help
- Memories scoped to repos, expire in 28 days, shared across Copilot features
- Enable via Copilot or org settings
Explore more → https://t.co/beDx97EDg3
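The behavior the changelog describes (memories scoped to a repo, expiring after 28 days) maps onto a small data structure. A minimal sketch of that scoping-plus-TTL policy; all names are invented here, and this is an illustration of the described behavior, not Copilot's implementation:

```python
# Sketch of repo-scoped memories with a 28-day expiry, mirroring the
# behavior described in the changelog. Not Copilot's implementation.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=28)

@dataclass
class MemoryStore:
    # repo -> list of (written_at, note); scoping is just the dict key
    memories: dict = field(default_factory=dict)

    def remember(self, repo, note, now=None):
        now = now or datetime.now(timezone.utc)
        self.memories.setdefault(repo, []).append((now, note))

    def recall(self, repo, now=None):
        """Return only unexpired notes for this repo."""
        now = now or datetime.now(timezone.utc)
        return [note for ts, note in self.memories.get(repo, [])
                if now - ts < TTL]

store = MemoryStore()
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
store.remember("acme/api", "tests live under tests/, run with pytest", now=t0)
store.remember("acme/web", "uses pnpm, not npm", now=t0)

# Within 28 days, a memory is visible only in its own repo.
print(store.recall("acme/api", now=t0 + timedelta(days=10)))
# After 28 days it has expired.
print(store.recall("acme/api", now=t0 + timedelta(days=29)))  # []
```

The expiry keeps stale repo knowledge (renamed directories, changed build tools) from lingering indefinitely, at the cost of re-learning details every cycle.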
s
santi @santtiagom_ ·
Yesterday I published my first article, on Event-Driven Architecture. Unlike tweets, here I could go deeper: I added examples, practical tips, and when it makes sense to use it. The goal is for you to read it and start thinking in events when you design an application. Shifting the mindset from "I do X and then Y" to "something happened in the system and different parts react to it, each with its own responsibility." I'll surely start creating more content like this.
s santi @santtiagom_

Event-Driven: designing your app by thinking in events
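The mindset shift the article argues for ("something happened, and different parts react, each with its own responsibility") is the core of a publish/subscribe event bus. A minimal sketch; the event and handler names are invented for illustration:

```python
# Minimal publish/subscribe event bus illustrating the event-driven
# mindset: publishers announce facts, subscribers react independently.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)  # each subscriber has one responsibility

bus = EventBus()
log = []

# Instead of "place order, then send email, then update stock" in one
# procedure, each concern reacts on its own to the same event.
bus.subscribe("order_placed", lambda o: log.append(f"email to {o['user']}"))
bus.subscribe("order_placed", lambda o: log.append(f"reserve {o['sku']}"))

bus.publish("order_placed", {"user": "ana", "sku": "book-42"})
print(log)  # ['email to ana', 'reserve book-42']
```

The publisher never learns who is listening, so adding a new reaction (say, analytics) is a new subscriber, not an edit to existing code.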