.agents/skills Standardization Sweeps IDEs as Apple Ships Claude Agent SDK in Xcode 26.3
Daily Wrap-Up
Today felt like a coordination event across the AI coding tool landscape. The .agents/skills specification quietly became a de facto standard, with VS Code, Copilot, Codex, and Cursor all announcing support within the same news cycle. @theo couldn't resist pointing out that Claude Code, whose ecosystem arguably popularized skills, is notably absent from the adopters list. Meanwhile, Apple shipped Xcode 26.3 with a full Claude Agent SDK integration, giving Swift developers the same autonomous agent capabilities that CLI users have had. The message is clear: the IDE wars have moved past autocomplete and into agent orchestration, and skills are the interop layer everyone is converging on.
On the model front, the story was about making powerful capabilities accessible at smaller scales. Alibaba's Qwen released a 3B-parameter coding model that reportedly matches Sonnet 4.5 on benchmarks, while @simonw flagged an Unsloth quantized model that might actually drive coding agents from local hardware. @karpathy continued his GPT-2 speedrun saga, getting training time down to 2.91 hours with fp8. The practical implication: the floor for "good enough" AI-assisted coding keeps dropping, and the ceiling keeps rising. Rumors of a Sonnet 5 soft-launch on claude.ai added fuel to the speculation fire.
The most entertaining moment was @banteg issuing a genuine challenge to "claude boys, ralph boys" to take two decompiled C files and rewrite an entire game in another language and engine, preferably browser-based. It's a perfect litmus test for where agent-assisted development actually stands. The most practical takeaway for developers: if you maintain any kind of coding agent configuration, start organizing it under .agents/skills/ now. The convergence across tools means your investment in skills will be portable across VS Code, Copilot, Cursor, and likely Claude Code soon enough.
Quick Hits
- @DrJimFan teased "The Second Pre-training Paradigm" without elaboration, leaving the timeline to speculate.
- @HuggingModels spotlighted GLM-4.7-Flash-Uncensored-Heretic, an unfiltered text generator optimized for raw reasoning with "zero guardrails."
- @theworldlabs showed off persistent 3D scenes from their world model that users can build on top of indefinitely.
- @minchoi shared Higgsfield AI's Vibe-Motion, a prompt-to-motion-design tool powered by Claude reasoning that targets ad agency workflows.
- @minchoi also RT'd their own AI-generated content with "Haters will say no AI was used for this."
- @jukan05 raised eyebrows asking whether OpenAI is already starting layoffs.
- @OpenAIDevs announced a live Codex app workshop with @romainhuet and @dkundel for building apps end-to-end.
- @KaranKunjur reflected on building a space company (K2) at the intersection of Starship and orbital compute, noting "concepts I thought were 5 to 10 years out are now foundational capabilities."
- @cb_doge outlined Elon Musk's five-step plan to reach Kardashev Type II civilization through orbital AI data centers.
- @kloss_xyz shared what a $200/month Claude setup looks like in practice.
- @pierceboggan asked what to prioritize improving for developers using VS Code and GitHub Copilot CLI together.
- @trq212 highlighted a Chrome browser connection for Claude via the VS Code extension, enabling frontend debugging and browser automation.
- @felixleezd published a Claude Code guide specifically for designers.
- @flaviocopes endorsed Docker sandboxes as "a fantastic way to run agents in YOLO mode without anxiety."
- @o_kwasniewski stressed that build and lint passing isn't enough, and end-to-end testing of agent-built flows is crucial.
- @micLivs had the most concise advice of the day regarding skills configuration: "just create a symlink and get on with your life."
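@micLivs's symlink advice can be sketched in a few lines. Everything here is illustrative: the .claude/skills source path stands in for wherever your current skills actually live, and example-repo is a hypothetical project.

```python
from pathlib import Path

# Point the emerging .agents/skills location at an existing skills
# directory so one copy serves every tool that reads it.
# All paths below are hypothetical examples, not documented defaults.
repo = Path("example-repo")
existing = repo / ".claude" / "skills"   # assumed current location
standard = repo / ".agents" / "skills"   # emerging shared convention

existing.mkdir(parents=True, exist_ok=True)
standard.parent.mkdir(parents=True, exist_ok=True)

# A relative symlink keeps the repo relocatable.
if not standard.exists():
    standard.symlink_to(Path("..") / ".claude" / "skills")

print(standard.resolve() == existing.resolve())  # True if the link resolves
```

The equivalent one-liner from a shell is of course just as valid; the point is that the "migration" costs nothing and is trivially reversible.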
Claude Code's Surface Area Expands
Anthropic had a busy day pushing Claude Code into new surfaces. The headline announcement was Xcode 26.3 shipping with a native Claude Agent SDK integration. As @mikeyk described it, "Devs get the full power of Claude Code (subagents, background tasks, and plugins) for long-running, autonomous work directly in Xcode." This isn't a lightweight autocomplete bolt-on. It's the same agent harness that powers the CLI, now embedded in Apple's IDE. @AnthropicAI positioned it as covering "iPhone to Mac to Apple Vision Pro," signaling that Claude's coding agent story extends beyond web development.
On the communication front, @claudeai announced Slack integration for Pro and Max plans:
"Search your workspace channels, prep for meetings, and send messages back to keep work moving forward, without leaving your conversation with Claude."
@_catwu from Anthropic demonstrated the practical workflow: "We have a user feedback channel where we regularly tag in @Claude to investigate issues and push fixes." Meanwhile, Claude Code 2.1.30 landed with what @ClaudeCodeLog documented as "19 CLI, 1 flag, and 1 prompt changes." @Yampeleg called out the new /insights command specifically. @lydiahallie announced session sharing, letting developers share full conversations with team members via web, desktop, or mobile.
The pricing discussion also heated up. @OrenMe did the math on Claude Code vs GitHub Copilot, calculating that $1000/month on Copilot's overage pricing would yield roughly 8,500 Opus requests versus Claude Code's flat-rate approach. @rockatanescu noted that Anthropic's $125/month business seats with limits comparable to the $100 consumer plan make it "an easy choice for businesses," adding that "surprisingly, OpenAI doesn't offer business plans with higher limits." The competitive dynamics are shifting from capability comparisons to unit economics.
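The arithmetic behind @OrenMe's comparison is easy to sanity-check. The $1000 budget and 8,500-request figure come from the post; the implied per-request price is back-derived, and the $200 flat-rate breakeven is an illustrative assumption, not either vendor's published math.

```python
# Back-of-envelope check on the unit economics cited in the thread.
budget_usd = 1000
opus_requests = 8500  # figure from the post

implied_price = budget_usd / opus_requests
print(f"~${implied_price:.3f} per Opus request")  # ~$0.118

# Hypothetical flat-rate comparison: a $200/month plan breaks even where
# metered usage at the implied price would cost the same.
flat_rate_usd = 200
breakeven_requests = flat_rate_usd / implied_price
print(f"breakeven at ~{breakeven_requests:.0f} requests/month")
```

At roughly twelve cents per metered request, any developer running agents in long autonomous loops clears a flat-rate breakeven within days, which is exactly the dynamic the thread was pointing at.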
The .agents/skills Land Grab
Something notable happened today: multiple competing IDE platforms converged on the same skills specification within hours of each other. @theo catalogued the adopters with characteristic directness:
"Products that moved to .agents/skills so far: Codex, OpenCode, Copilot, Cursor. Not Claude Code."
The irony is thick. Claude Code's ecosystem did more than anyone to popularize the concept of skills as portable agent instructions, yet the formal .agents/skills directory convention is being adopted by everyone else first. @pierceboggan confirmed ".agents/skills coming to @code!" and @leerob from Vercel added "We're adding support for .agents/skills in the next release! This will make it easier to use skills with any coding agent."
The standardization is already spawning its own economy. @EXM7777 declared "investing in skills is the best play you can make in 2026" while promoting SkillStack as a marketplace for buying and selling audited skills. @haydenbleasel launched AI Elements Skills with a one-liner install: npx skills add vercel/ai-elements. Whether this becomes a real marketplace or another npm-style dependency sprawl remains to be seen, but the convergence on a common directory structure is a genuine interoperability win. Developers who invest in writing good skills today are building assets that work across toolchains.
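For anyone starting from zero, the convention is just a directory of skills, each with a markdown instruction file. The sketch below follows the SKILL.md-with-frontmatter shape popularized by the Claude skills ecosystem; treat the exact field names and the changelog-writer skill itself as assumptions for illustration, not a spec citation.

```python
from pathlib import Path

# Scaffold a minimal skill under the shared .agents/skills convention.
# The frontmatter fields (name, description) mirror common practice in
# the Claude skills ecosystem and are an assumption, not a formal spec.
skill = Path(".agents/skills/changelog-writer")
skill.mkdir(parents=True, exist_ok=True)

(skill / "SKILL.md").write_text(
    "---\n"
    "name: changelog-writer\n"
    "description: Draft a changelog entry from staged git changes.\n"
    "---\n\n"
    "Read the staged diff, group changes by area, and emit a short\n"
    "markdown changelog entry in the repo's existing style.\n"
)

print(sorted(p.name for p in skill.parent.iterdir()))
```

Because the skill is plain markdown in a conventional location, the same directory should be discoverable by any tool that adopts the standard, which is the whole portability argument.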
Small Models, Big Ambitions
The model releases today told a story about compression and accessibility. @itsPaulAi highlighted Alibaba Qwen's new coding model: "Only 3B active parameters, coding performance equivalent to Sonnet 4.5. Comparable to models with 10x-20x more active parameters. But you can run it LOCALLY." If the benchmarks hold up in practice, this is significant. A model that fits comfortably on consumer hardware matching a flagship cloud model on coding tasks changes the economics of AI-assisted development.
@simonw picked up a related thread, noting that Unsloth's 46GB quantized model might actually be capable of driving coding agent harnesses like Claude Code from local hardware: "I've had trouble running those usefully from other local models that fit in <64GB so if it works this is a really big deal." The gap between cloud-only and local-capable agents keeps narrowing.
@karpathy shared detailed fp8 training results, pushing his GPT-2 speedrun to 2.91 hours on 8xH100 (roughly $20 at spot prices). His technical breakdown was characteristically thorough, noting that fp8's theoretical 2x FLOP improvement on H100 translates to only about 5% real-world speedup at GPT-2 scale due to overhead from scale conversions and insufficient GEMM sizes. As he put it: "GPT-2 (7 years ago): too dangerous to release. GPT-2 (today): new MNIST!"
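Why a theoretical 2x FLOP gain collapses to a few percent end-to-end is just Amdahl's law. The toy model below uses illustrative fractions chosen to land near the reported ~5%; they are assumptions, not measured numbers from the speedrun post.

```python
# Amdahl-style toy model of fp8 at GPT-2 scale.
# All fractions below are illustrative assumptions.
gemm_fraction = 0.60     # share of step time spent in matmuls
fp8_gemm_speedup = 1.30  # realized (not theoretical 2x) on small GEMMs
scale_overhead = 0.10    # extra time for fp8 scale conversions

new_time = (1 - gemm_fraction) + gemm_fraction / fp8_gemm_speedup + scale_overhead
speedup = 1 / new_time
print(f"end-to-end speedup: {speedup:.2f}x")  # ~1.04x, i.e. a few percent
```

The lesson generalizes: precision tricks only pay off when the GEMMs are large enough to hit peak throughput and the conversion overhead amortizes, which is why fp8 helps frontier-scale training far more than a GPT-2 speedrun.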
On Anthropic's side, @synthwavedd claimed to spot a Sonnet 5 soft launch on claude.ai, noting that "Anthropic have stealth launched models hours before release almost every time." @kimmonismus reported Anthropic's image model going live on LMArena. And @wzhao_nlp shared an emotional account of a model release where the team "redid midtraining because we saw cases where models failed to follow instructions on out-of-distribution scaffolds," choosing fundamental fixes over surface-level patches.
Agents as Developer Infrastructure
The conversation around agents shifted today from "will agents replace developers" to "how do developers manage fleets of agents." @rauchg framed it as a scaling problem: "Agents give developers horizontal scalability. The simple version of this is Ghostty splits and tabs, tmux sessions and the like, running CLI agents in parallel. Automating the full product development loop is now your job, and your edge."
@addyosmani confirmed this is already happening at Google: "I use a multi-agent swarm for most of my daily development. This is a future we're planning for more of at Google." His practical advice was notably grounded: "Be very intentional about what requires deep vs. shallow review" and "audit what Skills, MCPs really help."
The tooling to manage this is emerging. @tobi praised Pi as "the most interesting agent harness," describing how it "RLs itself into the agent you want" by writing plugins for itself during use. @zeeg asked what felt like the question of the day: "What's the best user interface you've seen for managing multiple claude code sessions? I want the navigation of each session and to easily be able to run multiple planning agents." @hasantoxr highlighted a Chinese open-source desktop automation agent that runs entirely locally, handling desktop apps, files, and browser automation without internet. The infrastructure layer for multi-agent development is still wide open territory.
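The "tmux splits" version of @rauchg's horizontal scaling can be sketched as a simple fan-out. Here echo stands in for a real agent CLI (e.g. a hypothetical claude -p "&lt;task&gt;" invocation); the task strings are made up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Fan several CLI agent invocations out in parallel and collect output.
# "echo" is a placeholder for a real agent CLI; swap in your own command.
tasks = ["triage flaky test", "draft release notes", "update deps"]

def run_agent(task: str) -> str:
    # Replace ["echo", task] with your agent CLI and its prompt flag.
    out = subprocess.run(["echo", task], capture_output=True, text=True)
    return out.stdout.strip()

with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(run_agent, tasks))

for task, result in zip(tasks, results):
    print(f"{task!r} -> {result!r}")
```

Real orchestration adds the hard parts this sketch omits: per-agent working directories or sandboxes, review queues, and deciding which outputs deserve deep versus shallow review, per @addyosmani's advice.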
Agentic Search Outperforms RAG on Real Codebases
A fascinating technical thread emerged around how coding agents should understand codebases. @dani_avila7 shared hard-won experience: "RAG + vector DB gives decent results, but agentic search over the repo (glob/grep/read, etc) consistently worked better on real-world codebases." Their team even tried RAG combined with embeddings, AST parsing, and tree-sitter, and while quality was excellent, the operational burden was high: "staleness and privacy, you need continuous re-indexing, and all the code and embeddings must live on your servers."
The conclusion was counterintuitive: "fast models + bash-style agentic search ended up outperforming general RAG search, even if it requires more tool calls." @e7r1us offered a middle ground for JS/TS projects: parse with Babel and create compact representations of hooks, constants, context, and function signatures with starting lines, then feed that to the agent as context. @aidenybai promoted React Grab as a tool that "extracts file sources rather than DOM selectors" because "agents can't actually do much with selectors, while sources are the source of truth." He also teased a post on making Claude Code 3x faster, likely through similar context optimization techniques.
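The glob/grep/read loop the thread describes is small enough to show in full. This is a toy standalone version, not Claude Code's implementation; the demo_repo fixture is fabricated for the example.

```python
from pathlib import Path

# Toy "bash-style agentic search": glob for candidate files, grep for a
# symbol, read the hit with a little context. No index to build or keep
# fresh, which is the operational win described above.
def grep_repo(root: str, symbol: str, context: int = 2) -> list[str]:
    hits = []
    for path in Path(root).rglob("*.py"):           # glob step
        lines = path.read_text(errors="ignore").splitlines()
        for i, line in enumerate(lines):            # grep step
            if symbol in line:
                lo, hi = max(0, i - context), i + context + 1
                snippet = "\n".join(lines[lo:hi])   # read step
                hits.append(f"{path}:{i + 1}\n{snippet}")
    return hits

# Tiny fabricated repo so the function has something to find.
demo = Path("demo_repo")
demo.mkdir(exist_ok=True)
(demo / "app.py").write_text("def handler():\n    return parse_config()\n")

print(grep_repo("demo_repo", "parse_config"))
```

Each hit is a path, line number, and snippet that can be dropped directly into an agent's context window; the agent then decides whether to read more of the file, which is what "more tool calls" buys.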
The Shifting Identity of the Developer
Today's philosophical posts carried a heavier emotional weight than usual. @adityaag wrote a raw reflection on coding with Claude: "Something I was very good at is now free and abundant. I am happy... but disoriented." He noted that both the form and function of his early career (writing code, building social networks) are now produced by AI, adding "if anything, this whole period is showing me what it is like to be human again."
@naval offered a more clinical reframing: "Vibe coding is the new product management. Training and tuning models is the new coding." It's a clean formulation, but it papers over the emotional complexity that @adityaag captured. @TheGeorgePu reported that Meta now tracks 200+ data points on employee AI usage, with "high adoption = 300% bonus" and "low adoption = managed out," prompting @nomoreplan_b to respond simply: "AI fluency is becoming job security." The career implications are no longer theoretical. They're being encoded into compensation structures at the largest tech companies in the world.
Source Posts
For devs asking "how do I run coding agents without breaking my machine?" Docker Sandboxes are now available. They use isolated microVMs so agents can install packages, run Docker, and modify configs - without touching your host system. Read more → https://t.co/VjlWMG5wqF https://t.co/7ssqWboten
[Translated from a Google employee's post on US Blind] I heard that OpenAI started cutting headcount after Sam Altman's town hall meeting. Some people reportedly stopped hearing from recruiters even after finishing team matching or the onsite loop. I'm curious whether this is true, and whether it also affects people who have already signed offer letters.
@EthanLipnik Early versions of Claude Code used RAG + a local vector db, but we found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability.
Vibe coding is the new product management. Training and tuning models is the new coding.
Qwen releases Qwen3-Coder-Next. The new 80B MoE model excels at agentic coding & local use. With 256K context, it delivers similar performance to models with 10-20× more active parameters. Run on 46GB RAM or less. Guide: https://t.co/wzoXlZwDuL GGUF: https://t.co/rpYrlnazsm
Claude Code Guide for Designers
Claude Code is the highest leverage skill you can learn this year. The future of design is here. I've written a full guide for this. Story time... I...
nanochat can now train GPT-2 grade LLM for <<$100 (~$73, 3 hours on a single 8XH100 node). GPT-2 is just my favorite LLM because it's the first time the LLM stack comes together in a recognizably modern form. So it has become a bit of a weird & lasting obsession of mine to train a model to GPT-2 capability but for much cheaper, with the benefit of ~7 years of progress. In particular, I suspected it should be possible today to train one for <<$100. Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc. As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year. I think this is likely an underestimate because I am still finding more improvements relatively regularly and I have a backlog of more ideas to try. A longer post with a lot of the detail of the optimizations involved and pointers on how to reproduce are here: https://t.co/vhnK0d3L7B Inspired by modded-nanogpt, I also created a leaderboard for "time to GPT-2", where this first "Jan29" model is entry #1 at 3.04 hours. It will be fun to iterate on this further and I welcome help! My hope is that nanochat can grow to become a very nice/clean and tuned experimental LLM harness for prototyping ideas, for having fun, and ofc for learning. 
The biggest improvements of things that worked out of the box and simply produced gains right away were 1) Flash Attention 3 kernels (faster, and allows window_size kwarg to get alternating attention patterns), Muon optimizer (I tried for ~1 day to delete it and only use AdamW and I couldn't), residual pathways and skip connections gated by learnable scalars, and value embeddings. There were many other smaller things that stack up. Image: semi-related eye candy of deriving the scaling laws for the current nanochat model miniseries, pretty and satisfying!
@embirico change merged in @code for broader copilot adoption https://t.co/JtKTMo4tP8, cheers!
Agent Skills are now available in Cursor. Skills let agents discover and run specialized prompts and code. https://t.co/aZcOkRhqw8
So Anthropic's cooking an in-house image model. Sonata is live on LMArena and it's having a whole identity crisis: it claims Google made it half the time, Anthropic the other half. This is from Claude config, so it's 100% guaranteed now it is coming https://t.co/xlYDFWU1BM
Introducing Qwen3-Coder-Next, an open-weight LM built for coding agents & local development. What's new: Scaling agentic training: 800K verifiable tasks + executable envs. Efficiency-performance tradeoff: achieves strong results on SWE-Bench Pro with 80B total params and 3B active. Supports OpenClaw, Qwen Code, Claude Code, web dev, browser use, Cline, etc. Hugging Face: https://t.co/rZoW4vRJpr ModelScope: https://t.co/P0vT5zILBZ Blog: https://t.co/hFfFDYcwvd Tech report: https://t.co/Qx83PWS3oi
"Earlier, all devs used GitHub Copilot. 9 months ago, we rolled out Cursor to all devs. 1.5 weeks ago, we rolled out Claude Code to everyone, and cancelled our Copilot subscription" - CTO at a company with 600 engineers (I hear this exact "transition" story, a LOT!)
Introducing Agent Device: token-efficient iOS & Android automation for AI agents https://t.co/6hfs2LDyxq
How I made Claude Code 3x faster
Coding agents suck at frontend because translating intent (from UI → prompt → code → UI) is lossy. For example, if you want to make a UI change: Creat...
The Second Pre-training Paradigm
Next word prediction was the first pre-training paradigm. Now we are living through the second paradigm shift: world modeling, or "next physical state...
We are live. The marketplace to buy high-quality, pre-vetted Claude Skills https://t.co/N7hJjVmsBa https://t.co/LdsTmrFQZd
We're adding support for .agents/skills in the next release! This will make it easier to use skills with any coding agent.