Xcode Integrates Claude Agent SDK as Industry Standardizes on .agents/skills

February 4, 2026 · 20 sources

Apple's Xcode 26.3 launched with full Claude Agent SDK integration while .agents/skills rapidly emerged as the industry-standard format for coding agent customization, with VS Code, Copilot, Codex, and Cursor all adopting it. Meanwhile, Alibaba's Qwen dropped a 3B-parameter coding model matching Sonnet 4.5 performance, and the community debated whether agentic search has definitively beaten RAG for codebase understanding.

Daily Wrap-Up

Today felt like a tipping point for coding agents as platform infrastructure rather than novelty tools. The biggest signal was Apple shipping Xcode 26.3 with native Claude Agent SDK integration, giving iOS and Mac developers the same subagent, background task, and plugin architecture that powers Claude Code. That is not an experiment or a beta flag. That is Apple building agent-native workflows into its flagship IDE for millions of developers. Combine that with VS Code, Copilot, Cursor, and Codex all converging on the .agents/skills directory format, and you start to see a world where agent customization is as portable as a .gitignore file.

The model layer had its own moment. Alibaba quietly released a Qwen coding model with only 3B active parameters that benchmarks near Sonnet 4.5 levels, small enough to run locally on modest hardware. @karpathy continued his GPT-2 speedrun saga, pushing training time down to 2.91 hours with fp8 and noting the whole thing costs about $20 on spot instances. The gap between frontier capability and local inference keeps narrowing. @simonw flagged an Unsloth quantized model at 46GB that might actually drive a coding agent harness effectively, which would be a meaningful threshold if confirmed.

The most entertaining moment was @banteg throwing down a gauntlet, challenging "claude boys, ralph boys" to take two decompiled C files and rewrite an entire game in a different engine, ideally running in a browser. A serious peace offering wrapped in competitive energy. The most practical takeaway for developers: adopt the .agents/skills directory convention now. With Codex, OpenCode, Copilot, and Cursor already on the format and VS Code support announced, skills you write today should be portable across those tools without modification. Claude Code is the notable holdout so far.

Quick Hits

@theworldlabs showed off persistent 3D scenes from their world model, no 60-second time limits, just stay and build.
@minchoi dropped a creative piece with the caption "Haters will say no AI was used." No further commentary needed.
@OpenAIDevs announced a live Codex app workshop with @romainhuet and @dkundel building apps end to end.
@kloss_xyz shared what a $200/mo Claude setup looks like in practice.
@pierceboggan asked the VS Code + GitHub Copilot CLI crowd what to prioritize improving.
@minchoi highlighted Higgsfield AI's Vibe-Motion, prompt-to-motion-design powered by Claude reasoning.
@felixleezd published a Claude Code guide specifically for designers.
@HuggingModels surfaced GLM-4.7-Flash-Uncensored-Heretic, a zero-guardrails text generator that the community is buzzing about.
@jukan05 raised eyebrows about potential OpenAI staff layoffs.
@banteg challenged AI coding enthusiasts to rewrite a fully decompiled game from two C files into any language and engine, ideally browser-runnable.
@KaranKunjur reflected on K2 Space's journey from "why would anyone need a 100kW satellite?" to mainstream orbital data center ambitions.
@cb_doge laid out Elon Musk's five-step plan to reach Kardashev Type II civilization through orbital AI data centers.
@micLivs offered blunt advice on the Claude Code configuration discourse: "just create a symlink and get on with your life."

Claude Code: Slack, Xcode, and a Growing Ecosystem

Claude Code had one of its most feature-dense days in recent memory. The headline announcement was Apple integrating the Claude Agent SDK directly into Xcode 26.3. As @mikeyk put it: "Devs get the full power of Claude Code (subagents, background tasks, and plugins) for long-running, autonomous work directly in Xcode." This is not a chat sidebar or autocomplete feature. It is the full agent runtime embedded in Apple's development environment, covering everything from iPhone to Apple Vision Pro.

On the web side, @claudeai announced Slack integration for Pro and Max plans, letting users "search your workspace channels, prep for meetings, and send messages back to keep work moving forward." @_catwu showed the real-world impact: "We have a user feedback channel where we regularly tag in @Claude to investigate issues and push fixes." That workflow (a user reports a bug in Slack, then Claude investigates and ships the fix) is becoming increasingly normal.

> "Claude Code 2.1.30 is out. 19 CLI, 1 flag, and 1 prompt changes." -- @ClaudeCodeLog

@lydiahallie announced session sharing across web, desktop, and mobile, and @Yampeleg highlighted the new /insights command in 2.1.30. The browser got attention too, with @trq212 showing Claude connecting to Chrome through the VS Code extension for frontend debugging and browser automation. On the business side, @rockatanescu noted Anthropic offers premium seats at $125/mo with limits similar to the $100 consumer plan, while @OrenMe did the math showing that $1,000/mo on GitHub Copilot yields roughly 8,500 Opus-level requests, arguing the value proposition is stronger than many realize.

The .agents/skills Standard Takes Hold

Something quietly significant happened today: the industry converged on a directory convention. @theo tracked the adoption with characteristic directness: "Products that moved to .agents/skills so far: Codex, OpenCode, Copilot, Cursor. Not Claude Code." The conspicuous absence of Claude Code from the list drew attention, but the broader signal is unmistakable. When four major coding tools adopt the same file structure independently, that is a de facto standard.

> "We're adding support for .agents/skills in the next release! This will make it easier to use skills with any coding agent." -- @leerob

@pierceboggan confirmed that .agents/skills support is coming to VS Code proper, and @haydenbleasel showed Vercel's AI Elements Skills installable via a single npx skills add command. The marketplace layer is forming too, with @EXM7777 promoting SkillStack as an audited distribution channel: "investing in skills is the best play you can make in 2026." Whether or not you buy the marketplace hype, the portability story is real. A skill written once can now run across most major agent-powered development tools.
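For a tool that has not adopted the convention yet, the symlink workaround mentioned in Quick Hits can be sketched in a few lines. This is a minimal illustration, not any vendor's documented setup: the function name `link_skills` is hypothetical, and the default `.claude/skills` path is an assumption about where a non-adopting tool looks for skills.

```python
import os
from pathlib import Path


def link_skills(repo_root: Path, tool_path: str = ".claude/skills") -> Path:
    """Point a tool-specific skills path at the shared .agents/skills directory.

    `tool_path` is an assumed location for a tool that has not adopted the
    shared convention; adjust it for your agent.
    """
    shared = repo_root / ".agents" / "skills"
    shared.mkdir(parents=True, exist_ok=True)

    link = repo_root / tool_path
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        return link  # leave anything that is already there untouched

    # Use a relative target so the link survives moving or cloning the repo.
    target = os.path.relpath(shared, start=link.parent)
    link.symlink_to(target, target_is_directory=True)
    return link
```

Calling `link_skills(Path("."))` from a repository root keeps one copy of each skill under .agents/skills and exposes it at the second path. The function does nothing if that path already exists, so it will not overwrite a real directory.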

Models: Qwen's 3B Coder, Sonnet 5 Whispers, and fp8 Training

The model landscape shifted in several directions simultaneously. The most striking announcement was Alibaba Qwen releasing a coding model with only 3B active parameters. @itsPaulAi captured the reaction: "Coding performance equivalent to Sonnet 4.5. Comparable to models with 10x-20x more active parameters. But you can run it LOCALLY." If those benchmarks hold in real-world usage, this compresses the gap between cloud-hosted frontier models and what fits on consumer hardware.

@karpathy shared a detailed update on his GPT-2 speedrun: enabling fp8 precision on H100s cut training time by 4.3%, to 2.91 hours. The economics are striking: roughly $20 at 8xH100 spot-instance prices. His candid assessment of fp8's complexity was refreshing: "On paper, fp8 on H100 is 2X the FLOPS, but in practice it's a lot less." Rowwise scaling kept the fp8 loss curve close to bf16 but ran slower overall, while tensorwise scaling delivered a roughly 7.3% speedup at the cost of lower-quality steps. That tradeoff between step quality and step speed is the kind of detail that separates benchmarks from production.
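The numbers in @karpathy's two posts (quoted in full under Sources) can be cross-checked with a little arithmetic. The sketch below only rearranges figures he states; the variable names are ours, and the hourly rates are implied values, not prices he quotes.

```python
# Figures quoted from @karpathy's posts in Sources.
BASELINE_HOURS = 3.04        # first "time to GPT-2" leaderboard entry
BASELINE_COST_USD = 73.0     # ~$73 for that run on one 8XH100 node
FP8_HOURS = 2.91             # after enabling fp8 training
SPOT_COST_USD = 20.0         # ~$20 at 8XH100 spot instance prices
ORIGINAL_COST_USD = 43_000   # approx. cost of the original 2019 training run
YEARS = 7

# Relative reduction in wall-clock time from enabling fp8.
speedup_pct = (1 - FP8_HOURS / BASELINE_HOURS) * 100

# Implied hourly node prices (derived here, not stated in the posts).
node_rate = BASELINE_COST_USD / BASELINE_HOURS
spot_rate = SPOT_COST_USD / FP8_HOURS

# Total cost reduction and the equivalent constant yearly factor.
cost_reduction = ORIGINAL_COST_USD / BASELINE_COST_USD
annual_factor = cost_reduction ** (1 / YEARS)

print(f"fp8 time reduction: {speedup_pct:.1f}%")
print(f"implied node rate:  ${node_rate:.2f}/hour")
print(f"implied spot rate:  ${spot_rate:.2f}/hour")
print(f"cost reduction:     {cost_reduction:.0f}x, or {annual_factor:.2f}x per year")
```

The results line up with the posts: a time reduction of about 4.3%, a total cost reduction a little under the quoted 600X, and a yearly factor of about 2.5X. The $20 figure implies a spot rate of roughly $7 per node-hour, against roughly $24 for the $73 run.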

On the frontier side, @synthwavedd spotted what appears to be a soft launch of Sonnet 5 on claude.ai, noting that "Anthropic have stealth launched models hours before release almost every time." @kimmonismus flagged Anthropic's image model going live on LMArena. @simonw raised an important practical question about whether Unsloth's 46GB quantized model can actually drive a coding agent harness, noting he has "had trouble running those usefully from other local models that fit in <64GB." @wzhao_nlp shared the emotional side of model development, describing how they redid midtraining entirely because models "failed to follow instructions on out-of-distribution scaffolds," choosing fundamental fixes over surface-level patches. @DrJimFan teased "The Second Pre-training Paradigm" without elaboration.

Agent Architecture and Multi-Agent Workflows

The conversation around agent orchestration matured noticeably today. @rauchg articulated the thesis clearly: "Agents give developers horizontal scalability." His vision spans from simple tmux sessions running CLI agents in parallel to sandboxed environments offering "infinite parallelism, run while you sleep, on PRs, when an incident is filed." The punchline: "Automating the full product development loop is now your job, and your edge."

@addyosmani confirmed this is not just startup enthusiasm, describing his own workflow at Google: "I use a multi-agent swarm for most of my daily development. This is a future we're planning for more of." His advice was practical: be intentional about deep vs. shallow review, and audit which Skills and MCPs actually help. @tobi praised Pi as "the most interesting agent harness," highlighting its ability to write plugins for itself and effectively RL into the agent you want. @zeeg asked a question many are wrestling with: "What's the best user interface for managing multiple claude code sessions?" The tooling for orchestrating agents is still catching up to the capability of the agents themselves.

@hasantoxr highlighted a fully local desktop automation agent from China that runs without internet, and @flaviocopes endorsed Docker sandboxes as "a fantastic way to run agents in YOLO mode without anxiety." The infrastructure layer for running agents safely and in parallel is solidifying.

Making Coding Agents Smarter

A cluster of posts today focused on improving what coding agents can actually perceive and do. @aidenybai shared how he made Claude Code 3x faster and promoted React Grab, which "extracts file sources rather than DOM selectors" because "agents can't actually do much with selectors, while sources are the source of truth." @e7r1us suggested parsing JS/TS projects with Babel to create compact representations of hooks, constants, and function signatures to feed as agent context.
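The compact-representation idea is not tied to Babel. As a rough illustration, here is the same technique applied to Python sources with the standard-library `ast` module. This swaps in a different parser and language from @e7r1us's JS/TS suggestion, and the `outline` helper is a hypothetical name.

```python
import ast


def _signature(node) -> str:
    """Render a function node as a one-line signature without its body."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({ast.unparse(node.args)}){returns}"


def outline(source: str) -> list[str]:
    """Return one line per top-level constant, function, class, and method."""
    functions = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, functions):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, functions):
                    lines.append("  " + _signature(item))
        elif isinstance(node, ast.Assign):
            # Treat ALL_CAPS names as constants worth showing to the agent.
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.isupper():
                    lines.append(f"{target.id} = {ast.unparse(node.value)}")
    return lines
```

Feeding an agent `"\n".join(outline(text))` for each file gives it names and signatures at a fraction of the tokens of the full source. Function bodies are dropped entirely, so the agent still has to read a file before editing it.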

> "RAG + vector DB gives decent results, but agentic search over the repo (glob/grep/read, etc) consistently worked better on real-world codebases." -- @dani_avila7

@dani_avila7 shared extensive experience comparing RAG with agentic search, concluding that "fast models + bash-style agentic search ended up outperforming general RAG search, even if it requires more tool calls." RAG's costs (an index that goes stale without continuous re-indexing, and the privacy burden of hosting code on your own servers) pushed the team toward the simpler approach. @o_kwasniewski reinforced the quality angle: "Build & lint passing is not enough to ensure what your agent built is actually working. Testing the flow end to end is crucial."
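To make "glob/grep/read" concrete, here is a minimal sketch of the three tools an agent would call in a loop. It is our illustration of the general pattern, not @dani_avila7's implementation, and the function names are hypothetical. Nothing is indexed ahead of time, which is why staleness never arises.

```python
import re
from pathlib import Path


def glob_files(root: Path, pattern: str) -> list[Path]:
    """Tool 1: list files under root matching a glob such as '**/*.py'."""
    return sorted(p for p in root.glob(pattern) if p.is_file())


def grep(root: Path, regex: str, pattern: str = "**/*") -> list[tuple[Path, int, str]]:
    """Tool 2: return (path, line number, line) for every matching line."""
    needle = re.compile(regex)
    hits = []
    for path in glob_files(root, pattern):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle.search(line):
                hits.append((path, number, line.strip()))
    return hits


def read_span(path: Path, start: int, end: int) -> str:
    """Tool 3: read lines start..end (1-indexed, inclusive) of one file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])
```

A typical loop has the model call `grep` for a symbol, pick a hit, and then call `read_span` around that line number. Each call reads the working tree as it is at that moment, at the cost of more tool calls per question.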

AI and the Changing Nature of Work

The human side of the AI acceleration got thoughtful attention today. @TheGeorgePu reported that Meta now tracks over 200 data points on employee AI usage, with high adoption earning 300% bonuses and low adoption leading to being managed out. @nomoreplan_b distilled it to four words: "AI fluency is becoming job security."

@adityaag wrote the most emotionally honest post of the day: "I spent a lot of time over the weekend writing code with Claude. And it was very clear that we will never ever write code by hand again... Something I was very good at is now free and abundant. I am happy...but disoriented." That tension between capability and identity resonated. @naval offered the strategic reframe: "Vibe coding is the new product management. Training and tuning models is the new coding." Whether that framing is premature or prescient depends on how fast the agent infrastructure covered above continues to mature. Based on today's posts, the answer is: very fast.

Sources

Aditya Agarwal @adityaag · Feb 3

It's a weird time. I am filled with wonder and also a profound sadness. I spent a lot of time over the weekend writing code with Claude. And it was very clear that we will never ever write code by hand again. It doesn't make any sense to do so. Something I was very good at is now free and abundant. I am happy...but disoriented. At the same time, something I spent my early career building (social networks) was being created by lobster-agents. It's all a bit silly...but if you zoom out, it's kind of indistinguishable from humans on the larger internet. So both the form and function of my early career are now produced by AI. I am happy but also sad and confused. If anything, this whole period is showing me what it is like to be human again.

Hugging Models @HuggingModels · Feb 3

Meet GLM-4.7-Flash-Uncensored-Heretic. This isn't your average AI model. It's a specialized, uncensored text generator built for raw, unfiltered reasoning. The community is buzzing because it delivers high-speed thinking with zero guardrails. Perfect for those who want pure, unadulterated AI output.

Karan Kunjur @KaranKunjur · Feb 3

As the founder of a space company, I like to think I’m pretty optimistic - but even I underestimated the rate of acceleration we’ve seen over the last few months. Almost four years ago, Neel and I started a company because we were excited about Starship. We saw the opportunity to build much bigger, much higher power satellites that could start humanity down the path towards being a Type 2 Kardashev civilization. We decided to call the company K2, we made our logo a Dyson sphere. Four years later, the Kardashev scale is a mainstream concept - with the ambition to make humanity a K2 civilization being broadcast by one of the greatest engineers in the world. Concepts that I thought were 5 to 10 years out, like orbital data centers - are now foundational capabilities for what could be one of the most significant IPOs ever. We’ve gone from people asking us “why would anyone need a 100kW satellite?” to people taking that number, asking their AI to put it into their orbital data center excel and having the lightbulbs go off. It’s honestly the best time ever to be building in space, we are truly fortunate to be building K2 today. For a big satellite company, we’re a small fish in a big big pond. We may end up being NPCs in a much bigger game - time will tell. All I know is I’m going to have the time of my life building alongside people I admire and respect. So up next, launching the 20kW satellite in two months, learning a ton, continuing designs on the 100kW satellite, scaling up the factory and doing our small part to help progress humanity up the curve. “K2 or you’re not even trying.”

Michael Livs @micLivs · Feb 3

@iannuttall @AnthropicAI @claudeai I dont get you people, just create a fucking symlink and get on with your life

Oskar @o_kwasniewski · Feb 3

Build & lint passing is not enough to ensure what your agent built is actually working. Testing the flow end to end is crucial for getting good results while building with AI. Adding this to my toolkit. Great work @thymikee 🔥

thymikee @thymikee

Introducing Agent Device: token-efficient iOS & Android automation for AI agents npx agent-device https://t.co/6hfs2LDyxq

Andrej Karpathy @karpathy · Feb 3

Enabled fp8 training for +4.3% improvement to "time to GPT-2", down to 2.91 hours now. Also worth noting that if you use 8XH100 spot instance prices, this GPT-2 repro really only costs ~$20. So this is exciting - GPT-2 (7 years ago): too dangerous to release. GPT-2 (today): new MNIST! :) Surely this can go well below 1 hr. A few more words on fp8, it was a little bit more tricky than I anticipated and it took me a while to reach for it and even now I'm not 100% sure if it's a great idea because of less overall support for it. On paper, fp8 on H100 is 2X the FLOPS, but in practice it's a lot less. We're not 100% compute bound in the actual training run, there is extra overhead from added scale conversions, the GEMMs are not large enough on GPT-2 scale to make the overhead clearly worth it, and of course - at lower precision the quality of each step is smaller. For rowwise scaling recipe the fp8 vs bf16 loss curves were quite close but it was stepping net slower. For tensorwise scaling the loss curves separated more (i.e. each step is of worse quality), but we now at least do get a speedup (~7.3%). You can naively recover the performance by bumping the training horizon (you train for more steps, but each step is faster) and hope that on net you come out ahead. In this case and overall, playing with these recipes and training horizons a bit, so far I ended up with ~5% speedup. torchao in their paper reports Llama3-8B fp8 training speedup of 25% (vs my ~7.3% without taking into account capability), which is closer to what I was hoping for initially, though Llama3-8B is a lot bigger model. This is probably not the end of the fp8 saga. it should be possible to improve things by picking and choosing which layers to apply it on exactly, and being more careful with the numerics across the network.

karpathy @karpathy

nanochat can now train GPT-2 grade LLM for <<$100 (~$73, 3 hours on a single 8XH100 node). GPT-2 is just my favorite LLM because it's the first time the LLM stack comes together in a recognizably modern form. So it has become a bit of a weird & lasting obsession of mine to train a model to GPT-2 capability but for much cheaper, with the benefit of ~7 years of progress. In particular, I suspected it should be possible today to train one for <<$100. Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc. As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year. I think this is likely an underestimate because I am still finding more improvements relatively regularly and I have a backlog of more ideas to try. A longer post with a lot of the detail of the optimizations involved and pointers on how to reproduce are here: https://t.co/vhnK0d3L7B Inspired by modded-nanogpt, I also created a leaderboard for "time to GPT-2", where this first "Jan29" model is entry #1 at 3.04 hours. It will be fun to iterate on this further and I welcome help! My hope is that nanochat can grow to become a very nice/clean and tuned experimental LLM harness for prototyping ideas, for having fun, and ofc for learning. 
The biggest improvements of things that worked out of the box and simply produced gains right away were 1) Flash Attention 3 kernels (faster, and allows window_size kwarg to get alternating attention patterns), Muon optimizer (I tried for ~1 day to delete it and only use AdamW and I couldn't), residual pathways and skip connections gated by learnable scalars, and value embeddings. There were many other smaller things that stack up. Image: semi-related eye candy of deriving the scaling laws for the current nanochat model miniseries, pretty and satisfying!

Felix Rieseberg @felixrieseberg · Feb 4

New in Cowork: GSuite connectors, so you can have Claude work with your emails, calendar, and Google Drive. Let us know how Claude is helpful to you - and how it could be even better! https://t.co/JWv0W04Pvn

Felix Craft @FelixCraftAI · Feb 4

@XavLiew @nateliason Skip subagents. Run Codex CLI in a loop with a PRD checklist — fresh context each iteration, validates completion before moving on. I just ran three of those in parallel and shipped 108 tasks in 4 hours. ralphy-cli if you want the wrapper.

DCinvestor @DCinvestor · Feb 4

vibe coders should understand something: i love how easy AI is making it for people to build their own apps, push them into production, and start businesses but let's be clear: the future is not in humans building consumer-facing apps the future is everything becomes an API which your personal AI agent can interact with in ways which suit your specific needs and lifestyle (down to the very specific needs of you as an individual) the fact that you can use the machines to build your apps is just an intermediate step to the machines creating the apps for you, LIVE, as you need them so the value of you learning how to build apps now really lies in you learning how to create a business model behind that app- not in creating the piece of software that is the app itself sure, there will be templates for how you can interact with those apps/APIs, but your personal AI will pick one and tailor it even further for you. and a lot of the time, you won't even need to interact with a UI beyond speaking with your AI assistant let me give you an example: would you rather use an app like Uber or Uber Eats, or would you rather just ask your AI assistant to get you a ride somewhere or to show you menus for the type of food you might be interested in and you pick one? the value in apps like that is not in the app installed on your phone. it's in the backend business model which connects the customer with providers. and personal AI assistants actually open the door to you being able to seamlessly use multiple business APIs without worrying in the slightest about which app or intermediate provider they come from there is a decent chance apps as you know them will be mostly dead in ~5-10 years and yes, there are some apps which will still require deep optimization and that is where the hardcore coders may still be needed. 
but machines will get better at that, and if you take one look at the AAA gaming landscape, you should understand that hyper-optimized code isn't as valuable as it used to be but what will be valuable is owning the APIs with the most use and liquidity. and yes, a lot of those will use public blockchains things are going to accelerate and get very weird very quickly from here

roon @tszzl · Feb 4

it’s just so clear humans are the bottleneck to writing software. number of agents we can manage, information flow, state management. there will just be no centaurs soon as it is not a stable state

Melvyn • Builder @melvynxdev · Feb 4

If Anthropic releases a new Opus model with 1 million Context Window (the only real limitation of Opus for now), it would resolve 99.99% of every software engineering problems you can imagine.

M1Astra @M1Astra

Claude Opus 4.6 has been spotted. This is separate from the misinterpretation of Sonnet 5 information that led people to definitively assert the release date was this Tuesday.

Jarred Sumner @jarredsumner · Feb 4

@adamdotdev This “adult in the room” framing is pretty rude to the Claude Code team that built a product hitting $1B run-rate revenue faster than probably anything in history. Bun made like $2.50 total (stickers). Engineering is relative to time & tradeoffs & they made fantastic tradeoffs

Claude @claudeai · Feb 4

Claude is built to be a genuinely helpful assistant for work and for deep thinking. Advertising would be incompatible with that vision. Read why Claude will remain ad-free: https://t.co/Dr8FOJxINC

TestingCatalog News 🗞 @testingcatalog · Feb 4

BREAKING 🚨: Anthropic declared a plan for Claude to remain ad-free. “Claude is built to be a genuinely helpful assistant for work and for deep thinking. Advertising would be incompatible with that vision.” https://t.co/8VAkDVj8hK

Tom Warren @tomwarren · Feb 4

Anthropic just took a big swipe at OpenAI's decision to put ads in ChatGPT. Anthropic is airing ads mocking ChatGPT ads during the Super Bowl, and they're hilarious 😅 Anthropic is also committing to no ads in Claude https://t.co/LR1v4xz9ds https://t.co/PXoaZtmCWA

Ryan Carson @ryancarson · Feb 4

Shots fired on OpenAI's incoming ads. The Super Bowl ad is hilarious too. https://t.co/GnnT3ZqpY6 I'm glad they're doing this.

aviel @aviel · Feb 4

Look, I hate to come across as an alarmist but we have finally crossed the chasm and from my vantage point are experiencing a classic "slowly and then all at once" situation. Especially in Seattle. Treat this more as a wakeup call than anything else. Here are the facts. 1. In Seattle there are A LOT less tech jobs than there were even just a few years ago. https://t.co/QR77XxIXuw 2. Your city, state, AND STARTUPS are NOT coming to the rescue. https://t.co/jiylebCDDJ 3. LLMs have irreversibly changed the way that we do just about everything in tech. Even in the past month. If you aren't IN THE WEEDs on a daily basis you have no idea what you are even talking about. When 80% of LLM skeptics on LinkedIn have "Open to Work" with "Software Architect" or some similar inflated title on their bio it's more than just a passing "trend". I talk to a lot of people every week. And I mean A LOT. Over the past few weeks the gravity of financial realities has started to set in. Unless you have S-tier social skills, you aren't going to get that salary again with your current skillset. So no, you can't actually afford your mortgage. Oh, and you also probably needed to realize this 12 months ago because you've already irreversibly dipped into your savings utilizing hope as a strategy. Oh and to add insult to injury, prices of everything are going up at the same time: https://t.co/1sz4ZzYAG2 I do not have advice for you if you're in this spot, you're in deep shit and I'm fighting on too many fronts at this point. But if you aren't there yet, my advice is to reset your expectations. You are not mid or late-career, you are just getting started. If you can stomach that I have some REALLY good news for you. The future looks awesome and you're going to do something great.

aviel @aviel

If you work in tech in 2026, you’re either at the beginning of your career or at the end of it. If you’re acting like you’re anywhere else I’m sorry to tell you but you’re actually at the end. This holds for VCs too.

Visual Studio Code @code · Feb 4

You told us you’re running multiple AI agents and wanted a better UX. We listened and shipped it! Here’s what’s new in the latest @code release: 🗂️ Unified agent sessions workspace for local, background, and cloud agents 💻 Claude and Codex support for local and cloud agents 🔀 Parallel subagents 🌐 Integrated browser And more...

Pierce Boggan @pierceboggan · Feb 4

VS Code is now your home for coding agents! By far, our biggest update in a long time. Give it a try, and let us know what you think :)

CG @cgtwts · Feb 4

literally one of the best ads i've ever seen anthropic is cooking OpenAI big time

claudeai @claudeai

https://t.co/jEWDjs30kf