AI Learning Digest

Claude Code Spawns a Harness Ecosystem as Karpathy Programs an AI Research Org

Daily Wrap-Up

The big story today isn't any single announcement. It's the emerging pattern of developers building their own orchestration layers on top of Claude Code rather than waiting for Anthropic to ship features. We're seeing headless setups, custom harnesses, Obsidian integrations, and plugin ecosystems all materialize in parallel. This is what adoption looks like when the underlying model is good enough that the bottleneck shifts from capability to workflow integration. The Claude Code CLI has become a substrate, not a product, and today's posts make that unmistakably clear.

Karpathy's thread on running eight agents as a research org was the most technically interesting content of the day. His honest assessment that agents generate bad experimental ideas even at maximum intelligence is worth internalizing. They'll implement anything you describe precisely, but they won't design a rigorous ablation study or catch confounders in their own results. His framing of "programming an organization" with prompts, skills, and processes as the new source code is exactly the direction the agent harness community is moving, and it's encouraging to see someone of his caliber validating the approach while being transparent about how far it has to go.

On the workforce side, the rhetoric keeps escalating. Reports of YC founders planning to eliminate all engineering roles below staff level sit alongside claims about Anthropic's CEO predicting 50% displacement of lawyers, consultants, and finance professionals. Whether these claims are exaggerated or directionally correct, the signal is consistent: the professional class is waking up to the possibility that AI doesn't just automate blue-collar work. The most practical takeaway for developers: if you're not already building with agent harnesses and multi-agent workflows, start now. Karpathy's thread is a blueprint for the kind of experimental setup you should be running, even if the agents aren't fully autonomous yet. The value is in learning to program organizations, not just software.

Quick Hits

  • @sama announced OpenAI has raised a $110 billion round from Amazon, NVIDIA, and SoftBank. That's not a typo. The capital concentration in frontier AI is now at sovereign wealth fund scale.
  • @UnslothAI updated Qwen3.5 with improved tool-calling and coding performance. Qwen3.5-35B-A3B now runs on 22GB RAM, with benchmarks across Claude Code and Codex.
  • @theo flagged that the government is trying to force Anthropic to remove Claude's safety guards, calling it "probably very bad." Policy pressure on AI safety continues to escalate.
  • @cryptopunk7213 shared a story about someone spinning up an AI agent to lowball sellers on Facebook Marketplace, scoring a Jeep Wrangler for $1,500 and free PS5s and TVs. Marketplace arbitrage agents are here.
  • @pvncher launched RepoPrompt 2.0 as a fully integrated agent with built-in oracle and context builder, showcasing how much better agents perform with good context engineering tools.
  • @jackfriks captured the vibe of the moment perfectly: "cracks knuckles 'claude, read this article and implement all of its advice' retires"
  • @nicdunz offered a philosophical take: "prompting LLMs is, in a way, similar to using the search bar on the library of babel website." It's a surprisingly apt metaphor for the retrieval-from-latent-space nature of generation.
  • @TheBronxViking retweeted @BillyM2k's "how to run a company in 2026," which at this point probably involves fewer humans than a 2016 startup's founding team.
  • @alancarroII posted the obligatory meme about plumbers and electricians watching AI replace everyone who went to college. The trades-vs-knowledge-work inversion narrative continues to gain traction.

Claude Code's Harness Ecosystem Takes Shape

Something interesting is happening in the Claude Code community: developers are increasingly treating the CLI as a foundation to build on rather than a finished product. Today's posts paint a picture of an ecosystem fragmenting in productive ways, with users building custom harnesses, plugins, and integrations that extend Claude Code's capabilities far beyond what ships in the box.

@alxfazio urged developers to be "headless claude maxxing," pointing to an article that apparently explains the pattern better than Anthropic's own docs. Running Claude Code headless, without the interactive terminal UI, unlocks programmatic orchestration that's impossible in the default interactive mode. Meanwhile, @Jaytel declared they're "done with Claude Code" entirely, finding that "building your own harness in Pi is addicting." This isn't a rejection of Claude as a model. It's a rejection of the default interface in favor of something custom-tailored.
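The headless pattern the post alludes to can be sketched briefly. This is a minimal, hypothetical example assuming Claude Code's documented `-p`/`--print` flag and `--output-format json` option; the task string and orchestration around it are made up for illustration.

```python
import shutil
import subprocess

def headless_cmd(prompt: str, output_format: str = "json") -> list[str]:
    """Build a non-interactive ("headless") Claude Code invocation.

    -p/--print runs a single prompt without the interactive terminal UI,
    and --output-format json makes the result machine-parseable, which is
    what lets an outer script orchestrate many runs programmatically.
    """
    return ["claude", "-p", prompt, "--output-format", output_format]

# Only attempt the call if the CLI is actually installed on this machine.
if shutil.which("claude"):
    result = subprocess.run(
        headless_cmd("Summarize the TODOs in this repo"),  # hypothetical task
        capture_output=True,
        text=True,
    )
    print(result.stdout)
```

Once invocations look like this, fanning out to several agents is ordinary scripting rather than terminal juggling, which is exactly the door the custom-harness builders are walking through.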

The plugin side is evolving too. @affaanmustafa highlighted how easy it is to add Claude Code plugins through Cowork's interface, noting they use "a bit of everything at this point, mainly to check how things work across harnesses." And @noahvnct shared a guide on building an "AI Second Brain Using Obsidian + Claude Code," connecting the coding agent to a knowledge management system. The common thread is that power users want Claude Code integrated into their existing workflows, not the other way around.

@trq212 shared "Lessons from Building Claude Code: Seeing like an Agent," which frames the design philosophy from Anthropic's perspective. The title itself is revealing: the challenge isn't just making a good model, it's making the model see the world the way an effective agent needs to. As the harness ecosystem matures, we're seeing the community answer that question from the other direction, building the scaffolding that helps agents see like developers actually work.

AI Reshapes Professional Work

The workforce disruption conversation took a sharper turn today, moving from abstract predictions to concrete reports of action. @jeffdfeng dropped what might be the most unsettling post of the day: "Spoke with several YC founders planning to lay off all engineers below staff/principal, basically everyone under L5. This only became viable after Opus 4.5 in December." He framed the Block layoffs as a signal that "the floor just collapsed" and advised early-career engineers that "your edge will be how well you integrate AI into the value you create."

Whether these specific claims hold up to scrutiny is less important than the sentiment they represent. The idea that AI coding agents can replace junior and mid-level engineers is now a planning assumption at funded startups, not a thought experiment. Combined with @cgtwts sharing Anthropic CEO Dario Amodei's prediction that "AI will wipe out 50% of lawyers, consultants, and finance professionals within the next 12 months," the message is consistent across industries: the professional class is in the crosshairs.

But the picture isn't purely dystopian. Two posts from the legal world show professionals leaning into AI rather than being displaced by it. @garthwatson, a non-practicing lawyer who built a mobile app with Claude Code, called it "signal" for legal tech, noting his experience founding and scaling a legal tech company. And @zackbshapiro detailed how he's increasingly using Claude as his primary tool in legal practice, not specialized legal AI products like Harvey or CoCounsel, but "a general-purpose AI that I've taught how I practice law."

This distinction matters. The professionals who survive the disruption won't be the ones waiting for industry-specific AI tools. They'll be the ones who learn to work directly with general-purpose models and shape them to their domain. The gap between "AI will take your job" and "AI will transform your job" often comes down to whether you're building your own workflows or waiting for someone else to build them for you.

Karpathy Programs an AI Research Organization

Andrej Karpathy shared the most technically substantive post of the day: a detailed account of running eight AI agents (four Claude, four Codex) as a research organization working on nanochat experiments. The setup is ambitious, with each agent getting a GPU, running on git branches with worktree isolation, communicating through simple files, and visible through tmux window grids. The goal: delete logit softcap from the model without regression.
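Karpathy hasn't published the scripts, so the following is a hypothetical sketch of the layout he describes: each research program is a git branch, each agent forks it into a feature branch checked out in its own git worktree, and agents communicate through plain files. All names here are invented.

```python
from pathlib import Path

def org_layout(program: str, agents: list[str], root: str = ".") -> dict[str, dict[str, str]]:
    """Map each agent to the branch, worktree path, and inbox file it owns.

    Mirrors the pattern described in the post: branches for isolation of
    experiments, worktrees for isolation of checkouts, and simple files
    for inter-agent comms. Purely illustrative naming scheme.
    """
    layout = {}
    for name in agents:
        layout[name] = {
            "branch": f"{program}/{name}",                     # feature branch off the program branch
            "worktree": str(Path(root) / "worktrees" / name),  # isolated checkout via `git worktree add`
            "inbox": str(Path(root) / "comms" / f"{name}.md"), # file-based message passing
        }
    return layout

# The actual checkout for one agent would be created with something like:
#   git worktree add worktrees/claude-1 -b softcap-ablation/claude-1
```

Each agent then runs in its own tmux window pointed at its worktree, which is what produces the "window grids" Karpathy describes watching.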

The result? "The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at." What makes this post valuable is Karpathy's precise diagnosis of why it fails. The agents' ideas are "pretty bad out of the box, even at highest intelligence. They don't think carefully through experiment design, they run a bit non-sensical variations, they don't create strong baselines and ablate things properly." His example is perfect: an agent "discovered" that increasing hidden size improves validation loss, a totally spurious result that conflates model capacity with actual improvement when training time isn't controlled.
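The standard guard against exactly this failure is to match compute across the comparison. Using the common 6·N·D first-order estimate of transformer training FLOPs (N parameters, D training tokens), a fair hidden-size ablation shrinks the token budget as the model grows. A sketch of that arithmetic (the well-known approximation, not Karpathy's actual protocol):

```python
def tokens_for_compute(flops_budget: float, n_params: float) -> float:
    """Token budget D such that training cost ~ 6 * N * D hits the budget.

    Holding this budget fixed across a sweep is what separates "bigger
    model helps" from "more compute helps": double the parameters, and a
    compute-matched run gets half the tokens.
    """
    return flops_budget / (6 * n_params)

budget = 6e18                                # fixed FLOPs for every run in the sweep
small = tokens_for_compute(budget, 125e6)    # ~8e9 tokens for a 125M-param model
large = tokens_for_compute(budget, 250e6)    # half the tokens for 2x the params
```

An agent that only reports validation loss at equal token counts (or equal wall-clock) will rediscover the "bigger is better" mirage every time; controlling for FLOPs is the ablation hygiene Karpathy says the agents skip.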

But the conceptual frame is where things get interesting. Karpathy describes the work as "programming an organization" where the source code is "the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the 'org code.'" The evaluation metric becomes: "given an arbitrary task, how quickly does your research org generate progress on it?" This maps directly onto what the Claude Code harness builders are doing at a smaller scale, except Karpathy is applying it to ML research rather than software engineering.

@nummanali picked up a related thread, highlighting Middleman, a tool from the creator of dev-browser that gives you "a single persistent manager agent per project." The pitch captures the emerging consensus: "You're not an IC anymore. You've become a project manager. You need a middle manager." Whether it's Karpathy orchestrating eight researchers or a solo developer managing a Middleman instance, the pattern is the same. The human's role is shifting from doing the work to designing the system that does the work. The question is whether that system can generate genuinely good ideas, or just execute the ones you hand it.

Source Posts

Unsloth AI @UnslothAI
Qwen3.5 is now updated with improved tool-calling & coding performance! Run Qwen3.5-35B-A3B on 22GB RAM. See improvements via Claude Code, Codex. We also benchmarked GGUFs & removed MXFP4 layers from 3 quants. GGUFs: https://t.co/4lSce5zZbO Analysis: https://t.co/rHZK8JWdYM
Sam Altman @sama
We have raised a $110 billion round of funding from Amazon, NVIDIA, and SoftBank. We are grateful for the support from our partners, and have a lot of work to do to bring you the tools you deserve.
CG @cgtwts
Anthropic CEO: “AI will wipe out 50% of lawyers, consultants, and finance professionals within the next 12 months” https://t.co/fkuBs6VfhD
Claude @claudeai

We've also created plugins across HR, design, engineering, ops, financial analysis, investment banking, equity research, private equity, and wealth management to help users see what's possible and start building their own.

Thariq @trq212
Lessons from Building Claude Code: Seeing like an Agent
Alan Carroll @alancarroII
Plumbers and electricians seeing AI replace everyone who went to college https://t.co/CgvnlfVlO7
Jeff @jeffdfeng
Spoke with several YC founders planning to lay off all engineers below staff/principal — basically everyone under L5. This only became viable after Opus 4.5 in December. The Block layoffs are a signal: the floor just collapsed. If you’re early in your career, the next few years are everything. Your edge will be how well you integrate AI into the value you create. The fastest learners are about to compound at absurd rates.
jack @jack

we're making @blocks smaller today. here's my note to the company.

today we're making one of the hardest decisions in the history of our company: we're reducing our organization by nearly half, from over 10,000 people to just under 6,000. that means over 4,000 of you are being asked to leave or entering into consultation. i'll be straight about what's happening, why, and what it means for everyone.

first off, if you're one of the people affected, you'll receive your salary for 20 weeks + 1 week per year of tenure, equity vested through the end of may, 6 months of health care, your corporate devices, and $5,000 to put toward whatever you need to help you in this transition (if you're outside the U.S. you'll receive similar support but exact details are going to vary based on local requirements). i want you to know that before anything else. everyone will be notified today, whether you're being asked to leave, entering consultation, or asked to stay.

we're not making this decision because we're in trouble. our business is strong. gross profit continues to grow, we continue to serve more and more customers, and profitability is improving. but something has changed. we're already seeing that the intelligence tools we're creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company. and that's accelerating rapidly.

i had two options: cut gradually over months or years as this shift plays out, or be honest about where we are and act on it now. i chose the latter. repeated rounds of cuts are destructive to morale, to focus, and to the trust that customers and shareholders place in our ability to lead. i'd rather take a hard, clear action now and build from a position we believe in than manage a slow reduction of people toward the same outcome. a smaller company also gives us the space to grow our business the right way, on our own terms, instead of constantly reacting to market pressures.

a decision at this scale carries risk. but so does standing still. we've done a full review to determine the roles and people we require to reliably grow the business from here, and we've pressure-tested those decisions from multiple angles. i accept that we may have gotten some of them wrong, and we've built in flexibility to account for that, and do the right thing for our customers.

we're not going to just disappear people from slack and email and pretend they were never here. communication channels will stay open through thursday evening (pacific) so everyone can say goodbye properly, and share whatever you wish. i'll also be hosting a live video session to thank everyone at 3:35pm pacific. i know doing it this way might feel awkward. i'd rather it feel awkward and human than efficient and cold.

to those of you leaving…i'm grateful for you, and i'm sorry to put you through this. you built what this company is today. that's a fact that i'll honor forever. this decision is not a reflection of what you contributed. you will be a great contributor to any organization going forward.

to those staying…i made this decision, and i'll own it. what i'm asking of you is to build with me. we're going to build this company with intelligence at the core of everything we do. how we work, how we create, how we serve our customers. our customers will feel this shift too, and we're going to help them navigate it: towards a future where they can build their own features directly, composed of our capabilities and served through our interfaces. that's what i'm focused on now. expect a note from me tomorrow. jack

cogsec @affaanmustafa
If you're a cowork user - its super duper easy to add as a plugin! I use a bit of everything at this point mainly to check how things work across harnesses but coworks plugin interface is super duper easy! get started in 30 seconds! cmd -> affaan-m/everything-claude-code https://t.co/D2yCymO53G
cogsec @affaanmustafa

The Codex App is still heavily slept on if you aren't using ECC for Codex you're missing out Its super easy and pulls all the skills over Most peoples development related openclaw automations can also just be directly ran from codex I ported a lot of my automations over https://t.co/oCZRV3cvKb

Andrej Karpathy @karpathy
I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)

I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p.

But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully though experiment design, they run a bit non-sensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer, it's not clear why I had to come in to point that out). They are very good at implementing any given well-scoped and described idea but they don't creatively generate them.

But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?
Thomas Wolf @Thom_Wolf

How come the NanoGPT speedrun challenge is not fully AI automated research by now?

Jaytel @Jaytel
I'm done with Claude Code— building your own harness in Pi is addicting
tobi lutke @tobi

Pi is the most interesting agent harness. Tiny core, able to write plugins for itself as you use it. It RLs itself into the agent you want. I was missing cc’s tasks system and told it to spawn clause in tmux and interrogate it about it and make an implementation for itself. It nailed it, including the UX. Clawdbot is based on it and now it makes sense why it feels so magical. Dawn of the age of malleable software.

Garth Watson @garthwatson
As a non-practising lawyer that just used Claude Code to build a mobile app, and having founded and scaled a legal tech company, and been heavily involved in the legaltech scene, I just wanna say this is signal.
Zack Shapiro @zackbshapiro

The Claude-Native Law Firm