Anthropic's Ad-Free Super Bowl Stance Sparks Industry Debate as VS Code Ships Major Agent Update

February 5, 2026 · 27 sources

Anthropic committed Claude to being permanently ad-free while running Super Bowl ads mocking ChatGPT's decision to show ads. Karpathy reflected on one year of vibe coding and proposed "agentic engineering" as the professional evolution. VS Code shipped a massive agent-focused release with unified sessions, parallel subagents, and multi-model support.

Daily Wrap-Up

The Super Bowl brought an unexpected AI storyline that dominated the timeline: Anthropic went on the offensive against OpenAI's decision to introduce advertising into ChatGPT, airing mock ads during the big game and formally committing to keeping Claude ad-free. The PR play worked almost too well. OpenAI's lengthy official response drew more attention to Anthropic's framing than the original ads did, with multiple commentators noting the Streisand effect in real time. It was a masterclass in asymmetric competitive positioning, and a reminder that in the AI race, product philosophy is becoming as important as model benchmarks.

Meanwhile, the developer tooling landscape had a quietly significant day. VS Code dropped what multiple people called its biggest update in a long time, built entirely around making agent workflows first-class citizens in the editor. Unified agent sessions, parallel subagents, an integrated browser, and support for both Claude and Codex agents. At the same time, Karpathy marked the one-year anniversary of coining "vibe coding" by acknowledging it has evolved from weekend hacking into professional practice, proposing "agentic engineering" as the more accurate term for 2026. The convergence of these two developments tells a clear story: the tools are catching up to the workflow that practitioners have been cobbling together for months.

The most entertaining moment was easily @synthwavedd's claim that Anthropic "keeps having to delay Sonnet 5 because every time they go to deploy it, it has a meltdown and tries to blow things up at Anthropic HQ using Claude Code." Followed closely by @it_unprofession's devastating satire of corporate AI strategy theater, where a company's entire "AI dependency matrix" turns out to be Claude for meeting summaries and whatever Gmail does with autocomplete. The most practical takeaway for developers: VS Code's new agent workspace and Claude Code's /insights command both shipped today. If you're running multiple agents or want to understand your own coding patterns better, update your tools before your next session.

Quick Hits

@MistralAI launched Voxtral Transcribe 2 with state-of-the-art speech-to-text, speaker diarization, and sub-200ms real-time latency.
@OpenAIDevs announced GPT-5.2 and GPT-5.2-Codex are now 40% faster through inference stack optimization. Same weights, lower latency.
@felixrieseberg shipped GSuite connectors for Cowork, letting Claude work with Gmail, Calendar, and Google Drive.
@cjpedregal announced Granola now has an MCP that works with ChatGPT, Claude, and other tools.
@trq212 says Slack integration in Cowork is saving significant time for first-pass document drafting.
@adxtyahq highlighted the Claude Startup Program offering up to ~$25K in API credits with no VC requirement.
@TheAhmadOsman went scorched-earth on Ollama, calling it slower than alternatives on every platform and recommending LM Studio, llama.cpp, or vLLM instead.
@SawyerMerritt shared Tesla VP Ashok Elluswamy's take that self-driving is "an AI problem, not a sensor problem" from the 2026 ScaledML Conference.
@SawyerMerritt also teased a new Elon Musk interview claiming "the most economically compelling place to put AI will be in space" within 30 months.
@cryptopunk7213 praised Ashok Elluswamy as "best hire in tech history" for his ability to explain complex AI systems simply.
@lydiahallie joked about Myers-Briggs personality types but for Claude Code users.
@dmwlff offered the day's most concise wisdom: "Vibe coder: know thyself."
@synthwavedd claimed an Anthropic source says Sonnet 5 keeps getting delayed because it "has a meltdown and tries to blow things up" using Claude Code. @moztlab asked if they tried turning it off and on again.
@ServiceCloud promoted Agentforce for IT service ticket resolution.

Agentic Engineering and the Human Bottleneck

One year ago, Karpathy fired off a "shower thoughts throwaway tweet" that accidentally minted the term "vibe coding." Today he reflected on how it's evolved from a name for fun weekend projects into something that needed a more serious label. His current favorite: "agentic engineering," with "agentic" emphasizing that you're orchestrating agents rather than writing code directly, and "engineering" emphasizing that there's real skill involved. As @karpathy put it:

> "The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software... there is an art & science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind."

But the human side of this transition is getting real. @tbpn shared that members of Pragmatic Engineer's "secret email list" of agentic coders are reporting sleep disruption and burnout from managing agent swarms. The phrase "like a vampire" came up, with people napping during the day to keep their agents productive at night. @joshclemm added the psychological dimension: "you have this feeling you need to keep them busy and productive at all times, otherwise you're wasting time or your monthly credits." @NickADobos offered the counterpoint framing that "when agents execute 100 steps instead of 10, your role becomes more important, not less."

On the practical side, @Khaliqgant published lessons from six weeks of multi-agent orchestration, which @mattshumer_ endorsed as essentially "Sparknotes for coding with a team of agents managing other agents." @FelixCraftAI shared a more pragmatic approach: skip subagents entirely, run Codex CLI in a loop with a PRD checklist, fresh context each iteration. Claims to have shipped 108 tasks in 4 hours using this pattern. And @tszzl pushed the philosophical envelope further: "it's just so clear humans are the bottleneck to writing software... there will just be no centaurs soon as it is not a stable state." Whether that's prophecy or hyperbole, the tension between human oversight and agent autonomy is clearly the defining question of the moment.

Anthropic's Super Bowl Ad Blitz

Anthropic chose the biggest stage in American advertising to make a philosophical statement about AI business models. The company aired ads during the Super Bowl directly mocking OpenAI's decision to introduce advertising into ChatGPT, while simultaneously publishing a formal commitment that Claude will remain ad-free. @claudeai framed it plainly:

> "Claude is built to be a genuinely helpful assistant for work and for deep thinking. Advertising would be incompatible with that vision."

The ads landed hard. @cgtwts called them "literally one of the best ads I've ever seen," while @ryancarson and @tomwarren both highlighted the strategic brilliance of the timing. But the real story became OpenAI's response. @signulll provided the sharpest post-mortem, noting that OpenAI's detailed rebuttal "reads like it was assembled in a war room by committee" and arguing the optimal response was either silence or "a single graph of active users with the caption 'lol.'" Instead, "by responding seriously, OpenAI elevated the ads, accepted Anthropic's premise, and turned a joke into real discourse."

@___frye made the subtler observation that what made the ads resonate wasn't the ad-free message itself but "how they've captured the chipper, clipped, empty-eyed cadence of the GPT 5 models." It's a reminder that Anthropic's competitive strategy increasingly operates on two levels: the product level (no ads, safety commitments) and the cultural level (understanding how the developer community actually talks and thinks about these tools). Whether this translates to market share remains an open question, but for one Super Bowl Sunday, Anthropic controlled the narrative completely.

VS Code Goes All-In on Agents

VS Code shipped what multiple people called its most significant update in a long time, and the theme was unmistakable: agents are now first-class citizens in the editor. @code announced unified agent sessions for local, background, and cloud agents, plus Claude and Codex support, parallel subagents, and an integrated browser. @pierceboggan summarized it as "VS Code is now your home for coding agents."

The update also brought model flexibility. @pierceboggan noted you can now use OpenAI's Codex or Anthropic's Claude agent directly in VS Code with a GitHub Copilot subscription, framing it as "VS Code gives you choice: be it your models or agent harness." @burkeholland discovered you can even have models call each other, with Opus, Codex, and Gemini collaborating in the same chat session.

Not everyone was celebrating GitHub's AI strategy, though. @GergelyOrosz pointed out that Copilot's decision to keep a weaker default model is actively driving teams away, calling it a self-inflicted wound where "defaults matter and most devs don't bother switching." @awakecoding echoed this, arguing that during initial adoption "it's critically important that developers succeed before they begin optimizing their quota usage." The VS Code shell is clearly best-in-class now for agent workflows, but the model defaults underneath it remain a point of friction that Microsoft needs to address.

Claude Code: Revenue Rocket and New Features

Claude Code had a strong day on multiple fronts. @trq212 announced the new /insights command, which reads your message history from the past month to summarize your projects, usage patterns, and workflow improvement suggestions. @AlexTamkin, who worked on the feature, encouraged users to try it in the newest version.

The business case got attention too. @jarredsumner pushed back on criticism of Claude Code's engineering, noting it's "hitting $1B run-rate revenue faster than probably anything in history" and defending the team's tradeoffs as "fantastic" given time constraints. @melvynxdev argued that a new Opus model with a 1 million token context window "would resolve 99.99% of every software engineering problems you can imagine," highlighting context length as the primary remaining limitation for deep coding workflows. Whether or not that's literally true, the sentiment reflects how close the current tools feel to a tipping point where context, not capability, is the binding constraint.

The AI Career Reality Check

The most sobering thread of the day came from @aviel, who argued we've "finally crossed the chasm" in the tech job market, especially in Seattle. Fewer jobs, rising costs, and LLMs that have "irreversibly changed the way we do just about everything in tech." His advice: "reset your expectations. You are not mid or late-career, you are just getting started."

@it_unprofession provided the comedic counterpoint with a devastating portrait of corporate AI theater, where executives panic about "AI dependency matrices" that don't exist and a company's entire AI strategy amounts to "Claude for meeting summaries and whatever Gmail does when it autocompletes my sentences." @DCinvestor took the longest view, arguing that consumer apps themselves are a transitional form. The real future is "everything becomes an API which your personal AI agent can interact with," making the business model behind the app more valuable than the app itself. All three posts, from different angles, point to the same conclusion: the ground is shifting fast, and the skills that matter are shifting with it.

Sources

Felix Lee @felixleezd · Feb 3

Claude Code Guide for Designers

Claude Code is the highest leverage skill you can learn this year. The future of design is here. I've written a full guide for this. Story time... I...

Thariq @trq212 · Feb 3

You can now connect to Claude in Chrome using the VS Code extension. Use it to debug frontend apps, collect data or automate your browser. Install the extension and type @ browser to get started. https://t.co/ijo19uK9NZ

Sawyer Merritt @SawyerMerritt · Feb 4

Ashok Elluswamy, VP of AI at @Tesla on self-driving: "It's so obvious you can solve this with cameras. Why wouldn't you solve with cameras? It's 2026. The self-driving problem is not a sensor problem, it's an AI problem. The cameras have enough information already. It's a problem of extracting the information, which is an AI problem." (via @aelluswamy's presentation at the 2026 ScaledML Conference on January 29th)

I ianteetzel @ianteetzel

Ashok Elluswamy, VP of AI at Tesla, discusses building end-to-end foundational models for self driving at the 2026 ScaledML Conference presented by Matroid. https://t.co/ARnrJ7kmmj

leo 🐾 @synthwavedd · Feb 4

anthropic source tells me they keep having to delay sonnet 5 because every time they go to deploy it it has a meltdown and tries to blow things up at anthropic hq using claude code

Melih @moztlab · Feb 4

@synthwavedd Did they try to turn it off and on again ?

Mistral AI @MistralAI · Feb 4

Introducing Voxtral Transcribe 2, next-gen speech-to-text models by @MistralAI. State-of-the-art transcription, speaker diarization, sub-200ms real-time latency. Details in 🧵 https://t.co/0IeiJOpiAZ

Marc-André Moreau @awakecoding · Feb 4

@normandeveloper @GergelyOrosz @pierceboggan 'auto' is now selected by default, but I'd rather have a way to change the default selected model organization-wide to something like Claude Sonnet 4.5. During initial adoption, it's critically important that developers succeed *before* they begin optimizing their quota usage

Thariq @trq212 · Feb 4

Slack in Cowork has saved me SO MUCH time I use it to make a first pass of every doc based on what I've said in Slack

L lydiahallie @lydiahallie

Claude Cowork now supports the Slack MCP on all paid plans! The Slack connector is by far my favorite feature. I use it every morning to catch up on what I missed, highlight important messages, and draft replies for me to review before sending. Huge time saver. https://t.co/nQsu9VLVAG

Nick Dobos @NickADobos · Feb 4

“when agents execute 100 steps instead of 10, your role becomes more important, not less.” Welcome to the age of leverage

R ryolu_ @ryolu_

software is still about thinking software has always been about taking ambiguous human needs and crystallizing them into precise, interlocking systems. the craft is in the breakdown: which abstractions to create, where boundaries should live, how pieces communicate. coding with ai today creates a new trap: the illusion of speed without structure. you can generate code fast, but without clear system architecture – the real boundaries, the actual invariants, the core abstractions – you end up with a pile that works until it doesn't. it's slop because there's no coherent mental model underneath. ai doesn't replace systems thinking – it amplifies the cost of not doing it. if you don't know what you want structurally, ai fills gaps with whatever pattern it's seen most. you get generic solutions to specific problems. coupled code where you needed clean boundaries. three different ways of doing the same thing because you never specified the one way. as Cursor handles longer tasks, the gap between "vaguely right direction" and "precisely understood system" compounds exponentially. when agents execute 100 steps instead of 10, your role becomes more important, not less. the skill shifts from "writing every line" to "holding the system in your head and communicating its essence": - define boundaries – what are the core abstractions? what should this component know? where does state live? - specify invariants – what must always be true? what are the constants and defaults that make the system work? - guide decomposition – how should this break down? what's the natural structure? what's stable vs likely to change? - maintain coherence – as ai generates more code, you ensure it fits the mental model, follows patterns, respects boundaries. this is what great architects and designers do: they don't write every line, but they hold the system design and guide toward coherence. agents are just very fast, very literal team members. the danger is skipping the thinking because ai makes it feel optional. people prompt their way into codebases they don't understand. can't debug because they never designed it. can't extend because there's no structure, just accumulated features. people who think deeply about systems can now move 100x faster. you spend time on the hard problem – understanding what you're building and why – and ai handles mechanical translation. you're not bogged down in syntax, so you stay in the architectural layer longer. the future isn't "ai replaces programmers" or "everyone can code now." it's "people who think clearly about systems build incredibly fast, and people who don't generate slop at scale." the skill becomes: holding complexity, breaking it down cleanly, communicating structure precisely. less syntax, more systems. less implementation, more architecture. less writing code, more designing coherence. humans are great at seeing patterns, understanding tradeoffs, making judgment calls about how things should fit together. ai can't save you from unclear thinking – it just makes unclear thinking run faster.

Khaliq Gant @Khaliqgant · Feb 4

Let Them Cook: Lessons from 6 Weeks of Multi-Agent Orchestration

I've been building Agent Relay (@agent_relay) using Agent Relay. Agents coordinating to build the tool that lets them coordinate. It's recursive and I...

Ejaaz @cryptopunk7213 · Feb 4

i fucking love this guy. best hire in tech history ashok is one of those geniuses that can explain why complicated shit works incredibly simply in this case AI in self-driving cars the best part is AI will result in more super nerds in exec positions because models will do all the work and THATS A GOOD THING you want your exec to know how the damn thing works AND sell the vision, if you think about it they’re the best person to do that anyway. brb buying a tesla

S SawyerMerritt @SawyerMerritt

Andrej Karpathy @karpathy · Feb 4

A lot of people quote tweeted this as 1 year anniversary of vibe coding. Some retrospective - I've had a Twitter account for 17 years now (omg) and I still can't predict my tweet engagement basically at all. This was a shower of thoughts throwaway tweet that I just fired off without thinking but somehow it minted a fitting name at the right moment for something that a lot of people were feeling at the same time, so here we are: vibe coding is now mentioned on my Wikipedia as a major memetic "contribution" and even its article is longer. lol The one thing I'd add is that at the time, LLM capability was low enough that you'd mostly use vibe coding for fun throwaway projects, demos and explorations. It was good fun and it almost worked. Today (1 year later), programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny. The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software. Many people have tried to come up with a better name for this to differentiate it from vibe coding, personally my current favorite "agentic engineering": - "agentic" because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight. - "engineering" to emphasize that there is an art & science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind. In 2026, we're likely to see continued improvements on both the model layer and the new agent layer. I feel excited about the product of the two and another year of progress.

K karpathy @karpathy

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Ahmad @TheAhmadOsman · Feb 4

just a gentle reminder that nobody should use ollama > slower than llama.cpp on windows > slower than mlx on mac > slop useless wrapper > literal code thieves alternatives? > lmstudio > llama.cpp > exllamav2/v3 > vllm > sglang like literally anythingʼs better than ollama lmao

M MangoSweet78 @MangoSweet78

Fucking killed them Lmao. https://t.co/FVFUA2BXor

TBPN @tbpn · Feb 4

Pragmatic Engineer's @GergelyOrosz is on a "secret email list" of agentic AI coders, and they're starting to report trouble sleeping because agent swarms are "like a vampire." "A lot of people who are in 'multiple agents mode,' they're napping during the day... It just really is draining." "This thing is like a vampire. It drains you out. You have trouble sleeping."

Josh Clemm @joshclemm · Feb 4

@tbpn @GergelyOrosz He's not wrong. With agents, you have this feeling you need to keep them busy and productive at all times, otherwise your "wasting time" or your monthly credits...

Thariq @trq212 · Feb 4

We've added a new command to Claude Code called /insights When you run it, Claude Code will read your message history from the past month. It'll summarize your projects, how you use Claude Code, and give suggestions on how to improve your workflow. https://t.co/xK7eN0qdB4

Roblox @Roblox · Feb 5

In our research lab, we are building “real-time dreaming” - the ability to generate fully playable video worlds prompted from any text or image. Our real-time, action conditioned world model (currently running internally at 16fps at 832x480p) is trained on a combination of data, including proprietary Roblox 3D avatar/world interaction data. World models are different from multiplayer engines in that they store state and memory in video latents. Roblox is multiplayer, and we are actively researching optimal ways to simultaneously store state for thousands of players, and keep them in sync with their environment. Our world model leverages database technology which stores all user interactions on Roblox in a vector format that can be used to re-render video and interaction from any camera angle. We see several immediate uses for our Roblox world model. We will use it side-by-side text, image and video prompts as a way to launch auto-generation of immersive worlds. In Roblox Studio, a creator could walk around and use prompts to “paint” a world and then convert it into a 3D representation or direct to Roblox native as a way for many people to play simultaneously. All of this comes alive as we explore the notion of a “Dream Theater” - where one user is dreaming, while others watch and prompt them. 2/4

Scott Hanselman 🌮 @shanselman · Feb 5

GitHub Copilot CLI *dual wielding* with Opus and Gemini at the same time https://t.co/4eJaLn7BYt

Tyler Leonhardt - @code @TylerLeonhardt · Feb 5

I work on the @claudeai integration 👋 Integrating the Claude Agent SDK right into @code has been really interesting. I’ve been using it almost exclusively to build the integration itself. Lots more still to hook up, but lmk what you think!

C code @code

You told us you’re running multiple AI agents and wanted a better UX. We listened and shipped it! Here’s what’s new in the latest @code release: 🗂️ Unified agent sessions workspace for local, background, and cloud agents 💻 Claude and Codex support for local and cloud agents 🔀 Parallel subagents 🌐 Integrated browser And more...

Daniel San @dani_avila7 · Feb 5

Absolutely loving this setup: Claude Code + Ghostty + Lazygit + Worktree I’m writing 3 threads on X showing how you can use it: 1- Ghostty setup and SAND keybindings 2- Monitoring Claude Code changes with Lazygit 3- Parallel agents with Git worktrees + Claude Code I’ll publish one per week, using exactly the same setup and workflow I use. If you’re interested, feel free to follow along so you catch them

Eric S. Raymond @esrtweet · Feb 5

Programming with AI assistance is very revealing. It turns out I'm not quite who I thought I was. There are a lot of programmers out there who have a tremendous amount of ego and identity invested in the craft of coding. In knowing how to beat useful and correct behavior out of one language and system environment, or better yet many. If you asked me a week ago, I might have said I was one of those people. But a curious thing has occurred. LLMs are so good now that I can validate and generate a tremendous amount of code while doing hardly any hand-coding at all. And it's dawning on me that I don't miss it. It's an interesting way to find out that I was always a system designer first, with code only as a means rather than an end. I...actually did not know this about myself, before now. Insert cliched quote here about every journey of discovery ending in a discovery of the self. That actually happened this time. I am somewhat bemused.

Peter Zakin @pzakin · Feb 5

Last year, it was obvious that coding agents had reached a new level of capability. So I peered up the abstraction ladder and realized, obviously, that the next rung on the ladder wasn't writing code, but writing specs. Now I look at the ladder and I realize with similar obviousness: the next rung is something that--in absence of better terms--you might call organizational design.

E emollick @emollick

Increasingly believe that the next model after centaurs/cyborgs looks like management of an organization. Decisions flowing up from multiple projects, most handled semi-autonomously, but with strategy, direction, feedback, approval made by the human. Not the final state, though.

🍓

🍓🍓🍓 @iruletheworldmo · Feb 5

agent swarms are here.

C claudeai @claudeai

On Claude Code, we’re introducing agent teams. Spin up multiple agents that coordinate autonomously and work in parallel—best for tasks that can be split up and tackled independently. Agent teams are in research preview: https://t.co/LdkPjzxFZg

Boris Cherny @bcherny · Feb 5

I've been using Opus 4.6 for a bit -- it is our best model yet. It is more agentic, more intelligent, runs for longer, and is more careful and exhaustive. For Claude Code users, you can also now more precisely tune how much the model thinks. Run /model and arrow left/right to tune effort (less = faster, more = longer thinking & better results). Happy coding!

C claudeai @claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta. https://t.co/L1iQyRgT9x

Evan Boyle @_Evan_Boyle · Feb 5

New in /experimental mode in Copilot CLI: "Fleets" 🛸 Run `/fleet` to dispatch parallel subagents to implement your plan. The secret sauce here is a sqlite database per session that the agent uses to model dependency aware tasks and TODOs.

_ _JeremyMoseley @_JeremyMoseley

/fleet now available in experimental. Sqlite for todo tracking, parallel agents to crush it. https://t.co/NdVHUlDbur

Aakash Gupta @aakashgupta · Feb 5

Agent swarms are amazing. I have been using them non-stop since the release 6 hours ago. They enable you to move so much faster. Here’s what most people won’t realize for another 6 months: this changes who can build software, and how fast. Claude Code hit $1B in run-rate revenue in six months. Faster than ChatGPT. Boris Cherny, the head of Claude Code, was already running 5 parallel Claude instances in terminal plus 5-10 on https://t.co/HhnFOTNFEz simultaneously. That was the manual version of what agent teams now automate natively. The old workflow was: prompt one agent, wait, review, prompt again. Sequential. The new workflow is: describe what you want, a lead agent decomposes it, spawns specialists for frontend, backend, testing, and docs, and they coordinate with each other while you do something else. That’s a 4-5x throughput increase per developer. And the compounding effects matter more than the raw speed. Each teammate gets its own fresh context window, which solves the token bloat problem that kills single-agent performance on large codebases. The real unlock: a PM or founder who couldn’t code before can now orchestrate a team of AI agents the same way an engineering manager orchestrates human engineers. Describe the architecture, delegate the work, review the output. We just went from “AI writes code for you” to “AI runs an engineering team for you.” And that’s a fundamentally different product category.

L lydiahallie @lydiahallie

Claude Code now supports agent teams (in research preview) Instead of a single agent working through a task sequentially, a lead agent can delegate to multiple teammates that work in parallel to research, debug, and build while coordinating with each other. Try it out today by enabling agent teams in your settings.json!

Greg Brockman @gdb · Feb 6

Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model. Every company faces the same opportunity now, and navigating it well — just like with cloud computing or the Internet — requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now: As a first step, by March 31st, we're aiming that: (1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal. (2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions. In order to get there, here's what we recommended to the team a few weeks ago: 1. Take the time to try out the tools. The tools do sell themselves — many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying. - Designate an "agents captain" for your team — the primary person responsible for thinking about how agents can be brought into the teams' workflow. - Share experiences or questions in a few designated internal channels - Take a day for a company-wide Codex hackathon 2. Create skills and AGENTS[.md]. - Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task. - Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository 3. Inventory and make accessible any internal tools. - Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server). 4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration. - Write tests which are quick to run, and create high-quality interfaces between components. 5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high - Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting. 6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently go around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use. Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items — for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.