AI Digest.

ClickUp Lays Off 22% to Build "100x" AI Teams as Enterprise Token Bills Force a Reckoning

ClickUp's CEO announced a 22% workforce reduction to restructure around AI-native engineering, offering million-dollar salary bands for engineers who orchestrate agents rather than write code. Meanwhile, Microsoft reportedly canceled internal Claude Code licenses over costs, and Anthropic's "let it cook" philosophy is reshaping how developers think about agent workflows. Local inference also hit new milestones with MTP and speculative decoding pushing speeds past 160 tokens per second on consumer hardware.

Daily Wrap-Up

If there was a unifying thread today, it was the growing gap between AI's promise and its price tag. ClickUp CEO Zeb Evans dropped what might be the most detailed public blueprint yet for restructuring a tech company around AI agents, laying off 22% of staff while introducing million-dollar salary bands for engineers who can orchestrate AI rather than just write code. It was equal parts inspiring and unsettling, especially when paired with Aaron Levie's thread about how enterprise AI costs are stratifying faster than anyone predicted. Microsoft reportedly canceled its internal Claude Code licenses because token-based billing made them untenable. Uber allegedly burned through its entire 2026 AI budget in four months. The subsidy era is over, and the bill is coming due.

On the technical side, the day was rich with practical advances for people actually building with AI. Anthropic engineers at Code with Claude London kept repeating "let it cook," a philosophy about writing routines that let Claude prompt itself rather than micromanaging every interaction. Swyx shared his experience running an agent for 16 hours across 103 commits to turn a vibe-coded MVP into a production-ready codebase. And Cursor's internal go-to skill for code quality review turns out to be brutally simple: delete complexity, block long files, reject messy PRs. The through-line is that the industry is moving past "can AI write code?" to "how do we make that code not terrible?"

The most practical takeaway for developers: start treating AI agent orchestration as a core skill, not a novelty. Whether it's writing declarative routines for Claude, building agent-firewalled npm workflows, or running 16-hour refactoring passes, the developers who thrive will be the ones who can direct AI systems with judgment and restraint rather than just pumping out more pull requests.

Quick Hits

  • @honchodotdev highlighted a month of Honcho-powered Hermes usage for AI agent memory, with NetworkChuck switching all his agents to the NousResearch model. Memory as a reasoning task is becoming a serious design consideration for agent builders.

Coding Agents, Quality, and the "Let It Cook" Philosophy

The most technically dense cluster of the day revolved around a shared realization: the real challenge with AI coding agents is not generation, it is governance. Four posts converged on this from different angles.

At Code with Claude London, Anthropic engineers repeatedly used the phrase "let it cook" to describe their philosophy for agent interactions. @Mnilax reported that Boris Cherny, Ravi Trivedi, and Katelyn Lesse all independently returned to the same framing: stop micromanaging prompts, write routines, and let Claude prompt itself. As @Mnilax put it, "routines are higher-order prompts. The runtime is shipped. The prompts are the bottleneck." He tested 30 routines and found that only 9 survived, all sharing three specific properties he detailed in a linked article. The implication is clear: prompt engineering as we have known it is giving way to routine engineering, where the skill is designing the scaffold around the model rather than dictating every token.

Meanwhile, @swyx shared a compelling proof of concept: an agent skill that takes a "vibecoded slop app" and transforms it into a production-ready, end-to-end tested, maintainable codebase. The agent ran for approximately 16 hours and made 103 commits, producing what he described as "exactly the same app but instead of fragile MVP it now looks like a codebase I can actually build on for the long run." This is the missing middle of AI-assisted development, the transition from prototype to product that most teams struggle with.

On the quality enforcement front, @ericzakariasson revealed that the most used internal skill at Cursor right now is called "thermo-nuclear-code-quality-review," which deletes complexity instead of moving it, blocks files over 1,000 lines, flags thin wrappers and leaked logic, and rejects PRs that technically work but make the codebase messier. This is a significant signal: the company building one of the most popular AI coding tools has concluded that its own internal priority should be aggressive curation of AI-generated output, not more generation.

And @0xblacklight offered an architectural insight that ties these threads together. He argued that the best way to configure SaaS for agents is not through MCP servers or CLIs or browser automation, but through declarative code that expresses intent and lets the provider figure out the wiring. After using Codex to build a Pulumi provider around a vendor API in a single afternoon, he concluded that "headless SaaS for agents" means letting agents write version-controlled, type-safe infrastructure-as-code rather than clicking through dashboards or negotiating with MCP servers.

Enterprise AI Costs and the Future of Work

Three posts formed a stark narrative about what happens when AI capability meets organizational reality.

The flagship post of the day came from @DJ_CURFEW, ClickUp CEO Zeb Evans, who announced a 22% headcount reduction alongside a sweeping vision for what he calls the "100x organization." His central argument is that AI does not make everyone more productive. It makes the best engineers wildly more productive, and everyone else using AI slows those engineers down. The bottleneck in AI-driven engineering is orchestration, telling AI what to do, and review, evaluating what AI did. Evans wrote that "the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They are not writing code. They are directing agents that write code. The skill is judgment." He described a three-tier structure of Builders, Agent Managers, and Front-Liners, with million-dollar salary bands available to anyone who achieves 100x impact. He was also candid about the wrong strategy: "Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated. I call this the great reckoning of AI coding."

@levie provided the macroeconomic context that makes ClickUp's move look prescient rather than reactionary. He observed that AI has shifted from cheap chat tools with small context windows to expensive agents with massive context and inference costs that are "an order of magnitude more" than before. The stratification of pricing is widening, not converging, and enterprises will need new finance teams and technology solutions just to manage the bill. His quoted source, @HedgieMarkets, reported that Microsoft canceled internal Claude Code licenses after token-based billing became untenable, and Uber's CTO warned the company burned through its entire 2026 AI budget in four months.

And in a poignant counterpoint from outside the tech bubble, @toddsaunders shared that he received 81 DMs in a single day from trade business owners building their own software with AI. A septic system installer, pool service techs, a garage door installer, a sign fabricator. His "Blue Collar Builders" series spotlights people like Cory LaChance, who built an agentic application that reads isometric drawings and extracts weld counts, material specs, and commodity codes for industrial contractors, all with zero prior coding experience. The contrast with ClickUp's narrative is striking: for blue collar workers, AI is genuinely democratizing creation, while for the tech industry, it is consolidating power among an elite tier of orchestrators.

Developer Education and Supply Chain Security

Three posts addressed the growing gap between what developers need to know and what they actually know.

@TheAhmadOsman shared what he called "the most complete guide for understanding LLMs from first principles," covering everything from tokenizers and attention mechanisms to local inference, VRAM math, quantization, and failure modes. His claim that any CS person can go from zero to deeply knowledgeable in LLMs in roughly two years is both encouraging and a reminder of how steep the learning curve remains for practitioners transitioning from traditional software engineering.

@slash1sol highlighted that Harvard released a 65-minute masterclass on Git and GitHub specifically because "vibe-coders still don't know how to commit." The framing was blunt but pointed: "Your AI can write the code. That wasn't the problem. The problem is you don't know how to merge it without breaking the repo." When tier-1 tech companies are filtering candidates on merge conflict resolution, the gap between AI-assisted code generation and fundamental version control literacy has become a real career liability.

On the security front, @giuseppegurgone shared a simple setup for Socket Firewall that routes every npm install through a security proxy with just a global install and two shell aliases. With AI agents installing packages automatically, supply chain protection at the CLI level is becoming essential infrastructure, not optional hardening.

Local Inference Hits New Speed Records

Two posts showcased just how far local inference has come on consumer hardware.

@fahdmirza reported that llama.cpp now has a built-in model router that replaces the need for Ollama plus Open WebUI for model switching. One server, one config file, instant model switching without restarts, zero duplicate storage, and full per-model control through a simple INI file. For anyone running local models, this removes a significant layer of friction.

@0xSero shared performance numbers that would have been unthinkable a year ago: 164 tokens per second on Gemma-4-31B and over 100 tokens per second on DeepSeek-4-Flash, achieved through Multi-Token Prediction combined with Speculative Decoding. He noted these optimizations are "nearly free VRAM wise" and encouraged anyone who abandoned LM Studio over slow inference to try again with these features enabled. The underlying data from @atomic_chat_hq showed MTP speeding up Qwen3.6 27B by 137% on dual RTX 5090s, with roughly 80% draft acceptance and zero accuracy loss. Local inference is no longer the slow cousin of cloud APIs. For many workloads, it is now genuinely competitive.

Sources

F
Fahd Mirza @fahdmirza ·
🦙 llama.cpp now has a BUILT-IN model router ♠ and it completely replaces Ollama + Open WebUI for model switching 🔹 One server, one config file, any model on disk 🔹 Switch models instantly without restarting anything 🔹 Zero duplicate model storage across backends 🔹 Full per-model control via a simple INI file 🔹 Native llama.cpp performance, no abstraction layer 🔥 Watch the full video below 👇 https://t.co/3QrlbZkPdp
G
Giuseppe @giuseppegurgone ·
This is the easiest thing to setup and every npm install will go through Socket's firewall. npm i -g sfw alias npm="sfw npm" alias npx="sfw npx"
F feross @feross

This is exactly why @SocketSecurity built Socket Firewall back in 2023. It's 100% free, and it will block malware from making it onto your device. To get protection for VSCode extensions and more, you need to run Socket Firewall as a proxy. Get in touch with our sales team, who can help! https://t.co/e6hS2YgYbt

E
eric zakariasson @ericzakariasson ·
the most used skill internally at cursor right now /thermo-nuclear-code-quality-review - deletes complexity instead of moving it - blocks files over 1k lines - flags thin wrappers and leaked logic - rejects PRs that work but make code messier https://t.co/WwJvGI0j4v
Z
Zeb Evans @DJ_CURFEW ·
Today we reduced headcount by 22%. The business is the strongest it's ever been. So I think it's important to be direct about what I'm seeing and why. First, I made this decision and I own it. I did it because the way to operate at the highest level of productivity is changing, and to win the future, ClickUp needs to change with it. Second, this wasn't about cutting costs. Most savings from this change will flow directly back into the people who stay. We'll be introducing million-dollar salary bands. If you create outsized impact using AI, you'll be paid outside of traditional bands. Most importantly, I have the deepest gratitude for those affected. We're doing this from a position of strength specifically so we can take care of people properly. Everyone affected receives a package aimed at honoring their contributions and easing the transition. I only see two options: wait for this to play out gradually in the market or be honest about what I'm seeing and act proactively. THE 100X ORGANIZATION The primary change is that we're restructuring around what I call 100x org. The goal is 100x output. The roles required to build at the highest level are fundamentally different than they were a year ago. Incremental improvements to existing systems won't get us there. We need new ones. That means creating enough disruption to rebuild rather than iterate on what's already broken. The common narrative is that AI makes everyone more productive. It doesn't. Many of the workflows of today, if left unchanged, create bottlenecks in AI systems. These roles will evolve. But waiting for that to happen naturally means falling behind now. The 100x org is actually heavily dependent on people - infinitely more than today. This is only possible with 10x people that have embraced and adopted new ways of working. THE BUILDERS, AGENT MANAGERS, AND FRONT-LINERS — THE BUILDERS: 10X ENGINEERS I don't think most companies have internalized what's actually happening with AI in engineering. The common narrative is that AI makes all engineers more productive. That may be true in isolation, but at an organization level - that is the farthest thing from reality. Here's what we've validated recently at ClickUp: the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They're not writing code. They're directing agents that write code. The skill is judgment. AI makes the best engineers wildly more productive, and everyone else using AI slows these engineers down. Think about it - the bottlenecks are (1) orchestration - telling AI what to do, and (2) reviewing - what AI did. Everything is leapfrogged and no longer needed. So who do you want orchestrating and reviewing code? And how do you want your best engineers to spend their time? If your best engineers are spending time reviewing other people's code, then this is inherently an inefficient bottleneck. These engineers can review their agent's code much faster than reviewing human code. The new world is about enabling your 10x engineers to become 100x. The wrong strategy is to push every engineer to use infinite tokens. Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated. I call this the great reckoning of AI coding, and every company will face this soon if not already. More code is just another bottleneck to the best engineers, and ultimately to your company's impact as well. — THE BUILDERS: 10X PRODUCT MANAGERS Product management and design roles are merging. Designers that have customer focus, become more like product managers. And product managers that have intuition for UX become more like designers. The bottleneck of user research is gone. It takes us just one mention of an agent to kickoff research and analyze results. The bottleneck of product <> design iteration is also gone. The product builder iterates on their own, along with agents and skills that ensure alignment with quality and strategy. Also controversial today - I believe that the wrong strategy is to have your PMs shipping code - that just introduces another bottleneck that the best engineers will waste their time on. To be clear, PMs should be coding but they should do this in a playground to iterate, validate, and scope. That code should not go to production. Everything outside of managing systems, orchestrating AI, and reviewing output becomes a bottleneck. That's why the other roles that are critical along with these are the systems managers (to reduce bottlenecks) along with a bottleneck you can't replace - customer meeting time. — THE SYSTEM MANAGERS Ironically, the people that automate their jobs with AI will always have a job. They become owners of the AI systems - agent managers. We have many examples of these people at ClickUp. The underlying systems in which we operate are absolutely critical to get right. I think most companies are delusional to think they can iterate on existing systems and compete in this new world. You must create enough disruption so that old systems are deprecated entirely. If there's any definition for 'AI native' that's what it is. — THE FRONT-LINERS In a world that will become saturated with AI communication, the human touch will matter more than anything to customers. This is a bottleneck that you shouldn't replace - even when agents are high enough quality to do video meetings. One-on-one meeting time with customers is something that shouldn't be automated. The systems around the meetings should be - so that front-liners spend nearly 100% of their time with customers. REWARDING 100X IMPACT In a world where companies are able to do so much more with less, where does that excess money go? In our case, much of the savings in this new operating model will flow directly back to those that enabled it. We must reward people that create productivity accordingly. This aligns incentives on both sides. Plus, in a world where your best people create 100x impact, you can't afford to lose them. You should aim to retain these employees for decades. The context they have and their ability to efficiently orchestrate and review will be nearly impossible to replace. Compensation bands of today should be thrown out the door. We're introducing $1 million cash/year salary bands with a path available to nearly everyone in the company if they produce 100x impact by creating or managing AI systems. THE FUTURE Nearly every company will make changes like these. The ones that do it proactively will define what comes next. The future is not fewer people. It's different work, new roles, and better rewards for those who embrace it. We're already seeing entirely new roles emerge, like Agent Managers, that didn't exist a year ago. ClickUp is positioning to lead this shift, not just internally, but for our customers too. I've never been more certain about where we're headed.
S
slash1s @slash1sol ·
HARVARD RELEASED A 65-MIN MASTERCLASS ON GIT & GITHUB BECAUSE VIBE-CODERS STILL DON'T KNOW HOW TO COMMIT 1 hour and 5 minutes of raw, no-nonsense version control architecture from the creators of CS50. -> The moment you watch it, you realize why most modern developers are breaking their production branches. Every tier-1 tech company is now filtering candidates who can't handle basic merge conflicts. Git isn't a "nice-to-know" anymore -> it's compliance. Your AI can write the code. That wasn't the problem. The problem is you don't know how to merge it without breaking the repo. Don’t forget to bookmark it.
R ridark_eth @ridark_eth

50 GitHub Repos Save Thousands AI Cashflow 2026

M
Mnimiy @Mnilax ·
"let it cook" is the line Anthropic engineers just repeated all day at Code with Claude London. Boris Cherny said it in the keynote. Ravi Trivedi said it in the next talk. Katelyn Lesse said it at the panel. it means: stop micromanaging the prompts. write the routine. let Claude prompt itself. his framing: > routines are higher-order prompts. > the runtime is shipped. > the prompts are the bottleneck. what they didn't say on stage: most routines die without 3 specific properties. i tested 30. 9 made it. the other 21 violated one of the three. the 9 that survived are in the article. all verbatim, just copy-paste. worth more than $300 of "prompt engineering consulting" before you build anything.
M Mnilax @Mnilax

9 Claude Cowork prompt-templates that run my 8-hour workday in 47 minutes of active supervision.

S
swyx🛬 SFO @swyx ·
working on a "take this vibecoded slop app and make it a production-ready, e2e tested, maintainable, parallelizable agent repo" skill. this thing ran for ~16 hours yesterday and made 103 commits all told and i ended up with exactly the same app but instead of fragile mvp it now looks like a codebase i can actually build on for th elong run
H
Honcho @honchodotdev ·
Importantly, a month of 🫡Honcho-powered Hermes usage Perfect for supplementing the local markdown system Memory is a reasoning task... and Chuck's just beginning to see the benefits 🚀 https://t.co/k9RexMjJmR
N NetworkChuck @NetworkChuck

I'm switching to Hermes.... I've been using it for a month.....and I'm sold...moving all of my @openclaw agents to Hermes (@NousResearch) Why? -----&gt; https://t.co/SqrSi0hAEf Thank you to @Hostinger for sponsoring this video! https://t.co/71ZRFPxRNt

T
Todd Saunders @toddsaunders ·
I'm giddy. 81 DMs today from trade business owners building their own software with AI. A septic system installer, 3 pool service techs, a garage door installer, a sign and awning fabricator and so many others. This is the moment for Blue Collar Builders. IT'S TIME TO LET THE BUILDERS BUILD.
T toddsaunders @toddsaunders

Welcome to Blue Collar Builders! Cory LaChance inspired me to start a series spotlighting folks in the trade who are building software using AI. Cory normally works with chemical plants and refineries, but now he's building AI software for his company.... with no pervious experience writing code. He built a full agentic application that industrial contractors are using every day. It reads isometric drawings and automatically extracts every weld count, every material spec, every commodity code. My favorite thing he said was, "I did this with zero outside help other than the AI. My favorite tools are screenshots, step by step instructions, and asking Claude to explain things like I'm five." I hope you enjoy this episode as much as I did. And I can't wait to meet more Blue Collar Builders.

K
Kyle Mistele 🏴‍☠️ @0xblacklight ·
this is basically code-mode for agent-driven SaaS configuration btw don't make the agent use a CLI or MCP server or agent browser - just let it write code that declares intent and let the provider work out how to wire it together
0 0xblacklight @0xblacklight

I recently chose one vendor over a second because the first one had a more robust API and in an afternoon codex has built a pulumi provider around their API for me so that all our configs in their SaaS are managed with declarative code that's version-controlled, type-safe, and explicit for agents (instead of needing their CLI/MCP server) and plugs into our other IaC so we don't need to go do things in dashboards and then configure it in our IaC this is what headless SaaS for agents means btw not "ship an MCP server" let me (or codex) configure it with code code mode for SaaS if you will - IaC for SaaS configuration

A
Ahmad @TheAhmadOsman ·
Any CS person can go from zero to deeply knowledgeable in LLMs and AI in ~2 years, top to bottom And all you need to get started is in this article btw
T TheAhmadOsman @TheAhmadOsman

INCREDIBLE The MOST COMPLETE GUIDE for understanding LLMs from first principles is now available online to read for free Covers the model mechanics - Tokens / tokenizers - Transformers - Attention - KV cache - Prefill vs decode - Decoding controls - Model packages - Chat templates - Long context - RAG - Agents / tools - Fine-tuning - Multimodal models Then connects that to running models locally - What "local" really means - Open-weight vs opensource - Quantization - VRAM math - Hardware tiers - File formats / load safety - Runtimes / serving modes - Model selection - Privacy - Failure modes - Benchmarks - Practical setup paths You should read this, and if you cannot now then you most definitely wanna bookmark it for later Opensource AI FTW

A
Aaron Levie @levie ·
What’s happened is that we went from AI chat tools that were relatively cheap and had small context windows, to AI agents that have giant context windows, the ability to keep track of longer running work, and models that cost an order of magnitude more on inference because they’re that much better. This has compounded far faster than most realized (unless you were paying close attention at the middle or end of last year, which many here were), and the dollars flowing in now are much more real. What follows is a continued march of AI capability that will continue to be used by anyone with a frontier use-case (like coding, sciences, finance, consulting) and then a peeling off of tasks to lower cost models that are capable enough for the job. Whereas we thought the cost of AI might converge on a single low price per token before, it’s clear the stratification is only widening based on the task you need performed. This will be yet another component that has to be figured out for broad AI diffusion. Enterprises will need to put in programs, new finance teams, and technology solutions to manage this all. The labs and platforms that can ensure customers can price optimize for the task at hand will be in the best position.
H HedgieMarkets @HedgieMarkets

🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products. My Take The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested. This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months. Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown. Hedgie🤗

0
0xSero @0xSero ·
MTP + Speculative Decoding are nearly free VRAM wise and can help speed up local inference to some crazy numbers. If slow inference turned you from lmstudio, take time to try with these optimisations enabled. I'm getting 164 tok/s on Gemma-4-31B & 100+ tok/s on DS4-Flash
A atomic_chat_hq @atomic_chat_hq

MTP speedup Qwen by 2.5x in Atomic Chat Dense vs MoE models on 2x RTX 5090 Qwen3.6 27B: 51 → 117 tps +137% Qwen3.6 35B-A3B: 218 → 267 tps +25% MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on memory moved per pass. Dense 27B reads all 27B params per token, MoE 35B-A3B only reads 3B active. Dense had way more to save by batching. The baseline tps also differ (218 vs 51) for the same reason from the other side. Token generation is memory-bandwidth bound, and MoE moves ~8x less memory per token, so its baseline is already 4x ahead. ~80% draft acceptance. Zero accuracy loss. ~1 GB extra VRAM. Open-source code and local AI app – in the comments 👇