ClickUp Lays Off 22% to Build "100x" AI Teams as Enterprise Token Bills Force a Reckoning
ClickUp's CEO announced a 22% workforce reduction to restructure around AI-native engineering, offering million-dollar salary bands for engineers who orchestrate agents rather than write code. Meanwhile, Microsoft reportedly canceled internal Claude Code licenses over costs, and Anthropic's "let it cook" philosophy is reshaping how developers think about agent workflows. Local inference also hit new milestones with MTP and speculative decoding pushing speeds past 160 tokens per second on consumer hardware.
Daily Wrap-Up
If there was a unifying thread today, it was the growing gap between AI's promise and its price tag. ClickUp CEO Zeb Evans dropped what might be the most detailed public blueprint yet for restructuring a tech company around AI agents, laying off 22% of staff while introducing million-dollar salary bands for engineers who can orchestrate AI rather than just write code. It was equal parts inspiring and unsettling, especially when paired with Aaron Levie's thread about how enterprise AI costs are stratifying faster than anyone predicted. Microsoft reportedly canceled its internal Claude Code licenses because token-based billing made them untenable. Uber allegedly burned through its entire 2026 AI budget in four months. The subsidy era is over, and the bill is coming due.
On the technical side, the day was rich with practical advances for people actually building with AI. Anthropic engineers at Code with Claude London kept repeating "let it cook," a philosophy about writing routines that let Claude prompt itself rather than micromanaging every interaction. Swyx shared his experience running an agent for 16 hours across 103 commits to turn a vibe-coded MVP into a production-ready codebase. And Cursor's internal go-to skill for code quality review turns out to be brutally simple: delete complexity, block long files, reject messy PRs. The through-line is that the industry is moving past "can AI write code?" to "how do we make that code not terrible?"
The most practical takeaway for developers: start treating AI agent orchestration as a core skill, not a novelty. Whether it's writing declarative routines for Claude, building agent-firewalled npm workflows, or running 16-hour refactoring passes, the developers who thrive will be the ones who can direct AI systems with judgment and restraint rather than just pumping out more pull requests.
Quick Hits
- @honchodotdev highlighted a month of Honcho-powered Hermes usage for AI agent memory, with NetworkChuck moving all of his OpenClaw agents to Hermes from Nous Research. Memory as a reasoning task is becoming a serious design consideration for agent builders.
Coding Agents, Quality, and the "Let It Cook" Philosophy
The most technically dense cluster of the day revolved around a shared realization: the real challenge with AI coding agents is not generation, it is governance. Four posts converged on this from different angles.
At Code with Claude London, Anthropic engineers repeatedly used the phrase "let it cook" to describe their philosophy for agent interactions. @Mnilax reported that Boris Cherny, Ravi Trivedi, and Katelyn Lesse all independently returned to the same framing: stop micromanaging prompts, write routines, and let Claude prompt itself. As @Mnilax put it, "routines are higher-order prompts. The runtime is shipped. The prompts are the bottleneck." He tested 30 routines and found that only 9 survived, all sharing three specific properties he detailed in a linked article. The implication is clear: prompt engineering as we have known it is giving way to routine engineering, where the skill is designing the scaffold around the model rather than dictating every token.
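To make "higher-order prompt" concrete, here is a minimal Python sketch, assuming a hypothetical `model` callable that maps a prompt string to a completion string. The routine does not script each step. It fixes only the goal and a stopping rule, asks the model to write its own next instruction, and stops when the model replies `DONE` or a step budget runs out. The `run_routine` name, the `DONE` convention, and the step limit are illustrative assumptions, not the three properties @Mnilax identified and not any Anthropic API.

```python
from typing import Callable, List

def run_routine(model: Callable[[str], str], goal: str, max_steps: int = 10) -> List[str]:
    """Let the model prompt itself toward a goal; return the instructions it wrote."""
    # The scaffold: a goal plus a stopping rule. Everything else comes from the model.
    prompt = f"Goal: {goal}\nWrite the single next instruction for yourself, or DONE if finished."
    history: List[str] = []
    for _ in range(max_steps):
        reply = model(prompt).strip()
        if reply.startswith("DONE"):
            break
        history.append(reply)
        # The model's own output becomes the next prompt, with the goal re-attached.
        prompt = (
            f"Goal: {goal}\nPrevious instruction: {reply}\n"
            "Write the next instruction, or DONE if finished."
        )
    return history

# Stub model for demonstration only: emits two instructions, then DONE.
script = iter(["List failing tests", "Fix the first failing test", "DONE"])
steps = run_routine(lambda p: next(script), goal="Make the test suite pass")
print(steps)  # ['List failing tests', 'Fix the first failing test']
```

In a real setup the stub would be replaced by an actual model call; the point is that the human-written part is the routine, not each individual prompt.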
Meanwhile, @swyx shared a compelling proof of concept: an agent skill that takes a "vibecoded slop app" and transforms it into a production-ready, end-to-end tested, maintainable codebase. The agent ran for approximately 16 hours and made 103 commits, producing what he described as "exactly the same app but instead of fragile MVP it now looks like a codebase I can actually build on for the long run." This is the missing middle of AI-assisted development, the transition from prototype to product that most teams struggle with.
On the quality enforcement front, @ericzakariasson revealed that the most used internal skill at Cursor right now is called "thermo-nuclear-code-quality-review," which deletes complexity instead of moving it, blocks files over 1,000 lines, flags thin wrappers and leaked logic, and rejects PRs that technically work but make the codebase messier. This is a significant signal: the company building one of the most popular AI coding tools has concluded that its own internal priority should be aggressive curation of AI-generated output, not more generation.
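One of those rules, blocking files over 1,000 lines, is mechanical enough to enforce outside the model entirely. Below is a minimal Python sketch of such a gate, assuming it receives a list of changed file paths (for example, from a CI step). The `find_oversized_files` and `main` names are hypothetical; this is not Cursor's actual skill, which also applies judgment-based checks that a line counter cannot.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

MAX_LINES = 1000  # threshold taken from the rule described above

def count_lines(path: Path) -> int:
    # Count lines by iterating the file; errors="replace" tolerates non-UTF-8 bytes.
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)

def find_oversized_files(paths: Iterable[str], max_lines: int = MAX_LINES) -> List[Tuple[str, int]]:
    """Return (path, line_count) for every existing file longer than max_lines."""
    offenders: List[Tuple[str, int]] = []
    for p in paths:
        path = Path(p)
        if path.is_file():  # deleted or missing files are skipped
            n = count_lines(path)
            if n > max_lines:
                offenders.append((p, n))
    return offenders

def main(argv: List[str]) -> int:
    """Print each blocked file and return a CI-style exit code (1 if any file is too long)."""
    bad = find_oversized_files(argv)
    for p, n in bad:
        print(f"BLOCKED: {p} has {n} lines (limit {MAX_LINES})")
    return 1 if bad else 0
```

Wired into CI, `sys.exit(main(sys.argv[1:]))` turns the result into a failing check, which keeps the rule out of the reviewer's (human or agent) attention budget.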
And @0xblacklight offered an architectural insight that ties these threads together. He argued that the best way to make SaaS configurable by agents is not to ship an MCP server or a CLI, but to offer an API robust enough to be wrapped in declarative code that expresses intent and lets the provider figure out the wiring. After using Codex to build a Pulumi provider around a vendor's API in a single afternoon, he concluded that "headless SaaS for agents" means letting agents write version-controlled, type-safe infrastructure-as-code instead of clicking through dashboards and then mirroring those settings by hand in IaC.
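The core of declarative configuration is that you state the desired end state and the tooling computes the changes. The Python sketch below shows that diff step in miniature, assuming both the desired config (written in code) and the vendor's current state can be represented as dictionaries keyed by resource name. The `plan` function and the resource names are hypothetical; real tools such as Pulumi layer dependency ordering, state tracking, and provider-specific create/update/delete logic on top of this idea.

```python
from typing import Dict, List, Tuple

Config = Dict[str, Dict[str, object]]  # resource name -> its settings

def plan(desired: Config, current: Config) -> List[Tuple[str, str]]:
    """Compute (action, resource) pairs that would move current toward desired."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))  # drift: exists in the SaaS but not in code
    return actions

# Hypothetical SaaS settings, declared in version-controlled code...
desired = {
    "webhook:deploys": {"url": "https://example.com/hook", "events": ["deploy"]},
    "team:platform": {"members": 5},
}
# ...versus what the vendor's API currently reports.
current = {
    "team:platform": {"members": 4},
    "webhook:legacy": {"url": "https://example.com/old", "events": ["push"]},
}
print(plan(desired, current))
# → [('update', 'team:platform'), ('create', 'webhook:deploys'), ('delete', 'webhook:legacy')]
```

Because the agent only edits `desired`, its change is a reviewable diff in version control, and the plan shows exactly what will happen in the vendor's system before anything is applied.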
Enterprise AI Costs and the Future of Work
Three posts formed a stark narrative about what happens when AI capability meets organizational reality.
The flagship post of the day came from @DJ_CURFEW, ClickUp CEO Zeb Evans, who announced a 22% headcount reduction alongside a sweeping vision for what he calls the "100x organization." His central argument is that AI does not make everyone more productive. It makes the best engineers wildly more productive, and everyone else using AI slows those engineers down. In his view, the bottleneck in AI-driven engineering is now orchestration (telling AI what to do) and review (evaluating what AI did). Evans wrote that "the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They are not writing code. They are directing agents that write code. The skill is judgment." He described a three-tier structure of Builders, Agent Managers, and Front-Liners, with million-dollar salary bands available to anyone who achieves 100x impact. He was also blunt about what he sees as the wrong strategy: "Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated. I call this the great reckoning of AI coding."
@levie provided the macroeconomic context that makes ClickUp's move look prescient rather than reactionary. He observed that AI has shifted from cheap chat tools with small context windows to expensive agents with massive context and inference costs that are "an order of magnitude more" than before. The stratification of pricing is widening, not converging, and enterprises will need new finance teams and technology solutions just to manage the bill. His quoted source, @HedgieMarkets, reported that Microsoft canceled internal Claude Code licenses after token-based billing became untenable, and that Uber's CTO warned the company had burned through its entire 2026 AI budget in four months.
And in a poignant counterpoint from outside the tech bubble, @toddsaunders shared that he received 81 DMs in a single day from trade business owners building their own software with AI, among them a septic system installer, pool service techs, a garage door installer, and a sign fabricator. His "Blue Collar Builders" series spotlights people like Cory LaChance, who built an agentic application that reads isometric drawings and extracts weld counts, material specs, and commodity codes for industrial contractors, all with zero prior coding experience. The contrast with ClickUp's narrative is striking: for blue collar workers, AI is genuinely democratizing creation, while for the tech industry, it is consolidating power among an elite tier of orchestrators.
Developer Education and Supply Chain Security
Three posts addressed the growing gap between what developers need to know and what they actually know.
@TheAhmadOsman shared what he called "the most complete guide for understanding LLMs from first principles," covering everything from tokenizers and attention mechanisms to local inference, VRAM math, quantization, and failure modes. His claim that any CS person can go from zero to deeply knowledgeable in LLMs in roughly two years is both encouraging and a reminder of how steep the learning curve remains for practitioners transitioning from traditional software engineering.
@slash1sol highlighted that Harvard released a 65-minute masterclass on Git and GitHub specifically because "vibe-coders still don't know how to commit." The framing was blunt but pointed: "Your AI can write the code. That wasn't the problem. The problem is you don't know how to merge it without breaking the repo." When tier-1 tech companies are filtering candidates on merge conflict resolution, the gap between AI-assisted code generation and fundamental version control literacy has become a real career liability.
On the security front, @giuseppegurgone shared a simple setup for Socket Firewall that routes every npm install through a security proxy with just a global install and two shell aliases. With AI agents installing packages automatically, supply chain protection at the CLI level is becoming essential infrastructure, not optional hardening.
Local Inference Hits New Speed Records
Two posts showcased just how far local inference has come on consumer hardware.
@fahdmirza reported that llama.cpp now has a built-in model router that replaces the need for Ollama plus Open WebUI for model switching. One server, one config file, instant model switching without restarts, zero duplicate storage, and full per-model control through a simple INI file. For anyone running local models, this removes a significant layer of friction.
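The appeal of a single INI file is that each section can describe one model and its settings, and the server picks a section by the model name in the incoming request. The Python sketch below illustrates that routing idea with the standard-library `configparser`. The section layout and keys (`path`, `ctx_size`) are hypothetical and are not llama.cpp's actual preset format, so the llama.cpp server documentation remains the reference for the real options.

```python
import configparser
from typing import Dict

# Hypothetical per-model config: shared defaults, then one section per model.
PRESETS = """
[DEFAULT]
ctx_size = 8192

[gemma-small]
path = /models/gemma-small.gguf

[qwen-coder]
path = /models/qwen-coder.gguf
ctx_size = 32768
"""

def load_presets(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser

def route(parser: configparser.ConfigParser, model: str) -> Dict[str, str]:
    """Return settings for the requested model; unset keys fall back to [DEFAULT]."""
    if not parser.has_section(model):
        raise KeyError(f"unknown model: {model}")
    return dict(parser[model])

presets = load_presets(PRESETS)
print(route(presets, "qwen-coder"))  # {'path': '/models/qwen-coder.gguf', 'ctx_size': '32768'}
```

The `[DEFAULT]` section is what makes "per-model control" cheap: each model section only lists what differs, and one file describes every model the server can switch to.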
@0xSero shared performance numbers that would have been unthinkable a year ago: 164 tokens per second on Gemma-4-31B and over 100 tokens per second on DeepSeek-4-Flash, achieved through Multi-Token Prediction (MTP) combined with speculative decoding. He noted these optimizations are "nearly free VRAM wise" and encouraged anyone who abandoned LM Studio over slow inference to try again with these features enabled. The underlying data from @atomic_chat_hq showed MTP speeding up Qwen3.6 27B by 137% on dual RTX 5090s, with roughly 80% draft acceptance and zero accuracy loss. Local inference is no longer the slow cousin of cloud APIs. For many workloads, it is now genuinely competitive.
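Why does roughly 80% acceptance matter so much? Under the simplifying assumption that each drafted token is accepted independently with probability α, the expected number of tokens emitted per full-model verification pass with a draft of k tokens is (1 - α^(k+1)) / (1 - α), a standard result from the speculative decoding literature. The Python sketch below evaluates that formula. The draft lengths are illustrative assumptions, since the posts do not say how many tokens the MTP heads draft, and the result is an idealized ceiling: drafting and verification are not free, so measured speedups land below it.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass with k drafted tokens.

    Assumes each draft token is accepted independently with probability alpha.
    The full model always contributes one token (a correction or a bonus token),
    so the result is the geometric sum 1 + alpha + ... + alpha**k.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (1 - alpha ** (k + 1)) / (1 - alpha)

for k in (1, 2, 3):
    print(k, round(expected_tokens_per_pass(0.8, k), 2))
```

The same reasoning explains the dense-versus-MoE gap in the data: generation is memory-bandwidth bound, so a model that reads all of its weights per token (dense 27B) gains more from amortizing one weight read over several accepted tokens than one that reads only ~3B active parameters per token.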
Sources
This is exactly why @SocketSecurity built Socket Firewall back in 2023. It's 100% free, and it will block malware from making it onto your device. To get protection for VSCode extensions and more, you need to run Socket Firewall as a proxy. Get in touch with our sales team, who can help! https://t.co/e6hS2YgYbt
50 GitHub Repos Save Thousands AI Cashflow 2026
9 Claude Cowork prompt-templates that run my 8-hour workday in 47 minutes of active supervision.
I'm switching to Hermes.... I've been using it for a month.....and I'm sold...moving all of my @openclaw agents to Hermes (@NousResearch) Why? -----> https://t.co/SqrSi0hAEf Thank you to @Hostinger for sponsoring this video! https://t.co/71ZRFPxRNt
Welcome to Blue Collar Builders! Cory LaChance inspired me to start a series spotlighting folks in the trade who are building software using AI. Cory normally works with chemical plants and refineries, but now he's building AI software for his company.... with no previous experience writing code. He built a full agentic application that industrial contractors are using every day. It reads isometric drawings and automatically extracts every weld count, every material spec, every commodity code. My favorite thing he said was, "I did this with zero outside help other than the AI. My favorite tools are screenshots, step by step instructions, and asking Claude to explain things like I'm five." I hope you enjoy this episode as much as I did. And I can't wait to meet more Blue Collar Builders.
I recently chose one vendor over a second because the first one had a more robust API and in an afternoon codex has built a pulumi provider around their API for me so that all our configs in their SaaS are managed with declarative code that's version-controlled, type-safe, and explicit for agents (instead of needing their CLI/MCP server) and plugs into our other IaC so we don't need to go do things in dashboards and then configure it in our IaC this is what headless SaaS for agents means btw not "ship an MCP server" let me (or codex) configure it with code code mode for SaaS if you will - IaC for SaaS configuration
INCREDIBLE The MOST COMPLETE GUIDE for understanding LLMs from first principles is now available online to read for free Covers the model mechanics - Tokens / tokenizers - Transformers - Attention - KV cache - Prefill vs decode - Decoding controls - Model packages - Chat templates - Long context - RAG - Agents / tools - Fine-tuning - Multimodal models Then connects that to running models locally - What "local" really means - Open-weight vs opensource - Quantization - VRAM math - Hardware tiers - File formats / load safety - Runtimes / serving modes - Model selection - Privacy - Failure modes - Benchmarks - Practical setup paths You should read this, and if you cannot now then you most definitely wanna bookmark it for later Opensource AI FTW
🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products. My Take The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested. This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months. Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown. Hedgie🤗
MTP speedup Qwen by 2.5x in Atomic Chat Dense vs MoE models on 2x RTX 5090 Qwen3.6 27B: 51 → 117 tps +137% Qwen3.6 35B-A3B: 218 → 267 tps +25% MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on memory moved per pass. Dense 27B reads all 27B params per token, MoE 35B-A3B only reads 3B active. Dense had way more to save by batching. The baseline tps also differ (218 vs 51) for the same reason from the other side. Token generation is memory-bandwidth bound, and MoE moves ~8x less memory per token, so its baseline is already 4x ahead. ~80% draft acceptance. Zero accuracy loss. ~1 GB extra VRAM. Open-source code and local AI app – in the comments 👇