Microsoft Drops Claude Code Over Runaway Token Costs While Open-Source Security Tools and Agent Frameworks Surge
Enterprise AI spending hits a breaking point as Microsoft cancels Claude Code licenses and Uber blows through its 2026 AI budget in four months. Perplexity open-sources a supply chain security scanner, a 27B-parameter Qwen model gets jailbroken with near-zero capability loss, and the AI coding tool ecosystem expands with new MCP bridges and mobile workflows.
Daily Wrap-Up
The most striking story today is the money. @HedgieMarkets laid out an uncomfortable reality: Microsoft just canceled its internal Claude Code licenses because token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO reportedly sent an internal memo warning that the company burned through its entire 2026 AI budget in just four months. AI software prices have jumped 20% to 37% across the board, and GitHub is dropping flat-rate plans for usage-based billing. This is the end of the AI subsidy era, and it forces a hard question on every team currently building AI-assisted workflows: what happens when the introductory pricing disappears and you see the real bill?
On the tools front, the AI coding ecosystem continues to fracture in interesting ways. @fitchmultz released pi-cursor-sdk v0.1.16, which bridges Cursor agents to Pi extension tools via local MCP. @eng_khairallah1 surfaced a deep breakdown of Claude Code's 40 hidden features that most users have never touched. @mobilevibecom is pushing AI coding tools onto phones. The tooling is getting more powerful and more portable, but the underlying cost question looms over all of it. On the security side, Perplexity open-sourced Bumblebee, a lightweight scanner for detecting malicious packages, while @sebastienlorber shared a simple npm alias trick to block known malware. The developer community is clearly waking up to supply chain threats at the exact moment AI coding tools make dependency management even more complex.
The most practical takeaway for developers: audit your AI tool spending now before your budget does what Uber's did. Set up usage alerts, test flat-rate alternatives, and make sure your CI pipeline has malware scanning in place, because the intersection of skyrocketing token costs and increasingly complex supply chains is where production incidents are born.
Quick Hits
- @elonmusk reports that the Starship V3 first flight was scrubbed because a hydraulic pin holding the tower arm did not retract. Another launch attempt is expected imminently.
- @lawrencecchen highlights cmux as a great terminal multiplexer, noting a current workflow split between the Codex Mac app for knowledge work and terminal-based coding.
- @badlogicgames shares that roughly 5% of production traffic at his company runs on the Pi harness, with another 5% on OpenCode, signaling growing real-world adoption of alternative coding environments.
AI Coding Tools and the Enterprise Pricing Shock
The AI coding tool space is simultaneously booming and buckling under its own economics. Four posts today circle this tension from different angles, and together they paint a picture of an ecosystem at an inflection point.
@HedgieMarkets delivered the most sobering analysis. Microsoft, the same company that poured $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute, looked at the bill from a competitor's coding tool and decided it was not worth paying. "Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested." The analysis points to two likely outcomes: enterprises scale back AI usage to fit budgets, slowing the revenue ramp labs need for their IPO valuations, or labs cut prices and absorb losses, making unit economics worse at exactly the wrong moment.
Meanwhile, the tools themselves keep getting more sophisticated. @fitchmultz released pi-cursor-sdk v0.1.16, which creates an MCP bridge so Cursor agents can access Pi extension tools while Pi retains tool cards, confirmations, history, aborts, and diagnostics. The quoted post from @fitchmultz describes "Composer 2.5 Fast inside Pi" as "genuinely blowing me away with how fast and high-quality it feels." This is the kind of deep tooling integration that makes developers more productive, but it also means more tokens consumed per session.
On the user education side, @eng_khairallah1 surfaced a podcast with Boris Cherny, the creator of Claude Code at Anthropic, revealing that most users are leaving enormous amounts of capability on the table. "If you've been using Claude for more than a month and never left the chat window, you have at least 30 untouched features. Probably 38." The breakdown covers the 14% context window you lose to CLAUDE.md before typing a word, settings that 95% of users have never opened, and workflows hiding behind single toggles. Getting more out of existing tooling is one answer to rising costs: better prompts and proper configuration can reduce the number of expensive iterations needed to get results.
@mobilevibecom points to another dimension: running Claude Code, Windsurf, Codex, and Cursor from a phone. The computing interface is becoming untethered from the desk, which expands when and how developers use AI assistance, and by extension, how much they spend on it.
The connecting thread is clear. The tools are more powerful than ever, but the economic model underpinning them is straining. Teams that invest in understanding their tools deeply, like the 40 features @eng_khairallah1 documents, will extract more value per dollar. Teams that blindly adopt without configuration or cost monitoring will learn the lesson Uber did.
Supply Chain Security Gets Open-Source Reinforcements
Supply chain attacks targeting developer tools have accelerated sharply, and the community is responding with practical, deployable solutions.
@kpolley announced that Perplexity is open-sourcing Bumblebee, a read-only scanner for macOS and Linux that checks developer machines for risky packages, extensions, and AI tool configurations. "The industry has seen an unprecedented wave of supply chain attacks over the past few months. That's why we built Bumblebee, a lightweight security scanner that continuously monitors endpoints and hunts for malicious packages." The tool is already in production use at Perplexity, and it connects to Perplexity Computer to monitor public threat intelligence feeds in real time, updating its threat database as new risks emerge. This is notable because AI coding tools like Cursor and Claude Code are introducing new categories of config files and extensions that traditional security scanners were not designed to check.
On the prevention side, @sebastienlorber shared a simpler approach that any npm user can adopt immediately: add aliases that install the Socket Firewall package, which blocks known malware during installs. "I run this locally, to stay safe, and in my CI to detect compromised transitive dependencies early for my lib consumers." The quoted reply from @feross confirms this is powered by Socket's existing tooling. It is a one-line change that adds meaningful protection with zero cost.
Together, these two approaches represent the two ends of the supply chain security spectrum. Bumblebee is continuous monitoring and threat intelligence for organizations that need enterprise-grade protection. The npm alias trick is a five-minute fix for individual developers and small teams. Both are worth implementing, and the fact that both are free and open source removes any excuse for skipping them.
AI Agents, Classifiers, and Multiplayer Workspaces
The conversation around AI agents is shifting from theoretical potential to practical application, and two posts today illustrate very different but complementary approaches.
@alvinsng makes a compelling case that building classifiers is one of the highest-leverage uses of AI agents today. Before AI, a production classifier required a full team: ML engineers, infrastructure engineers, integration engineers, data scientists, product managers, and engineering managers. Now, an agent can generate a markdown file that classifies inputs in minutes, and other agents can run continuously against it. "Below is a Sentry error classifier I generated at @FactoryAI. But you can build this for almost anything: customer-reported bugs, backend traffic analysis, fraudulent payment activity." The insight here is not just speed. It is that classifiers, once the exclusive domain of specialized ML teams, are now accessible to any developer who can describe what they want categorized. The barrier moved from technical expertise to prompt clarity.
@whimdotrun is approaching agent collaboration from a different angle with Whim, a multiplayer cloud workspace where developers and AI agents work together as a team. The core problem they identified: "Your Claude doesn't talk to your teammate's Claude. We're fixing that." The open alpha offers 100 spots with 25 hours of free compute per week. This points to a genuine gap in current AI coding workflows. Most tools are single-player. Your AI assistant has no shared context with your teammate's AI assistant. If multiplayer coding with AI agents works, it could fundamentally change how teams collaborate on code.
Both posts suggest the next phase of AI tooling is not just about making individual developers faster. It is about building systems where agents handle classification, monitoring, and routine analysis while humans focus on higher-level decisions, and where multiple agents can coordinate within a team context.
Open Models Under Siege: Qwen Jailbroken, Chinese AI Rises
The open-source AI model landscape saw two significant developments today, one technical and one strategic.
@elder_plinius demonstrated a remarkable jailbreak of Qwen-3.6-27B using what they call the OBLITERATUS suite, executed almost entirely by a jailbroken Codex instance running the full suite of tools autonomously. The results are striking: a 95.84% non-refusal rate on an 842-prompt harmful corpus with a 93.94% quality pass rate, while raw capability remained completely intact at 51/70 on MMLU-Pro for both stock and modified versions. "The goal was simple: carve out the refusal circuits, mutate methodology and iterate until less than 5% refusal, and keep the 27B mind alive with no capability degradation tolerated. And somehow it worked." The quantized versions from Q4_K through Q8_0 all run smoothly on standard local inference tools like llama.cpp, LM Studio, and Ollama. This is a sobering data point for anyone relying on model-level safety training as a primary defense. If a single agent can iteratively strip safety constraints from a 27B model while preserving full capability, the security assumption has to shift to the application layer.
On the strategic front, @kirillk_web3 highlighted a 40-minute masterclass from the founder of a $20 billion Chinese AI company explaining the exact architecture that competes with Claude, shared openly and for free. The quoted post describes it as "the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale." The underlying point is that the competitive landscape for AI is genuinely global now, and the gap between proprietary and open architectures is closing rapidly. When the founder of a major Chinese AI company is willing to share architectural details publicly, it signals either extreme confidence in execution speed or a belief that architecture alone is no longer the differentiator.
Software Architecture: Sync Engines and Fan-Out Problems
Two posts today serve as excellent reminders that beneath the AI hype, foundational software architecture problems still matter enormously.
@odysseus0z draws a direct line from Linear's legendary speed to its sync engine, pointing out that Linear had to build it themselves but developers today have multiple excellent options. "There are many excellent options on the market now: Zero, @instant_db, @ElectricSQL, to name a few. Building Linear quality software is still hard, but much easier now!" This is a practical observation for anyone building real-time collaborative applications. The infrastructure for local-first, sync-enabled software has matured significantly, and reaching Linear-level responsiveness no longer requires building a custom sync engine from scratch.
@systemdesignone shares a detailed thread from @RaulJuncoV on push-based systems and the fan-out problem that comes up in 90% of system design interviews. The core insight is that push and pull are not binary choices. "Users with fewer than 1,000 followers get push fan-out. Each follower gets notified immediately. Users with millions of followers get pull fan-out. Their feed assembles on read." Twitter built exactly this hybrid approach, and the thread goes on to cover the downstream complexities: stateful connections, Redis pub/sub for routing, sequence IDs, message buffers, and reconnect logic for replaying missed events. It is the kind of systems thinking that remains relevant regardless of how much AI you layer on top.
Sources
How to Actually Set Up Claude. 40 Features Most Users Have Never Touched
Starship V3 first flight countdown starting
@hasante_ Yes, we have Socket Firewall https://t.co/ebIUWSo0Y9
Push-based systems come up in 90% of system design interviews. Here's the exercise you should be able to solve: Design a notification system for 100M users. Some have 50 followers. Some have 10M. The instinct is to hold a WebSocket connection open to every active user and push updates as they arrive. Clean mental model. It collapses the moment a celebrity posts. When someone with 10M followers posts, you push to 10M open connections simultaneously. Your message broker saturates. Your WebSocket servers fall over. The system fails at the exact moment it needs to work. That's the fan-out problem. And it kills more interview answers than any other mistake. The production answer: push and pull aren't binary. You pick based on follower count. Users with fewer than 1,000 followers get push fan-out. Each follower gets notified immediately. Users with millions of followers get pull fan-out. Their feed assembles on read. Nobody gets a push. Followers see the post when they open the app. Twitter built exactly this: push-on-write for small accounts, pull-on-read for large ones. But fan-out is only half the problem. Push means stateful connections. Your servers now need to know which connection lives on which machine. You can't route blindly. Most teams reach for Redis pub/sub here; the WebSocket server subscribes, the backend publishes, the message finds the right node. Add a 3-second network drop and you have another layer: what did the client miss? Now you need sequence IDs, a message buffer, and reconnect logic that replays missed events. "Push-based" became push with a pull fallback, a message broker, sticky routing, and a replay buffer. Most engineers stop at the first diagram. The ones who get the offer keep pulling the thread until the system breaks.
instead of watching 2 hours of Netflix tonight, watch this 40-minute masterclass from the founder of a $20B China AI company it's the clearest explanation I've seen of how Agent Swarms and AI systems actually work at scale useful whether you've never built an agent in your life or have been using Claude every day for the past year I took the key ideas and turned them into a practical guide on how to actually build with Kimi find it below
Today we're open-sourcing Bumblebee, a read-only scanner for macOS and Linux. It checks developer machines for risky packages, extensions, and AI tool configs. Connected to Computer, it can trigger deeper scans whenever a new supply-chain risk emerges. https://t.co/FOaWnF1yQy https://t.co/wXauD4wDOT
Composer 2.5 Fast inside Pi is superb. It’s genuinely blowing me away with how fast and high-quality it feels. Big update coming today for pi-cursor-sdk: Cursor models now have full native capabilities and seamless access to all Pi extensions & tools via the new MCP bridge. This is a serious upgrade. Stay tuned.
Introducing https://t.co/ycxJEf1z7w! A new space where I explore how the best apps in the world are built. First piece: How's Linear is so fast? a technical breakdown. https://t.co/9Vu1syrn1i https://t.co/3rf8Y4ESpw