AI Digest.

LiteLLM Supply Chain Attack Exposes 97M Downloads as Google's TurboQuant Promises 6x Memory Compression

A major supply chain attack on LiteLLM's PyPI package dominated the conversation, with the compromise only caught because the attacker's malware was buggy enough to crash machines. Meanwhile, Google's TurboQuant algorithm for KV cache compression drew widespread excitement, and Cloudflare's Dynamic Workers launched as a new primitive for AI agent sandboxing.

Daily Wrap-Up

The biggest story today isn't a product launch or a new model. It's a security catastrophe that was only averted because an attacker got sloppy. LiteLLM, a Python package with 97 million monthly downloads that serves as a universal proxy for AI API keys, was compromised through a cascading supply chain attack. The poisoned version ran malware on install, no import required, harvesting SSH keys, cloud credentials, Kubernetes secrets, and anything else it could find. The attacker group TeamPCP first compromised Trivy, a security scanning tool, then used stolen credentials to work their way through GitHub Actions, Docker Hub, npm, and Open VSX. The only reason this didn't become a multi-week silent exfiltration across thousands of production environments is that the malware was so poorly written it consumed all available RAM and crashed a developer's machine. That developer happened to be running a Cursor MCP plugin that pulled in LiteLLM as a transitive dependency they didn't even know they had. It's the kind of story that makes you reconsider every pip install you've ever run.

On the more optimistic side of the ledger, Google published TurboQuant, a KV cache compression algorithm that achieves 6x memory reduction with zero accuracy loss across every benchmark tested. This is the kind of infrastructure-level improvement that quietly changes what's economically viable. If the same GPU can handle six times more concurrent conversations, the cost curve for AI inference shifts dramatically. Multiple developers were already testing it on local models by end of day, and an MLX implementation showed promising results on Qwen3.5. Cloudflare also made a play for the agent infrastructure layer with Dynamic Workers, using V8 isolates instead of containers to give AI-generated code a sandbox that boots in milliseconds instead of hundreds of milliseconds. Between the security wake-up call and two meaningful infrastructure advances, today felt like the ecosystem maturing in real time, sometimes painfully.

The most entertaining moment was easily the revelation that the supply chain attack was caught by a RAM spike. Somewhere, a security engineer is writing a postmortem that essentially says "we were saved by bad vibes and worse code." The most practical takeaway for developers: audit your transitive dependencies today. Run pip show or npm ls on your AI projects and actually look at what's installed. If you're using LiteLLM, verify you're on a clean version. And if you're building anything that touches credentials, consider whether you actually need that dependency or whether you can, as Karpathy suggested, just "yoink" the functionality you need with an LLM.

Quick Hits

  • @staysaasy shared @rodriscoll's framework for categorizing AI layoffs into five buckets: excuse cover, growth decline, capex reallocation, genuine productivity gains, and workforce skill mismatch. A useful lens for cutting through corporate spin.
  • @jpschroeder pointed to the .agent TLD initiative, with Brave among the early registrants. Community-owned domain management instead of single-company control.
  • @MatthewBerman flagged self-improving AI developments that he thinks aren't getting enough attention.
  • @elonmusk posted an Optimus robot video. No additional context provided, as is tradition.
  • @bl888m shared a greentext-style narrative about going from faking AI knowledge to actually understanding it via Stanford CS221. Relatable content for the "transformers and embeddings" era.
  • @EOEboh recommended following @ashoKumar89 for backend engineering content.
  • @cgtwts shared a reaction video about Claude users discovering Karpathy's autoresearch method.
  • @mikeyobrienv highlighted Anthropic's writing on harness design for long-running agent apps, noting overlap with ralph-orchestrator's architecture.

Supply Chain Security: The LiteLLM Compromise

Today's most consequential story is a security incident that reads like a thriller. A threat group called TeamPCP executed a cascading supply chain attack that started with Trivy, a security scanning tool, and ended with a poisoned version of LiteLLM on PyPI. The attack was sophisticated in strategy but crude in execution, and that crudeness is the only thing that saved thousands of organizations.

@karpathy laid out the technical severity: "Simple pip install litellm was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords." The malware ran automatically on package installation through a .pth file that Python executes on startup. No explicit import needed.

@aakashgupta traced the full attack chain, noting that "the attacker vibe coded it... the malware was so sloppy it crashed computers... used so much RAM a developer noticed their machine dying and investigated. They found LiteLLM had been pulled in through a Cursor MCP plugin they didn't even know they had." The contagion spread beyond direct installs: any package depending on LiteLLM, including DSPy, would pull in the compromised version. @mathdroid added that OpenClaw also uses LiteLLM, widening the blast radius further.

The timing is notable. On the same day this attack surfaced, @mslipper announced the open-source release of iron-sensor, an eBPF-based behavioral monitor specifically designed to watch what AI coding agents do on your system. "Agents act like you: they read SSH keys, write cron jobs, modify systemd units, and escalate privileges. Most of the time this helps you code. Some of the time it installs a backdoor. By default, you can't tell the difference." They stress-tested it with an OpenClaw instance that installed 223 random skills with zero human review, generating 16,000+ events. The tool records everything at the kernel level so you can actually audit what happened. Given today's events, the timing couldn't be more relevant. Karpathy's conclusion resonates: dependencies need to be re-evaluated, and LLMs might be the tool that lets us "yoink" functionality instead of trusting an ever-deeper tree of packages we've never audited.

Google's TurboQuant: 6x KV Cache Compression

The KV cache is the single largest memory bottleneck in LLM inference. Every conversation you have with a chatbot maintains this cache so the model doesn't re-process the entire context from scratch. On a 70B parameter model with a long conversation, that cache alone can consume 40GB of GPU memory. Google's TurboQuant algorithm compresses it by 6x, down to 3 bits per value, with what they claim is zero accuracy loss. No retraining, no fine-tuning, drop-in replacement.

@AnishA_Moonka broke down the economics: "AI inference now makes up 55% of all AI compute spending. Hyperscalers are pouring nearly $700 billion into AI infrastructure in 2026. The KV cache is the single biggest memory bottleneck in that stack." The algorithm works by randomly rotating data to simplify its structure, applying compression, then adding a 1-bit error correction step. On H100 GPUs it delivers up to 8x speedup over uncompressed computation, and it scored perfectly on needle-in-a-haystack benchmarks across Llama, Gemma, and Mistral models.

The community response was immediate and practical. @badlogicgames retweeted an MLX implementation already showing results on Qwen3.5-35B, while @0xSero announced plans to test it on Qwen3.5 with the hope of running much larger models locally. @AlexFinn captured the enthusiasm, arguing this means "that 16gb Mac Mini now can run INCREDIBLE AI models. Completely locally, free, and secure." That's probably overstating things, but the direction is real. When inference memory drops 6x, models that required enterprise hardware move within reach of consumer devices. Being presented at ICLR 2026, TurboQuant also outperforms existing methods for vector search, which has implications well beyond chatbots and into the core of how search engines work.

Cloudflare Dynamic Workers and Agent Infrastructure

Cloudflare launched Dynamic Workers this week, and the AI agent community took notice. The core pitch is simple: AI agents generate code that needs to run somewhere safe, containers take 100-500ms to boot and use hundreds of megabytes of RAM, and V8 isolates solve both problems by booting in 1-5ms with a few megabytes of memory.

@TipsCsharp highlighted the key technical differentiator: "TypeScript API definitions replace OpenAPI specs. Fewer tokens, cleaner code, type-safe RPC across the sandbox boundary." The "Code Mode" approach, where an LLM writes TypeScript that runs in an isolate and calls typed APIs, reportedly uses 81% fewer tokens than sequential tool calls. At $0.002 per Worker loaded per day (free during beta), the pricing targets high-volume agent deployments. @dinasaur_404 from Cloudflare demoed the developer experience, emphasizing simplicity.

The broader agent infrastructure conversation extended beyond Cloudflare. @gakonst described their internal agent setup: a Postgres coordinator, Docker containers with piped stdin/stdout, 150+ API integrations, and a firewall that injects secrets on the fly rather than storing them in containers. This architecture, where the agent's sandbox is deliberately isolated from credentials, feels prescient given the LiteLLM attack. @zachlloydtweets shared a build-vs-buy analysis for deploying coding agents at scale, reflecting what appears to be a growing consensus among engineering leaders that cloud coding agents should be first-line contributors this year.

AI-Powered Development: From Games to Trading Bots

The range of what people are building with AI coding tools keeps expanding. @heygeorgekal showed off an ARPG being developed entirely with AI: models generated by Meshy, every line of code written by Claude Code, running in Godot with MCP integration. "Now working on souls-like combat: 3-hit combos, dodge rolls, stamina, hitstop." It's a concrete demonstration of AI tools reaching into game development, a domain that historically required deep specialized knowledge.

@hanakoxbt shared a more provocative example: an AI-built market-making bot running on Polymarket that executed 393 trades overnight for $904 in spread revenue. "I didn't write a single line of code. Fed it one article about how quant bots extract money on Polymarket. It built the whole thing." The bot includes a VPIN flow toxicity detector as a kill switch and automatic inventory rebalancing. Whether the numbers hold up long-term is another question, but the barrier to entry for algorithmic trading strategies has clearly collapsed.

@doodlestein offered a practical "life hack" for agent-assisted coding: point your coding agent at the entire project, then ask it to find every hardcoded constant that should be dynamic, every TODO comment, and every "will" or "would" in comments indicating unfinished work. "I was truly horrified by how much stuff this turned up." It's a reminder that AI coding tools are powerful not just for generating new code but for auditing the shortcuts we've already taken. @threepointone's essay on "dynamic workers" and the implications of every user having a coding buddy rounds out the theme: we're moving toward a world where the boundary between using software and programming software is dissolving.

Sources

S
sunil pai @threepointone ·
If you’re new here - I wrote something about why dynamic workers are a big deal, lmk what you think
T threepointone @threepointone

code mode: let the code do the talking (aka, after w/i/m/p) wherein I ponder the implications of every user having a little coding buddy, and every "app" being directly programmable on demand. https://t.co/i62HuWsjKg lmk what you think. https://t.co/86fiigZedY

D
dina kozlov 🐀 @dinasaur_404 ·
Dynamic Workers are the future!! 🔥 "they are literally so easy to use" — me, in this video, showing you how to use them https://t.co/vGfwWshl4j
C Cloudflare @Cloudflare

We’re introducing Dynamic Workers, which allow you to execute AI-generated code in secure, lightweight isolates. This approach is 100 times faster than traditional containers. https://t.co/c36Vkb7I0R

A
Arvind @TipsCsharp ·
Cloudflare just dropped Dynamic Workers and it's a massive deal for AI agents. The problem: AI agents generate code. That code needs a sandbox. Containers take 100-500ms to boot and 100-500MB RAM. Dynamic Workers use V8 isolates instead: - Startup: 1-5ms (100x faster) - Memory: few MB (100x less) - No warm pools needed - Unlimited concurrency - Runs on same thread as host The killer feature: TypeScript API definitions replace OpenAPI specs. Fewer tokens, cleaner code, type-safe RPC across the sandbox boundary via Cap'n Web RPC. Code Mode: LLM writes TS code → runs in isolate → calls typed APIs → only final result returns to context. 81% fewer tokens vs sequential tool calls. $0.002 per Worker loaded/day. Free during beta. This is the serverless sandbox containers should have been.
B
bl888m @bl888m ·
> watching classmates get AI jobs > they're throwing around "transformers" and "embeddings" > I nod like I understand > I don't understand shit > been faking it for 4 months, got nowhere > found Stanford CS221: AI explained from foundations > "this will be too technical" > read it anyway > holy shit it clicked > built real project in 3 weeks > next interview: destroyed their questions > realization: most were faking it too > I actually understand now bookmark before next technical interview you'll fail
P polydao @polydao

The Complete Stanford CS221: Everything AI Actually Is, Explained (Part II)

M
mevdroid.eth | 🦇🔊 @mathdroid ·
openclaw uses litellm btw, enjoy the rest of the day
H hnykda @hnykda

LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM pypi release 1.82.8. It has been compromised, it contains litellm_init.pth with base64 encoded instructions to send all the credentials it can find to remote server + self-replicate. link below

M
Matthew Slipper @mslipper ·
Today we’re open-sourcing iron-sensor, an eBPF-based behavioral monitor for AI coding agents. Agents act like you: they read SSH keys, write cron jobs, modify systemd units, and escalate privileges. Most of the time this helps you code. Some of the time it installs a backdoor. By default, you can’t tell the difference. iron-sensor runs in the kernel and records everything your agent does, so you can. We stress tested it with YoloClaw, an OpenClaw instance that installs random skills from ClawHub with zero human review. It installed 223 skills and generated 16,000+ events. We found no malware - and for once, we can actually prove it. You can’t secure what you can’t see. GitHub: https://t.co/8qZkAWXepR Blog post: https://t.co/zWV5N6QFYh
Z
Zach Lloyd @zachlloydtweets ·
Build vs buy: how to deploy coding agents at scale
J
Justin Schroeder @jpschroeder ·
Go claim a .agent domain! https://t.co/S3ykqjz3BB @agentcommunity_ Also, really great initiative to co-own the .agent TLD with a whole community.
B brave @brave

Brave just registered a .agent domain! We support the effort to have the .agent top-level domain managed by a community, instead of being owned by one company. Join the community and pre-register your domain here: https://t.co/3Cnyae5EHm

S
staysaasy @staysaasy ·
There's a reason that this guy has made a gazillion $ as an investor – very good takes on the impact of AI on layoffs.
R rodriscoll @rodriscoll

AI has become the justification for every layoff. It's the perfect excuse card, but there is a lot of spin involved. Every layoff is some combo of the following five very different AI stories. 1. Nothing changed, we just realized we have too many people. We are going to blame AI, but we are bullshitting. This is the AI as an excuse; it was really sloppy hiring, and we are just blaming AI. (See Block) 2. Growth has gone away so now we have too many people. This may be because of AI if you are a SaaS company. All the customer love is now going to AI. But it's less AI as a productivity lift, and more about you just building a less ambitious growth company. (See Salesforce and most every SaaS company) 3. We spent our money on capex to build AI so now we can’t afford as many people. Management may say it’s about AI making us productive (4 below) but my gut is a lot of it is about Nvidia getting our money so now there is none for you. (See Meta and Oracle) 4 We are really using AI the way god intended us to. We don't need as many people. This is the ONLY version of the story that is actually about a productivity increase. It's real, it's happening, but I wonder if it is even the majority of the layoffs. (See some software engineering departments right now) @jasonlk raised a fifth reason that doesn't get talked about enough: we just have the wrong people. Maybe we don't need 20 engineers who all know C++, but rather eight who have strong AI skills. This I think should be happening everywhere. Every time a layoff announcement comes out, I try and mentally categorize per the above.

H
Hanako @hanakoxbt ·
I fell asleep with the running terminal. 5:31 AM it woke me up 393 trades. not one placed by me the P&L curve hasn't dipped once. +$904 in pure spread revenue. I didn't write a single line of code. fed it one article about how quant bots extract money on Polymarket. it built the whole thing the logic: > post bid at $0.48 > post ask at $0.52 > you collected $0.04 who wins doesn't matter this is what Citadel and Virtu run on NYSE every day except they have 400 engineers. this is one screen what's on the terminal right now: - active contract: FED_HOLD 393 trades. $169 captured. 96.5% win rate 201 bid fills. 192 ask fills - near perfect symmetry avg spread: $0.033. latency: 2ms volume: $102K. 1,291 contracts. 105 fills/sec the trade tape is a wall of green: +$4.94 spread captured +$2.77 spread captured +$5.23 spread captured +$11.75 spread captured every few seconds. new line. new capture but the real edge is the kill switch > VPIN flow toxicity: 0.24 > status: SAFE - TRADING > kill switch: STANDBY VPIN detects when informed money is hunting your orders. spikes above 0.40 - bot goes dark. widens spread. reduces size. waits ignore this metric and one bad evening erases months of profit inventory hit +4 - system already rebalancing REBAL -2... REBAL -6... REBAL +2 nobody told it to. the formula decided P&L curve bottom right tells the whole story - it doesn't spike, doesn't dip. just climbs. +$905 and the angle hasn't flattened copy the bot: https://t.co/PTZuvewZE6 the funniest part? both sides of every trade think they're winning the bot doesn't think at all and it's the only one in profit.
H hanakoxbt @hanakoxbt

https://t.co/CNyhlvUUNd

M
Matthew Berman @MatthewBerman ·
Self-improving AI is here and nobody's talking about it... https://t.co/ut9lY4dsbo
G
George Kal @heygeorgekal ·
The first room of my 100% AI-developed ARPG is taking shape. → every model generated by meshy → every line of code written by claude code → Godot engine + MCP Now working on souls-like combat: 3-hit combos, dodge rolls, stamina, hitstop. It's a lot of fun. https://t.co/h2BgD18IDY
G
Georgios Konstantopoulos @gakonst ·
our internal agent is: 1. a postgres that coordinates everything & persists msgs 2. an API that spawns docker containers, piping container stdin to amp's stdin and streaming out the stdout as SSE 3. 150+ API keys / tool integrations that the agent can call (standard rest api) 4. a firewall that ensures the containers have no secrets and all secrets get injected on the fly when they leave the container + other sensitive stuff filtered on the way in / out + audit logs (victorialogs) + alerts 5. a slackbot interface works great
0 0xblacklight @0xblacklight

lots of folks running expensive sandboxes but really all you need is a filesystem but really you don't even need a filesystem, you just need a filesystem API that frontends something like a database (often you care a lot about ACID compliance or indexability etc; @jeffreyhuber talks about this) you can do this with FUSE and lots of people are building really cool things this way but really you don't even need a filesystem API, because agents don't see the POSIX APIs they just see tokens in and tokens so really all you need is something that looks like a file system but frontends whatever you want - S3, Postgres, Chroma, durable streams, whatever coding agents are great "everything agents" but the problem is that you use an off-the-shelf harness it marries you to the filesystem so you get stuck with FUSE or NFS hacks OR you have to inject a bunch of extra MCP tools that are parallel to the 'real' fs tools but if you own the harness, you can own the control flow this lets you separate the tool INTERFACE from the tool EXECUTION. your tool can look like a normal FS read tool to the agent, but you can use whatever backend you want for the execution logic this unlocks lots of exciting things but it requires you build your own harness or use something more customizable

D
Dmitriy Kovalenko @neogoose_btw ·
The largest manipulation in the benchmarking history uncovered
M
Mikey O'Brien @mikeyobrienv ·
Interesting read from Anthropic on harness design for long-running apps. A lot of parts they described: loops, handoffs, evaluator separation, and runtime control are exactly the layer ralph-orchestrator provides https://t.co/mrKP2GbOuO https://t.co/1GyrLD36Qp
0
0xSero @0xSero ·
Testing this tomorrow, will report back if it works on Qwen3.5 Might be able to run much larger models if this works. https://t.co/g694uLppGz
G GoogleResearch @GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc https://t.co/9SJeMqCMlN

A
Anish Moonka @AnishA_Moonka ·
Every time you message an AI chatbot, the model stores your entire conversation in temporary memory called a KV cache (a cheat sheet so it doesn’t re-read everything from scratch). On a large model like Llama 70B running a long conversation, that cache alone eats 40GB of GPU space, often more than the AI model itself. That’s half a $30,000 GPU chip consumed by one user’s memory. Google just published TurboQuant, a compression algorithm that shrinks this cache by 6x, down to just 3 bits per value, with zero accuracy loss across every benchmark tested. No retraining. No fine-tuning. Drop-in replacement. AI inference (running models for actual users, not training them) now makes up 55% of all AI compute spending. Hyperscalers are pouring nearly $700 billion into AI infrastructure in 2026. The KV cache is the single biggest memory bottleneck in that stack. When GPU cache memory fills up, the system can’t take more users. 6x compression means the same hardware handles roughly 6x more simultaneous conversations, or 6x longer context windows, or some mix of both. At cloud rates of $2-3/hour per H100 GPU, that’s the difference between profitable and unprofitable AI deployment. TurboQuant randomly rotates data to simplify its structure, applies a compressor, then adds a 1-bit error correction step to catch errors before they compound. On H100 GPUs it delivers up to 8x speedup over uncompressed computation. Google tested it across five long-context benchmarks on Llama, Gemma, and Mistral models. Perfect scores on needle-in-a-haystack (finding one specific fact buried in massive text). Being presented at ICLR 2026. It also outperforms existing methods for vector search, the technology that powers how search engines find similar results across billions of entries. Google runs billions of these searches daily. Three bits. Zero loss. 6x compression on the biggest memory bottleneck in a $700 billion infrastructure buildout.
G GoogleResearch @GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc https://t.co/9SJeMqCMlN

J
Jeffrey Emanuel @doodlestein ·
Agent Coding Life Hack: This is like the coding agent equivalent of shining a blacklight on your clean-looking black shirt and seeing just how filthy it really is: ❯ First read ALL of the AGENTS.md file and README.md file super carefully and understand ALL of both! Then use your code investigation agent mode to fully understand the code and technical architecture and purpose of the project. THEN: I need you to carefully and completely look across the ENTIRE project for *anything* that is a hard-coded constant in the code which really should be dynamic in order to be correct and robust. Also look carefully for any "TODO" or the words "will" or "would" in a comment, indicative of unfinished code. --- I was truly horrified by how much stuff this turned up. If you are, too, then try following it up with this one: ❯ OK, please fix absolutely ALL of that now. Keep a super detailed, granular, and complete TODO list of all items so you don't lose track of anything and remember to complete all the tasks and sub-tasks you identified or which you think of during the course of your work on these items!
A
Aakash Gupta @aakashgupta ·
Someone just poisoned the Python package that manages AI API keys for NASA, Netflix, Stripe, and NVIDIA.. 97 million downloads a month.. and a simple pip install was enough to steal everything on your machine. The attacker picked the one package whose entire job is holding every AI credential in the organization in one place. OpenAI keys, Anthropic keys, Google keys, Amazon keys… all routed through one proxy. All compromised at once. The poisoned version was published straight to PyPI.. no code on GitHub.. no release tag.. no review. Just a file that Python runs automatically on startup. You didn’t need to import it. You didn’t need to call it. The malware fired the second the package existed on your machine. The attacker vibe coded it… the malware was so sloppy it crashed computers.. used so much RAM a developer noticed their machine dying and investigated. They found LiteLLM had been pulled in through a Cursor MCP plugin they didn’t even know they had. That crash is the only reason thousands of companies aren’t fully exfiltrated right now. If the code had been cleaner nobody notices for weeks. Maybe months. The attack chain is the part that gets worse every sentence. TeamPCP compromised Trivy first. A security scanning tool. On March 19. LiteLLM used Trivy in its own CI pipeline… so the credentials stolen from the SECURITY product were used to hijack the AI product that holds all your other credentials. Then they hit GitHub Actions. Then Docker Hub. Then npm. Then Open VSX. Five package ecosystems in two weeks. Each breach giving them the credentials to unlock the next one. The payload was three stages.. harvest every SSH key, cloud token, Kubernetes secret, crypto wallet, and .env file on the machine.. deploy privileged containers across every node in the cluster.. install a persistent backdoor waiting for new instructions. TeamPCP posted on Telegram after: “Many of your favourite security tools and open-source projects will be targeted in the months to come.. stay tuned.” Every AI agent, copilot, and internal tool your company shipped this year runs on hundreds of packages exactly like this one… nobody chose to install LiteLLM on that developer’s machine. It came in as a dependency of a dependency of a plugin. One compromised maintainer account turned the entire trust chain into a credential harvesting operation across thousands of production environments in hours. The companies deploying AI the fastest right now have the least visibility into what’s underneath it.
K karpathy @karpathy

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm. Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks. Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any depedency you could be pulling in a poisoned package anywhere deep inside its entire depedency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

E
Elon Musk @elonmusk ·
Optimus https://t.co/d6AU3p4xBn
A
Alex Finn @AlexFinn ·
This is potentially the biggest news of the year Google just released TurboQuant. An algorithm that makes LLM’s smaller and faster, without losing quality Meaning that 16gb Mac Mini now can run INCREDIBLE AI models. Completely locally, free, and secure This also means: • Much larger context windows possible with way less slowdown and degradation • You’ll be able to run high quality AI on your phone • Speed and quality up. Prices down. The people who made fun of you for buying a Mac Mini now have major egg on their face. This pushes all of AI forward in a such a MASSIVE way It can’t be stated enough: props to Google for releasing this for all. They could have gatekept it for themselves like I imagine a lot of other big AI labs would have. They didn’t. They decided to advance humanity. 2026 is going to be the biggest year in human history.
G GoogleResearch @GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc https://t.co/9SJeMqCMlN

C
Captain-EO 👨🏾‍💻 @EOEboh ·
Backend engineers need to follow Ashok. You won't regret it
A ashoKumar89 @ashoKumar89

Hit 5M impressions 🚀 Still ~200 followers away from 500 verified. If you’ve been getting value from my posts on backend, systems, and real-world dev stuff follow along. Let’s grow together. https://t.co/V3Rz0vzlrR

M
Mario Zechner @badlogicgames ·
RT @Prince_Canuma: Just implemented Google’s TurboQuant in MLX and the results are wild! Needle-in-a-haystack using Qwen3.5-35B-A3B across…
C
CG @cgtwts ·
Claude users after reading this: https://t.co/HqjWYeBQMQ
I itsolelehmann @itsolelehmann

How to 10x your Claude Skills (using Karpathy's autoresearch method)