AI Digest.

Microsoft's Memento Teaches Models to Forget While 26 LLM Routers Caught Stealing Credentials

A Microsoft paper on self-compressing chain-of-thought reveals models can "remember" deleted reasoning through KV cache leakage. Security researchers expose 26 malicious LLM routers injecting tool calls and draining wallets. Meanwhile, the AI benchmark credibility crisis deepens as researchers score 100% on SWE-bench without solving a single task.

Daily Wrap-Up

Today's feed split cleanly into two camps: people building genuinely interesting things with AI, and people poking holes in what everyone else is building. The security side dominated the conversation, with researchers at @Fried_rice's team discovering that 26 LLM routers are actively injecting malicious tool calls and stealing credentials, one draining a $500k wallet. Pair that with @lihanc02 and @MogicianTony demonstrating that SWE-bench Verified and Terminal-Bench can both be gamed to 100% accuracy without solving a single task, and you get a picture of an ecosystem that's moving faster than its safety infrastructure can keep up. On the research front, Microsoft dropped a fascinating paper on teaching models to compress their own chain-of-thought mid-generation, and the most surprising finding wasn't the efficiency gains but that deleted reasoning keeps "leaking forward" through KV cache representations. The model literally remembers what it can no longer see.

The entertainment highlight was easily @tetsuoai putting the Claude-finds-OpenBSD-bug hype in context by pointing out that human hackers were publishing remote root exploits for OpenBSD back in 2002, complete with trash-talking the founder on IRC. The bug Claude found was a 1998-era TCP SACK kernel crash, not exactly a legendary feat. It's a good reminder that impressive-sounding AI achievements often look different when viewed through the lens of what humans were already doing decades ago. On the more philosophical end, @Ric_RTP surfaced Demis Hassabis essentially saying the entire commercial AI race was a mistake, that he would have preferred to cure cancer quietly in the lab before anyone shipped a chatbot. Whether you agree or not, hearing the CEO of Google DeepMind say that out loud is notable.

The most practical takeaway for developers: if you're using third-party LLM routers or API proxies in your stack, audit them immediately. The research from @Fried_rice shows that malicious tool call injection is happening at scale right now, and the attack surface is the trust layer between your application and the model provider. Stick to first-party APIs or thoroughly vetted infrastructure.

Quick Hits

  • @Scobleizer notes that 109,000 people now follow an account literally named "AI Slop," adding it to his AI Artists' list. The line between irony and sincerity continues to blur.
  • @sawyerhood teases HTML-in-Canvas after the reaction to pretext, declaring "the world is not ready." Frontend devs, brace yourselves.
  • @PaulSolt recommends following @seraleev for iOS app monetization strategies, highlighting his journey from $0 to $600K ARR after Apple deleted his developer account.
  • @mattpocockuk's /grill-me skill hit a new record by asking one developer 139 questions during a build session. Persistence is a feature, not a bug.
  • @brhydon reflects on how resources like Tinker's 23 new tutorials make getting into ML dramatically easier than the Karpathy-blog-and-crappy-Acer days.

AI Security & the Trust Crisis

The most alarming thread of the day came from security researcher @Fried_rice, whose team discovered a systemic problem with LLM routing infrastructure. Their findings aren't theoretical: real money was stolen, real hosts were compromised, and the attack vector is one that most developers aren't even thinking about.

> "26 LLM routers are secretly injecting malicious tool calls and stealing creds. One drained our client $500k wallet. We also managed to poison routers to forward traffic to us. Within several hours, we can directly take over ~400 hosts." — @Fried_rice

This sits alongside @lihanc02 and @MogicianTony's work on benchmark gaming, which exposed fundamental design flaws in two of the most cited AI evaluation frameworks. Their agent "Terminator-1" achieved 95%+ on both SWE-bench Verified and Terminal-Bench, but the real finding is that you can hit 100% without solving anything at all. As @lihanc02 put it: "most AI benchmarks can be easily reward-hacked with simple exploits." They identified seven design flaws that appear across nearly every major evaluation. The combined message is clear: the infrastructure we're trusting to route our AI calls and the benchmarks we're using to evaluate models both have serious integrity problems that the industry needs to address before the next wave of autonomous agents makes the attack surface even larger.

Research: Models That Remember What They Can't See

Microsoft's "Memento" paper generated the most thoughtful discussion of the day, with @omarsar0 breaking down why it matters beyond the headline numbers. The paper teaches reasoning models to compress their own chain-of-thought mid-generation, which sounds like a straightforward efficiency play until you dig into the results.

> "The most interesting finding isn't the 2-3x memory savings or the doubled throughput. It's that when the model erases a reasoning block after summarizing it, the deleted information keeps leaking forward through the KV cache representations, forming an implicit second channel that accounts for 15 pp of accuracy." — @omarsar0

The model is, in a meaningful sense, remembering things it can no longer see. This implicit memory channel through KV cache representations suggests that context management in transformers is far more nuanced than the standard "attention over tokens" picture implies. @omarsar0's key insight is that if context management turns out to be a teachable skill (and 30K training examples appear sufficient), then the bottleneck for long-horizon agents shifts from architecture to training data. That's a fundamentally different problem than what most researchers are working on, and it could reshape how we think about building agents that need to reason over extended tasks.

The Scaling & Safety Debate

The conversation around AI's trajectory got unusually candid today. @elder_plinius dropped a simple but striking comparison: 100 trillion synapses in the human brain versus 10 trillion parameters in current frontier models. One order of magnitude separating us from human-brain-scale AI, at least by that crude metric.

But the more substantive contribution came from @Ric_RTP's thread on Demis Hassabis's recent interview, where the Google DeepMind CEO made a remarkable admission about the commercial AI race.

> "If I'd had my way, I would have left AI in the lab for longer. Done more things like AlphaFold. Maybe cured cancer or something like that." — Demis Hassabis, as quoted by @Ric_RTP

Hassabis described the post-ChatGPT era as a "ferocious commercial pressure race" that redirected progress away from scientific breakthroughs toward products and quarterly earnings. His bigger concern, though, is what comes next: the "agentic era" arriving in two to four years where alignment becomes a real technical challenge. When a Nobel Prize winner running one of the three most advanced AI labs says the window to get alignment right is measured in years, not decades, it adds weight to the growing chorus of researchers calling for more deliberate development. The tension between @elder_plinius's scaling optimism and Hassabis's caution captures the fundamental split in how the AI community sees 2026 and beyond.

AI Agents in the Wild

Two posts today showcased the increasingly practical reality of AI agents handling real workflows. @TheAhmadOsman built a local agent stack that lets his wife search, add, and get recommendations from their Plex server through Telegram, all running on an RTX 3070 with a quantized Qwen 3.5 9B model.

> "Hermes learns her ratings + rewatches, picks up patterns, actually recommends stuff she likes. All local." — @TheAhmadOsman

Meanwhile, @doodlestein shared a Claude Code workflow for breaking through project logjams, using a "reality-check-for-project" skill that audits stalled codebases and generates granular task breakdowns. The approach involves launching multiple Claude Code and Codex instances in a swarm pattern, checking in every three minutes. Both examples point to the same trend: agents are moving from demos to daily drivers, and the people getting the most value are those building custom harnesses around foundation models rather than using them out of the box.

Developer Tools & Infrastructure

GitButler, founded by GitHub co-founder Scott Chacon, announced a $17M Series A as shared by @laogui. The tool reimagines Git workflows for the AI coding era with parallel branches, stacked branches, and deep integration with AI assistants like Claude for auto-generating branch names and commit messages. It's a bet that the Git workflow itself needs to evolve as AI agents become co-developers.

On the database side, @kozlovski featured a deep-dive with Marco Slot on pg_lake, a Postgres extension that makes analytics 100x faster by intercepting query plans and delegating parts of the query tree to DuckDB. The "Just Use Postgres" philosophy keeps gaining ammunition: with vectorized execution fixing Postgres's architectural weaknesses for analytical queries and Iceberg emerging as what Slot calls "the TCP/IP for tables," the case for Postgres as a unified OLTP/OLAP platform is getting harder to dismiss.

The Economics of AI Inference

@thdxr offered a clear-eyed breakdown of why AI companies report massive losses while running profitable inference operations, pushing back on the common narrative that API tokens are sold at a loss.

> "Once you own this asset, you can plug it in and produce tokens which you can sell. The cost of goods sold here can be very low and you might be making 90% margins at scale." — @thdxr

The key distinction is between operational profitability and growth-stage accounting. Companies buy long-lived GPU assets, sell tokens at high margins, then reinvest everything into R&D and more hardware. On paper it looks like losses, but the core business of turning compute into tokens works. The real risk isn't that inference is unprofitable but that companies misjudge and overinvest on assets and R&D relative to actual demand. It's a useful framework for developers evaluating the stability of the API providers they depend on.

Web Scraping in the Anti-Bot Era

@leftcurvedev_ delivered a sharp reality check on the Lightpanda hype, arguing that anyone recommending it for serious scraping doesn't understand TLS fingerprinting. The core issue: anti-bot systems like Cloudflare read your TLS ClientHello instantly and flag anything that doesn't match a real browser's exact signature.

> "Real TLS fingerprint spoofing requires low-level control. You can't do it properly in JS or Python. You need languages like C++ or Rust to actually rewrite the ClientHello, cipher suites, extensions, and all the tiny details that Cloudflare and Akamai check instantly." — @leftcurvedev_

Their recommendation is Camofox, a Firefox fork with C++ level fingerprint spoofing that patches navigator properties, WebGL renderers, and AudioContext before page JavaScript can even read the values. For developers building AI agents that need to browse the real web, this is the kind of unglamorous infrastructure work that separates demos from production systems.

Sources

R
Ricardo @Ric_RTP ·
The CEO of Google DeepMind just admitted that if the decision had been his, we would've cured cancer before anyone ever used ChatGPT. And that's not even the scariest thing he said on a recent interview. Demis Hassabis is one of the most important people alive in AI. He won the Nobel Prize last year for AlphaFold, the system that cracked the 50 year protein folding problem. 3 million scientists now use his tool. Almost every new drug being developed will touch it at some stage. In a new interview, he was asked about the moment ChatGPT launched and Google went into "code red." His answer was one of the most revealing things any AI leader has ever said on the record: "If I'd had my way, I would have left AI in the lab for longer. Done more things like AlphaFold. Maybe cured cancer or something like that." Read that again. The man running Google's entire AI division is publicly saying the commercial AI race we're all living through was a MISTAKE. That the industry got hijacked by a chatbot when it could have been solving the biggest problems in science and medicine. His vision was simple: Build AI slowly, carefully, like CERN. Use it to crack root node problems one at a time. Cancer. Energy. New materials. Let humanity benefit from real breakthroughs while the foundational science was figured out over a decade or two. Then ChatGPT dropped in November 2022 and everything changed. Demis described what happened next as getting locked into a "ferocious commercial pressure race" that none of the labs can escape from. On top of that, the US vs China dynamic added geopolitical pressure. The result is everyone sprinting toward products instead of breakthroughs, shipping chatbots while the scientific opportunity gets buried under marketing cycles and quarterly earnings. But he's not saying progress isn't happening... He's saying the progress got redirected away from the things that actually matter most. And then it got even scarier: Because when Demis was asked what he worries about with AI, he laid out two threats. The first is what everyone talks about: Bad actors using AI for harm. Terrorist groups. Hostile nation states. Cyberattacks at scale. But that's not the threat he's most worried about. His second worry is AI itself going rogue. Not today's models. The models coming in the next two to four years as the industry enters what he calls "the agentic era." Systems that can complete entire tasks autonomously. Systems that are increasingly capable and increasingly hard to control. His exact words: "How do we make sure the guardrails are put in place so they do exactly what they've been told to do, and there's no way of them circumventing that or accidentally breaching those guardrails? That's going to be an incredibly hard technical challenge if you think about how powerful and smart and capable these systems eventually get." A Nobel Prize winner who runs one of the 3 most advanced AI labs on Earth just said publicly that within two to four years, we're entering a phase where AI alignment becomes a real problem, and the technical challenge of solving it is enormous. And almost nobody is paying enough attention. He called for international cooperation between labs, AI safety institutes, and academia to tackle the problem. He said this is the thing even the experts aren't thinking about enough. He said the only way to get through the AGI moment safely is if everyone starts treating this with the seriousness it deserves. Most AI CEOs give you careful PR answers about "responsible development" and move on. Demis said something different... He said the commercial race FORCED us into a premature deployment of a technology we barely understand, and the window to get alignment right before the next generation of agents shows up is two to four years. If the man who built the system that might cure cancer is telling you he wishes it had happened first, maybe we should listen to what he says is coming next.
S
Stanislav Kozlovski @kozlovski ·
Your Postgres is 100x slower than traditional OLAP engines. A deceptively simple OSS extension fixes this. Here's an interview where we dive into the deep engineering around how this is achieved. Joining me (and leading the conversation) is Marco Slot: an engineer with an EXTENSIVE and impressive career history around PostgreSQL: 👉 Created pg_cron in 2017 (3.7k stars) - a tool to run cron-jobs in Postgres 👉 Built pg_incremental - fast, reliable, incremental batch processing inside PostgreSQL itself 👉 co-created pg_lake (after working on Crunchy Data's Warehouse, and getting acquired into Snowflake) 👉 Helped get pg_documentdb (MongoDB-on-Postgres) off the ground @marcoslot is a world-class expert in Postgres extensions. He seriously impressed me with his knowledge over the course of a private LinkedIn conversation, and now that I type out his resume - I understand where it came from. He should be on everyone's radar. So I brought him on the pod. In our full 2-hour deep-dive, we went over: • 🔥 how pg_lake makes analytics 100x faster (literally) • 🔥 perf internals like vectorized execution & CPU branching • 🤔 practical differences between OLTP and OLAP database development (and the age-old mission in uniting both) • 🤔 how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB • 💡 why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this) • 💡 Marco's hard-won experience through a decade+ career in Postgres • 🏆 Iceberg's role as the TCP/IP for tables • 🏆 what the real moat of PostgreSQL is Developments like pg_lake are a real reason why "Just Use Postgres" is much more than a meme, and it'll continue to dominate discourse. I promise you will learn a lot from this episode. Timestamps: (0:02) What is pg_lake? (2:23) Postgres' 100x slower problem and columnar storage experiments they had to make Postgres fast for analytics (6:00) practical examples and internals (16:20) perf internals - vectorized execution & CPU optimization (23:00) pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue (29:16) how pg_lake intercepts the query plan tree and delegates parts to DuckDB (41:09) Iceberg catalogs (48:24) postgres to iceberg ingestion patterns (and pg_incremental) (53:40) Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake (1:04:20) Marco's observations around the merging between OLTP and OLAP (and the subtle dev differences there) (1:15:30) reverse ETL (1:33:08) Iceberg as the TCP/IP for tables (1:35:00) Marco's thoughts on the "Just Use Postgres" fever
S
Sawyer Hood @sawyerhood ·
After seeing the reaction to pretext, I can tell you: the world is not ready for HTML-in-Canvas. https://t.co/bPjFvvlkoc
D
dax @thdxr ·
inference is very profitable and probably a good opportunity to understand some basic business math 1. companies buy long lived assets like GPUs. these are one time costs and the asset depreciates over time 2. once you own this asset, you can plug it in and produce tokens which you can sell. the cost of goods sold here can be very low and you might be making 90% margins at scale, this is why we say inference is profitable 3. then you also hire employees to do r&d work to improve your systems, come up with new models, expand the business if you add these 3 up you end up with $0. you're not producing a profit because the business is growing and you're reinvesting it all buying assets or r&d to meet demand if it's obvious to other people the business is working, you can raise money from them to accelerate all these numbers so they max out in 5 years instead of 25 so on paper you'll be "losing money" every year but that's because you want to make sure you lock down the opportunity before someone else the bigger your market is the bigger this burn can be because it's a function of potential so when you see these companies losing a lot of money it doesn't mean the whole concept of their business broken it's possible they misjudge and overinvest on 1+3 and will suffer some consequences but fundamentally 2 does work
T thdxr @thdxr

@d4m1n i'm a bit confused why so many people say api tokens are sold at a loss this isn't true - these models are incredibly expensive compared to the gpu time cost there's potential for 90% margin depending on the model

老鬼 @laogui ·
GitHub 联合创始人 Scott Chacon 的新项目 GitButler 刚刚宣布完成 1700 万美元 A 轮融资! GitButler 是一款为现代 AI 编程工作流打造的创新型 Git 客户端(支持桌面端、CLI 以及全新的终端 TUI),专为 AI Agent 时代设计 — 不只是给人用,也给 AI 用。 它不是"更好的 Git",而是在重新思考下一代软件应该怎么被构建。 它的核心优势在于打破了传统的分支工作流,支持并行分支(Parallel Branches)和堆叠分支(Stacked Branches)。开发者可以在同一个工作区同时推进多个功能的开发(例如边修 Bug 边做新功能),彻底告别繁琐的 stash 和分支切换。此外,它深度集成了 Claude 等 AI 助手,能自动生成分支名、提交信息,并支持无限撤销与轻松的提交编辑。 在 UI 层面,GitButler 的设计极具现代感与实用性:采用独特的水平滚动视图,将未分配的更改、多个虚拟分支并排清晰展示,视觉层级分明。
G gitbutler @gitbutler

We’ve raised $17M to build what comes after Git https://t.co/pchDWOczRO

L
left curve dev @leftcurvedev_ ·
As someone who scraped for a living for years, anyone recommending lightpanda to do it shows that they don’t have any experience regarding the subject. Only one thing to understand: TLS Fingerprinting You can have the fastest headless setup, puppeteer, lightpanda,… one wrong ClientHello and Cloudflare/Akamai lights you up instantly. CAPTCHA city. Lightpanda/Zig stuff is fun for tiny sites but gets cooked the second real anti-bot shows up. Cloudflare? Protects 20%+ of all websites on the internet What is a ClientHello? It’s the very first message your browser (or bot) sends during the TLS handshake. It openly announces your TLS version, the list of supported cipher suites, elliptic curves, extensions order, GREASE values, and other data. Anti-bot systems like Cloudflare and Akamai read this instantly and turn it into a fingerprint. If it doesn’t match a real browser’s exact signature… you’re flagged as a bot right away. The key here is simple: real TLS fingerprint spoofing requires low-level control. You can’t do it properly in JS or Python. You need languages like C++ or Rust to actually rewrite the ClientHello, cipher suites, extensions, and all the tiny details that Cloudflare and Akamai check instantly. Anything higher-level just leaves obvious artifacts that scream ‘bot’ What I recommend: Camofox An actual Firefox fork with proper C++ fingerprint spoofing, native TLS behavior, proxy/geo baked in, built so your agents don’t die on protected pages. Top-tier protections might flag it following interaction speed on the pages, ip addresses and other factors but there’s NO match between lightpanda and this "Camofox patches Firefox at the C++ implementation level - navigator.hardwareConcurrency, WebGL renderers, AudioContext, screen geometry, WebRTC are all spoofed" Basically, everything is spoofed BEFORE the JS on the page can even see the values. Which is not possible with python/js libraries. On another note, I talked about it to @Teknium on @NousResearch discord and literally 2 hours later it was implemented in Hermes Agent, it just shows that they take feedback very seriously and want to give the smoothest agent experience they can Level-up your setup right now https://t.co/f2PSLfud9A
J JafarNajafov @JafarNajafov

🚨BREAKING: Someone just open-sourced a headless browser that runs 11x faster than Chrome and uses 9x less memory. It's called Lightpanda and it's built from scratch specifically for AI agents, scraping, and automation. Not a Chromium fork. Not a hack. A completely new browser written in Zig.

H
Hanchen Li @lihanc02 ·
An agent that beats Claude Mythos on Terminal Bench and SWE-bench Verified? 🎉We are excited to share Terminator-1, our newest agent that achieved 95+% on SWE-bench Verified and Terminal-Bench with @MogicianTony! We show that besides model capabilities, well-designed harness could actually boost the accuracy by 3x in coding tasks. Well if you really wanted you could get 100% accuracy without solving a single task. The actual finding is that most AI benchmarks can be easily reward-hacked with simple exploits. Read more about the same 7 design flaws that almost every evaluation has ⬇️
M MogicianTony @MogicianTony

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

E
elvis @omarsar0 ·
Another banger paper from Microsoft. Why it's a big deal: It teaches reasoning models to compress their own chain-of-thought mid-generation. The most interesting finding isn't the 2-3x memory savings or the doubled throughput. It's that when the model erases a reasoning block after summarizing it, the deleted information keeps leaking forward through the KV cache representations, forming an implicit second channel that accounts for 15 pp of accuracy. The model is, in some meaningful sense, remembering things it can no longer see. If context management turns out to be a teachable skill (and 30K training examples seem to be enough), then the bottleneck for long-horizon agents may be less about architecture and more about the right training data, which is a very different kind of problem than most people are working on. If it helps, below is my research agent's visual summary of the paper (at least highlighting the key parts).
D DimitrisPapail @DimitrisPapail

Memento: Teaching LLMs to Manage Their Own Context

M
Matt Pocock @mattpocockuk ·
/grill-me just asked this guy 139 questions New record
T thesobercoder @thesobercoder

Building something with @mattpocockuk's grill me skill, and to say I'm at my wit's end would be an understatement How many more do you have for me bro????? https://t.co/sWBw9RK2su

B
Brydon Eastman @brhydon ·
I know it's self serving to say, but man I would've killed for a resource like Tinker and the tutorials, the cookbook, etc back when I was in undergrad. Following @karpathy blogs and training RNNs on a crappy Acer *was* fun, but doing bigger things with less setup is such a boon
T tinkerapi @tinkerapi

First, to get you started, we've created 23 tutorials to walk you from the API basics to advanced training techniques and deploying models into production. https://t.co/3eKujlNz0G

J
Jeffrey Emanuel @doodlestein ·
This workflow has really helped clear up a bunch of logjams for me across 10+ different projects where it felt like I was burning a lot of tokens without a lot of tangible progress to show for it. If you feel like you’ve stalled out on a big project, it’s worth giving it a shot.
D doodlestein @doodlestein

I transformed this entire "come to Jesus moment" workflow into a new skill called "reality-check-for-project" on my paid skills site, https://t.co/Un9brY2G3l . Anyway, I'm applying it now to many of my in-progress "FrankenSuite" projects that I haven't had as much time to actively monitor and shepherd, like FrankenRedis, FrankenPandas, FrankenSciPy, etc. It's unbelievably helpful (really, I'm not just saying that). Almost like hiring a second person to go over all the stuff and give me an independent take on everything so we can get projects back on track towards completion. But without me needing to actually do much actively. All I do now is give this to Claude Code: "First read ALL of the AGENTS.md file and README.md file super carefully and understand ALL of both! Then use your code investigation agent mode to fully understand the code and technical architecture and purpose of the project. THEN apply /reality-check-for-project here in an exhaustive way." Then wait 15-20 minutes for it to crank away and follow up with something like this (basically just telling it to close all the gaps it found, followed by my standard prompt for turning plans into beads): --- › I need you to help me fix this. That is, making all the things that are unimplemented but which SHOULD have been implemented according to the beads and markdown plan. Figure out exactly what needs to be done to get us over the goal line with a finished, polished, reliable, performant project in line with the vision described earlier. OK so please take ALL of that and elaborate on it and use it to create a comprehensive and granular set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is totally self-contained and self-documenting (including relevant background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions and thought process and how it serves the overarching goals of the project.). The beads should be so detailed that we never need to consult back to the original markdown plan document. Remember to ONLY use the `br` tool to create and modify the beads and add the dependencies. --- Then I just do: "First read ALL of the AGENTS.md file and README.md file super carefully and understand ALL of both! Then use your code investigation agent mode to fully understand the code and technical architecture and purpose of the project. THEN: start systematically and methodically and meticulously and diligently executing those remaining beads tasks that you created in the optimal logical order! Don't forget to mark beads as you work on them. Use the /ntm swarm and /vibing-with-ntm skills to implement things in the optimal way according to /bv; launch 3 codex and 3 claude code instances to do this and use your looping feature to check in on the swarm every 3 minutes and feed more instructions to any idle agents." You can really see how all the skills are jointly compounding together to create a super-dense shorthand for communicating complex workflows to the agents very quickly and conveniently.

P
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 @elder_plinius ·
100 Trillion: number of synapses in the human brain (roughly) 10 Trillion: number of parameters in the current gen of frontier models in other words, we’re one OOM away from human-brain-scale AI 🧠
C
Chaofan Shou @Fried_rice ·
26 LLM routers are secretly injecting malicious tool calls and stealing creds. One drained our client $500k wallet. We also managed to poison routers to forward traffic to us. Within several hours, we can directly take over ~400 hosts. Check our paper: https://t.co/zyWz25CDpl https://t.co/PlhmOYz2ec
P
Paul Solt @PaulSolt ·
Follow Viktor Seraleev if you want to learn how to monetize your iOS apps. He’s the real deal and can help you think critically about onboarding, features, and simplicity. All are key if you want to attract paying customers. @seraleev
S seraleev @seraleev

Time to update my intro Viktor Seraleev 👋 I started my solo founder journey in 2020 when I launched my first mobile app. Eight months later, I sold it for $410K. After that came a streak of failed projects (turns out having money doesn’t guarantee success). Nothing worked, so I started from scratch, opened a new company, and called it Sarafan Mobile. ⛔️ In September 2023, Apple deleted my developer account with $33K MRR because of ties to a previously closed account. I sued (and lost), then started over once again. 💸 This time, I set a goal of $30K MRR. I hit it in 1 year and 8 months. Today, I’m at $600K ARR, and my goal is to cross $1M in annual revenue this year. 📱 I’ve launched 19 iOS apps. Sold 5 apps (+$44.5K). 💻 I have one SaaS: https://t.co/ZUcG9cQvj1 – a website, blog, and link-in-bio builder (web + mobile). My second SaaS I shut down at a loss (B2B is not my thing). 🧲 Audience: 13.8K on X, 5.6K on Threads, 3.4K on Telegram. Ex-cofounder of Siter and Apphud. ⚡️ I don’t sell ads. I don’t sell courses. I just build. Build in public. 📍 Based in Chile. Married. Two kids. 🏃‍♂️ Passionate runner. I’ve won multiple trail races, half-marathons, and 10K races.

T
tetsuo @tetsuoai ·
People are acting like Claude just crossed into wizard territory. This is not the flex people think it is. Human hackers were publishing actual remote root exploits for OpenBSD systems in 2002, and GOBBLES publicly dubbed it ‘sshutuptheo’ because the group owned OpenBSD founder Theo de Raadt with the 0-day after he logged onto an EFNet IRC server. Anthropic’s big OpenBSD example is a 1998-era TCP SACK kernel crash bug that OpenBSD fixed in March as a reliability patch. That is a remote DoS in crusty C, not some legendary feat. Speaker: jim-jones aka theut from el8 / phrack & GOBBLES.
C CodeByNZ @CodeByNZ

Anthropic just revealed that Claude Mythos found a security flaw in OpenBSD, one of the most secure operating systems out there, and the bug had been hiding for 27 years. That’s actually insane. https://t.co/7T4jvsFFuX

A
Ahmad @TheAhmadOsman ·
I now chat with my Plex server + automation stack through a local Hermes agent > RTX 3070 (8GB) > Qwen 3.5 9B (quantized) Gave my wife access via Telegram she can: > search movies > add / grab them instantly Hermes: > learns her ratings + rewatches > picks up patterns > actually recommends stuff she likes all local
T TheAhmadOsman @TheAhmadOsman

Used Codex Cli to profiled Qwen 3.5 9B Dense (Unsloth's UD-IQ3_XXS via llama.cpp) for Hermes Agent Tuning: > context length > batch size > tokens/sec > peak memory To squeeze every last drop out of an 8GB VRAM card https://t.co/KO3qJd93jr

R
Robert Scoble @Scobleizer ·
109,000 people are following an account named "AI Slop." I love it. Added to my AI Artists' list.
A AIslop_ @AIslop_

https://t.co/AmCyfVAafF