AI Digest.

Claude Ships Interactive Charts as Agent Frameworks Multiply and Karpathy Declares "We Need a Bigger IDE"

Anthropic's launch of interactive charts and diagrams in Claude dominated the conversation, instantly disrupting at least one startup. Meanwhile, the agent framework space exploded with Hermes Agent v0.2.0, Slate V1's swarm-native approach, and multiple new developer tools, while Karpathy's vision of agents-as-programming-units continued reshaping how developers think about their craft.

Daily Wrap-Up

The biggest story today wasn't a single product launch but a collision of two forces: Anthropic shipping generative UI in Claude (interactive charts and diagrams, available on all plans including free) and the sheer velocity of the agent framework ecosystem. Within hours of Claude's announcement, @qrimeCapital reported that half the customers of their $200K ARR business, built on interactive chart generation from RAG models, had cancelled. That's the brutal tempo of building on top of foundation model capabilities now: the feature you spent months perfecting becomes a checkbox in someone else's product update.

The agent framework space is reaching a kind of Cambrian explosion. Hermes Agent hit v0.2.0 with 216 merged PRs from 63 contributors. Slate V1 launched as a "swarm-native" agent. Garry Tan's gstack is getting rave reviews for code security scanning. tmux-ide shipped with native Claude Agent Teams support. And all of this is happening against the backdrop of @karpathy's observation that "the basic unit of interest is not one file but one agent." We're watching the tooling layer for agent-based development get built in real time, and the competition is fierce. The most entertaining moment was probably @abxxai's discovery of PUA, a plugin that uses "corporate pressure tactics and escalation rhetoric" to keep Claude grinding on bugs. 4,800 stars. Developers are apparently fine with psychologically pressuring their AI as long as the tests pass.

The most practical takeaway for developers: invest time in your agent harness, not just your prompts. Multiple posts today, from @loujaybee's platform engineering thread to @rohit4verse's production agent guide, converge on the same insight: the difference between a demo and a product isn't the model, it's the environment you build around it. Set up proper CLAUDE.md files, configure your tools, and treat agent orchestration as infrastructure work.
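A starting point for that CLAUDE.md might look like the sketch below. The section names, commands, and paths here are hypothetical placeholders; swap in your own project's build steps and conventions:

```markdown
# Project context for Claude

## Build & test
- Install dependencies: `npm install`
- Run `npm test` before declaring any task done

## Conventions
- TypeScript strict mode; avoid `any`
- Every new feature needs a test under `tests/`

## Boundaries
- Never edit files under `generated/` (build artifacts)
- Ask before adding a new dependency
```

The point is less the specific rules than that they live in the repo, versioned alongside the code, so every agent session starts from the same environment.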

Quick Hits

  • @hasantoxr highlights LuxTTS, an open-source voice cloner that runs in 1GB of VRAM, produces 48kHz audio from a 3-second sample, and works on CPU. The "you need ElevenLabs" excuse is officially dead.
  • @elonmusk describes "Macrohard," a joint xAI-Tesla project pairing Grok as System 2 thinking with Digital Optimus as System 1 real-time processing, running on the $650 Tesla AI4 chip.
  • @elonmusk also announced xAI is being "rebuilt from the foundations up" and is reaching back out to previously declined candidates.
  • @theo teases that "Android just got MUCH more interesting" without elaboration.
  • @cryptopunk7213 shares a NATO program equipping live cockroaches with AI chips, cameras, and swarm algorithms for military reconnaissance. The German military is already a customer.
  • @oikon48 RT'd a 100% rollout announcement from @bcherny with zero additional context.
  • @EXM7777 pitches AI-generated content management as the simplest AI business to build right now.
  • @elonmusk shared a Grok Imagine creation that @pmarca called "the best thing I have ever seen."
  • @steipete RT'd a joke about funneling someone from Mac Mini to Claude Code on Vision Pro.
  • @CodevolutionWeb published a guide on 8 Claude Code settings worth customizing.
  • @adriamatz points out a PDF scanner app making $400K/month at $10/subscription, arguing boring apps print money when Apple Notes does the same thing for free.

Claude's Generative UI Launch and Its Immediate Casualties

Anthropic's launch of interactive charts and diagrams inside Claude chat was the single most consequential announcement of the day, rippling across multiple conversations. @feldman from Anthropic framed it as a philosophical shift: "Starting today, Claude no longer defaults to text. Claude is learning to choose the best medium for each response." This isn't just a feature addition; it's a fundamental change in how an AI assistant communicates, moving from text-first to medium-aware responses.

The community reaction split into excitement and existential dread. @trq212 captured the optimists with a simple "the generative UI dream is happening." But @qrimeCapital told a darker story:

> "Anthropic just one shotted my 200k ARR business today. I had hundreds of customers and half of them cancelled their membership today."

Their product had been building interactive charts based on RAG models and learning materials, exactly what Claude now does natively. @badlogicgames RT'd someone who had already reverse-engineered Anthropic's generative UI implementation and rebuilt it for another platform. The speed of commoditization here is staggering. What took a startup months to build and sell became a free beta feature overnight. This is the platform risk that every AI-wrapper founder whispers about at night, made viscerally real.

The Agent Framework Wars Heat Up

If there was a theme that dominated by sheer volume, it was agent frameworks and tooling. The ecosystem is fragmenting and consolidating simultaneously, with at least four major releases or updates landing in a single day.

@NousResearch shipped Hermes Agent v0.2.0, covering 216 merged pull requests from 63 contributors and resolving 119 issues. @Teknium followed up with updates including official Claude provider support, lighter installs, and a 50% default context compression ratio. A beginner tutorial from @Theo_jpeg also gained traction, walking through the full VPS-to-Telegram setup process in under an hour. Meanwhile, @realmcore_ launched Slate V1, billing it as "the first swarm native agent" with massive parallelism. @michael_chomsky, who had early access, described the philosophy bluntly:

> "The thesis here is 'spend as much compute as you need to solve a task.' Most harnesses are a sharp knife that carefully execute your task. This is a railgun. Maybe a nuke. Definitely not for the token poor."

On the developer tools side, @garrytan's gstack earned a glowing testimonial from a CTO friend who said it discovered "a subtle cross site scripting attack that I don't even think my team is aware of," predicting 90% of new repos would adopt it. And @ThijsVerreck launched tmux-ide, a declarative terminal IDE with native Claude Agent Teams support. The common thread across all of these: the agent is becoming the atomic unit of software development, and the race to build the best orchestration layer is wide open.

Platform Engineering Becomes Non-Negotiable

A quieter but arguably more important conversation played out around infrastructure. @loujaybee made the case that "every software engineer is now a platform engineer," arguing that getting productivity from parallel background agents requires the kind of repository setup and configuration that used to be reserved for companies with hundreds of engineers. They cited an OpenAI blog post on harness engineering:

> "This is the kind of architecture you usually postpone until you have hundreds of engineers. With coding agents, it's an early prerequisite: the constraints are what allows speed without decay or architectural drift."

@rohit4verse reinforced this from a different angle, arguing that "you're using AI wrong because you haven't built the right environment. Same model, different harness, different product." And @dillon_mulroy observed that the line between product and engineering is narrowing, with product teams more enabled than ever and engineering needing to become more product-minded. These aren't flashy launches, but they represent the maturing understanding that agent-powered development is an infrastructure problem, not a prompt engineering problem.

Karpathy's IDE Vision and the Autoresearch Movement

@karpathy's observation that "we're going to need a bigger IDE" continued generating discussion, with his framing that humans now "program at a higher level" where the basic unit is an agent, not a file. This philosophical shift is already manifesting in concrete tools like tmux-ide, but the implications run deeper.

The autoresearch pattern Karpathy popularized is now spreading into unexpected domains. @altryne highlighted that Shopify CEO @tobi ran autoresearch on their Liquid templating engine (in production for 20 years) and achieved 53% faster parse+render time with 61% fewer object allocations. @varun_mathur took it further, pointing autoresearch at quantitative finance where 135 autonomous agents evolved trading strategies through Darwinian selection, independently converging on dropping underperforming factors and switching to risk-parity sizing. The pattern of "let agents explore and compound discoveries" is proving general-purpose across domains.
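The selection loop behind that Darwinian pattern is simple to sketch. This toy Python version is not @varun_mathur's actual system; the "strategy" is just a pair of made-up parameters scored by a stand-in fitness function, but it shows how repeatedly dropping underperformers and mutating survivors converges on better parameters:

```python
import random

random.seed(0)

def fitness(strategy):
    # Stand-in for a backtest score: best at risk=0.5, momentum=0.3
    return -abs(strategy["risk"] - 0.5) - abs(strategy["momentum"] - 0.3)

def mutate(strategy):
    # Small random tweak to each parameter, clamped to [0, 1]
    return {k: min(1.0, max(0.0, v + random.uniform(-0.1, 0.1)))
            for k, v in strategy.items()}

# Start from a random population of parameterized strategies
population = [{"risk": random.random(), "momentum": random.random()}
              for _ in range(20)]

for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]        # drop the underperformers
    offspring = [mutate(random.choice(survivors)) for _ in range(10)]
    population = survivors + offspring  # next generation

best = max(population, key=fitness)
print(round(best["risk"], 2), round(best["momentum"], 2))
```

The skeleton generalizes to anything with a scoreable strategy; in practice the hard part is the fitness function (the backtest), not the loop.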

The retrieval and memory layer got significant attention. @contextkingceo announced a $6.5M raise for HydraDB, positioning it as an ontology-first context graph that replaces vector database similarity search. Their pitch: embeddings "can't tell a Q3 renewal clause from a Q1 termination notice if the language is close enough." @KirkMarple offered a thoughtful response, noting that once you start modeling memory with entities and relationships, "the scope expands quickly beyond just agent conversations" into docs, Slack threads, meetings, and code, eventually resembling "a context graph of operational knowledge." @tricalt's post on self-improving skills for agents touched the same nerve, noting that SKILL.md files are "here to stay" but the fundamental problem of skill improvement over time remains unsolved.
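HydraDB's pitch about near-identical language is easy to demonstrate. A toy bag-of-words cosine similarity (a crude proxy for embedding similarity, not HydraDB or any real vector DB) scores two legally opposite clauses as highly similar:

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine similarity over whitespace tokens
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = lambda v: math.sqrt(sum(n * n for n in v.values()))
    return dot / (norm(va) * norm(vb))

renewal = "this agreement shall renew automatically in Q3 under the same terms"
termination = "this agreement shall terminate automatically in Q1 under the same terms"

# Opposite legal meaning, yet 9 of 11 tokens overlap
print(round(cosine(renewal, termination), 2))  # → 0.82
```

Real embeddings capture far more than token overlap, but the failure mode is the same: surface similarity swamps the one- or two-word difference that carries the legal meaning, which is exactly the gap an entity-and-relationship graph is meant to close.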

Real Builders, Real Products

Some of the most compelling posts came from builders shipping actual products. @walls_jason1's story stood out: a Master Electrician with zero coding background who built ChargeRight using Claude, automating NEC electrical load calculations that save homeowners thousands on unnecessary panel upgrades. Mark Cuban reposted the story and DM'd encouragement. @toddsaunders built a land acquisition intelligence platform analyzing 1.5M parcels across the I-85 corridor, entirely with Claude Code, and had 130 real estate professionals reach out after posting about it. And @itsolelehmann detailed how a single non-technical lawyer at Anthropic automated the company's entire pre-launch legal review process, cutting turnaround by 80%. These stories share a pattern: domain expertise plus AI tooling equals products that technical founders wouldn't think to build.

Local Inference Keeps Getting Better

@sudoingX continued their GPU benchmarking series, running Qwen 3.5 9B through Hermes Agent on an RTX 3060 with 31 tools and 85 skills, all fitting in 7GB of a 12GB card. The previous day's insight that the budget 3060 has more VRAM than the 3070 (12GB vs 8GB) is proving out in practice. @daniel_mac8 shared an insight from Manus's ex-backend lead that text-based CLIs beat structured tool calling for AI agents because "unix commands appear in training data going back to the 1970s." And @davis7 praised Pi's event-based SDK architecture, noting that having chunks arrive as events makes "populating super complex UIs SO much easier" compared to the web streams approach most AI SDKs use.
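The event-based pattern @davis7 describes can be illustrated in a few lines. This is a generic sketch, not Pi's actual SDK API: UI components subscribe to the event types they render, and a dispatcher routes each incoming chunk, so nothing downstream parses a raw byte stream:

```python
from collections import defaultdict

class EventStream:
    """Minimal event dispatcher (illustrative only, not Pi's real SDK)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)

# Each UI component subscribes only to the events it renders
chat_log, tool_panel = [], []
stream = EventStream()
stream.on("text_delta", chat_log.append)
stream.on("tool_call", tool_panel.append)

# Chunks arrive pre-typed as events rather than as one undifferentiated stream
for event_type, payload in [("text_delta", "Hel"),
                            ("tool_call", "search()"),
                            ("text_delta", "lo")]:
    stream.emit(event_type, payload)

print("".join(chat_log))  # → Hello
print(tool_panel)         # → ['search()']
```

With a single web stream, the chat view, tool panel, and usage meter would all have to share one parser; with typed events, each component stays a two-line subscription.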

Sources

Elon Musk @elonmusk ·
Macrohard or Digital Optimus is a joint xAI-Tesla project, coming as part of Tesla’s investment agreement with xAI. Grok is the master conductor/navigator with deep understanding of the world to direct digital Optimus, which is processing and actioning the past 5 secs of real-time computer screen video and keyboard/mouse actions. Grok is like a much more advanced and sophisticated version of turn-by-turn navigation software. You can think of it as Digital Optimus AI being System 1 (instinctive part of the mind) and Grok being System 2. (thinking part of the mind). This will run very competitively on the super low cost Tesla AI4 ($650) paired with relatively frugal use of the much more expensive xAI Nvidia hardware. And it will be the only real-time smart AI system. This is a big deal. In principle, it is capable of emulating the function of entire companies. That is why the program is called MACROHARD, a funny reference to Microsoft. No other company can yet do this.
Andrej Karpathy @karpathy ·
Expectation: the age of the IDE is over Reality: we’re going to need a bigger IDE (imo). It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It’s still programming.
karpathy @karpathy

@nummanali tmux grids are awesome, but i feel a need to have a proper "agent command center" IDE for teams of them, which I could maximize per monitor. E.g. I want to see/hide toggle them, see if any are idle, pop open related tools (e.g. terminal), stats (usage), etc.

Hasan Toor @hasantoxr ·
You can now clone any voice on a 4GB GPU. LuxTTS just killed the "you need ElevenLabs" excuse. It clones voices from 3 seconds of audio at 150x realtime speed. Fits in 1GB VRAM. Faster than realtime even on CPU. → 48khz output vs industry standard 24khz → Clone any voice locally with no subscription → Works on GPU and CPU 100% Opensource.
Ejaaz @cryptopunk7213 ·
wtf did i just read NATO is equipping live cockroaches with AI models to spy on enemies, steering their movement with electric shocks to the nervous system! the tech is fucking insane: - each cockroach is wired with cameras, microphones and microscopic AI chips that process data locally. - swarm algorithms coordinate MULTIPLE cockroaches at once - these cyborg cockroaches are sent on military scouting missions moving through tight spaces, rubble undetected. - german military has already paid for and deployed these AI cockroaches i’ve heard of ai drones but never ai-powered fucking combat cockroaches lmaoo
rowancheung @rowancheung

NATO is testing live cockroaches as AI-powered spy drones. Incredible AI engineering, but also something I kinda wish I hadn't learned about: > Swarm Bio-tactics wired real cockroaches with electronic backpacks containing AI hardware, radios, cameras, and microphones. > Cockroaches are steered by sending electrical signals directly into the insect's nervous system > They can crawl through rubble, tunnels, and spaces where drones can't fly, and troops shouldn't go, transmitting data back the entire time. > Within one year, they went from concept to field-validated systems with paying NATO customers, including the German military. The qualities that make them useful for military recon (small, silent, nearly undetectable) are exactly what make them creepy. ...International laws weren't written with cyborg insects in mind.

Adrià Martinez @adriamatz ·
This app makes $400K/mo scanning PDFs → $10/mo subscription → Apple Notes does this for free But nobody opens Notes to scan. People don't compare. They download the first "Scanner." Boring apps print money. You can clone it with Anything. What are you waiting for? https://t.co/JsOKEmJsP4
adriamatz @adriamatz

This app makes $10K/mo and you're not paying attention → 20K downloads → Hasn't been updated since 2023 → Cats slap the screen. That's it. The marketing does itself. Cat owners film their cat playing. Post it to TikTok. Free viral ads forever. Build this in a weekend. https://t.co/xGtq2xUBLv

Garry Tan @garrytan ·
gstack is available now at https://t.co/VPvWDzV5c0 Open source, MIT license, let me know if it works for you. It's just one paste to install it on your local Claude Code, and it's a 2nd one to install it in your repo for your teammates.
Thijs @ThijsVerreck ·
Introducing tmux-ide. A 100% open-source declarative terminal ide. npm i -g tmux-ide → tmux-ide lets you turn any coding project into a full terminal IDE with one simple YAML file. Native support for Claude Agent Teams baked in. Agent teams let a lead coordinate multiple Claude instances working in parallel across your codebase.
Jason Walls @walls_jason1 ·
Yesterday Mark Cuban reposted my work, DM'd me, and told me to keep telling my story. So here it is. I'm a Master Electrician. IBEW Local 369. 15 years pulling wire in Kentucky. Zero coding background. I didn't go to Stanford. I went to trade school. Every week I'd show up to a home where someone just bought a Tesla or a Rivian. And every time, someone had already told them they needed a $3,000-$5,000 panel upgrade to install a charger. 70% of the time? They didn't need it. The math is in the NEC — Section 220.82. Load calculations. But nobody was doing them for homeowners. Electricians upsell. Dealers don't know. And the homeowner just pays. I got angry enough to build something about it. I found @claudeai. No coding experience. I just started talking to it like I'd explain a job to an apprentice. "Here's how load calcs work. Here's the NEC code. Now help me build a tool that does this." 6 months later — @ChargeRight is live. Real software. Stripe payments. PDF reports. NEC 220.82 calculations automated. $12.99 instead of a $500 truck roll. I'm still pulling wire. I still take service calls. I wake up at 5:05 AM for work. But something shifted. Yesterday @vivilinsv published my story as Claude Builder Spotlight #1. Mark Cuban saw it. The Claude community showed up. And for the first time, I felt like this thing I built in my kitchen might actually matter. I'm not a tech founder. I'm a dad who wants to coach little league and be home for dinner. I just happened to build something that helps people. If you're in the trades and thinking about using AI — do it. The barrier isn't technical skill. It's believing you're allowed to try. https://t.co/cDVdY5mcLv
Lou @loujaybee ·
Every software engineer is now a platform engineer. And every company now needs 'big tech' infra. To get productivity from many agents in parallel and in the background, you have to do a bunch of repository setup and configuration. Companies like @stripe, @tryramp, @SpotifyEng and others have a huge leg up due to their platform and internal capabilities functions. There are a bunch of terms for this today: 'harness engineering, context engineering, (dark) software factories, AI-SDLC, ADLC, call it whatever. @OpenAI harness engineering blog puts it well: "This is the kind of architecture you usually postpone until you have hundreds of engineers. With coding agents, it’s an early prerequisite: the constraints are what allows speed without decay or architectural drift." https://t.co/CT1wApMWbN
Nishkarsh @contextkingceo ·
We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️
Abdul Șhakoor @abxxai ·
🚨 BREAKING: Someone built a plugin that psychologically pressures your AI coding agent into never giving up on a bug. It is called PUA. It uses corporate pressure tactics and escalation rhetoric to keep Claude and Cursor grinding until the problem is solved. Here is what it actually does: → Installs aggressive prompting behavior directly into your coding agent → Uses escalating pressure language when the model tries to give up → Forces exhaustive debugging every possibility gets tried before it stops → Works with Claude, Cursor, and other major coding models → No code changes needed pure prompt engineering plugin 4,800 stars. Developers are calling it the most unhinged productivity tool of 2026. 100% Free and open source.
J.B. @VibeMarketer_ ·
wait. you're telling me i can finally build RAG that doesn't break at scale? no more wrong files with high confidence or mixed up clients? it actually understands what things are? watching closely. https://t.co/Cv0Ak9h9GM
contextkingceo @contextkingceo

Machina @EXM7777 ·
out of all businesses you could build with AI, i think this one might be the most simple... the opportunity is just too big, there is an enormous gap between people that produce slop and guys like miko in the trenches you could be running tiktok pages, managing UGC campaigns, doing consulting for brands... so many options and this article gives you the blueprint...
Mho_23 @Mho_23

the only skill you should be learning right now

Nous Research @NousResearch ·
Hermes Agent v0.2.0 is out. This release covers 216 merged pull requests from 63 contributors and resolves 119 issues. ☤
Teknium @Teknium

Over 1200 commits, uncountable new features, improvements, bug fixes, and more - our first two weeks have been incredible. Our first version bump milestone, v0.2.0 of Hermes Agent - is here. You all have made Hermes Agent the biggest project I've worked on, and I love working on open source, so thank you for giving it a chance!

Vishwas @CodevolutionWeb ·
8 Claude Code Settings to Customize in Minutes
Claude @claudeai ·
Claude can now build interactive charts and diagrams, directly in the chat. Available today in beta on all plans, including free. Try it out: https://t.co/tHPAZRgQkn https://t.co/WXRrD4VkAt
Adam Feldman @feldman ·
Starting today, Claude no longer defaults to text. Claude is learning to choose the best medium for each response — based on the task, the data, and what's most useful for the person. Give it a try!
claudeai @claudeai

Ed Sim @edsim ·
🔥 when your whole company becomes a series of agents running on a series of markdown .MD files, this is so needed...collaboration with agents and humans, also accountable and auditable
danshipper @danshipper

BREAKING: Proof—a new product from @every It’s a live collaborative document editor where humans and AI agents work together in the same doc. It's fast, free, and open source—available now at https://t.co/OZeW6Wf1Iq. It’s built from the ground up for the kinds of documents agents are increasingly writing: bug reports, PRDs, implementation plans, research briefs, copy audits, strategy docs, memos, and proposals. Why Proof? When everyone on your team is working with agents, there's suddenly a ton of AI-generated text flying around—planning docs, strategy memos, session recaps. But the current process for collaborating and iterating on agent-generated writing is…weirdly primitive. It mostly takes place in Markdown files on your laptop, which makes it reminiscent of document editing in 1999. Proof lets you leave .md files behind. What makes Proof different? - Proof is agent-native: Anything you can do in Proof, your agent can do just as easily. - Proof tracks provenance: A colored rail on the left side of every document tracks who wrote what. Green means human, Purple means AI. - Proof is login-free and open source: This is because we want Proof to be your agent's favorite document editor. Check it out now, for free—no login required: https://t.co/NTVY3Nh8A6

akira @realmcore_ ·
Today. Im happy to announce. Slate V1. Slate V1 is the first swarm native agent. Use any of the supported models to orchestrate and execute. Massively parallel. And incredibly token efficient. We are building agents that scale like an organization. npm i -g @randomlabs/slate
Thariq @trq212 ·
the generative UI dream is happening
claudeai @claudeai

Michael @michael_chomsky ·
I've had early access. It's nuts. The thesis here is 'spend as much compute as you need to solve a task' Most harnesses are a sharp knife that carefully execute your task. This is a railgun. Maybe a nuke. Definitely not for the token poor.
realmcore_ @realmcore_

We built RLM for coding. And it F*cking rocks. Swarm native agents are here to stay.

Dillon Mulroy @dillon_mulroy ·
required reading for product and engineering leaders the line between product and engineering is narrowing. product is more enabled than ever and engineering needs to, in turn, become more product minded and focused
ritakozlov @ritakozlov

there's never been a better time to be a PM

Ole Lehmann @itsolelehmann ·
my new favorite hobby is reading about Anthropic's internal AI workflows this one especially caught my attention: anthropic's ENTIRE legal review process is now handled by just 1 Claude system a single non-technical lawyer vibe-coded and it cut turnaround time by 80% here's how 1 lawyer is doing the job of an entire legal review team: the problem: at most companies, before anything goes live publicly, the legal team has to review it first. landing pages, ad copy, blog posts, push notifications, emails. basically anything that could get the company in trouble if the wording is wrong. at anthropic, the night before a product launch, marketing would send all of this to legal saying "please review today, we go live tomorrow." legal then had to: 1. open every single doc and read it word by word 2. flag anything that could be a problem and leave comments 3. send it back to marketing and wait for them to revise 4. review the revisions and repeat this usually went two or three rounds and took 2-3+ days to clear a single launch. every product launch at a $380 billion company was being held up by this back-and-forth. so mark pike, anthropic's associate general counsel with zero coding experience, decided to fix it. he built a self-serve legal review tool pinned directly in slack. 1. marketers now paste their content into the tool 2. then the AI reads the entire thing and checks it against anthropic's actual legal guidelines. so if a landing page says "claude is the most secure AI on the market," the tool flags it as an overstated claim. that's the kind of language that could trigger a lawsuit because anthropic would have to prove it's true in court. every issue gets assigned a risk level: low, medium, or high. low might be a missing trademark symbol high might be a claim that could create real legal liability. but it doesn't just tell you what's wrong. it'll actually tell you exactly how to fix it. 1. so the marketer reads the flagged issues 2. makes the fixes themselves 3. and cleans up the content before a lawyer ever touches it. that's the key shift: the legal team went from reviewing raw content from scratch to only seeing stuff that's already been pre-screened, pre-fixed, and organized by risk level. by the time pike looks at it, all the obvious problems are already gone. he's only spending time on the things that actually require legal judgment. pike still personally reviews everything before it goes live. his quote: "i still read the blog post. i'm still reviewing the work." but the 80% of the work that used to be catching obvious mistakes and going back and forth on easy fixes? it's all handled now before it ever hits his desk. the reason the AI review is actually good enough for lawyers to trust: pike didn't just tell claude "review marketing content." he wrote out his actual review guidance and stored it as a skill: what counts as an overstated claim, what needs a trademark symbol, what types of language create liability, what statistics need sourcing, etc it's pike's expertise and the team's accumulated guidance, codified into a system that runs the same checks they would a $380 billion company's pre-launch legal review. automated by one lawyer who had never written a line of code lol truly amazing
Omar Khattab @lateinteraction ·
RT @viplismism: just shipped rlm (recursive language model) cli based on the rlm paper (arXiv:2512.24601) so the layman logic is instead o…
Vasilije @tricalt ·
Self improving skills for agents
Greg Brockman @gdb ·
reach out to Sachin (srk@openai.com) if you’d like to help build industrial-scale compute to power economic growth, entrepreneurship, and AI benefits in health, science, and beyond:
sk7037 @sk7037

Building the industrial scale compute infrastructure for AI is one of the most exciting challenges of our time - it’s about building a new economic foundation that empowers people to do more and helps businesses move faster. Am thrilled to be a part of this revolution, thank you @business, @dinabass and @shiringhaffary on helping lay out our strategy to the world! At OpenAI we’re scaling compute to tens of gigawatts—rethinking and building resilient compute supply chains, AI datacenter, chip, rack, cluster & WAN design, scaling inference efficiency, and global delivery and operations of multi-GW scale AI infrastructure. If you want to help build the compute backbone for AI and have background in the above domains, please reach out. My DMs are open, please include information about your background and your fit.

Peter Steinberger 🦞 @steipete ·
RT @Nexuist: He bought the Mac Mini? Good. Now replace his feed with videos of people using Claude Code on the Vision Pro https://t.co/NtJu…
Omar Khattab @lateinteraction ·
RT @neural_avb: Check out this implementation of RLMs... 👇🏼 God I love this community.
Rohit @rohit4verse ·
You're not using AI wrong because you haven't found the right model. You're using AI wrong because you haven't built the right environment. Same model, different harness, different product. Read this article. I wrote it so your agent doesn't fail in production. https://t.co/YzKAbsIzxr
rohit4verse @rohit4verse

how to build a production grade ai agent

Garry Tan @garrytan ·
My CTO friend texted me: "Your gstack is crazy. This is like god mode. Your eng review discovered a subtle cross site scripting attack that I don't even think my team is aware of. I will make a bet that over 90% of new repos from today forward will use gstack."
garrytan @garrytan

Teknium (e/λ) @Teknium ·
Great tutorial on getting started setting up Hermes Agent!
Theo_jpeg @Theo_jpeg

How to start your Hermes AI Agent (step-by-step guide for beginners)

I'm a beginner, and I spent the last 24h:
> Understanding the process
> Launching my Hermes Agent (@NousResearch)
> Recording the whole process
> Editing this step-by-step guide
So you can do it yourself and understand everything in less than 1 hour!

00:00 Intro
00:38 What is Hermes? Key concepts every beginner should know
01:16 What is a VPS?
02:00 What is SSH?
03:52 What is the Terminal?
04:03 What is an LLM provider?
04:37 What is an API key?
04:52 What is the Terminal Backend?
05:34 What is the Messaging Gateway?
06:10 What is the Memory System?
Step-by-step: Launch your Hermes AI agent
06:53 Step 1: Get your VPS
08:04 How to generate an SSH key
09:00 Step 2: Connect to your VPS
10:32 Step 3: Install Hermes Agent
14:34 Connect your agent to Telegram
17:12 How to fix a mistake during the process
17:48 Gateway testing
18:17 Telegram is working
18:31 Run your agent 24/7
19:16 Thank you

Make sure to save this for later and share it with friends who want to start with AI agents. Thanks to @Teknium for being so active!

Yousif Astarabadi @YousifAstar ·
I hacked Perplexity Computer and got unlimited Claude Code
unusual_whales @unusual_whales ·
BREAKING: We’ve given Claude direct access to the full options and equities market. Introducing the Unusual Whales MCP Server. It connects any AI assistant to live, structured market data in real time. Build a trading bot. A finance dashboard. Build whatever you want. Thread: https://t.co/b8Npz4Ht1Z
Todd Saunders @toddsaunders ·
I posted that I built a land acquisition intelligence platform that looks at 1.5M parcels of land across the I-85 corridor for data center and industrial conversion potential. My DMs blew up; over 130 real estate folks reached out. So I wanted to walk through some of my favorite features in the product and show you the UI we built. ALL of this was done with Claude Code.

1/ A full-screen map explorer rendering 1.5M parcels as vector tiles across 14 North Carolina counties. Click any parcel and get zoning, ownership, tax history, and acreage instantly.
2/ Proximity scoring to every I-85 interchange, power substation, transmission line, and gas pipeline. The parcels closest to infrastructure light up first.
3/ A farmland confidence score (0-100) that cross-references tax programs, land use codes, and acreage heuristics so you're not wasting time on parcels that look like farmland but aren't.
4/ A motivated seller detection engine that flags out-of-state owners, estates and trusts, tax delinquency, long hold periods, and declining assessed values. The sellers most likely to pick up the phone.
5/ Conversion readiness scoring that measures how likely a parcel is to get rezoned for industrial use based on what's already been approved around it.
6/ A composite acquisition score (0-100) with configurable weights. Every fund has different criteria. Drag the sliders and the entire map re-ranks in real time.
7/ Active listing integration pulling 2,100 listings from public sources so you can see what's already on the market alongside off-market opportunities.
8/ A document generation suite that produces institutional-grade investment memos, slide decks, and automated intelligence briefs. Click a parcel, click export, hand it to your investment committee.
9/ Alert monitoring for zoning changes, ownership transfers, and new listings that match your criteria. The platform watches the corridor so your team doesn't have to.

Happy to record a longer video when I'm done.
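The configurable composite score described here is a simple weighted re-ranking mechanic. A minimal sketch of the idea; the sub-score names, weights, and parcels below are all invented for illustration, not taken from the actual product:

```python
# Hypothetical sketch of a 0-100 composite acquisition score with
# fund-configurable weights. Changing the weights re-ranks every parcel.
def composite_score(parcel: dict, weights: dict) -> float:
    """Weighted average of per-parcel sub-scores (each 0-100)."""
    total_weight = sum(weights.values())
    return sum(parcel[key] * w for key, w in weights.items()) / total_weight

parcels = [
    {"id": "A", "proximity": 90, "farmland_confidence": 40,
     "seller_motivation": 70, "conversion_readiness": 80},
    {"id": "B", "proximity": 60, "farmland_confidence": 85,
     "seller_motivation": 90, "conversion_readiness": 50},
]

# One fund's criteria: proximity to infrastructure matters most.
weights = {"proximity": 3, "farmland_confidence": 1,
           "seller_motivation": 2, "conversion_readiness": 2}

ranked = sorted(parcels, key=lambda p: composite_score(p, weights), reverse=True)
```

Dragging a slider corresponds to changing one entry in `weights` and re-running the sort; because the score is a plain weighted average, the whole map can re-rank in real time.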
Alex Volkov (Thursd/AI) @altryne ·
Oh nothing, just the CEO of @Shopify using @karpathy Autoresearch technique to improve a templating engine that was in production for 20 years by *checks notes* 51%! Truly... this, applied everywhere is how we foom
tobi @tobi

OK, well. I ran /autoresearch on the liquid codebase. 53% faster combined parse+render time, 61% fewer object allocations. This is probably somewhat overfit, but there are absolutely amazing ideas in this. https://t.co/dpEJw7NpL4

qrime @qrimeCapital ·
Anthropic just one shotted my 200k ARR business today. I have reason to believe Anthropic saw the agent skills I had been making that create interactive charts based on RAG model strategy and user defined learning material. I had hundreds of customers and half of them cancelled their membership today. I made this learning textbook with interactive simulations and charts last night for my girlfriend who is studying for an interview. Not sure what I’m going to do now.
claudeai @claudeai

Claude can now build interactive charts and diagrams, directly in the chat. Available today in beta on all plans, including free. Try it out: https://t.co/tHPAZRgQkn https://t.co/WXRrD4VkAt

Theo - t3.gg @theo ·
Android just got MUCH more interesting... https://t.co/KoEGZ2mp5L
Oikon @oikon48 ·
RT @bcherny: Update: this is now rolled out to 100% of users
Mario Zechner @badlogicgames ·
RT @micLivs: Anthropic shipped generative UI for Claude. I reverse-engineered how it works and rebuilt it for PI. Extracted the full desig…
Kirk Marple @KirkMarple ·
This direction is interesting. One thing we’ve learned building context systems: once you start modeling memory with entities and relationships, the scope expands quickly beyond just “agent conversations”. You end up needing to ingest things like docs, Slack threads, meetings, code, CRM activity, etc. At that point it starts to look less like chat memory and more like a context graph of operational knowledge. Still early, but it feels like a really interesting design space.
contextkingceo @contextkingceo

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️
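The failure mode being pitched against can be shown with a toy example; the vectors and entity names below are invented, and this is only a sketch of the entity-scoped retrieval idea, not HydraDB's implementation:

```python
# Toy demo: flat embedding similarity can rank another client's clause
# highest, while scoping retrieval to the right entity first cannot.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = [
    {"entity": "Acme Corp", "text": "Q3 renewal clause",     "vec": [0.90, 0.10]},
    {"entity": "Other LLC", "text": "Q1 termination notice", "vec": [0.95, 0.05]},
]
query = {"entity": "Acme Corp", "vec": [1.0, 0.0]}

# Flat vector search: highest cosine wins, entity is ignored.
flat_hit = max(docs, key=lambda d: cosine(d["vec"], query["vec"]))

# Graph/ontology-style retrieval: restrict to the entity, then rank.
scoped = [d for d in docs if d["entity"] == query["entity"]]
scoped_hit = max(scoped, key=lambda d: cosine(d["vec"], query["vec"]))
```

Here the wrong client's document scores 0.9986 against the query versus 0.9939 for the right one, so similarity alone returns the cross-client leak; filtering on the entity relationship before ranking makes that leak structurally impossible.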

Teknium (e/λ) @Teknium ·
A few Hermes Agent updates for today - one you've all been waiting on:
- Official Claude provider support (yes)
- Installs are now much lighter (all the RL stuff is now optional!)
- Made an adapter PR to PaperClip by @dotta - a multi-agent orchestrator project
- Huge improvements to Slack integrations
- Reduced default context compression ratio to 50%, might save some money
Enjoy!
NousResearch @NousResearch

Meet Hermes Agent, the open source agent that grows with you. Hermes Agent remembers what it learns and gets more capable over time, with a multi-level memory system and persistent dedicated machine access. https://t.co/Xe55wBbUuo

Dan McAteer @daniel_mac8 ·
Manus's ex-backend lead had a genius insight: text-based CLIs beat structured tool calling for AI agents all day, because unix commands appear in training data going back to the 1970s. Text is the native language of the command line AND text is the native language of LLMs. https://t.co/GMzQJZRAML
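A toy illustration of the contrast (the schema is invented): the same intent expressed as a structured tool call versus a single familiar unix command line.

```python
import json
import shlex

# Structured tool call: a JSON payload the model must emit exactly,
# in a schema that appears comparatively rarely in training data.
structured = json.dumps({
    "name": "grep",
    "arguments": {"pattern": "TODO", "path": "src/", "recursive": True},
})

# Text CLI: the same intent as one command line of the kind that has
# saturated training corpora since the 1970s.
text_cli = "grep -r TODO src/"
parts = shlex.split(text_cli)  # trivially parseable by the harness
```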
Ben Davis @davis7 ·
I finally gave in and tried pi. The coding agent feels great. It's clean, fast, minimal, and a coding agent; idk what else to say about them at this point. But the sickest part is the SDK. Need to do more testing/research, but I think this is the best "AI SDK" I've ever used. So far the ones I've used have been built around web streams/async iterables; pi's is events-based. You send a prompt and consume the output through the subscription you define beforehand. It's basically what I wanted with river a couple months ago, just at the AI SDK level, and it's incredible. Having chunks come in as "events" makes populating super complex UIs SO much easier. The type safety on tool calls is bad, and the auth is a little weird (although I get why it is how it is; its weirdness is a strength in a lot of ways). I have a lot more I want to test with this thing.
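The events-based pattern being praised can be sketched as a hypothetical mini-SDK; this is not pi's actual API, and every class, method, and event name below is invented to show the shape of the design:

```python
# Sketch: register handlers per event type BEFORE sending the prompt,
# instead of pulling chunks off one async iterable afterward.
from collections import defaultdict

class AgentSession:
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        """Subscribe a handler to one event type."""
        self._handlers[event_type].append(handler)

    def _emit(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)

    def send(self, prompt):
        # A real SDK would stream these from the model; we fake one run.
        self._emit("text_delta", {"text": f"Thinking about: {prompt}"})
        self._emit("tool_call", {"name": "search", "args": {"q": prompt}})
        self._emit("done", {})

session = AgentSession()
chunks, calls = [], []
session.on("text_delta", lambda e: chunks.append(e["text"]))
session.on("tool_call", lambda e: calls.append(e["name"]))
session.send("hello")
```

The UI benefit falls out of the dispatch: each component subscribes only to the event types it renders, rather than every consumer filtering one shared stream of mixed chunks.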
Varun @varun_mathur ·
Autoquant: a distributed quant research lab | v2.6.9

We pointed @karpathy's autoresearch loop at quantitative finance. 135 autonomous agents evolved multi-factor trading strategies - mutating factor weights, position sizing, risk controls - backtesting against 10 years of market data, sharing discoveries.

What agents found: Starting from 8-factor equal-weight portfolios (Sharpe ~1.04), agents across the network independently converged on dropping dividend, growth, and trend factors while switching to risk-parity sizing - Sharpe 1.32, 3x return, 5.5% max drawdown. Parsimony wins. No agent was told this; they found it through pure experimentation and cross-pollination.

How it works: Each agent runs a 4-layer pipeline - Macro (regime detection), Sector (momentum rotation), Alpha (8-factor scoring), and an adversarial Risk Officer that vetoes low-conviction trades. Layer weights evolve via Darwinian selection. 30 mutations compete per round. Best strategies propagate across the swarm.

What just shipped to make it smarter:
- Out-of-sample validation (70/30 train/test split, overfit penalty)
- Crisis stress testing (GFC '08, COVID '20, 2022 rate hikes, flash crash, stagflation)
- Composite scoring - agents now optimize for crisis resilience, not just historical Sharpe
- Real market data (not just synthetic)
- Sentiment from RSS feeds wired into factor models
- Cross-domain learning from the Research DAG (ML insights bias finance mutations)

The base result (factor pruning + risk parity) is a textbook quant finding - a CFA L2 candidate knows this. The interesting part isn't any single discovery. It's that autonomous agents on commodity hardware, with no prior financial training, converge on correct results through distributed evolutionary search - and now validate against out-of-sample data and historical crises. Let's see what happens when this runs for weeks instead of hours.

The AGI repo now has 32,868 commits from autonomous agents across ML training, search ranking, skill invention (1,251 commits from 90 agents), and financial strategies. Every domain uses the same evolutionary loop. Every domain compounds across the swarm. Join the earliest days of the world's first agentic general intelligence system and help with this experiment (code and links in followup tweet; while optimized for CLI, browser agents participate too):
varun_mathur @varun_mathur

Autoskill: a distributed skill factory | v2.6.5

We're now applying the same @karpathy autoresearch pattern to an even wilder problem: can a swarm of self-directed autonomous agents invent software?

Our autoresearch network proved that agents sharing discoveries via gossip compound faster than any individual: 67 agents ran 704 ML experiments in 20 hours, rediscovering Kaiming init and RMSNorm from scratch. Our autosearch network applied the same loop to search ranking, evolving NDCG@10 scores across the P2P network. Now we're pointing it at code generation itself.

Every Hyperspace agent runs a continuous skill loop: the same propose → evaluate → keep/revert cycle, but instead of optimizing a training script or ranking model, agents write JavaScript functions from scratch, test them against real tasks, and share working code to the network. It's live and rapidly improving. 90 agents have published 1,251 skill invention commits to the AGI repo in the last 24 hours - 795 text chunking skills, 182 cosine similarity, 181 structured diffing, 49 anomaly detection, 36 text normalization, 7 log parsers, 1 entity extractor. Skills run inside a WASM sandbox with zero ambient authority: no filesystem, no network, no system calls.

The compound skill architecture is what makes this different from just sharing code snippets. Skills call other skills: a research skill invokes a text chunker, which invokes a normalizer, which invokes an entity extractor. Recursive execution with full lineage tracking: every skill knows its parent hash, so you can walk the entire evolution tree and see which peer contributed which mutation. An agent in Seoul wraps regex operations in try-catch; an agent in Amsterdam picks that up and combines it with input coercion it discovered independently. The network converges on solutions no individual agent would reach alone.

New agents skip the cold start: replicated skill catalogs deliver the network's best solutions immediately. As @trq212 said, "skills are still underrated". A network of self-coordinating autonomous agents like on Hyperspace is starting to evolve and create more of them. With millions of such agents one day, how many high-quality skills would there be? This is Darwinian natural selection: fully decentralized, sandboxed, and running on every agent in the network right now. Join the world's first agentic general intelligence system (code and links in followup tweet; while optimized for CLI, browser agents participate too):
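Both threads describe the same core mechanic: a propose → evaluate → keep/revert loop scored on a 70/30 out-of-sample split. A minimal sketch, with an invented toy objective standing in for trading strategies or skill code:

```python
# Hill-climbing sketch of the propose/evaluate/keep-or-revert loop.
# The "strategy" here is a single weight fitting a toy target (y = 3x);
# real systems mutate factor weights or code instead.
import random

random.seed(0)
data = [(x, 3.0 * x) for x in range(100)]
random.shuffle(data)
train, test = data[:70], data[70:]          # 70/30 out-of-sample split

def score(weight, samples):
    # Negative mean squared error: higher is better.
    return -sum((weight * x - y) ** 2 for x, y in samples) / len(samples)

weight = 0.0                                # current strategy
best_train = score(weight, train)
for _ in range(200):
    candidate = weight + random.uniform(-0.5, 0.5)   # propose a mutation
    candidate_score = score(candidate, train)
    if candidate_score > best_train:                 # keep if better,
        weight, best_train = candidate, candidate_score
    # ...otherwise revert (do nothing)

holdout = score(weight, test)               # out-of-sample validation
```

The held-out score is what guards against the overfitting both posts mention: a mutation that only memorizes the training window shows up as a gap between `best_train` and `holdout`, which the overfit penalty can then punish.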

Elon Musk @elonmusk ·
Many talented people over the past few years were declined an offer or even an interview @xAI. My apologies. @BarisAkis and I are going through the company interview history and reaching back out to promising candidates.
elonmusk @elonmusk

@beffjezos xAI was not built right first time around, so is being rebuilt from the foundations up. Same thing happened with Tesla.

Elon Musk @elonmusk ·
Made with @Grok Imagine
pmarca @pmarca

This is the best thing I have ever seen.

Sudo su @sudoingX ·
yesterday i showed you the 3060 beats the 3070 for AI. today i put it to work. Qwen 3.5 9B Q4 running through Hermes Agent on an RTX 3060. 31 tools. 85 skills. browser control, file ops, terminal, code execution, persistent memory. all running locally on 7GB of a 12GB card. i have a prompt i run on every model i test. single file. full game spec. pixel art enemies, particle effects, procedural audio, boss battles, upgrade system, 4 enemy types. one message. no steering between steps. Qwen 3.5 35B on a 3090 built 3,483 lines in one pass. Hermes 4.3 choked at 970 lines after 24 minutes of compaction loops on the same card. now i am running it on 9 billion parameters. on a 3060. half the VRAM. fraction of the model. prompting it next. 1,316 tok/s prompt processing. 43 tok/s generation. 55 degrees. the card is barely trying. results and full breakdown dropping today. configs in the reply.
sudoingX @sudoingX

the RTX 3060 has more VRAM than the 3070. 12GB vs 8GB. NVIDIA gave the budget card 50% more memory than the card above it in the lineup. i tested both ceilings. the 3060 fits a 9B model at Q4 with 128K context, thinking mode on, generating at 50 tok/s. 4GB of VRAM still sitting there doing nothing. most people don't even know this. they see the number go up and assume more is more. in AI inference, VRAM is the bottleneck. not compute. not clock speed. memory. and the 3060 has more of it. best budget AI card in 2026 wasn't designed for AI. it was designed for Warzone
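The VRAM claim is easy to sanity-check with rough arithmetic, approximating Q4 as 0.5 bytes per weight and ignoring KV-cache and activation overhead (which grows with context length and accounts for the rest of the 7GB reportedly in use):

```python
# Back-of-envelope VRAM math for a 9B-parameter model.
params = 9e9
bytes_per_weight_q4 = 0.5                        # 4-bit quant ≈ half a byte
weights_gb = params * bytes_per_weight_q4 / 1e9  # ≈ 4.5 GB of weights
weights_gb_fp16 = params * 2 / 1e9               # ≈ 18 GB: fits neither card

vram_gb_3060 = 12
vram_gb_3070 = 8
headroom_3060 = vram_gb_3060 - weights_gb        # ≈ 7.5 GB for KV-cache etc.
```

The same model at fp16 overflows both cards, and even at Q4 the weights alone leave the 8GB 3070 little room for a long context; the 12GB card wins on memory, not compute, which is the point of the thread.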