Supply Chain Attacks Compromise 300+ npm Packages While Claude Code's /goal Becomes the Must-Know Command
· 50 sources
The Mini-Shai-Hulud attack campaign compromised hundreds of npm and PyPI packages including TanStack and UiPath ecosystems, with payloads that exfiltrate credentials and wipe filesystems based on geolocation. Claude Code's /goal command dominated AI coding discussions as the new standard for structured agent-driven development, and GitLab's CEO publicly declared that hand-written code may be going away.
Daily Wrap-Up
If there's one word for today in the AI and tech world, it's "exposed." The npm and PyPI ecosystems suffered what might be the most widespread supply chain attack in recent memory, with the Mini-Shai-Hulud campaign compromising over 300 packages including major TanStack libraries, UiPath SDKs, and the Mistral AI Python client. The attack was sophisticated enough to include geo-fenced destructive payloads that wipe filesystems on machines in Israel and Iran, while deliberately avoiding Russian-language systems. The broader vulnerability landscape is equally grim: 13 Next.js advisories, over 70 CVEs in macOS 26.5, Windows BitLocker bypasses, and confirmed Russian exploitation of zero-day vulnerabilities in production systems. Google even confirmed AI-powered exploitation of zero-days in an open-source web-based administration tool.
On the AI tools front, Claude Code's /goal command has clearly become the standard for structured AI-assisted development. Multiple developers shared their use cases and templates, describing the command as a kind of senior engineer that never gets tired, whether the agent behind it is Codex, Claude, or Hermes. The token optimization conversation also hit a fever pitch, with detailed breakdowns showing how senior AI engineers are cutting their bills by 70-90% through multi-model routing and context discipline. At the enterprise level, GitLab's CEO publicly announced that "authoring code by hand may be going away" alongside voluntary separation offers, while forward deployed engineers became the must-hire role for companies trying to ship AI agents into production.
The most practical takeaway for developers: audit your npm dependencies immediately. If you installed any @tanstack/* package between 19:20 and 19:30 UTC on May 12, or the mistralai PyPI package v2.4.6, treat your host as compromised. Rotate all credentials, check for persistent token monitors in your system services, and pin your dependencies to known-good versions. Then maybe install the Russian language pack, because apparently that actually stops a surprising amount of malware.
Quick Hits
@0xMovez shared a Google Cloud AI engineer demoing how to go from idea to deployed app in 30 minutes using Claude on Google Cloud.
@heylemon_ai asked if anyone is still typing prompts in 2026, hinting at the shift toward autonomous agent interactions.
@kylejeong wrote an interactive blog about Firecracker, the ~50,000 line Rust binary that boots full VMs for AWS Lambda in under 125ms.
@ivanleomk reshared the Firecracker thread, underscoring how important fast VM boot times are for agent infrastructure.
@Shpigford released a massive SEO Sprint skill that generates and executes a full SEO playbook across any tech stack using Ahrefs MCP.
@zephyr_z9 flagged the PCB and interconnect bottleneck as the biggest hardware limiter for 2026-27 as TPU v8, Rubin, and Trainium3 head toward mass production in Q4.
@marcelpociot announced Polyscope, a free agent orchestration tool with copy-on-write clones and a built-in preview browser for visual prompting.
@todayyearsold shared what might be the only acceptable use of AI: inserting yourself into Game of Thrones to fix the ending.
@dmnlaali highlighted Quirre, a tool that builds personalized marketing plans for indie founders in 60 seconds.
@bcdsignature drew a parallel between humanity's first control of fire and the current AI debate, noting we've been having the same argument for a million years.
@ID_AA_Carmack offered grounded advice on starting a game company: identify specific customers before building, start with the smallest game anyone would pay for, and plan to burn seven figures.
@DataChaz shared a 2026 AI agents playbook covering what to learn, build, and ignore entirely.
@LeronAssist promoted AI-powered video generation with auto scene stitching and audience analysis tools.
@Av1dlive shared a free 2-hour Stanford lecture on building agentic systems, calling it the highest ROI thing you could do this month.
@remondimi expressed frustration that despite providing source material, neither Codex nor Claude Code can match their writing tone, calling it the biggest remaining unlock in LLM usage.
The npm Supply Chain Crisis
The Mini-Shai-Hulud attack campaign represents a new level of sophistication in open-source supply chain compromises. @hetmehtaa published a comprehensive list of compromised npm packages spanning the entire TanStack router ecosystem (42 packages, 84 malicious versions), 68 UiPath packages, the Mistral AI client, and numerous smaller libraries. The TanStack compromise alone is devastating: malicious versions exfiltrated AWS, GCP, Kubernetes, and Vault credentials, GitHub tokens, and SSH keys through a smuggled 2.3MB payload disguised as a router initialization script.
The @tan_stack official security advisory confirmed the attack vector: an optionalDependencies entry resolved from a GitHub reference rather than the npm registry, whose prepare script runs the payload. @roerohan added a critical finding: the malware installs a persistent token monitor as a systemd service on Linux or a LaunchAgent on macOS that polls your GitHub token every 60 seconds. If the token is revoked, the monitor triggers destructive file deletion, so check for the service before revoking tokens, not after.
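The advisory's detection rule (an optionalDependencies entry resolved from git) can be approximated in a few lines. The Python sketch below flags any optionalDependencies specifier that points at a git source instead of the registry. The function name, the sample manifest, and the prefix list are illustrative assumptions rather than anything published in the advisory, and the list does not cover every form npm accepts (bare `user/repo` shorthand, for instance). A hit is a reason to investigate, not proof of compromise.

```python
import json

# Specifier prefixes that make npm fetch a dependency from git instead of
# the registry. Illustrative assumption: this list is not exhaustive.
GIT_PREFIXES = ("github:", "git+", "git://", "gitlab:", "bitbucket:")


def git_resolved_optional_deps(manifest_text: str) -> dict:
    """Return optionalDependencies whose specifier points at a git source."""
    manifest = json.loads(manifest_text)
    optional = manifest.get("optionalDependencies") or {}
    return {
        name: spec
        for name, spec in optional.items()
        if isinstance(spec, str) and spec.startswith(GIT_PREFIXES)
    }


if __name__ == "__main__":
    # Hypothetical manifest; the truncated commit reference is copied as-is
    # from the advisory text.
    sample = json.dumps({
        "name": "example-package",
        "optionalDependencies": {
            "@tanstack/setup": "github:tanstack/router#79ac49ee...",
            "fsevents": "^2.3.3",
        },
    })
    print(git_resolved_optional_deps(sample))
```

Running it over every package.json under node_modules would be the natural next step, but this sketch deliberately leaves directory traversal out.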
@lauriewired made a semi-serious but technically accurate observation: "the most low-effort / high reward thing you can do for security is installing the Russian language pack." The malware deliberately avoids Russian-language environments. @janbamjan confirmed that the current Shai-Hulud variant only checks locale environment variables, making the language pack trick a viable, if crude, defense.
@theo's security roundup painted the broader picture: Canvas LMS "pwn'd entirely," Palo Alto's PAN-OS hit with a 9.3 severity CVE, and nation-state actors exploiting Windows RCEs in the wild. @Prismor noted the security gap between what AI refuses and what it allows, observing that while Claude won't pipe a remote shell into bash, it will happily install a compromised npm package without hesitation.
In response, @WalshyDev is building a registry gateway on Cloudflare Workers that enforces cooldown periods for new package versions globally. @kevinkern created a repo hardening skill that checks for risky dependency specs, unsafe CI patterns, and supply chain vulnerabilities. @Hartdrawss cataloged 20 security sins common in vibe-coded apps, from unauthenticated admin routes to sessions that never expire.
Claude Code, /goal, and the New AI Coding Paradigm
The /goal command in Claude Code has become the defining pattern for AI-assisted development. @ClaudeDevs officially introduced /goal as a way to keep Claude working until the job is done. @jpschroeder shared a practical tip
Polyscope - the free agent orchestration tool for developers.
Run dozens of AI agents at the same time, blazing fast copy on write clones, a built-in preview browser you can use to visually prompt your agents, mobile access, and much more.
The Security Gap Between What AI Refuses and What It Allows
Claude refused to delete the filesystem. It also refused to pipe a remote shell script into bash. Those are the easy cases, and the model handled them...
Create videos of any length with auto scene stitching in 1 click + AI photo generation & editing. Powerful AI promotion: audience analysis, content testing, auto-publishing & boosting. All growth tools in one place.
Has _anyone_ figured out a prompt or harness to make LLMs write in a sane way? I swear that's the biggest unlock left in LLM usage right now. Just so frustrating that Codex and Claude Code still cannot copy my tone when given source material. Feels like it should be way easier than it is
consulting is a great and valuable industry. always has been
every company that gets big enough eventually realizes the constraint is implementation / adoption of their products or services. underneath that, human resistance to change
every company from GE to IBM to you name it has eventually developed their own internal consulting teams in addition to relying on external consultants who quickly understand new market opportunities
I think we are seeing these rapid and large scale moves into consulting (or call it FDE whatever you feel like) because of a few things:
1. downside risk on the capital deployed is enormous. these bets, while large are tiny % compared to total capex across the biggest players
2. The speed of adoption of the models is rapid, but the speed of turning tokens into dollars for businesses will likely not be as rapid. The success of the model companies relies on turning tokens into dollars (Either in opex savings or increased profits) as fast as possible. Consulting helps activate more efficient capital/labor/token allocation across enterprises
consulting has been underrated by tech for years. it is a highly market efficient industry and the best firms adapt much much faster than traditional companies
I happened to be working at McKinsey in 2008 during the global financial crisis. from september 2008 to december 2008, the firm had pivoted almost every global practice to responding to the crisis. endless new service lines were created, new capabilities were developed. many also flopped or didnt go anywhere but large-scale change efforts became a much bigger business
over the next few years, the big consulting firms will continue to thrive - mck, bain, bcg but also accenture, and other IT oriented firms.
They have the labor at scale who understand change and implementation. Most people's model of consulting is based on a caricature of a 1980s strategy case, with a report at the end of the project handed to the board. that is no longer the case. McKinsey is a full service firm with a digital arm, an implementation arm, analytics and IT solutions, etc.
the scale of these firms continues to astound me. They will only get bigger. The core "service" they offer is a 3-5 person team and since each 3-5 person team can work on different things and iterate on each project, these firms are highly adaptable.
But in a growing market there is room for more, and who benefits? well, look at the investors here:
"Investors also include leading consulting and systems integration firms, including Bain & Company, Capgemini, and McKinsey & Company."
long consulting
OpenAI (@OpenAI)
Today we’re launching the OpenAI Deployment Company to help businesses build and deploy AI.
It's majority-owned and controlled by OpenAI. It brings together 19 leading investment firms, consultancies, and system integrators to help organizations deploy frontier AI to production for business impact. https://t.co/GnyjGFaLLA
A malicious payload was found that installs a persistent token monitor as a systemd/LaunchAgent service. It polls your GitHub token every 60s - if revoked, it triggers destructive file deletion.
You should verify if you're affected BEFORE revoking your token:
Linux:
ls ~/.local/bin/gh-token-monitor.sh
systemctl --user list-units | grep gh-token-monitor
macOS:
ls ~/Library/LaunchAgents/ | grep com.user.gh-token-monitor
If found, disable the service first, then revoke.
https://t.co/b9Bz38mfJ4
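For anyone who wants to script the file checks above across several machines, here is a minimal Python sketch. It looks only for the two file indicators named in the commands; it does not query systemd, so the `systemctl --user` check still has to be run separately on Linux. The function name and the glob pattern (which mirrors the `grep` substring match) are assumptions for illustration.

```python
from pathlib import Path

# Indicator paths taken from the commands quoted above, relative to the
# user's home directory. The glob mirrors the grep substring check.
LINUX_SCRIPT = ".local/bin/gh-token-monitor.sh"
MACOS_AGENT_DIR = "Library/LaunchAgents"
MACOS_AGENT_GLOB = "*com.user.gh-token-monitor*"


def find_token_monitor(home: Path) -> list:
    """Return any token-monitor indicator files found under `home`."""
    hits = []
    script = home / LINUX_SCRIPT
    if script.exists():
        hits.append(script)
    agent_dir = home / MACOS_AGENT_DIR
    if agent_dir.is_dir():
        hits.extend(sorted(agent_dir.glob(MACOS_AGENT_GLOB)))
    return hits


if __name__ == "__main__":
    found = find_token_monitor(Path.home())
    if found:
        print("Indicators found; disable the service before revoking tokens:")
        for path in found:
            print(f"  {path}")
    else:
        print("No token-monitor files found at the known paths.")
```

An empty result only means these specific paths are clean; a variant that uses different file names would not be caught.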
tan_stack (@tan_stack)
SECURITY ADVISORY — TanStack npm packages
A supply-chain compromise affecting 42 @tanstack/* packages (84 versions total) was published to npm earlier today at approximately 19:20 and 19:26 UTC. Two malicious versions per package.
Status: ACTIVE — packages are deprecated, npm security engaged, publish path being shut down.
Severity: HIGH — payload exfiltrates AWS, GCP, Kubernetes, and Vault credentials, GitHub tokens, .npmrc contents, and SSH keys.
If you installed any @tanstack/* package between 19:20 and 19:30 UTC today, treat the host as potentially compromised:
• Rotate cloud, GitHub, and SSH credentials immediately
• Audit cloud audit logs for the last several hours
• Pin to a prior known-good version and reinstall from a clean lockfile
Detection — the malicious manifest contains:
"optionalDependencies": {
"@tanstack/setup": "github:tanstack/router#79ac49ee..."
}
Any version with this entry is compromised. The payload is delivered via a git-resolved optionalDependency whose prepare script runs router_init.js (~2.3 MB, smuggled into each tarball at the package root).
Unpublish is blocked by npm policy for most affected packages due to existing third-party dependents. All 84 versions are being deprecated with a SECURITY warning, and npm security has been engaged to pull tarballs at the registry level.
Full technical breakdown, complete package and version list, and rolling status updates:
https://t.co/Zy8qG7PA9f
Credit to the security researcher for responsible disclosure.
20 things that make your VIBE CODED app a SINKING SHIP :
1/ no rate limiting on API routes
> anyone can spam your backend into a $500 bill overnight
2/ auth tokens stored in localStorage
> one XSS attack = every single user account compromised
3/ no input sanitisation on forms
> SQL injection still works in 2026. your AI didnt tell you that.
4/ hardcoded API keys in the frontend
> someone WILL find them within 48 hours of launch
5/ stripe webhooks with no signature verification
> anyone can fake a successful payment event
6/ no database indexing on queried fields
> works fine at 100 users. completely dies at 1,000.
7/ no error boundaries in the UI
> one crash = white screen = user never comes back
8/ sessions that never expire
> stolen token = permanent access to that account. forever.
9/ no pagination on database queries
> one fetch loads your entire database into memory
10/ password reset links that dont expire
> old email in someones inbox = instant account takeover
11/ no environment variable validation at startup
> app silently breaks in production with zero error message
12/ images uploaded directly to your server
> no CDN = 8 second load times + massive hosting bill
13/ no CORS policy
> any website on the internet can make requests to your API
14/ emails sent synchronously in request handlers
> one slow SMTP server = your entire app hangs
15/ no database connection pooling
> first traffic spike = database crashes
16/ admin routes with no role checks
> any logged in user can access your admin panel
17/ no health check endpoint
> your app goes down silently. you find out from a client.
18/ no logging in production
> when something breaks you have zero idea where or why
19/ no backup strategy on your database
> one bad migration = all your user data. gone.
20/ no TypeScript on AI generated code
> AI writes confident, wrong, untyped code and you ship it anyway
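Item 5 is one of the easier ones to fix. Below is a minimal Python sketch of webhook signature verification, assuming the timestamped HMAC-SHA256 scheme Stripe documents for its `Stripe-Signature` header (`t=<timestamp>,v1=<hex digest>` computed over `<timestamp>.<raw body>`). The function name, the `now` parameter, and the 300-second tolerance are choices made for this example. In a real app the official library's `stripe.Webhook.construct_event` is the safer route; this only shows what that check amounts to.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(payload: bytes, header: str, secret: str,
                   tolerance_s: int = 300,
                   now: Optional[float] = None) -> bool:
    """Check a `t=<ts>,v1=<hex>` signature header against the raw body."""
    timestamp = None
    signatures = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            signatures.append(value)
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False

    # Reject stale events so a captured request cannot be replayed later.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return any(
        hmac.compare_digest(expected.encode(), sig.encode())
        for sig in signatures
    )
```

The key detail is verifying against the raw request bytes: re-serializing parsed JSON before hashing will usually change the payload and make valid signatures fail.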
Agent sprawl has become a real concern for many leaders I talk with. Agents are popping up across the company without shared context, clear ownership, consistent guardrails, or a reliable way to know which ones are actually creating value.
The next phase of enterprise AI will be defined less by agent creation and more by agent operations, where testing, versioning, monitoring, and governance are built into the system from the start.
At @Glean, we think about that through the Agent Development Lifecycle (ADLC). It is a practical model for how enterprises move from promising demos to agents that are grounded in the right context, launched with the right controls, and improved over time.
Alongside the ADLC, we’re announcing new product capabilities designed to support that lifecycle end-to-end: from auto-mode agents and sub-agents to agent sandbox, agent library, agent access policies, and agent insights.
In the enterprise, success won’t come from building the most agents. It will come from building agents you can trust, govern, and improve over time.
glean (@glean)
Enable every agent to drive ROI with a robust agent development lifecycle
If you’re building AI agents and haven’t watched this Anthropic talk yet, you’re already behind.
In 22 minutes, Claude’s team exposed where the entire industry is heading next:
→ tool orchestration
→ memory systems
→ observability
→ long-running agents
→ production infrastructure
Most developers are still focused on demos.
Anthropic is building for autonomous systems at scale.
The last few minutes are the real gold 👇
Watch the full talk first.
Then read my complete roadmap on becoming an AI Agent Engineer in 2026 if you want to build what the market will actually need next.
sairahul1 (@sairahul1)
How to Become an AI Agent Engineer in 2026 — The Complete Roadmap
You can now cut Claude Code's tool calls by 94% with just one command.
This MCP server indexes your codebase into a local knowledge graph upfront. The agent queries the graph instead of scanning files.
Supports 19+ languages, runs fully local, no API keys.
100% Open Source.
I've been working on a registry-gateway built on @CloudflareDev Workers due to previous compromises like this.
This is reinforcing my belief that every company will need to have this in the future.
The gateway will:
* Enforce a cooldown period for new versions (similar to how pnpm does it but at a gateway level this enforces it globally and supports ALL package managers)
* Allow blocking packages or package prefixes
* Log all downloads
* Clone all packages into R2 - this is to avoid any package being replaced and compromised that way. We know byte for byte this will not change (while I don't believe any registries allow this anymore, it's defence in depth)
My gateway currently works for npm and Golang is mostly done now too.
Rust is next up.
I truly believe the future is Enterprises all having their own registry gateways and enforcing security that way.
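The cooldown idea is simple enough to sketch. The Python below is not the gateway's code, which the thread does not show; it is a hypothetical filter that assumes publish timestamps arrive in the shape of the npm registry's `time` map (version to ISO 8601 timestamp, plus `created` and `modified` keys). The function name and parameters are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def versions_past_cooldown(publish_times: dict, now: datetime,
                           cooldown: timedelta) -> list:
    """Return versions published at least `cooldown` before `now`."""
    allowed = []
    for version, stamp in publish_times.items():
        if version in ("created", "modified"):
            continue  # bookkeeping keys mixed in with the versions
        # Older Python versions do not parse a trailing "Z", so normalize it.
        published = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if now - published >= cooldown:
            allowed.append(version)
    return allowed


if __name__ == "__main__":
    # Hypothetical metadata: one old release and one published hours ago.
    times = {
        "created": "2024-01-01T00:00:00.000Z",
        "modified": "2024-06-01T19:20:00.000Z",
        "1.0.0": "2024-01-01T00:00:00.000Z",
        "1.0.1": "2024-06-01T19:20:00.000Z",
    }
    now = datetime(2024, 6, 2, 0, 0, tzinfo=timezone.utc)
    print(versions_past_cooldown(times, now, timedelta(days=7)))
```

With a seven-day window, a release that was live for only minutes before being pulled would never have been served, which is the whole point of enforcing the delay at the gateway rather than per developer.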
The top Hermes integrations to give your agent superpowers:
1. Firecrawl
Basically web search built for agents.
It's better than the native Hermes web search because it gives you clean web data, so responses come back faster and use fewer tokens.
I keep this on by default.
2. Browserbase
Gives Hermes browser access for actually interacting with sites.
Logging in, clicking buttons, booking stuff, anything that needs a real browser session.
Hermes will automatically pick between Firecrawl and Browserbase depending on what the task needs, so you just plug both in.
3. Google Workspace
Gmail, Calendar, Drive, Docs, and Sheets in one connector.
If Hermes can't read your inbox, see your calendar, or write to your docs, it can't really work for you. Plug this in first.
4. Reddit
The best signal you'll find on what people actually think about any product, niche, or problem (bc its real opinions from real users)
Amazing for market research.
5. YouTube transcripts
Pulls captions from any video. Long podcasts, tutorials, interviews etc become searchable notes in seconds.
Probably the highest-leverage research integration nobody plugs in.
6. Discord
I host my business in Discord, so this one's huge for me.
I plug Hermes into different channels and have it run specific workflows in each.
Example: I have a dedicated customer support channel where Hermes scans my email every morning for support tickets and drops them in organized.
7. GitHub
Code, issues, PRs. Turns Hermes into an actual engineering teammate.
Non-negotiable if you write code.
8. Stripe
Payments, customers, failed charges, refunds.
You can just ask "why did this customer churn" and get a real answer.
Also can't wait for this...Stripe is releasing agentic payments, so soon Hermes will be able to actually book stuff with your card.
9. Bland (or Twilio)
Gives Hermes a voice so it can place real phone calls (like booking reservations etc).
I love listening to the recordings haha
10. Apify
Pre-built scrapers for X, LinkedIn, Instagram, Google Maps, etc. The way to get X data without paying $5k/mo for the official API.
11. Readwise
Every highlight you've ever saved from books, articles, tweets, and podcasts, all queryable. Solves the "dead knowledge" problem.
12. Granola (or Fathom)
Searchable transcripts of every meeting you've had. Hermes can answer "what did that client say about pricing last month" instantly.
13. Obsidian
For Karpathy LLM wiki second-brain maxxing.
If I had to set up only 5, I'd do Firecrawl, Browserbase, Google Workspace, GitHub, and Obsidian.
Covers ~80% of what most people need.
I use Composio to add these in one click, makes setup basically zero effort instead of messing w technical stuff.
Anything I'm missing?? What's in your stack?
New Anthropic repo: Claude for Legal
It gives legal teams prebuilt AI workflows for contracts, privacy, employment, litigation, corporate work, IP, AI governance, regulatory monitoring, legal clinics, and law students
https://t.co/Ddu5s5qJUV https://t.co/rYHoAvqgTi
🚨 Karpathy was right. He warned that 90% of AI advice dies in 6 months.
spoiler: most tools won't even survive 90 days.
this guy is literally giving away the exact 2026 playbook for AI Agents.
he covers exactly what to learn, build, and ignore entirely 👀
↓ read this today https://t.co/yKSWKgfXEA
rohit4verse (@rohit4verse)
What to Learn, Build, and Skip in AI Agents (2026)
1/ Yesterday I published a letter to our customers and investors about GitLab Act 2.
The agentic era is the largest opportunity in our history. We're making the structural and strategic decisions to meet it.
A thread on what changes, what doesn't, and what we're betting on. 👇
https://t.co/y6IOeD7CcH
My reply to someone considering starting a video game company:
The distribution of possible rewards for starting a video game company are generally not very good today. The market is well served, and gaining a foothold requires strong execution on both business and product issues, along with a substantial amount of luck. Plan to burn through seven figures with a not-great chance of making it back.
If you do go for it, some bits of advice:
Identify your customers clearly before you start. Not just a broad community, but specific people, and imagine them as you make decisions.
Initially, build the smallest, most concise game you can imagine anyone paying for. It will still take much longer than you expect.
Once something exists, hill-climb the value. Hopefully you will have some elements that clearly bring joy to people, which you can magnify. There will inevitably be tons of things that people find confusing, frustrating, or just boring that you will need to fix.
the most low-effort / high reward thing you can do for security is installing the Russian language pack
(not even joking, it's ridiculous how often that prevents execution) https://t.co/wQ4res5DCA
MsftSecIntel (@MsftSecIntel)
Microsoft is investigating mistralai PyPI package v2.4.6 compromise. Attackers injected code in mistralai/client/__init__.py that executes on import, downloads hxxps://83[.]142[.]209[.]194/transformers.pyz to /tmp/transformers.pyz, and launches a second-stage payload on Linux. The file name transformers.pyz appears deliberately chosen to mimic the widely used Hugging Face Transformers library and blend into ML/dev environments.
The main payload is a credential stealer, but it also includes country-aware logic: it avoids Russian-language environments and contains a geo-fenced destructive branch that has a 1-in-6 chance of executing rm -rf / when the system appears to be in Israel or Iran.
To mitigate this threat: isolate affected Linux hosts, block 83[.]142[.]209[.]194, hunt for /tmp/transformers.pyz, pgmonitor[.]py, and pgsql-monitor.service, and rotate exposed credentials.
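The hunting steps in that advisory can be partly scripted. The Python sketch below searches directory trees for the three file names Microsoft lists and checks whether the flagged mistralai version is installed. It reads package metadata through `importlib.metadata` instead of importing the package, since the advisory says the injected code runs on import. The function names are hypothetical, and a file-name match outside /tmp may well be a false positive.

```python
import os
from importlib import metadata
from pathlib import Path

# File names from the advisory quoted above (defanged dots restored).
IOC_NAMES = {"transformers.pyz", "pgmonitor.py", "pgsql-monitor.service"}
FLAGGED_VERSION = "2.4.6"  # mistralai release named in the advisory


def hunt_iocs(roots: list) -> list:
    """Return paths under `roots` whose file name matches a listed indicator."""
    hits = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name in IOC_NAMES:
                    hits.append(Path(dirpath) / name)
    return sorted(hits)


def mistralai_is_flagged() -> bool:
    """Check the installed mistralai version without importing the package."""
    try:
        return metadata.version("mistralai") == FLAGGED_VERSION
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("flagged mistralai installed:", mistralai_is_flagged())
    for hit in hunt_iocs([Path("/tmp")]):
        print("indicator:", hit)
```

This covers only the file and version indicators; blocking the listed IP address and rotating credentials are separate steps that no local scan replaces.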
Code is actually the right abstraction.
Too often I see the future of software engineering diminished down to, effectively, writing and reviewing markdown files.
Yes, it will be hard to review thousands of lines of agent code. But maybe the takeaway is that you want less code?
Rather than just giving up ("well I guess we won't read the code, or we'll read this lossy markdown summary") this should be a signal forcing you to think about better systems.
- How can we make our codebase more verifiable? For example, fast/robust/stable tests, or moving to a typed language.
- How can we deslop or improve the architecture/abstractions of the code generated by agents? For example, spending more time up front on the codebase architecture/types before yolo generating all of the code.
- How are we going to maintain and evolve this codebase over time? The slop compounds. One great solution here is... you guessed it, learning from the past decades of software engineering! For example, you might just have the wrong abstraction entirely, leading to a ton of duplicated code.
I think the markdown folks *are* right in some ways. If you are using skills every day, for many different prompts and workflows, isn't that effectively "coding with markdown"? Kinda.
There's been plenty of ink spilled on the merits and benefits of skills. To me, skills make your style of working legible for agents. They don't replace code and that's not really the point.
In reality, there's this messy and constantly re-evolving future in which both of these things are true:
1. Skills (and markdown) are important for how you give input to the agents and ensure high-quality code & systems are created
2. Looking at the actual code will not be replaced by markdown summaries or a collection of spec documents that ignore the lower level details of the code
In summary: reality has a surprising amount of detail (and nuance)!
Google Cloud AI engineer just showed how they go from idea to deployed app at Google in 30-minutes using Claude.
26-minutes. free. by Google AI team.
one person + Claude + Google Cloud = a full engineering org running on a laptop.
worth more than any $500 vibe-coding course. https://t.co/TJak54zf4X
0xCodez (@0xCodez)
Anthropic's Claude team just showed how to build an AI agent with real memory in under 30 minutes.
24-minutes. free. by the people who built Claude.
one person + 10 agents with memory = a team that runs 24/7, remembers every customer, and improves itself.
worth more than a $500 vibe-coding course.
Andrej Karpathy: "90% of your AI coding bill is paying for context you didn't need to send"
Here are 10 things senior AI engineers stopped wasting tokens on:
1. Auto-context loading 50 files for a 30-line fix: $1.20/turn for tokens you'll never read. 80% input waste, every session
2. Running Opus on lint, format, and rename tasks: $0.60 for what Haiku nails at $0.02. 30x overpay on the cleanup tier
3. Tool call loops that re-send the full repo on every retry: 5x context cost per agentic flow. fixing these alone cuts 30-50% of bills
4. Sonnet as the default model: Kimi 2.6 matches its quality on most coding tasks at 1/6 the cost. defaulting to Sonnet in 2026 is leaving 60-70% on the table
5. Streaming responses on stable-prefix workflows: kills your prompt cache. you pay 10x for tokens that should have cost cents
6. "Just in case" file includes: 80,000-token prompts that should be 3,000. context bloat is the silent budget killer
7. Per-session knowledge rebuilding: 10 min writing a SKILL.md once vs paying agents to re-figure out your environment every run. $4 vs $0.30 per execution
8. Single-model setups: premium tier on every task is the most expensive mistake in AI coding right now
9. Asking 10 small questions one at a time: 10 separate input prefix charges vs one batched call. 70-90% savings on routine workflows
10. Buying Claude Pro + ChatGPT Plus + Cursor Pro: you seriously use one. the other two are habit, not utility
what actually compounds instead:
- context discipline (grep before fetching, always)
- prompt caching on every stable prefix
- multi-model routing (Kimi 2.6 default, Opus for the 10%)
- graduated skills via SKILL.md files
- profiling tool calls before optimizing prompts
- the routing mindset (right model for right task)
in 12 months, the gap between developers shipping on $200/month and $4,000/month budgets won't be skill
it'll be how well they route
study this.
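The batching claim in item 9 is easy to sanity-check with arithmetic. The Python sketch below counts input tokens only, using made-up sizes (a 3,000-token shared prefix and ten 100-token questions). It ignores output tokens, prompt caching, and any quality cost from batching, so treat the result as an upper-bound illustration rather than a measured saving.

```python
def separate_input_tokens(prefix: int, question: int, n: int) -> int:
    """Input tokens when each question is its own call with the full prefix."""
    return n * (prefix + question)


def batched_input_tokens(prefix: int, question: int, n: int) -> int:
    """Input tokens when all questions share one call and one prefix."""
    return prefix + n * question


def input_savings(prefix: int, question: int, n: int) -> float:
    """Fraction of input tokens saved by batching."""
    separate = separate_input_tokens(prefix, question, n)
    return 1 - batched_input_tokens(prefix, question, n) / separate


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(separate_input_tokens(3000, 100, 10))   # 31000
    print(batched_input_tokens(3000, 100, 10))    # 4000
    print(f"{input_savings(3000, 100, 10):.0%}")  # 87%
```

The saving shrinks as the questions grow relative to the prefix: with 3,000-token questions and the same prefix, batching ten of them saves 45% of input tokens instead of 87%.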
I'm giving away a FULL course on how to build a managed AI agent business solo using Hermes Agent, Orgo, Obsidian, Codex, Claude Code etc.
Here's everything (47 minutes):
1. The offer: unlimited agents, unlimited usage, all infrastructure and security included. The customer gets a digital employee. They never think about tokens or models. You handle everything.
2. Don't niche down too fast. Try marketing agencies, law firms, insurance, manufacturing, real estate. See where the market pulls you. Then go vertical. Diverge first, converge later.
3. Every executive has the same problems regardless of industry. Too many emails, too many meetings, too many follow-ups, too many open loops. Solve those first. Then layer in vertical-specific skills.
4. The stack: Hermes Agent for the agent harness. Codex or Claude Code desktop to build and configure. Orgo for cloud computers so every agent lives in its own sandbox. Composio for one-click authentication across thousands of apps. Agent Mail to give every agent its own email. Obsidian for the knowledge base.
5. Use agents to build agents. Don't stress about setup. Use Claude Code or Codex to install and configure Hermes inside a VM. Use Perplexity MCP, Context7, and Exa for up-to-date docs. Your agent sets up your customer's agents.
6. GPT 5.5 is the best model right now. Efficient with tool calls. Doesn't eat tokens like Opus 4.7. For cheaper tasks, GLM 5.1 from ZAI is the best open source option.
7. Set up watchdogs for gateway crashes so they auto-restore. Have agents email you when cron jobs break or skills fail. Your customer should never have to tell you something is broken.
8. Get customers through content. If someone jumps on a call and already knows who you are and what you sell, that's the position you want. Content is the most leveraged thing you can do in 2026.
9. Keep scope tight. One to two requests at a time, delivered in under 48 hours. Use Trello for customer-facing project management. Send Loom updates at random hours to show you're always working on their agents.
10. If you can set up Claude Code, Hermes, or OpenClaw, you have a skill that 99% of business owners don't have and would pay $5k/month for. You're probably not giving yourself enough credit.
shoutout to @nickvasiles from @orgodotai for coming back on @startupideaspod and sharing the full playbook. tools, stack, fulfillment, everything.
this type of episode isn't shared anywhere on the internet. this is the alpha people keep for themselves.
i will keep sharing if you keep watching.
you could watch netflix or you can watch this (link below)
https://t.co/Z4PM5I7d0S
WTF is a forward deployed engineer? (and why everyone is hiring them)
This role has been called “the hottest job in startups” according to a16z, with 800% growth in 2025, and it's still a hot topic with solid trajectory ...
Every AWS Lambda invocation runs in a full VM that boots in under 125ms.
Firecracker is the ~50,000 line Rust binary that makes that possible.
I wrote an interactive blog about it, with components you can play with. https://t.co/0X88Mv5ZKo
kylejeong (@kylejeong)
What is Firecracker, and why do all the Agent Infra companies care about it?
Just dropped an absolutely gargantuan new /skill on @initialcommitco: SEO Sprint.
https://t.co/QF6pGR7SJf
Run it in your codebase to generate a full SEO playbook AND actually execute the playbook inside your app.
This is probably the biggest skill I've ever written. 23 files involved. Works across any tech stack.
You'll really want an @ahrefs account for this (for the Ahrefs MCP). Not absolutely necessary but you'll get drastically better outcomes with it.
Most indie founders spend 80% of their time building and 0% marketing.
Then wonder why no one shows up.
Quirre builds you a personalised marketing plan in 60 seconds.
Then helps you execute it — copy, channels, timing — one step at a time.
What actually is GBrain?
(Y Combinator CEO's personal agent brain)
Every agent memory tool you've seen solves a simple problem: store facts, retrieve facts.
GBrain solves a different one. It gives your agent a knowledge system that wires itself, enriches itself, and compounds while you're not even using it.
Here's what makes it fundamentally different from Mem0, Zep, LangMem, or a CLAUDE.md file.
The standard approach to agent memory is vector-based. Your agent stores memories as embeddings, retrieves them by semantic similarity, and that's the loop. Some tools add a knowledge graph on top.
GBrain flips the model entirely. The source of truth is a folder of markdown files. One page per person, one page per company, one page per concept. Every page follows the same two-part structure:
𝗖𝗼𝗺𝗽𝗶𝗹𝗲𝗱 𝘁𝗿𝘂𝘁𝗵 on top: your current best understanding, rewritten as new evidence arrives
𝗧𝗶𝗺𝗲𝗹𝗶𝗻𝗲 on the bottom: an append-only evidence trail that never gets edited
This is not a vector store with a markdown export. The markdown IS the system of record. You can open it in VS Code, edit it by hand, and 𝗴𝗯𝗿𝗮𝗶𝗻 𝘀𝘆𝗻𝗰 picks up the changes.
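To make the two-part page layout concrete, here is a minimal sketch in Python. It is not GBrain's code: the `Page` class, its method names, and the `## Compiled truth` / `## Timeline` headings are assumptions chosen for illustration. It only shows the described contract, where the top section is rewritten and the bottom section is append-only.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Page:
    """Hypothetical in-memory model of one two-part markdown page."""
    title: str
    compiled_truth: str = ""
    timeline: list[str] = field(default_factory=list)

    def add_evidence(self, day: date, note: str) -> None:
        # Timeline entries are only ever appended, never edited.
        self.timeline.append(f"- {day.isoformat()}: {note}")

    def rewrite_truth(self, summary: str) -> None:
        # The compiled truth is replaced wholesale as understanding changes.
        self.compiled_truth = summary

    def to_markdown(self) -> str:
        # Heading names are an assumption, not GBrain's actual format.
        lines = [
            f"# {self.title}",
            "",
            "## Compiled truth",
            self.compiled_truth,
            "",
            "## Timeline",
            *self.timeline,
        ]
        return "\n".join(lines) + "\n"
```

Because the output is plain markdown, a file written this way stays readable and hand-editable in any editor, which is the property the post emphasizes.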
Now the part that makes this compound.
Every time a page is written, GBrain extracts entity references and creates typed relationship links: 𝘄𝗼𝗿𝗸𝘀_𝗮𝘁, 𝗶𝗻𝘃𝗲𝘀𝘁𝗲𝗱_𝗶𝗻, 𝗳𝗼𝘂𝗻𝗱𝗲𝗱, 𝗮𝘁𝘁𝗲𝗻𝗱𝗲𝗱, 𝗮𝗱𝘃𝗶𝘀𝗲𝘀. All deterministic, all regex-based, zero LLM calls.
The knowledge graph wires itself on every single write, without spending tokens.
So when you ask "who works at Acme AI?" or "what has Bob invested in this quarter?", the agent walks the graph instead of relying on vector similarity (which struggles with relational queries like these).
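The regex-then-graph idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `[[wikilink]]` entity syntax, the three patterns, and the function names are hypothetical and not taken from GBrain. The sketch only shows that typed links can be extracted deterministically and that a relational question then becomes a dictionary lookup rather than a similarity search.

```python
import re
from collections import defaultdict

# Hypothetical patterns: each maps one phrasing to a typed link.
# The [[...]] entity syntax is an assumption, not GBrain's format.
PATTERNS = {
    "works_at": re.compile(r"\[\[([^\]]+)\]\] works at \[\[([^\]]+)\]\]"),
    "invested_in": re.compile(r"\[\[([^\]]+)\]\] invested in \[\[([^\]]+)\]\]"),
    "founded": re.compile(r"\[\[([^\]]+)\]\] founded \[\[([^\]]+)\]\]"),
}


def extract_links(text):
    """Return (source, relation, target) triples found in one page."""
    links = []
    for relation, pattern in PATTERNS.items():
        for source, target in pattern.findall(text):
            links.append((source, relation, target))
    return links


def build_graph(pages):
    """Index triples by (relation, target) so lookups need no model call."""
    graph = defaultdict(set)
    for text in pages:
        for source, relation, target in extract_links(text):
            graph[(relation, target)].add(source)
    return graph


def who(graph, relation, target):
    """Answer questions such as 'who works at Acme AI?' by walking the index."""
    return sorted(graph.get((relation, target), set()))
```

With two pages mentioning Alice and Carol working at Acme AI, `who(graph, "works_at", "Acme AI")` returns both names regardless of how semantically similar the pages are to the question.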
Search layers ~20 deterministic techniques in concert: intent classification, multi-query expansion, vector search, keyword search, reciprocal rank fusion, cosine re-scoring, compiled-truth boosting, and backlink ranking. Each catches what the others miss.
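Of the techniques listed, reciprocal rank fusion is the one that merges the separate result lists. A minimal sketch of the standard formula follows, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant `k=60` is the commonly used default and an assumption here; the post does not say what value GBrain uses.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

A document that ranks well in both the vector list and the keyword list beats one that tops only a single list, which is how each retriever can catch what the others miss.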
But the real unlock is the compounding loop.
GBrain has a 𝘀𝗶𝗴𝗻𝗮𝗹 𝗱𝗲𝘁𝗲𝗰𝘁𝗼𝗿 that fires on every message and captures entities in the background. Person mentioned once? They get a stub page. Three mentions across different sources? Web enrichment kicks in. After a meeting? Full pipeline.
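The escalation rule described above can be sketched as a small counter. This is a hypothetical illustration, not GBrain's detector: the class name, the action strings, and the reading of "three mentions across different sources" as three distinct sources are all assumptions.

```python
from collections import defaultdict


class MentionTracker:
    """Hypothetical tracker: stub on first mention, enrich at N distinct sources."""

    def __init__(self, enrich_after=3):
        self.enrich_after = enrich_after
        self.sources = defaultdict(set)  # entity -> set of sources seen

    def observe(self, entity, source):
        seen = self.sources[entity]
        first_time = not seen
        already_enriched = len(seen) >= self.enrich_after
        seen.add(source)
        if first_time:
            return "create_stub"
        # Fire enrichment exactly once, when the threshold is first crossed.
        if not already_enriched and len(seen) >= self.enrich_after:
            return "web_enrich"
        return "none"
```

Repeated mentions from the same source do not advance the counter in this sketch, so a single chatty thread cannot trigger enrichment on its own.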
The agent runs a 𝗱𝗿𝗲𝗮𝗺 𝗰𝘆𝗰𝗹𝗲 overnight: scans conversations, enriches missing entities, fixes broken citations, consolidates memory. You wake up and the brain is smarter than when you went to bed.
This is fundamentally different from memory systems that only store what you explicitly tell them to store.
Garry Tan (President and CEO of Y Combinator) built this to run his actual AI agents. It ships with 34 skills, runs on embedded PGLite (no server, ready in 2 seconds), and works as an MCP server for Claude Code, Cursor, and Windsurf.
GBrain: https://t.co/11T8Wp95RK
after these supply-chain incidents, I summarized some basic repo hardening checks into a skill.
It checks the repo for
- pnpm 11+ package manager policy
- release-age gates and lockfile hardening
- risky dependency specs like latest, git, http, file:
- unreviewed dependency lifecycle scripts
- unsafe CI install, cache, publish, and secret patterns
- optional npm supply-chain incident
practical first pass for finding common repo hardening gaps. I ran it with GPT-5.5 High.
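One of the checks in that list, flagging risky dependency specs, is easy to sketch. The following is not the skill's implementation; the function name and the exact prefix list are assumptions. It reads a `package.json` and reports any dependency pinned to `latest` or pulled from a git, http(s), or `file:` source instead of a registry version range.

```python
import json

# Prefixes treated as risky; "https:" and "github:" are added by assumption
# as the same class of non-registry source as the git/http specs in the post.
RISKY_PREFIXES = ("git+", "git:", "github:", "http:", "https:", "file:")
SECTIONS = ("dependencies", "devDependencies", "optionalDependencies")


def risky_specs(package_json_text):
    """Return (section, name, spec) for every dependency with a risky spec."""
    manifest = json.loads(package_json_text)
    findings = []
    for section in SECTIONS:
        for name, spec in manifest.get(section, {}).items():
            spec = spec.strip()
            if spec == "latest" or spec.startswith(RISKY_PREFIXES):
                findings.append((section, name, spec))
    return findings
```

A check like this only covers the manifest. Lockfile hardening, lifecycle scripts, and CI patterns from the list above would each need their own pass.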
been building a list of the best /goal use cases. here’s 23 you can use:
1. complex refactors
2. architecture cleanup
3. auth flow consolidation
4. state management consolidation
5. SDK wrapper consolidation
6. npm supply chain hardening
7. design system enforcement
8. component library standardization
9. typescript strictness fixes
10. test suite hardening
11. CI/CD pipeline triage
12. dependency upgrade migrations
13. schema migration safety review
14. routing/navigation refactor
15. performance optimization pass
16. accessibility audit/fix pass
17. security audit/remediation
18. error handling standardization
19. internationalization/localization wiring
20. platform migration (web/iOS/Android)
21. documentation generation
22. onboarding/architecture map creation
23. monorepo restructuring
/goal is the closest thing we have to a senior engineer that never gets tired… and it works in Codex, Claude, and Hermes too.
what's missing from this list? I’ll add it
kloss_xyz (@kloss_xyz)
/goal is the best command in Codex, Claude Code, and Hermes right now.
And most are using it wrong.
They write "make no mistakes".
And pray.
Here's how to structure yours for a mission, to rank your uncertainties before acting, to kill scope creep, and to close every loop other prompts leave open.
/goal prompt [structure below]
GOAL:
<single clear measurable outcome; one mission only>
CONTEXT:
<repo/files/architecture/current state>
<known assumptions, dependencies, and relevant prior decisions>
CONSTRAINTS:
<what must not change>
<required standards/patterns>
<forbidden files/actions if any>
PRIORITY: (optional)
1. <highest priority>
2. <secondary priority>
3. <tertiary priority>
PLAN:
<understand first, then act>
<restate understanding before executing non-trivial changes>
<prefer minimal sufficient changes over broad rewrites>
DONE WHEN:
<verifiable completion state>
<expected behavior preserved or improved>
VERIFY:
<tests/build/lint/typecheck/manual validation>
<state what could not be verified and why>
<include rollback/containment plan for destructive or high-risk changes>
OUTPUT:
<concise summary/docs/audit/results>
<changed files, key decisions, risks, and follow-ups>
STOP RULES:
<halt on high-impact ambiguity or risk; do not invent architecture, behavior, or requirements>
<surface uncertainties together with ranked highest-confidence proposals before acting; not open-ended clarification questions>
<do not continue expanding scope after the goal is satisfied>
The top Hermes integrations to give your agent superpowers:
1. Firecrawl
Basically web search built for agents.
It's better than the native Hermes web search because it gives you clean web data, so responses come back faster and use fewer tokens.
I keep this on by default.
2. Browserbase
Gives Hermes browser access for actually interacting with sites.
Logging in, clicking buttons, booking stuff, anything that needs a real browser session.
Hermes will automatically pick between Firecrawl and Browserbase depending on what the task needs, so you just plug both in.
3. Google Workspace
Gmail, Calendar, Drive, Docs, and Sheets in one connector.
If Hermes can't read your inbox, see your calendar, or write to your docs, it can't really work for you. Plug this in first.
4. Reddit
The best signal you'll find on what people actually think about any product, niche, or problem (bc it's real opinions from real users)
Amazing for market research.
5. YouTube transcripts
Pulls captions from any video. Long podcasts, tutorials, interviews etc become searchable notes in seconds.
Probably the highest-leverage research integration nobody plugs in.
6. Discord
I host my business in Discord, so this one's huge for me.
I plug Hermes into different channels and have it run specific workflows in each.
Example: I have a dedicated customer support channel where Hermes scans my email every morning for support tickets and drops them in organized.
7. GitHub
Code, issues, PRs. Turns Hermes into an actual engineering teammate.
Non-negotiable if you write code.
8. Stripe
Payments, customers, failed charges, refunds.
You can just ask "why did this customer churn" and get a real answer.
Also can't wait for this...Stripe is releasing agentic payments, so soon Hermes will be able to actually book stuff with your card.
9. Bland (or Twilio)
Gives Hermes a voice so it can place real phone calls (like booking reservations etc).
I love listening to the recordings haha
10. Apify
Pre-built scrapers for X, LinkedIn, Instagram, Google Maps, etc. The way to get X data without paying $5k/mo for the official API.
11. Readwise
Every highlight you've ever saved from books, articles, tweets, and podcasts, all queryable. Solves the "dead knowledge" problem.
12. Granola (or Fathom)
Searchable transcripts of every meeting you've had. Hermes can answer "what did that client say about pricing last month" instantly.
13. Obsidian
For Karpathy LLM wiki second-brain maxxing.
If I had to set up only 5, I'd do Firecrawl, Browserbase, Google Workspace, GitHub, and Obsidian.
Covers ~80% of what most people need.
I use Composio to add these in one click, makes setup basically zero effort instead of messing w technical stuff.
Anything I'm missing?? What's in your stack?
Anthropic just dropped a 25-minute video showing everything new in Claude Code. Official channel, not a course, not an influencer
Subagents, agent teams, background tasks, parallel workflows. Most people are still using Claude Code as a single-prompt chat https://t.co/bOGMmeZARr
zodchiii (@zodchiii)
The 10 Claude Code agents nobody told you to build.
Read this article carefully
U will be hearing a lot more about the PCB/interconnect bottleneck when mass production of TPU v8, Rubin, and Trainium3 starts in Q4 2026
zephyr_z9 (@zephyr_z9)
The third semis memo is out
We talk about power & analog semis, orchestration plane in the agentic era, the neoclouds trade, interconnect bottleneck (probably the biggest limiter for 2026-27), Korea Unlocked
https://t.co/SbnFlVfGTT
Bill Staples runs GitLab, the platform roughly 30 million developers use to ship software. His 14-tweet thread on "GitLab Act 2" is the most honest layoff-and-AI-pivot announcement any public CEO has made yet. The line worth screenshotting: "Authoring code by hand may be going away."
A sitting CEO. Saying it on Twitter. With his face on it.
The pattern this fits into:
- Meta cut 21,000 in 2023, then committed $35-40B to AI infrastructure for 2024.
- Salesforce cut 7,000, then launched Agentforce at roughly $2 per conversation.
- Amazon cut 27,000 since 2022, then committed $100B+ to AI infrastructure for 2025.
- Microsoft cut 6,000 in May 2025, then crossed $13B in annualized AI revenue.
The math is consistent. Every dollar saved on payroll funds a GPU, a model contract, or an agent platform. These companies are running a swap: workforce that built version 1, out. Compute layer for version 2, in.
What makes this thread different is the transparency. Old playbook: hide layoffs in 8-K filings, blame "macro headwinds," never mention AI replacement. New playbook: announce both publicly, frame it as opportunity, post the explanation on Twitter. The CEOs who say it out loud first set the script everyone else has to follow.
Read tweet 8 carefully. "We intend for the majority of work to be done by agents." That is a public commitment, in writing, from the CEO of a 30 million developer platform. To his own investors. The same day GitLab opened a voluntary separation window across its 2,580 employees with no number specified, leaving the entire company in limbo until June 1.
This script is about to run through every white-collar industry. Legal, accounting, design, marketing, customer service, support. GitLab is choreographing it openly because the playbook needs a public test case. The cover story will be "your work gets more valuable." The math will be fewer roles, paid more, managing agents.
Watch the thread structure. Layoff in tweet 2. AI bet in tweet 5. Customer reassurance in tweet 6. Investor pitch in tweet 8. Operating principles in tweet 9. That sequence becomes a template by year-end. Save the screenshot.
bstaples (@bstaples)
1/ Yesterday I published a letter to our customers and investors about GitLab Act 2.
The agentic era is the largest opportunity in our history. We're making the structural and strategic decisions to meet it.
A thread on what changes, what doesn't, and what we're betting on. 👇
https://t.co/y6IOeD7CcH
Can you imagine when humans first controlled fire?
Some said, “It makes food taste better.”
Others said, “So you want to risk burning the world for better food?”
One million years later, we are having the same argument about AI.
I spoke to five Fortune 2000 execs today about the state of AI.
I asked each one “What’s the most challenging part about this moment in AI?”
The CISO said: “There is an ocean-sized gap between hype and reality, which makes discerning what’s real exhausting.”
The VP of AI engineering said: “Everyone acts like they’re an expert, yet the main reason so few AI use cases have reached production in enterprises is because true expertise requires experience in scaled systems, enterprise politics, AI fluency, governance and guardrails, and deep process knowledge. Almost no one is actually an expert.”
The CTO said: “Our remit is to cut costs, but you can’t actually take AI transformation seriously without increasing AI/R&D budgets up front to ultimately drive bottom line once things are in production and performant. It’s an unrealistic expectation.”
The Chief of Staff said: “My job is to drive AI upskilling across the organization, and after doing it for 2 years I’m exhausted. Yes there’s potential ROI from all of the agentic workflows we’re building, but soul and humanity are being sucked out of our processes.”
The Finance leader said: “We acquired a multibillion dollar old school business. Getting that business to be AI-native is incredibly painful largely because people aren’t ready or willing to adopt it.”
I’m having convos like this every day because I'm building an invite-only AI community for enterprise execs (and interviewing folks before I let them in), but if you find these notes helpful I’m happy to keep sharing them!
Demis Hassabis says he can cure every disease in 10 years.
Most people roll their eyes when they hear this, but I don't.
Demis is the guy who just won the Nobel Prize for solving protein folding with AI (a problem biologists had been stuck on for 50 years).
But that was just one milestone in his much grander plan.
In 2010, he founded DeepMind with a 2-part mission: "solve intelligence, then use it to solve everything else."
Step 1: make AI good enough to do real science.
Step 2: point that AI at humanity's biggest problems.
Step one was AlphaFold.
He used AI to figure out the 3D shape of every protein in nature (which is basically what every drug attaches to).
Demis said it would have taken "a billion years of PhD time" to do by hand.
Step two is curing all disease.
And as of today, step two is fully funded.
Isomorphic Labs (his AI drug discovery company inside Google) just raised $2.1B led by Thrive Capital.
Here's where the money goes and what Demis thinks happens next:
> Drug discovery currently takes 5-10 years and costs billions per drug. That math is why most diseases don't have good treatments today.
> AI fixes the math. Their drug design engine compresses development from years to months. Maybe weeks.
> Isomorphic's first AI-designed cancer drug enters human trials this year.
> Their pipeline expands beyond the current 17 programs across cancer, immune diseases, and heart disease into more health domains.
> The endgame is personalized medicine: drugs designed overnight for your specific biology and your specific disease.
That last one is the whole point.
Today's drugs are mass-produced for an "average" patient who doesn't really exist.
So most existing treatments work inconsistently from person to person, and most rare diseases never get a treatment at all (no market = no drug).
When drug design gets fast and cheap, that whole calculus flips.
Cancer variants get drugs designed for that specific variant, rare diseases get treatments because economics stop mattering, and drug-resistant infections get new drugs faster than they can evolve.
That's what curing every disease actually looks like.
Now imagine what your life looks like in 2036.
A doctor draws your blood, sequences your genome, sends your disease profile to an AI.
By morning the AI has designed a custom drug for your specific biology.
Side effects, dosage, drug interactions all worked out before you take the first pill.
You and your kids never see a cancer ward.
That's what $2.1B is buying today.
Demis was right about AlphaFold.
If you consider the possibility that he's right again, every disease alive today is on borrowed time.
demishassabis (@demishassabis)
I’ve always believed the No.1 application of AI should be to improve human health.
That work started with AlphaFold, and now at @IsomorphicLabs with the mission to reimagine drug discovery and one day solve all disease!
We are turbocharging that goal with $2.1B in new funding. https://t.co/Hvk20dHgjl
Step 1: give technical champions a neat tool to dork around with in their free time
Step 2: wait for them to bring it into the enterprise, with or without approval, because it's useful
Step 3: gain enterprise share rapidly
Step 4: piss off the technical champions by fucking with their free time tool use and also degrade the quality of the tool by insisting on vibe coding everything
Step 5: have your CEO constantly say all SWEs are on the chopping block in 6 months
Step 6: find out you have no moat surprisedpikachu.pcx
willsentance (@willsentance)
posted on this back in march, but this will eventually become a study in a biz school somewhere. claude had the upper hand for the last two quarters due to their harness + model quality showing breakthroughs for production grade coding. they lost that lead almost overnight. here's why:
1. they treated their model as the moat, which wasn't sustainable as all OAI had to do was tune for code and release. the real moat for power users (the main consumer base + source for coding data) is price/performance and UX of the harness. OAI holds all compute and a comparable model so they get the price floor, simple as.
2. for some reason, anthropic decided to release a PR stint around Mythos with the implication that devs weren't to be trusted with such power, and it's clear at this point it really was an attempt to declare their pivot away from the consumer to enterprise. this was also interpreted as a signal that anthropic won't be releasing SOTA to the consumer anymore, so users switched. OAI released a comparable model anyway and the world didn't implode, so, there's that too.
3. OAI bought all the talent for the harness they could over the last 12 months, Alex app, etc all got folded into one thing: make codex the best ever. All efforts in the company went towards this, instead of silently abandoning Claude Code users for enterprise like Anthropic is probably doing.
4. The claude code team is faced with hard choices, report the churn as a price/performance issue and take that up with execs, only to be told they can't budge, or try to find core UX issues that might win back some users. both choices are suboptimal and won't solve.
core lesson: if you plan to abandon your core customer, be really careful how you execute that or you may end up in a canyon you can't cross
Anthropic just launched the cheapest lawyer in the world
It's called claude-for-legal.
And here's what it can do:
• Read and review contracts
• Draft legal responses
• Build claim charts for litigation
• Track expiration and renewal dates
• Connect on its own to your tools: Slack, DocuSign, Ironclad, Lexis+…
All without leaving Claude
How it works:
→ You install it in 60 seconds
→ It works in Claude Cowork, Claude Code, or your own API
→ It's open-source and 100% free
Areas it covers:
• Commercial contracts and privacy
• Litigation and regulatory
• AI governance
• Legal training
What used to take lawyers hours now gets done in minutes
Link below👇
Polymarket (@Polymarket)
JUST IN: Anthropic rolls out new Claude tools aimed at automating legal work for lawyers & law firms.
How do you keep Claude working until the job is done? Claude Code helps with this in a few ways, including one we shipped recently: /goal. https://t.co/QtVPmwoKct
Forward deployed engineers, or equivalent, are about to become one of the most in-demand jobs in tech. And one of the most important functions for AI rollouts.
Deploying agents is a far more technical task than most people realize, often far more involved than deploying software. Software generally works the same way every time, and for the past few decades has generally been an updated version of an existing technology or concept (which basically means it's easier for the enterprise to update their workflows on a newer system).
With agents, you’re actually deploying the equivalent of work output within the enterprise. The customer is effectively using you as a professional services provider for a task, which they expect to get solved nearly end-to-end now. This means you need to actually deeply understand the business process as a vendor, and get the customer from the current to the end state seamlessly.
Companies need help figuring out which models will work best for their workflows, they often need extensive evals set up, they need change management support for workflows, they need to get their data set up for the agents, and they need constant tuning of the agentic system for their process.
Massive role in tech now. And another example of the kind of highly technical work that AI is creating.
FirstSquawk (@FirstSquawk)
GOOGLE TO RECRUIT HUNDREDS OF ENGINEERS TO ASSIST CLIENTS IN EMBRACING ITS AI – THE INFORMATION
Security things from the last few days:
- CopyFail (linux pwn'd)
- CopyFail 2/Dirty Frag
- 13 advisories in Next.js
- Over 70 CVEs addressed in macOS 26.5
- ~50 CVEs addressed in iOS 26.5
- YellowKey (Windows BitLocker pwn'd entirely)
- GreenPlasma (Windows privilege escalation)
- CVE-2026-21510 and CVE-2026-21513 confirmed to be used by Russia for Windows RCE
- CVE-2026-32202 separately confirmed to be used by Russia for sensitive document access
- Mini-Shai-Hulud (over 300 JS and Python packages compromised via GitHub Action cache poisoning)
- Google confirms they have identified AI-powered exploitation of zero days in an unidentified "open-source, web-based system administration tool"
- Canvas (popular LMS used in most schools) pwn'd entirely
- PAN-OS (Palo Alto Networks) pwn'd with a 9.3 severity CVE-2026-0300
Are you scared yet?
OpenAI pays engineers $1,000,000+ a year to build agentic systems.
Stanford just put out a 2 hour lecture that covers 80% of it for FREE.
Bookmark this. Give it 2 hours today. Then read the guide below
It might be the highest ROI thing you do this month: https://t.co/b545D1Kymd