The Harness Becomes the Moat: Agent Memory Systems Mature While Claude Code and Codex Battle for Developer Loyalty

May 14, 2026 · 19 sources

Today's AI discourse crystallized around agent architecture as the real competitive differentiator, with a detailed Hermes memory system breakdown and analysis of why agent codebases fail at month six. Meanwhile, the Claude Code vs Codex war intensified as Anthropic's enterprise pivot appeared to alienate power users, and a breakthrough paper on multi-stream LLMs challenged the sequential thinking paradigm inherited from ChatGPT.

Daily Wrap-Up

If there was one message that cut through today's noise, it was that model quality is no longer the battleground. The conversation has moved decisively to the harness: the architecture, memory system, and context layer wrapped around the model that determine whether an agent actually works in production or collapses six months in. We saw this from multiple angles. @akshay_pachaar published a remarkably detailed breakdown of Hermes' three-tier memory architecture, from tiny always-present markdown files to SQLite full-text search to pluggable external providers. @ghumare64 unpacked @mfpiccolo's catalog of the four canonical ways agent codebases break at month six and argued for structural fixes over developer discipline. And @steipete shipped a skill that loops Codex's review until the code is clean, a small but telling example of the meta-tooling being built around these coding agents.

The competitive drama between Claude Code and Codex reached what feels like a turning point. @willsentance's thread laying out why Anthropic lost its coding lead almost overnight went viral, arguing that treating the model as the moat was doomed once OpenAI simply tuned for code and leveraged its compute advantage on price. Practitioner reports back this up: @antirez said that in the recent DS4 work, every contributor found GPT 5.5 immensely helpful while Opus was "completely useless." Meanwhile, Anthropic's new "dedicated monthly credit" for Claude Code, which @mattpocockuk says effectively slashes AFK usage limits by roughly 5-20x, does little to address the price-performance complaints driving churn, and the trust damage from the enterprise pivot may prove harder to fix than the pricing.

On the research front, @jonasgeiping's multi-stream LLM paper, highlighted by @ShashwatGoel7, challenges something so fundamental that most of us never questioned it: the sequential message-based exchange inherited from ChatGPT's original design. The idea that models could maintain parallel streams for reading, writing, thinking, and subvocalizing concerns feels like one of those shifts that seems obvious only after someone articulates it. The most practical takeaway for developers: stop selecting AI coding tools based on model benchmarks alone and start rigorously evaluating the harness, memory architecture, and context management layer, because that is where production value is actually created and where the competitive moat now lives.

Quick Hits

@mattpocockuk released an improved version of /grill-me, his most popular skill ever. He says the original draws 5-10 messages a day from users whose workflows it changed for the better, but he has stopped using it for code.
@derekmeegan demonstrated a /browser-to-api skill that analyzes network activity and CDP logs to auto-generate OpenAPI specs, showing Codex one-shotting a fully documented OpenTable API client.
@fitchmultz recommends pi-agent-browser-native for making browser automation a native tool within the pi agent framework, reporting materially improved tool uptake and token efficiency over bash-based workarounds.
@0xMovez highlighted a free 32-minute vibe-coding session between Claude Code creator Boris and Bun's creator, calling it the best vibe-coding masterclass available this week.
@0xSero received a $100,000 grant from the Human Rights Foundation, on top of $25.8K in donations, $25K in Brev credits from Nvidia, $5K from Lambda, 4x B200s for a month, and four RTX PRO 6000s from a private donor, all supporting the open source AI mission.

Agent Architecture and Memory Systems

The dominant theme of the day was the maturation of agent architecture from experimental playground into rigorous systems engineering. @Saboo_Shubham_ showcased a multi-agent pipeline where Codex builds, Claude Code reviews and refines, and Hermes orchestrates the handoff, all tracked on a single Kanban board with agents running in continuous loops. This pattern of specialized agents with defined roles and shared state is fast becoming a common architecture for serious development workflows.
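A minimal sketch of that build-review handoff, assuming each agent can be modeled as a callable that returns whether a card may advance. The names `Card`, `PIPELINE`, and `orchestrate` are hypothetical and are not part of Codex, Claude Code, or Hermes. The design point is that the board owns the shared state rather than the agents, and the loop is bounded by `max_rounds`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical agent interface: takes a card title, returns True if the
# card may advance to the next column (build succeeded / review passed).
Agent = Callable[[str], bool]

PIPELINE = ["build", "review", "done"]

@dataclass
class Card:
    title: str
    column: str = "build"
    log: List[str] = field(default_factory=list)

def orchestrate(cards: List[Card], agents: Dict[str, Agent], max_rounds: int = 10) -> None:
    """Hand each open card to the agent that owns its column, for at most max_rounds."""
    for _ in range(max_rounds):
        open_cards = [c for c in cards if c.column != "done"]
        if not open_cards:
            return
        for card in open_cards:
            passed = agents[card.column](card.title)
            card.log.append(f"{card.column}:{'pass' if passed else 'retry'}")
            if passed:
                card.column = PIPELINE[PIPELINE.index(card.column) + 1]
            elif card.column == "review":
                card.column = "build"  # reviewer sends the card back to the builder
```

A failed review moves the card back to the builder, and a card that never passes stops after `max_rounds` instead of looping forever.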

The standout contribution was @akshay_pachaar's deep dive into Hermes' three-tier memory system, which tackles the fundamental problem of agent amnesia with remarkable elegance. Tier 1 uses two tiny markdown files, MEMORY.md at 2,200 characters and USER.md at 1,375 characters, injected into the system prompt at session start as a frozen snapshot. When MEMORY.md hits roughly 80% capacity, the agent consolidates by merging related entries and dropping redundancy. As @akshay_pachaar puts it: "natural selection pressure applied to memory. the files stay small, but what's inside gets sharper over time." Tier 2 stores every conversation in SQLite with FTS5 indexing. A session_search call ranks matches in about 10ms across 10,000+ documents, and an LLM summarizes the top hits before they return to context. Tier 3 adds eight pluggable external providers that run alongside the first two tiers, such as Honcho for dialectic user modeling and Holographic for local-first HRR vectors. Tying it together is an autonomous nudge that fires roughly every 300 seconds and decides what is worth saving.
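To make the tier-1 mechanics concrete, here is a minimal sketch of a size-capped memory file that consolidates once it reaches 80% of its character budget. It is an assumption-laden stand-in. Hermes reportedly has the agent itself merge related entries, whereas this sketch substitutes a deterministic rule: drop case-insensitive duplicates, then evict the oldest entries. `BoundedMemory` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BoundedMemory:
    """Hypothetical tier-1 memory: a small file injected whole into the prompt."""
    limit_chars: int = 2200       # MEMORY.md budget cited in the post
    consolidate_at: float = 0.8   # consolidate at ~80% capacity
    entries: List[str] = field(default_factory=list)

    def render(self) -> str:
        # On-disk form: one "- entry" bullet per line.
        return "\n".join(f"- {e}" for e in self.entries)

    def size(self) -> int:
        return len(self.render())

    def _over_threshold(self) -> bool:
        return self.size() >= self.consolidate_at * self.limit_chars

    def add(self, entry: str) -> None:
        self.entries.append(entry.strip())
        if self._over_threshold():
            self.consolidate()

    def consolidate(self) -> None:
        # Deterministic stand-in for LLM-driven merging.
        # 1) Keep only the newest copy of case-insensitive duplicates.
        seen, kept = set(), []
        for e in reversed(self.entries):
            if e.lower() not in seen:
                seen.add(e.lower())
                kept.append(e)
        self.entries = kept[::-1]
        # 2) Evict the oldest entries until back under the threshold.
        while self.entries and self._over_threshold():
            self.entries.pop(0)

    def snapshot(self) -> str:
        # Frozen copy taken at session start; it is a string, so later writes don't change it.
        return self.render()
```

Capping the rendered size rather than the entry count mirrors the post's character budgets. The eviction policy is the part a real agent would replace with an LLM-written merge.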

This architectural rigor was reinforced by @ghumare64's analysis of @mfpiccolo's framework for identifying the four canonical month-six failures in agent codebases: class-level mutable defaults shared between agents, tool functions that return None on every failure type, session memory mutated by LLM-extracted strings, and multi-agent setups leaking parent conversation history to sub-agents. The proposed Worker/Function/Trigger pattern makes these failure modes structurally inexpressible. @ghumare64 draws the historical analogy sharply: "What Mike's arguing is the React-of-2013 move. jQuery apps scattered DOM state across whichever closure was handy. The discipline of 'keep state in one place' was well understood and ignored everywhere. React made the discipline structural: the bug class went away because the framework stopped allowing the bad shape."
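Two of the four failure modes fit in a few lines of Python. This is an illustrative sketch, not @mfpiccolo's Worker/Function/Trigger framework. It contrasts a class-level mutable default with per-instance state, and a tool that might return None with one that validates its input at the boundary and returns typed success or error events. `INVENTORY` and `get_stock` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Literal, Union

# Failure mode: class-level mutable default, one list shared by every instance.
class LeakyAgent:
    history: list = []

    def remember(self, msg: str) -> None:
        self.history.append(msg)  # mutates the shared class attribute

# Structural fix: each instance gets its own list.
@dataclass
class IsolatedAgent:
    history: list = field(default_factory=list)

    def remember(self, msg: str) -> None:
        self.history.append(msg)

# Failure mode: a tool that returns None for every failure.
# Fix: typed events that say what went wrong.
@dataclass(frozen=True)
class Ok:
    value: int
    status: Literal["ok"] = "ok"

@dataclass(frozen=True)
class Err:
    error_type: Literal["invalid_input", "not_found"]
    detail: str
    status: Literal["error"] = "error"

ToolResult = Union[Ok, Err]

INVENTORY = {"widget": 3}  # hypothetical backing data

def get_stock(payload: object) -> ToolResult:
    # Reject wrong-shaped input before any tool logic runs.
    if not isinstance(payload, dict) or not isinstance(payload.get("item"), str):
        return Err("invalid_input", "expected {'item': str}")
    item = payload["item"]
    if item not in INVENTORY:
        return Err("not_found", f"unknown item {item!r}")
    return Ok(INVENTORY[item])
```

The dataclass version makes the shared-list bug hard to write by accident, which is the "structural, not disciplinary" argument in miniature.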

On the tooling front, @mvanhorn introduced a Granola CLI along with Claude Code, OpenClaw, and Hermes skills, adding cross-meeting SQLite search, a MEMO pipeline runner, and attendee timelines. @steipete contributed a skill that runs codex /review in a loop until no issues remain, with the self-aware caveat that "It won't fix system architecture for ya, so you still need BRAIN as master model." Together these posts paint a picture of an ecosystem rapidly building the infrastructure layer between LLMs and production software.
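@steipete's skill itself isn't shown in the post, so this is a hypothetical sketch of the loop it describes. It assumes the review step can be wrapped as a function that returns a list of issues and the fix step as a function that consumes them. `review_until_clean`, `run_review`, and `apply_fixes` are illustrative names, not Codex APIs. The `max_passes` cap is an added assumption that echoes the "bound your loops" advice in @ghumare64's post.

```python
from typing import Callable, List, Tuple

def review_until_clean(
    run_review: Callable[[], List[str]],
    apply_fixes: Callable[[List[str]], None],
    max_passes: int = 5,
) -> Tuple[bool, int]:
    """Re-run review until it reports no issues or the pass budget runs out.

    Returns (clean, passes_used).
    """
    for attempt in range(1, max_passes + 1):
        issues = run_review()
        if not issues:
            return True, attempt
        apply_fixes(issues)
    return False, max_passes
```

As the caveat says, a loop like this converges on local findings; it will not rescue a flawed architecture.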

The Claude Code vs Codex War

The competitive dynamics of the AI coding market took center stage with @willsentance's viral thread responding directly to Claude Code team member @bcherny's request for feedback. His four-point analysis of Anthropic's rapid loss of its coding lead is worth studying as a business case. The core argument: Anthropic treated their model as the moat, which was fundamentally unsustainable since all OpenAI had to do was tune for code and release. Meanwhile OpenAI controls the compute and therefore the price floor, bought up harness talent aggressively over the past year, and focused the entire company on making Codex the best coding experience possible.

As @willsentance frames the pivotal misstep: "for some reason, anthropic decided to release a PR stint around Mythos with the implication that devs weren't to be trusted with such power, and its clear at this point it really was an attempt to declare their pivot away from the consumer to enterprise." His conclusion carries weight: "core lesson: if you plan to abandon your core customer, be really careful how you execute that or you may end up in a canyon you cant cross."

Practitioner reports reinforce the narrative. @antirez, creator of Redis and one of the most respected infrastructure engineers in the industry, reported bluntly that during what he called the recent "DS4 fiesta," "not just me but every other contributor found GPT 5.5 able to help immensely and Opus completely useless." When engineers of that caliber publicly declare a preference, it signals a real shift in sentiment. The community's response has been pragmatic rather than loyal: @LLMJunky noted that someone had already built a workaround to Anthropic's controversial claude -p changes using zmux, with @FUCORY shipping npx claude-p as a drop-in replacement. And @malikwas1f retweeted @mattpocockuk's observation that Anthropic's new "dedicated monthly credit" effectively slashes Claude Code's AFK usage limits by roughly 5-20x, a change more likely to sharpen the price-performance complaints driving churn than to ease them.

MCP, Context Layers, and Headless Software

Two posts today addressed where value actually accrues when agents replace traditional user interfaces. @a16z highlighted Salesforce's decision to open its APIs and launch a headless product, essentially betting that in an agentic world, its value lies in the data layer rather than the UI layer. The question @a16z's Seema Amble poses is sharp: if you strip away the UI and expose the database, what are you actually left with?

@jainarvind, CEO of Glean, provided the technical counterpart: on about 175 queries run through the same Claude Cowork harness, Glean's MCP server was preferred roughly 2.5x as often as off-the-shelf MCP tools, and the off-the-shelf setups used about 30% more tokens (median 57k versus 44k). The key insight: "MCP is a protocol, not a context layer. It standardizes how models call tools. It does not solve ranking, permissions, memory, identity, or cross-system understanding." When MCP sits on top of a unified context layer with connectors, indexes, an enterprise graph, and permissions baked in, agents return better results at lower cost because they are not brute-forcing their way through fragmented context with additional tool calls and reasoning loops.
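Because "30% more" and "30% fewer" are easy to conflate, here is the arithmetic on the reported medians (44k tokens for Glean, 57k for off-the-shelf setups):

```latex
\[
\frac{57\text{k}}{44\text{k}} \approx 1.30
\;\Rightarrow\; \text{off-the-shelf setups use} \approx 30\%\ \text{more tokens}
\]
\[
\frac{44\text{k}}{57\text{k}} \approx 0.77
\;\Rightarrow\; \text{Glean uses} \approx 23\%\ \text{fewer tokens}
\]
```

Both lines describe the same gap. They differ only in which setup serves as the baseline.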

This connects directly to the broader theme of the day. The harness, the context layer, the memory architecture: these are the new infrastructure challenges. Companies like Salesforce are betting their future on being the data substrate that agents query. Glean has built a product around being the unified context layer. Developers building with agents should invest in their context infrastructure with the same rigor they would invest in their data model.

Multi-Stream LLMs: Breaking the Sequential Bottleneck

The most intellectually provocative content came from @jonasgeiping's paper on multi-stream LLMs, highlighted by @ShashwatGoel7 who said it "made me re-think my understanding of what transformers can do (the changes are so simple and elegant!), and what language models could be." The argument targets something so fundamental that most of us never thought to question it: the sequential message-based exchange paradigm inherited from ChatGPT's original design.

As @jonasgeiping frames the problem: "The models cannot read while writing, cannot act while thinking and cannot think while processing information." Multi-stream LLMs address this through instruction-tuning for a parallel stream format, enabling models to predict and read tokens across multiple streams simultaneously in each forward pass. The claimed benefits span latency, user experience (no more interrupting the model to get a word in), security through separation of concerns, and a legible form of parallel reasoning. The most striking claim: "Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized."
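A toy model of the latency claim follows. It assumes one token per stream per forward pass and ignores how the paper actually encodes streams. A single-stream model needs one step per token across all streams, while a multi-stream model advances every stream at once, so the step count drops from the total length to the length of the longest stream. The stream names and tokens are hypothetical.

```python
from itertools import zip_longest

def single_stream_steps(streams: dict) -> list:
    """Sequential decoding: one (stream, token) pair per forward pass."""
    return [(name, tok) for name, toks in streams.items() for tok in toks]

def multi_stream_steps(streams: dict) -> list:
    """Parallel decoding: each forward pass emits one token slot per stream."""
    names = list(streams)
    steps = []
    for column in zip_longest(*streams.values(), fillvalue=None):
        # A stream that has finished contributes nothing this step.
        steps.append({n: t for n, t in zip(names, column) if t is not None})
    return steps

streams = {
    "read": ["user:", "fix", "the", "bug"],
    "think": ["check", "tests"],
    "write": ["On", "it"],
}
```

With these streams, sequential decoding takes 8 steps and the parallel version takes 4. Real gains depend on how much the streams actually depend on one another, which this toy ignores.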

This is a direct challenge to the single-stream orthodoxy that has defined how every major LLM interacts with users. If the instruction-tuning results hold, it could reshape agent architectures from the ground up, making many of the orchestration patterns developers are building today feel as dated as callback-based JavaScript feels now.

Security: AI Finds an 18-Year-Old NGINX Bug While Supply Chain Attacks Escalate

Two security stories illustrated AI's dual role in cybersecurity: as a powerful tool for vulnerability discovery and as part of an increasingly complex attack surface. @IntCyberDigest reported that AI had found a critical remote code execution vulnerability in NGINX that had been present for 18 years, affecting versions 0.6.27 through 1.30.0. The flaw is triggered via the rewrite and set directives in configuration files, and its disclosure on GitHub includes proof-of-concept code, making immediate patching urgent for the widely deployed web server and for products that embed it.
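For triage, here is a minimal sketch that checks an `nginx -v` banner against the version range as reported by @IntCyberDigest. The range comes from that post, not from an official advisory, so confirm it against NGINX's own security advisories before acting. The function names are hypothetical. Note that `nginx -v` writes its banner to stderr, not stdout.

```python
import re
from typing import Tuple

# Affected range as reported in the post; verify against the official advisory.
AFFECTED_MIN = (0, 6, 27)
AFFECTED_MAX = (1, 30, 0)

def parse_nginx_version(banner: str) -> Tuple[int, ...]:
    # `nginx -v` prints e.g. "nginx version: nginx/1.24.0".
    m = re.search(r"nginx/(\d+)\.(\d+)\.(\d+)", banner)
    if not m:
        raise ValueError(f"no nginx version in {banner!r}")
    return tuple(int(part) for part in m.groups())

def in_reported_range(banner: str) -> bool:
    # Tuples compare element by element, so (1, 24, 0) < (1, 30, 0).
    return AFFECTED_MIN <= parse_nginx_version(banner) <= AFFECTED_MAX
```

Comparing integer tuples avoids the classic string-comparison bug where "1.9.0" sorts after "1.30.0".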

On the supply chain side, @DeRonin_ published a comprehensive protection guide following the TanStack npm compromise, in which 42 packages were infected with credential-stealing payloads. The attack vector was elegant: malicious code hidden in an optionalDependencies entry resolved through a GitHub reference, whose prepare script ran router_init.js, a roughly 2.3 MB file smuggled into the tarball root.
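The guide's quick checks can be scripted: search the lockfile for `@tanstack/setup` and search node_modules for `router_init.js`. This is a minimal sketch that assumes those two indicators from @DeRonin_'s guide are the signals worth checking. A clean result does not prove a host is safe, and `scan_project` is a hypothetical name.

```python
from pathlib import Path

# Indicators taken from the guide quoted in Sources.
LOCKFILES = ("package-lock.json", "pnpm-lock.yaml", "bun.lock", "yarn.lock")
LOCK_MARKER = "@tanstack/setup"
PAYLOAD_FILE = "router_init.js"

def scan_project(root) -> list:
    """Return human-readable findings for one project directory."""
    root = Path(root)
    findings = []
    for name in LOCKFILES:
        lock = root / name
        if lock.is_file() and LOCK_MARKER in lock.read_text(errors="ignore"):
            findings.append(f"{name} references {LOCK_MARKER}")
    modules = root / "node_modules"
    if modules.is_dir():
        for hit in sorted(modules.rglob(PAYLOAD_FILE)):
            findings.append(f"payload file at {hit.relative_to(root)}")
    return findings
```

Run it over each project directory. Any finding means following the guide's emergency-response steps, starting with credential rotation.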

Sources

Will Sentance @willsentance · May 12

posted on this back in march, but this will eventually become a study in a biz school somewhere. claude had the upper hand for the last two quarters due to their harness + model quality showing breakthroughs for production grade coding. they lost that lead almost overnight. heres why:
1. they treated their model as the moat, which wasnt sustainable as all OAI had to do was tune for code and release. the real moat for power users(the main consumer base + source for coding data) is price/perfomance and UX of the harness. OAI holds all compute and a comparable model so they get the price floor, simple as.
2. for some reason, anthropic decided to release a PR stint around Mythos with the implication that devs weren't to be trusted with such power, and its clear at this point it really was an attempt to declare their pivot away from the consumer to enterprise. this was also interpreted as a signal that anthropic wont be releasing SOTA to the consumer anymore, so users switched. OAI released a comparable model anyway and the world didn't implode, so, theres that too.
3. OAI bought all the talent for the harness they could over the last 12 months, Alex app, etc all got folded into one thing: make codex the best ever. All efforts in the company went towards this, instead of silently abandoning Claude Code users for enterprise like Anthropic is probably doing.
4. The claude code team is faced with hard choices, report the churn as a price/perfomance issue and take that up with execs, only to be told they cant budge, or try to find core UX issues that might win back some users. both choices are suboptimal and wont solve.
core lesson: if you plan to abandon your core customer, be really careful how you execute that or you may end up in a canyon you cant cross

bcherny @bcherny

@DavidKPiano Hey, Boris from the team here. What can we do better?

Shubham Saboo @Saboo_Shubham_ · May 12

Codex /goal builds it. Claude Code /goal review and refines it. Hermes /goal manages the orchestration and handoff. All tracked on a single Kanban Board and agents keep running in the loop. https://t.co/WAIr8zCP4o

Mitch Fultz @fitchmultz · May 13

If you use pi, try pi-agent-browser-native. It makes agent-browser a native pi tool, so agents actually use browser automation instead of awkward bash glue. In my runs it has materially improved tool uptake, speed, and token-efficient browser work. https://t.co/KCaBQvRzMY

Ronin @DeRonin_ · May 13

🚨USE THIS GUIDE TO PROTECT YOUR COMPUTER FROM NPM HACKS THAT STEAL EVERYTHING IN ONE INSTALL
TanStack, a code library used in millions of web apps, got hacked on Monday
one install steal every password, key, and credential on your computer
this is far not the first hack this month and definitely just the beginning
Here's how to protect your machine:
[ 1. lock down npm with a 7-day cooldown ]:
open ~/.npmrc. keep all existing lines (auth tokens, registry config). append:
```ini
min-release-age=7
minimum-release-age=10080
save-exact=true
```
this makes npm refuse any package version published in the last 7 days. attack windows are usually under 24 hours, you skip them entirely
[ 2. same cooldown for bun ]:
open ~/.bunfig.toml (create if missing). append:
```toml
[install]
minimumReleaseAge = 604800
```
7 days in seconds, same protection in bun's config format
[ 3. pin every npm dependency in your projects ]:
open package.json. strip every ^ and ~ from versions under:
- dependencies
- devDependencies
- peerDependencies
exact versions only. commit your lockfile (bun.lock / package-lock.json / pnpm-lock.yaml) to git so the resolved tree is frozen
[ 4. same discipline for python ]:
if you use uv (the modern default): commit uv.lock, run `uv sync` to restore
if you use pip: requirements.txt with pinned versions, run `pip install --require-hashes -r requirements.txt`
if you use poetry: commit poetry.lock, use `poetry install --no-update`
never trust `>=` or `~=` ranges in production projects
[ 5. pin GitHub Actions to commit SHAs ]:
stop using `actions/checkout@v4`. switch to:
```yaml
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
```
every third-party action runs in your CI with access to repo secrets. pinning the SHA means a compromised maintainer cannot push malicious code into your pipeline
[ 6. audit your IDE extensions ]:
Cursor, VSCode, Windsurf, every extension is code running with full access to your filesystem, clipboard, and open files
- review installed extensions monthly
- remove anything you haven't actively used in 30 days
- check the publisher, install count, last update, GitHub source before installing
- never install extensions that ask for permissions they shouldn't need
[ 7. lock down API tokens and credentials ]:
- never commit .env to git (add to .gitignore on every project, no exceptions)
- use minimum-scope tokens: one repo, one bucket, one workspace
- rotate API keys every 90 days, force expiry on critical ones
- separate tokens by environment (dev / staging / prod)
- enable 2FA on every developer account: GitHub, npm, PyPI, Cloudflare, AWS, OpenAI, Anthropic
- never paste secrets into Claude / ChatGPT / any AI chat, they're logged
[ 8. set up continuous monitoring ]:
- enable Dependabot alerts on every repo (free, takes 2 minutes)
- install https://t.co/pIHMFrg8sY or Snyk for live vulnerability scanning
- subscribe to the npm and PyPI security advisory feeds
- follow @snyksec, @socketsecurity, @stepsecurity for early warnings
[ 9. how to detect if you got the TanStack payload ]:
if you installed any @tanstack/* package between 19:20 and 19:30 UTC on Monday, May 11, treat the host as compromised
the detection signature: a malicious manifest contains
```json
"optionalDependencies": {
  "@tanstack/setup": "github:tanstack/router#79ac49ee..."
}
```
any version with this entry is compromised. the payload is delivered via the git-resolved optionalDependency, whose prepare script runs router_init.js (~2.3 MB, smuggled into the tarball root)
how to check fast:
- search your lockfile for `@tanstack/setup` references
- search node_modules for any `router_init.js` file
- if either shows up, jump to section 10 immediately
future attacks will use the same trick: malicious code hidden in optionalDependencies or postinstall/prepare scripts. add `grep -r "postinstall\|prepare" node_modules/*/package.json | grep -iE "curl|wget|eval|base64"` to your weekly audit routine
[ 10. emergency response if you're already compromised ]:
ran an install during a suspected attack window? do this in this exact order:
- rotate every cloud credential: AWS, GCP, Kubernetes service accounts, Vault tokens
- rotate GitHub personal access tokens, OAuth tokens, SSH keys
- revoke active sessions on GitHub, npm, PyPI, all cloud providers
- audit AWS / GCP / Kubernetes / Vault audit logs for the last several hours, look for unauthorized API calls
- pin to the last known-good version of every @tanstack package and reinstall from a clean lockfile
- check ~/.npm, ~/.config, browser cookie stores for tampered files
- wipe ~/.bash_history, ~/.zsh_history, local AI chat logs that might have secrets
- if you ran the install as root or with sudo: nuke the machine, reinstall from scratch, restore code from git only
[ why this matters right now ]:
attack chains in supply chain hacks usually only last a few hours before the malicious package gets caught and yanked. during those hours, every developer running `npm install` becomes a victim
worse: npm couldn't even UNPUBLISH most of the TanStack malicious versions because of third-party dependencies. the registry's own safeguards are part of the problem. you can't rely on the platform, you have to protect yourself
the patterns from the last 18 months:
- npm: TanStack on May 11 (42 packages, AWS/GCP/Vault credentials), Shai-Hulud worm hit Nx packages, chalk/debug/ansi-styles worm hit qix maintainer
- GitHub Actions: tj-actions/changed-files compromise exposed thousands of repos' secrets
- PyPI: ongoing typosquatting campaigns targeting AI/ML packages
- IDE extensions: VSCode marketplace caught hosting credential stealers
the frequency is rising because the payoff is massive
one compromised package lands on millions of machines in hours
if you don't lock this down tonight, you're exposed to the next one. and there will be one
30 minutes tonight, or wait for the next attack to clean out your machine
Full TanStack breakdown: https://t.co/v4OBghlLxQ

Movez @0xMovez · May 13

Creator of Claude Code just did a 30-minute Claude vibe-coding live session with creator of Bun 32 minutes. free. By the person who built it. 100% of Boris’s code is written by Claude. It's the best vibe-coding masterclass you’ll watch this week. One video replace 10 paid vibe-coding courses.

DeRonin_ @DeRonin_

How To Cut Your AI Coding Bill by 80% (FULL GUIDE)

a16z @a16z · May 13

Last month Salesforce announced it would open its APIs and launch a headless product, essentially betting that in an agentic world, its value lies in the data layer, not the UI. The announcement is a useful prompt for a more interesting question: if you strip away the UI and expose the database, what are you actually left with? a16z's Seema Amble on where defensibility moves in the agentic era & how businesses will adapt: https://t.co/8hOj26bPuf

seema_amble @seema_amble

Is Software Losing Its Head?

Arvind Jain @jainarvind · May 13

When MCP took off, a lot of people assumed plugging models into tools would be enough. A year later, enterprise teams are realizing that off-the-shelf MCP servers still miss basic context, and also burn too much budget. We wanted to test this directly. So we benchmarked @glean's MCP server against off-the-shelf MCP tools in Claude Cowork across ~175 queries. The harness was the same, and so were the queries. The difference was the context layer behind them. Glean was preferred ~2.5x as often, and off-the-shelf MCP setups used ~30% more tokens (median token usage: 44k vs. 57k). MCP is a protocol, not a context layer. It standardizes how models call tools. It does not solve ranking, permissions, memory, identity, or cross-system understanding. When MCP is wired directly to a set of tools, the model has to search across systems and assemble context on its own. When MCP sits on top of a unified context layer (connectors, indexes, enterprise graph, permissions, memory), it can draw from a consistent view of the company and return better results. And it’s a lot less expensive. When systems have to brute-force their way through fragmented context, they need more tool calls, more reasoning loops, and more tokens to produce a usable answer. That’s the motivation behind Glean’s MCP server. It brings the same context layer behind Glean Assistant into tools like Claude, ChatGPT, and coding environments, without asking teams to rebuild retrieval and permissions from scratch, or pay the hefty token cost of reconstructing context over and over.

tonygentilcore @tonygentilcore

Context makes the Coworker: Glean preferred ~2.5x as often as off-the-shelf MCP tools

Shashwat Goel @ShashwatGoel7 · May 13

Learning about this project over lunches and meetings has constantly blown my mind. It has made me re-think my understanding of what transformers can do (the changes are so simple and elegant!), and what language models could be. We've all been super hyped internally, now u2

jonasgeiping @jonasgeiping

We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

0xSero @0xSero · May 13

I just received a 100,000$ grant from the Human Rights Foundation. In total I received:
- 100K USD through HRF
- 25.8K USD through donations site
- 25K Brev credits through Nvidia
- 4x B200s for a month
- 5K from lambda
- 4x RTX PRO 6000 private donor
Open source must win

0xSero @0xSero

Open Source must win.

Rohit Ghumare @ghumare64 · May 13

This is the most concrete thing I read today on why agent architecture matters in production, and the framing belongs in the harness debate alongside Anthropic's and Glean's.
The frame: agent codebases that survive past six months don't survive because the team has more discipline. They survive because the architecture made the bad shape harder to write than the right one. That's a sharper claim than the "harness is the backend" version. It says the production failures are reproducible across teams because the abstractions allow them.
The four canonical month-six failures Mike lists are worth memorizing:
→ Class-level mutable defaults shared between agents the moment a second user shows up
→ Tool functions that accept any string and return None on every kind of failure
→ Session memory mutated by an LLM-extracted string, silently poisoning every subsequent action
→ Multi-agent setups passing the parent's full conversation history to a sub-agent because it was the easiest wire-up
Every team I've talked to in the last year has shipped at least two of these to production. The fix posts always say the same thing: validate inputs, isolate state, propagate spans, and bound your loops. Discipline. It gets forgotten in approximately every codebase.
What Mike's arguing is the React-of-2013 move. jQuery apps scattered DOM state across whichever closure was handy. The discipline of "keep state in one place" was well understood and ignored everywhere. React made the discipline structural: the bug class went away because the framework stopped allowing the bad shape.
Worker / Function / Trigger does the same thing one layer up.
Class-level mutable state stops being expressible because worker invocations are stateless and persistent state lives in a memory worker addressed by namespace. Two agents in two processes can't share Python state because there is no shared Python state.
Tool functions returning None on every failure stop being expressible because every function has a typed input/output schema the engine validates at the boundary. Wrong-shaped input gets rejected before the worker code runs. Failures return typed events with status and error type.
Cross-agent history leakage stops being expressible because sub-agents are workers with their own context, called by function ID. The orchestrator passes a payload, not a conversation buffer.
Agent Loops without step bounds stop being expressible because step bounds and timeouts are engine-level config, not something the agent author remembers to wrap.
The Claude Code April 2026 postmortem is the cleanest evidence anyone's produced for why this matters. Three runtime changes, no model change, dropped median thinking length 73% and pushed retry rates up 80x. The community had to surface this from sampled session logs because most production systems don't ship that level of runtime telemetry by default. Making it default is the lever.

mfpiccolo @mfpiccolo

Agent codebases that break at month six all break the same way.

noname @malikwas1f · May 13

RT @mattpocockuk: Anthropic has given us a "dedicated monthly credit" Which, in effect, slashes AFK usage limits of Claude Code by ~5-20X…

derek @derekmeegan · May 13

Turn any website into an API with /browser-to-api. This skill analyzes network activity, CDP logs, and website behavior to generate a custom OpenAPI spec. Watch Codex one-shot a fully documented OpenTable API client from a single prompt 👀 https://t.co/MDKGaHKzAy

am.will @LLMJunky · May 14

someone already built a workaround to the `claude -p` changes using zmux. 🐈 & 🐭

FUCORY @FUCORY

Introducing npx claude-p A dropin replacement for claude -p https://t.co/aRWI6tWnD6

Matt Van Horn @mvanhorn · May 14

Introducing: @meetgranola CLI/Claude Code Skill/OpenClaw and Hermes skill from the @ppressdev printed by @damienstevens .
- Cross-meeting SQLite search
- MEMO pipeline runner
- Attendee timelines
- Stop the MCP logged-out pain
Really excited about this one. I can't live without @meetgranola I may have told @damienstevens I loved him when he submitted the PR to the Printing Press. https://t.co/d9i2RSwqiF

Matt Pocock @mattpocockuk · May 14

/grill-me is my most popular skill ever. I get 5-10 messages a day about how it’s changed people’s workflows for the better But… I’ve stopped using it for code. Here’s the improved version: https://t.co/Nuviji95au

International Cyber Digest @IntCyberDigest · May 14

‼️🚨 MAJOR IMPACT: AI just found an 18-year-old NGINX critical remote code execution vulnerability. It has been disclosed on GitHub including PoC code.
- Affects NGINX 0.6.27 through 1.30.0
- Triggered via the rewrite and set directives in config
- Update NGINX ASAP
- NGINX is a widely used HTTP web server, be sure to check its prevalence in other products

Peter Steinberger 🦞 @steipete · May 14

Wrote a skill that runs codex /review in a loop until there's no booboos anymore. Caveat: It won't fix system architecture for ya, so you still need BRAIN as master model. https://t.co/0Z6iJnCqCX

antirez @antirez · May 14

Gentle reminder on how, in the recent DS4 fiesta, not just me but every other contributor found GPT 5.5 able to help immensely and Opus completely useless.

Akshay 🚀 @akshay_pachaar · May 14

the three-tier memory of Hermes agent.
AI agents forgets everything when your session ends. Hermes doesn't. it has three memory layers, each at a different speed.
𝘁𝗶𝗲𝗿 𝟭: 𝘁𝘄𝗼 𝘁𝗶𝗻𝘆 𝗺𝗮𝗿𝗸𝗱𝗼𝘄𝗻 𝗳𝗶𝗹𝗲𝘀
MEMORY.md (2,200 chars) and USER.md (1,375 chars). injected into the system prompt at session start as a frozen snapshot. MEMORY.md holds project conventions, tool quirks, lessons learned. USER.md holds your profile: name, communication style, skill level.
these files are tiny on purpose. when MEMORY.md hits ~80% capacity, the agent consolidates: merges related entries, drops redundancy, keeps only the densest facts.
natural selection pressure applied to memory. the files stay small, but what's inside gets sharper over time.
𝘁𝗶𝗲𝗿 𝟮: 𝗳𝘂𝗹𝗹-𝘁𝗲𝘅𝘁 𝘀𝗲𝘀𝘀𝗶𝗼𝗻 𝘀𝗲𝗮𝗿𝗰𝗵 (𝘀𝗾𝗹𝗶𝘁𝗲 + 𝗳𝘁𝘀𝟱)
every conversation gets stored in SQLite with FTS5 indexing. the agent can search weeks of past sessions on demand.
when the agent calls session_search: FTS5 ranks matches in ~10ms over 10,000+ docs, an LLM summarizes the top hits, and a concise result returns to context.
tier 1 is always present but tiny. tier 2 has unlimited capacity but requires an active search. critical facts live in memory, everything else is searchable.
𝘁𝗶𝗲𝗿 𝟯: 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝗺𝗲𝗺𝗼𝗿𝘆 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝗿𝘀
8 pluggable providers that run alongside tiers 1 and 2, never replacing them. three worth knowing: Honcho (dialectic user modeling, 12 identity layers), Holographic (local-first, HRR vectors, no external calls), and Supermemory (context fencing that prevents the same fact from being re-stored infinitely).
when active, hermes auto-syncs every turn: prefetch before, sync after, extract at session end.
𝗵𝗼𝘄 𝘁𝗵𝗲𝘆 𝗰𝗼𝗺𝗽𝗼𝘀𝗲 𝗶𝗻 𝗮 𝘀𝗶𝗻𝗴𝗹𝗲 𝘁𝘂𝗿𝗻
this is the part most people miss. the tiers compose on every turn through a five-step cycle:
1. turn opens. tier 1 is already in prompt, tier 3 prefetches and prepends.
2. agent responds using all three tiers as context.
3. periodic nudge fires (~every 300s). the agent reflects: "has anything worth persisting happened?" if yes, it writes. if no, it returns silently.
4. memory written to MEMORY.md on disk. invisible this session because the prefix cache stays warm.
5. session closes. tier 2 logs the transcript, tier 3 extracts semantics. next session opens with the new state.
agent memory today is either always-on but shallow (stuff everything in the prompt) or deep but passive (vector store that never fires at the right time).
hermes composes across both: tiny always-present files for critical facts, full-text search for deep recall, external providers for semantic modeling, all orchestrated by a nudge that decides autonomously what's worth saving.
the agent doesn't just store memories. it curates them under pressure.
i wrote a full deep dive (article below) covering hermes agent's memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents on your machine.

akshay_pachaar @akshay_pachaar

Hermes Agent Masterclass