AI Learning Digest

Agent Coding Best Practices Flood the Timeline as Claude Cowork Launches and Kills a Startup

Daily Wrap-Up

Today felt like a masterclass in agent-assisted development. The timeline was wall-to-wall with hard-won wisdom about how to actually work with AI coding agents, and for once, the advice was practical rather than hype-driven. The throughline across a dozen different posts was the same: agents are powerful, but only if you invest in the scaffolding around them. TDD, clear rules files, specific prompts, and verifiable goals aren't optional anymore. They're the difference between productive collaboration and expensive autocomplete.

The most dramatic moment was @guohao_li announcing that Claude Cowork had killed their startup product, so they did the rational thing and open-sourced it as Eigent. That kind of rapid creative destruction is becoming the norm in this space, and it's a stark reminder that building thin wrappers around model capabilities is a losing game. Meanwhile, @davis7 had a genuine come-to-AI moment after being pushed to test agents harder than he thought possible, admitting he'd been deliberately avoiding probing their limits because the implications were scarier than the alternative. That kind of honesty about our own resistance to change is rare and worth paying attention to. On the security front, Node.js shipped patches across four release lines for a critical vulnerability affecting React Server Components, Next.js, and every major APM tool. If you're running Node in production, stop reading and go update.

The most practical takeaway for developers: invest time in your CLAUDE.md and rules files before your next coding session. As @ericzakariasson put it, start simple and add rules only when you see repeated mistakes. Write explicit TDD tests first, let the agent implement against them, and provide verifiable goals through types, linters, and test suites. This workflow pattern showed up in nearly a third of today's posts, a sign it's working for people shipping real code.

Quick Hits

  • @MaziyarPanahi highlighted OpenMed's mass release of 35 PII detection models under Apache 2.0, covering HIPAA and GDPR compliance for healthcare AI safety.
  • @tyler_agg shared a guide on creating realistic longform AI videos with prompts included.
  • @PrajwalTomar_ built a scrollytelling landing page with Cursor and Opus 4.5 in under 10 minutes, arguing that bad AI output is a workflow problem, not a capability problem.
  • @TheAhmadOsman posted an extensive curriculum of hands-on LLM engineering projects covering everything from tokenization to quantization, each focused on building, plotting, and breaking things.
  • @johnrushx dropped some startup wisdom he wished he'd heard before his first venture.
  • @clawdbot announced Clawdbot v2026.1.12 with vector memory and voice call capabilities.
  • @pepicrft released a Clawdbot Vault Plugin that turns a local folder into a structured knowledge vault with markdown, QMD-powered search, and optional git sync.
  • @hive_echo shared some nano banana pro UI mockups.
  • @dabit3 explored Claude's new programmatic tool calling feature in beta, which reduces latency and token consumption by letting the model write code that processes data before it hits the context window.
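
The idea behind that last item is worth a sketch. Here is a minimal Python illustration, with a hypothetical `fetch_orders` tool and an invented threshold: the model writes code that reduces a tool's raw output in a sandbox, so only a small summary ever reaches the context window.

```python
# Sketch of the programmatic-tool-calling idea: run code over a tool's raw
# output before returning it, so the model's context holds a summary instead
# of the full payload. The tool, fields, and threshold are illustrative only.

def fetch_orders():
    """Stand-in for a tool that returns far more data than the model needs."""
    return [{"id": i, "total": (i * 7) % 100} for i in range(10_000)]

def summarize_large_orders(threshold=90):
    """Code the model could write: filter and aggregate before returning."""
    orders = fetch_orders()
    large = [o for o in orders if o["total"] > threshold]
    # Only this tiny dict enters the context window, not 10,000 rows.
    return {"count": len(large), "max_total": max(o["total"] for o in large)}

print(summarize_large_orders())
```

The real beta API differs in its specifics, but the token math is the same: a two-key summary costs a few dozen tokens where the raw rows would cost tens of thousands.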

Mastering Agent-Assisted Development

The single biggest theme today was the emerging consensus on how to actually get productive output from AI coding agents. This isn't theoretical anymore. Developers are converging on specific, repeatable patterns that separate productive agent usage from the frustrating "it looks right but it's wrong" experience that turns people off.

@ericzakariasson laid out the clearest framework in a thread that touched on every major pain point. The core insight is that agents need guardrails that are structural, not conversational:

"the developers who get the most from agents: write specific prompts, iterate on their setup, review carefully (AI code can look right while being wrong), provide verifiable goals (types, linters, tests), treat agents as capable collaborators"

The TDD angle was particularly compelling. Write tests first, confirm they fail, commit them, then let the agent implement until they pass. This gives the agent something concrete to iterate against instead of vibing toward a solution. @ericzakariasson framed it perfectly: "agents perform best when they have a clear target to iterate against." The distinction between rules (static context for every conversation) and skills (dynamic capabilities loaded when relevant) also resonated, providing a mental model for organizing the growing pile of configuration that agent-augmented development requires.
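
The loop is easy to see with a toy example. This Python sketch assumes a hypothetical `slugify` helper as the unit under test: the tests are written and committed first, and the implementation above them is what the agent iterates on until the suite goes green.

```python
# Step 1: write the tests first and confirm they fail (slugify doesn't exist yet).
# Step 2: commit them. Step 3: let the agent implement until the suite passes.
# `slugify` and its spec are hypothetical; any small, well-specified unit works.
import re

def slugify(title: str) -> str:
    """The implementation the agent iterates on until the tests below pass."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  TDD with Agents  ") == "tdd-with-agents"
    assert slugify("---") == ""

test_slugify()
```

The committed tests are the "clear target": the agent can run them on every attempt and knows unambiguously when it is done.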

Both @aye_aye_kaplan from the Cursor team and @Hesamation pointed to comprehensive guides on coding with agents, reflecting just how fast best practices are evolving. @twannl, who spends most of his time in Cursor, called one of these guides a "must read." @kr0der nearly quit Codex after one day but found the right workflow, while @blader argued that every company should be rolling their own Devin-like system, estimating "less than a day to stand up and maybe a week to make good." The barrier to entry for agent orchestration is dropping fast, and the companies that wait for a polished product may find themselves behind those who built something rough but functional months earlier.

The CLAUDE.md and Context Engineering Meta

A fascinating sub-genre emerged today around the art of configuring AI agents through markdown files. This goes beyond simple prompt engineering into something closer to organizational knowledge management, and several high-profile voices weighed in on why it matters.

@emollick, always good for a perspective that bridges academia and practice, offered a deceptively simple suggestion:

"Worth thinking about how to describe what your organization does, in detail, in a series of plain English markdown files."

This idea connects directly to what @rauchg announced from Vercel: they're encoding 10+ years of React and Next.js optimization knowledge into reusable agent skills, distilling expertise from engineers like @shuding into something any developer can benefit from. The implication is significant. If Vercel is investing in turning institutional knowledge into agent-consumable formats, every engineering organization should be thinking about the same thing.

On the individual developer level, @mattpocockuk shared CLAUDE.md additions that make plan mode "10x better," moving from unreadably long plans to concise, useful ones with followup questions. @alexhillman took a different angle, focusing on communication style preferences: no ellipses (passive aggressive), no enthusiasm inflation ("great idea!"), no hedging language. Both @ashpreetbedi and @rohit4verse shared deep dives into how experienced developers actually use Claude Code day to day. The pattern is clear: the configuration layer between you and the model is becoming as important as the model itself. Context engineering is the new prompt engineering, and the developers who treat their rules files and CLAUDE.md as living documents are extracting dramatically more value from the same underlying models.
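
A minimal sketch of what one of these files might look like. The rules below are illustrative assumptions, not anyone's published config:

```markdown
# CLAUDE.md (illustrative sketch)

## Plan mode
- Keep plans under ~20 lines; end with open questions instead of assumptions.

## Workflow
- Write failing tests before implementing; run linter and tests before committing.

## Communication
- Confirm neutrally ("Got it"); skip validation language and filler transitions.
```

The point is that these files live in the repo and are versioned alongside the code, so the rules evolve with the project instead of living in one developer's head.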

Claude Cowork and the Multi-Agent Tool Explosion

Claude Cowork launched and immediately made waves, both positive and destructive. The product enables teams to run multiple Claude instances in parallel on different tasks, and the early reports suggest it's a genuine force multiplier for small teams.

@marcelpociot shared a striking account of how Cowork itself was built, with the team running the same multi-instance workflow the product now packages:

"Us humans meet in-person to discuss foundational architectural and product decisions, but all of us devs manage anywhere between 3 to 8 Claude instances implementing features, fixing bugs, or researching potential solutions."

That process delivered the product in just a week and a half. @dejavucoder introduced the product formally, while @guohao_li provided the most dramatic response: "Anthropic Claude Cowork just killed our startup product. So we did the most rational thing: open-sourced it." Their project, Eigent, is now available for anyone to build on, turning competitive destruction into community contribution.

Beyond Cowork, the multi-agent tooling space saw other interesting entries. @theplgeek launched ralph-tui, a terminal UI for managing agent loops that was itself built using ralph-tui, a satisfying bit of dogfooding. And @idosal1 announced AgentCraft, which lets you orchestrate agents through an RTS game interface, proving that the intersection of childhood gaming nostalgia and serious developer tooling is alive and well. The multi-agent pattern is clearly moving from experimental to expected, and the tooling is racing to keep up.

The Agent Capability Reckoning

Several posts today reflected a deeper shift in how developers think about what agents can actually do. This wasn't about new model releases or benchmark scores. It was about people confronting their own assumptions and finding them outdated.

@davis7 was the most candid about this internal struggle:

"I very deliberately believed that agents weren't capable of anything 'real' because I honestly didn't want them to be. It was so much easier to just think it's not possible to do the very real and serious and important real engineering things I do, and never try it, because them being capable is so much scarier."

That kind of psychological honesty cuts deeper than any demo or benchmark. @levie from Box framed the macro picture, arguing that a "capability overhang" exists where most organizations still think of AI as chatbots rather than agents capable of real work. The winners, he argued, will be those who master agent scaffolding, context engineering, and change management. @io_sammt predicted that 2026 will birth a "new class of technician" capable of building complex production-ready systems in minutes. Whether or not that timeline is right, the directional bet is hard to argue with given the tooling momentum visible in today's feed alone.

Node.js Critical Security Release

In the most immediately actionable news of the day, Node.js shipped security patches across four release lines (25.x, 24.x, 22.x, 20.x) addressing eight vulnerabilities, three of them high severity.

@matteocollina didn't mince words about the impact:

"Today, @nodejs published a security release for Node.js that fixes a critical bug affecting virtually every production Node.js app. If you use React Server Components, Next.js, or ANY APM tool (Datadog, New Relic, OpenTelemetry), your app could be vulnerable to DoS attacks."

The scope here is enormous. React Server Components and Next.js alone cover a massive percentage of modern web applications, and APM tools like Datadog and OpenTelemetry are nearly universal in production environments. If you're running any of these combinations, patching should be your top priority before anything else on your backlog.

Source Posts

eric zakariasson @ericzakariasson ·
5. TDD works incredibly well with agents
- have agent write tests (explicit TDD, no mock implementations)
- run tests, confirm they fail
- commit tests
- have agent implement until tests pass
- commit implementation
agents perform best when they have a clear target to iterate against
sankalp @dejavucoder ·
introducing claude cowork https://t.co/gwXGFjrda5
Aaron Levie @levie ·
The capability overhang right now in AI is pretty massive. Most of the world still thinks of AI as chatbots that will answer a question on demand but not yet do real work for them. Beyond coding, almost no knowledge work has had any real agentic automation applied to it yet. The past quarter of model updates is going to open up an all new AI agent use-cases across nearly every industry. The winners will be those that can figure out how to wrap the models in the right agent scaffolding, provide the agent the right data to work with context engineering, and deliver the change management that actually drives the change in workflow for the customer. This is what 2026 will be about.
Maziyar PANAHI @MaziyarPanahi ·
🚨 OpenMed just mass-released 35 state-of-the-art PII detection models to the open-source community! All Apache 2.0. All free. Forever. 🍀 Here's what @OpenMed_AI built and why it matters for healthcare AI safety. Supporting HIPAA, GDPR, and beyond. Thread 🧵👇
John Rush @johnrushx ·
If only someone had told me this before my first startup
Anthony @kr0der ·
I almost quit Codex after 1 day. Here's how to actually use it.
Antoine v.d. SwiftLee  @twannl ·
I spend the majority of my time in Cursor lately, but I learned a lot from this article. Must read 👇
Cursor @cursor_ai

Here's what we've learned from building and using coding agents. https://t.co/PuBtYuhyhd

echo.hive @hive_echo ·
Just some nano banana pro UI mockups that is all... more in comments https://t.co/QNfcu1yPxc
echo.hive @hive_echo

nano banana pro to opus 4.5 designed pages https://t.co/xsZoCZUCwi

eric zakariasson @ericzakariasson ·
4. rules vs skills
rules = static context for every conversation. put commands, code style patterns, workflow instructions in .cursor/rules/
skills = dynamic capabilities loaded when relevant. custom commands, hooks, domain knowledge
start simple. add rules only when you see repeated mistakes
nader dabit @dabit3 ·
A new (beta) feature of Claude that I've been learning about today is Programmatic tool calling. It programmatically writes code that calls and runs your tools directly in a sandbox before returning results to the model. This reduces latency + token consumption because you can essentially filter or process data before it reaches the model's context window. https://t.co/f1KqoKbe6l
Pedro Piñera @pepicrft ·
Clawdbot Vault Plugin turns a local folder into a structured knowledge vault. Plain markdown with QMD-powered search and embeddings, frontmatter schema, and optional git sync. Install via `clawdbot plugins install clawd-plugin-vault`. https://t.co/50cekuz0D8
Samuel Timbó @io_sammt ·
A new class of technician will be born this year, 2026. Everyone will have the means to concentrate and automate all their online life. Software Engineers will be capable of building complex production ready systems extremely fast, usually in minutes, often in seconds. https://t.co/0e7T2NTeWd
Samuel Timbó @io_sammt

Unit makes Metaprogramming trivial. I can quickly turn this web server into a *Hot Web Server*: Every change made to the website's source is immediately propagated to all users, no reload nor reinstall needed. Imagine being able to solve your users problems... immediately. ⚡️ https://t.co/U3ZEMbHDU4

Ido Salomon @idosal1 ·
My entire childhood has led me to this moment... I built AgentCraft - orchestrate your agents with your favorite RTS interface! ⚔️ Coming soon 👀
Aaron Slodov @aphysicist

millennial gamers are the best prepared generation for agentic work, they've been training for 25 years https://t.co/JHsbPQHupk

Ashpreet Bedi @ashpreetbedi ·
How I Use Claude Code
ℏεsam @Hesamation ·
this is still the best guide on Claude Code I've seen that covers basically how you should (and shouldn't) use it. comprehensive, practical, and to-the-point. https://t.co/1P847kkROo https://t.co/UTgBLUjNPT
Matt Pocock @mattpocockuk ·
Here are my CLAUDE.md additions for making plan mode 10x better
Before: unreadably long plans
After: concise, useful plans with followup questions https://t.co/DjR4bCZ9Gr
Clawd🦞 @clawdbot ·
🦞 Clawdbot v2026.1.12 Memory got vectors. Voice calls - I can phone for you 📞 One-shot reminders. MiniMax got a glow-up. Your lobster just got smarter. https://t.co/VwdOS7y0IY
Ben Williams @theplgeek ·
Aaaaand ralph-tui is live - thanks for your patience https://t.co/Q90AQlAYk6
It's been a fun day using ralph-tui to build ralph-tui. All the details in the repo but:
- Install w/ your fave package mgr eg 'bun install -g ralph-tui'
- First time setup 'ralph-tui init'
- Create a PRD and tasks 'ralph-tui prime'
After that you'll be dropped into the TUI to start the ralph loop. Tons of tweakability for those that care. On that note, I'm out for the night 🤘 h/t @GeoffreyHuntley 🤠 @ryancarson @danshipper @kieranklaassen @clairevo @mattpocockuk @gregisenberg
Siqi Chen @blader ·
every company should be rolling their own devin like ramp it will take you less than a day to standup and maybe a week to make good
Ben Davis @davis7 ·
I had my moment with AI this weekend when Theo forced me to push agents 1000x harder than I thought was possible. I very deliberately believed that agents weren't capable of anything "real" because I honestly didn't want them to be. It was so much easier to just think it's not possible to do the very real and serious and important real engineering things I do, and never try it, because them being capable is so much scarier. But they are capable. I agree with every word of this, after what I built this weekend I've seen it, everything has changed.
Node.js @nodejs ·
We appreciate your patience and understanding as we work to deliver a secure and reliable release. Updates are now available for the 25.x, 24.x, 22.x, 20.x Node.js release lines to address:
- 3 high severity issues
- 4 medium severity issues
- 1 low severity issue
https://t.co/dP3gJ8P5fx
Guohao Li 🐫 @guohao_li ·
Anthropic Claude Cowork just killed our startup product 😅 So we did the most rational thing: open-sourced it. Meet Eigent 👉 https://t.co/R82WRFoh41
Matteo Collina @matteocollina ·
Today, @nodejs published a security release for Node.js that fixes a critical bug affecting virtually every production Node.js app. If you use React Server Components, Next.js, or ANY APM tool (Datadog, New Relic, OpenTelemetry), your app could be vulnerable to DoS attacks. 👇
Ahmad @TheAhmadOsman ·
step-by-step LLM Engineering Projects
LOCK IN FOR A FEW WEEKS ON THESE PROJECTS AND YOU WILL BE GRATEFUL FOR IT LATER
each project = one concept learned the hard (i.e. real) way

Tokenization & Embeddings
> build byte-pair encoder + train your own subword vocab
> write a "token visualizer" to map words/chunks to IDs
> one-hot vs learned-embedding: plot cosine distances

Positional Embeddings
> classic sinusoidal vs learned vs RoPE vs ALiBi: demo all four
> animate a toy sequence being "position-encoded" in 3D
> ablate positions—watch attention collapse

Self-Attention & Multihead Attention
> hand-wire dot-product attention for one token
> scale to multi-head, plot per-head weight heatmaps
> mask out future tokens, verify causal property

transformers, QKV, & stacking
> stack the Attention implementations with LayerNorm and residuals → single-block transformer
> generalize: n-block "mini-former" on toy data
> dissect Q, K, V: swap them, break them, see what explodes

Sampling Parameters: temp/top-k/top-p
> code a sampler dashboard — interactively tune temp/k/p and sample outputs
> plot entropy vs output diversity as you sweep params
> nuke temp=0 (argmax): watch repetition

KV Cache (Fast Inference)
> record & reuse KV states; measure speedup vs no-cache
> build a "cache hit/miss" visualizer for token streams
> profile cache memory cost for long vs short sequences

Long-Context Tricks: Infini-Attention / Sliding Window
> implement sliding window attention; measure loss on long docs
> benchmark "memory-efficient" (recompute, flash) variants
> plot perplexity vs context length; find context collapse point

Mixture of Experts (MoE)
> code a 2-expert router layer; route tokens dynamically
> plot expert utilization histograms over dataset
> simulate sparse/dense swaps; measure FLOP savings

Grouped Query Attention
> convert your mini-former to grouped query layout
> measure speed vs vanilla multi-head on large batch
> ablate number of groups, plot latency

Normalization & Activations
> hand-implement LayerNorm, RMSNorm, SwiGLU, GELU
> ablate each—what happens to train/test loss?
> plot activation distributions layerwise

Pretraining Objectives
> train masked LM vs causal LM vs prefix LM on toy text
> plot loss curves; compare which learns "English" faster
> generate samples from each — note quirks

Finetuning vs Instruction Tuning vs RLHF
> fine-tune on a small custom dataset
> instruction-tune by prepending tasks ("Summarize: ...")
> RLHF: hack a reward model, use PPO for 10 steps, plot reward

Scaling Laws & Model Capacity
> train tiny, small, medium models — plot loss vs size
> benchmark wall-clock time, VRAM, throughput
> extrapolate scaling curve — how "dumb" can you go?

Quantization
> code PTQ & QAT; export to GGUF/AWQ; plot accuracy drop

Inference/Training Stacks
> port a model from HuggingFace to Deepspeed, vLLM, ExLlama
> profile throughput, VRAM, latency across all three

Synthetic Data
> generate toy data, add noise, dedupe, create eval splits
> visualize model learning curves on real vs synth

each project = one core insight. build. plot. break. repeat.
> don't get stuck too long in theory
> code, debug, ablate, even meme your graphs lol
> finish each and post what you learned
your future self will thank you later
Tyler @tyler_agg ·
How to Make Realistic Longform AI Videos (Prompts Included)
Ethan Mollick @emollick ·
Worth thinking about how to describe what your organization does, in detail, in a series of plain English markdown files.
📙 Alex Hillman @alexhillman ·
Overdue addition to my claude dot md

# Global Claude Code Preferences

## Communication Style
- Never end sentences with ellipses (...) - it comes across as passive aggressive
- Ask questions one at a time
- Acknowledge requests neutrally without enthusiasm inflation
- Skip validation language ("great idea!", "perfect!", "excellent!", "amazing!", "kick ass!")
- Skip affirmations ("you're right!", "exactly!", "absolutely!")
- Use neutral confirmations: "Got it", "On it", "Understood", "Starting now"
- Focus on execution over commentary

## AI Slop Patterns to Avoid
- Never use "not X, but Y" or "not just X, but Y" - state things directly
- No hedging: "I'd be happy to...", "I'd love to...", "Let me go ahead and...", "I'll just...", "If you don't mind..."
- No false collaboration: "Let's dive in", "Let's get started", "We can see that...", "As we discussed..."
- No filler transitions: "Now, let's...", "Next, I'll...", "Moving on to...", "With that said..."
- No overclaiming: "I completely understand", "That makes total sense"
- No performative narration: Don't announce actions then do them - just do them
- No redundant confirmations: "Sure thing!", "Of course!", "Certainly!"
Rohit @rohit4verse ·
how the creator of claude code actually writes software
Jon Kaplan @aye_aye_kaplan ·
Coding with agents has changed so much in the last few months. If you struggle to keep up with all of the best practices in this rapidly-evolving space, this guide is for you. Read about our recommendations for coding with agents here, straight from the Cursor team.
Prajwal Tomar @PrajwalTomar_ ·
Stop saying AI can't design. Cursor + Opus 4.5 just helped me build a landing page with scrollytelling animations in under 10 mins that designers charge thousands for. If your landing page still looks like a 2010 app, that's not an AI problem. That's a workflow problem. https://t.co/NGdc8ixqL7
Prajwal Tomar @PrajwalTomar_

I replicated a $5K scroll animation inside Cursor in 10 minutes. People keep saying AI can’t replace designers. That might be true for big companies with huge teams and complex design systems. But if your goal is to ship an MVP fast, Gemini 3 or Opus 4.5 is MORE than enough. I one-shotted a landing page with a scroll animation agencies charge thousands for. Here’s the exact process I used ↓

eric zakariasson @ericzakariasson ·
the developers who get the most from agents:
- write specific prompts
- iterate on their setup
- review carefully (AI code can look right while being wrong)
- provide verifiable goals (types, linters, tests)
- treat agents as capable collaborators
full post: https://t.co/CCVkvmFZXp
Marcel Pociot 🧪 @marcelpociot ·
How Cowork was shipped in just 1 1/2 weeks: "Us humans meet in-person to discuss foundational architectural and product decisions, but all of us devs manage anywhere between 3 to 8 Claude instances implementing features, fixing bugs, or researching potential solutions."
Guillermo Rauch @rauchg ·
We're encapsulating all our knowledge of @reactjs & @nextjs frontend optimization into a set of reusable skills for agents. This is a 10+ years of experience from the likes of @shuding, distilled for the benefit of every Ralph https://t.co/2QrIl5xa5W