AI Digest

Ramp's Inspect Agent Authors 30% of Merged PRs While Ralph Wiggum Tooling Ecosystem Proliferates

Ramp shared hard numbers on their internal coding agent Inspect, now responsible for 30% of merged PRs with non-engineers submitting code. The Ralph Wiggum agentic workflow pattern spawned multiple competing CLI tools in a single day. Claude Code continued expanding into non-engineering workflows as developers debated whether senior expertise matters more than AI-assisted speed.

Daily Wrap-Up

The most concrete signal today came from Ramp, where @eglyman dropped real production numbers: their internal coding agent Inspect now authors roughly 30% of merged PRs across core repos, and people from "essentially every job function" submitted code last week. That's not a demo or a hackathon stat. That's a multibillion-dollar fintech running a significant chunk of its engineering output through an autonomous agent with feedback loops built around tests, telemetry, and visual checks. @thdxr's reaction captured the vibe perfectly: "it's insane how much ramp built." The gap between companies that have figured out agent-assisted development and those still debating whether to adopt Copilot is widening fast.

On the tooling side, the Ralph Wiggum pattern of agentic task orchestration is clearly hitting escape velocity. Two independent developers shipped competing CLI tools on the same day: @iannuttall built a multi-agent ralph CLI that works across Codex, Claude, and Droid, while @theplgeek dropped ralph-tui with plugin architecture and end-to-end observability. When multiple people independently build tooling around the same workflow pattern, it's a strong signal that the pattern has legs. The agentic loop of PRD generation, plan decomposition, and autonomous execution is becoming a standard workflow rather than an experiment.

The career anxiety thread ran hot as usual, with @ujjwalscript's story about a junior dev outperforming a 10-year veteran using three orchestrated AI agents landing as the day's most polarizing post. His conclusion that senior developers should "compete on wisdom, not speed" is comforting but incomplete. The more honest read is that architectural judgment still matters enormously, but the window where that judgment alone justifies a senior title is narrowing. The most practical takeaway for developers: study how Ramp built Inspect's feedback loops (tests, telemetry, feature flags, visual checks). The winning pattern isn't "AI writes code" but "AI writes code and then verifies it against reality." Build that verification layer into your own workflows now, whether you're using Claude Code, Cursor, or a custom agent setup.

Quick Hits

  • @steveruizok dropped a cryptic "$100B app" post with no further context. Filing under "things that will make sense in a week or never."
  • @doodlestein pointed someone to a new project replacing their previous work, linking to a repo without much explanation.
  • @steipete released bird 0.7.0, a fast X CLI for reading tweets, now with home timeline support, trending/news, and user-tweets. Multiple community contributors credited.

Claude Code Expands Beyond Engineering

The most interesting Claude Code development today wasn't about writing better code. It was about not writing code at all. @boringmarketer declared "Claude Code for non technical work is massive," pointing to a trend that's been building for weeks: the tool's filesystem-based workflow, context management, and iterative execution model work just as well for non-engineering tasks as they do for shipping features. @rohanvarma fleshed this out with a detailed example from a Fortune 500 company where PMs are using Cursor (same general category) to run their entire product workflow in code:

> "A GitHub repository for all PMs. Customer call transcripts checked directly into the repo. Cursor agents extract insights from those transcripts and write them to a dedicated insights directory. PRDs are generated into a separate folder, creating a durable record of product decisions that agents can reference later."

This is "PMing in code," as @rohanvarma put it, treating product work as "an evolving, inspectable system rather than a collection of docs and meetings." The implication is significant: once your workflow lives in a repo with rules files and agent instructions, you get version history, diffing, branching, and automation for free. That's a better knowledge management system than most companies have ever had.
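The workflow @rohanvarma describes is easy to prototype. Here is a minimal sketch in Python; the directory and file names (`transcripts/`, `insights/`, `prds/`, `.agent/rules/`) are assumptions based on the quoted description, not the company's actual structure:

```python
# Sketch of a "PMing in code" repo layout. All names are hypothetical.
from pathlib import Path
import tempfile

def scaffold_pm_repo(root: Path) -> None:
    """Create the folders an agent-driven product workflow might use."""
    for sub in ("transcripts", "insights", "prds", ".agent/rules"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # A rules file the agent reads on every run (illustrative format).
    (root / ".agent/rules/extract-insights.md").write_text(
        "# Rule: when a transcript lands in transcripts/, "
        "extract key insights into insights/<same-name>.md\n"
    )

repo = Path(tempfile.mkdtemp()) / "pm-repo"
scaffold_pm_repo(repo)
print(sorted(p.name for p in repo.iterdir()))
```

Once a layout like this exists, the benefits the article lists (version history, diffing, branching, automation) come from simply putting the folder under git.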

On the technical side, @parcadei highlighted the token cost problem that anyone running agents at scale has hit: paying 320K tokens to read a single file when a smarter summarization approach costs 400. Context window management is becoming a core competency for agent builders, not just a nice-to-have optimization. And @DanielMiessler flagged a category of software that's suddenly vulnerable: mid-quality, niche tools that won because they were the only option. "Claude Code can just reverse engineer it," he noted, which is both a threat to incumbents and an opportunity for anyone who's been stuck using mediocre tooling. @eyad_khrais continued building out educational content with a level 2 Claude Code tutorial, feeding the growing demand for structured learning around these tools.
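The 320K-vs-400 gap @parcadei describes is easy to reproduce with a toy estimate. This sketch assumes the rough 4-characters-per-token heuristic, and the "summarizer" just keeps structural lines; both are purely illustrative, not the approach from the post:

```python
# Toy comparison: tokens to ingest a whole file vs a structural summary.
# The 4-chars-per-token heuristic is a rough assumption, not a measurement.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def summarize_headings(text: str, max_lines: int = 10) -> str:
    """Keep only structural lines (headings, defs) as a stand-in summary."""
    structural = [
        line for line in text.splitlines()
        if line.lstrip().startswith(("#", "def ", "class "))
    ]
    return "\n".join(structural[:max_lines])

# Fake 40-section document, roughly 100K characters.
doc = "\n".join(f"# Section {i}\n" + "lorem ipsum " * 200 for i in range(40))

full_cost = estimate_tokens(doc)
summary_cost = estimate_tokens(summarize_headings(doc))
print(f"full file: ~{full_cost} tokens, summary: ~{summary_cost} tokens")
```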

Agents Hit Production: Ramp's Inspect and the Feedback Loop Pattern

Ramp's @eglyman provided what might be the most important data point of the day for anyone building or evaluating coding agents. Their Inspect agent doesn't just generate diffs. It runs in sandboxed dev environments, observes test results, checks telemetry, evaluates feature flags, and for UI work, takes screenshots and compares live previews. The key insight is framing agents as control systems:

> "Generating output is easy. Feedback is everything... It doesn't just propose diffs; it iterates until the evidence says the change is correct."

Two specific observations stood out. First, cheap parallel sessions change behavior fundamentally. When the agent runs in its own sandbox rather than on your laptop, you stop micromanaging and start running more experiments. Second, the multiplayer dimension matters more than expected. Inspect shows up in PRs, Slack, VS Code, and the web, and sessions can be handed off between teammates. It becomes "shared infrastructure, not a novelty."
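The "iterates until the evidence says the change is correct" framing reads naturally as a control loop. A minimal sketch, in which `generate_patch` and `run_checks` are hypothetical stand-ins for an agent call and a test/telemetry harness; this is not Ramp's implementation:

```python
# Generate -> verify -> feed failures back -> repeat, escalating to a
# human when the budget runs out. All functions are illustrative stubs.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    tests_pass: bool
    feedback: str

def generate_patch(task: str, feedback: str) -> str:
    # Stand-in for an LLM call; this stub "fixes" the task after one hint.
    return f"patch for {task!r} (given: {feedback or 'no feedback'})"

def run_checks(patch: str) -> Evidence:
    # Stand-in for tests, telemetry, feature flags, and visual checks.
    ok = "no feedback" not in patch
    return Evidence(tests_pass=ok, feedback="" if ok else "test_auth failed")

def iterate_until_green(task: str, max_attempts: int = 5) -> Optional[str]:
    feedback = ""
    for _ in range(max_attempts):
        patch = generate_patch(task, feedback)
        evidence = run_checks(patch)
        if evidence.tests_pass:
            return patch              # evidence says the change is correct
        feedback = evidence.feedback  # failures become the next prompt
    return None                       # out of budget: escalate, don't merge

result = iterate_until_green("add rate limiting")
print(result)
```

The design point is the return value on failure: the loop refuses to merge and hands off to a human instead of shipping an unverified diff.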

The 30% merged PR stat is striking, but the detail that non-engineers are submitting code is arguably more significant. That's the bridge between "AI helps developers code faster" and "AI changes who can contribute to a codebase." @thdxr's reaction captured the developer community's mix of admiration and anxiety: Ramp essentially built out someone else's entire product roadmap as an internal tool. The build-vs-buy calculus is shifting when your internal agents can ship features this fast.

The Ralph Wiggum Tooling Boom

Two independent Ralph Wiggum CLI tools shipped on the same day, which says something about where the agentic workflow community is headed. @iannuttall built a cross-platform ralph CLI that works with Codex, Claude, and Droid, combining PRD creation with plan generation and a ralph build command to execute:

> "works with codex, claude, droid. creates a prd for you. turns prd into a plan. run ralph build to cook"

Meanwhile, @theplgeek went deeper with ralph-tui, a terminal UI that adds plugin architecture for both agents and task trackers, built-in PRD creation leveraging skills, task dependency understanding, and what they describe as "end-to-end observability." The plugin system ships with Claude Code and OpenCode integrations out of the box, plus JSON and Beads tracker formats.

The convergence is notable: both tools independently arrived at the same core workflow of PRD-to-plan-to-execution, but took different approaches to extensibility. @iannuttall optimized for simplicity and multi-agent support, while @theplgeek built for observability and plugin ecosystems. This kind of parallel evolution typically means the underlying pattern is solid and the community is entering a differentiation phase where tooling competes on developer experience rather than core capabilities.
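Both tools converge on the same skeleton: turn a PRD into discrete tasks, then hand each task to an agent. A minimal sketch of that loop, where `call_agent` is a hypothetical stand-in for shelling out to codex, claude, or droid; neither tool's real interface is shown here:

```python
# PRD -> plan -> execute, in miniature. decompose() is a toy planner
# that treats each "- " bullet in the PRD as one task.

def decompose(prd: str) -> list[str]:
    return [line.removeprefix("- ") for line in prd.splitlines()
            if line.startswith("- ")]

def call_agent(task: str) -> str:
    # Stand-in for an autonomous agent run (codex/claude/droid).
    return f"done: {task}"

def ralph_build(prd: str) -> list[str]:
    """Execute every task in the plan, one agent run per task."""
    return [call_agent(task) for task in decompose(prd)]

prd = "Auth service PRD\n- add login endpoint\n- add session expiry"
results = ralph_build(prd)
print(results)
```

The real tools add what this sketch omits: task dependencies, trackers, retries, and observability.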

Local AI: From Prediction to Practice

@TheAhmadOsman showed up twice today, once with predictions and once with receipts. The prediction thread was straightforward: open-source AI wins, AGI runs local, learn how it works now. The receipts were more interesting: a working demo of Claude Code running against local models served by vLLM on 4x RTX 3090s, with GLM-4.5 Air handling generation:

> "vLLM serving GLM-4.5 Air on 4x RTX 3090s. nvtop showing live GPU load. Claude Code generating code + docs. end-to-end on my AI cluster. this is what local AI actually looks like"

The practical reality of local inference is still rough around the edges. Consumer GPU setups require significant investment and expertise, and the models you can run locally trail the frontier by meaningful margins. But the gap is closing, and for certain workflows, especially ones involving proprietary codebases or compliance requirements, having the entire stack on your own hardware is worth the tradeoffs. The "buy a GPU" advice is becoming less meme and more genuine career investment for developers who want to understand the full stack of AI-assisted development.
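On the serving side, vLLM exposes an OpenAI-compatible endpoint (by default on port 8000 after something like `vllm serve zai-org/GLM-4.5-Air --tensor-parallel-size 4`). A sketch of building a request body for that endpoint; the model name, port, and parameters are assumptions based on the demo, not its exact setup:

```python
# Build the JSON body for POST http://localhost:8000/v1/chat/completions
# against a local vLLM server. Model name and max_tokens are illustrative.
import json

def chat_request(prompt: str, model: str = "zai-org/GLM-4.5-Air") -> str:
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    })

body = chat_request("write a haiku about GPUs")
print(body)
```

Pointing Claude Code itself at a local endpoint typically requires a translation layer, since it speaks Anthropic's API rather than OpenAI's.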

AI and the Senior Developer Identity Crisis

@ujjwalscript's post about a junior developer outperforming a 10-year veteran hit a nerve, as these posts always do. The story follows a familiar arc: junior uses three orchestrated AI agents to ship in 4 hours what would take 3 days, but the PR has security holes and lacks architectural vision. The conclusion lands on "compete on wisdom, not speed," which is reassuring but glosses over the uncomfortable middle ground where "wisdom" needs to be encoded into agent instructions and review processes rather than held in someone's head.

@DaveShapi took a much longer view with a thread on preparing for AI job displacement, covering everything from relocating to lower cost-of-living areas to identifying remaining job categories (attention economy, experience economy, authenticity economy, meaning economy). It's a sobering framework, though the practical advice of "save more, invest wisely, find purpose beyond work" applies regardless of whether AI displaces your specific role. The more actionable signal for working developers is in the Ramp data: the companies moving fastest aren't replacing engineers. They're augmenting everyone into a contributor. The question isn't whether your job disappears but whether you're building the systems that make that augmentation work, or waiting for someone else to build them around you.

Sources

Marcel Pociot 🧪 @marcelpociot ·
How Cowork was shipped in just 1 1/2 weeks: "Us humans meet in-person to discuss foundational architectural and product decisions, but all of us devs manage anywhere between 3 to 8 Claude instances implementing features, fixing bugs, or researching potential solutions."
Pedro Piñera @pepicrft ·
Clawdbot Vault Plugin turns a local folder into a structured knowledge vault. Plain markdown with QMD-powered search and embeddings, frontmatter schema, and optional git sync. Install via `clawdbot plugins install clawd-plugin-vault`. https://t.co/50cekuz0D8
Ben Davis @davis7 ·
I had my moment with AI this weekend when Theo forced me to push agents 1000x harder than I thought was possible. I very deliberately believed that agents weren't capable of anything "real" because I honestly didn't want them to be. It was so much easier to just think it's not possible to do the very real and serious and important real engineering things I do, and never try it, because them being capable is so much scarier. But they are capable. I agree with every word of this, after what I built this weekend I've seen it, everything has changed.
Clawd🦞 @clawdbot ·
🦞 Clawdbot v2026.1.12 Memory got vectors. Voice calls - I can phone for you 📞 One-shot reminders. MiniMax got a glow-up. Your lobster just got smarter. https://t.co/VwdOS7y0IY
John Rush @johnrushx ·
If only someone had told me this before my first startup
Ahmad @TheAhmadOsman ·
step-by-step LLM Engineering Projects. LOCK IN FOR A FEW WEEKS ON THESE PROJECTS AND YOU WILL BE GRATEFUL FOR IT LATER. Each project = one concept learned the hard (i.e. real) way.
  • Tokenization & Embeddings: build a byte-pair encoder and train your own subword vocab; write a "token visualizer" to map words/chunks to IDs; one-hot vs learned embedding, plot cosine distances
  • Positional Embeddings: classic sinusoidal vs learned vs RoPE vs ALiBi, demo all four; animate a toy sequence being "position-encoded" in 3D; ablate positions and watch attention collapse
  • Self-Attention & Multihead Attention: hand-wire dot-product attention for one token; scale to multi-head, plot per-head weight heatmaps; mask out future tokens, verify the causal property
  • Transformers, QKV, & Stacking: stack the attention implementations with LayerNorm and residuals into a single-block transformer; generalize to an n-block "mini-former" on toy data; dissect Q, K, V: swap them, break them, see what explodes
  • Sampling Parameters (temp/top-k/top-p): code a sampler dashboard to interactively tune temp/k/p and sample outputs; plot entropy vs output diversity as you sweep params; nuke temp=0 (argmax) and watch repetition
  • KV Cache (Fast Inference): record and reuse KV states, measure speedup vs no-cache; build a "cache hit/miss" visualizer for token streams; profile cache memory cost for long vs short sequences
  • Long-Context Tricks (Infini-Attention / Sliding Window): implement sliding window attention, measure loss on long docs; benchmark "memory-efficient" (recompute, flash) variants; plot perplexity vs context length to find the context collapse point
  • Mixture of Experts (MoE): code a 2-expert router layer, route tokens dynamically; plot expert utilization histograms over the dataset; simulate sparse/dense swaps, measure FLOP savings
  • Grouped Query Attention: convert your mini-former to a grouped query layout; measure speed vs vanilla multi-head on a large batch; ablate the number of groups, plot latency
  • Normalization & Activations: hand-implement LayerNorm, RMSNorm, SwiGLU, GELU; ablate each and see what happens to train/test loss; plot activation distributions layerwise
  • Pretraining Objectives: train masked LM vs causal LM vs prefix LM on toy text; plot loss curves and compare which learns "English" faster; generate samples from each and note the quirks
  • Finetuning vs Instruction Tuning vs RLHF: fine-tune on a small custom dataset; instruction-tune by prepending tasks ("Summarize: ..."); RLHF: hack a reward model, use PPO for 10 steps, plot reward
  • Scaling Laws & Model Capacity: train tiny, small, and medium models, plot loss vs size; benchmark wall-clock time, VRAM, throughput; extrapolate the scaling curve and see how "dumb" you can go
  • Quantization: code PTQ and QAT; export to GGUF/AWQ; plot the accuracy drop
  • Inference/Training Stacks: port a model from Hugging Face to DeepSpeed, vLLM, ExLlama; profile throughput, VRAM, and latency across all three
  • Synthetic Data: generate toy data, add noise, dedupe, create eval splits; visualize model learning curves on real vs synthetic data
Each project = one core insight: build, plot, break, repeat. Don't get stuck too long in theory; code, debug, ablate, even meme your graphs lol. Finish each and post what you learned; your future self will thank you later.
Prajwal Tomar @PrajwalTomar_ ·
Stop saying AI can't design. Cursor + Opus 4.5 just helped me build a landing page with scrollytelling animations in under 10 mins that designers charge thousands for. If your landing page still looks like a 2010 app, that's not an AI problem. That's a workflow problem. https://t.co/NGdc8ixqL7
PrajwalTomar_ @PrajwalTomar_

I replicated a $5K scroll animation inside Cursor in 10 minutes. People keep saying AI can’t replace designers. That might be true for big companies with huge teams and complex design systems. But if your goal is to ship an MVP fast, Gemini 3 or Opus 4.5 is MORE than enough. I one-shotted a landing page with a scroll animation agencies charge thousands for. Here’s the exact process I used ↓

Node.js @nodejs ·
We appreciate your patience and understanding as we work to deliver a secure and reliable release. Updates are now available for the 25.x, 24.x, 22.x, 20.x Node.js release lines to address: - 3 high severity issues - 4 medium severity issues - 1 low severity issue https://t.co/dP3gJ8P5fx
Matt Pocock @mattpocockuk ·
Here are my CLAUDE.md additions for making plan mode 10x better. Before: unreadably long plans. After: concise, useful plans with followup questions https://t.co/DjR4bCZ9Gr
nader dabit @dabit3 ·
A new (beta) feature of Claude that I've been learning about today is Programmatic tool calling. It programmatically writes code that calls and runs your tools directly in a sandbox before returning results to the model. This reduces latency + token consumption because you can essentially filter or process data before it reaches the model's context window. https://t.co/f1KqoKbe6l
Rohit @rohit4verse ·
how the creator of claude code actually writes software
Samuel Timbó @io_sammt ·
A new class of technician will be born this year, 2026. Everyone will have the means to concentrate and automate all their online life. Software Engineers will be capable of building complex production ready systems extremely fast, usually in minutes, often in seconds. https://t.co/0e7T2NTeWd
io_sammt @io_sammt

Unit makes Metaprogramming trivial. I can quickly turn this web server into a *Hot Web Server*: Every change made to the website's source is immediately propagated to all users, no reload nor reinstall needed. Imagine being able to solve your users problems... immediately. ⚡️ https://t.co/U3ZEMbHDU4

eric zakariasson @ericzakariasson ·
4. rules vs skills rules = static context for every conversation. put commands, code style patterns, workflow instructions in .cursor/rules/ skills = dynamic capabilities loaded when relevant. custom commands, hooks, domain knowledge start simple. add rules only when you see repeated mistakes
eric zakariasson @ericzakariasson ·
5. TDD works incredibly well with agents - have agent write tests (explicit TDD, no mock implementations) - run tests, confirm they fail - commit tests - have agent implement until tests pass - commit implementation agents perform best when they have a clear target to iterate against
eric zakariasson @ericzakariasson ·
the developers who get the most from agents: - write specific prompts - iterate on their setup - review carefully (AI code can look right while being wrong) - provide verifiable goals (types, linters, tests) - treat agents as capable collaborators full post: https://t.co/CCVkvmFZXp
Tyler @tyler_agg ·
How to Make Realistic Longform AI Videos (Prompts Included)
Ido Salomon @idosal1 ·
My entire childhood has led me to this moment... I built AgentCraft - orchestrate your agents with your favorite RTS interface! ⚔️ Coming soon 👀
aphysicist @aphysicist

millennial gamers are the best prepared generation for agentic work, they've been training for 25 years https://t.co/JHsbPQHupk

Anthony @kr0der ·
I almost quit Codex after 1 day. Here's how to actually use it.
Matteo Collina @matteocollina ·
Today, @nodejs published a security release for Node.js that fixes a critical bug affecting virtually every production Node.js app. If you use React Server Components, Next.js, or ANY APM tool (Datadog, New Relic, OpenTelemetry), your app could be vulnerable to DoS attacks. 👇
Siqi Chen @blader ·
every company should be rolling their own devin like ramp it will take you less than a day to standup and maybe a week to make good
Antoine v.d. SwiftLee  @twannl ·
I spend the majority of my time in Cursor lately, but I learned a lot from this article. Must read 👇
cursor_ai @cursor_ai

Here's what we've learned from building and using coding agents. https://t.co/PuBtYuhyhd

Ethan Mollick @emollick ·
Worth thinking about how to describe what your organization does, in detail, in a series of plain English markdown files.
Guillermo Rauch @rauchg ·
We're encapsulating all our knowledge of @reactjs & @nextjs frontend optimization into a set of reusable skills for agents. This is a 10+ years of experience from the likes of @shuding, distilled for the benefit of every Ralph https://t.co/2QrIl5xa5W
Maziyar PANAHI @MaziyarPanahi ·
🚨 OpenMed just mass-released 35 state-of-the-art PII detection models to the open-source community! All Apache 2.0. All free. Forever. 🍀 Here's what @OpenMed_AI built and why it matters for healthcare AI safety. Supporting HIPAA, GDPR, and beyond. Thread 🧵👇
Ashpreet Bedi @ashpreetbedi ·
How I Use Claude Code
Ethan Mollick @emollick ·
Could this meeting be an email? Could this organization be a set of markdown files?
Peter Steinberger @steipete ·
Did some statistics. My productivity ~doubled with moving from Claude Code to codex. Took me a bit to figure out at first but then 💥 https://t.co/cfyKg0E1hf
向阳乔木 @vista8 ·
This article is impressive; it explains very clearly how organizations can use AI to boost efficiency. It's extremely long; I've transcribed about half of it so you can get a feel for it, and I recommend reading the original. --- You may notice a paradox: AI helps individuals work with astonishing efficiency, but inside a company the effect is heavily diluted. Why? Because work inside a company fundamentally isn't something one person can finish alone. It requires collaboration, negotiation, escalated decisions, and continuous alignment of judgment over time. However smart an AI is, if it can only work solo, inside an organization it's just a "local optimization" tool. This article is mainly about how AI evolves from "personal assistant" into "organizational intelligence."

Context is not a treasure buried somewhere. Many people assume that if you give AI enough context, it will understand how the organization operates. The premise: organizational context is a complete, structured thing, like a fossil in rock strata, and you just have to dig it out. The truth is that most organizations don't work that way at all. Context doesn't live in a database, in a document, or even in the boss's head. It is continuously generated and dissipated through interaction. Something decided in today's meeting can change tomorrow because of a single email. For AI to understand an organization, it can't just "read the materials"; it has to participate, observing like a human how decisions unfold across email, meetings, and documents, how conflicts escalate, and how consensus forms. That is real "in-context learning."

The history of human collaboration is AI's future. In Sapiens, Yuval Noah Harari argues that humans came to dominate the planet not because individuals are smarter, but because we learned to collaborate at scale. We invented myths, law, money, and religion, "shared stories" that let strangers align their behavior. Science followed the same path. Before the 17th century, scientific knowledge was fragmented, spread through private letters and books; errors persisted and discoveries were repeatedly lost. The turning point wasn't a new theory but the emergence of collaboration systems: scientific journals, learned societies, peer review. Knowledge began to accumulate because judgment became a social process. The telephone is the same story. Early telephones were point-to-point; you had to know where the line ran to place a call. Once the network grew, that broke down. The solution? Operators. They sat at switchboards, manually connecting calls, remembering who was calling whom, which calls were urgent, and how to handle conflicts. Telephony scaled because of that "human intermediary layer." Software development went through the same stage. Before Git, code collaboration was fragile. CVS and SVN were centralized; multiple people editing code had to queue, and conflicts were costly. Git made branching cheap, made history a first-class citizen, and made conflicts visible and resolvable. GitHub added a layer of social collaboration on top: PRs, code review, issue discussions. The pattern is clear: individual capability appears first, but exponential productivity only explodes once collaboration structures emerge. AI is at exactly that point now.

Organizations won't reorganize around "roles" but around "collaboration units." Many people imagine the future as AI taking over certain jobs while humans do the rest. The author disagrees. AI isn't bound by human constraints (attention, bandwidth, specialization, hierarchy); none of them apply. So future organizations won't be designed around "roles" but around "collaboration units." Take legal. The core of legal work is "shared positions." Contracts go through rounds of negotiation among lawyers, partners, and clients, with positions evolving along the way. Today, much of a senior partner's value lies in "remembering": prior precedents, risks, shifts in position. In the future AI will carry that coordination work, tracking every open issue, spotting conflicting positions, and escalating judgment calls to the right people. Legal teams will reorganize: lots of AI doing mechanical drafting and information gathering, a few senior partners making decisions, judging risk, and maintaining client relationships. Or take marketing, where the challenge is "narrative consistency." Product marketing, growth, brand, and sales each tell their own story; how do you align them? Today it's meetings, copy reviews, and informal influence. In the future, AI tracks the narrative across channels, spots drift, and escalates conflicts. Humans shift from "channel owners" to "narrative gatekeepers" and setters of strategic intent. Finance and product follow similar logic. AI doesn't replace a role; it redistributes the coordination work. The fastest path: embed AI into the collaboration tools organizations already use (email, messaging, browsers, documents). These aren't "legacy systems"; they are the living infrastructure of work. How intent is expressed, how disagreement surfaces, how decisions escalate, how accountability is recorded: all of it is encoded in these tools. And the escalation mechanisms are already built in: @-mentions, annotations, comments, suggested edits, notifications. (AI can use them too.) What AI needs to do is not invent new ways of collaborating, but learn to participate and escalate within the mechanisms that already exist.
N nayakkayak @nayakkayak

Collaborative Intelligence

Ethan Mollick @emollick ·
Had Claude Code build a little plugin that visualizes the work Claude Code is doing as agents working in an office, with agents doing work and passing information to each other. New subagents are hired, they acquire skills, and they turn in completed work. Fun start. https://t.co/wm93gsiBWi
📙 Alex Hillman @alexhillman ·
Early in building my exec assistant system, I created a workflow to capture proto-ideas that I don't want to forget but don't have time to explore or implement right now. I call them "seeds" and they all go into a folder with markdown that captures the idea, the context that generated it, and the goal. At the moment I have 132 seeds planted 😅 So I worked with my assistant to develop a scoring framework for these seeds. Here's what it is and how we use it.
Jeff Tang @jefftangx ·
Last night I stayed up late talking to Cowork about how it was built. I exported the entire VM snapshot. What I learned: - It's an Electron App with its own Linux sandbox (bubblewrap) - Cowork is a wrapper around Claude Code (which is a wrapper around Opus) - It has an "internal-comms skill" made by Anthropic - I found 2 small-ish security vulnerabilities 👀 The craziest part: When I asked it what questions I should've asked it, it suggested adding memory and leaving notes for itself once it "dies" 🥲
simonw @simonw

I used Claude Code to reverse-engineer the Claude macOS Electron app and had Cowork dig around in its own environment - now I've got a good idea of how the sandbox works It's an Ubuntu VM using Apple's Virtualization framework, details here: https://t.co/lRWVhrNFk0