Multi-Agent Coding Pipelines Mature as Edge Inference Hits Raspberry Pi and AI Careers Diverge

June 2, 2026 · 16 sources

The AI development landscape is shifting toward sophisticated multi-agent workflows, with Uncle Bob Martin detailing a full pipeline from specification to mutation testing. Meanwhile, DeepSeek-V4-Flash running on a Raspberry Pi signals that edge inference is becoming practical, and Andrew Ng maps the emerging split between Forward Deployed Engineers and AI Engineers.

Daily Wrap-Up

June 2, 2026 brought a clear signal that multi-agent development workflows have graduated from experimental to operational. The most striking example came from @unclebobmartin, who laid out a complete pipeline where specialized agents handle specification, coding, refactoring, and architecture review in sequence, with human oversight decreasing at each stage. This was not theoretical. He described the exact handoff points, testing strategies, and mutation testing passes that make the system work. Combined with @zodchiii sharing an Anthropic engineer building five focused agents in 45 minutes on camera, and @_jonatasantos posting a detailed build-anything playbook using Hermes with memory vaults, the picture is clear: solo developers and small teams are assembling multi-agent systems that handle work previously requiring entire engineering departments.

The enterprise side of the conversation was equally revealing. @levie used the Kirkland law firm's $500M bet on custom AI tools as a launching point to argue that competitive advantage now lives in institutional knowledge and domain-specific workflows, not in the models themselves. @gabepereyra, who co-founded Harvey, echoed this view from the vendor side, noting that Harvey felt strongly from the start that it had to speed-run to an enterprise platform rather than begin with a single-workflow wedge. The tension between building custom AI tools and adopting vendor solutions is playing out across every industry, and the answer seems to depend on whether your data is actually differentiated or just feels that way.

The most practical takeaway for developers: start building your personal multi-agent workflow now. Pick one repetitive daily task, wrap an agent around it, and iterate. As @_jonatasantos put it, define clear evals, save learnings to memory, and develop a feeling for how long tasks should take. The developers who build muscle memory with agent orchestration today will have a significant edge as these tools mature.

Quick Hits

@danveloper got DeepSeek-V4-Flash (284B params) running on a Raspberry Pi 5 (8GB edition) at over 1 token per second while drawing roughly 8 watts, an edge-inference milestone that took more than 160 experiments over 5 days to reach.

@0xSero shared a comprehensive guide to the best AI models by VRAM tier, highlighting LFM-2.5-8B as the standout for 8-16GB setups with its mixture-of-experts architecture and 131k context window.

@malikwas1f shared UnslothAI's new guide on using the Model Context Protocol (MCP) with local LLMs like Qwen3.6 and Gemma 4 for controlled tool and API access.

@hooeem highlighted Microsoft's open-sourced SkillOpt, which trains self-evolving agent skills by making the skill document itself the optimization target. According to the post, it improved GPT and Qwen models on SearchQA, Sheet, Office, and DocVQA and cleared the strongest baseline on each benchmark.

@_djdumpling recommended a deep dive from @whatthelukh on asynchronous RL theory and infrastructure, surveying eight open-weight frontier labs on how they handle train-inference mismatch.

@Saccc_c reported on a vibe coder who live-streams product builds on YouTube and earns more than $20K per month. His largest cost is tokens (three Claude Max subscriptions plus one $200 Codex plan), and he runs six or seven terminals simultaneously.

@10_X_eng gave a shoutout to @dthakur for impressive robotics content, marveling that the account has fewer than 500 followers.

@steipete configured Codex to call him via phone when it gets stuck during tasks like npm releases gated by 1Password, describing the voice notifications as "the coolest thing ever."

Multi-Agent Development Workflows Go Operational

The day's densest theme was the maturation of multi-agent coding systems, with four posts detailing how developers are orchestrating specialized AI agents to handle the full software development lifecycle. What makes this moment different from earlier automation attempts is the granularity of the handoffs and the specificity of the testing regimes.

@unclebobmartin provided the most detailed blueprint. His pipeline starts with informal hand-written specifications that an agent converts into harder specifications subdivided into tasks, which he reviews. From there, a specifier agent converts each task to Gherkin, prunes it, and hands off to a coder agent. The coder writes acceptance tests directly from the Gherkin, then unit tests, then code. When tests pass, the result goes to a refactorer agent that reduces "crap" (presumably the CRAP score, which combines complexity and test coverage) to 6 or below, removes duplication, and writes property tests. Finally, an architect agent runs language mutation testing, covers any uncovered sections, and kills all survivors, then does the same with Gherkin mutation before running the full suite. He spot checks the Gherkin and the code along the way. As he described it: "This is an exercise of transformations from the informal to the formal through managed stages, with human interaction decreasing with each stage." The limiting factor is raw compute, since mutation tests are CPU intensive.
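To make the "killing survivors" step concrete, here is a minimal Python sketch of mutation testing. It is an illustration only, not @unclebobmartin's tooling: the `is_adult` function, the three operator swaps, and both test suites are hypothetical. A mutant "survives" when every test still passes against the deliberately broken code, which signals a gap in the suite.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical function under test, parameterised by its comparison operator
# so a "mutant" can be produced by swapping that operator.
def make_is_adult(compare: Callable[[int, int], bool]) -> Callable[[int], bool]:
    def is_adult(age: int) -> bool:
        return compare(age, 18)
    return is_adult

# Hypothetical mutation operators: each replaces the original `>=`.
MUTANTS: Dict[str, Callable[[int, int], bool]] = {
    ">= replaced by >": operator.gt,
    ">= replaced by ==": operator.eq,
    ">= replaced by <=": operator.le,
}

Test = Callable[[Callable[[int], bool]], bool]

def weak_suite() -> List[Test]:
    # Checks only values far from the boundary.
    return [lambda f: f(30) is True, lambda f: f(5) is False]

def strong_suite() -> List[Test]:
    # Adds boundary checks at 18 and 17.
    return weak_suite() + [lambda f: f(18) is True, lambda f: f(17) is False]

def survivors(suite: List[Test]) -> List[str]:
    """Return the names of mutants that the suite fails to kill."""
    original = make_is_adult(operator.ge)
    assert all(test(original) for test in suite), "suite must pass on the original"
    alive = []
    for name, op in MUTANTS.items():
        mutant = make_is_adult(op)
        if all(test(mutant) for test in suite):  # no test failed, so it survived
            alive.append(name)
    return alive

if __name__ == "__main__":
    print("weak suite survivors:", survivors(weak_suite()))
    print("strong suite survivors:", survivors(strong_suite()))
```

With the weak suite, the `>` mutant survives because no test probes the boundary at 18; the boundary tests in the strong suite kill it. Every mutant reruns the suite, so the cost grows with the number of mutants times the size of the suite, which is consistent with the post's remark that compute is the limit.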

@_jonatasantos offered a complementary but more accessible approach with his "how to build anything right now" playbook. His stack involves Hermes hosted on a VPS with a custom memory vault using qmd and SQL, GPT-5.5 in fast mode for execution, and a structured process of deep research, grilling sessions to uncover unknowns, and iterative building against defined evals. The key insight is that saving learnings to memory throughout the process creates a compounding knowledge base that makes future sessions more effective. His advice to "pay attention to the process" and "develop a feeling for how long tasks should take" is the kind of tacit knowledge that separates productive agent use from expensive spinning.
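The "iterate until all evals pass, saving learnings along the way" loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Hermes setup from the post: the `learnings` table, the `stub_agent` function (a stand-in for a real model call), and the two evals are all hypothetical, and an in-memory SQLite database stands in for the memory vault.

```python
import sqlite3
from typing import Callable, Dict, List

Eval = Callable[[str], bool]

def open_vault(path: str = ":memory:") -> sqlite3.Connection:
    """Open a tiny 'memory vault' (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS learnings "
        "(project TEXT, attempt INTEGER, note TEXT)"
    )
    return conn

def run_until_evals_pass(
    project: str,
    agent: Callable[[List[str]], str],
    evals: Dict[str, Eval],
    vault: sqlite3.Connection,
    max_attempts: int = 5,
) -> bool:
    """Call the agent repeatedly, feeding back saved learnings each time."""
    for attempt in range(1, max_attempts + 1):
        rows = vault.execute(
            "SELECT note FROM learnings WHERE project = ? ORDER BY attempt",
            (project,),
        )
        notes = [row[0] for row in rows]
        output = agent(notes)
        failed = [name for name, check in evals.items() if not check(output)]
        if not failed:
            return True
        # Persist what went wrong so the next attempt can use it.
        vault.execute(
            "INSERT INTO learnings VALUES (?, ?, ?)",
            (project, attempt, "failed evals: " + ", ".join(failed)),
        )
        vault.commit()
    return False

def stub_agent(notes: List[str]) -> str:
    # Stand-in for a model call: it "fixes" whatever earlier notes flagged.
    text = "hello"
    if any("ends_with_bang" in note for note in notes):
        text += "!"
    return text

EVALS: Dict[str, Eval] = {
    "mentions_hello": lambda out: "hello" in out,
    "ends_with_bang": lambda out: out.endswith("!"),
}

if __name__ == "__main__":
    vault = open_vault()
    print(run_until_evals_pass("demo", stub_agent, EVALS, vault))
```

The `max_attempts` cap plays the role of the advice to develop a feeling for how long tasks should take: the loop stops and reports failure instead of spinning indefinitely.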

@zodchiii amplified an Anthropic engineer who built five focused agents from scratch in 45 minutes on camera, each handling a daily manual task like code review, testing, and documentation. The barrier to building useful agents has dropped so low that the real competitive advantage lies in identifying which tasks to automate, not in the technical implementation.

@steipete showed the lighter side of agent autonomy, configuring Codex to phone him when it needs help with tasks like npm releases gated by 1Password. The image of an AI coding assistant literally calling its developer for help captures something essential about where we are: the agents are capable enough to know when they are stuck, and the human still holds the keys.

Enterprise AI Strategy and the Data Moat Debate

Three posts converged on the question of where competitive advantage lives when everyone has access to the same frontier models. The discussion was sparked by news that Kirkland & Ellis is spending $500M over four years to build custom internal AI legal tools, a move that @levie used as a springboard to articulate a broader strategic framework.

@levie argued that the companies best able to harness their institutional knowledge, existing data assets, and domain-specific workflows connected with AI will be the ones that stay ahead. At Box, he is seeing customers demand the flexibility to bring any AI model to their data at any time, rather than being locked into a single vendor's ecosystem. The Kirkland case illustrates the tension: building custom tools preserves control, but most enterprises lack the software development culture to execute well, and non-lawyers cannot be given equity in the firm due to regulation.

This point was sharpened by @fleetingbits in a post that @levie quoted, arguing that Kirkland likely does not have differentiated data compared to other elite firms, that cultural and structural issues make it hard for law firms to manage software developers, and that firms are better off using tools like Harvey and Legora while focusing on client relationships, local knowledge, and legal R&D. The post framed this as a broader pattern: "a phenomenon that ai creates across a lot of industries, where firms that were previously vertically integrated become unbundled due to ai because part of the intelligence gets moved to the labs or otherwise gets commoditized."

@gabepereyra, who co-founded Harvey, confirmed the strategic calculus from the vendor side: "When we started Harvey we got a ton of advice to solve a specific legal workflow as a wedge but felt very strongly we had to speed run to law firm / in-house enterprise platform as fast as possible." The lesson extends beyond legal tech. In a world where AI commoditizes certain types of intelligence, the winning strategy is to build platforms that capture domain expertise across workflows, not point solutions that solve single tasks.

@businessbarista is launching a 30-day series on enterprise AI transformation observations, drawing from his experience co-running an enterprise AI firm with deep relationships across Anthropic, OpenAI, Lovable, Cursor, Perplexity, and Vercel. His promise of actionable, non-technical daily observations suggests the enterprise adoption conversation is moving past "what is AI" to "how do we actually deploy this at scale."

Local and Edge Inference Keeps Getting More Practical

The dream of running capable models on consumer hardware took several steps forward. @danveloper achieved what sounds impossible on paper: DeepSeek-V4-Flash, a 284 billion parameter model, running on a Raspberry Pi 5 with only 8GB of RAM at over 1 token per second while drawing roughly 8 watts. He used an unmodified GGUF file from @antirez and arrived at the solution after more than 160 experiments over 5 days, working between GPT-5.5 xhigh and Opus 4.8 max. This is less about the practical utility of a 1 tok/s model and more about what it signals for the trajectory of efficient inference. If a 284B model runs on a Raspberry Pi today, what runs on your laptop next year?
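A quick back-of-the-envelope calculation shows why the result sounds impossible on paper. The Python sketch below uses only the figures quoted above plus one labeled assumption: the 2 bits per weight is hypothetical, since the post does not state the quantization of the GGUF file.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy spent per generated token (watts are joules per second)."""
    return power_watts / tokens_per_second

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Size of the weights alone in decimal GB: params * bits / 8 bytes."""
    return params_billions * bits_per_weight / 8

if __name__ == "__main__":
    # Figures from the post: ~8 W at ~1 token/s.
    print(joules_per_token(8, 1), "J per token")
    # ASSUMPTION: 2 bits per weight, chosen only for illustration.
    print(weight_footprint_gb(284, 2), "GB of weights vs 8 GB of RAM")
```

Even under that aggressive assumption the weights come to about 71 GB, roughly nine times the board's 8 GB of RAM, so they cannot all be resident in memory at once. The post does not say how the gap is bridged.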

@0xSero provided the community-oriented counterpart with a detailed tier list of the best models for different hardware configurations. For the 8-16GB VRAM bracket, he highlighted LFM-2.5-8B as a standout: an 8 billion parameter mixture-of-experts model with only 1.5B active parameters, trained on 38 trillion tokens, with 131k context. He called it "truly a phenomenal piece of work" for GPU-poor users. At the high end (196GB and up), he pointed to Step-3.7-Flash at 199B with 11B active parameters, vision support, 256k context, and 150 tok/s on what he calls "6000s."
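The "active parameters" figures are what make these mixture-of-experts models fast for their size. The Python sketch below works through the ratios using the numbers quoted above. The estimate of about 2 floating-point operations per active weight per token is a common rule of thumb, not a figure from the post.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the weights that participate in each token's forward pass."""
    return active_params_b / total_params_b

def approx_flops_per_token(active_params_b: float) -> float:
    # ASSUMPTION: rule of thumb of ~2 FLOPs per active weight per token.
    return 2 * active_params_b * 1e9

if __name__ == "__main__":
    for name, total, active in [
        ("LFM-2.5-8B", 8, 1.5),
        ("Step-3.7-Flash", 199, 11),
    ]:
        share = active_fraction(total, active)
        flops = approx_flops_per_token(active)
        print(f"{name}: {share:.1%} active, ~{flops:.1e} FLOPs per token")
```

All of the weights still have to be stored, so the VRAM tier is set by the total parameter count, while the per-token compute tracks the much smaller active count.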

The local inference thread was rounded out by @malikwas1f sharing UnslothAI's guide on using MCP with local LLMs including Qwen3.6 and Gemma 4. The ability to connect locally running models to tools, files, and APIs through a standardized protocol removes one of the last advantages of cloud-based AI services. Together, these posts paint a picture of a rapidly closing gap between local and cloud AI capabilities.
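For readers unfamiliar with MCP, tool invocations travel as JSON-RPC 2.0 messages using the `tools/call` method. The Python sketch below builds such a request and adds an allowlist check as one possible reading of "controlled access". The allowlist, the tool names, and the `build_tool_call` helper are hypothetical illustrations and are not taken from the UnslothAI guide.

```python
import json
from typing import Any, Dict, FrozenSet

# Hypothetical tool names that the local model is permitted to invoke.
ALLOWED_TOOLS: FrozenSet[str] = frozenset({"read_file", "search_docs"})

def build_tool_call(
    request_id: int,
    tool: str,
    arguments: Dict[str, Any],
    allowed: FrozenSet[str] = ALLOWED_TOOLS,
) -> str:
    """Serialize an MCP `tools/call` request, refusing tools off the allowlist."""
    if tool not in allowed:
        raise PermissionError(f"tool {tool!r} is not on the allowlist")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)

if __name__ == "__main__":
    print(build_tool_call(1, "read_file", {"path": "notes.txt"}))
    try:
        build_tool_call(2, "delete_everything", {})
    except PermissionError as err:
        print("blocked:", err)
```

Checking the tool name before the request is serialized keeps the policy on the client side, where it applies regardless of which local model produced the call.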

AI Career Paths Diverge

Andrew Ng weighed in with a nuanced take on the emerging split between Forward Deployed Engineers and AI Engineers, a distinction that is becoming urgent as OpenAI and Anthropic both build FDE teams to embed within client organizations. Ng traced the FDE role back to Palantir's model of sending engineers to government locations on secure networks, noting that modern AI FDEs need communication and business skills alongside technical ability to navigate client relationships and push back on unrealistic requests.

However, Ng argued that AI Engineer roles will vastly outnumber FDE positions. Companies might accept a few embedded vendor engineers, but they will want far more of their own employees building on their own projects. He also flagged a key concern: FDEs are incentivized to tightly integrate their employer's product, which reduces a company's optionality at a time when it is impossible to predict which AI service will be best in twelve months. He sees surging demand for AI Engineers who can build applications using LLM prompting, agentic frameworks, and evals while effectively using coding agents like Claude Code, Codex, Antigravity CLI, and OpenCode. As the role matures, Ng expects it to fragment into specialized positions the way Software Engineer eventually split into frontend, backend, mobile, devops, and data engineering.

The Coming AI Memory Wars

@garrytan made a concise but provocative prediction about the next frontier of platform competition: "You should want to control and host your own memory. It's the one thing that you should be able to take to any platform. Watch for this to be a defining battle in the new browser war: the AI harness wars of 2027."

Treating AI memory as the next battleground changes how the current competition between AI platforms looks. Models are increasingly commoditized. Context windows are growing. But the persistent memory that an AI assistant builds about your preferences, workflows, and knowledge is becoming the real lock-in mechanism. The platforms that own your AI memory own the switching cost. This connects directly to @_jonatasantos's memory vault approach and the broader push for local AI infrastructure. If your memory lives on your own server in a format you control, no platform can hold it hostage. Expect this to become a central tension in the next twelve months.

Sources

darkzodchi @zodchiii · May 31

Anthropic engineer: "You can build 5 assistants in one afternoon. Each one handles a task you've been doing manually every single day." In 45 minutes he builds 5 focused agents from scratch on camera. Most people are still doing code review, testing, and documentation by hand every single day Watch the session, then save all templates below 👇

0x_rody @0x_rody

https://t.co/XKq6qBjLxY

Garry Tan @garrytan · May 31

You should want to control and host your own memory It’s the one thing that you should be able to take to any platform Watch for this to be a defining battle in the new browser war: the AI harness wars of 2027

pejmanjohn @pejmanjohn

https://t.co/8lOtmSSCGw

Sac @Saccc_c · Jun 1

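This guy live-streams building products in public with vibe coding and makes over $20,000 a month. His biggest cost is tokens: 3 Claude Max subscriptions plus one $200 Codex plan. Every day he runs six or seven terminals at once, building flat out on a YouTube live stream. When I first scrolled past it I thought it was a bit, but the guy is for real 😢 https://t.co/xlR37pTUQm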

matthewmillerai @matthewmillerai

I WILL NOT STOP VIBE CODING UNTIL $1,000,000 Just crossed $16,646 MRR. Three months ago I was at $1,500. No investors. No employees. No safety net. Just me and a bunch of AI agents shipping every day. Here's what nobody tells you about consistency: it's boring. There's no secret. I show up, I build, I ship, and I do it again the next day. The results are starting to compound. And I'm just getting started. https://t.co/LKWKqN52F3

Jonata Santos @_jonatasantos · Jun 1

how to build anything rn:
- get a hetzner, do, or hostinger vps
- host hermes on it
- add gbrain or implement your own memory vault using qmd + sql
- set up hermes with codex auth -> gpt-5.5 / no reasoning / fast mode
- install orca on your macbook and phone with tailscale to have a nice ide to work on both
- before starting any work, ask hermes to conduct deep research on the subject and save it to gbrain as source material for the project
- use the `/grill-me` skill or a similar prompt to uncover as many unknowns as possible. save results to memory too
- define/write clear evals for every project to determine whether a run was successful
- have hermes iterate over the project until all evals pass, saving all learnings to the vault along the way
- whenever it gets stuck, use memory + a new research or `/grill-me` session to unblock it
rinse and repeat until the work is done. pay attention to the process. develop a feeling for how long tasks should take and do not be afraid to stop a model mid session to ask for status and why it's taking so long.

Dan Woods @danveloper · Jun 1

I can't believe this works, but I got DeepSeek-V4-Flash (284B params) running on a Raspberry Pi 5 (8GB edition) at >1tok/s @ ~8W during full-tilt inference! It uses an untouched copy of @antirez's GGUF. Took 160+ experiments over 5 days between GPT-5.5 xhigh and Opus 4.8 max. https://t.co/RAJjNZg44Z

Alex Lieberman @businessbarista · Jun 1

Introducing 30 days of AI. For the next 30 weekdays, I’m going to share one observation per day from the frontlines of AI. I have the privilege of co-running an enterprise AI transformation firm, where I experience the edges of this technology, see the biggest challenges the biggest companies are facing, and have deep relationships with companies on the frontier (Anthropic, OpenAI, Lovable, Cursor, Perplexity, Vercel). I get to live in the future for free, and I want to bring that future to those trying to disrupt themselves before they get disrupted. There’s just two rules: 1) Each observation is actionable & understandable to the non-technical leader. 2) I can’t miss a day. Post 1 coming soon.

Andrew Ng @AndrewYNg · Jun 1

One of the new, buzzy jobs in Silicon Valley is the AI Forward Deployed Engineer (FDE), an engineer who is embedded within a client organization to help customize solutions, such as building and tuning agentic workflows that suit the client’s particular needs. I’ve heard from people who are wondering anew about the FDE career path since OpenAI and Anthropic started building new teams to place FDEs within client organizations. The rise of FDEs for AI workloads is one way AI is creating new jobs (and why the jobpolcalypse narrative of upcoming job market collapse is false -- there will be many AI and non-AI jobs). However, I believe there will be far more AI Engineer jobs than FDEs, as I explain below. The FDE role was pioneered about two decades ago by Palantir, which sent engineers to government locations to work on secure, air-gapped networks. In addition to having good technical skills, FDEs need communication skills and sometimes business skills. For example, they may need to speak with clients to understand their needs, formulate a strategy to prioritize projects, explain complex technology, and respectfully push back if a client asks for something unrealistic. They’re enjoying a resurgence because of the amount of work involved in taking an off-the-shelf LLM and building it into a custom agentic workflow that fits particular business needs. However, I believe the number of AI Engineer jobs will be far larger. A company might accept a few FDEs to be embedded within its organization. But most companies will want far more of their own employees working on their projects. While my organizations do hire FDEs, we hire far more AI Engineers! Also, a common client concern is that it is hard to find vendor-neutral FDEs — they are, after all, there to deeply integrate a particular vendor’s product into a company. 
In this moment when it’s hard to predict which AI service will be the best one in a year’s time, optionality (the ability to pick whatever vendor turns out to fit best in the future) is very valuable. In contrast, letting FDEs tightly bind a company’s processes significantly reduces optionality. Right now, I see surging demand for AI Engineers who can build software applications using AI software components (like LLM prompting, agentic frameworks, evals, etc.) and effectively use AI coding agents (like Claude Code, Codex, Antigravity CLI, and OpenCode). As the AI Engineer role matures, I expect it to fragment into more specialized roles, like the generic Software Engineer role from decades ago fragmented into frontend, backend, mobile, data engineering, devops, and so on. What will be the future, specialized AI engineering roles? I don’t know. Perhaps there will be AI FDEs, LLMOps Engineers, Evals Engineers, AI Data Engineers, Harness Engineers, and other roles we don’t have names for yet. But for now, I see a lot of AI engineers who are generalists create a lot of value. Skilled AI Engineers are in very high demand! As our field continues to mature over the coming decade, I look forward to new specializations within AI Engineering that create even more job opportunities. [Original text: The Batch newsletter]

Uncle Bob Martin @unclebobmartin · Jun 1

I start with very informal specifications written by hand. I have an agent convert these into harder specifications that are subdivided into tasks. I review these. Then I feed those tasks into the specifier agent, which converts each task to Gherkin, prunes the Gherkin, and then hands it off to the coder agent. I spot check the Gherkin. The coder agent writes acceptance tests directly from the Gherkin. Then writes unit tests. Then writes code. When all those tests pass, the coder agents hands off to the refactorer agent. The refactorer agent reduces crap to 6 or below, and reduces any duplication. Then it write property tests and gets them to pass. Then it hands off to the architect agent. The architect agent runs language mutation and covers any uncovered sections, and kills all survivors. Then it runs Gherkin mutation and kills any of those survivors. Then it runs the entire test suite, and when it passes it hands the result off to the specifier, coder, and refactorer. I spot check the code. This is an exercise of transformations from the informal to the formal through managed stages, with human interaction decreasing with each stage. Raw computer power is the limiting factor. Those mutation tests are CPU intensive.

hoeem @hooeem · Jun 1

Drop what you're doing. Microsoft just open sourced an executive strategy for self-evolving agent skills. At the end of May we got the repo that allowed us to not only see, but use SkillOpt that trains the procedure. What does that mean? SkillOpt makes the skill document itself the target. That allows a natural-language training-loop for skills that giga sends skills, in fact, in using this they were able to improve GPT and Qwen models in SearchQA, Sheet, Office, and DocVQA. It literally made the models significantly better and it clears the strongest baseline on every benchmark. Please read more: 👇

hooeem @hooeem

I want to create self-evolving agent skills.

Alex Wa @_djdumpling · Jun 1

Luke is one of the best people when it comes to RL infra, definitely worth reading!

whatthelukh @whatthelukh

New blog! Is frontier asynchronous RL solved? The blog covers Async RL theory and infrastructure, surveying 8 open-weight frontier labs for the algorithmic techniques and systems fixes to handle train-inference mismatch. Also answered: why do current methods still fail at high policy lag? Which methods scale with horizon and compute?

Gabe Pereyra @gabepereyra · Jun 1

This is spot on - when we started Harvey we got a ton of advice to solve a specific legal work flow as a wedge but felt very strongly we had to speed run to law firm / in-house enterprise platform as fast as possible

mvernal @mvernal

The Death of the Three-Act Playbook

Peter Steinberger 🦞 @steipete · Jun 1

I told codex to use https://t.co/oHS8ombQcW whenever I'm distracted and it needs my help to be unblocked, and ever once it a while I hear it talking to me, and it's the coolest thing ever. (e.g. for releases, that needs npm and is 1Password-gated)

0xSero @0xSero · Jun 1

Still stand by this, especially LFM-2.5 for all my GPU poors out there. Truly a phenomenal piece of work

0xSero @0xSero

Best models I’ve seen this week for your hardware: if you have 8-16gb you have a competitive model finally!
4gb - 8gb:
- minicpm5: this model was built for agentic tool use on tiny machines: https://t.co/LvNnIDSh7u
- tops benchmarks in weight class
- extremely small
- great for using in projects with AI
- blazing fast
8gb - 16gb
Most exciting model
- LFM-2.5-8B: https://t.co/5SYi6D56FR
Frontier for vram:
- 8b moe with
- 1.5B active
- trained on 38T tokens (MASSIVE)
- 131k context
96GB - 128GB
- ds4flash either q2 or reap + q4 https://t.co/BaphZfWrwG
- or https://t.co/EAZB4bDYjA
- very strong agent
- logical pleasant to talk to
- good in Hermes
- fast
- high contexts for little vram
196gb+
Step-3.7-Flash: https://t.co/oaVf5wMILx
- 199B with 11B active (FAST)
- vision support!
- its predecessor was topping benchmarks for 3 months
- 256k context
- 150 tok/s on 6000s

RobitOverload @10_X_eng · Jun 1

Guys I found another one. How in TF does this dude have less than 500 followers. He has robits. Thanks for sharing @JonathanRo7398

dthakur @dthakur

Motors and blades https://t.co/oAJ1SxxS28

noname @malikwas1f · Jun 2

RT @UnslothAI: We made a guide on using MCP with local LLMs. Connect Qwen3.6 and Gemma 4 for controlled access to tools, files, APIs, enab…

Aaron Levie @levie · Jun 2

As we enter the era of AI agents, one of the defining questions is how you develop competitive advantage when your competitor has access to the same AI models and intelligence as you. The companies that are able to best harness their internal institutional knowledge, existing data assets, and domain-specific workflows -- connected with AI -- will be those that are able to stay ahead in the future. Whether a company decides to build out the tech stacks themselves, or leverage a variety of best-in-class tools is certainly one core variable. But the key is to find the way that the enterprise can capture and protect the value created by their unique data, processes, and expertise over the long run. Each industry will have their own version of this, and the competitive advantage will vary by vertical. We’re increasingly seeing this at Box, where customers want to ensure that they can take advantage of their institutional knowledge and have the flexibility of bringing any AI model and intelligence to their data at any time. This is a pattern that will increasingly become a core principle of strategy in the future.

fleetingbits @fleetingbits

some thoughts on kirkland building its own harvey
1) kirkland is spending $500m over four years in order to build its own internal ai legal tools; kirkland intends to spend $100m this year
2) i suspect that kirkland is doing this because they have told themselves that they have valuable data and because they want to appear differentiated
3) i think the first issue is that kirkland probably does not have differentiated data from other elite law firms; at least, not at the level a harvey would absorb
4) all the elite firms probably have similar internal workflow data and so long as some of them defect, that is enough to commoditize the data kirkland wants to use for its platform
5) and, to the extent that they do have different internal workflows, harvey and legora will end up representing a better version of them and this will put kirkland at a disadvantage
6) moreover, companies like kirkland will have difficulty building their internal legal platforms because they do not have experience with software development
7) and, there are both cultural and structural issues with them managing software developers, like they cannot give non-lawyers equity in the firm due to regulation
8) so, i think firms like kirkland are better off using tools like harvey and legora and then looking to focus on where their value really is now: client relationships, local knowledge (litigation, regulation) and legal r&d (novel structures, etc...)
9) anyway, this seems to me like a phenomenon that ai creates across a lot of industries, where firms that were previously vertically integrated become unbundled due to ai because part of the intelligence gets moved to the labs or otherwise gets commoditized
10) and so, a new set of companies are created whose job it is in order to provide services complementary to the labs: forward deployed like harvey and legora and data providers like mercor, surge and handshake