Multi-LLM Adversarial Prompting as a Strategy for Better Plans
A quiet day in the AI feed surfaces one notable prompting technique: using competing LLMs against each other to synthesize superior outputs. The approach highlights a maturing understanding of how to extract more value from the models we already have.
Daily Wrap-Up
Some days the feed is a firehose and some days it's a slow drip. December 20th was firmly in the latter camp, landing right in that pre-holiday lull where most of the tech world is winding down, closing out sprints, and debating whether to deploy on a Friday (don't). But even a single post can be worth examining when it touches on something fundamental about how we work with these tools.
The one post that did surface is about a prompting pattern that deserves more attention than it typically gets: adversarial multi-model synthesis. The idea is simple in concept but surprisingly effective in practice. You take the same task, hand it to three competing LLMs, then feed all three plans back to one of them with explicit instructions to be intellectually honest about what the others did better. It's a technique that acknowledges something many practitioners have learned the hard way: no single model consistently produces the best output across all dimensions of a problem. Sometimes Claude nails the architecture but misses an edge case that GPT catches, or Gemini structures the rollout plan better than either. Using them as adversarial reviewers of each other forces a kind of intellectual honesty that's hard to get from a single model iterating on its own work.
The most practical takeaway for developers: if you're using LLMs for planning or architecture work, stop asking one model to iterate on its own output. Instead, run the same prompt across two or three models, then ask your preferred model to synthesize the best elements. It costs a few extra API calls, but the quality improvement on complex planning tasks is significant. Think of it as code review, but for AI-generated plans.
Quick Hits
- @doodlestein shares a go-to prompting technique: run the same task across competing LLMs, then ask one to honestly evaluate what the others did better and produce a hybrid plan that integrates the best ideas from all of them. A solid pattern for anyone doing serious planning work with AI. (link)
Prompting & RAG
The art of prompting has evolved well past "write a better system message" into genuinely interesting territory around multi-model orchestration. @doodlestein describes what they call one of their "most used and most useful prompts," and it's a pattern worth internalizing:
> "I asked 3 competing LLMs to do the exact same thing and they came up with pretty different plans which you can read below. I want you to REALLY carefully analyze their plans with an open mind and be intellectually honest about what they did that's better than your plan." — @doodlestein
The key insight here isn't just "use multiple models." It's the specific framing of intellectual honesty. LLMs have a well-documented tendency to defend their own outputs when asked to compare or revise. By explicitly instructing the model to approach the competing plans "with an open mind" and be "intellectually honest about what they did that's better," you're fighting against that default behavior. The prompt also asks for git-diff-style changes rather than a full rewrite, which is a smart structural choice. It forces the model to be precise about what's changing and why, rather than producing a vaguely improved wall of text that may have quietly dropped good ideas from the original.
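The two ingredients above, the honesty framing and the diff-style output request, can be packaged as a reusable template. The wording below paraphrases the quoted prompt rather than reproducing it, and the function and field names are illustrative, not from the original post:

```python
# Template for the synthesis step: explicit intellectual-honesty framing
# plus a request for git-diff-style edits instead of a full rewrite.
# Wording paraphrases the quoted prompt; all names here are illustrative.

SYNTHESIS_TEMPLATE = """\
I asked {n} competing LLMs to do the exact same thing, and they came up
with different plans, which you can read below.

{competing_plans}

REALLY carefully analyze their plans with an open mind and be
intellectually honest about what they did that's better than your plan.
Then, instead of rewriting your plan from scratch, output your revisions
as git-diff-style changes against your original plan, so it is precise
about what is changing and why, and no good ideas are quietly dropped.
"""

def build_synthesis_prompt(plans: dict[str, str]) -> str:
    """Assemble the synthesis prompt from named competing plans."""
    body = "\n\n".join(
        f"=== Plan from {name} ===\n{text}" for name, text in plans.items()
    )
    return SYNTHESIS_TEMPLATE.format(n=len(plans), competing_plans=body)
```

Keeping the template separate from the orchestration code makes it easy to A/B the framing itself, which is where most of the technique's leverage lives.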
This pattern works because it exploits a genuine strength of LLMs: synthesis and comparison. Models are often better at evaluating and combining existing ideas than they are at generating optimal solutions from scratch. By giving the model three concrete plans to work with instead of asking it to brainstorm in a vacuum, you're providing the kind of structured input that leads to better structured output. It's the same principle behind why ensemble methods work in traditional ML. No single model is best at everything, but a thoughtful combination frequently outperforms any individual. The prompting world is slowly rediscovering this, one technique at a time.