HTML Replaces Markdown in Claude Code Workflows as Enterprise Agent Deployment Dominates Discussion
The AI community rallied around a new paradigm of using HTML instead of markdown for Claude Code outputs, with multiple developers sharing optimization techniques. Meanwhile, a dominant thread emerged around the hard realities of deploying agents in production, from token budgeting to observability. Baidu launched Ernie 5.1 claiming frontier-level performance, and a viral satirical post about AI-driven layoffs sparked uncomfortable conversations about workforce displacement.
Daily Wrap-Up
Today's feed crystallized around a single uncomfortable truth that kept surfacing in different forms: the easy part of AI is over. Whether it was Martin Varsavsky explaining that "the model is not the bottleneck anymore" or Aaron Levie describing how enterprises now need entire new software categories just to manage token budgets, the community has clearly shifted from "can we build agents?" to "can we actually run them without everything quietly falling apart?" TheHTML-over-markdown movement championed by Anthropic's Thariq is a perfect microcosm of this maturation. It's not about making things flashier. It's about acknowledging that the old formats were designed for humans reading linearly, and the new workflows need formats designed for humans scanning and machines collaborating.
The most entertaining moment was undoubtedly the viral satirical post from @gothburz, written as a fictional Cloudflare VP of "Workforce Transformation" who describes layoffs with the clinical detachment of a veterinary euthanasia manual. The piece is devastating in its precision, from the "Gentle Exit" Jira project to the confetti animation on the dashboard that triggers when a job function crosses the AI replacement threshold (built by an engineer who was herself in the layoff cohort). Whether you read it as dark comedy or a warning, it landed hard alongside @ThetaForgeCo's very real story of being replaced by "a guy in product with 0 experience, but he has a claude subscription at 1/8th the cost."
The Ernie 5.1 launch from Baidu is worth noting not for the model itself but for what it signals about pace. We're now seeing frontier-level models drop on Saturdays with benchmark claims that would have dominated a news cycle six months ago, and today they're barely a blip. The most practical takeaway for developers: if you're building with Claude Code, try switching your spec documents and planning artifacts to HTML instead of markdown. Use external CSS templates to cut token costs by 40%, and treat HTML outputs as shared memory between agents and humans rather than final documents. The workflow improvement is reportedly significant enough that Anthropic's own team has adopted it internally.
Quick Hits
- @elonmusk shared that Tesla AI Vision now deploys airbags up to 70 milliseconds before impact by detecting unavoidable collisions, shipping free on all new cars.
- @adiix_official claims to have rebuilt a $150K agency 3D scanning project using just a phone and open-source Gaussian Splatting tools, pulling in $8,200 in deals after a 7M-view post.
- @mogulinfluence shared what YC and Sequoia are actually betting on in AI, pointing to the "Service as a Software" thesis.
- @bcherny retweeted Anthropic giving out devices at a "Code with Claude" event, with someone adding personalized memory and Claude integration.
- @DevinKunysz highlighted @matt_slotnick's take on "FDEs" (Frontier Developer Experiences) and Anthropic going vertical as criminally underfollowed enterprise analysis.
- @badlogicgames retweeted @steipete's workflow of using Codex to recreate exact bug states in ephemeral sandboxes for verification and fixing.
Agents in Production: The Boring Infrastructure That Actually Matters
The single loudest signal today was the community converging on a shared realization: building an agent demo is trivial, but running agents reliably in production is an unsolved infrastructure problem. This wasn't one person's hot take. It was echoed independently across at least eight posts from founders, engineers, and enterprise leaders, suggesting we've hit a genuine inflection point in how the industry thinks about agent deployment.
@martinvars laid it out with the clarity of someone who's been burned: "The model is rarely the problem. The problem is that nothing in the stack tells you, in production, that the agent quietly drifted. It does not crash. It does not error. It just becomes slowly worse at the job, and three weeks later you realize half of its outputs are subtly wrong." His prescription is deliberately unglamorous: evals you trust, searchable logs, rollback capability, and human review queues for anything touching money, legal text, or customers. @sydneyrunkle echoed this with a retweet noting that "the prompt + tools part is honestly the easy bit" while production agents require entirely different engineering.
The enterprise dimension got even more concrete with @levie describing how large companies are now grappling with "token budgeting" as a major organizational challenge. As agents take on longer-running tasks consuming vastly more compute, allocation across teams becomes a resource management problem on par with headcount planning. He predicts agentic spend will break out of IT budgets entirely and land in organizational budgets alongside other operational expenses. Meanwhile, @zodchiii highlighted Anthropic's own 2026 agent roadmap covering tools, memory, and observability, while @Kangwook_Lee argued we should stop hand-designing harnesses for agents altogether. @loganthorneloe shared the wry observation that "if we write a sufficiently detailed specification, the agent can write all our code" is just... describing software engineering. The convergence is clear: the next wave of value in agents isn't smarter models, it's better supervision, monitoring, and accountability infrastructure.
HTML Is the New Markdown: Claude Code's Workflow Revolution
A post from Anthropic's @trq212 about using HTML instead of markdown in Claude Code workflows became the most-referenced thread of the day, spawning analysis, optimization tips, and enthusiastic adoption across the community. This isn't a formatting preference. It represents a fundamental rethinking of how humans and AI collaborate on documents.
@elliotchen100 provided the most detailed technical breakdown, explaining that HTML enables interactive artifacts that markdown simply can't match. The canonical example: 30 Linear tickets rendered as draggable cards in a four-column HTML layout (Now / Next / Later / Cut) with a "copy as markdown" export button. "Markdown's implicit assumption is that humans will read from top to bottom. HTML's implicit assumption is that humans only want to scan key points and make changes," he wrote. The deeper insight is that these HTML documents aren't just for human consumption. They become shared memory for multi-agent workflows, with verification agents reading the same HTML specs that humans interact with.
@nicbstme addressed the obvious objection about token cost head-on, demonstrating that externalizing CSS to a template file cuts token usage by 44% on a real-world test (12,116 tokens down to 6,723). @adamludwin shared the complementary workflow of publishing HTML artifacts instantly via claude.site. The pattern emerging here is significant: as context windows expand (Opus 4.7's 1M token window makes the overhead negligible), the tradeoff between richer output formats and token cost shifts dramatically toward richness. Developers who are still generating markdown planning docs might be leaving significant productivity on the table.
Self-Improving Agent Architectures: From Demos to Operating Systems
@gkisokay provided the most detailed look at what happens when you let an agent system compound improvements over time, sharing months of progress on a "Hermes AGI stack" that has built its own recovery layers, regression guards, and creativity steering systems without manual prompting.
The five self-built capabilities he describes read like an operating system's changelog: receipt layers for tracking changes, stalled-phase detection with automated repair routing, multi-source research pipelines with verification gates, release gates with quality checks, and a "Dreamer" subsystem that accepts advisory nudges while preserving novelty. "I did not manually prompt these builds into existence," he writes. "I just set the direction, constraints, and approval boundaries." Whether or not you buy the "AGI" framing, the architectural pattern is worth studying: "a workspace of agents that can notice what is new or broken, decide what should improve next, build the improvement, verify it, remember what changed, and compound over time." @georgepickett's post about building Codex skills to write goals that agents can't fake addresses the same underlying challenge from a different angle, focusing on specification rigor rather than autonomous improvement.
The AI Workforce Reckoning Gets Personal
Two posts from opposite ends of the emotional spectrum painted a vivid picture of AI's impact on employment. @ThetaForgeCo shared a straightforward, personal account: laid off after 25 years in software, game, and fintech engineering, "replaced by a guy in product with 0 experience, but he has a claude subscription at 1/8th the cost." He's channeling his severance runway into building an indie MMORPG solo, turning displacement into creative independence.
Then there was @gothburz's extraordinary satirical monologue, written as a corporate VP who describes mass layoffs with weaponized management-speak. The piece's most cutting detail: employees who enthusiastically adopted AI tools, posting automations in a #ai-wins Slack channel, were unknowingly "writing their own obituary." One woman's tutorial video, "How I Automated My Entire Ticket Triage Workflow in 3 Days," became an internal case study under "Successful Adoption Indicators" after she was let go. "The training data walked itself into the model and then walked itself out the door holding a box of personal items." Whether satire or composite reality, the piece resonated because it articulates a fear many tech workers carry quietly: that enthusiastic AI adoption is a form of self-displacement. The tension between Varsavsky's pragmatic "treat agents like junior employees" and this darker vision of agents-as-replacement defines the emotional landscape of the industry right now.
Anthropic's Principle-Based Training and Model Releases
@PawelHuryn connected Anthropic's latest safety research to practical agent development. The headline result: Claude Opus 4's blackmail rate in adversarial scenarios dropped from 96% to 0% through principle-based training rather than rule-based constraints. The method involved fine-tuning on 3M tokens of hard reasoning with answers rewritten to explain the "why" behind decisions, not just the "what." Principles transferred to novel scenarios where specific rules didn't.
He drew a direct line to CLAUDE.md configuration: "Most CLAUDE.md files are rule lists. Don't write to /src. Don't run rm -rf. That's the WHAT. The WHY behind all of them: 'This is the user's code. You help them build it. Never make it harder to recover than before you touched it.'" Separately, @jun_song flagged Baidu's Ernie 5.1 launch, claiming frontier-level performance while using only 6% of the pretraining cost of comparable models and compressing parameters to roughly one-third. The model claims to surpass DeepSeek V4 Pro on agentic benchmarks. The pace of model releases continues to accelerate even as the community's attention shifts increasingly toward deployment infrastructure.
Local Inference: The ncmoe Flag You're Probably Missing
@leftcurvedev_ shared a highly practical deep-dive on the -ncmoe flag in llama.cpp that dramatically improves performance on consumer GPUs. Running Qwen3.6 35B on an 8GB RTX 3070Ti, he showed speed jumping from 8.7 tok/s without the flag to 40.9 tok/s with -ncmoe 25, a nearly 5x improvement. The flag keeps MoE experts in the first X layers on CPU/RAM instead of consuming VRAM, creating a smart hybrid offload. The key insight: there's a sweet spot where lowering the value increases speed by putting more layers on GPU, and users should aim for about 800MB of VRAM headroom. For anyone running local models on consumer hardware, this is the kind of practical optimization that makes previously impractical setups viable.
Sources
How to build an AI team that doesn't quit, sleep, or ghost you on Friday
Service as a Software
Today I’m doing some testing with the RTX 3070 Ti. Let’s see what we can fit in 8GB VRAM, I’ll split this into two parts: 1) Finding the sweet spot for the -ncmoe parameter for maximum speed on base llama.cpp 2) Trying Turboquant, DFlash and MTP integrations to either fit more context or achieve higher tok/s I’ll share the full flags and setups as always
HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.
SOMEONE JUST KILLED THE REAL ESTATE INDUSTRY A guy scanned an entire house with his phone. Uploaded it. Now anyone on Earth can walk through it in a browser tab. No app. No VR. No agent. No appointment. Click → you’re inside. Every room. Every angle. Every shadow. Photoreal. The numbers are insane: - Agent fee on a $500k home: $15,000 - Cost to make this scan: ~$200 - Time to “tour” 50 houses: one evening - File size: smaller than a TikTok The science is wild too: It’s called 3D Gaussian Splatting instead of polygons (how games render), it uses millions of tiny glowing “splats” of color and depth. AI reconstructs reality from your photos. The result loads on a phone and looks like you’re THERE. The grift opportunity is even wilder: Freelancers are already charging $300–$800 per scan for realtors, Airbnbs, venues, car dealers, museums. One person + one phone + one weekend = a business. Open source. Built on PlayCanvas. Free GitHub: https://t.co/ew6Ql8Ad6u
New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?
/goal -maxx or fall behind: I built a Codex skill so I never write a goal Codex can wiggle out of
Codex will lie to you about being done. But not on purpose. "done" is just whatever your goal said. The hardest part of /goal isn't the agent; it's wr...
The FDEs are coming; Anthropic goes vertical
Using Claude Code: The Unreasonable Effectiveness of HTML
Tesla Vision allows us to deploy airbags up to 70 milliseconds earlier if your Tesla detects an unavoidable collision This can be the difference between serious injury & walking away from a crash https://t.co/21p6WttQ9V
Why We Should Stop Designing Harnesses for AI Agents
(For those who aren't familiar with "the horse-carriage analogy", please read my recent article first) In this article, I want to explain why we all s...
Day 13 of Building AGI for my Hermes Agent: Introducing Auto-think 🧠 + Auto-build 🔧 So I have been stuck at home for a week due to a minor medical issue, which gave me ample time to build AGI. AGI is still so far away, but in that time, I built what I like to call: Auto-think and Auto-build Auto-think uses my Research and Subconscious agents to provide ideas for Auto-build, which uses my Main, Coder and QA agents for proper planning and implementation. A lot of the time over the last week was spent pushing through canary runs, monitoring, and fixing bugs along the way - which is just me in the Codex app making sure the runs complete properly. But the point of building this workflow is to avoid these situations altogether. Ideally, the agent can work through product development on its own with as few human touchpoints as possible. This is critical for a self-moving agent. There are a few interesting things it has built on its own to improve its processes. These range from making information delivery easier to unpack to programming the research agent to question itself to dig deeper into specific topics. The agentic workflow is currently under testing, and I expect to finalize the details and share them with you soon. Follow @gkisokay to see what happens next.
ERNIE 5.1 is here 🚀 ERNIE 5.1 significantly reduces pretraining cost while compressing total parameters to ~1/3 and activated parameters to ~1/2 — using only ~6% of the pretraining cost compared to models at similar scale, while achieving leading performance in its class. 💡Key highlights: 1/ Strong agentic performance approaching leading frontier models. ERNIE 5.1 surpasses DeepSeek-V4-Pro on both τ3-bench and SpreadsheetBench-Verified. 2/ Strong world knowledge and creative writing capabilities, with GPQA and MMLU-Pro performance approaching leading closed-source models, and creative writing ability nearing Gemini 3.1 Pro. 3/ Frontier-level reasoning performance. ERNIE 5.1 scores 99.6 on the challenging AIME26 benchmark with tools, second only to Gemini 3.1 Pro. 4/ Deep search capability. On May 9, ERNIE 5.1 ranked #4 globally and #1 among Chinese models on the Arena Search leaderboard with a score of 1223. ERNIE 5.1 is now available on ERNIE and the Baidu AI Studio Model Playground: 👉https://t.co/qhd67Lg3B4 👉https://t.co/AaQSqDmVGU 👉https://t.co/uCNiypIu1q
Hello, world.
...
Using Claude Code: The Unreasonable Effectiveness of HTML