AI Learning Digest

AI Learnings - December 30, 2025

Overview

Discussions spanning Claude Code & Workflows, AI Agents & Orchestration, and Models & Capabilities.

Claude Code & Workflows

  • @rahulgs: "yes things are changing fast, but also I see companies (even faang) way behind the frontier for no reason"

AI Agents & Orchestration

  • @lorden_eth: "For those who thought about building AI bots to trade on Polymarket"
  • @camsoft2000: "My global https://t.co/Hcy1DQ68nR file encourages the agent to work on self-improvement when it sees a common pattern or improvement it can make"
  • @mitchellh: "Slop drives me crazy and it feels like 95+% of bug reports, but man, AI code analysis is getting really good"

Models & Capabilities

  • @0xSero: "MiniMax-M2.1 running fully local in AWQ-4Bit with full context window (170 GB VRAM w full context)"
  • @DzambhalaHODL: "Once again, I am pounding the table on using Gemini to analyze your genes"

Other Highlights

  • @AlexReibman: "Simple trick to get Claude to run for 4-5 hours at a time"
  • @orphcorp: "claude, grant me the serenity to accept the things I cannot change, the courage to change the things I can, and the wisdom to know the difference"
  • @BrianRoemmele: "I am using Nash Equilibrium on the attention head of an LLM"

Key Takeaways

1. Claude Code continues to reshape how developers approach coding

2. Agent orchestration patterns are maturing with new tools and frameworks

---

Curated from 9 posts

Source Posts

Lorden @lorden_eth
For those who thought about building AI bots to trade on Polymarket: you should check the official materials on GitHub before doing anything. They've literally told you how to trade autonomously on Polymarket using AI agents. https://t.co/yb3jU29KNT https://t.co/8LqIlHjhqs
Steven Lubka ☀️ @DzambhalaHODL
Once again, I am pounding the table on using Gemini to analyze your genes. Get a basic Ancestry DNA test, opt into their privacy options, and once you get your results, log in and download your "raw DNA file". Ask Gemini to give you the identifiers to search for high-impact genes, then use it to understand your own data and suggest interventions for the ones with a detrimental impact. It's legitimately life-changing.
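The workflow above boils down to looking up specific variant IDs (rsIDs) in a raw DNA export. A minimal sketch of that lookup step, assuming a tab-separated Ancestry-style layout (rsid, chromosome, position, allele1, allele2) with `#` comment lines; the rsIDs and file contents below are illustrative placeholders, not medical guidance:

```python
import csv
import io

def load_genotypes(text):
    """Parse raw DNA export text into {rsid: genotype}."""
    genotypes = {}
    rows = csv.reader(
        (line for line in io.StringIO(text) if not line.startswith("#")),
        delimiter="\t",
    )
    for row in rows:
        if len(row) >= 5:
            rsid, _chrom, _pos, allele1, allele2 = row[:5]
            genotypes[rsid] = allele1 + allele2
    return genotypes

# Two made-up rows standing in for a real export file.
sample = """# AncestryDNA raw data (illustrative)
rs0000001\t1\t12345\tA\tG
rs0000002\t2\t67890\tC\tC
"""
print(load_genotypes(sample)["rs0000001"])  # AG
```

With a dict like this, the remaining work is exactly what the post describes: asking the model which rsIDs are worth checking, then reading off your genotypes for them.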
Mitchell Hashimoto @mitchellh
Slop drives me crazy and it feels like 95+% of bug reports, but man, AI code analysis is getting really good. There are users out there reporting bugs that don't know ANYTHING about our stack, but are great AI drivers and producing some high-quality issue reports.

This person (linked below) was experiencing Ghostty crashes and took it upon themselves to use AI to write a Python script that can decode our crash files, match them up with our dSYM files, and analyze the codebase to attempt to find the root cause, and extracted that into an Agent Skill. They then came into Discord, warned us they don't know Zig at all, don't know macOS dev at all, don't know terminals at all, and that they used AI, but that they thought critically about the issues, believed they were real, and asked if we'd accept them.

I took a look at one, was impressed, and said send them all. This fixed 4 real crashing cases that I was able to manually verify and write a fix for, from someone who -- on paper -- had no fucking clue what they were talking about. And yet, they drove an AI with expert skill.

I want to call out that in addition to driving AI with expert skill, they navigated the terrain with expert skill as well. They didn't just toss slop up on our repo. They came to Discord as a human, reached out as a human, and talked to other humans about what they've done. They were careful and thoughtful about the process.

People like this give me hope for what is possible. But it really, really depends on high-quality people like this. Most today -- to continue the analogy -- are unfortunately driving like a teenager who has only driven toy go-karts. Examples: https://t.co/n8xCcPYSjw
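The core of the reporter's script is symbolication: mapping raw crash addresses back to function names via debug symbols. A toy sketch of that step, assuming a pre-sorted symbol table with invented addresses and names (a real tool would extract these from the dSYM, e.g. via `atos`):

```python
import bisect

# Hypothetical (start_address, symbol_name) pairs, sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "terminal_read"),
    (0x1900, "render_frame"),
]

def symbolicate(addr):
    """Return 'symbol+offset' for the symbol whose range contains addr."""
    starts = [start for start, _ in SYMBOLS]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return "<unknown>"
    start, name = SYMBOLS[i]
    return f"{name}+0x{addr - start:x}"

print(symbolicate(0x1450))  # terminal_read+0x50
```

Once crash addresses resolve to symbols like this, an agent can cross-reference the named functions against the codebase to hunt for the root cause, which is roughly what the Agent Skill described above automates.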
rahul @rahulgs
yes things are changing fast, but also I see companies (even FAANG) way behind the frontier for no reason. You are guaranteed to lose if you fall behind. The no-unforced-errors AI leader playbook:

For your team:

  • use coding agents. Give all engineers their pick of harnesses, models, and background agents: Claude Code, Cursor, Devin, with closed/open models. Hearing Meta engineers are forced to use Llama 4. Opus 4.5 is the baseline now.
  • give your agents tools to ALL dev tooling: Linear, GitHub, Datadog, Sentry, any internal tooling. If agents are being held back because of lack of context, that's your fault.
  • invest in your codebase-specific agent docs. Stop saying "doesn't do X well". If that's an issue, try better prompting, https://t.co/SOjpn47yxo, linting, and code rules. Tell it how you want things. Every manual edit you make is an opportunity for https://t.co/S1ZvtYQwta improvement.
  • invest in robust background-agent infra: get a full development stack working on VMs/sandboxes. Yes, it's hard to set up, but it will be worth it; your engineers can run multiple agents in parallel. Code review will be the bottleneck soon.
  • figure out security issues. Stop being risk-averse and do what is needed to unblock access to tools.

In your product:

  • always use the latest-generation models in your features (move things off last-gen models ASAP, unless robust evals indicate otherwise). Requires changes every 1-2 weeks. E.g.: GitHub Copilot mobile still offers code review with GPT-4.1 and Sonnet 3.5 @jaredpalmer. You are leaving money on the table by being on Sonnet 4 or GPT-4o.
  • use embedding semantic search instead of fuzzy search. Any general embedding model will do better than Levenshtein / fuzzy heuristics.
  • leave no form unfilled. Use structured outputs and whatever context you have on the user to do a best-effort pre-fill.
  • allow unstructured inputs on all product surfaces -- must accept freeform text and documents. Forms are dead.
  • custom finetuning is dead. Stop wasting time on it. The frontier is moving too fast to invest 8 weeks into finetuning, and costs are dropping too quickly for price to matter. Better prompting will take you very far, and this will only become more true as instruction following improves.
  • build evals to make quick model-upgrade decisions. They don't need to be perfect, but they at least need to allow you to compare models relative to each other. Most decisions become clear on a Pareto cost vs. benchmark-performance plot.
  • encourage all engineers to build with AI: build primitives to call models from all codebases: structured output, semantic-similarity endpoints, sandboxed code execution, etc.

What else am I missing?
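The "embedding search beats Levenshtein" point above can be sketched as ranking documents by cosine similarity of vectors. In production this would call a learned embedding model; here, character-trigram counts stand in purely to keep the sketch self-contained and runnable:

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: character-trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs):
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = ["reset a user password", "export billing data", "rotate API keys"]
print(search("password reset", docs))  # reset a user password
```

Note that the query and its best match share no common prefix, so pure edit-distance heuristics would struggle; vector similarity handles the reordered words naturally, and a real embedding model additionally matches on meaning rather than shared characters.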
0xSero @0xSero
MiniMax-M2.1 running fully local in AWQ-4Bit with full context window (170 GB VRAM w full context):

  • 1000~ to 16,000~ tps prefill
  • 100~ tps generation speeds
  • Opencode

It's doing real work, updating my blog with little steering or specificity. The problem with local LLMs is that they require too much steering, which means babysitting I don't have the time to do. MiniMax cracked the cost, intelligence, and speed challenge; I would say this is a top-tier model. I run frontier models like Gemini and it just fails to call tools, in this year lol...

I think glm-4.?-air is still needed. We need a viable model at each hardware entry point; a Mac M1 Ultra 192GB? is relatively cheap at 5k, and being able to run this model at 40 tps is a huge societal unlock. Smaller models can be good, but size matters :p
orph @orphcorp
claude, grant me the serenity to accept the things I cannot change, the courage to change the things I can, and the wisdom to know the difference. do not make mistakes.
camsoft2000 @camsoft2000
My global https://t.co/Hcy1DQ68nR file encourages the agent to work on self-improvement when it sees a common pattern or an improvement it can make. I allow it to maintain its own section in the file, as well as dump ideas and improvements into a folder on my file system. That way I can just ask a new agent session to read those files and propose changes to the OSS that I maintain. While I'm still exploring this as an idea, I feel like giving agents persisted memory and the ability to change themselves should unlock a superpower.
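The "idea dump" half of this workflow can be sketched as gathering the notes an agent left in a scratch folder so a fresh session can review them. The folder layout and filenames here are assumptions for illustration, not the author's actual setup:

```python
import tempfile
from pathlib import Path

def collect_ideas(folder):
    """Concatenate markdown notes from the ideas folder, oldest first."""
    notes = sorted(Path(folder).glob("*.md"), key=lambda p: p.stat().st_mtime)
    return "\n\n".join(f"## {p.name}\n{p.read_text().strip()}" for p in notes)

# Demo with a throwaway directory standing in for the real ideas folder.
demo = Path(tempfile.mkdtemp())
(demo / "dedupe-helpers.md").write_text("Common pattern: extract shared retry logic.")
print(collect_ideas(demo))
```

A new agent session can then be pointed at the concatenated output (or the folder itself) and asked to turn the accumulated observations into concrete patches, which matches the review-then-propose loop described above.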
Brian Roemmele @BrianRoemmele
BOOM! It works on LLMs! I am using Nash Equilibrium on the attention head of an LLM! I may be the first to do this at this level. I am achieving a 50-70% effective size reduction on a quantization of 4-bit weights, shrinking the model and enabling on-device inference for smaller LLMs, e.g. 70B params! This allows for a nice LLM on high-end phones and low-end laptops. But my goal is an individual LLM module for each motor on robots, connected in a mesh-network nervous system. This would make reaction times and exactness superior to anything we have ever seen. I'll test it when I scrape up enough coffee money: https://t.co/ctXLWrs5Pj More soon!
Alex Reibman 🖇️ @AlexReibman
Simple trick to get Claude to run for 4-5 hours at a time: get it to play Saw. https://t.co/pdbjd8yytp