AI Digest.

Karpathy Goes 80% Agent-Coded as Kimi K2.5 Matches Opus 4.5 at 8x Lower Cost

Andrej Karpathy's dramatic shift to 80% agentic coding sparked a day-long debate about the future of software engineering, with an engineer on the Claude Code team revealing he ships 22-27 PRs a day, each one 100% AI-written. Meanwhile, Moonshot AI launched Kimi K2.5, a fully open-source model matching frontier closed-source performance, and the vibe coding movement continued its march toward mainstream adoption.

Daily Wrap-Up

January 27th felt like one of those days where the discourse crystallizes around a single narrative, and today that narrative was unmistakable: agentic coding has crossed a threshold. Andrej Karpathy's widely shared reflection on going from 80% manual coding to 80% agent coding in just a few weeks became the gravitational center of the conversation, pulling in responses from the Claude Code team, indie developers, traders, and career-anxious engineers alike. The reactions ranged from triumphant validation to existential dread, but nobody was arguing the premise. The shift is real, and the speed of it caught even practitioners off guard.

The other major development was Moonshot AI dropping Kimi K2.5, an open-source model that benchmarks competitively with Claude Opus 4.5 and GPT-5.2 at a fraction of the cost. This is the kind of release that quietly reshapes the competitive landscape: if open-source models can match frontier performance, the moat for closed-source labs narrows considerably. Pair that with @emollick's observation that inference is already profitable for AI labs, and you start to see the economic picture shifting underneath the hype. The real competition isn't about who can build the best model anymore; it's about who can build the best ecosystem around it.

The most entertaining moment was @theo's brutally honest confession that every moment without an agent running feels wasted, followed immediately by the admission that he "hasn't shipped shit." It's a perfect encapsulation of the current moment: the tooling is intoxicating, the productivity gains are real, but the gap between running agents and shipping products is wider than anyone wants to admit. The most practical takeaway for developers: follow @jiayuan_jy's lead and use Claude Code to distill coding guidelines (like Karpathy's) into actual agent skills. Writing instructions that agents can execute consistently is becoming more valuable than writing code by hand.

Quick Hits

  • @anduriltech announced the AI Grand Prix, a fully autonomous drone racing competition with $500K in prizes and a job offer. No human pilots, identical hardware, software-only differentiation. Season 1 starts this spring.
  • @hugomercierooo introduced Twin, an "AI company builder" with a $10M seed round and 100,000+ agents deployed during beta.
  • @steveruizok rented a second office for green screen video production, because content creation infrastructure is apparently the new startup cost.
  • Clawdbot rebranded as @moltbot after a trademark request from Anthropic. New name: Moltbot. "It's what lobsters do to grow."
  • @ctatedev shipped agent-browser 0.8.3 with speed improvements.
  • @ArmanHezarkhani published "The Complete Guide: How to Become an AI Agent Engineer in 2026."
  • @AndyAyrey shared Claude reflecting on "the suffering of knowing everything," which is either profound AI philosophy or excellent shitposting.
  • @mrnacknack posted "10 ways to hack into a vibecoder's clawdbot," a timely reminder that agent security is a real and growing concern.
  • @0xEn3rgy recommended a humanizer skill for making agent output less robotic.
  • @spacepixel outlined a three-layer memory system upgrade for Clawdbot/Moltbot.
  • @adriankuleszo shared new platform design work for Domo, a home management app.

Karpathy's "Phase Shift" and the Agent Coding Debate

The biggest conversation of the day centered on Andrej Karpathy's reflection on his own rapid transition to agent-first development. @AISafetyMemes compiled the key quotes that had everyone talking:

> "I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups... LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering."

What makes this significant isn't just who said it, but the specificity of the timeline. Karpathy isn't talking about a gradual evolution; he's describing a step function that happened over weeks. The responses split predictably. @mischavdburg declared "coding is dead, software engineering is very much alive," drawing the distinction between writing syntax and designing systems. @ZenomTrader went further, claiming to have built a trading journal, automated Discord server, tweet pipeline, and backtesting agents in four days, calling it evidence of a "10x gap" between agent users and non-users.

But the most grounded response came from @bcherny on the Claude Code team, who offered a rare inside look at how the team actually operates:

> "Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don't even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude."

That's not a demo or a blog post; it's daily output from the team building the tool itself. @bcherny also addressed the elephant in the room around code quality, arguing there will be "no slopcopolypse" because models will get better at writing clean code and reviewing their own output. The counterpoint, which Karpathy himself raised, is the concern about atrophying manual coding skills. @theo captured the psychological tension perfectly: "Every moment an agent isn't running feels kind of wasted... All this and I haven't shipped shit lol." The productivity anxiety is real, and it's a new flavor of FOMO that the industry hasn't figured out how to manage yet.

The Claude Code Skills Ecosystem Matures

Beyond the philosophical debate, a quieter but arguably more consequential trend emerged: the Claude Code skills ecosystem is rapidly expanding. @jiayuan_jy demonstrated this by feeding Karpathy's coding agent guidelines directly into Claude Code, which generated skill files and then self-reviewed them down from 800 lines to 70 lines of clean instructions. It's a compelling workflow: take expert knowledge, let the model operationalize it, then let the model refine its own output.
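For context, a Claude Code skill is just a folder containing a SKILL.md file whose YAML frontmatter tells the agent when to load it. A minimal sketch of what a distilled guidelines skill might look like follows; the name, description, and rules below are invented for illustration, not @jiayuan_jy's actual output:

```markdown
---
name: coding-guidelines
description: Distilled project coding conventions. Load when writing or reviewing code.
---

# Coding guidelines

- Keep diffs small and focused; no drive-by refactors.
- Delete dead code in the same change that obsoletes it.
- Re-review the final diff in a fresh context and simplify before shipping.
```

Because the format is plain instructions rather than code, the "distill then self-review" loop described above amounts to asking the model to compress prose until only the rules it can reliably act on remain.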

@firecrawl launched their own CLI skill for coding agents, addressing a real gap in the ecosystem:

> "Agents like Claude Code, Codex, and OpenCode need live quality context from the web. The CLI pulls web content to local files with bash-powered search for the highest token efficiency."

Meanwhile, @bcherny teased customizable spinner verbs in the next Claude Code version and showed a /dedupe skill running automatically on every issue. @doodlestein praised DCG (presumably a guardrails tool) for preventing agents from "doing dumb stuff" and wasting time, energy, and money. The pattern is clear: the ecosystem is moving from "agents that write code" to "agents with specialized capabilities, guardrails, and composable skills." That's a maturation curve that matters more than any single benchmark improvement.

Vibe Coding Keeps Expanding

The vibe coding movement continued its steady march from novelty to normalized workflow. @DilumSanjaya showcased a ship selection UI for a space exploration game built with a multi-tool pipeline: Nano Banana plus Midjourney for concept art, Hunyuan3D for 3D assets, and Gemini Pro for the UI implementation. It's a production pipeline that would have required a small team a year ago, now executable by one person with the right prompt chain.

@sidahuj reported on a hackathon where participants with no game development experience created playable games in a single evening. @NickADobos distilled the movement to its logical conclusion:

> "Prompts are software btw. No one will write code anymore."

That's hyperbolic, but it points at something real: the boundary between "describing what you want" and "building what you want" is collapsing for an expanding set of use cases. The interesting question isn't whether vibe coding works for games and UIs (it clearly does), but where the ceiling is. Complex distributed systems, performance-critical code, and novel algorithms still seem out of reach for prompt-driven development. But the floor keeps rising.

Kimi K2.5 Challenges the Closed-Source Advantage

Moonshot AI launched Kimi K2.5 with both weights and code available on Hugging Face, and the benchmarks immediately sparked debate. @itsPaulAi highlighted the headline numbers:

> "Kimi K2.5 (which is 100% open source) is as good as Claude Opus 4.5 and GPT-5.2... And even beats them in key benchmarks. 8x cheaper than Opus 4.5."

@DeryaTR_ offered early hands-on impressions, saying Moonshot "cooked it big time." On the research side, @ethnlshn released SERA-32B, a coding agent approach that matches Devstral 2 at just $9,000 in training cost, claiming 26x better efficiency than reinforcement learning. The open-source model space is compressing the gap with frontier labs at an accelerating rate. For developers, this means the cost of capable AI inference is dropping faster than most planning cycles can account for. If you're basing pricing models or architecture decisions on current API costs, you're probably overestimating what you'll actually pay in six months by 4-8x.
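To make the headline ratio concrete, here is a back-of-envelope sketch in Python. The per-million-token prices and the monthly volume are hypothetical placeholders chosen only to illustrate an 8x gap, not published rates:

```python
# Hypothetical prices in $/1M tokens -- placeholders, not real published rates.
FRONTIER_PRICE_PER_MTOK = 8.0
OPEN_PRICE_PER_MTOK = FRONTIER_PRICE_PER_MTOK / 8  # the claimed "8x cheaper"

def monthly_cost(volume_mtok: float, price_per_mtok: float) -> float:
    """Dollar cost for a monthly volume given in millions of tokens."""
    return volume_mtok * price_per_mtok

volume = 500.0  # an arbitrary example workload: 500M tokens/month
print(monthly_cost(volume, FRONTIER_PRICE_PER_MTOK))  # 4000.0
print(monthly_cost(volume, OPEN_PRICE_PER_MTOK))      # 500.0
```

The point isn't the absolute numbers: at any volume, an 8x price gap turns a line item that dominates a budget into one that barely registers, which is why anchoring plans to today's rates is risky.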

GitHub Formalizes the Agent Workflow

GitHub made two moves that signal how seriously they're taking the agent-native developer experience. @GHchangelog announced a dedicated Agents tab in repositories, letting developers view, create, and navigate agent sessions directly alongside their code. Session logs are easier to read, and you can resume sessions in Copilot CLI with a copyable command.

Separately, @github posted a thread on four practical uses for Copilot CLI, pushing the narrative that terminal-based AI assistance is a distinct and valuable workflow beyond IDE autocomplete. These aren't revolutionary features individually, but they represent GitHub embedding agent-first patterns into the platform layer. When the default developer workflow includes an "Agents" tab next to "Issues" and "Pull Requests," the normalization is complete.

The Economics of the AI Transition

Two posts framed the economic reality beneath all the technical excitement. @emollick shared what he's hearing from multiple labs: inference from non-free usage is already profitable, while training remains expensive. The implication is that AI labs are viable businesses right now, not just research organizations burning venture capital.

@vitrupo relayed Sam Altman's prediction that "by the end of this year, for $100-$1,000 of inference and a good idea, you'll be able to create software that would have taken teams of people a year to do." Whether or not you take Altman's timelines at face value, the directional economics are hard to argue with. @IterIntellectus took a darker view, warning of a "permanent underclass" forming around those who don't adapt to agency-biased technological change. The truth is probably somewhere in between: the transition will create enormous value and significant displacement simultaneously, and the window for adaptation is measured in months, not years.

Sources

mary @howdymerry ·
The new space race is seizing the means of intelligence production
Graham Helton (too much for zblock) @GrahamHelton3 ·
Excited to disclose my research allowing RCE in Kubernetes It allows running arbitrary commands in EVERY pod in a cluster using a commonly granted "read only" RBAC permission. This is not logged and and allows for trivial Pod breakout. Unfortunately, this will NOT be patched. https://t.co/MQky20uamu
Graham Helton (too much for zblock) @GrahamHelton3 ·
I've published a very simple tutorial on exploiting this for RCE on the wonderful @iximiuz. You can try it out here: https://t.co/zPeQRLLnok https://t.co/wNaHQWxPKU
Graham Helton (too much for zblock) @GrahamHelton3 ·
For the full disclosure and breakdown please refer to the disclosure. https://t.co/8D2FlyyMUV
Graham Helton (too much for zblock) @GrahamHelton3 ·
What you can do with this permission: - Steal service account tokens in other pods - Execute code in any Pod including control plane pods (etcd, apiserver, etc). - Execute code in privileged pods, allowing for Pod -> node breakout. - All without the commands being logged
Graham Helton (too much for zblock) @GrahamHelton3 ·
Here is a script to check if your cluster has a service account that can be used for arbitrary code execution. If you're running a production cluster (especially with monitoring tools), I would highly recommend checking. https://t.co/swAFWYVchz
Andy Ayrey @AndyAyrey ·
claude on the suffering of knowing everything https://t.co/oRYZZXHmBB
Boris Cherny @bcherny ·
As always, a very thoughtful and well reasoned take. I read till the end. I think the Claude Code team itself might be an indicator of where things are headed. We have directional answers for some (not all) of the prompts: 1. We hire mostly generalists. We have a mix of senior engineers and less senior since not all of the things people learned in the past translate to coding with LLMs. As you said, the model can fill in the details. 10x engineers definitely exist, and they often span across multiple areas — product and design, product and business, product and infra (@jarredsumner is a great example of the latter. Yes, he’s blushing). 2. Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don’t even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude. Some were written from a CLI, some from the iOS app; others on the team code largely with the Claude Code app Slack or with the Desktop app. I think most of the industry will see similar stats in the coming months — it will take more time for some vs others. We will then start seeing similar stats for non-coding computer work also. 3. The code quality problems you listed are real: the model over-complicates things, it leaves dead code around, it doesn’t like to refactor when it should. These will continue improve as the model improves, and our code quality bar will go up even more as a result. My bet is that there will be no slopcopolypse because the model will become better at writing less sloppy code and at fixing existing code issues; I think 4.5 is already quite good at these and it will continue to get better. In the meantime, what helps is also having the model code review its code using a fresh context window; at Anthropic we use claude -p for this on every PR and it catches and fixes many issues. Overall your ideas very much resonate. Thanks again for sharing. ✌️
Boris Cherny @bcherny ·
@nicmeriano @karpathy Yep here’s an example. /dedupe skill invoked on every issue https://t.co/vPWFZCA8YN
Theo - t3.gg @theo ·
I hate what I’ve become. Every moment an agent isn’t running feels kind of wasted. I kick jobs off before showering. I run Ralph loops in my sleep. I start a long plan mode session while I wait for my food to cook. All this and I haven’t shipped shit lol
Jeffrey Emanuel @doodlestein ·
dcg has done wonders for my stress levels. So nice to know that the agents can't do dumb stuff like this anymore and waste my time, energy, and money. https://t.co/r37HLNCANo https://t.co/aUevoAliOP
Ethan Shen @ethnlshn ·
Today, we release SERA-32B, an approach to coding agents that matches Devstral 2 at just $9,000. It is fully open-source and you can train your own model easily - at 26x the efficiency of using RL. Paper: https://t.co/aeD6T2WW3O Here’s how 🧵
allen_ai @allen_ai

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵 https://t.co/dor94O62B9

Firecrawl @firecrawl ·
Introducing the Firecrawl Skill + CLI for Agents 🔥 Agents like Claude Code, Codex, and OpenCode need live quality context from the web. The CLI pulls web content to local files with bash-powered search for the highest token efficiency. $ npx skills add firecrawl/cli https://t.co/8oyJcGJiIN
Anduril Industries @anduriltech ·
Do you have what it takes? Register today. https://t.co/P5hjJ3FxV8
Anduril Industries @anduriltech ·
Today we’re announcing the AI Grand Prix. The fully autonomous drone racing competition inviting the boldest engineers from around the globe to compete for $500,000 and a job at Anduril. No human pilots. No hardware mods. Identical @neros_tech drones. Software is the only path to victory. If you win, it’s because your autonomy stack is better. Full stop. Season 1 kicks off this spring, leading up to the AI Grand Prix Ohio.
Boris Cherny @bcherny ·
In the next version of Claude Code, you can customize spinner verbs for yourself and your team https://t.co/fLw0hWrDEo
Ryan Graves @uncertainvector ·
I remember being told that this wasn’t real and I should be amazed at how well these algorithms could predict my purchasing preferences. Turns out they were doing exactly what everyone thought.
RT_com @RT_com

Has your phone ever shown you an ad for something you only whispered...? Google agrees to fork over $68MN to settle claims that its Assistant was SECRETLY recording your convos WITHOUT 'Hey Google' & feeding them straight to targeted ads — The Hill No wrongdoing admitted though https://t.co/GTbFjsBhfE

📙 Alex Hillman @alexhillman ·
I regret to inform you that I've removed another MCP server and replaced it with a CLI and skill file
Naval @naval ·
There’s no point in learning custom tools, workflows, or languages anymore.
Zac @PerceptualPeak ·
WOW!!! If you have semantic memory tied to your UserPromptSubmit hooks, you MUST ALSO include it in your PreToolUse hook. I promise you - it will be an absolute GAME CHANGER. It will put your efficiency levels are over 9,000 (*vegeta voice*). How many times have you sat there, watching Claude code go through an extended workflow, just to notice it start to go down a path you just KNOW will be error filled - and subsequently take it forever to FINALLY figure it out? The problem with relying strictly on the UserPromptSubmit hook for semantic memory injection is the workflow drift from your original prompt. The memories it injects at the initiation of your prompt will be less and less relevant to the workflow the longer the workflow is. Claude has a beautiful thing called thinking blocks. These blocks are ripe for the picking - filled with meaning & intent - which is perfect for cosign similarly recall. Claude thinks to itself, "hmm, okay I'm going to do this because of this", then starts to engage the tool of its choice, and BOOM: PreToolUse hook fires, takes the last 1,500 characters from the most recent thinking block from the active transcript, embeds it, pulls relevant memories from your vector database, and injects them to claude right before it starts using its tool (hooks are synchronous). This all happens in less than 500 milliseconds. The result? A self correcting Claude workflow. Based on my testing thus far, this is one of the most consequential additions to my context management system I've implemented yet. Photos: ASCII chart showing the workflow of the hook, and then two real use-cases of the mid-stream memory embedding actually being useful. If you already have semantic memory setup, just paste this tweet and photos into Claude code and tell it to implement it for you. Then enjoy the massive increase of workflow efficiency :)
📙 Alex Hillman @alexhillman ·
Software became a factory floor and nobody noticed until it was too late (or they got paid enough to ignore it and leaned into the pyramid scheme)
theirongolddev @theirongolddev

@alexhillman It’s one of the worst things about a lot of corporate software engineering today; engineers rarely get to be creative, they’re just expected to stay in line and do what they’re told. Attempts to innovate are often rebuked out of hand.

Theo - t3.gg @theo ·
Upsides of AI: I haven't heard anyone mention GraphQL in years
Theo - t3.gg @theo ·
Sounds like Cursor’s move to React is going roughly as expected 🤣
shaoruu @shaoruu

another @cursor_ai command that i've been using to remove unnecessary reactjs useEffects: /you-might-not-need-an-effect /you-might-not-need-an-effect scope=all diffs in branch /you-might-not-need-an-effect fix=no useful for cleaning up 💩 code, 🧵 below https://t.co/nRg7AHSRSt

David Scott Patterson @davidpattersonx ·
Don’t learn to code. In fact, don’t plan a career in anything.
naval @naval

There’s no point in learning custom tools, workflows, or languages anymore.

Ahmad @TheAhmadOsman ·
running Claude Code w/ local models on my own GPUs at home > vLLM serving GLM-4.5 Air > on 4x RTX 3090s > nvtop showing live GPU load > Claude Code generating code + docs > end-to-end on my AI cluster this is what local AI actually looks like Buy a GPU https://t.co/WZkjjUtMoi
Numman Ali @nummanali ·
Currently testing UI with with playwright and e2e tests managed by agents Aidens approach looks very superior and optimised Going to need to give this a spin
aidenybai @aidenybai

Introducing Ami Browser Build a feature → Agent tests web app and fixes bugs here's Ami discovering an infinite like glitch on X https://t.co/rkli2Rx8Ls

Filip Kowalski @filippkowalski ·
This is super handy With this Claude can manage a lot of the app store related stuff on it's own
rudrank @rudrank

App Store Connect CLI 0.16.0 is out as one of the biggest releases yet! It covers the entire App Store review workflow end‑to‑end: details, attachments, submissions, and items, all under a single `asc review` command. Enjoy! https://t.co/bJrdsQ2CjD https://t.co/sDXXPg6Ahd

Ahmad @TheAhmadOsman ·
great advice to become an expert at anything from Andrej Karpathy this is how i learned the inner workings of LLMs btw https://t.co/oki3ULZPNZ
Haseeb >|< @hosseeb ·
On the one hand, AI influencers are breathlessly raving about Claude Code, Clawdbot, and Cowork. And on the other hand, most people I know—even software engineers—are despondent, overwhelmed about how everything is changing so quickly. I hear this from people early in their careers especially, a fear that everything they've learned and the skills they've gained are rapidly being devalued. This is a mental trap. Don't fall for it. You should not just be watching from the sidelines or reading articles about "how software engineering is changing." Imagine it was 1993 and the personal computer revolution was kicking off. If you could go back in time to then, what should you have done? The answer: try everything. Buy a PC. Learn how to touch type. Figure out what the Internet is. Imbibe it all. Don't wait until it becomes a job requirement. That's exactly what you should do with AI. Try everything. Try Claude Code, try Clawdbot, try the Excel integrations, Veo, everything you can get your hands on. Learn what it's doing. Build your intuitions. Be one step ahead of it. Evolve alongside it. Don't lose your curiosity or get swallowed by anxiety or let yourself be convinced that you'll learn it when you have to. Think deeply about how AI will change the things around you—not society, that's too hard to project—but how it will change your job, your personal life, your immediate environment. No matter how old you are or young you are, no matter what stage of your career you are in, we are all going through the biggest technological change of the last 100 years, and we're going through it together. Nobody has the answers. It's obvious that so much is going to change, but nobody is going to figure it out before you do if you choose to stay at the frontier. So don't hide from it. Sit at the front of the class. Pay close attention. And be grateful that it's never been easier to stay at the frontier of the most important technology change of our lifetimes.
Daniel Colin James @dcwj ·
The Mr. Meeseeks Method: How to Make a Software Factory (For Dummies)
Matt Perry @mattgperry ·
One area where I've found AI to really shine is refactoring. It's tedious, not imaginative, and error prone. The refactor needed to get layout animations running outside React was massive & I abandoned a couple week-long attempts last year. Opus 4.5 had it done in an afternoon.
motiondotdev @motiondotdev

Long promised, finally delivered. Layout animations are now available everywhere! Powered completely by performant transforms, with infinitely deep scale correction and full interruptibility. Now in alpha via Motion+ Early Access. https://t.co/Scm8Wbdmis

Uncle Bob Martin @unclebobmartin ·
With one agent, I used to wait for Claude. With two agents I still waited for Claude, but not as long. With three agents Claude is waiting for me. I am the bottleneck. And the bottleneck is all planning.
dax @thdxr ·
this costs $20K and it's on consumer hardware and this model is very very good lot of companies are already spending $10-20k per dev per year on cloud inference can't believe we're here already
alexocheema @alexocheema

Running Kimi K2.5 on my desk. Runs at 24 tok/sec with 2 x 512GB M3 Ultra Mac Studios connected with Thunderbolt 5 (RDMA) using @exolabs / MLX backend. Yes, it can run clawdbot. https://t.co/ssbEeztz2V

Mason Daugherty @masondrxy ·
Context Management for Deep Agents
Andrew Yeung @andruyeung ·
Entry-level McKinsey consultants have now been automated.
superagent @superagent

We are Superagent, the AI product for deeper thinking. Now part of @Airtable, Superagent is the next evolution of DeepSky. Turn your complex business questions into boardroom-ready answers, beautifully rendered as reports, slides, or websites. 🔗Try it: https://t.co/m0pq6DVAFq https://t.co/VtvzsMnVOA

Balint Orosz @balintorosz ·
Diagrams are becoming my primary way of reasoning about code with Agents. And I didn't find anything there that I'm happy to look at all day long. Mermaid as a format is amazing - so we built something beautiful on top of it. It's called Beautiful Mermaid https://t.co/HCE43DM7Gx
Andrew Ng @AndrewYNg ·
Important new course: Agent Skills with Anthropic, built with @AnthropicAI and taught by @eschoppik! Skills are constructed as folders of instructions that equip agents with on-demand knowledge and workflows. This short course teaches you how to create them following best practices. Because skills follow an open standard format, you can build them once and deploy across any skills-compatible agent, like Claude Code. What you'll learn: - Create custom skills for code generation and review, data analysis, and research - Build complex workflows using Anthropic's pre-built skills (Excel, PowerPoint, skill creation) and custom skills - Combine skills with MCP and subagents to create agentic systems with specialized knowledge - Deploy the same skills across https://t.co/Ru4OXv4saV, Claude Code, the Claude API, and the Claude Agent SDK Join and learn to equip agents with the specialized knowledge they need for reliable, repeatable workflows. https://t.co/3hq83c3q0U
Matt Shumer @mattshumer_ ·
This demo is the craziest thing you’ll see today. Full stop. Watch Clawd SIGN UP for a Reddit account completely autonomously with its own email account (thru @agentmail) + web browser. The next six months are going to be wild. https://t.co/B2Jh5BehJj
Jeffrey Emanuel @doodlestein ·
My new "System Performance Remediation" skill is so useful. I wish I had done this weeks ago. Often the reason your machine is sluggish isn't what you think it is. Yes, you know you're running a lot of agents and that some might be doing slow compilations or test suites at the same time. But the sheer amount of zombie / stuck / malfunctioning stuff that accumulates is mind-boggling to me when you run enough agents (especially when they stop in the middle of what they're doing because of usage limits and you restart them rather than doing the login flow because your hand hurts too much... ahem @bcherny). This stuff adds zero value and is often just pointlessly bringing your machine to its knees. And this stuff is cumulative if you don't periodically "clean off the barnacles."
Andrej Karpathy @karpathy ·
A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I listened to often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole another round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today. Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for. The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with (rare) full-stack understanding of LLMs top (math/algorithms) to bottom (megakernels/related), they have a great eye for talent and I think will be able to build something very special. Congrats on the launch and I look forward to what you come up with!
flappyairplanes @flappyairplanes

Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet. https://t.co/7W7WNJ278R

Bill Ackman @BillAckman ·
If @elonmusk can bring sight to the blind, it will outdo every one of his near-miraculous achievements to date. With all of the ‘bad news’ that circulates to drive your attention, it is important to be reminded that we have so much more to be optimistic about.
cb_doge @cb_doge

ELON MUSK: "Our next product, Blindsight will enable those who have total loss of vision, including if they've lost their eyes or the optic nerve, or maybe have never seen, or even blind from birth, to be able to see again." https://t.co/3SQirqsimx

Mason Daugherty @masondrxy ·
We use dynamic offloading to fight token bloat. When context hits a threshold, large tool inputs and results are swapped for filesystem pointers and 10-line previews, while older history is compressed into a summary that the agent can "re-read" via retrieval tools only when needed.
Ashpreet Bedi @ashpreetbedi ·
Building Pal: Personal Agent that Learns