AI Digest.

Enterprise Agent Roles Take Shape as Developers Optimize Claude Code Spend and Docker Workflows

Today's discourse centered on the emerging enterprise agent ecosystem, with Aaron Levie outlining a new "agent deployer" role and Ramp's Glass system showing 99% company-wide AI adoption. On the developer side, practical optimization dominated: Claude Code token tracking tools, Dockerfile best practices, and local LLM acceleration on Apple Silicon all drew significant attention.

Daily Wrap-Up

The conversation today split cleanly between two worlds: enterprise leaders figuring out how to reorganize entire companies around AI agents, and individual developers grinding on the nuts and bolts of making their tools faster, cheaper, and less wasteful. Aaron Levie continued his field notes from enterprise AI conversations, now proposing a concrete new job title for the people who will wire agents into business workflows. Meanwhile, Steven Sinofsky dropped a characteristically thorough essay on why "just go headless" is far harder than the agent-enthusiast crowd wants to admit. The tension between these two perspectives is the most interesting thread of the day: everyone agrees agents are coming for enterprise workflows, but the plumbing required to make that real is genuinely daunting.

On the practitioner side, the vibe was refreshingly pragmatic. A Claude Code token-tracking dashboard revealed that over half of one developer's spend was going to conversational responses rather than actual code generation. Dockerfile optimization tips got traction not because they were novel, but because people are still making the same mistakes. And the GPU discourse continued its eternal cycle, with pushback against the RTX 3090 nostalgia crowd pointing to newer cards with better performance per dollar. The most entertaining moment was @doodlestein casually dropping a 158-file, 2.7MB tax preparation skill for AI agents, filed just hours before the tax deadline, and having GPT 5.4 write its own marketing copy.

The most practical takeaway for developers: install a token-tracking tool like codeburn (npx codeburn) to audit where your AI coding spend actually goes. If you're like the developer who built it, you may find that more than half your tokens are burned on conversation rather than code generation, and that visibility alone can reshape how you prompt and interact with coding agents.

Quick Hits

  • @indie_maker_fox shared impressive architecture diagrams generated by an AI skill for OpenHarness, showcasing how documentation tooling keeps improving.
  • @berryxia highlighted DFlash speculative decoding on Apple M-series chips achieving up to 4.13x speedups for Qwen3.5, pushing local LLM inference further into "good enough" territory.
  • @kimberlywtan spotlighted former OpenAI researcher @philhchen building "Filbert," a lightweight coding agent wrapper that runs on your own infrastructure and can improve itself recursively.
  • @juristr teased a video on "Agent Factories," exploring patterns for spinning up and managing fleets of AI agents programmatically.
  • @mattpocockuk floated a relatable prompt pattern: asking AI to "go up a layer of abstraction" and map all relevant modules and callers when navigating unfamiliar code. He's looking for a name for the skill.
  • @TheAhmadOsman recommended pairing local LLMs with a self-hosted SearXNG instance for web search, calling it a significant intelligence boost for local setups.

Enterprise Agents and the New Org Chart (4 posts)

The biggest cluster of conversation today revolved around how enterprises are actually deploying AI agents at scale, and what organizational changes that demands. @levie laid out a detailed job description for what he sees as an inevitable new role: the agent deployer and manager. This person sits on individual teams, maps workflows, connects business systems via MCP and CLIs, and runs agents against KPIs. It's not a centralized IT function. It's embedded, operational, and deeply technical:

> "The gnarly part of the work is mapping structured and unstructured data flows, figuring out the ideal workflow, getting the agent the context it needs to do the work properly, figuring out where the human interfaces with the agent and at what steps, manages evals and reviews after any major model or data change."

This framing got reinforced by @jordan_ross_8F's breakdown of Ramp's internal "Glass" system, which achieves 99% daily AI usage across the company through pre-configured workspaces, 350+ reusable skills, and persistent memory. The key insight from Ramp's approach, originally shared by @eglyman, is that adoption stalled not because models weren't capable but because "the setup was too painful and unintuitive for most." Glass solved that by making every employee's AI workspace ready on day one with integrations already connected via SSO.

What makes this cluster interesting is the gap between ambition and reality. While Levie describes agent deployers connecting systems seamlessly, Sinofsky's lengthy analysis (quoted from @stevesi) argues that most enterprise software was never designed to be operated headlessly. The 100,000+ tables in a typical SAP installation aren't just an API problem; they represent decades of implementation complexity that no agent can easily navigate. The enterprise agent revolution everyone's planning for may arrive, but it's going to hit a wall of legacy architecture that "headless" handwaving can't solve.

Developer Tooling and Token Economics (3 posts)

A practical thread emerged around making AI-assisted development more efficient and observable. @om_patel5 showcased codeburn, an open-source terminal dashboard that classifies every Claude Code turn into 13 categories without any LLM calls. The finding that jumped out:

> "56% of his spend was 'conversation' where Claude is just responding with no tool use and the actual coding (edits and writes) was only 21%."

That's a striking ratio that suggests many developers are essentially paying for an expensive rubber duck. The tool reads session transcripts and breaks down costs by task type, project, model, and MCP server, with daily activity charts and interactive navigation.
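
The post doesn't show codeburn's classifier, but the core trick, bucketing each transcript turn by whether it invoked tools, with no LLM calls, can be sketched in a few lines of Python. The turn schema, tool names, and category labels below are illustrative assumptions, not codeburn's actual taxonomy:

```python
# Toy classifier in the spirit of codeburn: bucket each assistant turn
# by its tool usage, then total cost per bucket. Schema is illustrative.
from collections import Counter

def classify_turn(turn: dict) -> str:
    tools = {t["name"] for t in turn.get("tool_calls", [])}
    if tools & {"Edit", "Write"}:
        return "coding"          # the model actually changed files
    if tools & {"Read", "Grep", "Glob"}:
        return "exploration"     # reading code, not writing it
    if tools:
        return "other_tool_use"
    return "conversation"        # plain text reply, no tools invoked

def spend_breakdown(turns: list[dict]) -> dict[str, float]:
    cost = Counter()
    for t in turns:
        cost[classify_turn(t)] += t.get("cost_usd", 0.0)
    total = sum(cost.values()) or 1.0
    return {k: round(100 * v / total, 1) for k, v in cost.items()}

# Hard-coded sample data mirroring the ratio reported in the post.
turns = [
    {"tool_calls": [], "cost_usd": 5.6},
    {"tool_calls": [{"name": "Edit"}], "cost_usd": 2.1},
    {"tool_calls": [{"name": "Read"}], "cost_usd": 2.3},
]
print(spend_breakdown(turns))
# {'conversation': 56.0, 'coding': 21.0, 'exploration': 23.0}
```

On real sessions the input would come from transcript files rather than a hard-coded list, but the classification logic stays this cheap: no model calls, just inspecting which tools each turn used.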

On the configuration side, @diegohaz shared Claude Code settings that he says "fixed most of the issues" he was experiencing: forcing Opus 4-6, setting effort level to high, enabling always-on thinking, and disabling adaptive thinking with a 32K thinking token cap. These kinds of community-shared configurations are becoming their own form of knowledge, and the fact that users need to tune these knobs at all says something about how much performance variance exists in default setups. Together, these posts paint a picture of a maturing ecosystem where developers are moving past "wow this is cool" into "how do I make this cost-effective and reliable."
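
The post shared a screenshot rather than the raw file, so the exact keys aren't reproducible here, but Claude Code reads overrides from a settings.json. A minimal sketch of the shape, showing only a model pin and a thinking-token cap set via an environment variable; the effort and adaptive-thinking toggles from the post are omitted because their key names weren't given:

```json
{
  "model": "opus",
  "env": {
    "MAX_THINKING_TOKENS": "32000"
  }
}
```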

Dockerfile Practices That Actually Matter (1 post)

@immanuel_vibe dropped a comprehensive thread of Dockerfile opinions that resonated because they're the kind of advice that sounds obvious in retrospect but trips up even experienced developers. The highlights challenge common dogma: stop using Alpine for Python and Node apps, because musl libc causes silent performance issues; use debian-slim instead. Pin base images by digest, not tag, because node:20 today won't be node:20 in six months. And the most practical gem:

> "BuildKit cache mounts (--mount=type=cache) will change your life. pip/apt/cargo cache between builds without it ending up in the final layer. Nobody talks about this enough."

The thread also pushed back gently on container orthodoxy, arguing that the "one process per container" rule costs more complexity than it saves when you're not at Kubernetes scale. The post plugged dockerfile-roast, a Rust-based linter with 63 rules, but the real value was in the framing: there's no best practice in a vacuum, only workload-appropriate choices.
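
Several of these tips compose naturally into a single file. A minimal sketch for a Python app, assuming pip and a requirements.txt; the digests are placeholders to swap for real ones, not actual pins:

```dockerfile
# syntax=docker/dockerfile:1
# Pin by digest, not tag (placeholder digest shown).
FROM python:3.12-slim@sha256:<replace-with-real-digest> AS builder
WORKDIR /app
# Copy the lockfile first so code changes don't invalidate the install layer.
COPY requirements.txt .
# BuildKit cache mount: the pip cache persists between builds
# without ever landing in an image layer.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --prefix=/install -r requirements.txt

# Multi-stage build keeps compilers and build tools out of the prod image.
FROM python:3.12-slim@sha256:<replace-with-real-digest>
WORKDIR /app
COPY --from=builder /install /usr/local
# COPY . . is fine once .dockerignore excludes .git, node_modules, *.log.
COPY . .
CMD ["python", "main.py"]
```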

Context Engineering Over RAG (1 post)

@nyk_builderz articulated a shift that's been building for months: the most sophisticated AI teams have moved past optimizing retrieval and are now focused on context governance. The distinction matters:

> "RAG fetches fragments. Context engineering manages decisions. Control planes enforce safety + provenance. In 2026, memory quality compounds faster than model quality."

This reframes the entire RAG conversation from a technical plumbing problem into an architectural one. It's not about fetching the right chunks anymore; it's about managing what context an agent sees, when, and with what provenance guarantees. As agents get more autonomous and operate across more business systems, the question of "what does this agent know and why" becomes a governance problem, not just an engineering one.
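
Concretely, a context control plane is just a gate between data stores and the model: every context item carries provenance, and the plane decides what the agent may see while keeping an audit trail. A toy sketch, with all names invented for illustration:

```python
# Illustrative context control plane: admit or reject context items
# based on provenance before they ever reach the agent.
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    source: str     # where this came from (system of record, chat, web)
    verified: bool  # whether a trusted pipeline produced it

def admit(items, allowed_sources, require_verified=True):
    """Gate context and record an audit trail answering 'why does the agent know this?'"""
    admitted, audit = [], []
    for it in items:
        ok = it.source in allowed_sources and (it.verified or not require_verified)
        audit.append((it.source, "admitted" if ok else "rejected"))
        if ok:
            admitted.append(it.text)
    return admitted, audit

items = [
    ContextItem("Q3 revenue was $12M", source="erp", verified=True),
    ContextItem("someone said revenue doubled", source="chat", verified=False),
]
ctx, audit = admit(items, allowed_sources={"erp"})
print(ctx)    # ['Q3 revenue was $12M']
print(audit)  # [('erp', 'admitted'), ('chat', 'rejected')]
```

The point of the sketch is the audit list: retrieval alone returns fragments, while a control plane returns fragments plus a defensible record of what was shown and what was withheld.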

AI-Powered Security Testing (1 post)

@_avichawla made waves with a post about Strix, an open-source framework for continuous AI pentesting that deploys a graph of specialized agents to probe attack surfaces. The pitch is compelling: traditional pentests cost $30-50K, take weeks, and produce PDFs that are outdated by the next deploy. Strix runs continuously, chains vulnerabilities automatically, and generates merge-ready remediation code. The most interesting architectural detail is that agents share discoveries in real time, so "if one finds an auth bypass, another immediately tests whether it chains into privilege escalation." Whether this replaces professional pentesters entirely is debatable, but as a continuous supplement to scheduled audits, it fills a real gap in CI/CD pipelines that catch build failures but not security exposures.
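
The post doesn't detail Strix's internals, so the following is only an illustration of the general pattern it describes: a shared findings channel where one agent's discovery immediately triggers another agent's follow-on test. All names here are invented, not Strix's API:

```python
# Toy pub/sub findings bus: when one "agent" publishes a discovery,
# subscribed handlers chain follow-on checks automatically.
from collections import defaultdict, deque

class FindingsBus:
    def __init__(self):
        self.subs = defaultdict(list)
        self.queue = deque()

    def subscribe(self, kind, handler):
        self.subs[kind].append(handler)

    def publish(self, kind, detail):
        self.queue.append((kind, detail))

    def run(self):
        log = []
        while self.queue:
            kind, detail = self.queue.popleft()
            log.append((kind, detail))
            for handler in self.subs[kind]:
                handler(detail, self)  # handlers may publish new findings
        return log

def on_auth_bypass(detail, bus):
    # A second agent immediately probes whether the bypass escalates.
    bus.publish("privilege_escalation_check", f"chained from {detail}")

bus = FindingsBus()
bus.subscribe("auth_bypass", on_auth_bypass)
bus.publish("auth_bypass", "/admin endpoint")
print(bus.run())
# [('auth_bypass', '/admin endpoint'),
#  ('privilege_escalation_check', 'chained from /admin endpoint')]
```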

Local Inference and Hardware (2 posts)

The local AI hardware debate continued with @kaiostephens pushing back against the RTX 3090 nostalgia that's dominated GPU discussions. While a quoted thread argued the 3090's $42/GB VRAM makes it unbeatable, Kaios pointed to the 5060 Ti, AMD's 9060 XT, and even Tenstorrent's P100A as better performance-per-dollar options. Meanwhile, @LottoLabs highlighted llama.cpp's RPC capability for splitting models across heterogeneous GPU setups on separate machines, no fancy networking required. For homelabbers running mixed hardware, this is a meaningful unlock: you can finally combine that old 3090 with a newer card on a different machine and actually use them together for inference.
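
The mechanics, per llama.cpp's RPC backend, are simple: each remote machine runs an rpc-server that exposes its GPU over the network, and the machine driving inference lists those endpoints so model layers get split across all of them. A sketch assuming a CUDA build; addresses, port, and model path are placeholders:

```sh
# On each remote GPU machine: build llama.cpp with the RPC backend
# and expose the local GPU over the network.
cmake -B build -DGGML_RPC=ON -DGGML_CUDA=ON
cmake --build build --config Release
./build/bin/rpc-server -p 50052

# On the machine driving inference: point --rpc at every remote
# endpoint; llama.cpp splits layers across them plus any local GPU.
./build/bin/llama-cli -m model.gguf -ngl 99 \
  --rpc 192.168.1.10:50052,192.168.1.11:50052 \
  -p "Hello"
```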

Sources

Nyk 🌱 @nyk_builderz ·
Everyone is still optimizing retrieval. Top teams are optimizing context governance. The shift: - RAG fetches fragments - Context engineering manages decisions - Control planes enforce safety + provenance In 2026, memory quality compounds faster than model quality.
nyk_builderz @nyk_builderz

Context Engineering killed RAG

Kimberly Tan @kimberlywtan ·
.@philhchen is a former @OpenAI researcher who is a great follow for learning how to build a company in an agent-first way
philhchen @philhchen

how I built Filbert (phil-bot) the future of coding agents is lightweight wrappers around existing harnesses running on your own infra. it has access to your full dev env and can improve itself recursively. it's also so easy to build when you're set up for it.

Steven Sinofsky @stevesi ·
This point (screenshot) from customers via Aaron's personal deep dives is the one I find most interesting. Let's talk about what it means to be headless. Think about this number: a typical SAP installation has over 100,000 tables to start and that grows with every module and customization (and product update.) What SAP sells is the UI to this data set which abstracts the complexity and knowledge of ERP required to build such a system. Of course SAP has APIs, but SAP is not designed to run headless. And by SAP I mean that as a proxy for any complex business application vendor. Most people have no idea the complexity behind the UI of even simple SaaS products. The old joke about Docusign having 30,000 engineers comes to mind. From an engineering perspective, building an API-based or "headless" service has never been easy on its own and doing one while also building a user experience or "front end" is doubly difficult. From the earliest days of computing separating human interaction from "functionality" has been a significant computer science problem. Academically it is understood how to do this even if the mechanisms change over time (two tier, three tier, SOAP, Web Services, perhaps MCP now, and so on). Every company with a product exposes some APIs but almost always those APIs reflect the user experience. Many people will cite examples of pure service APIs but there's little history of rising to an example of a read/write enterprise system of record that exists as a fully functional API along with a full-coverage user experience. A most interesting modern example might be prove to be Amazon which has developed its whole implementation around APIs and as far as we know from the outside, the web site is a 1:1 reflection of those APIs. But even today ordering by API is not a general purpose thing. Perhaps no huge company is better positioned to be service via API for the core business, but they will have the challenges outlined below. 
I also think the first systems that can actually go headless are massive in house systems that have already gone through redesign for browser and mobile, but those systems are entirely in-house for one customer—the house. Banking, insurance, government, some healthcare, and more come to mind. Even when there is a point-in-time completeness and clear separation, maintaining that is monumentally difficult. The primary reason is that the history of innovation of software is the bundle-unbundle cycle. In the case of APIs and UI, it is the layering that is under attack from competitors and partners. All software is layered. APIs cement those layers. Invariably consumers of APIs seek access at the "deadly" combination of lower level and more abstract level. Some developers seek an advantage or to innovate in ways the API did not by driving underneath to the "implementation" layers. Others seek an advantage by providing easier/better/different capabilities by abstracting the API. Both of these are impossibly difficult if you're in the business of providing an API. The clear difference in iPhone v Mac v Windows v Android v Game Consoles shows how difficult it is to maintain integrity and developers over time. The market and competitors attack APIs by poking at the layers. An old adage is if you want to compete with a big company build a product that lands between two VPs. A corollary is if you want to compete with an API, go after the layers—build something with different layers or undermine the layers of a competitor against their wishes. As difficult as that is, from a company business perspective the problem is compounded by being in the business of acting as a headless data repository, especially a system of record. All companies today build, demo, sell, and support their products as UI-first. New features are shown via UI. Systems are updated with new user experience. The underlying implementation changes to support new UI. 
New capabilities are added in the UI and the API is updated often a bit later. The companies view the user experience layer and the key delivery mechanism for value. Inside the company they put as much engineering ability into the implementation as they do the user experience, but when push comes to shove, the API plays second to the need to maintain the implementation and to deliver value through user experience. This is so entrenched that overwhelmingly enterprise companies that sell "software" (cloud, mobile, on-prem, whatever) actually protect their data. In almost all cases this is because of legitimate reasons to avoid the implementation details from becoming a "contract" with third parties or customers. Companies want the ability to change the features of a product without fear of breaking everything. After all, abstraction is a good thing. Invariably this leads to a world with two issues. The official API for most products is somewhere between a straight mapping of the UI and the underlying implementation. Most all transactional systems are SQL but in no cases do companies offer direct access to data via SQL, which is precisely what many customers _think_ they want. In practice, the API layer on top of SQL enforces much of the data model and constraints of the system which is good. But that also means developers can't actually build the full product that they buy on top of the SQL tables just as the company did. The APIs is limited. The system isn't really headless. The second issue with the inability to access data directly is the one that will be HUGE with AI. Putting aside what it means to access data directly, the problem of not having a massively performant real-time data access API that also respects the underlying data model and implementation is that the data for your company is difficult to use for AI training. That in addition to not being able to write the UI you want or to go "headless" (whatever that might mean for a massive system like ERP). 
Those 100,000+ tables mentioned above are essentially impossible for a customer to understand. An API to these 100,000 tables will be (and is today) at a level of abstraction that makes sense for perhaps static reporting, B2B connections, or other non-destructive, system-of-record-preserving actions. As much as "just give us an API" energy there is almost no company would have the staff required to make sense of an API that truly exposes the implementation of a massive ERP system (there are 10,000 engineers that developed SAP over 40 years of iterations). Now much of the implementation complexity for enterprise system for sale comes from customizing it for many companies but also to support all the things any given customer might not know they need yet! That's what makes it such a valuable product. As a result of these challenges the existing companies selling enterprise software have historically made it very difficult to acquire or use them in a headless manner. It boils down to: 1. The product is the experience. That's what they are selling you and what you are mostly buying. That's because it is what defines a category, demonstrates the capability, and captures the value. 2. The implementation is wildly complex and the lack of APIs is designed to maintain the integrity of the system so the experience can evolve. Even if there is an API, it will need to be at a level of abstraction that will almost by definition prevent headless operation in order to keep the system from imploding. This leads to the situation where customers declare "the data is our data" and the enterprise vendors assert "yes, but not the implementation." What that means in practice boils down to protracted contract negotiations and endless customer requests for APIs. AI training is a new frontier. 
It is not unlike the rise of intermediate BI tools driving deeper data access or the rise of mobile driving the desire for companies to build their own mobile apps accessing "their" data in richer ways perhaps requiring deeper data access. But AI training is new in two ways: * First, for Agents customers will want the full breadth of capabilities in a read/write way as a replacement for the AI operating at scale. Whereas BI might make do with static reporting views or mobile might make do with an abstract transaction entry API or lookup, AI "wants it all" and agents will replace not a single user on one task but many users on many tasks. * Second, the vendors will want to use this as an opportunity to deliver more value by providing the training and "model" (whatever that means down the road) that uses the enterprise data for solutions. While it is customer data, the vendor wants to offer the features of AI on top of the data and doesn't want to be disintermediated from that. I see this second tension and the most likely near term issue. And it is already heating up. This is a replay of BI, "data lakes", and so on. The industry is well-prepared for this debate and discussion. In the meantime, headless is going to be a challenge.
levie @levie

Another week on the road meeting with a couple dozen IT and AI leaders from large enterprises across banking, media, retail, healthcare, consulting, tech, and sports, to discuss agents in the enterprise. Some quick takeaways: * Clear that we’re moving from chat era of AI to agents that use tools, process data, and start to execute real work in the enterprise. Complementing this, enterprises are often evolving from “let a thousand flowers bloom” approach to adoption to targeted automation efforts applied to specific areas of work and workflow. * Change management still will remain one of the biggest topics for enterprises. Most workflows aren’t setup to just drop agents directly in, and enterprises will need a ton of help to drive these efforts (both internally and from partners). One company has a head of AI in every business unit that roles up to a central team, just to keep all the functions coordinated. * Tokenmaxxing! Most companies operate with very strict OpEx budgets get locked in for the year ahead, so they’re going through very real trade-off discussions right now on how to budget for tokens. One company recently had an idea for a “shark tank” style way of pitching for compute budget. Others are trying to figure out how to ration compute to the best use-cases internally through some hierarchy of needs (my words not theirs). * Fixing fragmented and legacy systems remain a huge priority right now. Most enterprises are dealing with decades of either on-prem systems or systems they moved to the cloud but that still haven’t been modernized in any meaningful way. This means agents can’t easily tap into these data sources in a unified way yet, so companies are focused on how they modernize these. * Most companies are *not* talking about replacing jobs due to agents. The major use-cases for agents are things that the company wasn’t able to do before or couldn’t prioritize. 
Software upgrades, automating back office processes that were constraining other workflows, processing large amounts of documents to get new business or client insights, and so on. More emphasis on ways to make money vs. cut costs. * Headless software dominated my conversations. Enterprises need to be able to ensure all of their software works across any set of agents they choose. They will kick out vendors that don’t make this technically or economically easy. * Clear sense that it can be hard to standardize on anything right now given how fast things are moving. Blessing and a curse of the innovation curve right now - no one wants to get stuck in a paradigm that locks them into the wrong architecture. One other result of this is that companies realize they’re in a multi-agent world, which means that interoperability becomes paramount across systems. * Unanimous sense that everyone is working more than ever before. AI is not causing anyone to do less work right now, and similar to Silicon Valley people feel their teams are the busiest they’ve ever been. One final meta observation not called out explicitly. It seems that despite Silicon Valley’s sense that AI has made hard things easy, the most powerful ways to use agents is more “technical” than prior eras of software. Skills, MCP, CLIs, etc. may be simple concepts for tech, but in the real world these are all esoteric concepts that will require technical people to help bring to life in the enterprise. This both means diffusion will take real work and time, but also everyone’s estimation of engineering jobs is totally off. Engineers may not be “writing” software, but they will certainly be the ones to setup and operate the systems that actually automate most work in the enterprise.

Juri Strumpflohner @juristr ·
Let's talk about Agent Factories https://t.co/MTBm8wO03M
Michel Lieben @MichLieben ·
one GTM engineer can: > build the enrichment pipeline > wire the buying signals > design the targeting logic > score and route every lead automatically > run email, LinkedIn, and phone in parallel > manage deliverability across 4 ESPs > build Clay workflows from scratch > integrate AI without losing the signal > iterate on campaign architecture weekly > replace a 3-person SDR pod the complete guide to this hire:
itsalexvacca @itsalexvacca

The GTM Engineering Hire: A Comprehensive Guide to the Role That's Replacing Your SDR Team

Berryxia.AI @berryxia ·
🚀 Local LLM speed on Apple M-series chips is off the charts! DFlash speculative decoding, stock MLX with zero forks, up to 4.13x speedup on Qwen3.5, incredible! Optimized specifically for M-series chips, the speed is just unreal… https://t.co/Tl7068zr8u
Indie Fox @indie_maker_fox ·
The architecture diagrams this skill draws are genuinely top quality! https://t.co/xPijG1GFTH Below is the architecture diagram for OpenHarness; the color palette is very easy on the eyes https://t.co/lQ9SiNzP53
Jeffrey Emanuel @doodlestein ·
Well, it's probably coming too late for most people unless you're planning on filing an extension, but I created a truly ambitious skill for tax preparation on my skills site, https://t.co/Un9brY2G3l This skill spans 158 markdown files totaling 2.7 megabytes of text. It covers every state, tons of different professions, life events, and all sorts of sophisticated tax strategies, with all kinds of expertise about even niche topics like opportunity zones and captive insurance. Much of the underpinnings of it, including the nuts-and-bolts use of the Aiwyn MCP tax connector and the use of https://t.co/kxK2ZKXoly with Playwright MCP, is based on my actual multi-day session history preparing and filing my own fairly complex return, so I know it all works (I just finished filing mine a few hours ago). Here's how GPT 5.4 describes it and what makes it special: The "tax-return-preparation-and-advice-generic" skill is a source-verified, multi-year tax intelligence skill that turns AI from a glorified form-filler into a high-end tax strategist. It helps analyze returns across years, detect missed deductions and carryforwards, reconcile life events and profession-specific rules, model aggressive but defensible planning moves, and ground recommendations in current law instead of stale tax folklore. The result is a tax-prep and tax-planning system that is broader than software, more systematic than a one-off CPA review, and dramatically more useful for complex real-world filers. What makes it special: - It is multi-year by design. Most tax tools look at one return; this skill looks for patterns, carryovers, inconsistencies, and missed opportunities across years. - It is verification-first. The methodology is built around checking current IRS instructions, publications, and state guidance before making live filing claims. - It is aggressively practical. 
It does not stop at “here are the rules”; it pushes toward elections, timing moves, entity choices, depreciation strategies, retirement optimization, PTET, QBI, and other real savings levers. - It is unusually universal. It routes by profession, life event, situation, and jurisdiction, so it can adapt to freelancers, high earners, retirees, students, business owners, rental investors, divorce, inheritance, relocation, and more. - It is audit-aware. It emphasizes documentation, defensibility, and red-flag detection instead of encouraging sloppy “tax hacks.” - It is built for real execution. It includes filing workflows, tool guidance, and structured reference material, so an agent can move from analysis to action rather than just giving vague advice.
Sherwood @shcallaway ·
We replaced our existing memory system (sandbox + self-hosted git) with S3 Files and it’s very fast
alex_holovach @alex_holovach

so we tried S3 Files and it's goated > 2+ GB/s write throughput > natively mounts to any workload > 1ms latency i really wish it was possible to mount @archildata on Fargate too

Aaron Levie @levie ·
The more enterprises I talk to about AI agent transformation, the more it’s clear that there is going to be a new type of role in most enterprises going forward. The job is to be the agent deployer and manager in teams. Here’s the rough JD: This person will need to figure out what are the highest leverage set of workflows on a team are (either existing or new ones) where agents can actually drive significantly more value for the team and company. In general, it’s going to be in areas where if you threw compute (in the form of agents) at a task you could either execute it 100X faster or do it 100X more times than before. Examples would be processing orders of magnitude more leads to hand them off to reps with extra customer signal, automating a contracting review and intake process, streamlining a client onboarding process to reduce as many straps as possible, setting up knowledge bases than the whole company taps into, and so on. This person’s job is to figure out what the future state workflow needs to look like to drive this new form of automation, and how to connect up the various existing or new systems in such a way that this can be fulfilled. The gnarly part of the work is mapping structured and unstructured data flows, figuring out the ideal workflow, getting the agent the context it needs to do the work properly, figuring out where the human interfaces with the agent and at what steps, manages evals and reviews after any major model or data change, and runs and manages the agents on an ongoing basis tracking KPIs, and so on. The person must be good at mapping the process and understanding where the value could be unlocked and be relatively technical, and has full autonomy to connect up business systems and drive automation. This means they’re comfortable with skills, MCP, CLIs, and so on, and the company believes it’s safe for them to do so. But also great operationally and at business. 
It may be an existing person repositioned, or a totally net new person in the company. There will likely need to be one or more of these people on every team, so it’s not a centralized role per se. It may rile up into IT or an AI team, or live in the function and just have checkpoints with a central function. This would also be a fantastic job for next gen hires who are leaning into AI, and are technical, to be able to go into. And for anyone concerned about engineers in the future, this will be an obvious area for these skills as well.
Ahmad @TheAhmadOsman ·
Using local LLMs? Make sure to setup web search for them Tell your favorite agent to setup SearNg for you Give that to your local LLMs (tell an agent to set that up as well) Watch them become way more intelligent and efficient You're welcome
Lotto @LottoLabs ·
For anyone asking about heterogenous setups (GPUs of different makes, models on separate machines) this is for you
loktar00 @loktar00

llama.cpp RPC split models across machines, no fancy network needed!

Om Patel @om_patel5 ·
THIS TOOL SHOWS YOU EXACTLY WHERE YOUR CLAUDE CODE TOKENS ARE GOING this guy was spending $200+ a day on Claude Code with zero visibility into what was eating the tokens. so he built a terminal dashboard that reads your session transcripts and classifies every single turn into 13 categories with no LLM calls. what it shows you: > cost by task type (coding, debugging, exploration, brainstorming, etc) > cost by project, model, tool, and MCP server > daily activity chart > AND its interactive: arrow keys to switch between today, week, and month 56% of his spend was "conversation" where Claude is just responding with no tool use and the actual coding (edits and writes) was only 21%. more than half the money was going to Claude thinking out loud instead of actually writing code. one line install: npx codeburn free AND open source.
kaios @kaiostephens ·
5060ti? 9060xt? even the tenstorrent p100a? 3090 is a fantastic card, but stop glazing it like this. You can be more creative, there are better cards for perf/$ https://t.co/NfreFQi0Lj
MemoryReboot_ @MemoryReboot_

Why RTX 3090 is the best GPU per GB in 2026 and how NVIDIA trapped itself with VRAM 3090 launched in 2020 for $1,500 with 24GB of GDDR6X. Today a used one goes for ~$1,000. That’s ~$42/GB of VRAM 5090 launched in 2025 at $2000 with 32GB of GDDR7. But the street price right now is ~$4,400. That’s ~$137/GB 3.3x difference!!! The rest of the 50 series lineup: RTX 5080 — 16GB RTX 5070 Ti — 16GB RTX 5070 — 12GB NVIDIA is giving you less VRAM than a 3090 had 5 years ago for more money Now they’re forced to artificially nerf VRAM just to protect their $20k+ server GPU sales (H100 etc) It’s an accidental bug in their pricing strategy that gives regular consumers a chance to dive into the local LLM rabbit hole They’re completely out of production and the used supply is shrinking by the day 3090 is going to stay relevant for a long time

Jordan Ross @jordan_ross_8F ·
I fully reverse-engineered Ramp's internal AI operating system. Their system — called Glass — is how they got 99% of their entire company using AI every single day.

350+ reusable workflows. Every tool connected at first login. Memory that refreshes every 24 hours. Automations running while everyone sleeps.

I partnered with my engineering team and we broke down every component inside it. Then we rebuilt the whole thing for marketing agencies. 76 pages. Every system. Every layer. Every step.

Steal it. Comment "OS" and I'll send it directly. Must be following to receive the auto DM.
eglyman @eglyman

99% of Ramp uses ai daily. but we noticed most people were stuck — not because the models weren't good enough, but because the setup was too painful and unintuitive for most. terminal configs, mcp servers, everyone figuring it out alone.

so we built Glass. every employee gets a fully configured ai workspace on day one — integrations connected via sso, a marketplace of 350+ reusable skills built by colleagues, persistent memory, scheduled automations. when one person on a team figures out a better workflow, everyone on that team gets it and gets more productive.

the companies that make every employee effective with ai will compound advantages their competitors can't match. most are waiting for vendors to solve this. we decided to own it.

Maximilian Alexander @signalgaining ·
Today I’m incredibly excited to announce Wendy. Wendy is an operating system and developer platform for Physical AI — built to make it dramatically easier to build and deploy on NVIDIA Jetson, Raspberry Pi, and other edge devices. We think robotics, edge AI, industrial systems, autonomous machines, and smart cameras should be far simpler to create. Less setup. Less infrastructure pain. Faster time to first demo. This is the start of something big. Get started at https://t.co/yGruInmaZ6
Immanuel @immanuel_vibe ·
unpopular dockerfile takes (that actually work)

1 - stop using alpine — yes, it's tiny. but musl libc ≠ glibc. your python/node app will rebuild native deps from scratch or just... silently be slower. use -slim (debian-slim) instead. same size win, zero grief.
2 - layer order is your cache strategy. COPY your lockfile first, run install, then copy source. invalidating the install layer on every code change is a skill issue ngl
3 - multi-stage builds aren't just "best practice" — they're the actual reason your prod image doesn't ship gcc and 400mb of build tools. builder stage = bloat zone. final stage = lean mean container.
4 - COPY . . is fine actually — if your .dockerignore is correct. most pain here is from forgetting to ignore node_modules/, .git, *.log. fix the ignore file, not the COPY.
5 - one process per container is a vibe, not a law. if your app needs nginx + app server and you're not at k8s scale — just use supervisord. the "one process" dogma sometimes costs more complexity than it saves.
6 - pin your base image by digest, not tag. node:20 today ≠ node:20 in 6 months. prod broke because of a tag? that's a you problem tbh.
7 - BuildKit cache mounts (--mount=type=cache) will change your life. pip/apt/cargo cache between builds without it ending up in the final layer. nobody talks about this enough fr

there's no "best practice" in a vacuum. alpine is great for Go binaries. slim is great for Python. scratch is great for static bins. know your workload, then choose.

btw if you want something to catch all this stuff automatically, check out dockerfile-roast — a linter written in Rust that literally roasts your Dockerfile. 63 rules, brutally honest output (but it can also provide just dry facts, no roast), runs on any OS or as a docker container https://t.co/NVYpe8iD65

#docker #devops #kubernetes #backend #linux #rust #sre #containers
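Several of these takes combine naturally in one file. A minimal sketch for a hypothetical Node app — the file paths, `npm run build` script, and `node:20-slim` tag are illustrative, not prescriptive:

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative sketch of takes 1-4 and 7 for a hypothetical Node app.
# In prod, pin by digest per take 6: node:20-slim@sha256:<digest>.

# take 1: debian-slim over alpine (glibc, no native-dep grief)
FROM node:20-slim AS builder
WORKDIR /app

# take 2: lockfile first, so code changes don't invalidate the install layer
COPY package.json package-lock.json ./

# take 7: BuildKit cache mount keeps npm's cache out of the image layers
RUN --mount=type=cache,target=/root/.npm npm ci

# take 4: COPY . . is fine once .dockerignore excludes node_modules/, .git, *.log
COPY . .
RUN npm run build

# take 3: final stage ships the build output, not gcc and 400mb of build tools
FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
```

The `# syntax=docker/dockerfile:1` directive on the first line is what enables the `--mount=type=cache` feature on builders where BuildKit frontends are resolved from the directive.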
Matt Pocock @mattpocockuk ·
I've found myself writing: "I don't know this area of code well. Go up a layer of abstraction. Give me a map of all the relevant modules and callers." Might need a new skill here. What should I name it?
Haz @diegohaz ·
These .claude/settings.json options fixed most of the issues I was having with Claude:

{
  "model": "claude-opus-4-6",
  "effortLevel": "high",
  "alwaysThinkingEnabled": true,
  "env": {
    "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1",
    "MAX_THINKING_TOKENS": "31999"
  }
}
Avi Chawla @_avichawla ·
Pentesting firms don't want you to see this. An open-source AI agent just replicated their $50k service.

A typical pentest today costs $30k-$50k per engagement, takes 4-6 weeks of scoping, and produces a PDF that's outdated the moment a PR merges. That worked when teams shipped slowly. But it falls apart when you're deploying multiple times a day and changing attack surface on every push, especially when attackers are also using AI to uncover flaws faster than any scheduled audit can keep up with.

CI pipelines catch build failures, test regressions, and lint violations. But they don't tell you what an attacker can actually do with the code you're about to ship. Meanwhile, AI agents are quietly closing that gap by doing what used to require a human pentester:

- Probing attack surface and chaining small vulns
- Validating findings with actual proof-of-exploit
- Generating merge-ready remediation code
- And running continuously

If you want to see it in practice, this approach is actually implemented in Strix, an open-source framework (24k+ stars) for continuous AI pentesting. It deploys a graph of AI agents that act like real attackers against your stack. Each agent specializes in different attack types, running in parallel and sharing discoveries, so if one finds an auth bypass, another immediately tests whether it chains into privilege escalation. The agents operate with a full HTTP proxy, browser automation for auth flows, terminal access, and a Python runtime for custom exploit development.

Strix is designed to run at the pace devs actually ship: before releases, after major changes, and continuously as the app evolves. You can point it at source code, a live app, or both simultaneously for grey-box testing.

The code hitting production might be AI-generated. The breach will still be real.

Just run `strix --target ./your-codebase` to start. I've shared the GitHub repo in the replies.