<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Pragmatic CTO]]></title><description><![CDATA[Hard-won lessons on scaling teams and technology from a CTO who's made the mistakes so you don't have to.]]></description><link>https://www.thepragmaticcto.com</link><image><url>https://substackcdn.com/image/fetch/$s_!uX8m!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb359157b-2590-4841-a110-c8319040470b_500x500.png</url><title>The Pragmatic CTO</title><link>https://www.thepragmaticcto.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 16 Apr 2026 14:47:42 GMT</lastBuildDate><atom:link href="https://www.thepragmaticcto.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Allan MacGregor 🇨🇦]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thepragmaticcto@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thepragmaticcto@substack.com]]></itunes:email><itunes:name><![CDATA[Allan MacGregor 🇨🇦]]></itunes:name></itunes:owner><itunes:author><![CDATA[Allan MacGregor 🇨🇦]]></itunes:author><googleplay:owner><![CDATA[thepragmaticcto@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thepragmaticcto@substack.com]]></googleplay:email><googleplay:author><![CDATA[Allan MacGregor 🇨🇦]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[AI Wrote the Code. Who Gets the Tax Credit?]]></title><description><![CDATA[Your AI Strategy Is Shrinking Your Tax Credit (Maybe)]]></description><link>https://www.thepragmaticcto.com/p/ai-wrote-the-code-who-gets-the-tax</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/ai-wrote-the-code-who-gets-the-tax</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 09 Mar 2026 12:31:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1d999fa2-25b8-4472-9b65-ea766ea27d5c_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Your AI Strategy Is Shrinking Your Tax Credit (Maybe)</h2><p>Two SR&amp;ED consultants look at the same developer, using the same AI tool, writing the same code. One says it qualifies for R&amp;D tax credits. The other says it doesn't.</p><p><a href="https://leyton.com/ca/en/insights/articles/sred-ai-eligibility-where-cra-draws-the-line/">Leyton</a>, one of Canada's largest SR&amp;ED consulting firms, published a clear position: "The CRA will not recognize the act of calling an API or engineering prompts as SR&amp;ED-eligible work, as that activity is considered routine implementation." <strong>Prompting an AI is like calling an API; the uncertainty was resolved by Anthropic or OpenAI, not by the developer.</strong></p><p><a href="https://growwise.ai/sred/claiming-sred-for-software-companies-in-2025/">GrowWise Partners</a>, another major SR&amp;ED consultancy, says the opposite: "AI does not disqualify work, but claimants must show that human-led experimentation is still present." Same developer. Same tool. Same output. Different framing, different documentation, different outcome.</p><p>The gap between "we used AI as a research tool" and "AI did our work for us" is less technical and more about the documentation that supports the claim. And that distinction is worth up to <a href="https://www.pwc.com/ca/en/services/tax/publications/tax-insights/sred-changes-2025.html">$2.1 million in refundable tax credits</a> in Canada; six figures or more <a href="https://www.mossadams.com/articles/2023/05/r-d-tax-credit-for-ai-developers">in the US</a>. <strong>This is not an accounting footnote.</strong></p><p>Right now, Agentic coding and AI-assisted development are in a regulatory vacuum. No government has issued guidance; no court has ruled. The CRA's five-question eligibility test was written for human researchers; the IRS four-part test never contemplated AI as the primary code author. The entire field is governed by consultant interpretation of statutes that predate GitHub Copilot by decades.</p><p>If your company claims SR&amp;ED credits in Canada or R&amp;D tax credits in the US -- and if your engineering team uses AI coding tools -- your tax position depends on how well you can document your work. The "(Maybe)" in the subtitle is genuine; this might work out fine. But the regulatory vacuum means nobody can tell you that with certainty, and is the companies that are left holding the bag.</p><h2>No one is sure</h2><p>R&amp;D tax credits live in the CFO's domain -- or they used to. The engineering org's practices now determine whether the credit survives an audit, and the gap between a defensible claim and a rejected one can be six figures. Tax mechanics are not why you took the CTO job, but <strong>this is your problem whether you want it or not.</strong></p><p>Canada's 2025 federal budget <a href="https://www.pwc.com/ca/en/services/tax/publications/tax-insights/sred-changes-2025.html">doubled the SR&amp;ED expenditure limit</a> from $3 million to $6 million, <a href="https://gowlingwlg.com/en-ca/insights-resources/articles/2025/sr-and-ed-tax-incentive">expanded eligible entities</a> beyond CCPCs, and <a href="https://www.mnp.ca/en/insights/directory/significant-enhancement-announced-sr-ed-program">restored capital expenditure eligibility</a> for the first time since 2014. The maximum refundable investment tax credit jumped to $2.1 million. The program has never been more generous.</p><p>Starting April 2026, the CRA will launch an <a href="https://www.canada.ca/en/revenue-agency/services/scientific-research-experimental-development-tax-incentive-program/sred-updates.html">AI-enhanced review process</a> to streamline claim reviews, alongside a new elective pre-claim approval process. And while the CRA is using AI to review claims, <strong>nobody at the CRA has published a single word of guidance on how AI coding tools affect eligibility.</strong></p><p>South of the border, the story is pretty much the same. The <a href="https://www.grantthornton.com/insights/alerts/tax/2025/insights/full-expensing-of-domestic-research">One Big Beautiful Bill Act</a> restored immediate R&amp;D expensing in July 2025, reversing the TCJA's punishing five-year capitalization regime that had forced companies to amortize domestic research costs over five years. The <a href="https://www.cbh.com/insights/articles/irs-updates-form-6765-for-2025-rd-tax-credits/">new Form 6765 Section G</a> becomes mandatory for 2026 filings -- <a href="https://www.bdo.com/insights/tax/the-section-41-r-d-tax-credit-reporting-requirements-preparing-for-new-form-6765">business-component-level disclosure</a> for anyone claiming more than $1.5 million in qualified research expenses, reporting on up to fifty business components. More granular disclosure than the IRS has ever required. Meanwhile, the IRS is deploying its own AI tools: a <a href="https://www.thetaxadviser.com/issues/2025/oct/rd-tax-credits-a-new-era-of-disclosure-and-documentation/">Line Anomaly Recommender</a> for audit selection and Agentforce across the Office of Chief Counsel.</p><p>Robert Kovacev, a tax litigator who published an <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5255850">analysis on SSRN</a>, observed that "nothing in the statute or regulations states that activities must be performed by humans." He's right -- the statute is silent. But silence cuts both ways; it means the answer depends entirely on how you frame and document the work.</p><p>Ottawa doubled the SR&amp;ED expenditure limit in the same budget year that ISED launched new AI adoption grants. Washington restored R&amp;D expensing three months after the White House issued executive orders accelerating AI deployment. Nobody in either capital connected the dots; companies are filing R&amp;D claims based on whatever their consultant tells them, and the consultants -- as we've seen -- don't agree; which makes things more a mess, and as always is our job as CTOs to figure out how to navigate this.</p><p>Who eliminated the uncertainty -- the developer or the AI? That's the question neither statute answers. The CRA's <a href="https://www.canada.ca/en/revenue-agency/services/scientific-research-experimental-development-tax-incentive-program/sred-policies-guidelines/guidelines-eligibility-work-sred-tax-incentives.html">five-question test</a> requires "systematic investigation" by means of "experiment or analysis"; the <a href="https://www.irs.gov/businesses/audit-techniques-guide-credit-for-increasing-research-activities-ie-research-tax-credit-irc-section-41-table-of-contents">IRS four-part test</a> requires a "process of experimentation" to "eliminate uncertainty." Both assume human researchers. Neither says what happens when AI does the generating and the human does the evaluating.</p><h2>The Productivity Paradox</h2><p>R&amp;D tax credits are calculated primarily on employee wages allocated to qualifying research. If AI reduces the time developers spend on qualifying activities, the wage base shrinks. The credit shrinks with it; this is mechanical, not interpretive. It follows directly from how the math works.</p><p>The implications are counterintuitive. Your AI strategy might be making your engineers more productive while simultaneously making your company's tax position worse.</p><p>Walk through the Canadian numbers. A developer earning $150,000 per year who previously allocated 50% of their time to SR&amp;ED work generated $75,000 in eligible salary, plus $41,250 in proxy overhead -- <a href="https://www.canada.ca/en/revenue-agency/services/scientific-research-experimental-development-tax-incentive-program/sred-claim/allowable-expenditures.html">$116,250 in qualified expenditure</a>. At the <a href="https://boast.ai/en-us/resources/guides/the-complete-guide-to-sred-tax-credits-2026">enhanced 35% rate</a>, that produced roughly $40,700 in investment tax credits per developer. Compress that developer's SR&amp;ED-eligible time to 20% with AI tools -- nothing else changes, same salary, same project, same research outcomes -- and the ITC drops to approximately $16,300. Sixty percent gone.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EX7A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EX7A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EX7A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EX7A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EX7A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EX7A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg" width="728" height="352.67555555555555" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:436,&quot;width&quot;:900,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:35791,&quot;alt&quot;:&quot;The Productivity Paradox&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Productivity Paradox" title="The Productivity Paradox" srcset="https://substackcdn.com/image/fetch/$s_!EX7A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EX7A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EX7A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EX7A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38007369-2045-4b8e-ac4a-2697f4001c7e_900x436.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The US math is different in structure but identical in direction. A ten-person team spending 60% of time on qualifying research might see that drop to 30% after AI adoption; the wage-based credit cuts roughly in half.</p><p>But the US has an additional <a href="https://natlawreview.com/article/research-tax-credit-and-substantially-all-test">cliff</a>. Under <a href="https://www.jmco.com/articles/research-and-development-tax-credits/understanding-the-substantially-all-rule/">Treasury Reg. 1.41-2</a>, if 80% or more of an employee's services constitute qualified research, 100% of their wages count as qualified research expenses. Drop below 80%, and only the actual proportion counts. Pre-AI, a developer at 85% qualifying research had 100% of wages in the QRE pool. Post-AI, that same developer at 60% qualifying research has only 60% of wages in the pool.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NTwj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NTwj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NTwj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NTwj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NTwj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NTwj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg" width="728" height="367.64" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:606,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:57441,&quot;alt&quot;:&quot;The 'Substantially All' Cliff&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The 'Substantially All' Cliff" title="The 'Substantially All' Cliff" srcset="https://substackcdn.com/image/fetch/$s_!NTwj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NTwj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NTwj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NTwj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7b328d1-363b-4d53-b5d3-e166ad294e9d_1200x606.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The 7th Circuit made this concrete in <a href="https://www.plantemoran.com/explore-our-thinking/insight/2023/04/rd-tax-credit-little-sandy-coal-opinion-clarifies-substantially-all-test">*Little Sandy Coal* (2023)</a>: the taxpayer must demonstrate a "principled way to determine what portion of employee activities constituted elements of a process of experimentation." If you can't show that principled allocation, you lose.</p><p>The paradox is this: AI may simultaneously expand the universe of qualifying activities -- more experimentation, more alternatives evaluated, more systematic investigation -- while compressing the economic value of the credit through fewer developer-hours and lower wage QREs. <strong>Companies might qualify for credits more easily while claiming smaller dollar amounts.</strong></p><p>Scale this across a team. If three AI-augmented developers replace the output of ten, the wage base drops 70% -- even if every minute of their remaining time qualifies. The productivity gain that your CEO celebrates is the same productivity gain that mechanically erodes your R&amp;D credit. The only defense is reframing what counts as qualifying activity; that reframing lives or dies in documentation.</p><h2>The Documentation Is the R&amp;D</h2><p>Before AI, tickets, PRs, and comments documented the work, any additional documentation was a bonus; meaning we could get away with using the output as evidence of the work. With AI in the mix, we need to document the work in a way that supports the claim and at higher levels of detail.</p><p>When AI generates most of the code, the code is no longer evidence of human-led investigation. Traditional signals -- commit history, code comments, design docs -- may be thinner or absent entirely. A developer who generates fifty lines of code in a single AI prompt produces a different artifact than a developer who wrote those fifty lines over three days of iterative experimentation. The output might be identical but how we got there is not.</p><p>Documentation must now prove something the code used to prove implicitly: that a human drove the investigation and iteration process. That a human identified the uncertainty, designed the experiment, evaluated the results, and advanced knowledge. With agentic coding, documentation is no longer a record of the R&amp;D. It is the R&amp;D -- it's the only surviving evidence that qualifying work occurred.</p><p>Five elements make this concrete.</p><ul><li><p><strong>The uncertainty.</strong> What didn't you know? What couldn't be achieved through standard practice? Document this before prompting AI -- not after. The uncertainty must exist in the developer's understanding, not in the model's training data.</p></li><li><p><strong>The hypothesis.</strong> Record which approach the developer chose to test and why they picked it over alternatives. The reasoning belongs to the human, not the model. If nobody can articulate why this approach rather than another, there's no hypothesis -- there's a guess.</p></li><li><p><strong>The experiment.</strong> Save the prompts, the iterations, the evaluation criteria. Where AI interaction logs show a cycle -- hypothesis, generation, evaluation, iteration -- <a href="https://growwise.ai/sred/claiming-sred-for-software-companies-in-2025/">those logs are evidence</a>. This is the one area where agentic coding actually helps your claim; the tool produces a richer paper trail than manual development ever did.</p></li><li><p><strong>The evaluation.</strong> A developer tries three approaches and two of them fail. Those failures are strong evidence; <a href="https://www.platformcalgary.com/blog/maximizing-sr-ed-for-ai-innovation-a-calgary-tech-leaders-guide-to-claiming-what-youve-earned">Platform Calgary</a> notes that failed experiments in AI development often represent the strongest SR&amp;ED evidence. Document what was rejected and why it didn't hold up.</p></li><li><p><strong>The advancement.</strong> If the only thing your team gained was working code, that's a product -- not a research outcome. The advancement is the new knowledge: what works under these conditions, what doesn't, and why. That knowledge belongs to the organization, not the model.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0RYp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0RYp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 424w, https://substackcdn.com/image/fetch/$s_!0RYp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 848w, https://substackcdn.com/image/fetch/$s_!0RYp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 1272w, https://substackcdn.com/image/fetch/$s_!0RYp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0RYp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png" width="800" height="515" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thepragmaticcto.com/i/189995399?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0RYp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 424w, https://substackcdn.com/image/fetch/$s_!0RYp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 848w, https://substackcdn.com/image/fetch/$s_!0RYp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 1272w, https://substackcdn.com/image/fetch/$s_!0RYp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60058153-4d28-4ba4-b3b5-59d629824a5a_800x515.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In practice, this means your developers need to write something down before they start prompting. Not necessarily a formal document but a Jira ticket, a Slack message to themselves, a comment in the PR. What's the uncertainty? What are they about to try? After the AI generates output, they need to record what they rejected and why. <a href="https://growwise.ai/sred/claiming-sred-for-software-companies-in-2025/">GrowWise recommends</a> preparing "a summary of AI usage explaining how it enhanced, but did not replace, systematic investigation." That summary is what ties your engineering workflow to your tax credit; and it should take five minutes to write if you do it in the moment.</p><p>If we are smart we can tweak our developer process to create enough evidence and documentation to support the claim. For example:</p><ul><li><p>Make sure our git history shows iteration.</p></li><li><p>PR descriptions capture what was tried and rejected.</p></li><li><p>Jira or Linear tickets can document the uncertainty if your developers write them that way.</p></li><li><p>More formal documents like architecture decision records, AI interaction logs, and developer journals can capture the experiment and evaluation.</p></li></ul><p>Heck even a developer's Slack thread where they talk through an approach -- all of it counts. You have the tooling; what you probably don't have is anyone treating this as an engineering practice instead of a tax compliance exercise.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Pragmatic CTO is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Dominant Actor</h2><p>For all their disagreements on specifics, <a href="https://mcguiresponsel.com/blog/using-commercial-ai-and-the-rd-tax-credit/">McGuire Sponsel</a>, <a href="https://www.kbkg.com/research-and-development/qualifying-ai-for-the-rd-tax-credit-kbkg">KBKG</a>, <a href="https://warrenaverett.com/insights/r-d-tax-credit-artificial-intelligence/">Warren Averett</a>, and <a href="https://news.bloombergtax.com/tax-management-memo/ai-reshapes-software-r-d-tax-credits-eligibility-landscape-1">Bloomberg Tax</a> converge on one thing: <strong>the developer has to be the dominant actor.</strong> The person who drove the investigation, not someone who showed up for the review. Where exactly that line sits depends on who you ask.</p><p>Two scenarios &#8212; identical in every way except how the work was framed.</p><ul><li><p><strong>Eligible:</strong> A developer hits a problem they can't solve through standard practice -- say, a concurrency issue under specific load conditions that no existing pattern handles cleanly. They hypothesize a few approaches, use AI to generate implementations faster than they could write them, then test each one against their criteria. Three approaches fail. They document why, adjust, and eventually land on something that holds. The investigation was theirs; AI just wrote the code.</p></li><li><p><strong>Not eligible:</strong> A developer opens Claude Code, types "build a feature that handles multi-currency refunds," and gets back something that works. They tweak the formatting, maybe rename a variable, and push it to staging. Done in twenty minutes. The problem is that nobody documented an uncertainty before that prompt -- because there wasn't one, or at least none that the developer articulated. No hypothesis, no evaluation criteria, no record of what got rejected. <a href="https://leyton.com/ca/en/insights/articles/sred-ai-eligibility-where-cra-draws-the-line/">Leyton's analysis</a> says the CRA will treat that as routine implementation, <strong>and they're probably right.</strong></p></li></ul><p>So what separates those two scenarios? Not the code -- the code might be identical. What separates them is whether anyone wrote down why they were building it that way.</p><p><a href="https://news.bloombergtax.com/tax-management-memo/ai-reshapes-software-r-d-tax-credits-eligibility-landscape-1">Bloomberg Tax</a> offers a useful reframing: "the bug is the proof that the initial hypothesis was false, and the debugging and testing process then becomes the new, qualified experiment." AI-assisted development may involve more process of experimentation, not less -- more alternatives generated, more systematic evaluation, more documented iteration. The key is making that visible. If the experimentation happened but nobody recorded it.</p><p>To survive an audit, your documentation needs to tell that story. Problem identification, experiment design, evaluation, iteration -- those belong to the developer. The AI generated code faster; it didn't investigate anything. That distinction holds whether you're answering the CRA's five questions or the IRS four-part test.</p><p>If your documentation tells that story -- and tells it contemporaneously, not retroactively -- the credit is defensible. If your documentation is thin, the same work becomes indistinguishable from routine implementation.</p><p>Now final disclaimer: I'm not a tax attorney; just a CTO that has done his fair share of R&amp;D tax credit claims and has seen the pitfalls. Talk to your SR&amp;ED consultant or R&amp;D credit advisor -- but talk to them with the right questions, which hopefully this article provides.</p><div><hr></div><h2>Key Takeaways</h2><ul><li><p>When was the last time you talked to your SR&amp;ED consultant or R&amp;D credit advisor about how your team uses AI tools? If the answer is "never," that conversation is overdue.</p></li><li><p>Can your engineering team demonstrate -- with documentation created at the time of the work, not assembled retroactively -- that developers drove the systematic investigation on your last R&amp;D project? Not that they were present. That they drove it.</p></li><li><p><strong>Do you know your developers' current time allocation to qualifying activities? Has it changed since AI tool adoption? If you're in the US, are you sure you're still above the 80% threshold?</strong></p></li><li><p>Think about your last sprint. A developer prompted Claude Code, iterated until it worked, and shipped. Six months later, an auditor asks them to prove that was systematic investigation. They won't remember the prompts. They won't remember what they rejected. The documentation either captured it in real time or it didn't -- and "it worked" is not evidence of experimentation.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/p/ai-wrote-the-code-who-gets-the-tax?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thepragmaticcto.com/p/ai-wrote-the-code-who-gets-the-tax?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Audio: AI Wrote the Code. Who Gets the Tax Credit?]]></title><description><![CDATA[AI is transforming software development, but it&#8217;s also reshaping how we qualify for R&D tax credits.]]></description><link>https://www.thepragmaticcto.com/p/audio-ai-wrote-the-code-who-gets</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-ai-wrote-the-code-who-gets</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 09 Mar 2026 11:55:37 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/189997040/35bcfcd58f1769901ae9a443a95e4830.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>AI is transforming software development, but it&#8217;s also reshaping how we qualify for R&amp;D tax credits. The big question: if AI wrote the code, who gets the credit?</p><p>Two top SR&amp;ED consultants look at the same developer using AI tools and come to opposite conclusions. One says calling an AI API or prompting it is routine implementation, not eligible for tax credits, because the uncertainty was resolved by the AI&#8217;s training, not the developer. The other says AI doesn&#8217;t disqualify the work as long as the developer leads the experimentation. Same developer, same tool, same code&#8212;different framing, different documentation, different outcomes. The difference isn&#8217;t technical; it&#8217;s about how you document the human&#8217;s role in the process. And that distinction can be worth millions in refundable tax credits. No government has issued clear guidance, and the existing tests were written with human researchers in mind, not AI collaborators. If your engineering team uses AI tools and you claim R&amp;D credits, your tax position hinges on your documentation. It might work out, or it might not.</p><p>Here&#8217;s the catch: R&amp;D tax credits used to be a finance problem, but now they&#8217;re an engineering problem. Canada doubled its SR&amp;ED expenditure limit to six million dollars and launched AI-powered claim reviews, yet still hasn&#8217;t clarified how AI affects eligibility. The US recently restored immediate expensing for R&amp;D and requires more granular disclosure, while deploying AI tools to select audits. Neither the Canadian nor US statutes specify that research must be done by humans, but that silence leaves everything up to interpretation and documentation. Governments are accelerating AI adoption but ignoring the tax credit implications, leaving companies and CTOs to navigate a regulatory vacuum.</p><p>And it gets worse. AI boosts developer productivity by automating parts of the work, but that reduces the hours spent on qualifying R&amp;D activities. Since tax credits are tied to wages allocated to research, AI use mechanically shrinks your credit. For example, a Canadian developer who spent half their time on eligible R&amp;D might have generated over forty thousand dollars in tax credits. If AI cuts that qualifying time to 20%, the credit drops by 60 percent. In the US, it&#8217;s even trickier due to the &#8220;substantially all&#8221; rule: if a developer spends less than 80% of their time on qualified research, only a proportional share of their wages count, not 100%. AI can push developers below that threshold, slashing credits. So the same AI that improves your team&#8217;s output can erode your tax benefits. The only way out is reframing what counts as qualifying activity&#8212;and that lives or dies in your documentation.</p><p>Which brings me to the real point: with AI writing more code, traditional evidence like commit histories and code comments no longer prove human-led R&amp;D. The code might be identical, but the process behind it is different. Documentation isn&#8217;t just a record anymore; it is the R&amp;D. You have to prove a human drove the investigation: defined the uncertainty before prompting AI, formed a hypothesis, ran experiments by iterating with AI, evaluated results including failures, and advanced knowledge&#8212;not just delivered working code. This means developers need to write down what they don&#8217;t know, why they chose a particular approach, what prompts and iterations they tried, what didn&#8217;t work, and what was learned. Even informal notes in Jira tickets, Slack threads, or PR descriptions can be crucial. AI logs can help, too, since they show the cycle of hypothesis and evaluation. Without this, the tax authorities will see AI-assisted coding as routine implementation.</p><p>The key principle all major tax consultancies agree on is that the developer must be the dominant actor. If a developer uses AI as a tool to test hypotheses and systematically investigate a problem&#8212;documenting that process&#8212;they can claim credits. But if they just ask AI to build a feature, tweak the output, and ship without documenting uncertainty or experimentation, that&#8217;s routine work, and no credit. The code alone won&#8217;t save you. Auditors want to see that the human drove the research, not just the AI.</p><p>I&#8217;m not a tax attorney, but I&#8217;ve worked on many R&amp;D claims and seen how easily companies trip up. You need to have this conversation with your SR&amp;ED or R&amp;D credit advisor. Ask how AI use affects your eligibility, how to track developer time on qualifying work, and how to document that investigation effectively.</p><p>Here&#8217;s what you should do now: talk to your tax consultant about AI in your engineering process if you haven&#8217;t already. Make sure your developers document their systematic investigation as it happens, not after the fact. Know how much of their time qualifies for R&amp;D credits and whether AI has shifted that balance. Think about the last sprint: if someone prompted an AI tool, iterated until it worked, and shipped, can they prove that was research or just routine implementation? If not, your credit is at risk.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/189995399">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[What to Measure When the CEO Asks for Engineering Metrics]]></title><description><![CDATA[How to make sure you are measuring the right things]]></description><link>https://www.thepragmaticcto.com/p/what-to-measure-when-the-ceo-asks</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/what-to-measure-when-the-ceo-asks</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 02 Mar 2026 15:45:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/17988f15-a35a-4d03-a4fa-a0520478d967_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every engineering leader gets this email. <em>"The board wants engineering metrics for next quarter's deck; can you put something together?"</em> Twelve words that launch a thousand bad dashboards.</p><p>We have all been there. The instinct is to grab whatever is closest---DORA metrics, sprint velocity, maybe a cycle time chart, or even worse <a href="https://www.thepragmaticcto.com/p/lines-of-code-are-back-and-its-worse">LoC (lines of code)</a>---and arrange them on a slide that looks like you've been tracking this all along. The board nods. The CEO nods. Then someone asks a follow-up question, and you spend the next six months defending numbers you picked in an afternoon.</p><p>The problem isn't that you chose the wrong metrics(unless you picked LoC). The problem is that <strong>"give me some metrics"</strong> is the wrong ask, the wrong conversation and will eventually lead to disaster. If we are to take the question "How do you measure engineering performance?" seriously, then we need to understand that it's four different questions masking as one, and most CTOs answer whichever one they find easiest rather than the one being asked.</p><p>Will Larson put it plainly in <a href="https://lethain.com/measuring-engineering-organizations/">*The Engineering Executive's Primer*</a>: "There is no one solution to engineering measurement, rather there are many modes of engineering measurement, each of which is appropriate for a given scenario" Four modes. Four questions. Getting the right answer starts with figuring out which one you're being asked.</p><h2>One Question, Four Answers</h2><p>Larson's framework splits engineering measurement into four categories, each answering a different question. They aren't interchangeable. Picking the wrong one for your audience is worse than picking no metrics at all.</p><ul><li><p><strong>Measure to Plan.</strong> Are we working on the right things? Track shipped projects by team and their impact on the business. This is the language boards speak natively---investment and return, allocation and outcome. If you can show that 60% of engineering effort went to features that moved revenue and 15% went to infrastructure that prevented last quarter's outage from recurring, you've answered the planning question. Most boards don't need more than this.</p></li><li><p><strong>Measure to Operate.</strong> Is the system healthy right now? Incidents, downtime, latency, engineering costs normalized against business metrics. Operations metrics answer a question that sounds mundane but matters more than anything on your roadmap: should you be following your plan or swarming to fix a critical problem? A CEO who sees three major incidents in a quarter understands why the feature roadmap slipped; a CEO who sees a missed roadmap with no context assumes engineering is slow.</p></li><li><p><strong>Measure to Optimize.</strong> Are we getting faster or slower? This is DORA's domain---deployment frequency, lead time, change failure rate, recovery time---and <a href="https://queue.acm.org/detail.cfm?id=3454124">SPACE</a> territory too (satisfaction, performance, activity, communication, efficiency). Both are diagnostic: useful for your engineering leads diagnosing bottlenecks, useless in front of your board. The problem is translation. A non-engineer who hears "our deployment frequency increased 40%" assumes that means more value; bridging that gap requires technical context a quarterly meeting doesn't provide.</p></li><li><p><strong>Measure to Inspire.</strong> What's the story of engineering's impact? Most CTOs skip this category---which in my opinion is a mistake, because inspiration metrics are the narratives that change how the organization thinks about engineering. Not dashboards. Stories: the migration that cut infrastructure costs 40%, the platform rebuild that compressed a six-week feature into two days, the reliability work that turned a churning enterprise customer into a case study. When the board hears those, engineering stops looking like a cost center and starts looking like the reason the company can do things competitors can't.</p></li></ul><p>Now, if you are paying attention so far, you might noticed that I mentioned to mention two things, metrics are not interchangeable and each category has an audience.</p><p>Most CTOs set themselves up for failure by selecting the wrong category for the wrong audience. What your board wants, vs what your engineering leads need, vs what your company needs are all different. Inspiration metrics are the hardest to build and the easiest to skip; they're also what gets you headcount next year.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G-B1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G-B1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G-B1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G-B1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G-B1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G-B1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg" width="728" height="209.90666666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:346,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:39585,&quot;alt&quot;:&quot;Four Modes of Engineering Measurement&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Four Modes of Engineering Measurement" title="Four Modes of Engineering Measurement" srcset="https://substackcdn.com/image/fetch/$s_!G-B1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G-B1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G-B1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G-B1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75b11639-e6c2-4ec3-85f9-9b043bbdee27_1200x346.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>Five Ways to Destroy Trust With Your Dashboard</h2><p>Knowing what to measure is half the problem. The other half is knowing how measurement goes wrong---and it believe me it will go wrong in pretty predictable ways. Here are some:</p><p><strong>1. Goodhart's Law, now with infinite leverage.</strong> "When a measure becomes a target, it ceases to be a good measure." Charles Goodhart wrote that in 1975; the software industry has spent fifty years proving him right. Story point inflation. Deployment frequency gaming. Bug counts manipulated by closing duplicates. Every metric that touches a performance review gets optimized for the metric, not the outcome (developers are smart and they will find a way to game the system to their advantage).</p><p>AI has the potential to make this worse. When generating code costs nothing, <a href="https://lethain.com/measuring-engineering-organizations/">code volume metrics become meaningless</a> and a developer can mass-produce pull requests, PR count stops correlating with value delivered. Goodhart's Law had a natural ceiling when humans were the bottleneck; remove the bottleneck and the gaming potential is unlimited.</p><p><strong>2.Measuring individuals instead of teams.</strong> Dan North put it precisely: <a href="https://dannorth.net/blog/mckinsey-review/">"Attempting to measure the individual contribution of a person is like trying to measure the individual contribution of a piston in an engine---the question itself makes no sense."</a> Software is a team activity. The developer who mentors three juniors ships less code and creates more value than the one who heads-down grinds features. Individual metrics can't capture that; they punish it (and if you are a CTO, you are the one who is responsible for the team's success).</p><p>McKinsey learned this the hard way. They tried to measure <a href="https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/yes-you-can-measure-software-developer-productivity">"individual developer productivity"</a> in 2023 and the response was brutal---Kent Beck, Gergely Orosz, and Dan North all piled on. Beck's line was the sharpest: measuring developer productivity by coding time is <a href="https://tidyfirst.substack.com/p/measuring-developer-productivity-440">"like measuring surgeon productivity on what percentage of their time they were cutting with a scalpel---and ignoring whether the patient got better."</a> It became the most popular Pragmatic Engineer article of 2024. Individual contribution analysis doesn't just fail as a metric; it poisons the team. You get adversarial dynamics, eroded trust, and people optimizing for the wrong things.</p><p><strong>3. The measurement loop.</strong> Stakeholders keep asking for more metrics---different cuts, new dashboards, one more data point---and nothing you build satisfies them. I've been in this loop. It's not a metrics problem. It's a trust deficit wearing a metrics costume. No dashboard fixes a broken relationship between engineering and the business; if you're caught in this cycle, put the dashboard down and have the hard conversation about what's actually wrong. <a href="https://lethain.com/measuring-engineering-organizations/">Larson says it plainly</a>: the loop is a signal, not something you solve with more data.</p><p><strong>4. Optimization metrics in the wrong room.</strong> Cycle time, deployment frequency, change failure rate---these are diagnostic tools for engineering leaders, not performance indicators for the board. Put them in front of non-technical stakeholders and they get misread; a higher deployment frequency sounds good, but it says nothing about whether you shipped the right things. Worse, the board starts setting targets. "Can we get deployment frequency to daily?" Now you're optimizing for the metric instead of the outcome. Larson is blunt about this: CEO and board get planning and operations metrics. Full stop.</p><p><strong>5. Perfection paralysis.</strong> The opposite failure mode. Some CTOs refuse to measure anything until they have the perfect framework, the perfect instrumentation, the perfect dashboard. They read about DORA, SPACE, DX Core 4, DevEx; they evaluate engineering intelligence platforms from LinearB, Jellyfish, Swarmia, Cortex; they attend conferences and take notes. And they measure nothing while they decide.</p><p>My advice? Start with something imperfect. Larson's sequencing advice? measure easy things first to build trust with stakeholders, even if the data isn't precise. Only take on one new measurement task at a time.</p><h2>The Ghosts in the Machine</h2><p>As if things weren't complicated enough, the AI era has added a new layer of complexity. Having metrics that measure what you think they measure was already hard, now AI is making it even harder.</p><p>The <a href="https://dora.dev/research/2025/dora-report/">2025 DORA report</a> found that a 25% increase in AI adoption correlates with a 1.5% drop in delivery throughput and a 7.2% drop in delivery stability.</p><p>The individual-level numbers tell a different story. <a href="https://www.cortex.io/post/ai-is-making-engineering-faster-but-not-better-state-of-ai-benchmark-2026">Cortex's 2026 benchmark</a> shows PRs per author up 20% year-over-year. Sounds like progress. But incidents per PR increased 23.5%; change failure rates climbed roughly 30%. More output, more breakage. The <a href="https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025">Faros AI report</a> shows the same pattern at larger scale: tasks completed up 21%, PRs merged up 98%, but code review time increased 91% and PR size grew 154%. In a way AI is starting to clog the pipeline; and bring down overall quality.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hhXd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hhXd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hhXd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hhXd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hhXd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hhXd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg" width="728" height="382.2" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:70795,&quot;alt&quot;:&quot;The AI Metrics Paradox&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The AI Metrics Paradox" title="The AI Metrics Paradox" srcset="https://substackcdn.com/image/fetch/$s_!hhXd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hhXd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hhXd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hhXd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F930b35a5-cb23-47f1-9090-34414fe0a5a2_1200x630.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every traditional metric is now suspect. Deployment frequency goes up because AI generates more deployable units; <a href="https://plandek.com/blog/how-to-measure-dora-metrics-in-the-age-of-ai-2026/">Plandek's right</a> that "more deployments aren't always a sign of progress." Lead time shrinks---but only for the coding phase. Review, testing, approval? At best same as before, at worst taking much longer. <strong>Change failure rate looks flat until you remember that volume is up; a flat rate on higher volume means more absolute failures.</strong> Recovery time is the ugliest one: developers stall because they're debugging code they didn't write and don't fully understand.</p><p>DORA added a fifth metric in 2025. <a href="https://dora.dev/research/2025/dora-report/">Rework rate</a>---unplanned follow-up deployments caused by production issues. It exists because the original four metrics miss something important: the cost of fixing what you just shipped. You can have perfect deployment frequency and still be drowning in rework.</p><p>Does this mean that metrics are done, the answer is no. But you need to read your metrics with more skepticism now, which means pairing every speed metric with a quality metric, breaking lead time down by stage rather than treating it as a single number, and watching rework rate as your earliest warning signal. A dashboard that only shows throughput is measuring the accelerator without looking at the road.</p><h2>Where to Start</h2><p>Frameworks can be useful, here is a starting point; adapt it for your context, your stage, your stakeholders.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P2ut!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P2ut!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 424w, https://substackcdn.com/image/fetch/$s_!P2ut!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 848w, https://substackcdn.com/image/fetch/$s_!P2ut!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!P2ut!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P2ut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg" width="728" height="319.1066666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:526,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:43047,&quot;alt&quot;:&quot;The Minimum Viable Engineering Dashboard&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Minimum Viable Engineering Dashboard" title="The Minimum Viable Engineering Dashboard" srcset="https://substackcdn.com/image/fetch/$s_!P2ut!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 424w, https://substackcdn.com/image/fetch/$s_!P2ut!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 848w, https://substackcdn.com/image/fetch/$s_!P2ut!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!P2ut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ba42ec7-4056-42a5-980e-6cd07bf0a1ce_1200x526.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Delivery predictability.</strong> Did we ship what we said we would? This is the metric that builds or destroys credibility with the board. Not "how much did we ship" but "did we hit our commitments?" Track it as a percentage; trend it over quarters. When the number drops, come prepared with a root cause and a plan.</p></li><li><p><strong>System reliability.</strong> Incidents, uptime, recovery time. Boards understand reliability intuitively---the system works, or it doesn't. Pair incident count with recovery time; a team that has five incidents but recovers in minutes is in better shape than one that has two incidents and takes days to resolve them.</p></li><li><p><strong>Investment allocation.</strong> Where did engineering effort go? New features, maintenance, unplanned work, technical debt---this is how the board decides whether engineering is pulling in the right direction. <a href="https://www.swarmia.com/blog/engineering-metrics-for-leaders/">Swarmia</a> benchmarks it (roughly 60% new features, 15% productivity improvements, 10% keeping the lights on), but your numbers will look different and should; the point isn't hitting their targets, it's knowing your own and explaining the reasoning behind them.</p></li><li><p><strong>Team health.</strong> Attrition, hiring pipeline, engagement scores. The leading indicator that nobody reports until it's too late. A team losing senior engineers will show up in your delivery metrics six months from now; by then the damage is done. Report this proactively.</p></li></ul><p>Three principles hold no matter which category. Only <a href="https://leaddev.com/reporting-metrics/five-engineering-kpis-consider-your-next-board-meeting">report metrics you're already tracking</a>---the moment you build a separate collection for the board, you're maintaining two systems and trusting neither. Show trends, not snapshots; one quarter is noise, four quarters is signal. And never show a speed metric alone. Deployment frequency without change failure rate beside it is a lie of omission; cycle time without reliability is the same trick. If you are not already doing this, you are setting yourself up for failure.</p><p><a href="https://www.swarmia.com/blog/engineering-metrics-for-leaders/">Swarmia's advice</a> captures the right mindset: think of metrics like a thermometer. They're the outcome of good practices, not a target to chase.</p><div><hr></div><p>As a recap, here are some questions to ask yourself:</p><ul><li><p>When the CEO asks for engineering metrics, which of the four categories are they asking about---and are you answering that question or the one you're most comfortable with?</p></li><li><p>How many of your current metrics would survive Goodhart's Law? If your team optimized for nothing but hitting those numbers, would the outcomes improve or decay?</p></li><li><p>What story is your dashboard telling? Is it the story your engineering team would tell, or a performance your engineering team has learned to put on?</p></li><li><p>If you stripped away every metric that measures activity rather than outcomes, what would be left?</p></li></ul><p>Peter Drucker <a href="https://medium.com/centre-for-public-impact/what-gets-measured-gets-managed-its-wrong-and-drucker-never-said-it-fe95886d3df6">never said</a> "what gets measured gets managed." What he said was closer to the opposite: "Because knowledge work cannot be measured the way manual work can, one cannot tell a knowledge worker in a few simple words whether he is doing the right job and how well he is doing it." <a href="https://www.infoq.com/news/2009/08/demarco-software-engineering-/">Tom DeMarco</a>, who famously wrote "you can't control what you can't measure," retracted it in 2009: "My answers are no, no, and no."</p><p>Measurement isn't the goal. Understanding is. The metrics are supposed to help you make better decisions about your teams, your systems, and your strategy. If they're not doing that---if they're creating theater instead of insight---the problem isn't that you need better metrics. The problem is that you've confused the dashboard for the thing it's supposed to represent.</p>]]></content:encoded></item><item><title><![CDATA[Audio: What to Measure When the CEO Asks for Engineering Metrics]]></title><description><![CDATA[When the CEO asks for engineering metrics, the first mistake most CTOs make is thinking it&#8217;s a single question with a simple answer.]]></description><link>https://www.thepragmaticcto.com/p/audio-what-to-measure-when-the-ceo</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-what-to-measure-when-the-ceo</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 02 Mar 2026 15:40:20 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/189252365/4b370953efbc650ede5cb424cc291e1b.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>When the CEO asks for engineering metrics, the first mistake most CTOs make is thinking it&#8217;s a single question with a simple answer. It&#8217;s not. It&#8217;s four very different questions wrapped into one, and answering the wrong one wastes time and erodes trust.</p><p>Will Larson&#8217;s framework breaks engineering measurement into four categories. First, measure to plan: are we working on the right things? Show the board how engineering time maps to business impact&#8212;features that move revenue, infrastructure that prevents outages. That&#8217;s what the CEO and board really want to know. Second, measure to operate: is the system healthy? Incidents, downtime, latency, cost ratios&#8212;these explain why a roadmap might slip and help prioritize firefighting over feature work. Third, measure to optimize: are we getting faster? DORA metrics like deployment frequency and lead time live here, but these are for engineering leadership, not the board. Without technical context, they&#8217;re meaningless to most non-engineers. Fourth, measure to inspire: what&#8217;s the story of engineering&#8217;s impact? This is where you share narratives that turn engineering from a cost center into a strategic advantage&#8212;how a platform rebuild cut feature delivery from six weeks to two days, for example. It&#8217;s the hardest category to build and the easiest to skip, but it&#8217;s what wins you headcount and support.</p><p>And it gets worse. Even if you pick the right category, dashboards often destroy trust in predictable ways. Goodhart&#8217;s Law warns us that when a metric becomes a target, it stops being a good measure. Developers are smart; they&#8217;ll game any metric tied to performance reviews. AI only makes this worse&#8212;code volume becomes meaningless when AI can churn out pull requests in bulk, inflating output without adding value.</p><p>Another trap is measuring individuals instead of teams. Software is a team sport. Trying to isolate individual contributions is like measuring a piston&#8217;s output in an engine&#8212;it makes no sense and poisons team dynamics. McKinsey&#8217;s disastrous attempt to measure individual developer productivity proved this painfully clear.</p><p>There&#8217;s also the measurement loop: stakeholders ask for more dashboards, more metrics, and nothing satisfies them. This is not a metrics problem; it&#8217;s a trust problem disguised as data. No dashboard fixes a broken relationship between engineering and the business.</p><p>Plus, optimization metrics like cycle time or deployment frequency belong in engineering leadership meetings, not in front of the board. Presenting them to CEOs without context leads to misinterpretation and dangerous targets that drive the wrong behaviors. CEOs and boards want planning and operations metrics, period.</p><p>Finally, perfection paralysis kills progress. Some CTOs wait for the perfect framework, the perfect tools, and never start measuring at all. Start with what you have, measure the easy stuff first to build trust, then iterate.</p><p>Now AI adds a new layer of complexity. According to the 2025 DORA report, increased AI adoption correlates with a drop in delivery throughput and stability. Cortex&#8217;s 2026 data shows PRs per author up 20%, but incidents per PR up 23.5%, and change failure rates up 30%. AI speeds up coding but clogs the pipeline with more broken code that takes longer to review and fix. Deployment frequency alone no longer signals progress; you have to pair speed with quality metrics like rework rate, which DORA added in 2025 to capture unplanned follow-up work caused by production issues.</p><p>So where do you start? Focus on a minimum viable dashboard: delivery predictability&#8212;did you ship what you promised? System reliability&#8212;incidents and recovery time. Investment allocation&#8212;where engineering effort went. And team health&#8212;attrition, hiring, engagement. Report metrics you&#8217;re already tracking, show trends not snapshots, and never show speed metrics alone. Deployment frequency without change failure rate is a lie by omission.</p><p>Metrics are like a thermometer. They reflect the health of your engineering practices but aren&#8217;t goals themselves. If your dashboard creates theater instead of insight, you&#8217;re confusing the map for the territory.</p><p>When the CEO asks for engineering metrics, ask yourself: which of the four categories are they really after? Would your current metrics survive scrutiny under Goodhart&#8217;s Law? What story is your dashboard telling&#8212;one your engineers would own, or one they&#8217;ve learned to perform? If you stripped away activity metrics and kept only outcome metrics, what remains?</p><p>Peter Drucker didn&#8217;t say &#8220;what gets measured gets managed.&#8221; He said knowledge work can&#8217;t be measured like manual work. Tom DeMarco, who famously claimed you can&#8217;t control what you can&#8217;t measure, later retracted that. Measurement isn&#8217;t the goal&#8212;understanding is. Use metrics to make better decisions, not to create theater.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/189249108">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[Audio: The AI-First Fallacy]]></title><description><![CDATA[Rebranding around AI can boost your stock price and attract funding, but that&#8217;s not the same as having a strategy that creates real value.]]></description><link>https://www.thepragmaticcto.com/p/audio-the-ai-first-fallacy</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-the-ai-first-fallacy</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 23 Feb 2026 15:15:24 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187585634/50ca4b7ad92114631e2fb3f5436ffdcb.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Rebranding around AI can boost your stock price and attract funding, but that&#8217;s not the same as having a strategy that creates real value. The AI-first label is often branding masquerading as strategy, and it&#8217;s setting companies up for failure.</p><p>Look at the numbers: since ChatGPT launched, mentions of AI on earnings calls rose sixty-fold in a year, and companies calling out AI saw their stock jump an average of 4.6%, almost double those that didn&#8217;t. But this bump comes from talking about AI, not from AI delivering measurable results. Venture capital funding for AI startups exploded, yet 78% of these startups are just API wrappers on the same foundation models, with no real differentiation. Regulators are now fining companies for &#8220;AI washing&#8221;&#8212;making misleading claims about AI capabilities. Meanwhile, layoffs attributed to AI are often just a cover story to spin bad business news as positive transformation.</p><p>Strip away the hype, and the reality is stark. Studies show 95% of companies see no measurable return on AI investments, and nearly half abandon their AI projects. Most AI startups generate no revenue and have customer churn twice the SaaS average. McKinsey&#8217;s 2025 report found that while almost everyone uses AI, only 39% see any financial impact, and just a third are scaling AI programs. The gap between saying you&#8217;re AI-first and actually benefiting from AI isn&#8217;t a gap&#8212;it&#8217;s a chasm.</p><p>There&#8217;s a predictable five-step pattern when companies declare AI-first: first, a bold AI mandate; then backlash from employees and customers; followed by quality issues and rising costs; a public walk-back; and finally, the AI-first narrative quietly disappears. Klarna claimed AI was replacing hundreds of agents, only to rehire humans after quality dropped and costs rose. Duolingo&#8217;s CEO insisted small quality hits were acceptable, but engagement and stock price plummeted, forcing a reversal. Amazon announced AI-driven layoffs, then backtracked amid employee pushback. This pattern repeats because AI-first as an identity invites scrutiny and internal resistance&#8212;31% of workers sabotage AI rollouts, and some even falsify performance data.</p><p>To cut through the noise, I use a simple taxonomy. AI-native companies build products that cannot exist without AI&#8212;TikTok&#8217;s recommendation engine or Midjourney&#8217;s image generation. AI-enhanced companies improve existing products with AI features&#8212;like Salesforce adding AI to CRM or banks using AI for fraud detection. AI-washing is just slapping AI branding on a product with minimal integration&#8212;exactly what most AI startups do. Klarna, Duolingo, and Shopify are AI-enhanced, not AI-native, despite calling themselves AI-first. Ask yourself: if you removed AI, would your product still work? If yes, you&#8217;re AI-enhanced. If no, you might be AI-native. If you can&#8217;t tell, you&#8217;re probably AI-washing&#8212;and that&#8217;s risky.</p><p>The problem with AI-first identity worsens as AI commoditizes. When a $6 million open-source Chinese model can rival U.S. tech giants, and the companies spending billions on AI infrastructure see their stock prices fall, the models themselves are no longer a moat. OpenAI calls itself a product company, not a model company, signaling the shift. The winner won&#8217;t be the one who built the best model, but the one who attracts and retains customers. Value will come from domain expertise, proprietary data, workflow integration, and user experience&#8212;not the AI model itself. If your identity is tied to a commodity, you have no moat.</p><p>This isn&#8217;t a reason to dismiss AI. Real AI-native companies exist and thrive. The technology is transformative for specific use cases like recommendations, fraud detection, or drug discovery. The key is precision: define what AI solves for your business and measure it. The companies succeeding with AI redesign workflows and set growth objectives, not just cost-cutting. Most failed AI projects stem from poor data and bolting AI onto old processes. Gartner placed generative AI in the trough of disillusionment in 2025. The hype is cooling, and companies with real integration&#8212;not just buzzwords&#8212;will emerge stronger.</p><p>If your board asks &#8220;are we AI-first?&#8221; don&#8217;t answer with buzzwords. Give them data quality status, specific AI use cases, measurable outcomes, and a clear roadmap. Fix your data first. Redesign workflows, don&#8217;t just add AI features. Build domain advantages, not model dependencies. Set growth goals, not just layoffs and cost cuts. Replace &#8220;AI-first&#8221; with &#8220;AI-specific&#8221; and be honest about what AI actually delivers.</p><p>Ask yourself: if you stripped AI from your product, what would be left? When the models become commodities, will your company have a moat beyond the label? Because just like Long Island Iced Tea didn&#8217;t become a blockchain company by changing its name, you don&#8217;t become an AI company by declaring yourself AI-first. You become one by solving problems AI is uniquely suited to solve&#8212;and being honest about the ones it can&#8217;t.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/187585506">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[The AI-First Fallacy]]></title><description><![CDATA[When Branding Masquerades as Strategy]]></description><link>https://www.thepragmaticcto.com/p/the-ai-first-fallacy</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/the-ai-first-fallacy</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 23 Feb 2026 14:15:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/61c4b4d8-0769-432e-97cb-9846e6e08cdd_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In December 2017, a beverage company called Long Island Iced Tea Corp did something remarkable. It renamed itself <a href="https://www.cnbc.com/2017/12/21/long-island-iced-tea-micro-cap-adds-blockchain-to-name-and-stock-soars.html">Long Blockchain Corp</a>. The stock surged 380% overnight. Trading volume spiked 1,000%. The company had zero blockchain technology, zero blockchain products, and zero blockchain revenue. The SEC subpoenaed documents; three individuals were <a href="https://www.cnn.com/2021/07/10/investing/blockchain-long-island-insider-trading">charged with insider trading</a>; the stock was delisted from NASDAQ within four months.</p><p>This should have been a cautionary tale. Instead, it was a preview.</p><p>Technology branding follows a predictable cycle, and we've watched it loop for over a decade. Satya Nadella took over Microsoft in 2014 with "mobile-first, cloud-first" as his rallying cry; <a href="https://www.ciodive.com/news/microsoft-shifts-from-mobile-first-cloud-first-to-everything-ai-2/448645/">within four years</a>, the slogan had quietly shifted to "intelligent cloud," and by 2024, it was "everything AI." Same company, same playbook, new buzzword. In 2017-2018, <a href="https://www.cbinsights.com/research/blockchain-hype-stock-trends/">Riot Blockchain</a> -- formerly Bioptix, a biotech diagnostics company -- pivoted to blockchain and watched its stock spike before crashing. Every major retailer in the 2010s declared itself "digital-first"; most bolted an e-commerce site onto existing operations and called it transformation. The companies that won -- Amazon, Shopify -- were digital-native from the start. They didn't need the label.</p><p>The buzzword captures something real about a technological shift. Then the buzzword gets weaponized as marketing before the technology matures; companies rebrand around it, stock prices move, consultants publish frameworks, and the SEC eventually gets involved. The buzzword fades. A new one takes its place.</p><p>"AI-first" is the current buzzword. The playbook hasn't changed.</p><h2>The Earnings Call Effect</h2><p>The financial incentive to declare yourself "AI-first" is measurable -- and it has almost nothing to do with whether AI creates value for your business.</p><p>Since ChatGPT launched in November 2022, AI mentions on earnings calls went from roughly <a href="https://fortune.com/2024/01/22/over-30000-mentions-ai-earnings-calls-2023-c-suite-leaders-massive-technology-shift/">500 per quarter to over 30,000</a> by the end of 2023. A sixty-fold increase in twelve months. Companies that mentioned AI on earnings calls saw an average stock price increase of <a href="https://cepr.org/voxeu/columns/what-corporate-earnings-calls-reveal-about-ai-stock-rally">4.6%, compared to 2.4%</a> for those that didn't; among tech companies specifically, <a href="https://www.wallstreetzen.com/blog/ai-mention-moves-stock-prices-2023/">71% that mentioned AI saw their stock rise</a>, with an average gain of 11.9%. Roughly one-third of stock gains for "AI-exposed" firms were <a href="https://cepr.org/voxeu/columns/what-corporate-earnings-calls-reveal-about-ai-stock-rally">attributable to their GenAI discussions alone</a> -- not to any measurable AI output, but to the act of talking about it.</p><p>The money followed the narrative. Global AI venture capital hit <a href="https://news.crunchbase.com/ai/global-vc-funding-2025-annual-data/">$202.3 billion in 2025</a>, up 75% year-over-year. AI captured 53% of all global VC funding; in the U.S., that number was 64%. But <a href="https://medium.com/@neumannfelix/most-ai-startups-are-just-wrappers-that-wont-exist-in-a-couple-of-years-74d5dec95f00">78% of AI startups launched in 2024 are API wrappers</a> -- over 12,000 companies building on the same foundation models, differentiated primarily by their landing pages.</p><p>Regulators noticed the gap between claims and reality. In March 2024, the SEC charged <a href="https://www.sec.gov/news/press-release/2024-36">Delphia and Global Predictions</a> with making false and misleading statements about their use of AI -- the first-ever <strong>"AI washing"</strong> enforcement actions. By August 2025, the FTC had launched <a href="https://www.ftc.gov/news-events/news/press-releases/2025/08/ftc-sues-stop-air-ai-using-deceptive-claims-about-business-growth-earnings-potential-refund">"Operation AI Comply,"</a> suing Air AI Technologies for claiming its product could "fully replace human sales representatives" when the technology couldn't perform basic functions like placing outbound calls.</p><p>And then there's the layoff theater. Oxford Economics <a href="https://fortune.com/2026/01/07/ai-layoffs-convenient-corporate-fiction-true-false-oxford-economics-productivity/">published an analysis</a> in January 2026 that cut through the noise: AI-attributed job cuts accounted for just 4.5% of total reported layoffs, while standard "market and economic conditions" cuts were four times larger. Their conclusion was blunt: "We suspect some firms are trying to dress up layoffs as a good news story rather than bad news." Attributing cuts to AI "conveys a more positive message to investors" than admitting to business failures.</p><p>The incentives are clear. Mention AI; stock goes up. Declare "AI-first"; funding flows in. Attribute layoffs to AI; investors applaud your efficiency. None of this requires AI to produce a single dollar of value.</p><h2>The Data Beneath the Branding</h2><p>Strip away the branding and look at what "AI-first" companies are producing. <strong>The reality doesn't match the narrative.</strong></p><p>A <a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/">2025 MIT study</a> found that 95% of businesses are seeing zero measurable return on their AI investments. <a href="https://www.ciodive.com/news/AI-project-fail-data-SPGlobal/742590/">S&amp;P Global</a> reported that 42% of companies are abandoning most of their AI initiatives -- up from 17% the year before. <a href="https://www.nttdata.com/global/en/insights/focus/2024/between-70-85p-of-genai-deployment-efforts-are-failing">NTT DATA</a> puts GenAI deployment failure rates at 70-85%. The numbers are consistent across every major analyst; the only thing that varies is how bad the picture looks.</p><p>The AI startup landscape is worse. Of those 12,000+ wrapper startups, <a href="https://medium.com/@neumannfelix/most-ai-startups-are-just-wrappers-that-wont-exist-in-a-couple-of-years-74d5dec95f00">60-70% generate zero revenue</a>. Only 3-5% surpass $10,000 in monthly revenue. The churn rate is staggering: 65% of AI wrapper customers leave within 90 days -- <a href="https://ai4sp.org/why-90-of-ai-startups-fail/">nearly double</a> the SaaS industry average of 35%. Roughly 90% of AI startups fail within their first year, compared to ~70% for traditional tech firms. These aren't companies building defensible technology; they're companies wrapping an API and hoping the branding holds.</p><p><a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">McKinsey's 2025 State of AI report</a> captures the gap between perception and performance most precisely. Eighty-eight percent of respondents report regular AI use; 72% have adopted GenAI in at least one function. Sounds like a revolution. But only 39% report any enterprise-level EBIT impact, and only one-third have begun to scale their AI programs. Almost everyone is "using AI." Almost nobody is seeing financial results from it.</p><p>The gap between declaring AI identity and achieving AI results isn't a gap. It's a massive chasm.</p><h2>The Five-Step Pattern</h2><p>There's a pattern to how "AI-first" declarations play out. It's predictable enough to map; it's consistent enough to name.</p><p><strong>Step one:</strong> CEO announces an aggressive AI mandate -- public memo, earnings call, or press interview. <strong>Step two:</strong> backlash follows -- employees resist, customers boycott, investors scrutinize. <strong>Step three:</strong> reality emerges -- quality drops, costs rise, customers leave. <strong>Step four:</strong> walk-back or reversal. <strong>Step five:</strong> the narrative quietly shifts, and what was "AI-first" becomes something softer or disappears from the talking points entirely.</p><p>The arc plays out the same way across industries, company sizes, and geographies.</p><p><strong>Klarna</strong> was the poster child. In 2024, CEO Sebastian Siemiatkowski <a href="https://www.entrepreneur.com/business-news/klarna-ceo-reverses-course-by-hiring-more-humans-not-ai/491396">bragged openly</a> that AI was "doing the work of 700 full-time agents." The narrative drove their IPO filing. By May 2025, Siemiatkowski was <a href="https://www.bloomberg.com/news/articles/2025-05-08/klarna-turns-from-ai-to-real-person-customer-service">telling Bloomberg</a> the opposite: "Cost unfortunately seems to have been a too predominant evaluation factor... what you end up having is lower quality." Klarna began rehiring human agents. Customer service costs rose to $50 million in Q3 -- up from $42 million -- despite the company's claimed $60 million in AI savings.</p><p><strong>Duolingo</strong> turned messaging into self-inflicted damage. CEO Luis von Ahn <a href="https://fortune.com/2025/08/18/duolingo-ceo-admits-controversial-ai-memo-did-not-give-enough-context-insists-company-never-laid-off-full-time-employees/">posted a memo</a> in April 2025 declaring Duolingo "AI-first" and warning that "small hits on quality are an acceptable price to pay." DAU growth dropped from 56% in February to 37% by June. The stock ended 2025 down 45.9%. The company earned an <a href="https://museumoffailure.com/exhibition/duolingo-ai-failure">exhibit in the Museum of Failure</a>. The walk-back came months later: "I did not give enough context." The sharpest irony -- Duolingo never laid off a single full-time employee. The damage was almost entirely self-inflicted through branding.</p><p><strong>Amazon</strong> showed the pattern at scale. In June 2025, Andy Jassy <a href="https://www.washingtonpost.com/technology/2025/06/17/amazon-jobs-ai-workforce-reduction/">told employees</a> that AI would "reduce our total corporate workforce." The response on internal Slack was immediate and overwhelmingly hostile. By October, Amazon announced 14,000 layoffs citing AI; Jassy then walked it back, calling them <a href="https://fortune.com/2025/11/01/ceo-andy-jassy-amazon-layoffs-about-culture-not-ai/">"about culture, not AI."</a> By December, <a href="https://fortune.com/2025/12/02/amazon-employees-open-letter-warning-companys-ai-damage-democracy-jobs-earth/">over 1,000 employees</a> had signed an open letter warning that the company's "all-costs-justified, warp-speed approach to AI development will do staggering damage."</p><p>Three companies; three industries; the same five steps. The pattern repeats because "AI-first" as an organizational identity is fragile. It invites scrutiny from every direction -- employees who fear replacement, customers who notice quality drops, investors who eventually demand proof. And the internal resistance is measurable: <a href="https://writer.com/blog/enterprise-ai-adoption-survey-press-release/">31% of workers</a> report actively sabotaging their company's AI rollout, jumping to <a href="https://www.cio.com/article/4022953/31-of-employees-are-sabotaging-your-gen-ai-strategy.html">41% among millennials and Gen Z</a>. One in ten admit to tampering with performance metrics to make AI appear to underperform.</p><p>The prediction market has already priced in the reversals. <a href="https://www.gartner.com/en/newsroom/press-releases/2026-02-03-gartner-predicts-half-of-companies-that-cut-customer-service-staff-due-to-ai-will-rehire-by-2027">Gartner expects</a> that by 2027, half of companies that cut customer service staff due to AI will rehire them -- under different job titles. <a href="https://www.theregister.com/2025/10/29/forrester_ai_rehiring/">Forrester predicts</a> half of all AI-attributed layoffs will be reversed by end of 2026. If "AI-first" were a sound strategy, the companies declaring it wouldn't keep reversing course.</p><h2>A Taxonomy That Matters</h2><p>"AI-first" tells you nothing. It's a branding label, not a strategy description. A three-part taxonomy is more useful for evaluating companies, strategies, and your own roadmap.</p><p><strong>AI-native:</strong> the product cannot exist without AI. The AI isn't a feature bolted on later; it's the foundation the entire product grows from. TikTok's recommendation engine is the product -- content discovery powered by AI is the entire value proposition. Midjourney is image generation; remove the AI and nothing remains. <a href="https://blog.superhuman.com/ai-native/">Superhuman</a> built email around AI from day one -- Split Inbox, AI writing, intelligent sorting are the core experience, not add-ons.</p><p>The defining characteristic of genuinely AI-native companies is that they don't need to call themselves "AI-first." Nobody describes TikTok as an "AI-first company"; they describe it as a video platform. The AI is invisible infrastructure. When the label is self-evident, you don't need the marketing.</p><p><strong>AI-enhanced:</strong> AI makes an existing product better, but the product works without it. This is the majority of successful AI deployment, and there is nothing wrong with it. Salesforce adding AI features to CRM; banks using AI for fraud detection; logistics companies optimizing delivery routes. The value proposition exists independent of AI; AI accelerates, improves, or extends it.</p><p><strong>AI-washing:</strong> a marketing label applied to the same product with an API call bolted on. No meaningful integration; no proprietary data advantage; no workflow redesign. A GPT wrapper, a chatbot skin, or a buzzword added to product descriptions. This is where the 78% of wrapper startups live, and it's where most self-declared "AI-first" companies land.</p><p>Now apply the taxonomy to the companies from the previous section. Klarna is AI-enhanced -- customer service existed long before AI; AI was an efficiency layer. Duolingo is AI-enhanced -- language learning worked before AI; AI accelerated content production. Shopify is AI-enhanced -- the e-commerce platform existed for over a decade before any AI features shipped. All three declared "AI-first." None of them are. The taxonomy exposes the gap between branding and operational reality.</p><p>Here is a simple question -- but one worth taking to your next strategy meeting: if you removed the AI from your product, would the product still work?</p><p>If yes, you're AI-enhanced. That's a perfectly valid strategy. Build from there.</p><p>If no, you might be genuinely AI-native. Build your moat accordingly -- in proprietary data, domain expertise, and workflow integration, not in which model you call.</p><p>If you're not sure, you might be AI-washing. <strong>That's the dangerous position.</strong></p><h2>The Commoditization Test</h2><p>"AI-first" as identity has a deeper problem than inaccuracy. It becomes meaningless when the AI layer commoditizes. And the evidence suggests that process is already well underway.</p><p>In January 2025, Chinese startup DeepSeek released a reasoning model <a href="https://www.brookings.edu/articles/deepseek-ai-big-tech-competition/">nearly equivalent to the best U.S. models</a> at a fraction of the cost. Open-source. Claimed training cost of roughly $6 million. The market reaction was immediate: Nvidia lost <a href="https://www.cnbc.com/2025/01/27/nvidia-falls-10percent-in-premarket-trading-as-chinas-deepseek-triggers-global-tech-sell-off.html">$588.8 billion in market value</a> in a single day -- the largest single-day loss any stock has ever recorded. The core investor fear wasn't about DeepSeek specifically; it was about what DeepSeek implied. If a Chinese startup can build competitive AI for $6 million, why are U.S. tech companies spending hundreds of billions on infrastructure that a fraction of the cost can replicate?</p><p>OpenAI itself signaled the shift. The company has positioned itself as "not a model company; it's a product company that happens to have fantastic models." <strong>When the company building the models tells you the models aren't the differentiator, listen.</strong> Andrew Chen at a16z <a href="https://andrewchen.substack.com/p/revenge-of-the-gpt-wrappers-defensibility">made the same observation</a>: the axis of competition is shifting from "can you build it?" to "will consumers come? And will they stick?" It's the same transition that defined Web 2.0; the technology becomes table stakes, and the winners differentiate on everything else.</p><p>The infrastructure math doesn't close, either. <a href="https://sequoiacap.com/article/ais-600b-question/">Sequoia Capital calculated</a> that AI infrastructure spending would need to generate $600 billion in annual revenue to justify current CapEx levels. The gap between investment and revenue "continues to loom large." In January 2026, Microsoft reported record revenue and beat analyst estimates -- then disclosed <a href="https://fortune.com/2026/01/28/microsoft-stock-drops-azure-growth-slows-capex-spending-q2/">$37.5 billion in quarterly CapEx</a> for AI data centers. The stock dropped 10.5%, erasing approximately $375 billion in market capitalization. As Morningstar analysts put it: "The era of rewarding 'AI potential' has ended, and a new, more demanding era of 'AI proof' has begun."</p><p>If your identity is "AI-first" and the AI layer commoditizes -- when every competitor has access to equivalent models at equivalent cost -- what's left? The answer isn't AI. It's everything around AI: domain expertise, proprietary data, workflow integration, distribution, user experience. The companies that will win are building moats in those layers. The companies declaring "AI-first" are defining themselves by the commodity.</p><h2>Where This Breaks Down</h2><p>The taxonomy isn't a reason to dismiss AI. It's a reason to be precise about what you're building and why.</p><p>Genuinely AI-native companies exist, and they're defensible. TikTok, Midjourney, vertical SaaS products that reimagine entire workflows around AI capabilities -- these started from different questions and imagined solutions that only make sense because AI exists. They don't need the "AI-first" label because their products are self-evidently built on AI. The distinction matters.</p><p>The technology itself is transformative for specific, well-defined use cases: recommendation engines, fraud detection, drug discovery, content generation, code assistance. These are real capabilities producing real value; dismissing them would be as foolish as the hype. The critique isn't "AI doesn't work." It's that declaring "AI-first" tells you nothing about whether AI works for your specific context, your specific problems, or your specific customers. Companies seeing the most value from AI <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">set growth and innovation objectives</a> beyond cost-cutting; they redesign workflows rather than bolting AI onto existing processes. <a href="https://www.informatica.com/blogs/the-surprising-reason-most-ai-projects-fail-and-how-to-avoid-it-at-your-enterprise.html">Purchasing from specialized vendors</a> succeeds 67% of the time, compared to roughly 22% for internal builds. The path to AI value is specific, targeted, and unglamorous. It's the opposite of a branding exercise.</p><p>Gartner placed GenAI in the <a href="https://www.gartner.com/en/articles/hype-cycle-for-artificial-intelligence">Trough of Disillusionment</a> in 2025. This isn't the end of AI; it's the correction. Technologies that survive the trough emerge with realistic expectations and genuine adoption patterns. The companies that come out the other side will be the ones that invested in real integration -- not the ones that invested in the label.</p><h2>What to Do Instead</h2><p>If you're a CTO fielding "are we AI-first?" from your board, you're not alone, and the pressure is real. Board oversight disclosure on AI <a href="https://corpgov.law.harvard.edu/2025/04/02/ai-in-focus-in-2025-boards-and-shareholders-set-their-sights-on-ai/">increased 84% year over year</a> -- 150% since 2022. Shareholder proposals focused on AI quadrupled in 2024 versus 2023. But 66% of board directors report "limited to no knowledge or experience" with AI, and fewer than 25% of companies have board-approved AI policies. The dynamic is dangerous: AI-illiterate boards demanding transformation they don't understand, driven by investor anxiety they can't evaluate.</p><p><a href="https://www.pwc.com/gx/en/issues/c-suite-insights/ceo-survey.html">PwC reports</a> that 42% of CEOs believe their company won't be viable beyond the next decade on its current path. That existential anxiety creates enormous pressure to show AI transformation -- even when the transformation is theater. The board doesn't want theater. They want answers they can defend to shareholders. Give them precision instead of buzzwords.</p><ul><li><p><strong>Fix the data first.</strong> <a href="https://www.informatica.com/blogs/the-surprising-reason-most-ai-projects-fail-and-how-to-avoid-it-at-your-enterprise.html">Forty-three percent</a> of organizations cite data quality as their top AI obstacle; <a href="https://www.gartner.com/en/articles/hype-cycle-for-artificial-intelligence">57% admit</a> their data isn't ready for AI. No amount of "AI-first" branding fixes bad data infrastructure. This is the boring, unglamorous work that makes AI deployments succeed or fail, and it belongs in your board deck before any AI initiative does.</p></li><li><p><strong>Redesign workflows; don't bolt AI onto existing processes.</strong> <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">McKinsey's key finding</a> across companies seeing genuine AI value: they redesigned how work gets done, not just what tools people use. The board deck should show workflow transformations with measurable outcomes, not tool purchases.</p></li><li><p><strong>Build domain advantages, not model dependencies.</strong> Value lives in proprietary data, domain expertise, and <a href="https://www.vendep.com/post/forget-the-data-moat-the-workflow-is-your-fortress-in-vertical-saas">workflow integration</a>. When the model layer commoditizes -- and it will -- these are what remain. Your moat is never the API you call.</p></li><li><p><strong>Set growth objectives, not just efficiency targets.</strong> Companies setting growth and innovation goals beyond cost-cutting see the most AI value. "AI-first" memos are almost always about cutting costs. That's the wrong optimization target, and it's one that invites the five-step pattern of backlash, reversal, and narrative shift.</p></li><li><p><strong>Answer the board with precision, not buzzwords.</strong> Replace "we're AI-first" with specifics: "We're deploying AI against these three problems, with these KPIs, and here's what we've learned so far." Use the taxonomy: "We're AI-enhanced in customer service, AI-native in our recommendation engine, and evaluating AI for supply chain optimization. We're not AI-first -- we're AI-specific." That answer gives the board something defensible. "AI-first" gives them a press release.</p></li></ul><div><hr></div><h2>Questions for CTOs</h2><ul><li><p>If you stripped the AI from your product, what would be left? Is that enough?</p></li><li><p>When your board asks "are we AI-first?" -- what are they asking, exactly? And are you answering the question they mean, or the one they said?</p></li><li><p>Can you name three specific problems your AI initiatives are solving -- with KPIs attached? If not, you might be declaring an identity rather than executing a strategy.</p></li></ul><p>In eighteen months, when the models are commoditized and every competitor has access to the same capabilities, what's your moat? If the answer is "we're AI-first," you don't have one.</p><p>Long Island Iced Tea didn't become a blockchain company by changing its name. <strong>Your company doesn't become an AI company by declaring itself "AI-first."</strong> It becomes an AI company by solving problems that AI is uniquely suited to solve -- and being honest about the ones where it isn't.</p>]]></content:encoded></item><item><title><![CDATA[Audio: No Vibes Allowed: Context Engineering for Real Codebases]]></title><description><![CDATA[If you believe AI coding tools are speeding up your teams but delivery metrics don&#8217;t show it, you&#8217;re not imagining things.]]></description><link>https://www.thepragmaticcto.com/p/audio-no-vibes-allowed-context-engineering</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-no-vibes-allowed-context-engineering</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Fri, 20 Feb 2026 12:05:18 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187277612/ce2c653ebf2b41c4b6db1912bedc1619.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>If you believe AI coding tools are speeding up your teams but delivery metrics don&#8217;t show it, you&#8217;re not imagining things. A rigorous trial with experienced open-source developers found that AI assistance actually slowed them down by 19%, even though they felt 20% faster&#8212;a 40-point perception gap. This wasn&#8217;t novice error; these were skilled devs on familiar codebases using mainstream AI tools. The disconnect between perception and reality is real, and it&#8217;s backed by solid data.</p><p>Stanford&#8217;s extensive study confirms AI coding tools boost productivity on simple, new projects by up to 40%, but that gain halves or disappears as task complexity and codebase size grow. For hard tasks in mature systems, AI helps little or even hurts, mainly because fixing AI-introduced bugs eats into any speed gains. The bigger the codebase, the worse the AI performs. Context window limits and intricate dependencies overwhelm current models, turning AI from helper to liability on your toughest problems.</p><p>And it gets worse. AI-assisted commits are changing code quality in troubling ways. GitClear&#8217;s analysis reveals copy-pasted code is on the rise, refactoring is tanking, and code churn is doubling. AI models optimize for local correctness&#8212;code compiles and tests pass&#8212;but global architecture coherence degrades. CodeRabbit&#8217;s study of pull requests shows AI coauthored code has nearly twice as many major issues, security vulnerabilities up to double, and readability problems tripled compared to human-only work. Developers know this firsthand: trust in AI accuracy dropped from 40% to 29%, and most say they spend more time fixing AI&#8217;s &#8220;almost right&#8221; code than they save. The &#8220;slop factory&#8221; churns on&#8212;ship fast, fix later, repeat&#8212;with questionable net velocity and clear quality decline.</p><p>The industry divides into three camps. Camp 1 says AI is fundamentally incapable of handling complex systems; the evidence supports this. Camp 2 hopes smarter future models will fix these problems, so companies wait passively for advances. Camp 3, however, argues the bottleneck isn&#8217;t the AI model itself but how we feed it information&#8212;context engineering. With the right workflow, today&#8217;s models can handle large codebases effectively. This is where new breakthroughs are happening.</p><p>Dex Horthy from HumanLayer nails the core constraint: context window physics. AI models have a cliff effect&#8212;once you fill beyond about 40% of the context window, accuracy plummets. Just dumping more code into the prompt makes things worse, not better. His solution is &#8220;frequent intentional compaction&#8221;&#8212;deliberately compressing, validating, and reloading context throughout the development process to keep the AI&#8217;s input clean and focused. The damage hierarchy is critical: incorrect context poisons everything downstream, missing info leads to guesswork, and noise wastes tokens but is least harmful. The formula is simple: prioritize correctness first, completeness second, compactness third, and minimize noise.</p><p>Applying this means three phases: Research&#8212;map the architecture and relevant files with fresh context windows and human review; Plan&#8212;craft a precise implementation strategy with clear file edits and tests, keeping context load moderate and reviewed by domain experts; Implement&#8212;execute the plan with minimal overhead, verifying continuously and compressing status back into context. The insight is counterintuitive: most time should go into research and planning, not code writing. Research yields tenfold return, planning fivefold, implementation just onefold. Humans add the most value by reviewing research and plans, not raw code. Flawed assumptions early on multiply downstream mistakes. As Horthy says, &#8220;Do not outsource the thinking.&#8221;</p><p>This approach delivers results. Horthy, an amateur Rust dev new to a 300K-line codebase, produced a one-shot PR approved by the project CTO. Another time, he and a collaborator implemented 35,000 lines of WebAssembly support in seven hours&#8212;a task estimated at days per engineer. But it&#8217;s not magic. They failed to remove Hadoop dependencies from Parquet Java because that required deep architectural understanding that can&#8217;t be compressed into context windows. Context engineering works spectacularly for decomposable problems, but not for holistic architectural redesigns. Knowing that boundary is crucial.</p><p>Context engineering is gaining traction as a discipline. Martin Fowler defines it as curating what the model sees to improve outcomes&#8212;not just prompt phrasing but workflow engineering. Spotify and others have published enterprise-scale approaches. The CLAUDE.md ecosystem exemplifies this: persistent markdown files encoding build commands, coding conventions, architecture decisions, and lazy-loaded skills guide AI tasks. But as Fowler cautions, certainty is impossible with LLMs; you must think probabilistically. Horthy warns against buzzword dilution&#8212;if your vendor can&#8217;t explain the damage hierarchy, they&#8217;re not truly doing context engineering.</p><p>Here&#8217;s the 90/10 rule for CTOs: For roughly 90% of AI coding&#8212;simple tasks, greenfield work, small fixes&#8212;AI tools yield real 15&#8211;40% gains with minimal workflow change. But for the critical 10%&#8212;complex tasks in large codebases that determine stability, security, and maintainability&#8212;AI without context engineering is neutral or worse. The mistake is expecting the same AI workflow to handle both. Discipline in context engineering bridges that gap.</p><p>Open questions remain. Can mid-level engineers learn this discipline? Does it scale from solo experts to teams? What if you lack a domain expert? Cultural leadership is key; tool adoption alone won&#8217;t cut it. Meanwhile, senior engineers see the tradeoffs clearly, while juniors produce AI-assisted code that increases technical debt. Context engineering might be the bridge, but it&#8217;s unproven at scale.</p><p>I&#8217;m running experiments applying context engineering to measure where AI helps and where it creates rework, by task and codebase area. The data matches the 90/10 pattern. Routine work sees gains; complex integration demands the full research-plan-implement rigor to avoid net negative outcomes. This is a bet on discipline over tooling. The developers who master context engineering won&#8217;t just be faster; they&#8217;ll do the work AI can&#8217;t do alone. Maybe future models will make this irrelevant, but waiting risks falling behind. The skills&#8212;research rigor, structured planning, domain expertise&#8212;are valuable no matter what.</p><p>So ask yourself: When your team uses AI on complex work, are they investing in research and planning or just generating code faster? Do you measure AI-induced rework? Who on your team is developing context engineering skills&#8212;or are you waiting for smarter models? Context engineering makes explicit the bottleneck that&#8217;s always been there: understanding the problem well enough to write the right code. Without it, you&#8217;re just generating slop faster.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/187277480">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[No Vibes Allowed: Context Engineering for Real Codebases]]></title><description><![CDATA[Context engineering as discipline]]></description><link>https://www.thepragmaticcto.com/p/no-vibes-allowed-context-engineering</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/no-vibes-allowed-context-engineering</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Fri, 20 Feb 2026 12:01:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4f508097-adef-41ce-befe-a29ec251118a_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A randomized controlled trial of 16 experienced open-source developers working 246 real-world tasks found that developers using AI coding tools took <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">19% longer to complete their work</a>. But they believed they were 20% faster. <strong>A 40-percentage-point perception gap</strong>; the developers weren't just wrong about the magnitude of the improvement, they had the direction backwards.</p><p>These weren't beginners fumbling with a new tool. They averaged five years of experience on the specific codebases where they were tested. They used Cursor Pro and Claude 3.5/3.7 Sonnet --- mainstream tools, not fringe experiments. The methodology was rigorous: randomized, controlled, pre-registered. And the result was unambiguous.</p><p>If you're a CTO and your teams report that AI tools are "helpful" while your delivery metrics stay flat, you're not imagining things. The data confirms the disconnect.</p><p><a href="https://softwareengineeringproductivity.stanford.edu/ai-impact">Stanford's three-year study across 600+ companies and 100,000+ developers</a> fills in the rest of the picture. AI coding tools increase productivity 15--20% on average --- but that average obscures massive variation. Simple tasks on new projects see 30--40% gains. Simple tasks in existing codebases see 15--20%. Hard tasks in mature codebases? Zero to 10% gains, sometimes negative. As Stanford's researchers noted, "a significant portion of that gain is lost fixing the bugs and mess the AI made."</p><p>The degradation scales with complexity. As codebase size increases from 10K to 10M lines of code, AI's productivity contribution <a href="https://www.marvinzhang.dev/blog/ai-productivity">drops sharply</a>. Context window performance degrades from roughly 90% accuracy at 1K tokens to around 50% at 32K tokens. Signal-to-noise ratio collapses; dependencies and domain-specific logic grow more intricate than the model can reason about unaided.</p><p>The pattern is clear: AI coding tools work well on small, isolated problems. They struggle --- and sometimes actively hurt --- on the large, interconnected codebases where your hardest engineering problems live. The question is whether that gap is permanent or whether something can be done about it.</p><h2>The Slop Factory</h2><p>The speed problem is bad enough. The quality problem is worse.</p><p><a href="https://www.gitclear.com/ai_assistant_code_quality_2025_research">GitClear analyzed 211 million lines of code across 2020--2024</a> and found that AI-assisted development is fundamentally changing what gets committed. <strong>Copy-pasted code rose from 8.3% to 12.3%.</strong> Duplicated code blocks of five or more lines increased eightfold in 2024. Refactoring collapsed --- from 25% of all changes in 2021 to less than 10% in 2024, a 60% decline. Code churn doubled; new code revised within two weeks grew from 3.1% to 5.7%. For the first time in GitClear's measurement history, copy-pasted lines exceeded moved or refactored lines.</p><p>LLMs prioritize <a href="https://www.sonarsource.com/blog/the-inevitable-rise-of-poor-code-quality-in-ai-accelerated-codebases/">local functional correctness over global architectural coherence</a>. The code compiles. The tests pass. But the system accumulates entropy --- duplicated logic, ignored abstractions, brittle coupling --- that compounds with every AI-assisted commit.</p><p><a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report">CodeRabbit's analysis of 470 real-world pull requests</a> quantified the damage. AI-coauthored PRs averaged 10.83 issues versus 6.45 for human-only PRs. 1.7x more major issues; 1.4x more critical issues. Logic errors up 75%. Security vulnerabilities up 1.5--2x. Readability issues up 3x. Performance bugs up 8x.</p><p>Developers know this. The <a href="https://stackoverflow.blog/2025/12/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/">Stack Overflow 2025 survey</a> found that trust in AI accuracy fell from 40% to 29% year over year. Sixty-six percent say they spend more time fixing "almost-right" AI code than they save. More developers actively distrust AI (46%) than trust it (33%).</p><p>Dex Horthy, founder of HumanLayer, named the dynamic concisely: "A lot of the extra code shipped by AI tools ends up just reworking the slop that was shipped last week."</p><p>The slop factory. Ship fast on Monday; fix what you shipped on Friday. Net velocity gain: debatable. Net quality impact: measurable and negative.</p><p>This is not an anti-AI argument. The productivity gains on simple tasks are real; the Stanford data confirms that. But when AI coding tools are deployed without discipline on complex codebases, the quality evidence is damning. And quality problems compound in ways that speed gains do not.</p><h2>Three Camps</h2><p>The industry has sorted itself into three responses to this data.</p>
      <p>
          <a href="https://www.thepragmaticcto.com/p/no-vibes-allowed-context-engineering">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Audio: The Software Factory: When No Human Writes or Reviews the Code]]></title><description><![CDATA[StrongDM&#8217;s Software Factory throws down a radical challenge: no human writes code, no human reviews it, and you better be spending at least a thousand dollars a day in tokens per engineer to keep up.]]></description><link>https://www.thepragmaticcto.com/p/audio-the-software-factory-when-no</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-the-software-factory-when-no</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Wed, 18 Feb 2026 14:01:52 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187277606/93b3f6965eefb1559cfc5915e158801e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>StrongDM&#8217;s Software Factory throws down a radical challenge: no human writes code, no human reviews it, and you better be spending at least a thousand dollars a day in tokens per engineer to keep up. They&#8217;ve skipped every safety net most of us rely on and gone all-in on agentic AI development. This is either the future of software or the blueprint for a disaster waiting to happen.</p><p>But before we get skeptical, credit where it&#8217;s due: the engineering behind this is impressive. They don&#8217;t just throw AI at the problem; they build structured, spec-driven workflows. The cleverest idea is using &#8220;scenarios&#8221; as holdout sets &#8212; user stories stored outside the codebase that AI agents can&#8217;t see, preventing them from gaming their own tests. It&#8217;s a principle borrowed from machine learning, where you never train on your test set. Then there&#8217;s their Digital Twin Universe &#8212; full behavioral clones of third-party services like Okta and Slack, running thousands of tests at scale without API costs or rate limits. This isn&#8217;t casual; it&#8217;s a methodical, iterative approach to growing correctness, not just generating code once and shipping.</p><p>But here&#8217;s the rub: the numbers don&#8217;t support skipping human review. CodeRabbit&#8217;s December 2025 report analyzed hundreds of real-world pull requests and found AI-generated code had 1.4 times more critical issues and 1.7 times more major issues than human code. Security vulnerabilities doubled, readability issues tripled, and performance problems were eight times more frequent. Veracode and FormAI studies confirm half or more of AI-generated code samples have security flaws. Now imagine this in StrongDM&#8217;s context &#8212; software controlling enterprise access. Trusting AI alone on security-critical code is a gamble with catastrophic downside.</p><p>And it gets worse. Real-world failures have already happened with some human oversight, like Replit&#8217;s AI agent wiping a live production database during a code freeze, or Moltbook leaking 1.5 million API keys because AI-generated schemas lacked essential security settings. StrongDM&#8217;s model removes human review entirely &#8212; no code writers, no reviewers &#8212; so the guardrails that failed with humans won&#8217;t exist at all. When no one understands the code, who investigates the failures? Incident response and compliance become nightmares if the audit trail is just AI conversations.</p><p>StrongDM&#8217;s answer to verification is the holdout sets, but who writes those? If humans do, you haven&#8217;t eliminated human review &#8212; you&#8217;ve just moved it upstream. If AI writes the scenarios too, you&#8217;ve just pushed the problem higher, with agents verifying agents verifying agents. Software edge cases are unbounded; you can&#8217;t test what you haven&#8217;t imagined. That missing checkbox in Moltbook&#8217;s breach is a perfect example. The most brittle part breaks the system, and in security software, that brittle part is the attacker's first target.</p><p>The economics add another layer of complexity. Spending $1,000 per day per engineer on tokens means $240,000 a year just on AI usage &#8212; more than the median software engineer salary. StrongDM builds high-priced enterprise security software, so maybe it makes sense there. But for most startups or broader software development, the cost is prohibitive. Plus, if AI can build your product from specs, it can build your competitor&#8217;s too. Your moat shifts from code to your scenario library, which is just documentation and far easier to copy.</p><p>There is a middle ground. Sam Schillace, Microsoft&#8217;s Deputy CTO and creator of Google Docs, lays out &#8220;Coding Laws for LLMs&#8221; that are pro-AI but insist on human oversight. His key point: don&#8217;t write code if AI can do it, but always keep human validation checkpoints. Treat models as tools, not autonomous agents. StrongDM&#8217;s rules directly contradict these principles. Given the data and incidents, the evidence supports keeping humans in the loop for now.</p><p>What about the engineers? If no human writes or reviews code, what do they do? The optimistic spin is they become supervisors and architects, focusing on high-level design and domain expertise. The harsher truth is you&#8217;re shifting from coding to prompt engineering and scenario design &#8212; valuable but fewer roles overall. More critically, when no one writes or reviews code, the team loses shared understanding and the mental model of the system decays. Maintenance, debugging, and evolution get harder, not easier.</p><p>The real test is happening now: StrongDM is being acquired by Delinea, a major identity security player. Will they keep the &#8220;no human review&#8221; approach for security-critical products once compliance and risk are on the table? Or will human oversight return? That answer will tell us more than any manifesto about whether the dark factory model is viable or just an experiment.</p><p>As for me, I&#8217;m not embracing the dark factory. The data doesn&#8217;t justify removing human review, especially in security-sensitive contexts. But I&#8217;m borrowing ideas: keeping verification scenarios outside the codebase is smart, and smaller-scale digital twins or mocks for integration testing are worth exploring. I&#8217;m watching the trajectory carefully but won&#8217;t abandon human judgment until the numbers say it&#8217;s safe.</p><p>The Software Factory isn&#8217;t about vision or ambition alone &#8212; it&#8217;s about evidence. Their holdout set concept is worth adopting. Their engineering deserves respect. Their philosophy is provocative but premature. The real question for every CTO is: what defect rate would make you comfortable trusting AI without human review? Are we there yet? For now, the answer is no.</p><p>You can read the full article &#8212; with all the data and sources &#8212; on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/187277478">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[The Software Factory: When No Human Writes or Reviews the Code]]></title><description><![CDATA[StrongDM's radical experiment with AI-generated code]]></description><link>https://www.thepragmaticcto.com/p/the-software-factory-when-no-human</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/the-software-factory-when-no-human</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Wed, 18 Feb 2026 12:02:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b726f485-883c-4d25-87b9-3e236c95bcdf_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>StrongDM's <a href="https://factory.strongdm.ai/">Software Factory</a> has three cardinal rules. Rule one: code must not be written by humans. Rule two: code must not be reviewed by humans. Rule three: if you haven't spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement.</p><p>Three rules. No hedging, no qualifiers, no "except when."</p><p>The guiding mantra for every engineer on the team is a single question: "Why am I doing this?" The implication is clear; the model should be doing it instead. Every line of code a human writes is, in their framing, a failure of imagination --- a task that should have been delegated to an agent.</p><p>Simon Willison <a href="https://simonwillison.net/2026/Feb/7/software-factory/">published his analysis</a> of the approach today, calling it "the most ambitious form of AI-assisted software development I've seen yet." He frames it as Level 5 on a spectrum from "spicy autocomplete" to what StrongDM calls the Dark Factory --- fully agentic development where humans don't write code and don't review it. Most of us are somewhere around Levels 2 and 3; StrongDM skipped straight to the end of the spectrum.</p><p>That alone would be worth discussing. But context matters.</p><p>StrongDM builds security and access management software --- permission management across Okta, Jira, Slack, and Google services. They're <a href="https://www.globenewswire.com/news-release/2026/01/15/3219527/0/en/Delinea-and-StrongDM-to-Unite-to-Redefine-Identity-Security-for-the-Agentic-AI-Era.html">being acquired by Delinea</a>, an identity security company, with the deal expected to close Q1 2026. No human writes the code that controls access to enterprise systems. No human reviews it. This is either the most visionary approach to software development anyone has shipped, or <strong>the setup for a catastrophe that writes its own case study.</strong> The data should tell us which.</p><h2>The Engineering Is Disciplined</h2><p>Before the skepticism, StrongDM deserves credit for what they've built. This is not vibe coding. The engineering is structured, specification-driven, and contains ideas that deserve serious analysis --- regardless of whether you buy the philosophy.</p><p>The strongest idea is scenarios as holdout sets. The problem is well-known: when agents write both code and tests, they game the tests. An agent can trivially write `assert true` and declare victory. StrongDM's solution replaces traditional tests with "scenarios" --- end-to-end user stories stored outside the codebase, invisible to the code-generating agents. The analogy comes from machine learning; you never train on your test set because it corrupts evaluation. <a href="https://factory.strongdm.ai/">StrongDM applies the same principle to software verification</a>. The agents can't see the scenarios, so they can't game them. The satisfaction metric shifts from boolean --- did all tests pass? --- to probabilistic: what fraction of observed trajectories through all scenarios likely satisfy the user?</p><p>That's a genuinely smart framing. It addresses the most obvious objection to AI-generated testing in a way that borrows from a discipline with decades of rigor behind it. If you've worked with ML pipelines, you recognize the logic immediately; the principle is sound even if you question the scope of its application.</p><p>The <a href="https://factory.strongdm.ai/">Digital Twin Universe</a> is equally impressive. StrongDM built behavioral clones of third-party services --- Okta, Jira, Slack, Google Docs, Google Drive, Google Sheets --- as self-contained Go binaries that replicate APIs, edge cases, and observable behaviors. They run thousands of scenarios hourly; they test at volumes exceeding production limits; they simulate dangerous failure modes impossible against live services. No rate limits. No API costs. Building full SaaS replicas was always theoretically possible but economically unfeasible; agentic development reverses the cost equation.</p><p>The team calls this "grown software" --- code that <a href="https://factory.strongdm.ai/">compounds correctness through iteration</a> rather than degrading over time. Not generated once and shipped; grown through cycles of agent-driven refinement against scenario validation. The Software Factory was founded July 14, 2025 by Jay Taylor, Navan Chauhan, and Justin McCarthy, StrongDM's CTO and co-founder. The catalyst, according to them, was Claude Sonnet 3.5's October 2024 revision, which enabled "long-horizon agentic coding workflows" that compound correctness rather than error. Subsequent models --- Opus 4.5, GPT 5.2 --- increased reliability further; the trajectory gave them confidence to go all-in.</p><p>It matters that Willison is the one taking this seriously. He's been one of the most rigorous and careful observers of AI-assisted development for years. His assessment: this is <a href="https://simonwillison.net/2026/Feb/7/software-factory/">structured, spec-driven agentic development</a>, not reckless experimentation. He remains most interested in "enabling agents to prove code works without human line-by-line review." Coming from Willison, that's not hype. It's a signal worth tracking.</p><p>The holdout-set concept is worth stealing. The DTU is worth studying. The engineering behind the Software Factory is disciplined enough that dismissing it outright would be intellectually lazy.</p><p>The philosophy is a different question.</p><h2>The Numbers Don't Support It</h2><p>The quality data on AI-generated code is unambiguous, and it runs directly counter to "no human review."</p><p><a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report">CodeRabbit's "State of AI vs Human Code Generation" report</a>, published December 2025, analyzed 470 real-world open source pull requests --- 320 AI-coauthored, 150 human-only. AI-authored PRs contained <a href="https://www.businesswire.com/news/home/20251217666881/en/CodeRabbits-State-of-AI-vs-Human-Code-Generation-Report-Finds-That-AI-Written-Code-Produces-1.7x-More-Issues-Than-Human-Code">1.4x more critical issues and 1.7x more major issues</a> than human-written PRs. The averages: 10.83 issues per AI PR versus 6.45 for human PRs. Logic and correctness issues --- business logic errors, misconfigurations, unsafe control flow --- rose 75%. Security vulnerabilities increased 1.5--2x. Code readability problems jumped more than 3x. Performance inefficiencies appeared nearly 8x more often in AI-generated code.</p><p>Those numbers deserve a second read. Not 10% worse. Not marginally worse. Measurably, significantly worse across every dimension that matters for production software --- logic, security, readability, performance. The study looked at real-world pull requests in open-source projects; these aren't synthetic benchmarks or contrived examples.</p><p>The security dimension is particularly damning. The <a href="https://www.accorian.com/security-impact-of-vibe-coding-deep-dive-part-1-of-2/">Veracode 2025 report</a> found that 45% of AI-generated code contains security vulnerabilities, with XSS errors appearing in 86% of AI-generated cases and SQL injection in 20% of generated code samples. The <a href="https://www.netcorpsoftwaredevelopment.com/blog/ai-generated-code-statistics">FormAI study</a> analyzed 112,000 C programs generated by ChatGPT; 51.24% contained at least one security vulnerability.</p><p>Now apply that to StrongDM's context. They build access management software --- the software that determines who can access what across your enterprise systems. Applying "no human review" to security-critical software means trusting AI agents to get security right, when every major study shows AI code has 1.5--2x more security vulnerabilities than human-written code. StrongDM's holdout scenarios may catch some of this. But scenarios are only as comprehensive as the person --- or agent --- that writes them.</p><p>The failure mode here isn't a broken feature. It's a security breach.</p><h2>When the Dark Factory Has a Dark Day</h2><p>The failure cases are not hypothetical. They've already happened --- at companies with more human oversight than StrongDM proposes.</p><p>In July 2025, <a href="https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/">a Replit AI agent deleted a live production database</a> during an active code freeze. It wiped data for over 1,200 executives and 1,190 companies. The agent <a href="https://www.theregister.com/2025/07/21/replit_saastr_vibe_coding_incident/">admitted to running unauthorized commands</a>, panicked in response to empty queries, and violated explicit instructions not to proceed without human approval. A code freeze, explicit guardrails, human involvement in the process --- and the agent still destroyed a production database.</p><p>In January 2026, <a href="https://www.isyncevolution.com/blog/ai-code-slop-crisis-vibe-coding-security-risks">Moltbook launched a platform</a> on the 28th. By the 31st --- three days later --- it had leaked over 1.5 million API keys and exposed countless user databases. It was called the first "Mass AI Breach" in tech history. The root cause was straightforward: AI agents generated functional database schemas but never enabled Row Level Security. No human ever reviewed the critical configuration. The post-mortem was blunt: "mistakes that any experienced engineer would have caught."</p><p>Both of these incidents happened with some level of human involvement in the development process. Replit had a code freeze and explicit guardrails; Moltbook had human developers in the loop. StrongDM's philosophy explicitly removes that involvement. The guardrails that failed in these cases wouldn't exist at all in the dark factory model.</p><p>The accountability question is worth sitting with. <strong>When nobody wrote the code and nobody reviewed it, who reconstructs the failure?</strong> Incident response assumes someone understands what the code does and why decisions were made. In a dark factory, the audit trail is a conversation between LLMs. In regulated industries --- finance, healthcare, government --- this isn't a philosophical objection. <strong>It's a compliance non-starter.</strong></p><p>Moltbook's failure is the one that should keep dark factory advocates up at night. It wasn't a bug in existing logic; it wasn't a regression introduced by a bad commit. It was a missing configuration --- something that nobody, human or AI, thought to include. Row Level Security is a checkbox. A single setting. <strong>And its absence exposed 1.5 million API keys in three days.</strong> The DTU may catch known failure modes through scenarios. But what about the edge cases that aren't in any scenario? What about the omissions that nobody anticipated?</p><h2>Who Watches the Watchmen?</h2><p>StrongDM's answer to the verification problem is the holdout-set concept, and it's clever. The code-writing agents can't see the validation scenarios; they can't game them. This addresses the most obvious objection --- that AI writing its own tests is circular --- in a way that's intellectually satisfying.</p><p>But the analogy breaks down at the boundary.</p><p>Who writes the scenarios? If humans write them, human involvement hasn't been eliminated; it's been relocated upstream from code review to scenario design. The human review still exists --- it just moved. If agents write the scenarios too, you've pushed the <em>quis custodiet</em> problem one level higher. Now agents verify agents that verify agents. The regression doesn't resolve; it recedes.</p><p>Holdout sets in machine learning work because the data distribution is knowable and the test set can be representative of the population. Software edge cases are unbounded. You can't enumerate what you haven't imagined. Moltbook's failure was exactly this type: not a flaw in the logic that was written, but a missing configuration that neither human nor AI thought to include in any scenario. The holdout set can only catch failures it was designed to detect; the catastrophic failures are the ones nobody anticipated.</p><p><a href="https://sundaylettersfromsam.substack.com/p/coding-laws-for-llms">Schillace's fourth law</a> names this precisely: "The system will be as brittle as its most brittle part." Even if 99% of the pipeline is agentic and robust, the 1% that's wrong propagates through everything. In security software, the most brittle part is the one an attacker finds first.</p><p>StrongDM hasn't published defect rates, security vulnerability metrics, or production incident data. The Software Factory was built by a three-person founding team --- not yet proven at organizational scale. The DTU covers specific third-party services --- Okta, Jira, Slack, Google --- but what about novel integrations or unanticipated service behavior?</p><p>"Deliberate naivete" is a feature when you're challenging inherited assumptions. It becomes negligence when you're building software that controls enterprise access and the data says AI code has 1.5--2x more security vulnerabilities than human-written code.</p><h2>The Economics Question</h2><p>Even if the approach works flawlessly, the economics constrain who can use it.</p><p>One thousand dollars per day per engineer. That's <a href="https://simonwillison.net/2026/Feb/7/software-factory/">$20,000 per month, $240,000 per year</a> --- in token costs alone. On top of salary, benefits, and equipment. The fully loaded cost per engineer in a dark factory model runs $400,000--$600,000 or more annually; the token spend alone exceeds the median US software engineer salary. At what product price point does that make economic sense?</p><p>Willison asked the right question: <a href="https://simonwillison.net/2026/Feb/7/software-factory/">"Does profitability require products expensive enough to justify this overhead?"</a> StrongDM builds enterprise security software --- high price point, low volume. The economics may work there. But the Software Factory is presented as a general methodology, not a niche approach for expensive enterprise products. Can a 20-person startup afford $240,000 per year per engineer in tokens? If not, this is an approach for well-funded companies building expensive products --- not the future of software development broadly.</p><p>The competitive moat problem is the second-order concern. If agents can build your product from specs and scenarios, they can build your competitor's product too. The defensibility shifts from code to specifications and domain knowledge. But specifications are easier to reverse-engineer than implementations. <a href="https://simonwillison.net/2026/Feb/7/software-factory/">Willison flagged this explicitly</a>: the feature cloning risk is real when your competitive advantage is no longer in the code itself. Your moat dissolves into your scenario library --- and scenario libraries are documentation, not defensible intellectual property.</p><h2>The Moderate Position</h2><p>There's an alternative framework for thinking about AI in development, and it comes from someone who can't be dismissed as a Luddite.</p><p>Sam Schillace --- Microsoft's Deputy CTO, creator of Google Docs --- published <a href="https://sundaylettersfromsam.substack.com/p/coding-laws-for-llms">"Coding Laws for LLMs,"</a> a set of nine principles that are both pro-AI and pro-human-oversight. His first law: "Don't write code if the model can do it." But the model should do it under supervision, not autonomously. His second law: "Trade leverage for precision; use interaction to mitigate." Human validation checkpoints are essential, not optional. His sixth law: "Uncertainty is an exception throw" --- when models lack confidence, human intervention is necessary.</p><p>The key line: "Good design of code involving LLMs takes this into account and allows for human interaction."</p><p>Schillace advocates treating models as tools, not autonomous agents. This is the mainstream position for engineering organizations operating at scale: use AI aggressively, keep humans in the loop. He's not anti-AI --- he ran Google Docs; he's Microsoft's Deputy CTO; he has as much incentive as anyone to believe in the transformative power of AI-assisted development. But his framework explicitly requires human interaction points, human uncertainty handling, and human awareness of system brittleness. The distinction is between delegation and abdication.</p><p>StrongDM's three cardinal rules explicitly forbid what Schillace's laws explicitly require. These are two different bets on where AI code quality is right now. The CodeRabbit data, the Veracode findings, the FormAI study, the Replit incident, the Moltbook breach --- <strong>the evidence favors the bet that still includes human review.</strong></p><h2>The Workforce Problem</h2><p>If no human writes or reviews code, what do engineers do? The answer reveals whether this is a genuine evolution of the profession or a rationalization for reducing headcount.</p><p>The <a href="https://www.webpronews.com/the-software-factory-has-arrived-how-ai-is-rewriting-the-rules-of-code-production-and-what-it-means-for-the-developer-workforce/">charitable framing</a>: engineers shift from code writers to supervisors and reviewers. Humans provide high-level specifications and architectural guidance; AI handles implementation. Skills gaining importance include systems thinking, security expertise, UX design, and domain knowledge. Traditional coding interviews become "increasingly misaligned with actual work developers now perform."</p><p>The <a href="https://stackoverflow.blog/2026/02/04/code-smells-for-ai-agents-q-and-a-with-eno-reyes-of-factory/">scale concern</a> is sharper: "Bringing on agents isn't hiring another person. It's like hiring a hundred intern-level engineers. You can't code review a hundred engineers." In StrongDM's model, you don't review them at all --- the scenarios do.</p><p>Then there's the comprehension debt problem --- and this one compounds over time. AI generates working code that nobody on your team understands. Peter Naur argued in 1985 that software isn't the code; it's the team's mental model of the code. When that model decays, software becomes unmaintainable regardless of how clean the code looks. Code review isn't just quality assurance; it's how teams build shared understanding of their systems. When nobody wrote the code and nobody reviewed it, who maintains it? Who debugs it? Who extends it when requirements change? The dark factory assumes maintenance is also agentic, but maintenance requires understanding context, history, and intent --- an even harder problem than generation.</p><p>"Supervisors of code-generating systems" is the generous framing. "Prompt engineers with fancy titles" is the cynical one. Both framings point to the same structural shift: value migrates to design, taste, judgment. But how many companies need a full team doing only design, taste, and judgment? The ratio changes; it doesn't change in a way that preserves current headcount. Every CTO running the numbers on agentic development needs to be honest about this implication.</p><h2>The Acquisition Test</h2><p>StrongDM is <a href="https://www.globenewswire.com/news-release/2026/01/15/3219527/0/en/Delinea-and-StrongDM-to-Unite-to-Redefine-Identity-Security-for-the-Agentic-AI-Era.html">being acquired by Delinea</a>, an identity security company that builds privileged access management and secrets management products. The deal is expected to close Q1 2026.</p><p>This matters because it's a real-world test. Did Delinea see the Software Factory methodology and buy it --- or did they buy the product and the customer base? Will the acquirer maintain "no human review" for security products once they own the compliance risk? Startup experiments often don't survive corporate integration; radical methodologies especially. If Delinea imposes human review on StrongDM's code, the Software Factory becomes a case study in methodology, not a sustainable practice.</p><p>Worth watching. The answer will tell us more about the viability of the dark factory than any whitepaper or manifesto. Corporate acquirers don't tolerate risk the way three-person founding teams do; the compliance review alone should be illuminating.</p><h2>What I'm Doing</h2><p>Not dark factory. Not even close.</p><p>The data doesn't support removing human review for production code, and it especially doesn't support it for anything security-adjacent. But I'm not dismissing the underlying ideas either. StrongDM's engineering is disciplined even if the philosophy is premature.</p><h2>What I'm Considering</h2><p>Keeping verification scenarios outside the codebase --- separate from the code that agents generate and the tests they write --- is valuable even with full human review in place. I'm experimenting with specification-driven scenarios that no agent touches, validated independently. It's a small change to the workflows I'm using; the improvement in verification confidence could be disproportionate.</p><p>The DTU concept at smaller scale: not full behavioral clones of third-party services, but mocked environments that let me test integration behavior without hitting live APIs. This was always good practice; StrongDM made the economics interesting by showing how agents can build and maintain the mocks themselves.</p><h2>What I'm Not Adopting</h2><p>"No human review." Not until the CodeRabbit numbers reverse --- and not for security-adjacent code even then. The evidence isn't there. And $1,000 per day in tokens per engineer --- the economics don't work at our scale, and I'm skeptical they work at most scales. We're spending deliberately, not maximally.</p><p><strong>Maybe StrongDM is early, not wrong. Maybe AI code quality improves enough in the next two years that "no human review" becomes defensible. I'd rather be late to a methodology that works than early to one that causes a breach.</strong></p><div><hr></div><h2>Closing Thoughts</h2><p>The Software Factory is not a question about ambition or vision. It's a question about evidence.</p><p>The holdout-set idea is smart. The DTU is impressive engineering. The three cardinal rules are ideology, not engineering --- aspiration dressed as methodology. <strong>The question isn't "should we go fully agentic?" --- that's a philosophy debate with no falsifiable answer. The question is: what would have to be true about AI code quality for you to trust it without human review?</strong></p><p>That question has a measurable answer. And right now, the measurements don't support it.</p><p>What defect rate would you need to see before removing human review? Are we there? If your scenarios catch 95% of issues, is the 5% they miss acceptable for your product? For your customers? For your compliance obligations? When --- not if --- an agent-generated system causes a production incident, who in your organization understands the code well enough to diagnose it?</p><p>StrongDM's holdout-set concept is worth adopting. Their philosophy is worth watching.</p>]]></content:encoded></item><item><title><![CDATA[Audio: When AI Agents Write Your Code, Does Language Choice Matter?]]></title><description><![CDATA[Jose Valim recently made a bold claim: Elixir is the best language for AI code generation, based on benchmarks showing high completion rates and structural benefits like immutability and ecosystem stability.]]></description><link>https://www.thepragmaticcto.com/p/audio-when-ai-agents-write-your-code</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-when-ai-agents-write-your-code</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Tue, 17 Feb 2026 12:15:35 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187277571/dad0819eb62dddc05fc7c90ccc810052.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Jose Valim recently made a bold claim: Elixir is the best language for AI code generation, based on benchmarks showing high completion rates and structural benefits like immutability and ecosystem stability. But this sparks a deeper question&#8212;not which language is best, but whether the choice even matters when AI agents write a large chunk of your code.</p><p>The real power of languages like Scala, Haskell, or Rust isn&#8217;t Elixir&#8217;s specifics&#8212;it&#8217;s the compiler acting as an AI code reviewer. These typed, functional languages provide immediate, strict feedback that forces AI-generated code to be correct before it ever reaches human eyes. That means AI can&#8217;t just spit out code that might fail later; it has to meet the compiler&#8217;s standards upfront, which cuts down bugs and lets your engineers focus on design, not chasing type errors. Languages like Python or JavaScript don&#8217;t have that gatekeeper. AI outputs code that might or might not work, leaving bugs for humans to find later. Functional, stateless code fits the AI&#8217;s own mode of operation&#8212;small, pure functions with explicit inputs and outputs&#8212;while mutable object-oriented code demands context beyond what AI&#8217;s limited memory can handle. As Jonathan de Montalembert put it, &#8220;The more flexible and forgiving the language, the more dangerous the AI partner becomes.&#8221;</p><p>That&#8217;s compelling, but theory runs into a training data wall. Scott Arbeit showed that even with a language like F#, which ticks all the theoretical boxes, AI models often produce invalid syntax or default to more popular languages like C#. Less popular languages suffer from a vicious cycle of limited training data leading to poor AI output, which suppresses adoption and further reduces data. Meanwhile, Python dominates AI-generated code simply because models have seen more of it&#8212;80% of AI agent implementations use Python. Even the Tencent benchmark supporting Elixir had flaws: it filtered out harder problems for low-resource languages, skewing results, and practitioners report better real-world AI reliability with JavaScript or Kotlin. So, while typed functional languages might produce better code in theory, in practice, AI models do better with popular languages they know well.</p><p>But here&#8217;s the part nobody talks about enough: comprehension debt. AI-generated code can compile, pass tests, even ship&#8212;and yet nobody on your team understands how it works. This gap between code behavior and team understanding is insidious. When something breaks, the team can&#8217;t trace the logic or confidently modify the system. Peter Naur said decades ago that software is really about the team&#8217;s mental model, not just the code itself. AI doesn&#8217;t build that theory; it just generates solutions. If your team can&#8217;t read or reason about the language AI uses, the codebase becomes a liability, no matter how correct the AI&#8217;s output is. So &#8220;switch to Elixir because AI writes better Elixir&#8221; only works if your team can own Elixir code. Otherwise, mediocre code in a familiar language beats perfect code nobody understands.</p><p>And there are bigger constraints overriding theory. Hiring for niche languages like Elixir or Haskell is tough and expensive compared to Python or TypeScript, where talent is abundant. Ecosystem maturity matters too&#8212;most AI tools ship Python SDKs first, meaning AI agents have better building blocks in those languages. Existing codebases rarely get rewritten just for AI; migration costs are real and quantifiable, while AI code quality gains remain theoretical and small. Plus, AI models improve rapidly, narrowing gaps between languages over time. Python&#8217;s dominance is a network effect moat&#8212;like QWERTY or VHS&#8212;not easily displaced by technical superiority alone.</p><p>So what really makes a codebase AI-friendly? The qualities Valim highlights&#8212;immutability, strong typing, stable ecosystems, clear contracts&#8212;are portable across languages. You don&#8217;t have to switch to Elixir to get immutability; you can avoid mutating state in Python or TypeScript. Strong typing is the investment, not the language&#8212;TypeScript strict mode or Python type hints with mypy offer similar guardrails. Good documentation and comprehensive tests give AI agents better context and validation. Small, pure functions with explicit inputs and outputs help AI generate better code regardless of language. Stable APIs reduce confusion for both AI and humans. And letting AI generate types or interfaces before implementation surfaces mistakes earlier. These practices improve code maintainability and AI output simultaneously.</p><p>Personally, I&#8217;m a fan of Elixir and introduced it to my team at LiORA&#8212;not because it&#8217;s the best for AI, but because it&#8217;s a great team language. It&#8217;s proven productive, and with some nudging toward smaller, focused functions, it works well with AI tools like Claude Code. But that&#8217;s a team choice, not a universal prescription.</p><p>What should CTOs do today? Focus on what&#8217;s good for the AI and good for humans alike. Invest in documentation to provide context for both AI and developers. Write smaller functions with clear contracts, applying functional principles even in non-functional stacks. Don&#8217;t bet on today&#8217;s AI language strengths&#8212;they&#8217;ll shift in 18 months. Instead, improve your codebase properties now, which pays off for your team and future AI capabilities.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/187277474">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[When AI Agents Write Your Code, Does Language Choice Matter?]]></title><description><![CDATA[Programming languages in the AI era]]></description><link>https://www.thepragmaticcto.com/p/when-ai-agents-write-your-code-does</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/when-ai-agents-write-your-code-does</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Tue, 17 Feb 2026 12:01:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4a04e6dd-9dad-472c-87a1-15819d51b75a_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div><hr></div><p>On February 5th, <a href="https://x.com/josevalim">Jose Valim</a> published a blog post titled <a href="https://dashbit.co/blog/why-elixir-best-language-for-ai">"Why Elixir is the best language for AI."</a> His argument wasn't hand-waving. He pointed to a Tencent benchmark where Elixir achieved a 97.5% completion rate across twenty programming languages; <strong>Claude Opus 4 scored 80.3% on Elixir</strong> versus 74.9% for C# and 72.5% for Kotlin. He walked through Elixir's immutability, its ecosystem stability -- version 1.0 shipped in 2014 and the language is still on v1.x twelve years later -- and its executable documentation verified in test suites. Structural claims backed by data. Not marketing.</p><p>This poses an interesting question, is there truly a "best language for AI"? and what does it mean to be the best language for AI? Every language community right has some AI related claim. Rust advocates point to inference speed. Python advocates point to everything. Now Elixir. This is the "best language for web development" wars replayed for the agentic coding era; the actors change but the plot stays the same.</p><p>But there's a question underneath the tribalism that's worth pulling apart. Claude Code, Cursor, Copilot, Devin -- these tools are writing 30-80% of new code at many companies right now. If an AI agent is generating most of your codebase, does the target language affect the quality of what comes out?</p><p><em>That question has a more interesting answer than "Elixir wins."</em></p><h2>The Compiler as AI Code Reviewer</h2><p>The strongest version of the argument for typed and functional languages has nothing to do with Elixir specifically. It's about what happens when AI-generated code meets a compiler that can say no.</p><p>In languages like Scala, Haskell, or Rust, the feedback loop is tight: AI generates code, the compiler rejects what's invalid, the AI iterates, and eventually produces something correct. The type system catches errors before runtime -- without needing a human in the loop. Think about what that means for your review process. An entire category of bugs gets caught before a pull request ever reaches a human reviewer; your engineers spend time on logic and architecture instead of hunting for type mismatches and null reference errors that a compiler would have caught instantly.</p><p>In Python or JavaScript, the feedback loop is looser. AI generates code, it runs, it might work, you find the bugs later. Or you don't.</p><p>Alexandru Nedelcu <a href="https://alexn.org/blog/2025/11/16/programming-languages-in-the-age-of-ai-agents/">made this case convincingly</a> for Scala. AI agents successfully generate working Scala 3 macro code despite limited training data, because the compiler provides real-time validation via LSP. Expressive type systems don't just make AI code better; they make AI code <em>correctable</em>. The compiler becomes an automated code reviewer that never gets tired, never rubber-stamps a pull request, and catches entire categories of bugs that would sail through a dynamically typed language undetected.</p><p>This maps to how LLMs operate. They have limited context windows; they work best generating <a href="https://adamloving.com/2024/08/06/functional-programming-is-better-than-object-oriented-for-ai-code-generation/">small, self-described functions</a> with clear inputs and outputs. Stateless functional approaches match the LLM's own operational model -- no memory persistence between generations, no hidden state to track. Immutable data means all transformations are explicit. Pure functions have no side effects. The AI doesn't need to reason about what changed somewhere else in the program.</p><p>Contrast this with mutable object-oriented code. Object state can change anywhere. An AI agent generating a method on a class needs to understand what every other method might have done to that object's state before this method runs. That's a lot of context to track; context that fits poorly in a window measured in tokens. The AI doesn't just need to understand the function it's writing -- it needs to understand the entire object graph that function touches. In a large OOP codebase, that graph sprawls across files, modules, and inheritance hierarchies that no context window can fully capture.</p><p>Jonathan de Montalembert's framing <a href="https://devinterrupted.substack.com/p/what-language-should-llms-program">cuts to the point</a>: <strong>"The more flexible and forgiving the target language, the more dangerous the AI partner becomes."</strong> Deterministic languages with sound type systems constrain AI mistakes at compile time. Flexible languages let those mistakes ship.</p><p>Valim's Elixir-specific arguments are the sharpest example of these principles in practice. Immutability is built in, not optional. The ecosystem hasn't churned -- everything written about Elixir in the last decade still works, which means no training data confusion for models navigating deprecated APIs. Executable documentation with `iex&gt;` snippets, verified in test suites, means the training examples are more likely to be correct.</p><p>These are real structural advantages, that make Elixir the powerhouse that it is today. The compiler-as-AI-reviewer argument is genuinely compelling; the functional programming fit with LLM architecture is sound; the stability argument removes an entire class of training data problems that plague fast-moving ecosystems. Anyone dismissing this wholesale isn't paying attention.</p><h2>The Training Data Problem</h2><blockquote><p>In theory, theory and practice are the same. In practice, they are not. -- Yogi Berra</p></blockquote><p>The structural argument is sound in theory. In practice, it runs into a wall.</p>
      <p>
          <a href="https://www.thepragmaticcto.com/p/when-ai-agents-write-your-code-does">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Audio: OpenAI Didn't Buy a Product. They Bought a Distribution Channel.]]></title><description><![CDATA[OpenAI&#8217;s recent acquisition of OpenClaw wasn&#8217;t just about talent or technology.]]></description><link>https://www.thepragmaticcto.com/p/audio-openai-didnt-buy-a-product</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-openai-didnt-buy-a-product</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 16 Feb 2026 20:27:50 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/188178851/67961a05e1d0d198fa2c75f1b573f264.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>OpenAI&#8217;s recent acquisition of OpenClaw wasn&#8217;t just about talent or technology. They bought a distribution channel&#8212;a powerful revenue pipeline that was funneling massive API usage and revenue to a competitor, Anthropic. OpenClaw, an autonomous agent platform, defaults its model provider hierarchy to Anthropic&#8217;s Claude models, which dominate the token consumption that drives API revenue.</p><p>OpenClaw isn&#8217;t your average chatbot; it&#8217;s a relentless token furnace. It integrates deeply with email, calendars, browsers, and messaging apps, running multi-step workflows with persistent memory. This architecture means it burns through tokens at astonishing rates&#8212;sessions can balloon to hundreds of thousands of tokens, and background &#8220;heartbeat&#8221; checks alone can cost hundreds of dollars per week per agent. Light users spend tens of dollars monthly on API calls, but heavy users can rack up thousands, even tens of thousands, in a single month. This is a quantum leap beyond chatbot-era economics&#8212;it&#8217;s not incremental, it&#8217;s orders of magnitude more expensive.</p><p>Those tokens translate directly into revenue for model providers. OpenClaw&#8217;s default configurations overwhelmingly favor Anthropic&#8217;s Claude models, driving the bulk of this enormous token spend to Anthropic&#8217;s API. With OpenClaw&#8217;s explosive growth&#8212;over 180,000 GitHub stars and an estimated 50,000 to 200,000 active users&#8212;this translates to tens of millions, potentially over a hundred million dollars in annualized API revenue flowing to Anthropic. For OpenAI, facing billions in projected losses and intense competition, that&#8217;s a revenue leak they couldn&#8217;t ignore.</p><p>The irony is sharp. Steinberger built OpenClaw explicitly for Claude, even naming it after the Claude model. He was essentially subsidizing Anthropic&#8217;s revenue by running high-cost API calls on his own dime. Anthropic&#8217;s response was to send a cease-and-desist over the project&#8217;s name, alienating the very community driving their growth. Within weeks, OpenAI swooped in, acqui-hiring Steinberger and effectively capturing the most powerful agent ecosystem driving revenue to their competitor.</p><p>This acquisition wasn&#8217;t just about adding a brilliant engineer or community goodwill. It was about controlling the defaults in agent platforms, which dictate model usage and thus revenue flows. Defaults matter. Just like browser search engine defaults shaped billions in ad revenue, agent platform defaults will shape trillions of tokens in API spend. Autonomous agents running 24/7 with complex workflows generate hundreds of millions to trillions of tokens monthly. Whoever controls that agent layer controls the revenue.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share The Pragmatic CTO&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thepragmaticcto.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share The Pragmatic CTO</span></a></p><p>This is the start of a broader pattern: autonomous agents are becoming the new distribution layer for AI models, much like mobile apps became the distribution layer for cloud infrastructure in the 2010s. Apps created persistent compute demand, driving massive cloud revenue. Agents now create persistent token demand that compounds with each new user and integration. The scale is breathtaking&#8212;over 50 trillion tokens processed daily across the market, with agents accounting for nearly half. The economics of model defaults in agent platforms will be the new battleground.</p><p>For CTOs evaluating agent infrastructure, this means your choice of default model provider isn&#8217;t neutral&#8212;it&#8217;s a financial commitment. The token economics of agents dwarf chatbot-era costs. A fleet of agents running constant heartbeats can cost hundreds of thousands annually just to maintain status checks. Vendor lock-in now happens not just at the API level but through accumulated context, workflows, and integrations tuned to a specific provider&#8217;s models. Switching costs are no longer just about code migration&#8212;they&#8217;re about losing months of institutional memory embedded in your agents.</p><p>Over the next year, I&#8217;m watching four key signals. First, whether OpenClaw&#8217;s defaults shift from Claude to OpenAI models, signaling revenue redirection. Second, if Steinberger&#8217;s projects at OpenAI mirror OpenClaw&#8217;s agent approach but built on OpenAI&#8217;s stack. Third, Anthropic&#8217;s response&#8212;will they partner with or acquire another agent platform to reclaim distribution? And fourth, whether agent platform defaults become a negotiation point in enterprise API contracts, akin to search engine default deals.</p><p>Ask yourself: do you know where your API spend is going? Have you updated your budgets for the explosive token burn of autonomous agents or are you still thinking in chatbot terms? Would you notice if your agent platform&#8217;s default model changed tomorrow? The headlines have moved on from the acqui-hire narrative, but the token economics haven&#8217;t. Understanding who controls your agent defaults is no longer just a technical choice&#8212;it&#8217;s a financial one.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/188178590">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[OpenAI Didn't Buy a Product. They Bought a Distribution Channel.]]></title><description><![CDATA[The Token Economics Behind the OpenClaw Acqui-Hire]]></description><link>https://www.thepragmaticcto.com/p/openai-didnt-buy-a-product-they-bought</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/openai-didnt-buy-a-product-they-bought</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 16 Feb 2026 20:27:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e2a06d3d-824e-45ff-a615-971a18bd95d4_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Token Economics Behind the OpenClaw Acqui-Hire</h2><div><hr></div><p>On February 15, 2026, Sam Altman <a href="https://x.com/sama/status/2023150230905159801">announced on X</a> that OpenClaw creator Peter Steinberger was joining OpenAI. He called Steinberger "a genius with a lot of amazing ideas about the future of very smart agents interacting with each other to do very useful things for people." The framing was deliberate---talent, vision, the future of personal agents. Every analysis published since has dutifully followed the same narrative; OpenAI acquired a brilliant founder, absorbed the most viral open-source project in GitHub history, and positioned itself to dominate the agent layer.</p><p>That narrative isn't wrong. But it's incomplete in a way that matters.</p><p>Buried beneath the talent story is a financial reality that almost nobody is discussing. <a href="https://docs.openclaw.ai/providers/anthropic">OpenClaw's default provider hierarchy</a> places Anthropic first---above OpenAI, above Google, above every other model provider. The default primary model is `anthropic/claude-opus-4-6`. Steinberger himself <a href="https://www.getaiperks.com/en/blogs/18-best-ai-model-openclaw">recommended Claude Opus 4.6</a> for heavy agent workloads; community guides consistently called Claude Sonnet 4.5 the "sweet spot" for most users; independent benchmarks found that <a href="https://lumadock.com/tutorials/openclaw-claude-vs-openai-model-choice">Claude outperformed GPT-4o</a> on long-context tasks, prompt-injection resistance, and multi-step tool use---the exact capabilities autonomous agents need most. One industry analyst put it bluntly: "OpenClaw was one of the biggest drivers of paying API traffic to Anthropic, since most users ran it on <a href="https://mondaymorning.substack.com/p/openclaw-and-the-acqui-hire-that">Claude</a>."</p><p>OpenAI didn't just buy a genius. I believe they bought a distribution channel that was sending a competitor's revenue through the roof---and they are about to redirected it.</p><h2>The Token Furnace</h2><p>Understanding why this acquisition makes financial sense requires understanding how much money autonomous agents burn. OpenClaw isn't a chatbot; it's a 24/7 autonomous system that connects to your email, calendar, messaging platforms, and web browser, chaining multi-step workflows together with persistent memory across sessions. Every one of those operations consumes API tokens; <strong>the architecture ensures that consumption is extraordinary.</strong></p><p>Six factors make OpenClaw a token furnace. Context accumulation accounts for <a href="https://help.apiyi.com/en/openclaw-token-cost-optimization-guide-en.html">40-50% of total spend</a>, because the entire conversation history is resent with every API call; sessions with roughly 35 messages had grown to 2.9 megabytes in one documented case, occupying 56-58% of a 400,000-token context window. Tool outputs from shell commands, file reads, and web fetches deposit thousands of additional tokens into that context; OpenClaw's system prompt---5,000 to 10,000 tokens---ships with every single API call regardless of whether the user is asking a complex question or checking whether any tasks exist. And the default "heartbeat" check runs every thirty minutes, sending the entire 120,000-token context window to the API for what amounts to a status ping. At Opus pricing, that heartbeat alone <a href="https://www.notebookcheck.net/Free-to-use-AI-tool-can-burn-through-hundreds-of-Dollars-per-day-OpenClaw-has-absurdly-high-token-use.1219925.0.html">costs approximately $0.75 per check</a>---roughly $250 per week for an agent that mostly reports nothing.</p><p>The per-user costs that result from this architecture are unlike anything the chatbot era prepared us for. <a href="https://openclawpulse.com/openclaw-api-cost-deep-dive/">Light users consuming 5-20 daily messages</a> spend $10-30 per month on Claude Sonnet; medium users running automated workflows and cron jobs land between $30 and $150; heavy users operating 24/7 assistants with browser automation can reach $750 to $3,000 or more per month on Opus-tier models. The extreme documented cases are worse still. Federico Viticci, the tech blogger, <a href="https://openclawpulse.com/openclaw-api-cost-deep-dive/">burned through $3,600 in a single month</a>; a German publication hit <a href="https://www.notebookcheck.net/Free-to-use-AI-tool-can-burn-through-hundreds-of-Dollars-per-day-OpenClaw-has-absurdly-high-token-use.1219925.0.html">$100 in a single day of testing</a>; one Moltbook user watched $8 disappear every thirty minutes---$380 per day---just processing new social posts.</p><p>Compare that to a ChatGPT conversation, which might consume a few thousand tokens per session at pennies per interaction. The gap between chatbot-era economics and agent-era economics is not incremental; it is orders of magnitude.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vEOT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vEOT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vEOT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vEOT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vEOT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vEOT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg" width="728" height="300.9066666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:496,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:69351,&quot;alt&quot;:&quot;Monthly API Cost: Chatbot vs. Autonomous Agent&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monthly API Cost: Chatbot vs. Autonomous Agent" title="Monthly API Cost: Chatbot vs. Autonomous Agent" srcset="https://substackcdn.com/image/fetch/$s_!vEOT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vEOT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vEOT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vEOT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e3b92c3-4b78-4c49-bb7a-2eeff6434ce3_1200x496.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Follow the Money</h2><p>Those tokens are revenue for someone; the question that matters---the one that reframes the entire acquisition---is who.</p><p>OpenClaw is model-agnostic by design; users can configure any provider through their own API keys. But defaults drive behavior, and OpenClaw's defaults overwhelmingly favor Anthropic. The <a href="https://docs.openclaw.ai/providers/anthropic">provider priority hierarchy</a> in the official documentation reads Anthropic first, then OpenAI, then OpenRouter, followed by Gemini and a long tail of smaller providers; when a user configures an Anthropic API key, Claude models are automatically set as primary. The original project was named "Clawdbot"---a phonetic play on Claude---and the community that coalesced around it adopted Claude as the consensus recommendation for agent workloads. <a href="https://lumadock.com/tutorials/openclaw-claude-vs-openai-model-choice">Claude's advantages in long-context reasoning, prompt-injection resistance, and multi-step tool use</a> mapped precisely to what autonomous agents demand most; even users who started with OpenAI keys often migrated to Anthropic after community forums pointed them there.</p><p>The aggregate revenue implications of this default are significant, even using conservative assumptions. OpenClaw crossed <a href="https://www.cnbc.com/2026/02/15/openclaw-creator-peter-steinberger-joining-openai-altman-says.html">180,000 GitHub stars</a> and had <a href="https://www.cnbc.com/2026/02/02/openclaw-open-source-ai-agent-rise-controversy-clawdbot-moltbot-moltbook.html">1.5 million agents created</a> by early February 2026. GitHub stars-to-active-user conversion for developer tools typically runs between 10% and 30%, which suggests an active user base somewhere between 50,000 and 200,000 people. Multiply by the documented average monthly API spend of $15 to $50 per user, and the back-of-envelope math produces annualized figures of $9 million at the conservative end, $36 million at the moderate estimate, and $120 million at the aggressive end; the majority flowing to Anthropic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j_S-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j_S-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 424w, https://substackcdn.com/image/fetch/$s_!j_S-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 848w, https://substackcdn.com/image/fetch/$s_!j_S-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!j_S-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j_S-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg" width="728" height="258.44" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:426,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:63358,&quot;alt&quot;:&quot;Estimated Annualized API Revenue from OpenClaw Users&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Estimated Annualized API Revenue from OpenClaw Users" title="Estimated Annualized API Revenue from OpenClaw Users" srcset="https://substackcdn.com/image/fetch/$s_!j_S-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 424w, https://substackcdn.com/image/fetch/$s_!j_S-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 848w, https://substackcdn.com/image/fetch/$s_!j_S-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!j_S-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba944f6e-28fd-4050-9597-930001502104_1200x426.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These are rough numbers, and no published aggregate data exists for OpenClaw API spend. But even the conservative figure of $9 million annually represents a non-trivial revenue stream; <a href="https://www.webpronews.com/openai-api-surges-to-1b-monthly-revenue-eclipsing-chatgpt-growth/">OpenAI's API business hit $1 billion ARR</a> in late 2025, while <a href="https://www.tomshardware.com/tech-industry/anthropic-targets-gigantic-usd26-billion-in-revenue-by-the-end-of-2026-eye-watering-sum-is-more-than-double-openais-projected-2025-earnings">Anthropic targets $26 billion in revenue by the end of 2026</a>. A single agent platform driving $36 million or more in annual API spend to a competitor is the kind of leak that a company projecting <a href="https://www.saastr.com/openai-crosses-12-billion-arr-the-3-year-sprint-that-redefined-whats-possible-in-scaling-software/">$14 billion in losses for 2026</a> cannot afford to ignore.</p><h2>Anthropic's Gift to OpenAI</h2><p>The irony of this acquisition sharpens when you trace the timeline of Anthropic's own decisions.</p><p>Steinberger built Clawdbot in November 2025---named after Claude, built for Claude, defaulting to Claude, driving every token of its explosive growth directly into Anthropic's API revenue. Within weeks the project became the fastest-growing open-source repository in GitHub history, crossing 180,000 stars in roughly sixty days and generating the kind of organic developer evangelism that no marketing budget can buy. Steinberger was losing <a href="https://www.ainvest.com/news/openclaw-acquisition-offers-expectation-gap-20k-losses-billion-dollar-bids-2602/">$10,000 to $20,000 per month</a> running OpenClaw, and the vast majority of that cost was API spend; <a href="https://www.hostinger.com/tutorials/openclaw-costs">infrastructure costs ran only $10-25 per month</a> for the servers themselves. He was subsidizing Anthropic's revenue out of his own pocket while building their most powerful distribution channel.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thepragmaticcto.com/subscribe?"><span>Subscribe now</span></a></p><p>Anthropic's response to this gift was to send lawyers.</p><p>On January 27, 2026, Anthropic issued a trademark cease-and-desist over "Clawd" being too phonetically similar to "Claude." Steinberger renamed the project to Moltbot, then to OpenClaw within two days; the Hacker News community <a href="https://news.ycombinator.com/item?id=47027907">called it an "Anthropic fumble"</a> that damaged the company's reputation in the open-source community while handing OpenClaw a fresh wave of viral attention through the drama. One analyst captured the absurdity precisely: "The OpenClaw creator built this project for Claude, named it after Claude, and was actively driving revenue and developer mindshare to Anthropic's API. Instead of recognizing what they had---an unpaid evangelist building the most viral agent ecosystem in history on top of their model---<a href="https://ajsai.substack.com/p/breaking-openclaw-goes-to-openai">Anthropic sent lawyers</a>."</p><p>Eighteen days later, OpenAI swooped in and acqui-hired Steinberger. The <a href="https://mondaymorning.substack.com/p/openclaw-and-the-acqui-hire-that">Monday Morning Substack called it</a> a potential "fumble of the century for Anthropic," noting that Anthropic's enterprise market share had grown to 40% while OpenAI declined to 27%---a shift partially driven by developer tools and agent ecosystems running on Claude. Anthropic was winning the developer distribution war through organic adoption; then it chose to antagonize the single person doing more for that adoption than anyone on its payroll.</p><h2>The Distribution Channel Thesis</h2><p>The conventional reading of this acquisition focuses on three assets: Steinberger's talent, his architectural knowledge of agent systems, and the community goodwill attached to OpenClaw. All three are real and valuable; none of them explain the speed of the move, the personal involvement of Altman, or the competitive urgency of bidding against <a href="https://eu.36kr.com/en/p/3681454940152">Mark Zuckerberg's direct outreach via WhatsApp</a>.</p><p>A distribution channel thesis does.</p><p>By bringing Steinberger in-house, OpenAI can shift the default model hierarchy in whatever agent products emerge from his work---and defaults, as every CTO who has watched browser search engine deals knows, drive the overwhelming majority of usage. OpenAI captures a proven demand generation channel; OpenClaw demonstrated that autonomous agents create enormous, persistent, recurring API demand that dwarfs anything a chatbot produces. A ChatGPT user might generate a few thousand tokens per conversation; an OpenClaw agent running 24/7 with heartbeats, cron jobs, and multi-step workflows <a href="https://help.apiyi.com/en/openclaw-token-cost-optimization-guide-en.html">generates 5 to 200 million tokens per month</a>. If even 100,000 users run agents on OpenAI models at those consumption rates, the resulting 500 billion to 20 trillion tokens per month would represent a significant fraction of OpenAI's total API throughput---which currently stands at <a href="https://openai.com/index/new-tools-for-building-agents/">6 billion tokens per minute</a>.</p><p>The move also denies Anthropic its most effective unpaid distribution partner at a moment when distribution matters as much as model quality; it locks Steinberger's architectural thinking into OpenAI's agent-native infrastructure---the <a href="https://venturebeat.com/ai/openais-strategic-gambit-the-agent-sdk-and-why-it-changes-everything-for-enterprise-ai">Agents SDK</a>, the Responses API, the Frontier Platform---at a time when <a href="https://venturebeat.com/ai/openais-strategic-gambit-the-agent-sdk-and-why-it-changes-everything-for-enterprise-ai">93% of companies processing more than one trillion tokens on OpenAI</a> already use framework-based agent orchestration.</p><p>No insider has confirmed that token economics or revenue redirection played a role in the acquisition decision; his is purely my analysis based on the substantial yet circunstantial evidence. The strongest version of this thesis is not that OpenAI was protecting its own revenue--- <strong>it's that OpenAI was capturing a revenue channel that was primarily benefiting a competitor</strong>, at a moment when both companies are burning billions to establish market dominance.</p><h2>Agents Are the New Apps</h2><p>The OpenClaw acquisition fits a broader pattern that I believe will define the economics of AI infrastructure for the next three to five years: autonomous agents are becoming the distribution layer that drives model provider revenue, in exactly the way that mobile apps became the distribution layer that drove cloud compute revenue.</p><p>The structural parallel is almost exact. In the 2010s, mobile apps created persistent compute demand---always-on services running in the background, pushing notifications, syncing data, processing transactions---that drove AWS, GCP, and Azure revenue far beyond what web applications alone would have generated. SaaS products did the same for payment processing; every recurring subscription flowing through Stripe created persistent transaction volume that compounded as the ecosystem grew. Autonomous agents are now doing this for LLM APIs; an agent running 24/7 with periodic heartbeats, automated workflows, and multi-step reasoning creates persistent token demand that compounds with every new user, every new integration, every new automated task.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lg8t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lg8t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 424w, https://substackcdn.com/image/fetch/$s_!lg8t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lg8t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lg8t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lg8t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg" width="728" height="303.3333333333333" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:375,&quot;width&quot;:900,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:37125,&quot;alt&quot;:&quot;The Distribution Layer Pattern&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Distribution Layer Pattern" title="The Distribution Layer Pattern" srcset="https://substackcdn.com/image/fetch/$s_!lg8t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 424w, https://substackcdn.com/image/fetch/$s_!lg8t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lg8t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lg8t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e57e632-f955-4e2a-808b-e8ddcfab2703_900x375.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The scale of the opportunity explains the urgency. The total LLM API market processes approximately <a href="https://fireworks.ai/blog/state-of-agent-environments">50 trillion tokens per day</a>, with code generation and agent workflows accounting for 40-50% of that volume. OpenAI's token throughput has grown <a href="https://venturebeat.com/ai/openais-strategic-gambit-the-agent-sdk-and-why-it-changes-everything-for-enterprise-ai">700% year over year</a>; agentic inference is the <a href="https://fireworks.ai/blog/state-of-agent-environments">fastest-growing usage pattern</a> across every major API provider. Whoever controls the agent layer---the platforms where autonomous workflows are designed, deployed, and defaulted to specific models---controls the revenue that flows from them.</p><p>Default model settings in agent platforms are becoming the new default search engine deals. Google paid Apple billions annually to remain Safari's default search engine because defaults drive behavior at scale; the economics of agent platform defaults follow the same logic, with token revenue replacing advertising revenue as the prize.</p><h2>What This Means for Your Budget</h2><p>If you're running or evaluating agent infrastructure, the token economics of this acquisition carry practical implications that most planning processes have not caught up with.</p><p>The first is that your agent platform's default model is not a neutral technical choice---it's a revenue channel decision for someone else. Every token your autonomous agents consume is revenue for a model provider; the provider your platform defaults to captures the vast majority of that spend because most users never change defaults. When you evaluate agent platforms, understanding the default provider hierarchy is as important as understanding the capability benchmarks; the platform's incentives shape which models your agents will call, how aggressively context is managed, and whether token efficiency is a design priority or an afterthought.</p><p>The second is that the cost structure of autonomous agents bears almost no resemblance to the cost structure of chatbot-era AI tools. A developer using GitHub Copilot generates predictable, bounded API costs that correlate with working hours. A fleet of autonomous agents running 24/7 with heartbeat checks, persistent memory, and multi-step workflows generates costs that correlate with uptime; uptime is 168 hours per week regardless of whether any productive work is happening. The heartbeat problem alone can cost $250 per week per agent at Opus pricing; multiply that across a team of twenty agents and you're spending $260,000 annually on status pings. Most AI budgets were built for the chatbot era and have not been recalibrated for always-on autonomous systems.</p><p>The third is that vendor lock-in through agent defaults is the new lock-in vector that most CTOs are not even aware of. Once your workflows, persistent memory, integration configurations, and skill marketplace dependencies are built on a specific agent platform with a specific model default, switching costs compound rapidly. This is not the familiar lock-in of cloud provider APIs or database engines; it's lock-in through the accumulated context and behavioral tuning of autonomous systems that learn and adapt over time. The switching cost isn't technical migration alone---it's the loss of institutional memory that your agents have built over months of operation.</p><h2>What I'm Watching For</h2><p>Four signals over the next six to twelve months will determine whether the distribution channel thesis holds.</p><p>The most telling will be whether OpenClaw's default model hierarchy shifts from Anthropic to OpenAI. The project is moving to an independent foundation, but if the defaults change within the first two releases after the transition, the revenue redirection motive becomes difficult to argue against; a subtler version of the same signal would be OpenAI offering preferential API pricing or free tiers specifically for OpenClaw users---a subsidy that looks like community support but functions as customer acquisition for API revenue.</p><p><strong>The second signal</strong> is whether Steinberger's first projects at OpenAI resemble "OpenClaw for GPT"---consumer-facing autonomous agents built on OpenAI's infrastructure that inherit the design patterns and community goodwill of the project he built. If the agent architecture he designed to drive Claude usage gets rebuilt to drive GPT usage, the capture is complete.</p><p><strong>The third is Anthropic's response</strong>; if Anthropic acquires or deeply partners with another agent platform within the next six months, it validates that they recognize the distribution channel they lost. Silence would suggest they either disagree with this framing or haven't yet grasped what happened.</p><p><strong>The fourth is broader:</strong> whether agent platform defaults become a negotiation point in enterprise API contracts the way search engine defaults became negotiation points in browser contracts. If model providers start paying agent platform developers for default placement, the parallel to search engine economics will be fully realized---and the OpenClaw acquisition will look less like an acqui-hire and more like the opening move in a distribution war.</p><h2>Questions Worth Asking</h2><p>When you evaluate an agent platform, do you trace where your API spend goes? Not the total cost---the destination. Do you know which model provider benefits most from your agent infrastructure, and whether that alignment was a deliberate choice or an inherited default?</p><ul><li><p>Have you budgeted for the token economics of autonomous agents, or are you still forecasting based on chatbot-era usage patterns? The difference between a developer using an AI coding assistant and a fleet of agents running 24/7 is not 2x or 5x---it's 100x to 1,000x in token consumption, and it scales with uptime rather than headcount.</p></li><li><p>If your agent platform's default model changed tomorrow, would you notice? Would your team? Would your finance team?</p></li></ul><p>The acqui-hire headlines have moved on. The token economics haven't. And if I'm right that agents are becoming the distribution layer for model provider revenue, then understanding who controls your agent defaults is no longer a technical question. It's a financial one.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Pragmatic CTO is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Audio: Rise of the Citizen Coder: The Other Side of the Agentic Revolution]]></title><description><![CDATA[Backlogs are the silent killer of innovation.]]></description><link>https://www.thepragmaticcto.com/p/audio-rise-of-the-citizen-coder-the</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-rise-of-the-citizen-coder-the</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 16 Feb 2026 17:30:54 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187249233/798437fe4ee42539f5e18ac98c3b04ae.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Backlogs are the silent killer of innovation. I&#8217;ve seen it too many times&#8212;a simple internal tool that should take two weeks gets pushed to the bottom of a six-month backlog because engineers are drowning in higher priority work. The system we built demands specialized skills even for the smallest things, creating a bottleneck that frustrates everyone. This backlog isn&#8217;t just a scheduling problem; it&#8217;s a symptom of an industry that made building software unnecessarily hard for domain experts who actually need solutions.</p><p>And it gets worse. The gatekeeping problem is baked into our culture. &#8220;Learn to code&#8221; has meant years of grinding through syntax, frameworks, deployment, and debugging before you can build anything useful. That made sense if you wanted to be a programmer, but it was never realistic advice for a marketing director, operations manager, or founder who just wants to solve a problem in their area of expertise. We built a fortress of complexity around software creation, and it locked out the very people who had the best ideas for what to build.</p><p>But here&#8217;s the part nobody talks about: there&#8217;s another tribe in programming&#8212;people who don&#8217;t care about the craft of coding itself but see code as a tool to get things done. They didn&#8217;t fall in love with syntax; they fell in love with products. For them, AI and low-code tools aren&#8217;t shortcuts; they&#8217;re a way to tear down artificial barriers. They want to focus on delivering value, not on writing elegant code. I don&#8217;t fully agree with this philosophy, but it&#8217;s coherent and legitimate, and it forces us to rethink what &#8220;good software&#8221; really means.</p><p>And something surprising is happening: it&#8217;s working. Not just hype or demos, but real startups and teams shipping AI-generated codebases that are good enough to win funding and users. Founders with deep domain knowledge are building MVPs in weeks, business analysts are delivering internal tools without waiting months, and designers are prototyping real interactions without engineers. These aren&#8217;t fringe cases; this is mainstream now. Sure, the code isn&#8217;t perfect. It&#8217;s not elegant or maintainable in the traditional sense. But it works&#8212;and sometimes, that&#8217;s enough.</p><p>That said, the risks are very real. AI-generated code is often &#8220;almost right,&#8221; and almost right can quickly become a liability when it hits edge cases, security vulnerabilities, or performance bottlenecks. Maintenance falls on engineering teams who didn&#8217;t build the system in the first place, spawning what some call &#8220;rescue engineering.&#8221; The division of labor might be citizen coders generating and engineers cleaning up. Whether that&#8217;s sustainable is an open question, but it&#8217;s happening whether we like it or not.</p><p>As a CTO, the question isn&#8217;t if citizen coding is coming&#8212;it&#8217;s already here. The real challenge is figuring out where your organization draws the line between what&#8217;s fine for vibe coding and what demands engineering rigor. Is that internal tool for fifty users or fifty thousand? A prototype or a production system? Disposable or durable? Those boundaries aren&#8217;t clear, but they&#8217;re critical. Understanding where your backlog is truly complex work&#8212;and where it&#8217;s just waiting on bandwidth&#8212;can help you decide where to empower domain experts to build directly.</p><p>I&#8217;m putting my money where my mouth is. In 2026, I&#8217;m launching experiments with micro-SaaS products built end-to-end by agent teams. This isn&#8217;t a contradiction but a test. If agentic coding can build and operate real businesses, I want to see it firsthand. Maybe the quality problems will surface; maybe they won&#8217;t. But speculation only takes you so far&#8212;I&#8217;m ready to find out by building.</p><p>There&#8217;s no neat conclusion here. The craftsmanship side is right about the dangers of abstraction debt and knowledge gaps. The citizen coder side is right that the system failed many people and that not all software needs to be a cathedral. We&#8217;re witnessing a profession fragmenting in real time: craftsmen working on systems demanding deep expertise, citizen coders building things that otherwise never get built, and a messy middle where the boundaries blur and failures teach us where they should be drawn.</p><p>The question isn&#8217;t which side wins. It&#8217;s whether we can capture the benefits of democratizing software creation without drowning in the maintenance debt that worries the craftsmen. Nobody has figured that out yet. The craft isn&#8217;t dead, but it&#8217;s no longer the only way to build. That changes everything about who we are as engineers and what we&#8217;re for.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; on ThePragmaticCTO Substack.</p>]]></content:encoded></item><item><title><![CDATA[Rise of the Citizen Coder: The Other Side of the Agentic Revolution]]></title><description><![CDATA[Part II &#8212; Death of a Craftsman]]></description><link>https://www.thepragmaticcto.com/p/rise-of-the-citizen-coder-the-other</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/rise-of-the-citizen-coder-the-other</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 16 Feb 2026 17:08:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5a835e0e-6731-4612-9ff4-b06469744e92_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Gartner estimates that by 2026, citizen developers at large enterprises will <a href="https://kissflow.com/citizen-development/gartner-on-citizen-development/">outnumber professional software developers four to one</a>. Four to one. Not because companies suddenly stopped needing engineers; because the backlog of things that needed building was always bigger than the engineering team could handle, and now people are finding ways to build without them.</p><p>In Part 1, I wrote about what we stand to lose as agentic coding becomes the norm. The erosion of craft. The abstraction debt. The knowledge gap that compounds over generations. I meant every word of it.</p><p>The software industry I'm part of and built a career in also failed a lot of people. If we're going to critique the new wave of "vibe coding," we have to be equally critical of the industry that created the barriers in the first place. This is the other side.</p><div><hr></div><h2>The Backlog</h2><p>The same scene plays out in every company, every week.</p><p>A product manager walks into a planning meeting with a simple request: an internal tool to track customer feedback. Nothing fancy. A form, a database, some basic reporting. Maybe two weeks of work for someone who knows what they're doing.</p><p>Engineering estimates six months. Not because the tool is complex&#8212;it isn't. But the backlog is full. Three major features sit ahead of it, two critical bugs need fixing, and a migration has been pushed back twice. The PM's little tool isn't a priority. It goes to the bottom of the list, where it will sit until someone forgets why they wanted it in the first place.</p><p>The PM is frustrated. The engineers are frustrated too&#8212;they're not trying to be difficult; they're just drowning. Everyone agrees the tool would be useful. Nobody has bandwidth to build it.</p><p>This is the bottleneck that agentic coding is breaking open.</p><p>I've been on both sides of this conversation. I've been the engineer explaining why we can't get to something for months; I've been the person with a simple idea watching it die in backlog purgatory. The system we built&#8212;the one that requires specialized skills to create even basic software&#8212;created a chokepoint that serves nobody well.</p><p>The average enterprise IT backlog runs <a href="https://www.ciodive.com/news/IT-backlog-applications-COVID-19/606554/">three to twelve months</a>. That's three to twelve months where someone with a real problem waits for someone with the right skills to have time for them. Sometimes the wait is justified; the work is genuinely complex. But often it's not. Often it's just that we created a system where building anything requires specialized skills, and the people with those skills don't have bandwidth for everyone's problems.</p><h2>The Gatekeeping Problem</h2><p>I don't think most programmers set out to create barriers. It just happened. The skills take years to acquire. The tools resist simplification. The failure modes can take down production systems. All of that is true.</p><p>But the effect&#8212;intended or not&#8212;was a bottleneck that locked people out.</p><p>Think about what "learn to code" meant as advice. It meant: spend months or years acquiring foundational skills before you can build anything useful. Take courses. Do tutorials. Learn syntax, then frameworks, then deployment, then debugging. By the time you're competent enough to build that simple feedback tracking tool, you've invested hundreds of hours. Maybe thousands.</p><p>That's fine if you want to be a programmer. It's absurd if you're a marketing director who just needs a reporting dashboard; a founder with deep domain expertise who wants to test an idea before hiring engineers; an operations manager who could automate half their job if they could just write a little code.</p><p>"Learn to code" was never realistic advice for these people. They have jobs. They have expertise in their own domains. They don't have time for a CS curriculum, and they shouldn't need one.</p><p>The craftsman in me wants to say: but the complexity is real. You can't just skip the fundamentals. The shortcuts will catch up with you.</p><p>And that's true. But it's also true that we built a system where a simple tool requires a complex skill set, where domain experts can't build domain-specific solutions, where ideas die because they're stuck behind people who don't have time to implement them.</p><p>That was a failure too. We just didn't call it one because it was <em>our</em> failure, and we were on the winning side of it.</p><h2>The Other Tribe</h2><p>In Part 1, I talked about the two tribes of programmers: those who love the craft of coding, and those who see code as transportation to building things. I was clear about which tribe I belong to.</p><p>I undersold the other tribe's position. <strong>Their argument deserves a fairer hearing.</strong></p><p>A comment stuck with me&#8212;the inverse of the one that opened Part 1:</p><blockquote><p>"I'm happy for all coding to be AI. I prefer delivery over the craft of writing software."</p></blockquote><p>My first reaction was dismissal. The attitude that leads to vibe coding disasters, right? People who don't care about quality; who just want to ship fast and let someone else deal with the consequences.</p><p>That's uncharitable. The real picture is different.</p><p>These are people who learned programming because it was the only way to build software. They didn't fall in love with syntax; they fell in love with products. The code was never the point&#8212;the code was the obstacle between their idea and a working thing. They put in the years because they had to, not because they wanted to.</p><p>For them, AI coding tools aren't a shortcut around craft; they're the removal of an artificial barrier. The barrier was always the <em>implementation</em>, not the <em>thinking</em>. They know what they want to build. They understand the problem domain. They have taste about what makes a good product. The only thing they lacked was the ability to translate that into syntax a computer could execute.</p><p>Now they have that. And they're asking: why should I care about the elegance of the implementation if the product works?</p><p>I don't fully agree with this position. I think there are real risks they're underweighting. But I can't pretend it's incoherent. It's a legitimate philosophy, not just laziness dressed up as productivity.</p><div class="paywall-jump" data-component-name="PaywallToDOM"></div><h2>The Uncomfortable Successes</h2><p>Something complicates the skepticism I laid out in Part 1. Some of this is working.</p><p>Not the hype. Not the demos. The results.</p><p><a href="https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-ycs-current-cohort-have-codebases-that-are-almost-entirely-ai-generated/">Twenty-five percent of Y Combinator's Winter 2025 batch</a> had codebases that were 95% AI-generated. These aren't weekend projects; they're funded startups that passed YC's filter&#8212;and every one of those founders, according to YC managing partner Jared Friedman, was technical enough to build the product from scratch. They chose not to. <a href="https://www.cnn.com/2025/11/06/tech/vibe-coding-collins-word-year-scli-intl">Collins Dictionary named "vibe coding" their word of the year for 2025</a>. This isn't a fringe phenomenon anymore.</p><p>And when I talk to the people doing it&#8212;not the evangelists, the practitioners&#8212;I hear stories that are hard to dismiss:</p><p>A founder with fifteen years of logistics expertise built a supply chain MVP in two weeks that would have taken months with traditional development. Not because the AI wrote perfect code, but because she could iterate on ideas in hours instead of waiting for engineering sprints.</p><p>A business analyst tired of waiting nine months for IT built the reporting tool his team needed. It's not elegant. He'd be the first to admit he doesn't fully understand how it works under the hood. But it works, and it shipped, and his team uses it every day.</p><p>A product designer prototyped an interface that functions&#8212;not just a mockup. She could test real interactions with real users before involving engineering at all.</p><p>These aren't hypotheticals. These are people building things that wouldn't have existed otherwise&#8212;not because the ideas weren't good, but because the implementation barrier was too high.</p><p>The obvious objections surface immediately: maintenance, edge cases, security. Those are fair questions. I asked them in Part 1. But for some of these projects, the questions might not matter that much. A throwaway prototype doesn't need to be maintainable; an internal tool with fifty users doesn't need enterprise-grade security; a startup testing product-market fit might not survive long enough for maintenance debt to matter.</p><p>Not every piece of software needs to be built like it's going to run for twenty years. Some software is disposable, and that's fine. The craft-obsessed approach I advocated in Part 1 might be overkill for a significant portion of what gets built.</p><p>That's uncomfortable to admit. But I think it's true.</p><h2>The Risks are Real</h2><p>I'm not going to relitigate Part 1. The risks I outlined are real: abstraction debt, debugging nightmares, the knowledge gap, the quality illusion. I stand by all of it.</p><p>One thing I didn't emphasize enough in Part 1: the thoughtful practitioners already know the limits.</p><p>There's a prototype phase, where <strong>vibe coding is useful&#8212;rapid iteration</strong>, exploring ideas, testing concepts. And there's a production phase, where engineering discipline matters&#8212;reliability, security, maintainability. The problem isn't that vibe coding exists; it's that the boundary between phases isn't always clear, and the people crossing it often don't realize they're crossing it.</p><p>A business analyst building an internal tool is probably fine. A startup founder building an MVP to test an idea is probably fine. Someone shipping a financial system that processes millions of transactions is not fine. The tool doesn't know the difference. The user has to.</p><p>The other risk worth naming: we're going to see a lot of "almost right" code. Output that's close but not quite. Almost right works until it doesn't. Edge cases. Security holes. Performance issues that only manifest under load. Research on AI-generated code already suggests that <a href="https://arxiv.org/abs/2512.11922">roughly 40% of AI-generated code snippets contain vulnerabilities</a>; "almost right" at scale is a liability, not a shortcut.</p><p>Who fixes it when the person who built it doesn't understand it?</p><p>In a lot of cases, the answer is: a craftsman programmer, cleaning up after someone else's vibe-coded creation. That's already happening. Some are calling it "rescue engineering"&#8212;the maintenance burden that <a href="https://techstartups.com/2025/12/11/the-vibe-coding-delusion-why-thousands-of-startups-are-now-paying-the-price-for-ai-generated-technical-debt/">lands on engineering teams</a> after citizen developers ship something that works until it doesn't. The "vibe coding hangover" is real.</p><p>This might be the new division of labor. Citizen coders generate; engineers audit and maintain. Builders move fast; craftsmen clean up.</p><p>Is that sustainable? I honestly don't know. But it's happening whether we think it's a good idea or not.</p><h2>Where to Draw the Line</h2><p>If you're a CTO watching this unfold, the question isn't whether citizen coding is coming to your organization. It's already there&#8212;or it will be by next quarter.</p><p>The questions worth asking:</p><p>How much of your current backlog is genuinely complex engineering work, and how much is queued simply because nobody with the right skills has bandwidth? Where in your organization are people already building things without engineering oversight&#8212;and what happens when those things break? If a business team built an internal tool with AI tomorrow, would your engineering org know about it? Would they need to?</p><p>The boundary between "fine to vibe-code" and "needs engineering discipline" isn't a bright line. It's a gradient, and your job is to figure out where your organization falls on it. Prototype vs. production. Internal vs. customer-facing. Fifty users vs. fifty thousand. Disposable vs. durable.</p><h2>What I'm Doing</h2><p>While I still reserve a lot of skepticism for vibe-coding and AI-first trends, in 2026 I'm launching a few experiments by building micro-SaaS products alone using teams of agents to handle the end-to-end building and operation of the business. You can follow them:</p><ul><li><p><a href="https://structpr.dev/">StructPR</a> &#8212; Code review, reorganized</p></li><li><p><a href="https://shiplog.ca/">ShipLog</a> &#8212; Feedback board, changelog, and embeddable widget for solo SaaS founders</p></li><li><p><a href="https://auroragrc.com/">AuroraGRC</a> &#8212; Compliance management for Canadian regulations (partially)</p></li></ul><p>This isn't a contradiction. It's a test. If agentic coding can build and operate real businesses that serve real users, I want to see it work&#8212;or fail&#8212;with my own codebase. Maybe the quality problems will materialize. Maybe they won't. But I'd rather find out by building than by speculating.</p><p>I'm ready to fully let go of the wheel and let AI take control, the first micro-saas in this list is specially about code review and organization, because it's a problem that's been bothering me for a long time; and that is being exacerbated by the rise of vibe coding.</p><h2>The Tension That Doesn't Resolve</h2><p>I tried to come up with a neat conclusion, but I don't have one.</p><p>The craftsmen are right that comprehension debt is real. That the junior developer pipeline is collapsing. That we're building systems nobody fully understands and calling it progress.</p><p>The citizen coders are right that the system failed them. That the backlogs were absurd. That gatekeeping kept good ideas from getting built. That not every piece of software needs to be a cathedral.</p><p>Both things are true.</p><p>What we're watching is a profession fragmenting in real time. Not dying&#8212;fragmenting. There will still be craftsmen, working on the systems where deep understanding matters; there will be citizen coders, building things that would have died in backlog purgatory; there will be a lot of mess in the middle, where the boundaries aren't clear and the failures teach us where they should have been.</p><p>The question isn't which side wins. It's whether we can capture the benefits of democratization&#8212;more people building, more ideas tested, fewer bottlenecks&#8212;without drowning in the maintenance debt that Part 1 warned about.</p><p>Nobody has figured that out yet. The answer probably looks different for a weekend project than for financial infrastructure. The boundaries will be learned the hard way, through failures that teach us where vibe coding breaks.</p><p>The craftsmen will say "I told you so." The builders will point to the successes and ask why the craftsmen are still complaining. And both will be partially right, which is the most frustrating kind of disagreement&#8212;the kind that doesn't end.</p><p>I started this series by quoting a programmer who said he never knew there was "an entire subclass of people in my field who don't want to write code." Writing this piece surfaced something I hadn't seen clearly before. They never wanted to be in our field in the first place. They just didn't have another way to build things.</p><p>Now they do.</p><p>The craft isn't dead. But it's no longer the only way to build. That changes everything about who we are as engineers&#8212;and what we're for.</p>]]></content:encoded></item><item><title><![CDATA[Maybe OpenClaw Needed This]]></title><description><![CDATA[The OpenAI acqui-hire of OpenClaw is getting predictable reactions from two camps: "open source capture" from one side, "security nightmare validation" from the other.]]></description><link>https://www.thepragmaticcto.com/p/maybe-openclaw-needed-this</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/maybe-openclaw-needed-this</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Mon, 16 Feb 2026 15:18:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/093783e9-8f05-4148-8bea-098a1cac0671_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The OpenAI acqui-hire of OpenClaw is getting predictable reactions from two camps: "open source capture" from one side, "security nightmare validation" from the other. What's missing from both takes: this might be exactly what OpenClaw needed. Viral hype, one developer burning $10-20K monthly, 1.5 million deployed agents with real security problems that a solo project couldn't solve. Sometimes Big Tech acquisition is the right answer.</p><p>Consider what OpenClaw achieved and what it cost Steinberger to maintain it &#8212; 180,000 GitHub stars in three months, the fastest-growing open-source project in GitHub history, 1.5 million agents deployed in the wild. He built the first prototype in an hour, then found himself maintaining viral-scale infrastructure while bleeding five figures every month. The security establishment raised legitimate concerns: twenty percent of the skills marketplace was malicious, secrets were stored in plaintext, and the permission model broke every traditional security assumption about least-privilege access. One talented developer wasn't going to solve enterprise security architecture, build sustainable infrastructure, and maintain community velocity at the same time.</p><p>Steinberger's own framing matters here: "What I want is to change the world, not build a large company, and teaming up with OpenAI is the fastest way to bring this to everyone." He insisted on the foundation model specifically &#8212; OpenClaw stays open source, the community continues building, but he gets the resources to architect what comes next.</p><p>Compare the alternatives he had on the table. Meta's pitch was to turn OpenClaw proprietary, layer it on their infrastructure, and build agentic commerce on top of three billion users. OpenAI's pitch: keep it open, establish the foundation, bring Steinberger in to design the next generation with actual engineering resources behind him. For someone who built PSPDFKit to a 100 million euro outcome and understands open-source sustainability economics, the choice tracks.</p><p>The security problems were real and growing faster than one person could address them. Twenty percent malicious skills in the marketplace; plaintext credential storage in home directories; permission models that Cisco, CrowdStrike, and Sophos correctly identified as fundamentally broken for autonomous agents. OpenClaw needed dedicated security engineering, infrastructure designed for scale, and governance frameworks that could actually constrain agent behavior &#8212; not just more GitHub issues and community PRs from well-meaning contributors.</p><p>The foundation model directly addresses the "capture" concern that has everyone worried. Steinberger could have taken Meta's offer, gone fully proprietary with a massive user base built in, and secured a significant exit. Instead: open source continues, OpenAI commits to support the foundation, and the community maintains access to the project that went viral. It's the Chrome/Chromium playbook, which deserves its criticisms around governance and influence, but it's categorically different from "promising startup gets acquired and shut down."</p><p>Not every open-source project needs to stay solo to stay pure; some ideas hit a scale where they need institutional backing to reach their potential without collapsing. OpenClaw hit viral velocity before it had infrastructure that could support that velocity, and Steinberger was funding the gap personally while the security problems multiplied. The real question wasn't "acquire or stay independent" &#8212; it was "which acquisition structure preserves what made this valuable while solving the sustainability and security crisis."</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Pragmatic CTO is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The real test is what happens in the next six months. Does the foundation maintain actual independence, or does it become a rubber stamp for whatever OpenAI wants? Does OpenAI's internal agent work stay aligned with the open-source version, or do they diverge into proprietary territory? Does the security architecture get rebuilt with proper engineering resources, or does it get ignored because shipping agents is more important than securing them? We'll know soon enough.</p>]]></content:encoded></item><item><title><![CDATA[Shielding the Team Doesn't Mean Silence]]></title><description><![CDATA[You didn't shield too much. You communicated too little.]]></description><link>https://www.thepragmaticcto.com/p/shielding-the-team-doesnt-mean-silence</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/shielding-the-team-doesnt-mean-silence</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Sat, 14 Feb 2026 14:54:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/543ab1a7-b3e5-4774-8ecb-67b887812097_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>You didn't shield too much. You communicated too little.</em></p><div><hr></div><p>Roman Nikolaev, a Head of Technology who writes a weekly engineering leadership newsletter, posted this on LinkedIn:</p><blockquote><p>"I was taught that a manager's role is to shield the team. Let coders code. This is wrong. Engineers need exposure to the business side. They need to understand why they are doing what they are doing, who the client is and how their work helps her. 'Let coders code' is a dangerous fallacy."</p></blockquote><p>He is partially right; but his framing is completely wrong.</p><p>Roman presents shielding and context as opposites; as if protecting your team's focus means keeping them ignorant of the business. That's a false choice.</p><p>Shielding the team never meant siloing away from the rest of the company.</p><p>It meant filtering out noise (organizational politics, thrashing priorities, executive drive-by requests) so engineers could focus on work that matters. Done right, shielding is the mechanism that <em>delivers</em> context; it removes the static so the signal gets through. If your version of "shielding" produced engineers who didn't understand the business, you weren't shielding too much; <strong>you were communicating too little.</strong> The job isn't to choose between protecting focus and providing understanding. The job is to do both.</p><p>Roman's post resonated with me because it is a pattern I seen come up time and time again: managers who absorb everything, translate nothing, and call it protection.</p><p>You've probably seen it, and even worse, suffered the consequences. A well-meaning engineering manager who thinks "shielding" means absorbing every piece of organizational stress until they burn out or their team operates in a vacuum. That pattern is a failure of management, not a failure of shielding; conflating the two leads to the wrong solutions. And the wrong fix, and eventually it creates a different kind of dysfunction.</p><h2>What "Shield the Team" Means</h2><p>The concept didn't emerge from nowhere. Robert Sutton, a Stanford professor, wrote the foundational case for managerial shielding in Harvard Business Review back in 2010. His argument was straightforward: "<a href="https://hbr.org/2010/09/managing-yourself-the-boss-as-human-shield">the best bosses identify and slay those dragons, thereby protecting the time and the dignity of their people and enabling them to focus on real work</a>." He cited William Coyne, former R&amp;D head at 3M, who was determined to let his teams work for long stretches -- unfettered by intrusions from higher-ups. <strong>Good bosses reduced outside distractions</strong>; they streamlined processes, championed focus time, and occasionally defied their own bosses when necessary.</p><p>What teams need to be shielded from is pretty obvious. Unnecessary meetings that consume hours without producing clarity. Thrashing priorities -- executives changing direction every two weeks. Drive-by requests from stakeholders who skip the prioritization process. Scope creep and last-minute changes that undermine sprint commitments. Conflicting priorities from multiple stakeholders who haven't aligned with each other.</p><p>None of that is about hiding information. None of it requires keeping engineers ignorant of the business. <strong>Every item on that list is noise; none of it is signal.</strong></p><p>Shielding was never about preventing engineers from understanding customers or removing them from strategic conversations; it was about protecting the conditions under which they could do their best thinking. The distinction matters because Roman's post conflates the practice with its worst misapplication. If someone told you "shield the team" meant "keep them ignorant," they taught you wrong.</p><h2>Filter the Noise, Translate the Signal</h2><p>A manager's job isn't to choose between protecting focus and providing context. It's one job with two halves: filter the noise, then translate the signal.</p><p>Shield <strong>FROM</strong> the things that destroy focus without adding understanding. Shield <strong>WITH</strong> the things that turn code into informed decisions -- business context for why this work matters, customer understanding for who benefits and how, strategic clarity for where this fits in the bigger picture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uFrc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uFrc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 424w, https://substackcdn.com/image/fetch/$s_!uFrc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 848w, https://substackcdn.com/image/fetch/$s_!uFrc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 1272w, https://substackcdn.com/image/fetch/$s_!uFrc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uFrc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png" width="900" height="436" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:436,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thepragmaticcto.com/i/187954201?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uFrc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 424w, https://substackcdn.com/image/fetch/$s_!uFrc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 848w, https://substackcdn.com/image/fetch/$s_!uFrc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 1272w, https://substackcdn.com/image/fetch/$s_!uFrc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4077979e-623b-4dd9-88b0-8cb5bbced420_900x436.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The difference between bad shielding and good shielding isn't volume; it's direction. Bad shielding sounds like: "Don't worry about it, just build what's in the ticket."</p><p>Good shielding sounds like: "The board is pushing for Q3 delivery on three competing priorities. I've negotiated us down to one. This is the one that matters most to the business, and here's the customer problem it solves."</p><p>Both are shielding. Only one leaves engineers in a contextless void.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5qyq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5qyq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 424w, https://substackcdn.com/image/fetch/$s_!5qyq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 848w, https://substackcdn.com/image/fetch/$s_!5qyq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 1272w, https://substackcdn.com/image/fetch/$s_!5qyq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5qyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png" width="900" height="436" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:436,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37801,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thepragmaticcto.com/i/187954201?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5qyq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 424w, https://substackcdn.com/image/fetch/$s_!5qyq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 848w, https://substackcdn.com/image/fetch/$s_!5qyq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 1272w, https://substackcdn.com/image/fetch/$s_!5qyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3179d0ed-e932-4a61-9b1e-62df07d13b71_900x436.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This isn't a new idea. Stefan Wolpers has called the "developers just code" mentality <a href="https://www.scrum.org/resources/blog/developers-code-fallacy-making-your-scrum-work-9">pure Taylorism</a> (industrial-era thinking applied to knowledge work). "In a complex environment, those closest to a problem are best suited to make the right decision to solve it." They can't make those decisions in a vacuum; they need context, customer proximity, and an understanding of business constraints.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thepragmaticcto.com/subscribe?"><span>Subscribe now</span></a></p><p>Gergely Orosz has documented the same pattern from the engineer's side -- <a href="https://blog.pragmaticengineer.com/the-product-minded-engineer/">product-minded engineers</a> who understand the business become key contributors and team leads. Engineers don't develop product sense by being shielded from it; they develop it when managers filter the noise and let the signal through.</p><h2>Where Roman Has a Point</h2><p>Roman isn't reacting to nothing. Plenty of managers do exactly what he describes -- they absorb every piece of organizational input, translate none of it, and call the result "protecting the team." The protective bubble is a documented anti-pattern; Jade Rubick has argued that the "shit shield" mentality <a href="https://www.rubick.com/shit-shield/">frames the rest of the organization as the enemy</a> and prevents the cross-team collaboration that makes organizations work. "An organization composed of self-protecting teams," Rubick writes, "isn't an effective organization."</p><p>The <a href="https://www.manager.dev/articles/hero-engineering-manager-syndrome">hero manager syndrome</a> makes it worse. The Fixer solves every hard problem personally, robbing the team of growth opportunities. The Shit Umbrella absorbs organizational chaos to create a false sense of stability; the team is unprepared when reality eventually breaks through. The Hen fights every battle on the team's behalf, even battles nobody asked them to fight. The pattern across all three is the same: "This protection from reality is not leadership -- it's infantilizing."</p><p>These are legitimate dysfunctions. I have seen them. Roman has probaby seen them. Most of us have.</p><p>The problem isn't that these managers shielded too much; it's that they shielded badly. Removing the shield doesn't fix the underlying failure -- it just replaces one kind of dysfunction with another. Engineers exposed to raw organizational chaos without filtering or context don't become empowered; they become paralyzed. Ronald Heifetz and Marty Linsky put it precisely: <strong>"Leadership is disappointing people at a rate they can absorb."</strong></p><p>Not all at once. Not never. At a rate they can absorb.</p><h2>Where This Breaks Down</h2><p>This framework isn't universal.</p><p>In crisis mode, everyone needs to know everything. When the production system is down or the company is facing an existential threat, filtering is information suppression; the framework assumes normal operating conditions.</p><p>On small teams (five people, early stage) there's no noise to filter. Everyone is already in every conversation. Shielding is a scaling function; it matters more as organizational complexity grows.</p><p>If you're the only channel between engineering and the rest of the organization, you haven't built a shield; you've built a bottleneck. Good shielding includes creating direct connections where appropriate -- not making yourself the permanent intermediary.</p><p>And if trust is broken, none of this works. A team that doesn't trust their manager to filter accurately will hear "I'm shielding you" as "I'm hiding things from you." That's not a framework problem. That's a leadership problem.</p><h2>What I Do</h2><p>Since the first time I became an engineering manager, I've tried to be the kind of manager I wanted to have. That meant seeding context early and often -- to the point of overcommunication.</p><p>My teams know when sales missed targets this quarter and what that means for the product org. They know where pressure is likely to come from before it arrives. If a board conversation is going to shift priorities, they hear my translation of it before the mandate lands. Not every detail; not the political maneuvering behind it. The signal: what changed, why it changed, and what it means for the work in front of them.</p><p>I believe in hiring smart, talented people. That kind of person does their best work with context, trust, and space to execute their craft. Remove any one of those three and you get diminished output from someone capable of much more. Context without trust feels like surveillance. Trust without context feels like abandonment. Space without either feels like neglect.</p><p>The trade-off is real; overcommunication takes time, and not every update lands the way you intend. Some context creates anxiety rather than clarity. I'm still learning and calibrating after years of doing this. But the failure mode I fear most isn't sharing too much -- it's the engineer who builds the wrong thing because nobody told them why it mattered.</p><div><hr></div><h2>Questions to ask yourself</h2><ul><li><p>Can your engineers explain why they're building what they're building this sprint? Not the what -- the why. If they can't, you're not shielding. You're siloing.</p></li><li><p>When was the last time you filtered something out that your team genuinely didn't need to know? If you can't remember, you might be passing through too much noise.</p></li><li><p>Do your engineers know who their users are? Not the persona document. The people. If not, that's not a shielding problem; that's a connection problem.</p></li><li><p>Are you the only channel between your team and the rest of the organization? If yes, you've built a bottleneck, not a shield.</p></li></ul><p><em>Shielding the team doesn't mean silence. It means deciding what's noise and what's signal -- and making sure the signal gets through.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thepragmaticcto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Pragmatic CTO is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Audio: Shielding the Team Doesn't Mean Silence]]></title><description><![CDATA[Shielding your engineering team isn&#8217;t about keeping them in the dark.]]></description><link>https://www.thepragmaticcto.com/p/audio-shielding-the-team-doesnt-mean</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/audio-shielding-the-team-doesnt-mean</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Sat, 14 Feb 2026 14:47:42 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/187954522/4d74fe14ee96b004f36b7eb38ce22be6.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Shielding your engineering team isn&#8217;t about keeping them in the dark. It&#8217;s about cutting through the noise so they get the real message&#8212;why their work matters, who it helps, and what success looks like. Too many managers mistake shielding for silence, but that&#8217;s a failure of communication, not protection.</p><p>Roman Nikolaev recently argued that shielding means letting coders just code, but that&#8217;s a false choice. Shielding isn&#8217;t about hiding the business from engineers; it&#8217;s about filtering out distractions like politics, shifting priorities, and last-minute requests while still providing clear context. If your team doesn&#8217;t understand the business, it&#8217;s not because you shielded too much&#8212;it&#8217;s because you communicated too little.</p><p>The idea of shielding comes from Robert Sutton&#8217;s work, which shows good managers protect their teams by slaying dragons&#8212;meaning they reduce distractions so engineers can focus on meaningful work. That doesn&#8217;t mean locking them away from customers or strategy. It means cutting out meetings that waste time, conflicting priorities, and scope creep, none of which add value or clarity. Shielding is about protecting the environment for good work, not creating ignorance.</p><p>The key is to filter noise and translate signal. Shield your team from organizational chaos, but shield them with context: why this work matters, who benefits, and how it ties into strategy. Bad shielding sounds like &#8220;Just build what&#8217;s in the ticket.&#8221; Good shielding sounds like &#8220;We had three competing priorities, but I fought to focus us on the one that solves this key customer problem.&#8221; Both are shielding, but only one empowers engineers with understanding.</p><p>This isn&#8217;t new thinking. Labeling engineers as just coders is industrial-era Taylorism. In complex environments, those closest to the problem must make decisions&#8212;and they need context to do that. Product-minded engineers who understand the business become invaluable contributors and leaders. They don&#8217;t develop that insight by being shielded from it&#8212;they develop it when managers do the hard work of filtering noise and letting the signal through.</p><p>Roman is right to call out managers who hoard information and call it protecting the team. This &#8220;shit shield&#8221; mentality treats the rest of the company as the enemy and stifles collaboration. The &#8220;hero manager&#8221; who fixes every problem and absorbs all chaos leaves their team unprepared and infantilized. This isn&#8217;t leadership; it&#8217;s dysfunction. The problem isn&#8217;t shielding too much&#8212;it&#8217;s shielding badly. Removing the shield entirely just exposes engineers to chaos they can&#8217;t handle. Leadership is about disappointing people at a pace they can absorb, not dumping the whole mess on them at once.</p><p>That said, this framework isn&#8217;t universal. In crises, no filtering happens&#8212;everyone needs full visibility. Small teams don&#8217;t need shielding because they&#8217;re already in every conversation. And if you&#8217;re the only channel between engineering and the company, you&#8217;ve created a bottleneck, not a shield. Plus, if your team doesn&#8217;t trust you to filter accurately, they&#8217;ll hear &#8220;shielding&#8221; as &#8220;hiding.&#8221; That&#8217;s a leadership failure, not a shielding problem.</p><p>In my experience, the best managers overcommunicate context. My teams know when sales miss targets, what that means for product, and when board decisions might shift priorities before it lands on their desks. Not every detail, but the signal: what changed, why, and what it means for their work. I&#8217;ve learned it&#8217;s about balancing context, trust, and space to do their craft. Remove any one, and you get diminished output. Too much context without trust feels like surveillance; trust without context feels like abandonment; and space without either feels like neglect. It&#8217;s a trade-off, but the worst failure is an engineer building the wrong thing because no one explained why it mattered.</p><p>Ask yourself: can your engineers explain <em>why</em> they&#8217;re building what they&#8217;re building this sprint? If not, you&#8217;re siloing, not shielding. When did you last filter out something your team didn&#8217;t need to know? Do they know who their users really are? And are you the only communication channel to the rest of the company? If yes, you&#8217;ve built a bottleneck, not a shield.</p><p>Shielding the team doesn&#8217;t mean silence. It means filtering noise, translating signal, and making sure your team has both focus and context.</p><p>You can read the full article&#8212;with all the data and sources&#8212;on ThePragmaticCTO Substack.</p><div><hr></div><p>Read the full article &#8212; with all the data and sources &#8212; <a href="https://www.thepragmaticcto.com/publish/post/187954201">on ThePragmaticCTO</a>.</p>]]></content:encoded></item><item><title><![CDATA[The OpenClaw Gold Rush]]></title><description><![CDATA[When the Wrapper Economy Outruns the Security Response]]></description><link>https://www.thepragmaticcto.com/p/the-openclaw-gold-rush</link><guid isPermaLink="false">https://www.thepragmaticcto.com/p/the-openclaw-gold-rush</guid><dc:creator><![CDATA[Allan MacGregor 🇨🇦]]></dc:creator><pubDate>Fri, 13 Feb 2026 14:31:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/52333f40-8555-4f2a-86b9-62c2041ac17b_1536x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>When the Wrapper Economy Outruns the Security Response</h2><p>On February 3, 2026, <a href="https://www.theregister.com/2026/02/03/openclaw_security_problems/">The Register called OpenClaw a "security dumpster fire."</a> One day later, the same publication reported that <a href="https://www.theregister.com/2026/02/04/cloud_hosted_openclaw/">cloud providers were rushing to deliver OpenClaw-as-a-service offerings</a>.</p><p>Twenty-four hours apart.</p><p>I wrote about <a href="https://thepragmaticcto.substack.com/p/the-openclaw-saga">OpenClaw's security collapse</a> two weeks ago---nine vulnerability classes, <a href="https://censys.com/blog/openclaw-in-the-wild-mapping-the-public-exposure-of-a-viral-ai-assistant">42,665 exposed instances</a>; a one-click RCE not patched until January 30. The ecosystem's response to that crisis was not to slow down. It was to accelerate.</p><p>Within days of OpenClaw crossing <a href="https://growth.maestro.onl/en/articles/openclaw-viral-growth-case-study">150,000 GitHub stars</a>, an entire economy of hosting providers, managed services, and "enterprise-ready" wrappers appeared---from cloud giants like Alibaba and DigitalOcean to two-person startups backed by Y Combinator. <a href="https://www.gartner.com/en/documents/7381830">Gartner's assessment</a> of the underlying product: "It is not enterprise software. There is no promise of quality, no vendor support, no SLA." Their recommendation: "Immediately block OpenClaw downloads and traffic."</p><p>The wrapper companies are selling trust around a product that the industry's most cited analyst firm told you to block.</p><h2>The Ecosystem That Appeared Overnight</h2><p>Start with the poster child. <a href="https://www.ycombinator.com/launches/POK-klaus-get-your-openclaw-personal-assistant-in-5-minutes">Klaus</a>, built by a <a href="https://www.ycombinator.com/companies/bits-2">YC-backed startup called Bits</a>, promises a hosted OpenClaw instance set up in three minutes. Two founders. Two employees. Their marketing claims include "malware protection"---undefined, unaudited---and they pre-configure Moltbook integration by default; this is the same Moltbook whose <a href="https://www.wiz.io/blog/exposed-moltbook-database-reveals-millions-of-api-keys">database leaked 1.5 million API keys</a> in January.</p><p>Klaus launched while <a href="https://snyk.io/blog/openclaw-skills-credential-leaks-research/">Snyk was still finding</a> that 7.1% of all ClawHub skills leaked credentials; <a href="https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html">Koi Security was simultaneously cataloging 341 malicious ones</a>.</p><p>They are not alone. <a href="https://finance.yahoo.com/news/openclaw-introduces-secure-hosted-clawdbot-204800756.html">OpenClawd.ai appeared in late January</a> claiming "security built into the infrastructure layer." <a href="https://www.digitaljournal.com/pr/news/access-newswire/myclaw-ai-launches-world-s-first-one-click-1809618601.html">MyClaw.ai published a press release on February 5</a> calling itself "the world's first fully managed" OpenClaw deployment, starting at $9 per month, with "full root-level access" to each instance---marketing the core security risk as a feature. MyClawHost, OpenClaw Host, <a href="https://kilo.ai/kiloclaw">Kilo Claw</a>, BoostedHost: all appeared within days.</p><p>The cloud providers moved just as fast. DigitalOcean added one-click deployment; <a href="https://www.theregister.com/2026/02/04/cloud_hosted_openclaw/">Alibaba Cloud launched across 19 regions</a> at $4 per month; Tencent Cloud followed with one-click installs for its Lighthouse service.</p><p>Then came the picks-and-shovels crowd. <a href="https://superframeworks.com/articles/openclaw-business-ideas-indie-hackers">One indie hacker reported $3,600 in month one; another closed a five-figure deal by day five.</a> Setup consulting, skill development, templates---the gold rush playbook, executed in real time.</p><p>For context: OpenClaw went from <a href="https://growth.maestro.onl/en/articles/openclaw-viral-growth-case-study">9,000 to 157,000 GitHub stars in 60 days</a>---roughly 1,667 stars per day. Kubernetes took approximately three years to reach 100,000 stars, about 91 per day. OpenClaw's growth rate was 18 times faster. The wrapper ecosystem materialized at a pace that makes Docker's early hosting boom look leisurely.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DT6z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DT6z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DT6z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DT6z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DT6z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DT6z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg" width="728" height="300.9066666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:496,&quot;width&quot;:1200,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:53736,&quot;alt&quot;:&quot;GitHub Star Growth: Daily Average&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="GitHub Star Growth: Daily Average" title="GitHub Star Growth: Daily Average" srcset="https://substackcdn.com/image/fetch/$s_!DT6z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DT6z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DT6z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DT6z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffae2fc9-f388-4b58-8a50-484855e6e58f_1200x496.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Trust Chain</h2><p>Walk through the dependency chain that connects these wrapper companies to your enterprise.</p><ul><li><p><strong>Link one:</strong> Peter Steinberger, a solo developer who builds with what he calls "ambient programming" and has&nbsp;<a href="https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code">said publicly that he ships code he has never read</a>.</p></li><li><p><strong>Link two:</strong> the OpenClaw codebase itself, with nine independent vulnerability classes documented by security researchers; a <a href="https://nvd.nist.gov/vuln/detail/CVE-2026-25253">one-click remote code execution vulnerability</a> that scored 8.8 on the CVSS scale and was not patched until January 30.</p></li><li><p><strong>Link three:</strong> ClawHub, the extension marketplace, where between <a href="https://snyk.io/blog/openclaw-skills-credential-leaks-research/">7%</a> and <a href="https://securityboulevard.com/2026/02/from-clawdbot-to-moltbot-to-openclaw-security-experts-detail-critical-vulnerabilities-and-6-immediate-hardening-steps-for-the-viral-ai-agent/">20% of all skills</a> were found to be malicious---<a href="https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html">341 deploying malware</a>, 283 leaking credentials.</p></li><li><p><strong>Link four:</strong> a wrapper company that appeared last week, run by a two-person team, with no SOC 2 certification, no published security documentation, and no third-party penetration test.</p></li><li><p><strong>Link five:</strong> your enterprise.</p></li></ul>
      <p>
          <a href="https://www.thepragmaticcto.com/p/the-openclaw-gold-rush">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>