The Pragmatic CTO Podcast

Audio: Lines of Code Are Back (And It's Worse Than Before)

The software industry once agreed on one thing: lines of code are a terrible metric. Dijkstra called counting lines “a very costly measuring unit,” and Bill Gates said it’s like measuring airplane construction by weight. For decades, experts agreed that lines of code don’t measure software quality or productivity. But then AI showed up, and suddenly lines of code are back—and worse than before.

Every major tech CEO is boasting about the percentage of their code generated by AI. Google, Microsoft, Meta, Anthropic—reported figures range from 25 to 90 percent. These numbers get trumpeted as progress, but nobody reports how much of that AI-generated code is buggy, rejected, or thrown away. The industry has replaced lines of code with lines of AI code, and the obsession has spilled into social media culture, where volume is celebrated over value.

Goodhart’s Law teaches us that when a measure becomes a target, it ceases to be a good measure. Lines of code failed as a human metric because developers could game it by writing verbose, unnecessary code. Now, with AI, the metric is infinitely gameable. An AI can generate thousands of lines at zero cost, turning a flawed metric into a meaningless one. We’re not just repeating history—we’re running it without guardrails. More lines mean less comprehension, not more.

The cost of optimizing for volume is showing up in the data. GitClear analyzed millions of lines and found copy-pasted code rising sharply, refactoring plummeting, and code churn doubling. In 2024, more duplicate code was generated than refactored. Meanwhile, a controlled study showed developers using AI took 19% longer to complete tasks—even though they believed they were faster. Trust in AI accuracy is falling, with more developers actively distrusting AI than trusting it. Security reports reveal nearly half of AI-generated code contains vulnerabilities. More code is producing worse outcomes, yet we treat the volume as a badge of honor.

The industry tried to switch to acceptance rate—the percentage of AI suggestions developers accept—as a better metric. But it suffers from the same flaws, plus new ones. Accepting code doesn’t mean it’s good or understood; developers may accept suggestions just to avoid the cognitive load of constant evaluation. Acceptance rate ends at the moment of acceptance, ignoring what happens afterward: bugs, rewrites, or comprehension gaps. We keep chasing simple numbers that claim to measure productivity, but no single figure captures the complexity of software development.

That said, lines of code aren’t always meaningless. They can help estimate project scope, track codebase growth trends, or indicate AI adoption levels. AI coding tools themselves are not the problem—when used by experts who know what to build, they can speed up development. But measuring AI’s value by lines produced is like measuring a surgeon’s skill by the number of incisions made. More code isn’t better software.

So what should we measure instead? The answer is to shift from inputs—lines written, suggestions accepted—to outcomes: what happens after the code is written. Four metrics survive this test. Time-to-value measures how long from idea to working feature; this is what really matters. Code half-life tracks how long new code lasts before needing revision; healthy code endures. Defect origin rate compares bugs introduced by AI versus humans, helping calibrate review processes. And comprehension coverage asks whether your team understands every critical path in the system. If the answer is no, you have a ticking time bomb.
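Of the four, code half-life is the most mechanical to compute. As a rough illustration only, here is a minimal sketch of the idea, using a hypothetical toy dataset of line lifetimes rather than real git history (a real analysis would extract this from blame/churn data, and would handle still-surviving lines with proper survival statistics instead of the simple cutoff used here):

```python
from datetime import date
from statistics import median

# Hypothetical records: (date a line was added, date it was rewritten
# or deleted; None means it still survives). Real data would come from
# mining git history, not this toy list.
line_histories = [
    (date(2024, 1, 10), date(2024, 1, 24)),  # churned in two weeks
    (date(2024, 1, 10), date(2024, 6, 1)),
    (date(2024, 2, 1), None),                # still alive
    (date(2024, 2, 1), date(2024, 2, 15)),
]

def code_half_life_days(histories, today=date(2024, 12, 31)):
    """Median lifetime in days of the lines observed; lines that still
    survive are counted up to `today` (a simplification that understates
    their true lifetime)."""
    lifetimes = [((end or today) - start).days for start, end in histories]
    return median(lifetimes)

print(code_half_life_days(line_histories))
```

A rising half-life suggests new code is settling in and enduring; a falling one suggests churn, the pattern the GitClear data flags.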

Good metrics measure what happens after writing, not during. Typing speed was never the bottleneck—it’s understanding, design, and judgment. Yet we keep measuring the wrong thing.

At LiORA, we focus on time to value, customer impact, and trust—not on lines of code. We use AI mostly to assist with code review, improving quality and speed. These metrics are harder to capture, but they point in the right direction. I’m open to being wrong, but I’d rather measure the hard things poorly than the easy things precisely. It’s better to explain nuanced metrics to your board than to explain why you shipped code nobody understands.

When your board asks what percentage of your code is AI-generated, ask yourself: is that what they really want to know? If AI disappeared tomorrow, would your team ship slower—or just write less code? How much of your codebase can your team explain? The bottleneck was never typing speed. It was understanding, design, and judgment. Lines of code measured the wrong thing when humans wrote code. They measure even less now that machines do. The real question for every CTO is not how much code you’re generating, but how much of that code should exist at all.

You can read the full article—with all the data and sources—on ThePragmaticCTO Substack.

