StrongDM’s Software Factory throws down a radical challenge: no human writes code, no human reviews it, and you better be spending at least a thousand dollars a day in tokens per engineer to keep up. They’ve skipped every safety net most of us rely on and gone all-in on agentic AI development. This is either the future of software or the blueprint for a disaster waiting to happen.
But before we get skeptical, credit where it’s due: the engineering behind this is impressive. They don’t just throw AI at the problem; they build structured, spec-driven workflows. The cleverest idea is using “scenarios” as holdout sets — user stories stored outside the codebase that AI agents can’t see, preventing them from gaming their own tests. It’s a principle borrowed from machine learning, where you never train on your test set. Then there’s their Digital Twin Universe — full behavioral clones of third-party services like Okta and Slack, running thousands of tests at scale without API costs or rate limits. This isn’t casual; it’s a methodical, iterative approach to growing correctness, not just generating code once and shipping.
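The holdout idea translates directly into code, and it's worth seeing why it works. Below is a minimal sketch (my own reconstruction, not StrongDM's implementation) of the pattern: scenarios live in a directory outside the agents' workspace, so the code-writing agents never see them, and the built system is exercised against them as a black box. The directory path, JSON shape, and function names are all assumptions for illustration.

```python
import json
from pathlib import Path

# Hypothetical layout: scenarios live OUTSIDE the repo the agents can read,
# so the code-writing agents cannot tailor the implementation to the tests.
SCENARIO_DIR = Path("/verification/scenarios")  # not under the agents' workspace

def load_scenarios(scenario_dir: Path) -> list[dict]:
    """Load holdout scenarios: user stories as input/expected-output pairs."""
    return [json.loads(p.read_text()) for p in sorted(scenario_dir.glob("*.json"))]

def run_holdout(system_under_test, scenarios: list[dict]) -> dict:
    """Exercise the built system against scenarios it has never seen."""
    results = {"passed": 0, "failed": []}
    for s in scenarios:
        actual = system_under_test(s["input"])
        if actual == s["expected"]:
            results["passed"] += 1
        else:
            results["failed"].append(s["name"])
    return results
```

The key design choice mirrors machine learning's train/test split: the verification data never flows into the generation loop, so passing the holdout set is evidence of generalization, not memorization.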
But here’s the rub: the numbers don’t support skipping human review. CodeRabbit’s December 2025 report analyzed hundreds of real-world pull requests and found AI-generated code had 1.4 times more critical issues and 1.7 times more major issues than human code. Security vulnerabilities doubled, readability issues tripled, and performance problems were eight times more frequent. Veracode and FormAI studies confirm half or more of AI-generated code samples have security flaws. Now imagine this in StrongDM’s context — software controlling enterprise access. Trusting AI alone on security-critical code is a gamble with catastrophic downside.
And it gets worse. Real-world failures have already happened even with some human oversight in place: Replit's AI agent wiped a live production database during a code freeze, and Moltbook leaked 1.5 million API keys because AI-generated schemas lacked essential security settings. StrongDM's model removes human review entirely (no code writers, no reviewers), so the guardrails that failed with humans won't exist at all. When no one understands the code, who investigates the failures? Incident response and compliance become nightmares if the audit trail is just AI conversations.
StrongDM's answer to verification is the holdout sets, but who writes those? If humans do, you haven't eliminated human review; you've just moved it upstream. If AI writes the scenarios too, the regress only deepens: agents verifying agents verifying agents. And software edge cases are unbounded; you can't test what you haven't imagined. The missing security setting behind Moltbook's breach is a perfect example. The most brittle part breaks the system, and in security software, that brittle part is the attacker's first target.
The economics add another layer of complexity. Spending $1,000 per day per engineer on tokens works out to roughly $240,000 a year (assuming about 240 working days), more than the median software engineer salary. StrongDM builds high-priced enterprise security software, so maybe the math works there. But for most startups, and for broader software development, the cost is prohibitive. There's a strategic wrinkle too: if AI can build your product from specs, it can build your competitor's. Your moat shifts from code to your scenario library, which is essentially documentation and far easier to copy.
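The arithmetic behind that figure is worth making explicit. The 240-working-day assumption is my inference from the stated numbers, not anything StrongDM has published:

```python
# Back-of-envelope token economics.
# Assumptions (mine, not StrongDM's): $1,000/day floor, ~240 working days/year.
daily_token_spend = 1_000       # USD per engineer per day
working_days_per_year = 240     # the article's $240k figure implies roughly this
annual_spend = daily_token_spend * working_days_per_year
print(annual_spend)  # 240000
```

If the agents run every calendar day instead, the figure climbs to $365,000 per engineer, which only sharpens the point about who can afford this model.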
There is a middle ground. Sam Schillace, Microsoft’s Deputy CTO and creator of Google Docs, lays out “Coding Laws for LLMs” that are pro-AI but insist on human oversight. His key point: don’t write code if AI can do it, but always keep human validation checkpoints. Treat models as tools, not autonomous agents. StrongDM’s rules directly contradict these principles. Given the data and incidents, the evidence supports keeping humans in the loop for now.
What about the engineers? If no human writes or reviews code, what do they do? The optimistic spin is they become supervisors and architects, focusing on high-level design and domain expertise. The harsher truth is you’re shifting from coding to prompt engineering and scenario design — valuable but fewer roles overall. More critically, when no one writes or reviews code, the team loses shared understanding and the mental model of the system decays. Maintenance, debugging, and evolution get harder, not easier.
The real test is happening now: StrongDM is being acquired by Delinea, a major identity security player. Will they keep the “no human review” approach for security-critical products once compliance and risk are on the table? Or will human oversight return? That answer will tell us more than any manifesto about whether the dark factory model is viable or just an experiment.
As for me, I’m not embracing the dark factory. The data doesn’t justify removing human review, especially in security-sensitive contexts. But I’m borrowing ideas: keeping verification scenarios outside the codebase is smart, and smaller-scale digital twins or mocks for integration testing are worth exploring. I’m watching the trajectory carefully but won’t abandon human judgment until the numbers say it’s safe.
The Software Factory isn’t about vision or ambition alone — it’s about evidence. Their holdout set concept is worth adopting. Their engineering deserves respect. Their philosophy is provocative but premature. The real question for every CTO is: what defect rate would make you comfortable trusting AI without human review? Are we there yet? For now, the answer is no.
You can read the full article — with all the data and sources — on ThePragmaticCTO Substack.