The Pragmatic CTO
The Pragmatic CTO Podcast
Audio: When AI Agents Write Your Code, Does Language Choice Matter?
Jose Valim recently made a bold claim: Elixir is the best language for AI code generation, based on benchmarks showing high completion rates and structural benefits like immutability and ecosystem stability. But this sparks a deeper question—not which language is best, but whether the choice even matters when AI agents write a large chunk of your code.

The strongest version of that argument isn't really about Elixir's specifics. It's about typed, functional languages like Scala, Haskell, or Rust, where the compiler acts as an AI code reviewer: immediate, strict feedback forces AI-generated code to be correct before it ever reaches human eyes. The AI can't just spit out code that might fail later; it has to meet the compiler's standards upfront, which cuts down bugs and lets your engineers focus on design rather than chasing type errors. Python and JavaScript lack that gatekeeper: AI outputs code that might or might not work, leaving bugs for humans to find later. Functional, stateless code also fits the AI's own mode of operation: small, pure functions with explicit inputs and outputs. Mutable object-oriented code, by contrast, demands context beyond what a model's limited context window can hold. As Jonathan de Montalembert put it, "The more flexible and forgiving the language, the more dangerous the AI partner becomes."
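To make the contrast concrete, here's a minimal sketch in Python. The names (`CartMutable`, `Cart`, `add_item`) are hypothetical examples, not from the article; the point is that the pure version is a self-contained contract an AI agent can generate and a reviewer can verify in isolation, while the mutable version's correctness depends on hidden state and call order:

```python
from dataclasses import dataclass, replace

# Mutable, stateful style: the invariant "total matches items" lives only
# in the programmer's head, so verifying any one method requires
# reconstructing the whole object's history.
class CartMutable:
    def __init__(self):
        self.items = []
        self.total = 0.0

    def add(self, price):
        self.items.append(price)
        self.total += price

# Functional, stateless style: immutable data plus pure functions with
# explicit inputs and outputs. Each function can be checked on its own.
@dataclass(frozen=True)
class Cart:
    items: tuple = ()

def add_item(cart: Cart, price: float) -> Cart:
    # Returns a new Cart; the original is never modified.
    return replace(cart, items=cart.items + (price,))

def total(cart: Cart) -> float:
    return sum(cart.items)
```

Nothing here requires a functional language; it's a style choice that happens to match how AI agents work best.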

That’s compelling, but theory runs into a training data wall. Scott Arbeit showed that even with a language like F#, which ticks all the theoretical boxes, AI models often produce invalid syntax or default to more popular languages like C#. Less popular languages suffer from a vicious cycle of limited training data leading to poor AI output, which suppresses adoption and further reduces data. Meanwhile, Python dominates AI-generated code simply because models have seen more of it—80% of AI agent implementations use Python. Even the Tencent benchmark supporting Elixir had flaws: it filtered out harder problems for low-resource languages, skewing results, and practitioners report better real-world AI reliability with JavaScript or Kotlin. So, while typed functional languages might produce better code in theory, in practice, AI models do better with popular languages they know well.

But here’s the part nobody talks about enough: comprehension debt. AI-generated code can compile, pass tests, even ship—and yet nobody on your team understands how it works. This gap between code behavior and team understanding is insidious. When something breaks, the team can’t trace the logic or confidently modify the system. Peter Naur said decades ago that software is really about the team’s mental model, not just the code itself. AI doesn’t build that theory; it just generates solutions. If your team can’t read or reason about the language AI uses, the codebase becomes a liability, no matter how correct the AI’s output is. So “switch to Elixir because AI writes better Elixir” only works if your team can own Elixir code. Otherwise, mediocre code in a familiar language beats perfect code nobody understands.

And there are bigger constraints overriding theory. Hiring for niche languages like Elixir or Haskell is tough and expensive compared to Python or TypeScript, where talent is abundant. Ecosystem maturity matters too—most AI tools ship Python SDKs first, meaning AI agents have better building blocks in those languages. Existing codebases rarely get rewritten just for AI; migration costs are real and quantifiable, while AI code quality gains remain theoretical and small. Plus, AI models improve rapidly, narrowing gaps between languages over time. Python’s dominance is a network effect moat—like QWERTY or VHS—not easily displaced by technical superiority alone.

So what really makes a codebase AI-friendly? The qualities Valim highlights—immutability, strong typing, stable ecosystems, clear contracts—are portable across languages. You don’t have to switch to Elixir to get immutability; you can avoid mutating state in Python or TypeScript. Strong typing is the investment, not the language—TypeScript strict mode or Python type hints with mypy offer similar guardrails. Good documentation and comprehensive tests give AI agents better context and validation. Small, pure functions with explicit inputs and outputs help AI generate better code regardless of language. Stable APIs reduce confusion for both AI and humans. And letting AI generate types or interfaces before implementation surfaces mistakes earlier. These practices improve code maintainability and AI output simultaneously.
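As a sketch of what "types and contracts first" can look like without leaving Python, here's a hedged example; `Invoice`, `PricingRule`, and `grand_total` are illustrative names I've invented, not from the article. Defining the frozen dataclass and the `Protocol` before any implementation gives both the AI and a type checker like mypy a contract to validate against:

```python
from dataclasses import dataclass
from typing import Protocol

# Immutability without switching languages: frozen dataclasses reject
# mutation at runtime, and mypy flags it statically.
@dataclass(frozen=True)
class Invoice:
    subtotal: float
    tax_rate: float

# An explicit interface, written before the implementation, that any
# AI-generated pricing rule must satisfy.
class PricingRule(Protocol):
    def apply(self, invoice: Invoice) -> float: ...

class StandardPricing:
    def apply(self, invoice: Invoice) -> float:
        return invoice.subtotal * (1 + invoice.tax_rate)

def grand_total(invoice: Invoice, rule: PricingRule) -> float:
    # Small, pure, explicit inputs and outputs: easy for an AI to
    # generate correctly and for a human to review.
    return rule.apply(invoice)
```

Running `mypy --strict` over a file like this surfaces contract violations before the code ever runs, which is a lighter-weight version of the compiler-as-reviewer benefit.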

Personally, I’m a fan of Elixir and introduced it to my team at LiORA—not because it’s the best for AI, but because it’s a great team language. It’s proven productive, and with some nudging toward smaller, focused functions, it works well with AI tools like Claude Code. But that’s a team choice, not a universal prescription.

What should CTOs do today? Focus on what’s good for the AI and good for humans alike. Invest in documentation to provide context for both AI and developers. Write smaller functions with clear contracts, applying functional principles even in non-functional stacks. Don’t bet on today’s AI language strengths—they’ll shift in 18 months. Instead, improve your codebase properties now, which pays off for your team and future AI capabilities.

You can read the full article—with all the data and sources—on ThePragmaticCTO Substack.


