How do you pick a tech stack for a new project?
Five years ago, it was fairly straightforward. There were two basic questions: first, which language or framework is best suited to the problem at hand; and second, which language and tooling you and your team are already familiar with. And then you get cracking.
But it’s 2025, and love it or loathe it (opinions seem to only exist at those two extremes), AI code generation is here to stay. Even if you vehemently reject robotic assistance when writing your codebase, you can be fairly certain that robotic assistance will be used to maintain it. And so if we’re following the adage that you’re not writing the code for yourself, but rather for the engineer who comes after, we should at least consider how well LLMs can handle the language you choose. Follow the advice of John Woods (“always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live”) and consider that the ire of your colleagues might come from how bad the AI is at helping them fix up your broken code.
I used to be convinced that there were really only two games in town when it comes to “the best language to use with AI” – Python and JavaScript. But I ran across a paper on arXiv recently (or rather, someone tweeted it at José Valim) that caught my eye. It suggests that the foretold dominance of those two languages when it comes to AI codegen isn’t actually assured. Maybe the future of code includes other languages like – dare I mention it – Elixir?
(Of course I was going to say Elixir: this is Revelry. And I just mentioned José.)
There is hope
There may be hope that we are not headed for a nightmarish future where path dependence dictates that everything will be written in JavaScript or Python.
The paper, recently released by the AI R&D team at Tencent, introduces “an automated method for generating high-difficulty multilingual code generation datasets without manual annotations.” In short – a novel way to generate LLM benchmarks that aren’t purely focused on Python (and Python-specific problems). This is pretty exciting, because it means that we can compare AI code generation between languages, problems and models in a way that should give us less biased data.
AutoCodeBench is composed of 3,920 problems spread across 20 different programming languages – and all the inputs themselves are automatically generated and verified in sandboxed environments. The problems test everything from basic algorithmic thinking to complex data structure manipulation.
What makes this particularly interesting is the range of languages tested. It doesn’t just include the usual suspects like Python, JavaScript and PHP. The tests include several so-called “low-resource” languages – languages that are not as well-known or widely used as those I previously mentioned – which in practical terms means there’s comparatively little open-source code or training data available for them.
The Surprising Results
The data itself is quite dense, so I’ve constructed two charts to illustrate the results. I’ve attached the data that I used to generate these charts (only using the top three models for each mode) at the end of the article – see Figure 3.
[Figures 1 and 2 – Pass@1 scores across the 20 languages, in non-reasoning and reasoning modes respectively.]
What you’re looking at here is how well several different AI models wrote correct code across twenty programming languages. The number you see — Pass@1 — measures how often a model gets a coding problem right on the first try – higher is better. So this data tells us that most top models, like Claude Opus 4 or GPT-4.1, score around 50% overall, meaning they solve about half the problems correctly without retries. But it also tells us that Elixir is the standout stunner language. In both non-reasoning mode and reasoning mode, Elixir reaches more than 80%, outperforming all the other languages tested.
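If you want that as code rather than prose, here’s a minimal sketch in Elixir of what Pass@1 reduces to when each problem gets exactly one attempt – the data shape and field name are invented for illustration:

```elixir
# With a single sample per problem, Pass@1 is simply the fraction of
# problems solved on the first try. `first_try_correct` is a
# hypothetical field name used for illustration.
defmodule PassAtOne do
  def compute(results) do
    Enum.count(results, & &1.first_try_correct) / length(results)
  end
end

# PassAtOne.compute([
#   %{first_try_correct: true},
#   %{first_try_correct: true},
#   %{first_try_correct: false},
#   %{first_try_correct: false}
# ])
# => 0.5
```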

But why? Here’s my theory…
Why Elixir Is Good At AI Code Generation: Functional Programming Advantages
I can hazard a hypothesis about why: I think it comes down to two things – the inherent advantages of the language, and the quality of the training data.
Elixir works well with AI because it just works well in general
I work with Elixir pretty much every day, and there are specific characteristics of the language that I think make it particularly AI-friendly.
First, it follows the principles of functional programming rather than object-oriented programming (OOP). This might sound like a technical detail – and I know it may instill fear in the hearts of my OOP-focused colleagues – but it means Elixir has some inherent benefits when it comes to AI code generation. In functional programming, everything is immutable, side effects are discouraged, and you don’t have to worry about distant or abstract values hiding somewhere in your codebase – like inherited class attributes that could be modified three levels deep in some other part of your application.
When an AI is trying to understand what a piece of code does, this predictability is invaluable. There’s no hidden state to track, no mysterious mutation happening in the background. What you see is what you get: `this_thing |> transformed_here() |> then_over_here()`. This minimizes the attention and context that the model needs to maintain in scope when it’s either producing or evaluating code.
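To make that concrete, here’s a contrived sketch: every step below returns a brand-new value, the original list is never touched, and the entire data flow is visible in one place.

```elixir
# Contrived example: each step returns a new value, and `scores`
# itself is never mutated – there's no hidden state for a model
# (or a human) to track.
scores = [3, 8, 5, 10]

top_total =
  scores
  |> Enum.filter(&(&1 >= 5))  # keep scores of 5 or more: [8, 5, 10]
  |> Enum.sort(:desc)         # [10, 8, 5]
  |> Enum.take(2)             # [10, 8]
  |> Enum.sum()               # 18

# `scores` is still [3, 8, 5, 10]
```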
Second, Elixir is both quite compact and fairly easy to understand. The syntax is clean and consistent – it doesn’t have the complexity or overhead of languages that run closer to the metal, like C++, and it’s not absolutely bonkers like some languages I could mention that make you say wat. And because the philosophy of the language favours composability, the standard library comprises only ~120 core modules and ~2,000 public functions (compared to, say, ~15,000 in Python).
I have a suspicion that this composability and clarity isn’t just great for human developers – it probably makes things much easier for the model too. When the syntax is consistent and the patterns are clear, it’s much easier for an AI to “learn” the rules of the language and apply them correctly – and in addition, there are simply fewer things (both in total and in any given context) for the AI to get confused about.
Elixir works well with AI because of the quality of the training data
The age of a programming language can be a double-edged sword when it comes to AI performance. Too new, and there aren’t enough examples for the AI to learn from. Too old, and you end up with decades of legacy code and outdated syntax.
There’s most likely some work going on behind the scenes when the models are trained to mitigate some of these issues. Lots of the big models out there don’t disclose their methods, but we can probably hazard some educated guesses. There’s probably a bias in the training data that was fed to the model when it was created, as well as some post-training Reinforcement Learning (RL) to help the model ‘prefer’ newer syntax. To make that concrete: I’d imagine that models like GPT-4 were fed more Python 3 code than Python 2 code during pre-training, and then Python 3-style outputs were rewarded (and Python 2-style outputs penalized) during RL training.
But any given model probably doesn’t need to worry about that with Elixir. Elixir sits in what I’d call the “Goldilocks zone” of programming language age. It’s been around since 2011, so there’s plenty of training data available, but it’s young enough that there isn’t too much historical baggage for AI models to wade through.
But I think more importantly, the quality of that training data is generally quite high. Unlike some languages where you might find code written by beginners learning on the job, Elixir tends to attract more experienced developers. This means that most of the Elixir code floating around in public repositories was written by people who actually know what they’re doing. God help us when the AI starts training on all the vibe-coded skibidi toilet games.
I know how elitist this sounds. I promise not all Elixir developers are snooty and snobbish. And for sure: there’s plenty of bad Elixir code out there (I’ve certainly written my fair share). But I still get targeted ads about ‘learning to code’, and you best believe they’re all for PHP, JavaScript and Python. I think there are more ‘good’ code examples than ‘bad’ ones in Elixir – and because of the aforementioned composability and clarity, there are fewer footguns around to produce ‘bad’ code anyways.
But Is Elixir Actually Good At AI Code Generation?
Before we get too excited about these results, there’s another piece of research that complicates the picture. A Stanford study found that AI can actually decrease developer productivity in niche languages as task complexity increases. AutoCodeBench shows Elixir performing brilliantly, but Stanford suggests niche languages struggle with AI assistance on complex tasks.
To be clear, the two studies are talking about two different things: how good the AI is at improving developer productivity, versus how good the AI is at solving problems. Developer productivity includes solving problems, but also includes things like making sure your code is readable, adhering to best practices, debugging existing issues and working in the real world, which is frequently messy. It’s also a much harder thing to measure.
So – how do we square this circle?
First things first, and the elephant in the room: that study was funded by Microsoft, and so didn’t include any of the flagship models from Anthropic. It’s purely my opinion, but I’ve found Claude’s models far superior to any of Copilot or OpenAI’s offerings, particularly when it comes to writing Elixir code. So take from that what you will.
Let’s also acknowledge the developer in the room, and their (sometimes numerous) idiosyncrasies and preferences. Elixir developers (myself included) can be a bit… let’s say “particular” about code quality. We tend to care a lot about well-structured, clean code. So when it comes to productivity, there’s the stylistic element to consider. I want my pipes and my pattern-matching. I want my functions to be low (or maybe high?) in terms of cyclomatic complexity. I want specs, and doc blocks, and `with` statements. Often, the solution works, but it just doesn’t “look” the way I want it to – leading to (sometimes justified) bias against AI written code in code review.
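To show (rather than tell) what I mean, here’s a contrived sketch of the shape I want AI-generated Elixir to take – the module and helper functions are invented for illustration:

```elixir
defmodule Orders do
  @moduledoc "A contrived example of the style described above."

  @doc "Fetches an order and marks it shipped, failing fast at each step."
  @spec ship(String.t()) :: {:ok, map()} | {:error, term()}
  def ship(order_id) do
    # `with` chains happy-path pattern matches; the first non-matching
    # result short-circuits and is returned as-is.
    with {:ok, order} <- fetch_order(order_id),
         :ok <- validate(order) do
      {:ok, Map.put(order, :status, :shipped)}
    end
  end

  # Hypothetical helpers, stubbed so the example is self-contained.
  defp fetch_order(id), do: {:ok, %{id: id, status: :pending}}
  defp validate(%{status: :pending}), do: :ok
  defp validate(_), do: {:error, :not_pending}
end
```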
And so I will admit that AI has slowed me down in the past, purely because I’m so anal about the code I’m committing that I feel like I *have* to write it by hand. However, I have since mitigated this (and you can too!) with a) better prompting with an appropriate set of rules, and b) committing to a highly opinionated formatter like Styler.
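For the latter, the setup is pleasantly small. Per Styler’s README (double-check the current version there), it’s roughly a dependency plus a formatter plugin:

```elixir
# mix.exs – add Styler as a dev/test-only dependency
defp deps do
  [
    {:styler, "~> 1.0", only: [:dev, :test], runtime: false}
  ]
end
```

```elixir
# .formatter.exs – register Styler as a plugin, then run `mix format`
[
  plugins: [Styler],
  inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]
```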
In addition, maybe the reason “AI doesn’t help much” with Elixir isn’t because AI is bad at Elixir – maybe it’s because Elixir problems are already well-structured enough that we don’t need as much help. When it comes to productivity – there’s less to gain. So short of ignoring the study because I don’t agree with it, I think there are enough reasons to take it with a pinch of salt, and not just throw the (potentially illusory?) AI-assisted productivity gains out of the window.
The Bottom Line
Just because Python and JavaScript were dominant when LLMs rose to prominence, we’re not doomed to a future of `__main__` and `(() => {})();`. There’s hope that we don’t have to accept them as the lingua francas of AI code generation. Quality patterns and training data mean that Elixir will be around for a long time yet, and is a great choice when starting a wide variety of projects.
Additionally, tools like Tidewave help to mitigate any shortcomings in model training data anyways, by using so-called ‘runtime intelligence’. What that means in practice is that AI agents run within your development environment, and therefore have direct access to your code, your development database and even the front-end presentation. This means they can generally glean things like formatting conventions, relational database specifics and stylistic choices, and (hopefully) make fewer mistakes.
And of course, the fit of the language and framework to the problem still matters. Elixir is always going to be a heavyweight contender and natural choice when it comes to writing fault-tolerant and concurrent applications.
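Even a two-liner shows why: fanning work out across lightweight BEAM processes is built into the standard library (`urls` and `fetch_page/1` here are hypothetical):

```elixir
# Fan independent work out to lightweight processes and collect the
# results, with a timeout. `urls` and `fetch_page/1` are hypothetical.
tasks = Enum.map(urls, fn url -> Task.async(fn -> fetch_page(url) end) end)
results = Task.await_many(tasks, 5_000)
```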
And guess what? At Revelry, we are really good at writing Elixir code. Come build with us.

Figure 3 – The data