Why Your AI Code Bias Is Making You a Worse Reviewer

Do you think all AI-generated code is shit? Are you an engineer, founder, or layperson who has used AI to write code in the last few years and is generally happy with the result? Or are you an engineer, maintainer, or reviewer who sees bad AI-generated code around every corner and wants to tear your hair out?

If you answered yes to one or all of these questions, this article is for you! Revelry likes to build good code, fast, and we’ve been in the LLM space since 2021. That means we’ve had to confront and resolve tensions over AI-generated code quality for a while – and face some hard truths in the process. I’ve had my own long, dark teatime of the soul. But in the long run, it’s helped us build better software.

Many software engineers are biased against AI code. I’ve seen this online and I’ve also heard it in the workplace:

“We don’t want AI-generated content in our projects.”

“If it is AI-generated, it should be clearly marked as such so it can be blocked or ignored.”

“I do not want any AI in my life, especially in my code.”

I think for many engineers, AI code has become synonymous with bad code – which is undeniably the case sometimes. But as the models get better – or, more crucially, as engineers get better at working with AI – this AI code bias is going to cause bigger and bigger problems.

Some code owners will insist: “I’m not going to lower my standards because you’re using AI.” And I’m not asking you to. I’m saying that just because you can tell AI was used, you shouldn’t dismiss the code out of hand. I’m going to show you how people suspect code is AI-generated, why they’re biased against it – and why that’s a problem.

A screenshot of a man looking suspicious with the text: when bro suddenly starts submitting prs with detailed descriptions, correct grammar and casing, and testing steps
When bro be using ai

The Smell Test

So, how do you know if the code you’re reading is AI generated (so you can be biased against it)? Code reviews have started to feel a bit like identifying the “human” Cylons in Battlestar Galactica (if we’re being bigots, I should call them “toasters”). What do I see that causes me to J’*AI*-ccuse!!™ the engineer whose code I’m reading?

The fact of the matter is, AI-generated code has its own distinctive smell. In fact, it’s so distinctive that there are entire open-source libraries dedicated to removing it.

Developers sometimes use the phrase ‘code smell’ to describe when something doesn’t look right – I’ve always imagined it as a damp, musty sort of smell, like a wet jumper. Maybe the code works, but a code smell indicates that there may be problems beneath the surface, or that things might get real nasty real fast if you try to extend the patterns. There’s a fantastic resource here that documents code smells and anti-patterns commonly found in Elixir codebases.

AI has its own particular odour though: certain forms that mark it out as having been generated by a language model – a “skinjob” (if Blade Runner is more your fancy). Some of these I wouldn’t really call code smells per se – they’re just identifying marks, or signatures. Some are genuinely problematic; but many are benign and, dare I say it, actually quite helpful. The real problem is that a biased code reviewer will spot something that identifies the AI fingerprint and dismiss the whole pull request out of hand – even when the code itself is just fine.

However.

You Should Be Biased Against Bad Code

Bad code is bad code. It’s the kind of code you get when you give the LLM a phrase like “please build a login w/e magic link” with next to zero context, and then do no iterative development afterwards. These patterns are bad and obvious to the engineer (or should be), and they’re an obvious sign to the reviewer that the engineer hasn’t read their damn code properly.

A screenshot of a Cursor chat window with the text: “change this entire repository to be in typescript. Make no mistakes.”
The perfect prompt

Here are some code patterns that tell you AI was probably used and that the code sucks (there’s a sketch of what a couple of these look like after the list):

  • Failure to separate concerns across modules, or between the display layer and the business logic. The function to get the users? Let’s define that in the display layer! A schema definition? Also the display layer! Data validation? Put ’em in the display layer!
  • Old patterns. After all, the AI was trained on basically everything – so it doesn’t matter if the library you’re using added syntactic sugar a year ago. It will reach for the old thing every damn time or, even worse, mix old and new in an incompatible way.
  • Code comments that document the debugging process instead of the code itself, e.g.:
    • # now seamlessly delves into the database, avoiding the capitalization problem
    • # now uses reduce rather than recursion (on an entirely new function)
  • Pointless bloat and abstraction. It’s almost as if it wants to generate as much as possible: ask for the login screen and you’ll generally get it – along with a database migration for a table of all the users’ mothers, the gravitational prediction module, and some JavaScript thrown on top.
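
To make that concrete, here’s a minimal sketch of the first and third smells in an Elixir codebase – the module and function names are hypothetical, not from any real PR:

    # Data access, validation and a leftover debugging comment, all crammed
    # into a LiveView "display layer" module instead of a context module.
    defmodule MyAppWeb.LoginLive do
      use MyAppWeb, :live_view

      # now seamlessly fetches the user, avoiding the capitalization problem
      def get_user_by_email(email) do
        MyApp.Repo.get_by(MyApp.Accounts.User, email: String.downcase(email))
      end

      # Validation that a changeset in the schema module should own.
      def valid_email?(email), do: String.contains?(email, "@")
    end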

These are all easily fixed – usually by hand, or sometimes by simply letting the AI take another pass. I think it was situations like these that built the bias in our minds: stupid, silly and above all obvious problems. Why did you ask me to review this? Do you hate me?

Everything I’ve shown you so far has probably validated your bias. Let’s look at the flip side.

You Shouldn’t Be Biased Against AI Code

I’m trying to defend the engineer who uses AI responsibly and iteratively to craft their code – who understands the patterns that have been generated, and has reviewed their own code first. And I’m also trying to defend those who (for whatever reason) haven’t used AI to generate their code that day, but are accused of it anyway – nothing feels worse than having attribution robbed from you.

Here are some patterns that tell you AI was probably used but tell you nothing about whether the code sucks:

More code comments. Generative AI likes to comment code, sometimes obsessively so – and sometimes these comments are actually useful. They aren’t debugging comments (like those mentioned earlier), and they don’t just restate what the function is doing – which anyone could see anyway, because it’s so easy to write self-documenting functions in Elixir. Good comments can actually improve code readability.
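
For instance, here’s a hypothetical contrast (the function names are made up) between a comment that merely restates the code and one that earns its place:

    defmodule MyApp.CommentExamples do
      # Redundant - the function head already says this:
      # Gets a user by id.
      def get_user(id), do: MyApp.Repo.get(MyApp.Accounts.User, id)

      # Useful - explains a non-obvious "why":
      # Compare tokens in constant time so nothing can be learned
      # from timing differences.
      def valid_token?(given, stored), do: Plug.Crypto.secure_compare(given, stored)
    end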

Words like “seamlessly” or “beautiful”, or other flowery language. These change between models (I remember when GPT-4 was obsessed with “delve”), but they’re a clear sign that a simulant has been let loose in your codebase. When they turn up in a useful comment or an Architectural Decision Record (ADR), though, they tell you nothing about quality. Sometimes flowery language dilutes clarity, but humans tend to the other extreme: writing nothing at all, or pitching things at the wrong level of abstraction to be useful.

Defensive programming. Elixir is supposed to fail fast and recover – so I would count most rescue blocks the AI adds as garbage. It has a tendency to assume that everything can be nil. But sometimes checking that something exists before you act on it is necessary, and there have been instances where the AI caught bugs in my code before they happened.
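
As a sketch of that contrast – the config-reading functions here are hypothetical – the defensive style versus the idiomatic fail-fast style:

    defmodule MyApp.ConfigExamples do
      # Defensive, AI-flavoured version: a nil check plus a rescue that
      # silently swallows a missing key and returns nil.
      def fetch_config_defensively(key) do
        if key != nil do
          try do
            Application.fetch_env!(:my_app, key)
          rescue
            _ -> nil
          end
        end
      end

      # Idiomatic fail-fast version: raise if the key is missing, so the
      # problem surfaces at boot instead of as a mystery nil later.
      def fetch_config!(key), do: Application.fetch_env!(:my_app, key)
    end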

Emojis in your logs. The robots love to insert them, and I usually guiltily remove them so as not to out myself as having used AI. But sometimes these emojis are exactly what you need. Some may disagree, but in my view logs are usually read by humans, and emojis make them more readable.
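
For example (hypothetical log lines), a glanceable marker can help when you’re scanning a wall of output:

    require Logger

    Logger.info("✅ payment processed for order 1042")
    Logger.warning("⚠️ retrying webhook delivery (attempt 2 of 5)")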

Huge chunks of code. Just because the diff is massive doesn’t mean the code is bad. Yes, AI is often over-eager and spits out reams and reams of code. But sometimes big diffs are necessary for a new feature or a large refactor.

Dramatic improvement in code quality. When someone who usually submits mediocre code suddenly produces well-documented, tested code, suspicion arises. But this reinforces the very bias we should fight – dismissing good code because it seems “too good” for that developer.

The Bias Problem

We’re becoming paranoid – PRs with even a whiff of AI about them (even if the code is good, or in fact not AI-written at all) are at best side-eyed and at worst dismissed with prejudice. A McCarthyite revolution is upon us; we see reds around every corner (the red light of HAL 9000’s eye, that is).

HAL 9000’s eye
I’m sorry, I can’t merge that, Dave

I have guiltily removed good comments. I have shamefully re-worded ADRs. And worst of all, I’ve deleted defensive function clauses – deletions that later led to failures – all in an effort to prove to the world (and myself) that I don’t generate all my code with AI.

And this has in fact made my code worse and my progress slower.

I have also left spicy comments on other people’s work where I suspected AI was involved – even when the comment that aroused my suspicion was actually a decent one. I’ve referred to engineers and reviewers throughout this article, but like many reading this, I’m both: I use AI to write code, and I harshly review (suspected) AI-generated code. It’s ironic: reviewers get irate at engineers for using AI as a cognitive shortcut, then use AI fingerprints as a cognitive shortcut of their own to avoid doing the actual code review!

This bias is also driven by insecurity. I work with smart and dedicated professionals who have spent a great deal of time learning and perfecting their craft. I know from bitter experience how awful it feels sometimes to see my hard-won knowledge of computers, software and systems be synthesized in seconds by the dark forces of an emergent property of a neural network. The instinct to be a Luddite is almost unbearably strong sometimes. But we have to fight it, because the AI isn’t going anywhere and we’re hurting our fellow engineers by giving in to these biases!

What I’ve tried to show above is that you shouldn’t be biased against AI code – but you absolutely should be biased against bad code. Just because you see a code comment doesn’t mean you need to throw a hissy fit and lament the old days when you didn’t have to wade through broken AI slop. Maybe the human behind the keyboard actually read the code, understood what it does, checked that it follows conventions, saw the tests pass, and opened it up for review.

You don’t know. You have to read the code first.