I know how important (and difficult) it is to use good naming practices when writing maintainable code. So I try to stop and think whenever I need to create a new name for something. Recently I was creating a variant of getSomethingByPropertyWithThing in a JavaScript app, and I had a random thought:

“I wish this was Ruby, because get_something_by_property_with_thing would be easier to read.”

Looking at it now, I’m not even sure I still agree with that. But, it led me to do a little exploration.

Do language naming conventions affect the length or quality of variable and function names?

I thought I’d look into this. And even if I didn’t turn up any momentous results, I decided I could at least provide some interesting musings about language naming conventions.

A little more background on code naming

We’ve all seen code with poorly named variables, methods, and classes. And we’ve all wondered, “Why did they do that?”

My unacknowledged assumption was that we all try to name things well. Sometimes, though, we feel constrained by the context to discard otherwise appropriate names because they’re too long. If a language’s naming conventions made longer names easier to parse, which I think snake_case does better than camelCase, developers might feel less constrained and therefore choose more descriptive names.

So I wanted to see if a language’s casing conventions impact the length of the variable and method names people choose to use.

The “science”

I picked a few popular repositories across a few programming languages. Then, I set out to compare the lengths of their variable, method, and class names. Here are the languages and projects I chose:

Language	Project	Number of Names	Mean Name Length	Median Name Length
Ruby	Brew	4439	11.6	11
Ruby	Devise	950	13.8	13
Ruby	Discourse	10485	13.6	12
Ruby	Rails	27531	22.6	18
Elixir	Credo	1004	11.2	10
Elixir	Hex	9535	9.7	9
Elixir	Phoenix	1047	9.4	9
Elixir	Timex	594	8.4	8
JavaScript	Atom	2965	12.8	12
JavaScript	FreeCodeCamp	1433	12.5	12
JavaScript	Lodash	792	9.1	9
JavaScript	Luxon	378	8.9	8
C#	Dapper	2601	12.7	10
C#	Nunit	7940	19.0	16
C#	PowerShell	43045	16.1	15
C#	RestSharp	2018	14.7	11

I chose these languages arbitrarily, based largely on popularity combined with my own familiarity. I chose the repos by their popularity on GitHub and in the language communities, as I understand them.
I didn’t use language parsers. I was lazy and used regex (yeah, I know), figuring what I needed was a large-ish sample of names, not a comprehensive capture. That may have caused subtle distortions. I think I missed variables created by destructuring in JavaScript, and it appears that Luxon uses a lot of those. The names I missed could be longer or shorter on average than the function names and variable names declared outside of destructuring.
I removed underscores when counting length, so get_something_by_property_with_thing counts the same as getSomethingByPropertyWithThing. If the underscores make things easier to read, you pay for it with slightly lower information density.
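
To make the regex caveat and the underscore rule concrete, here is a minimal Python sketch. The pattern is an assumption made up for illustration, not the one used to build the table above, and `extract_names` and `counted_length` are hypothetical helper names. It shows how a simple declaration-matching regex skips destructured variables, and how both casing styles end up with the same counted length.

```python
import re

# Hypothetical pattern (an assumption, not the one behind the table):
# names introduced by `function foo`, `const foo`, `let foo`, or `var foo`.
JS_DECLARATION = re.compile(r"\b(?:function|const|let|var)\s+([A-Za-z_$][\w$]*)")


def extract_names(source: str) -> list:
    """Return declared names in order of appearance (regex only, no parser)."""
    return JS_DECLARATION.findall(source)


def counted_length(name: str) -> int:
    """Length of a name with underscores removed, as described above."""
    return len(name.replace("_", ""))


sample = """
function getSomethingByPropertyWithThing(property, thing) {
  const matchingItems = lookup(property);
  const { zoneName, locale } = thing;
  return matchingItems;
}
"""

names = extract_names(sample)
print(names)  # zoneName and locale are missing: the pattern cannot see destructuring
print(counted_length("get_something_by_property_with_thing"))  # 31
print(counted_length("getSomethingByPropertyWithThing"))       # 31
```

A real parser would catch the destructured names, which is exactly the kind of distortion a regex-based sample can introduce.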

In every case, the mean length was greater than the median, meaning there are outliers on the top end. That’s not surprising, because short names are much more frequent than long names, and there’s a hard limit on how short a name can be.

The mean and median are very close together for most repos, indicating that extreme outliers (i.e. super long names) are very rare. That appears to be a little less true of the C# and Ruby repos in the sample, however, despite not sharing a naming convention.
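
The two observations above can be checked against the table itself. The Python sketch below copies the mean and median columns from the table and computes the gap between them per language. The variable names are my own, and averaging the gaps by language is just one rough way to summarize the skew, not a rigorous statistic.

```python
from collections import defaultdict

# (language, project, mean name length, median name length), copied from the table.
ROWS = [
    ("Ruby", "Brew", 11.6, 11),
    ("Ruby", "Devise", 13.8, 13),
    ("Ruby", "Discourse", 13.6, 12),
    ("Ruby", "Rails", 22.6, 18),
    ("Elixir", "Credo", 11.2, 10),
    ("Elixir", "Hex", 9.7, 9),
    ("Elixir", "Phoenix", 9.4, 9),
    ("Elixir", "Timex", 8.4, 8),
    ("JavaScript", "Atom", 12.8, 12),
    ("JavaScript", "FreeCodeCamp", 12.5, 12),
    ("JavaScript", "Lodash", 9.1, 9),
    ("JavaScript", "Luxon", 8.9, 8),
    ("C#", "Dapper", 12.7, 10),
    ("C#", "Nunit", 19.0, 16),
    ("C#", "PowerShell", 16.1, 15),
    ("C#", "RestSharp", 14.7, 11),
]

# A mean above the median suggests a tail of long names pulling the mean up.
all_means_exceed_medians = all(mean > median for _, _, mean, median in ROWS)

# Collect the mean-minus-median gap for each project, grouped by language.
gaps_by_language = defaultdict(list)
for language, _project, mean, median in ROWS:
    gaps_by_language[language].append(mean - median)

# Average gap per language, rounded to one decimal place.
average_gap = {
    language: round(sum(gaps) / len(gaps), 1)
    for language, gaps in gaps_by_language.items()
}

print(all_means_exceed_medians)  # True
print(average_gap)  # {'Ruby': 1.9, 'Elixir': 0.7, 'JavaScript': 0.6, 'C#': 2.6}
```

The larger average gaps for Ruby and C# match the impression that their long tails are heavier, though with four projects per language this is only suggestive.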

There are some large differences in name length, but they hardly fit into a pattern I’d call conclusive. There are large variances between languages that have similar naming conventions and between projects written in the same languages. Compare Atom (mean 12.8) and Lodash (mean 9.1), both JavaScript projects.

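Or Elixir vs Ruby, which by and large share a naming convention: the Elixir projects have mean name lengths from 8.4 to 11.2, while the Ruby projects range from 11.6 to 22.6.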

Context is king

It appears, unsurprisingly, that the developers of top-notch software know their business and choose names appropriate to their context.

Lodash contributors know they’re working on a general-purpose utility library whose users are not afraid of digging into its comprehensive documentation and even its source code when necessary. So rather than asking their users to litter their code with things like lodash.splitIntoChunksOfSize, we have _.chunk.

Atom developers face a whole different set of circumstances. Their hundreds of contributors are the only ones who typically see the code behind their powerful text editor, but it’s used by 18% of software developers according to the most recent Stack Overflow Developer Survey. Software like this has to manage a ton of complexity, which creates a need for more explicit names, like numPathsToPretendToSearchInCustomDirectorySearcher. Your users are counting on you to pretend to search some paths in a custom directory searcher, and they don’t care how elegant your variable names are!

Language features may also influence the length of names required to convey the necessary details. Elixir’s pattern matching facilities might lead to shorter names, for instance. Multiple clauses of a single function can handle different kinds of inputs, whereas other languages would require separate functions, each with its own distinguishable name. Elixir is also the only functional language in the group, with immutable data by default, and as such you’d expect variables to be used less frequently and to be very short-lived.
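
As a rough analogy only (this is Python, not Elixir, and dispatching on type is much more limited than Elixir’s pattern matching), the sketch below contrasts the two styles. All function names here are hypothetical. Without dispatch, each kind of input gets its own longer name. With `functools.singledispatch`, one short name covers both.

```python
from functools import singledispatch


# Style 1: separate functions, so each needs its own distinguishable name.
def counted_length_of_name(name: str) -> int:
    return len(name.replace("_", ""))


def counted_length_of_names(names: list) -> int:
    return sum(counted_length_of_name(name) for name in names)


# Style 2: one short name, with the implementation chosen by input type.
@singledispatch
def counted_length(value) -> int:
    raise TypeError(f"unsupported input: {type(value).__name__}")


@counted_length.register
def _(value: str) -> int:
    # A single name: count its characters, ignoring underscores.
    return len(value.replace("_", ""))


@counted_length.register
def _(value: list) -> int:
    # A list of names: sum the counted length of each item.
    return sum(counted_length(item) for item in value)


print(counted_length("get_thing"))             # 8
print(counted_length(["get_thing", "chunk"]))  # 13
```

The second style moves the distinguishing detail out of the name and into the function signature, which is the effect I’d expect pattern matching to have on name length.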

Long names

The best part was seeing the kinds of variable and method names that live down in that long tail. Ruby and PowerShell really stand out here:

The longest name was in PowerShell. And you just know from the name that it’s a powerful piece of software dealing with some complicated concepts and interactions.
ViewsOf_System_Management_Automation_AliasInfo_System_Management_Automation_ApplicationInfo_System_Management_Automation_CmdletInfo_System_Management_Automation_ExternalScriptInfo_System_Management_Automation_FilterInfo_System_Management_Automation_FunctionInfo_System_Management_Automation_ScriptInfo is 301 characters long, but I only counted 273 because of all the underscores. It has 28 underscores, after all!

Do you know how many test cases Rails has? I don’t either, but I extracted more than 11,700 names starting with test_ from the Rails codebase, including the longest one, at 125 characters (without underscores): test_legacy_marshal_signed_cookie_is_read_and_transparently_encrypted_by_encrypted_hybrid_cookie_jar_if_both_secret_token_and_secret_key_base_are_set. More than 2000 of the runners-up were also test methods or test helpers. The longest Rails name I found that’s not part of a test is a class for a very specific ActiveRecord error: HasManyThroughCantAssociateThroughHasOneOrManyReflection.
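
The character counts quoted for the two longest names are easy to re-derive. This short Python sketch rebuilds the PowerShell name from its repeated parts (just to keep the literal readable) and applies the same underscore-stripping rule to both names. `counted_length` is a hypothetical helper name.

```python
# The seven type names that appear in the PowerShell identifier, in order.
info_types = [
    "AliasInfo",
    "ApplicationInfo",
    "CmdletInfo",
    "ExternalScriptInfo",
    "FilterInfo",
    "FunctionInfo",
    "ScriptInfo",
]

# "ViewsOf_" followed by each type prefixed with its namespace, joined by "_".
powershell_name = "ViewsOf_" + "_".join(
    "System_Management_Automation_" + info_type for info_type in info_types
)

rails_test_name = (
    "test_legacy_marshal_signed_cookie_is_read_and_transparently_encrypted_"
    "by_encrypted_hybrid_cookie_jar_if_both_secret_token_and_secret_key_base_are_set"
)


def counted_length(name: str) -> int:
    """Length of a name with underscores removed."""
    return len(name.replace("_", ""))


print(len(powershell_name))             # 301
print(powershell_name.count("_"))       # 28
print(counted_length(powershell_name))  # 273
print(counted_length(rails_test_name))  # 125
```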

Code is written for humans, so context matters.

If I had to draw a conclusion, I’d say this: Context matters a great deal. Language conventions are a part of that, but the problem domain and culture of the individual project are much more important. Developers in general don’t appear to be afraid of long variable names when they’re warranted, regardless of language or casing convention.

Perhaps there’s something to be said about whether code reflects the organizations that write or sponsor it. Let’s pick up that question at another time.

Jason Pollentier

Language Naming Conventions in Programming: It’s All in the Context