I know how important (and difficult) it is to use good naming practices when writing maintainable code. So I try to stop and think whenever I need to create a new name for something. Recently I was creating a variant of
“I wish this was Ruby, because
would be easier to read.”
Looking at it now, I’m not even sure I still agree with that. But it led me to do a little exploration.
Do language naming conventions affect the length or quality of variable and function names?
I thought I’d look into this. And even if I didn’t turn up any momentous results, I decided I could at least provide some interesting musings about language naming conventions.
A little more background on code naming
We’ve all seen code with poorly named variables, methods, and classes. And we’ve all wondered, “Why did they do that?”
My unacknowledged assumption was that we all try to name things well. Sometimes, though, we feel constrained by the context to discard otherwise appropriate names because they’re too long. If the language’s naming conventions make longer names easy to parse, which I think snake_case does better than camelCase, developers would feel less constrained and would therefore choose more descriptive names.
So I wanted to see if a language’s casing conventions impact the length of the variable and method names people choose to use.
I picked a few popular repositories across a few programming languages. Then, I set out to compare the lengths of their variable, method, and class names. Here are the languages and projects I chose:
| Language | Project | Number of Names | Mean Name Length | Median Name Length |
- I chose these languages arbitrarily, based largely on popularity combined with my own familiarity. I chose the repos by their popularity on GitHub and in the language communities as I understand them.
- I removed underscores when counting length. So get_something_by_property_with_thing counts the same as getSomethingByPropertyWithThing. If the underscores make things easier to read, you pay for it with slightly decreased information density.
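The counting rule from the second note can be sketched in a few lines of Python. This is only an illustration of the rule as described; the helper name `effective_length` is my own invention for this sketch, not the script actually used for the analysis.

```python
def effective_length(name: str) -> int:
    """Length of a name with underscores ignored (hypothetical helper)."""
    return len(name.replace("_", ""))


snake = "get_something_by_property_with_thing"
camel = "getSomethingByPropertyWithThing"

# Raw length differs by the five underscores; effective length is the same.
print(len(snake), effective_length(snake))  # 36 31
print(len(camel), effective_length(camel))  # 31 31
```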
In every case, the mean length was greater than the median, meaning there are outliers on the top end. That’s not surprising, because short names are much more frequent than long names, and there’s a hard limit on how short a name can be.
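To see why a handful of long names pulls the mean above the median, here is a minimal sketch. The lengths are made up for illustration and are not data from the sampled repos.

```python
from statistics import mean, median

# Hypothetical name lengths (NOT taken from the repos in this post).
typical = [3, 4, 5, 5, 6, 7, 8, 9, 10]
with_outlier = typical + [125]  # add one very long name

# Without the outlier, mean and median sit close together.
print(median(typical), round(mean(typical), 2))  # 6 6.33

# One long name barely moves the median but drags the mean upward.
print(median(with_outlier), mean(with_outlier))  # 6.5 18.2
```

Because a name can never be shorter than one character, outliers can only appear on the long side, so the mean can only be pulled in one direction.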
The mean and median are very close together for most repos, indicating that extreme outliers (i.e. super long names) are very rare. That appears to be a little less true of the C# and Ruby repos in the sample, however, despite not sharing a naming convention.
Or Elixir vs Ruby, which by and large share a naming convention:
Context is king
It appears, unsurprisingly, that the developers of top-notch software know their business and choose names appropriate to their context.
Lodash contributors know they’re working on a general-purpose utility library whose users are not afraid of digging into their comprehensive documentation and even their source code when necessary. So rather than asking their users to litter their code with things like lodash.splitIntoChunksOfSize, we have
Atom developers face a whole different set of circumstances. Their hundreds of contributors are the only ones who typically see the code behind their powerful text editor, but it’s used by 18% of software developers according to the most recent Stack Overflow Developer Survey. Software like this has to manage a ton of complexity, which creates the need for some more explicit names, like numPathsToPretendToSearchInCustomDirectorySearcher. Your users are counting on you to pretend to search some paths in a custom directory searcher, and they don’t care how elegant your variable names are!
Language features may also influence the length of names required to convey the necessary details. Elixir’s pattern matching facilities might lead to shorter names, for instance. Multiple function signatures under a single name can handle different kinds of inputs, whereas other languages would require separate functions, each with its own distinguishable name. Elixir is also the only functional language in the group, and as such you’d expect variables to be used less frequently and to be very short-lived.
The best part was seeing the kinds of variable and method names that live down in that long tail. Ruby and PowerShell really stand out here:
The longest name was in PowerShell. And you just know from the name that it is a powerful piece of software dealing with some complicated concepts and interactions.
ViewsOf_System_Management_Automation_AliasInfo_System_Management_Automation_ApplicationInfo_System_Management_Automation_CmdletInfo_System_Management_Automation_ExternalScriptInfo_System_Management_Automation_FilterInfo_System_Management_Automation_FunctionInfo_System_Management_Automation_ScriptInfo is 301 characters long, but I only counted 273 because of all the underscores. It has 28 underscores, after all!
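Those three numbers are easy to double-check. The sketch below rebuilds the name from its repeating parts (purely a convenience of this example, to avoid pasting a 301-character literal) and recounts it.

```python
# The seven type names that appear in the PowerShell identifier above.
types = [
    "AliasInfo", "ApplicationInfo", "CmdletInfo", "ExternalScriptInfo",
    "FilterInfo", "FunctionInfo", "ScriptInfo",
]

# Rebuild the full identifier: a "ViewsOf_" prefix, then each type with its
# namespace prefix, joined by underscores.
name = "ViewsOf_" + "_".join("System_Management_Automation_" + t for t in types)

underscores = name.count("_")
print(len(name))                # 301
print(underscores)              # 28
print(len(name) - underscores)  # 273
```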
Do you know how many test cases Rails has? I don’t either, but I extracted more than 11,700 names starting with test_ from the Rails codebase, including the longest one, at 125 characters (without underscores): test_legacy_marshal_signed_cookie_is_read_and_transparently_encrypted_by_encrypted_hybrid_cookie_jar_if_both_secret_token_and_secret_key_base_are_set. More than 2000 of the runners-up were also test methods or test helpers. The longest Rails name I found that’s not part of a test is a class for a very specific ActiveRecord error:
Code is written for humans, so context matters.
If I had to draw a conclusion, I’d say this: Context matters a great deal. Language conventions are a part of that, but the problem domain and culture of the individual project are much more important. Developers in general don’t appear to be afraid of long variable names when they’re warranted, regardless of language or casing convention.
Perhaps there’s something to be said about whether code reflects the organizations that write or sponsor it. Let’s pick up that question at another time. I hope you enjoyed this post on language naming conventions.