Digging Through an Inherited Codebase: How to Be a Code Archaeologist

Researching code is one of the most underrated skills for a developer to have. In the kind of development we do at Revelry, we constantly have to get to know new codebases. Sometimes it’s a client’s legacy application. Sometimes it’s a bit of open source. Either way, you are trying to move things forward.

Moving things forward means being able to find what to change and change it.

But we’ve all seen those codebases. You know the one. The quarter-million-line world’s-biggest-ball-of-twine. Hunting through an inherited codebase like this turns finding anything into the proverbial exercise of finding a needle in a needle stack.

What do you do when you are working with an app like that? What if you are staring at a line of code and you need to figure out why it exists in the first place?

Here’s what I would do:

  1. git blame the line. That’ll tell you who committed it (and their email address). Write this down.
  2. Go to GitHub and open the commit’s page (e.g. rails/rails@abc123). In the commit info box, above the commit message, there will be a PR title and description (assuming the commit came from a PR, and wasn’t pushed directly onto the develop branch [shame]).
  3. Go to the PR page. Read the discussion. This is where the story of the code reveals itself. Some of the comments probably tell you at least a little about what was broken, or what the goal was when the code was originally written. Note the people involved. Hopefully, there are other names there: reviewers and other commenters. Write these down. See if the PR references a GitHub issue.
  4. Visit the issue page. Read the discussion. Note the names. You should have most of the story now. Maybe there are related issues listed. Open those too. Read them.
  5. If all else fails, you’ve collected the names of people to ask: the original developer, the code reviewer, the user who noticed the bug or requested the feature in the first place.
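
Step 1 is easy to script if you find yourself doing it a lot. Here is a minimal sketch in Python, assuming git is installed and you run it from inside the repository. The function names (parse_blame_porcelain, blame_line, commit_url) are made up for this example; the only real interfaces used are `git blame --porcelain -L` and GitHub's usual commit URL pattern.

```python
import subprocess


def parse_blame_porcelain(porcelain: str) -> dict:
    """Extract sha, author, email and summary from `git blame --porcelain`
    output for a single line."""
    lines = porcelain.splitlines()
    # The first line looks like: <sha> <original line> <final line> <count>
    info = {"sha": lines[0].split()[0]}
    wanted = {"author": "author", "author-mail": "email", "summary": "summary"}
    for line in lines[1:]:
        if line.startswith("\t"):
            break  # the tab-prefixed line is the source line itself
        key, _, value = line.partition(" ")
        if key in wanted:
            # author-mail is wrapped in angle brackets; strip them
            info[wanted[key]] = value.strip("<>") if key == "author-mail" else value
    return info


def blame_line(path: str, line_no: int) -> dict:
    """Run git blame on one line of one file (needs a git checkout)."""
    result = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{line_no},{line_no}", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_blame_porcelain(result.stdout)


def commit_url(owner_repo: str, sha: str) -> str:
    """Build the GitHub commit page URL, e.g. for owner_repo='rails/rails'."""
    return f"https://github.com/{owner_repo}/commit/{sha}"
```

Calling blame_line with a file path and a line number gives you the name and email to write down, and commit_url gives you the page to open for step 2. If the commit summary ends in something like (#123), that number is often the PR, but treat that as a hint rather than a guarantee.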

If you are starting with a bug report, or a feature request, and you need to identify the place to make changes, you can do something similar in reverse.

Start with the description of what is broken or what needs to change.

Note any URLs listed in the bug report or any distinct elements of the page which are mentioned or pictured in screenshots. Click around the system. Find the page(s). Look for distinct bits of text: a bit of copy that probably doesn’t occur anywhere else, an image that is unique to that page, or a bit of styling that isn’t reused.

Search for the bit of copy, or at least a couple of words from it. Find the image URL or CSS class in the web inspector and search for that. Which files appear in these results? Working from the view, I find the controller, then the models.

All along, I write down method names I encounter so I have more strings to pull on as I try to untangle the knot. Again, I look for bits of text that aren’t often used, or methods that are probably only used in one place or for one feature. You work down this tree of searches, backtracking when you hit dead ends, until eventually you find the right spot.
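
Your editor's project search or a command-line grep does this job fine. If you want something you can tweak, here is a rough sketch of the same search in Python. The function name find_text and the list of skipped directories are assumptions for illustration; adjust them to the project in front of you.

```python
import os

# Directories that usually add noise to a search (an assumption; adjust per project).
SKIP_DIRS = {".git", "node_modules", "vendor", "tmp", "log"}


def find_text(root: str, needle: str) -> list:
    """Return (relative path, line number, stripped line) for every line
    under root that contains needle."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune noisy directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            rel = os.path.relpath(path, root)
                            hits.append((rel, number, line.strip()))
            except OSError:
                continue  # unreadable file: skip it and keep searching
    return hits
```

Run it once with the bit of copy from the page, then again with each method name you wrote down. A search that returns a single file is a good sign you picked a distinctive string.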

You can make this easier on yourself and others by leaving breadcrumbs.

This process is no substitute for having good comments in your code. Or tests. Or documentation. In fact, this approach works best on a codebase that is already readable, organized, documented, commented, and tested.

Reference your issue numbers in code comments and in your pull requests. Use a code style that anticipates extension (leave trailing commas and trailing newlines). Install the Refined GitHub browser extension, which turns issue numbers in code comments into clickable links.
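
Here is roughly what those breadcrumbs might look like in a file. Everything in this sketch is hypothetical, including the issue number, the names, and the values; the point is only the shape of the comments and the trailing commas.

```python
# Hypothetical example: issue #1234 and every name and number below are made up.

# Free shipping threshold raised from 50 to 75.
# See issue #1234 for the discussion behind the change.
FREE_SHIPPING_THRESHOLD = 75

SHIPPING_RATES = {
    "standard": 5,
    "express": 15,  # trailing comma: adding a rate later is a one-line diff
}


def shipping_cost(subtotal, speed="standard"):
    """Return the shipping cost; orders at or above the threshold ship free (#1234)."""
    if subtotal >= FREE_SHIPPING_THRESHOLD:
        return 0
    return SHIPPING_RATES[speed]
```

The next person to git blame the threshold gets the issue number right there in the file, before they even open GitHub.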

Make your world’s-biggest-ball-of-twine into the world’s best organized twine collection.


More Posts by Robert Prehn: