
How to Make Git History Useful to Humans

It’s well understood that code should be written with a human audience in mind, but it’s not quite that simple. Your software isn’t like a book or article that, once written, is delivered to readers in its completed form. Your code – and the captured git history – reads more like a book that is published, edited, and re-published a thousand times. Your code’s readers will often need to reference previous versions to understand the latest.

“Programs must be written for people to read, and only incidentally for machines to execute.”

Seems simple enough, right? That line from “Structure and Interpretation of Computer Programs” (Abelson & Sussman) is inspiring and intimidating at the same time.

Version control tools like git help to capture and distribute that history, but the way you use git today will determine whether that history will be illuminating or utterly baffling to future readers.

So what can we do to create the best git history for ourselves and future team members?

Leaving breadcrumbs

Assuming you’re using GitHub or a similar git host, you already have access to some nice features that help with this. Make sure your workflow requires all changes to go through a pull request (PR) process before making it to master, and make sure every PR description includes a link to the issue it’s meant to address. That way, a commit can always be traced back to the pull request, and from there back to the original issue.

If there’s a technical or product discussion about how or why to do something, make sure that discussion happens in (or gets copied to) the issue or PR so future readers have access to the full context of the change.

Of course, much has been written about commits: crafting atomic commits, writing informative commit messages, and knowing when (and when not) to squash or rebase. This is all crucial stuff, but I want to discuss something a little more obscure: noise.

Reducing the noise

If a line is changed in your diff, the change on that line should ideally be related to the purpose of your PR. If not, it’s noise that will obscure the history of why that line is the way it is. (It also makes it harder for code reviewers to understand your changes.)

The most common offender here is code style, and this is one of the most underrated reasons to not only have a style guide, but also to enforce it rigorously. If you let things slide, then it becomes impossible to fix them later without introducing noise, and ultimately you end up with a very noisy history, totally inconsistent code style, or both. If you don’t have a consistent style and want to turn over a new leaf, you can bite the bullet and create a single cleanup PR that applies all of your style fixes at once. When folks see that commit in the history, they’ll know that all of the changes in it are purely style, which is a lot easier than having unrelated style fixes in every PR.

Here are some style guidelines that may seem trivial, but whose application can reduce noise in your git history:

  • Whitespace: Trim trailing whitespace, use consistent indentation, and add a newline at the end of every file.
  • Use trailing commas in multi-line lists and declarations.
  • Use brackets (and multiple lines) for single-statement if clauses.
  • Empty methods / statements should have start / end on their own lines. No catch(e) {} or def noop; end.
  • When chaining methods, put each invocation on its own line.
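As a rough sketch of two of these guidelines in JavaScript (the reviewers list and its names are made up for illustration), here is a multi-line list with a trailing comma and a method chain with one invocation per line. Adding a fourth name or another step in the chain would show up in a diff as a single added line, with no neighboring line touched.

```javascript
// Hypothetical data, for illustration only.
// The trailing comma after "carol" means appending a name adds one line
// to the diff instead of also modifying the "carol" line.
const reviewers = [
  "alice",
  "bob",
  "carol",
];

// One invocation per line: inserting or removing a step in the chain
// touches only that step's line.
const shoutedNames = reviewers
  .filter((name) => name !== "bob")
  .map((name) => name.toUpperCase())
  .sort();

console.log(shoutedNames.join(", "));
```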

It’s pretty clear how whitespace inconsistencies introduce noisy diffs, but some of the others are more subtle. I think we’ve all heard “If you don’t use brackets on a single-statement if block, the next dev may come along and add a statement, thinking it’ll be part of the if but really it won’t.” The history argument is different: “If you don’t use brackets on a single-statement if block, the next dev will have to modify the line containing the if statement itself just to add their own statement.”

[Figures: a diff with noise; a diff with no noise]
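To make the history argument concrete, here is a minimal sketch in JavaScript. The modifiedLines helper is a deliberately naive stand-in for a real diff (it only checks whether each original line still appears verbatim), and the isAdmin, grantAccess, and logAccess names are invented for the example. It compares what happens to the original lines when a second statement is added to an if written with and without braces.

```javascript
// Return the lines of `before` that no longer appear verbatim in `after`.
// This is a simplified approximation of "lines a diff would mark as changed",
// not a real diff algorithm.
function modifiedLines(before, after) {
  const kept = new Set(after.split("\n"));
  return before.split("\n").filter((line) => !kept.has(line));
}

// The version we end up with after adding logAccess(user) in both cases.
const after = [
  "if (user.isAdmin) {",
  "  grantAccess(user);",
  "  logAccess(user);",
  "}",
].join("\n");

// Starting point 1: a single-statement if without braces.
const withoutBraces = "if (user.isAdmin) grantAccess(user);";

// Starting point 2: the same if, already written with braces.
const withBraces = [
  "if (user.isAdmin) {",
  "  grantAccess(user);",
  "}",
].join("\n");

// Without braces, the existing `if` line itself has to be rewritten.
console.log(modifiedLines(withoutBraces, after));

// With braces, every existing line survives; the change is a pure addition.
console.log(modifiedLines(withBraces, after));
```

In the first case, blame for the if line now points at the commit that added logging, even though the condition itself never changed.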

The general idea is to plan for extension not only in your architecture but also in the formatting of your code. Let’s apply this thinking to a functional React component:

const Component = props => (
  <img
    alt={props.alt}
    src={props.src}
    onClick={props.onClick} />
);

export default Component;

If you structure your code this way, certain future changes will require introducing noise:

  • Removing the onClick prop would mean changing the line with the src prop on it to close the tag.
  • Adding a prop after onClick would mean changing the onClick line to move the closing tag.
  • Adding any logic before returning the JSX would require indentation changes to pretty much every line.

Structuring the code slightly differently would eliminate all of those issues:

const Component = props => {
  return (
    <img
      alt={props.alt}
      onClick={props.onClick}
      src={props.src}
    />
  );
};

export default Component;

Tips for navigating git history

git blame is the obvious tool: it shows you the most recent commit to touch each line in a file. Depending on the amount of noise, blame is often enough to get a good idea of what has been happening around a particular function or feature. Sometimes, though, you need to dig further back. GitHub’s blame UI has a “View blame prior to this change” link that lets you step back past a given commit. My favorite command-line tool for this is git log -L start,end:file, which displays the changes to a particular range of lines in a file in reverse chronological order, with commit messages and diffs.
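If you find yourself running that command often, it can be wrapped in a small script. The following is a minimal Node.js sketch, assuming git is on your PATH; the lineHistoryArgs and lineHistory helpers and the src/Component.js path are hypothetical names chosen for this example, not part of git or any library.

```javascript
const { execFileSync } = require("node:child_process");

// Build the argument list for `git log -L<start>,<end>:<file>`.
// Line numbers are 1-based and inclusive, as git expects.
function lineHistoryArgs(file, start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start) {
    throw new RangeError("expected 1-based line numbers with start <= end");
  }
  return ["log", `-L${start},${end}:${file}`];
}

// Run git in the given repository directory and return the log as text.
// execFileSync passes arguments directly, without going through a shell.
function lineHistory(file, start, end, cwd = process.cwd()) {
  return execFileSync("git", lineHistoryArgs(file, start, end), {
    cwd,
    encoding: "utf8",
    maxBuffer: 16 * 1024 * 1024, // long histories can exceed the default buffer
  });
}

// Example call (hypothetical path and line range), left commented out
// because it only works inside a git repository that contains the file:
// console.log(lineHistory("src/Component.js", 3, 9));
```

Keeping the argument builder separate from the function that actually runs git makes it easy to check the command that would be executed without needing a repository.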

And if you really want to dig in on some code archaeology tips, Robert’s advice on researching code is solid.
