Human Readable Code

I recently got my virtual wrist slapped when a developer asked about good coding practices and I over-generally recommended that he “comment like crazy”. I immediately received several responses implying that comments are evil, that truly well-written code should need few or no comments to explain what it is doing, and that variable and function names should do most of the explaining. Could this be correct? Some people even say that inline comments are a code smell. In school, we were forced to comment all code heavily in our introductory classes; in upper-division classes it was no longer required, but still strongly encouraged. Good comments were always brought up when lectures briefly mentioned coding styles and standards. How, then, could very readable but uncommented code exist?

I did a brief Google search on “Human Readable Code”, and was delightfully impressed when I discovered a brief excerpt from Joshua Kerievsky’s book Refactoring to Patterns. In this excerpt, he uses some code written by Ward Cunningham which gives the perfect example of human-readable code:

november(20, 2005)

Afterward, Kerievsky goes on to state that human-readable code:

Reads like spoken language
Separates important code from distracting code

His example is beautiful! It perfectly illustrates what human-readable code is (assuming the function returns a Date object initialized to Nov. 20th, 2005).
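
Under that assumption, such a function might look roughly like the Java sketch below. The class name Months and the decision to clear the time of day are my own guesses for illustration; neither comes from Kerievsky's excerpt.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical helper class; the name Months is invented for this sketch.
final class Months {
    private Months() {}

    // Returns midnight (in the default time zone) on the given day of November.
    static Date november(int day, int year) {
        Calendar c = Calendar.getInstance();
        c.clear();                            // drop the current time of day
        c.set(year, Calendar.NOVEMBER, day);  // month passed as a Calendar constant
        return c.getTime();
    }
}
```

A call then reads just like the original: Months.november(20, 2005).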

However, I have never seen production code like this. I wondered why, but a look around at work makes it obvious. When would the average production programmer have time to write and rewrite function names so that someone unfamiliar with the work (or even the author 12 months later) could view the function and know at a glance exactly what was being done? Kerievsky’s example above is extremely beautiful code, but the mere fact that he has a november function implies there are most likely january, february, march, april, may, june, july, august, september, october, and december functions as well. Realistically, though, it is just bad code. Though descriptive, it will never be used in production environments. I won’t get into the potential headache of having to refactor each of these functions if their underlying code had to change. What I did notice is that Kerievsky said that the above code is nicer than this:

java.util.Calendar c = java.util.Calendar.getInstance();
c.set(2005, java.util.Calendar.NOVEMBER, 20);
c.getTime();

While that is certainly true, production coders just don’t get into the proper state of mind to write the best, most human-readable name for a function. The problem with function names is that they need to be specific enough to be understood by other people reading or using the code, but general enough that the function can be used for multiple purposes (such as getting any date). Most programmers would wrap the above code in a function like this:

Date getDate(int year, int month, int day)
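
Filled in, that wrapper could be as small as the sketch below. The class name DateUtil is hypothetical, and I am assuming the month parameter is passed straight through to Calendar, as in the three-line snippet above.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical utility class wrapping the Calendar snippet shown earlier.
final class DateUtil {
    private DateUtil() {}

    // month must be a java.util.Calendar constant such as Calendar.NOVEMBER.
    // Those constants are zero-based, so a literal 11 means December.
    static Date getDate(int year, int month, int day) {
        Calendar c = Calendar.getInstance();
        c.set(year, month, day);
        return c.getTime();
    }
}
```

Note that getDate(2005, 11, 20) quietly returns a date in December, because Calendar counts months from zero. Nothing in the signature tells the caller that.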

While the november(20, 2005) sample is a perfect example of human-readable code, the getDate example will be seen over and over again simply because it is good enough. The problem arises with method names that need to be generic (I am thinking specifically of functions declared in interfaces or headers), along the lines of establishConnection() or initializeVariables(). These names are nearly always too generic to explain themselves, yet the interfaces that declare them usually require them to be so.

Enter the comment!

Comments not only increase code’s human readability, but are essential to it. I firmly believe that stigmas against commenting arise from comments like this:

//initialize the variables
initializeVariables()

That’s just nasty.
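
By contrast, a comment earns its place when it says something the name cannot. Here is a minimal Java sketch of what I mean. The MessageStore interface, its in-memory implementation, and the contract described in the comment are all invented for illustration.

```java
// Hypothetical interface: the method name is generic, so the comment carries the contract.
interface MessageStore {
    /**
     * Opens the connection that later calls rely on.
     * Calling it again while a connection is open does nothing.
     *
     * @return true if this call opened a connection, false if one was already open
     */
    boolean establishConnection();
}

// Hypothetical in-memory implementation, just enough to show the contract being honored.
final class InMemoryMessageStore implements MessageStore {
    private boolean connected = false;

    @Override
    public boolean establishConnection() {
        if (connected) {
            return false;   // already open: nothing to do
        }
        connected = true;   // stands in for real connection setup
        return true;
    }
}
```

The name establishConnection() stays generic, as the interface needs it to be, while the comment answers what a caller would otherwise have to dig through the implementation to learn.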

I propose a solution: write comments before you even begin programming. Describe exactly what you are about to do in your code, both to yourself and to the future maintainer who will see the code (and who may even look back through the revision control logs to see which developer wrote it, cursing them from months or years away; that developer may even be yourself :D). After your first draft, rewrite it in language as short and concise as possible, summarizing what you are going to do as actionable steps or verbs. Finish off by filling in the code.

This method absolutely should be done before you write any code. It will work much better because your “English brain” is still capable of coming up with complete intelligible sentences, and your engineer brain hasn’t quite taken over. Furthermore, it will be much more readable in the future when you have to come back to try to find what went wrong or got overlooked. If you start commenting after writing code, you are likely to write comments like the one above. Write what you mean to do in comments first in human-readable English, then start programming.

I feel like this article would benefit from a trivial example, so this is a C# function I wrote for my server-side development class to parse Twitter-like “mentions” of users out of a user’s message string (i.e. “blah blah blah @brianesserlieu”):

Phase 1 – write a summary of what I want to do:

private static List<RegisteredUser> ParseMessage(string message)
{
    //I need to check the message and see if there are any mentions at all.
    //If there are, I need to parse each mention, and check to see if the
    //parsed users exist.  If they do exist, I need to add them to the list
    //of users to be tagged to this message and return that list of users.
    return null;
}

Phase 2 – break it down into short, concise “verbs”:

private static List<RegisteredUser> ParseMessage(string message)
{
    //parse the string, get each "@..."

    //check to see if any parsed users exist

    //if they do, then check to see if they are valid users

    //if user found, verify the user is an existing user

    //if user exists then add them to the list of users

    //finally return the list of users

    return null;
}

Phase 3 – Fill in the code

private static List<RegisteredUser> ParseMessage(string message)
{
  List<RegisteredUser> userList = new List<RegisteredUser>();
  List<string> errors = new List<string>(); //element type assumed to be string

  //parse the string, get each "@..."
  string[] splitStrings = message.Split('@');

  //check to see if any parsed users exist
  if (splitStrings.Length > 1)
  {
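    //skip element 0: it is the text before the first "@", never a mention
    for (int i = 1; i < splitStrings.Length; i++)
    {
      string s = splitStrings[i];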
      //if they do, then check to see if they are valid users
      Match match = Regex.Match(s, @"\A([A-Za-z0-9]+)", RegexOptions.IgnoreCase);

      if (match.Success)
      {
        //if user found, verify the user is an existing user
        string parsedUsername = match.Groups[1].Value;

        RegisteredUser user = DALUser.GetUser(parsedUsername, ref errors);
        if ( user != null)
        {
          //if user exists then add them to the list of users
          userList.Add(user);
        }
      }

    }
  }

  //return the list of users
  return userList;
}

Just remember, when you press the build button or type in make, all sorts of magical things happen. The compilation process removes all the comments, expands macros, inlines inline functions, and links everything into one wonderful computer-readable program. Before all of that, programmers need to focus on everything that happens before that build button gets pressed, and that is writing human-readable code.

Here’s a great discussion on writing and commenting good code.

3 thoughts on “Human Readable Code”

Kasia says:

September 11, 2011 at 11:03 pm

The “beautiful” example with november(20, 2005) only works for people who use American date format – if you are putting the day of the month first, the function looks awkward (think about using a function called “colour()” :)) It seems unnecessary to create 12 functions that do exactly the same except apply a different constant (month number). Personally, I find parameters with their meaning specified in a docblock clearer.

As to commenting, I completely agree that comments are needed. At work we use PEAR coding standards, which say that if you look at the piece of code and think, wow, I don’t want to try and describe that, that’s exactly what you should do. Perhaps you should elaborate more on the naming conventions – you moved quickly from november() to comments.


1. Brian Esserlieu says:
  
  September 13, 2011 at 2:24 am
  
  Another good example of why the november code would be bad. I cannot forget that human-readable code should not just be American-readable code. I too am a fan of parametrized method calls like:
  getDate(Date.November, 20, 2005)
  or for my rest-of-the-world date friends:
  getDate(20, Date.November, 2005)
  
  
Chris Rebert says:

October 2, 2011 at 12:08 am

This is why I like keyword arguments. They can sometimes be slightly
verbose, but they make the code perfectly clear. Also, the sometimes
rather arbitrary ordering of parameters no longer need matter.
Pseudocode example:

Date.new(month=Date.NOVEMBER, year=2011, day=20)

I expect using LINQ would make your other, longer example more concise
and even more clear.
