Software Engineers don’t scale … or how I learned to stop worrying and love commenting my code.

NOTE: I originally published this on the Inside Heetch blog

This article, by Matthew Skelton and Manuel Pais, reminds me that, for everything that’s been made simpler in this era of cloud computing and micro-services, we’ve paid a price in terms of cognitive load. We simply need to keep more context in mind when making or evaluating any change to our code.

When we work in small teams this mass of context is quickly internalised, but what was easy to understand initially can become a problem as teams scale up. I’d like to explain why, and suggest some simple steps you can take from day one to ease that pain.

Get comfortable, it’s story time…

In the early days of a company or project, there’s very little code maintained by very few people. This is where our story begins.

Imagine you’re the 2nd engineer hired into a start-up. You spend your first days reading the few thousand lines of code that exist thus far, and you ask your colleague “hey, in the fulfillment service I don’t see where we charge the customer, can you help me?”. Your colleague leans in and says, “sure, let me show you. I actually message this other service…”, and as quick as that you’re back on track.

It’s easy to communicate, and to keep the context of what you’re doing in your head, and even the context of what your colleague is doing, to a degree.

You hire another engineer, let them read the code and then give them some low-hanging fruit to work on. As they get started with one of the tasks, they find that they are a little stuck, so they ask you “Hey, can you show me where in the fulfillment service we initiate a charge to the customer?” “Oh”, you say, “yes, I remember, we actually send a message over to the charge service…”

After a while, you scale up and more engineers get involved. Initially this is easy. You spend a little time explaining things and answering questions, but eventually the new engineers get up to speed and become productive. You’re now a senior engineer (congratulations!). One day a new engineer asks one of the other junior engineers “Where in the fulfillment service do we initiate the charge?”, and the junior engineer replies “Hmm, I don’t actually remember, but I know who will!”, and you get pulled into the conversation.

Now the code is growing faster and the amount of activity you have to keep in your head is growing too. It’s no longer possible for you to know everything that is going on.

Eventually, the company grows again, more engineers are hired, and you’re now a team leader (your career is stellar!). Each time an engineer joins, you and the senior engineers spend time walking them through the projects, and then eventually they begin work. Because there is more code now, it’s harder and harder for each new engineer to understand the full context of what is happening.

To be productive it’s necessary for engineers to isolate sections of code and work on them without understanding the full context of a program. They can confidently do this because you have good test coverage.

Every day your new Slack channels are full of questions like “Hey, Dave, where do we initiate a customer charge when fulfilling a request?”, and answers like “I don’t know, but perhaps @Jane or @Rakesh will?”. Your new QA Engineer is asking questions like: “I want to make a robot test that establishes that all customers are billed before we fulfill an order; what’s the entrance point I should use?”, and your help-desk engineer is asking, “The customer says her browser crashed as she was completing an order; she’s been charged, but we’ve got no record of the order. Is that something that can happen?” The team starts looking into these questions, but 5 minutes in comes the question: “Hey, where do we initiate charges?”

Code level documentation is a scaling issue

What is it I’m trying to show with the story above? As time progresses, the most experienced engineers spend more and more of their time helping the juniors in relatively trivial ways: resolving indirection and providing context, not to mention all the meta-discussion about what was or wasn’t decided, when, and by whom. Every time the senior engineering staff get asked a question, they lose track of what they were doing and have to spend time regaining that context.

The cost of each change made to the code gets higher and higher, even though we’ve been meticulous about writing good code, with good test coverage, in a great language, backed by CI/CD and world class tooling.

This might sound like a straw man argument, but it is not. It’s based on a real-life experience of working at a start-up that lived by the mantra “The code is the documentation”. Ultimately this policy made scaling the engineering team hard, and wasteful.

Why did that happen, and how can we avoid it?

One of the main reasons software engineers don’t always understand the value of human language commentary mixed into their code is that they don’t see the use case. After all, to work on source code you must first understand it, and that means reading the code. Surely repeating yourself in the form of in-line comments just costs time and clutters up the source code?

Well, let’s take that idea a little further. Reading code implies something more than that. Making a change to some software requires that you read enough of the code to find the specific path through the software that must be changed, hold a summary of that context in your head and make a change within that context. Often you’ll cross-check details of that context by visiting the files and functions that comprise it.

This contextual summary in your mind is the very thing that is so easily lost when someone distracts you from your code, which is why we feel frustration at those moments. This real-world context-switching is very costly, as we have to go back and regain the train of thought and frame of reference required to continue what we were doing prior to the interruption.

We engineers have built lots of fancy tools to help us in this quest to navigate and understand code quickly. We have complex search tools and automated lookup of symbol definitions and usages. Many of us invest large amounts of time and effort into this, whilst studiously avoiding a more direct answer.

Yes, it’s just an engineering problem

Let’s think about the way we work when coding, and compare it to the way our programs work. Here are two cases that occur commonly in computer systems:

  • I have some summary information that I have calculated at a significant cost. I need to reference this information a second time, how do I approach that? Well, I store it. Perhaps in a variable, if it’s only relevant locally, and in the short term, or in some persistent store if it has broader or longer-term relevance. In either case, when our process is interrupted and then later scheduled on CPU again, we pay the cost of saving and reloading the state, including what we calculated, but not the cost of calculating it. Likewise, if we re-use the result, there is no additional cost.
  • I wish to access some information that I know is somewhere in a large store of data. I have two options. I can either scan through the data checking each item for a match, or maintain an index that can lead me quickly to it. There is a small trade-off to be made in terms of creating and maintaining the index, but in a great many cases this is worthwhile because of the speed of access to the data.
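The two cases above can be sketched in a few lines of Go. This is only an illustration of the general idea, not code from any real system: the order type, the summaryCache type and the sample data are all hypothetical names invented for the example.

```go
package main

import "fmt"

// order is a hypothetical record used only for this illustration.
type order struct {
	ID       string
	Customer string
	Cents    int
}

// summaryCache remembers totals that were costly to calculate, so asking
// for the same customer again costs a map lookup rather than a recount.
type summaryCache struct {
	orders   []order
	totals   map[string]int
	computed int // number of times the expensive calculation actually ran
}

func newSummaryCache(orders []order) *summaryCache {
	return &summaryCache{orders: orders, totals: make(map[string]int)}
}

// total returns the sum of a customer's orders, calculating it at most
// once per customer.
func (c *summaryCache) total(customer string) int {
	if sum, ok := c.totals[customer]; ok {
		return sum // cached: no recalculation
	}
	c.computed++
	sum := 0
	for _, o := range c.orders {
		if o.Customer == customer {
			sum += o.Cents
		}
	}
	c.totals[customer] = sum
	return sum
}

// scan finds an order by checking every item in turn.
func scan(orders []order, id string) (order, bool) {
	for _, o := range orders {
		if o.ID == id {
			return o, true
		}
	}
	return order{}, false
}

// buildIndex pays a one-off cost so that later lookups are a single map access.
func buildIndex(orders []order) map[string]order {
	index := make(map[string]order, len(orders))
	for _, o := range orders {
		index[o.ID] = o
	}
	return index
}

func main() {
	orders := []order{
		{ID: "a1", Customer: "jane", Cents: 500},
		{ID: "a2", Customer: "rakesh", Cents: 250},
		{ID: "a3", Customer: "jane", Cents: 100},
	}

	cache := newSummaryCache(orders)
	cache.total("jane")
	cache.total("jane") // second call is served from the cache
	fmt.Println("total:", cache.total("jane"), "calculations:", cache.computed)

	index := buildIndex(orders)
	byScan, _ := scan(orders, "a2")
	fmt.Println("same result:", byScan == index["a2"])
}
```

The cache and the index both cost a little to create and keep up to date, and both pay that cost back every time the information is needed again.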

Can we translate these ideas to our own patterns of work? How do we cache and index for an engineer? In-line comments fulfill both these roles: they can act to cache contextual information where it is required, and they can act as indexes to other information. Consider the following comments from Go code:

// The Service.Config struct was initialised in Service.Configure.

This comment captures a little bit of context about where some information being used in a function comes from. The two seconds it took to write will save me a grep and trawl when I revisit this code in six months’ time. When reading some code for the first time, I often try to add this kind of commentary as I go.

Likewise, I make sure that the functions I visit are properly documented:

// Configure initializes the Service.Config struct from the
// configuration values stored in the file with the given path.
func (s *Service) Configure(configPath string) error {

Why have I duplicated that information? Because referencing the member of the structure that it initialises makes it possible to see at a glance what it does, which again saves time.

Why have I deviated from the norms of Go commenting style by including the struct name Service? It allows me to grep specifically for Service.Config rather than s.Config. The code itself will refer to s.Config, but in order to know that the variable name used internally is s, you must already be looking at the code you want to find. Of course, convention suggests it would probably be s, but you cannot always be sure.

As this is Go code, and Service.Configure is an exported method of an exported struct type, this comment will wind up in auto-generated documentation as well. Such a facility exists in a great many modern languages.

// Send an asynchronous message to the Charge
// service. Either a ChargeInitiated message or ChargeError 
// message will be sent back and handled by 
// handlers.ChargeResponseHandler.

This comment is even more of a time saver: it describes the actions in relation to another service with which this code interacts. The message it describes might be considered a side effect of the code, especially as we won’t block on the call, and the pointer to where the response is handled nullifies the conceptual cost of the indirection in the code.
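A rough sketch of where such a comment might live is shown below. It rests on several assumptions: buffered Go channels stand in for whatever message transport the real system uses, the chargeService function fakes the remote service, and the ChargeRequest and ChargeResponse types and the Fulfill function are hypothetical. The handler is also flattened into the same file, so the comment refers to ChargeResponseHandler rather than handlers.ChargeResponseHandler.

```go
package main

import "fmt"

// ChargeRequest asks the charge service to bill a customer.
// Hypothetical message type for this sketch.
type ChargeRequest struct {
	OrderID string
	Cents   int
}

// ChargeResponse reports the outcome of a charge.
// Kind is either "ChargeInitiated" or "ChargeError".
type ChargeResponse struct {
	OrderID string
	Kind    string
}

// chargeService fakes the remote charge service: it answers every
// request and closes the response channel when there are no more.
func chargeService(in <-chan ChargeRequest, out chan<- ChargeResponse) {
	for req := range in {
		kind := "ChargeInitiated"
		if req.Cents <= 0 {
			kind = "ChargeError"
		}
		out <- ChargeResponse{OrderID: req.OrderID, Kind: kind}
	}
	close(out)
}

// Fulfill starts fulfilment of an order without waiting for the charge
// result.
func Fulfill(orderID string, cents int, charges chan<- ChargeRequest) {
	// Send an asynchronous message to the Charge
	// service. Either a ChargeInitiated message or ChargeError
	// message will be sent back and handled by
	// ChargeResponseHandler.
	charges <- ChargeRequest{OrderID: orderID, Cents: cents}
}

// ChargeResponseHandler records the outcome of each charge, keyed by
// order ID, until the response channel is closed.
func ChargeResponseHandler(responses <-chan ChargeResponse) map[string]string {
	outcomes := make(map[string]string)
	for r := range responses {
		outcomes[r.OrderID] = r.Kind
	}
	return outcomes
}

func main() {
	// Buffered channels, so that sending does not block the sender.
	charges := make(chan ChargeRequest, 2)
	responses := make(chan ChargeResponse, 2)
	go chargeService(charges, responses)

	Fulfill("order-1", 1200, charges)
	Fulfill("order-2", 0, charges)
	close(charges)

	for id, kind := range ChargeResponseHandler(responses) {
		fmt.Println(id, kind)
	}
}
```

Nothing in the body of Fulfill reveals that a reply will arrive somewhere else. The comment is the only thing at the call site that tells a reader where to look next.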

Finally, this example points us at a non-code external resource and answers the fundamental “why” question:

if request.CountryCode == "DE" { 
	// We're in Germany, so we cannot apply discounts to books. See 
	//  https://github.com/MyFirm/our-project/issues/156
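As a hedged illustration of that last excerpt in a complete function: the Request type, its Category and PriceCents fields, and the DiscountedPrice function are all hypothetical, and the check on Category is an assumption added so that the sketch only exempts books. Only the CountryCode comparison, the comment and the issue link come from the excerpt.

```go
package main

import "fmt"

// Request is a hypothetical purchase request.
type Request struct {
	CountryCode string // two-letter country code, e.g. "DE"
	Category    string // e.g. "book"
	PriceCents  int
}

// DiscountedPrice returns the price in cents after taking the given
// percentage off, except where a discount is not allowed.
func DiscountedPrice(request Request, percent int) int {
	if request.CountryCode == "DE" && request.Category == "book" {
		// We're in Germany, so we cannot apply discounts to books. See
		// https://github.com/MyFirm/our-project/issues/156
		return request.PriceCents
	}
	return request.PriceCents * (100 - percent) / 100
}

func main() {
	german := Request{CountryCode: "DE", Category: "book", PriceCents: 2000}
	french := Request{CountryCode: "FR", Category: "book", PriceCents: 2000}
	fmt.Println(DiscountedPrice(german, 10), DiscountedPrice(french, 10))
}
```

Without the comment, the special case looks like a bug waiting to be tidied away. With it, the next reader knows the branch is deliberate and where the decision was recorded.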

Reusable information

Now, the great thing about comments in code is that they are persistent and accessible by audiences other than me. Think about the following use cases. Every one of these scenarios can be made more efficient by spending a few seconds writing notes in your code.

A new engineer approaching the project

A decent README, detailing the intent of a program or library and its entrance points, will get a new engineer from 0-100 (or 0-60 for you imperial types) much faster than just staring at the code. If you hand someone a map, they’ll get much further before they ask for directions.

Another software engineer performing code review on your code

I find code review is much more difficult than writing code. To do it well requires some mental gymnastics.

You have to understand the flow and context of what the code was doing before the change. You have to understand the intent of the change, and perhaps some additional discussions and decisions that led to it. You have to understand the flow and detail of the solution, and within that you have to see what actually changed and what effects that will have.

With all of that in your head, you have to check that the right thing has been done, and that this is in line with the stylistic constraints your team places upon code.

A few explanatory in-line notes can work wonders for understanding the intent of the changes whilst browsing a diff.

Better reviews mean fewer bugs, and better understanding means fewer invalid review points, which in turn makes less work for everyone involved!

Engineers consuming a library or package

This should be obvious, but if you’re using a library to achieve something, it is probably not your intent to invest the time to fully understand the implementation of the library before you use it. Well-written doc-strings or documentation comments, especially where those get automatically translated into reference documents, obviate the need to read the code to understand the intended flow and effects of a library.

This saves immeasurable time. It doesn’t matter if the engineers are in a different company, different team, or even just dealing with a package in a project large enough to have several of them. They could even be QA engineers doing white box testing. The point remains: If you make an abstraction, you should make it consumable in an abstract way, and that means explaining what can’t be seen purely from function signatures and variable names.

Conclusion

My intent in this post has been to show that, far from being a time sink, taking the time to write down what is in your head whilst you work on the code can save everyone time. The time it saves only increases as the development team grows bigger.

Yes, of course, there is a cost to maintaining comments and documentation strings, but if you make this part of your work routine you’ll find the cost is negligible when compared to the cost of repeating the same answers over and over again in response to inquiries. My tip is that each team member checks that comments and documentation strings are updated when they review pull requests. If your comments are out of sync with your code, that’s an engineering failure just as much as a poorly expressed piece of code.

The idea that “the code is the documentation” appeals strongly to the engineer’s impulse to do what they do well, and efficiently, but it’s a mantra that fails to recognise the difference between giving clear instructions to a system of related computer programs, and the ability of individual humans to deal with that complexity, or the fundamental questions that humans always ask, and computers never do: “Why?” and “What the ..?”