Always leave the campground cleaner than you found it
We’re currently joining another infrastructure team at Jimdo (let’s call it a restructuring measure). Among other things, this process involves merging the digital Kanban boards of both teams into one – what a great opportunity to go through our backlog and recklessly close tickets that are duplicates, done, or obsolete for one reason or another.
While studying our backlog, I noticed a couple of things:
- We have too many tickets in our backlog (120+ before the cleanup).
- Ideas are worth nothing unless executed.
- Things we thought were important turned out not to matter at all.
More specifically, we never got around to completing some code refactoring tasks, even though several of these tickets are labeled “easy pick”.
Why is that? Is it a good idea to postpone refactorings? And if not, what’s the alternative?
Before trying to answer these questions, let’s look at the main reason we need to refactor our code in the first place.
Broken windows
In “The Pragmatic Programmer”, there’s a chapter called Software Entropy. In a nutshell, the chapter makes the point that entropy – the amount of disorder in a system, which tends to a maximum – is the reason that all projects run the risk of decaying during their lifetime. Yet, there are teams that “successfully fight nature’s tendency toward disorder” and manage to get software rot – technical debt – under control.
But how do they achieve this?
The key realization here is that “neglect accelerates the rot faster than any other factor”. Put another way: living with bad code and poor design decisions is likely to lead to even more bad code and poor design decisions.
This understanding is at the heart of the Broken Windows Theory:
The broken windows theory is a criminological theory of the norm-setting and signaling effect of urban disorder and vandalism on additional crime and anti-social behavior. The theory states that maintaining and monitoring urban environments to prevent small crimes such as vandalism, public drinking, and toll-jumping helps to create an atmosphere of order and lawfulness, thereby preventing more serious crimes from happening.
The nice thing about this theory is that it applies not only to crimes such as vandalism but to software development as well.
Once windows start breaking and nobody cares, more serious crimes will follow. The moment you accept substandard code and inadequate designs, your systems begin to deteriorate (and to slow development down as a result).
It’s a matter of mindset: the more broken windows in your codebase, the more likely people are to think “this code is crap anyway”, and the more “crimes” they’re going to commit. On the other hand, if the code is of high quality, people will probably take extra care not to mess things up.
Therefore, don’t live with broken windows.
Refactor early, refactor often
Addressing the issue of broken windows – or in our case, software rot – is the part where I disagree with “The Pragmatic Programmer” to some extent.
I do agree that you should refactor early and refactor often. Refactor too late and your productivity will decline until you decide to do something about it, which might be a massive undertaking at that point. So it’s generally a good idea to fix flaws as soon as you discover them – when the cost of change is lowest.
The other side of the coin: refactor too early and you risk making rash design decisions based on, well, guessing. In particular, there’s the danger of “optimizing before we know that we need to” aka premature optimization.
I take issue with the following paragraph from the book though:
If there is insufficient time to fix [a broken window] properly, then board it up. Perhaps you can comment out the offending code, or display a “Not Implemented” message, or substitute dummy data instead. Take some action to prevent further damage and to show that you’re on top of the situation.
Recommending that you comment out code is questionable advice at best. Code that has no purpose should be deleted so that it doesn’t become a source of distraction, confusion, and communication overhead (another broken window, if you will).
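To make the difference concrete, here is a minimal sketch. It assumes a made-up `export_report` function whose CSV branch is broken and can't be fixed right now; none of these names come from a real codebase. Instead of leaving the broken code commented out, the dead code is deleted (version control still has it) and the gap is made explicit with Python's built-in `NotImplementedError`, which is one possible reading of the book's “Not Implemented” suggestion.

```python
import json


def export_report(rows, fmt="json"):
    """Serialize report rows (hypothetical example for illustration only)."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        # The broken CSV code was deleted rather than commented out.
        # Failing loudly keeps the gap visible to every caller.
        raise NotImplementedError("CSV export is not implemented yet")
    raise ValueError(f"unknown format: {fmt!r}")
```

A caller asking for CSV now gets an unmistakable error instead of stumbling over a block of commented-out code and wondering whether it still matters.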
The book also suggests putting refactoring tasks on the schedule if you can’t do them immediately. However, I’ve only seen this work in practice, if at all, when the refactoring is done as soon as possible, say, within a week. Our backlog is one, albeit small, example of this going wrong.
Incremental refactoring
I recently read an excellent blog post by Ron Jeffries that reinforced the uneasy feeling I had about refactoring tickets. He argues that putting them on the backlog is indeed a bad idea, especially when you have a lot of refactoring to do to get back to a “clean field”.
Ron writes:
We took many weeks to get the code this bad, and we’ll surely not get that many weeks to fix it. […] A big refactoring session is hard to sell, and if sold, it returns less than we hoped, after a long delay.
What he suggests instead is to improve the code where we work:
We take the next feature that we are asked to build, and instead of detouring around all the weeds and bushes, we take the time to clear a path through some of them. Maybe we detour around others. We improve the code where we work, and ignore the code where we don’t have to work. We get a nice clean path for some of our work. Odds are, we’ll visit this place again: that’s how software development works.
This type of incremental refactoring reminds me of yet another piece of programming wisdom…
The Boy Scout Rule
The Boy Scout Rule says:
Always leave the campground cleaner than you found it.
No matter who’s responsible for the mess, try to improve the environment for the next group.
I hope that now, at the end of this article, you can see how following that simple rule can make a huge difference to the evolution of our production systems – and to the teams building and maintaining them together.
“Boy Scouting” and an aversion to broken windows have been crucial in keeping the technical debt of our PaaS under control – even more so now that I know that refactoring tickets have no place in our backlog.