Here are three seemingly unrelated stories:
Last week, Turkish President Erdoğan demanded the deletion of a German satirical video mocking him for his controversial actions restricting freedom of speech and other human rights in his country. The German government did not comply with his request. In fact, Erdoğan achieved the exact opposite: his reaction to the video inadvertently drew further public attention to it, making it all the more popular in the rest of the world. (I guess the president has never heard of Barbra Streisand.)
Group chat is great for getting quick feedback, managing crises that require immediate attention (such as outages), and sharing funny pictures of cats. At the same time, group chat can lead to mental fatigue, fear of missing out, and an ASAP culture that is all about now. The companies behind popular chat tools like Slack or HipChat are unlikely to highlight these negative consequences.
Viagra was originally developed to lower blood pressure. Nearly every drug has side effects; some are negative, others beneficial. Viagra, the little blue pill, turned out to be the first effective oral treatment for erectile dysfunction – a totally unexpected outcome.
What do these three stories have in common? They are examples of unintended consequences.
Not what you thought
Unintended consequences, sometimes also called unanticipated consequences, are outcomes of a deliberate action that were neither intended nor foreseen. In other words, you do X to achieve Y, but what you get instead, or additionally, is Z.
There are three types of unintended consequences, and I already gave you one example for each type in the stories above:
- Unexpected benefit: A positive outcome that nobody planned for. Viagra falls into this category. Its surprising discovery can be attributed in large part to luck.
- Unexpected drawback: A negative effect that occurs in addition to the desired one. Group chat, while a great invention, also brings unintended consequences such as mental exhaustion.
- Perverse result: An effect contrary to what was originally intended. Erdoğan tried to wipe an unpleasant video from the internet – and it backfired big time.
Now you might wonder how any of this fits into the overall theme of running production systems. Trust me, it matters more than you might think – but read on.
The challenge of web systems
In web systems, or any complex software system for that matter, we have to deal with unintended consequences all the time. While we do enjoy unexpected benefits from time to time – fixing bug X accidentally solves problem Y as well – it’s the unexpected drawbacks that are more apparent, and thus more interesting. In web systems, outages are probably the most common manifestation of unintended consequences.
The majority of outages are self-inflicted. At some point, someone did something and it had an unintended consequence. You push a bad configuration or deploy a buggy Docker image and all of a sudden the website goes down. It has happened to all of us. I, for one, have certainly caused my fair share of outages, despite being a very careful person.
But why is it that deliberate changes to web systems will often have negative consequences?
The challenge is that web systems are inherently complex. These systems are composed of many moving parts that work together to form an intricate whole. There’s a high rate of change and often a variety of processes leading to those changes. This makes it hard – if not impossible – to fully understand how all the bits and pieces interact with each other under different conditions. Put another way, web systems are largely intractable, which is a major reason why outages – be they self-inflicted or not – are both unavoidable and unpredictable.
(Then there’s also the butterfly effect, which says that small causes can have large effects, but I won’t go into that here.)
Decouple all the things
So the trouble is that we’re thinking about a change we’re going to make, but we don’t necessarily anticipate the negative consequences it might have on the system as a whole. In software engineering, there’s a term for this: coupling. We fail to anticipate difficulties because we don’t think about how coupled a piece of software is to the rest of the system.
If we want to build reliable systems – systems that minimize the risk of self-inflicted outages – we have to remove the coupling. We have to decouple all the things.
A decoupled system allows changes to be made to any one component without having an effect on any other component. By isolating each individual piece, we no longer have to keep all these complicated models in our head. Instead, we only have to know the internals of the one component we want to modify (and the interfaces it uses). This in turn reduces the probability of unintended consequences.
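To make the idea concrete, here is a minimal sketch in Python. All names in it (ConfigStore, InMemoryConfigStore, PrefixedConfigStore, build_greeting) are hypothetical and invented purely for illustration. The point is only that the caller knows the interface and nothing else, so the internals of a component can change without the caller noticing.

```python
from typing import Protocol


class ConfigStore(Protocol):
    """The interface: the only thing callers are allowed to know."""

    def get(self, key: str) -> str: ...


class InMemoryConfigStore:
    """One implementation: values held in a plain dict."""

    def __init__(self, values: dict[str, str]) -> None:
        self._values = dict(values)

    def get(self, key: str) -> str:
        return self._values[key]


class PrefixedConfigStore:
    """Another implementation with different internals: keys are namespaced."""

    def __init__(self, prefix: str, values: dict[str, str]) -> None:
        self._prefix = prefix
        self._values = dict(values)

    def get(self, key: str) -> str:
        return self._values[f"{self._prefix}/{key}"]


def build_greeting(config: ConfigStore) -> str:
    """Depends on the ConfigStore interface, not on any concrete store."""
    return f"Welcome to {config.get('site_name')}"
```

Swapping one store for the other requires no change to build_greeting. Had build_greeting reached into the dict directly, the switch to namespaced keys would have broken it, which is exactly the kind of unintended consequence that coupling invites.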
A prime example of decoupling at work is Kubernetes, the container cluster manager from Google.
Kubernetes makes it easier to build reliable distributed systems by letting people keep the parts of those systems decoupled. Developers can, for example, talk to the Kubernetes API to deploy their application containers, and they can do so without having to worry about cluster nodes, host OS/kernel, or underlying hardware. This way, Kubernetes allows us to decouple operations from application development and separate concerns between teams.
Kubernetes is worth a look if you want fewer outages and other surprises in your ops life.
For more on this topic, I highly recommend watching this presentation by Brendan Burns. He inspired me to think about unintended consequences and write about them here.