The Burden of Running Systems
In The Power of Less Code I wrote:
A couple of weeks ago, we finished the migration from Dkron to Nomad for running all periodic batch jobs of our PaaS. […] And yet there’s a much better solution we would have preferred: not [operating Nomad] in the first place, but rather outsource the task of running periodic jobs to a hosted service provider with a paid support plan.
One reader replied that the solution to not writing code wasn’t to offload it to someone else, e.g. to a SaaS vendor, as dependencies would be just as bad.
That’s a good point. While I still believe in every single word I wrote, this comment made me realize that my original statements lacked depth and deserve more explanation.
You’re not paid to write code
In his excellent post, You’re not paid to write code, Tyler Treat takes the same line as I do: writing code should always be the last resort and never the first option to add value to the business. However, Tyler looks at the topic from a different angle, drawing from systems theory. Some of his points are worth quoting here (emphasis mine):
[John] Gall’s Fundamental Theorem of Systems is that new systems mean new problems. I think the same can safely be said of code – more code, more problems. Do it without a new system if you can.
Every time you write code or introduce third-party services, you are introducing the possibility of failure into your system.
Systems are seductive and engineers in particular seem to have a predisposition for them. They promise to do a job faster, better, and more easily than you could do it by yourself or with a less specialized system.
Almost anything is easier to get into than out of. When we introduce new systems, new tools, new lines of code, we’re with them for the long haul. It’s like a baby that doesn’t grow up.
Sharp observations (and a nudge for me to learn more about Gall’s work).
In a nutshell, new systems mean new problems. Change introduces new forms of failure. We should therefore think twice before writing code or adding third-party services (which are systems too and require developing integration code).
Questions, questions, questions
This brings me back to Nomad and the general question of whether using a hosted service, assuming there is one, would indeed be better than operating such a system ourselves.
The answer is, of course, that it depends on the circumstances. It’s not always better to use SaaS products just as it’s not always better to operate systems in-house. There are no absolutes – you must weigh the tradeoffs.
At this point, you should ask yourself a lot of questions:
- Does it make sense to pay another company for providing service X? After all, it’s their core business, not yours. It’s safe to assume they’re much more skilled in technology X.
- Alternatively, would it be a good idea to operate the system in-house and take on the burden of automation, monitoring, backups, bug fixing, updates, etc.? What potential gains do you give up by choosing this alternative (opportunity cost)?
- Do you avoid using already existing software and tend towards reinventing the wheel for the wrong reasons? Beware of the NIH (not invented here) syndrome!
- Are you afraid to give up control and become dependent on a vendor? Do you trust the other company to do the right thing? Trust is paramount.
- What services (if any) are there on the market? Is there a good fit in terms of features, licensing, security, SLA, support, etc.? Spend some time evaluating the available options. Do a quick spike if a product sounds promising. Run chaos experiments to verify your assumptions.
- What are the total costs of using an existing service compared to hosting it all yourself?
- Could it be that you underestimate the time it takes to build and run a similar system? Features are often much more complicated than we realize.
- Will it be difficult to integrate with the external service? (Applying reliability design patterns such as timeouts and exponential backoff is an article of its own.) Conversely, what about getting rid of the dependency again? (The software is only done when it’s deleted.)
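To give a rough idea of what the integration code mentioned in the last question can look like, here is a minimal Python sketch that combines a per-call timeout with exponential backoff and jitter. It is an illustration only, not code we run at Jimdo: the function name `call_with_backoff`, its parameters, and all default values are hypothetical, and it assumes the wrapped operation signals failure by raising the built-in `TimeoutError` or `ConnectionError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[float], T],
    *,
    timeout_s: float = 2.0,      # hypothetical default: per-call timeout
    max_attempts: int = 5,       # hypothetical default: total tries
    base_delay_s: float = 0.1,   # hypothetical default: first backoff cap
    max_delay_s: float = 5.0,    # hypothetical default: upper bound on any wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(timeout_s) and retry transient failures.

    The operation is expected to enforce the timeout itself and to raise
    TimeoutError or ConnectionError when the external service misbehaves.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return operation(timeout_s)
        except (TimeoutError, ConnectionError):
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # The cap doubles with every failed attempt, up to max_delay_s.
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter: wait a random time below the cap so that many clients
            # do not retry in lockstep.
            sleep(random.uniform(0, cap))

    raise AssertionError("unreachable")
```

A caller would pass a function that performs the actual request and accepts the timeout in seconds, for example a thin wrapper around an HTTP client call. Even this small sketch shows the point of the question: every external dependency brings retry policy, failure handling, and tuning decisions that you have to own.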
As you can tell, I’m biased. I prefer using a hosted service provider because that’s how we do things at Jimdo. Operating a system is always our plan B, never plan A. This strategy has been working very well so far.
Solving the right problem
Let’s wrap this up with a little reminder. Before deciding whether to run a system or not, I want you to step back for a moment and consider whether you actually need more software to solve a problem or whether you can somehow do without it.
In Rework, a business book unlike any other I know, Jason Fried and David Heinemeier Hansson write:
Small is not just a stepping-stone. Small is a great destination in itself. […] expenses, rent, IT infrastructure, furniture, etc. These things don’t just happen to you. You decide whether or not to take them on.
We only started looking into running periodic batch jobs on our platform when other development teams kept asking for it. We had a real need for such a system, so we took it on.
At the end of the day, it’s not only about solving the problem right but also about solving the right problem.