There’s nothing like a good spike
I’m a big proponent of spike solutions. A spike is a simple end-to-end solution to a technical problem. Spikes are cheap – and often dirty – implementations that are meant to be thrown away after exploration. It’s fine to create multiple spikes to explore different directions when dealing with a tough engineering challenge. Spikes should only be concerned with the problem at hand, independent of existing code, best practices, and similar ceremony. That means, for the first time in your career, you’re encouraged to write some lousy shell scripts to get the job done. I do it all the time – no one will blame you.
Spikes are all about reducing risk by reducing the number of unknowns. Spikes help you verify that you’re on the right track, that what you’ve imagined is indeed possible given the constraints you face. Spikes also make for more accurate estimates of development costs. Having trouble predicting how long a feature is going to take? Or if it’s feasible at all? Create a spike, and your decisions will be all the wiser.
The great thing about spikes is that they can prove you wrong before you’ve wasted a lot of resources trying to build The Real Thing™. They help you overcome perfectionism by making sure you don’t lose yourself in details right from the get-go. No wonder spikes are listed among the Rules of Extreme Programming.
That’s enough praise for now. Let me give you a real-world example. Last year, when I was still working at Jimdo, we were looking for a more reliable service for running periodic batch jobs inside our PaaS. One of the candidates that caught my attention was Nomad, the cluster scheduler developed by HashiCorp.
As the Lean Enterprise book has taught us, we should only spend time on automation for products or features once they have been validated. Anything else is wasteful. Rather than rushing to automate Nomad’s setup in AWS on day one (which later turned out to take a couple of weeks), I decided to create a spike first. Before adding yet another tool to our stack and taking on the burden of operating it, I wanted to learn more about Nomad’s capabilities and figure out if it was the right choice for us.
As I mentioned, we were looking for a reliable cron solution. The spike’s purpose was to convince us that Nomad was at least worth a closer look. To that end, I asked myself: what would be the simplest setup to achieve that goal?
I ended up doing some local testing with Vagrant based on a demo that ships with Nomad. For the spike, I started a Nomad server and two Nomad clients inside the virtual machine managed by Vagrant. I then created a minimal periodic job that would send a message to a Slack channel every minute. Based on this quick experiment, I was able to learn more about Nomad’s mechanics and test its resilience to different kinds of injected failures, like killing one or both clients. (During testing, I actually encountered a serious bug in Nomad causing successful batch jobs to be run again after restarting a stopped client. Luckily, that bug had been fixed in master the day before…)
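To give you an idea of how small that periodic job really was, here’s a minimal sketch of the kind of notification script such a job could invoke once a minute. It assumes a Slack incoming webhook; the environment variable name and message text are placeholders for illustration, not the originals from the spike.

```python
#!/usr/bin/env python3
"""Heartbeat for a periodic-job spike: post a message to Slack.

A minimal sketch, assuming a Slack incoming webhook; the environment
variable name and message are placeholders, not the original spike.
"""
import json
import os
import urllib.request

# The webhook URL would be injected into the task's environment.
WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def notify(text: str) -> None:
    """Send a single message to the channel behind the webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Slack replies with a plain "ok" body on success.
    with urllib.request.urlopen(request) as response:
        response.read()


if __name__ == "__main__":
    notify("Nomad periodic job is alive")
```

Wired into a Nomad periodic job with a cron spec of `* * * * *`, a throwaway script like this is all it takes to watch the scheduler run something visible every minute – which is exactly the level of effort a spike should demand.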
All in all, the spike was a success. We continued doing more experiments in AWS, gradually going from a one-node setup to operating Nomad in a highly available fashion.
In a sense, spikes are low-risk experiments – not unlike Chaos Engineering experiments – to validate assumptions early on. I consider them an essential part of my software development toolbox.
Next time you hit a roadblock, a spike or two might be all you need to move forward.