I spent some days tinkering around with the Mesosphere Datacenter Operating System (DCOS). I wanted to understand what the DCOS is and how to get it up and running. This post captures the things I learned.
Based on my learnings, I started a project that makes it possible to launch (and delete) an entire DCOS cluster with a single command. I’ll write about that too.
If you’re planning to give Mesosphere DCOS a try yourself, this post is for you.
A high-level overview of DCOS
So, what exactly is the Mesosphere DCOS?
Mesosphere DCOS is the flagship product of Mesosphere, Inc.
The official documentation describes the system as follows:
The DCOS is a new kind of operating system that spans all of the machines in your datacenter or cloud and treats them as a single computer, providing a highly elastic and highly scalable way of deploying applications, services, and big data infrastructure on shared resources. The DCOS consists of everything necessary to build out a self-healing, fault-tolerant, and scalable solution.
Similar to how an operating system like Linux manages resources and provides common services for programs, DCOS manages a cluster of machines and provides facilities for deploying and running distributed applications on the cluster. To achieve that, DCOS combines prominent open-source projects, most notably Apache Mesos and Marathon, with partly proprietary components.
Put another way, DCOS is the mother brain that links and controls everything. It takes care of the inherent complexity of large-scale distributed systems – dealing with problems like fault tolerance and synchronization – and hides that complexity behind simple, approachable interfaces, interfaces that are well known to operators and developers alike:
There is a command-line tool (DCOS CLI) that allows you to quickly install datacenter services like Kafka or Spark to your cluster. This process, which only takes a single command, appears to be as simple as installing a Debian or RPM package on a server – although it’s obvious there is a lot more going on under the hood. You can also use the CLI to deploy your own Docker-based applications, manage the nodes in your cluster, view the status of various system components, and more.
Besides the CLI, DCOS also comes with a beautifully designed web interface from which you can monitor the current state of your datacenter. It will show you, among other things, a list of services you’re running, a live view of nodes used across your cluster, the number of tasks running on each node, and graphs of how much CPU and memory is being used per cluster/node/service. I must admit that the web interface makes scaling your cluster quite an experience.
Mesosphere is about to open up their SDK and developer program, which will “provide tools – documentation, reference libraries and support – that will make it simple to develop all types of distributed services or applications on top of the DCOS”. You can sign up for early access. I haven’t (yet).
Mesosphere’s ambitious and worthwhile goal is to “turn the software developer into a data center programmer”. With DCOS, you don’t need to be an expert in distributed systems to deploy and maintain today’s applications. At least, that’s the theory; I don’t know how well this works in practice. While the available interfaces look promising, I haven’t used the system enough to evaluate whether it provides the right abstractions and sufficient operational visibility for use in production. I’ll share my experience once I’ve learned more.
Bootstrapping your own DCOS cluster
Mesosphere DCOS comes in two flavors, a free Community Edition (CE) and a commercial Enterprise Edition. This post covers the free CE only. As of this writing, the CE is available for Amazon AWS, with support for Microsoft Azure and Google Cloud Platform coming later.
Being familiar with AWS, I wasted no time and followed Mesosphere’s setup guide for creating a DCOS cluster on AWS, which includes these steps:
- Create an AWS EC2 key pair
- Create a DCOS cluster by selecting and configuring a CloudFormation template
- Monitor the AWS cluster provisioning process in CloudFormation
- Open the DCOS web interface by copying the hostname from CloudFormation
- Install and configure the DCOS CLI, again using the hostname
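To make the manual steps above more concrete, here is a minimal sketch of how the AWS side could be scripted with the AWS CLI instead of the Management Console. It only builds and prints the commands. The stack name, key name, and template URL are placeholders, and both the KeyName template parameter and the need for CAPABILITY_IAM are assumptions about Mesosphere's template, not something taken from it.

```python
import shlex


def bootstrap_commands(stack_name, key_name, template_url):
    """Return AWS CLI calls that mirror the manual setup steps, in order."""
    return [
        # Create the EC2 key pair used to log in to the cluster nodes.
        ["aws", "ec2", "create-key-pair", "--key-name", key_name],
        # Create the stack from the CloudFormation template.
        # Assumption: the template takes a KeyName parameter and creates
        # IAM resources, which requires CAPABILITY_IAM.
        ["aws", "cloudformation", "create-stack",
         "--stack-name", stack_name,
         "--template-url", template_url,
         "--parameters", f"ParameterKey=KeyName,ParameterValue={key_name}",
         "--capabilities", "CAPABILITY_IAM"],
        # Block until CloudFormation reports that provisioning has finished.
        ["aws", "cloudformation", "wait", "stack-create-complete",
         "--stack-name", stack_name],
        # Print the stack outputs, where the hostname is expected to appear.
        ["aws", "cloudformation", "describe-stacks",
         "--stack-name", stack_name,
         "--query", "Stacks[0].Outputs"],
    ]


if __name__ == "__main__":
    # All three values are hypothetical placeholders.
    commands = bootstrap_commands(
        "dcos-demo", "dcos-demo-key", "https://example.com/dcos-template.json"
    )
    for command in commands:
        print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run without AWS credentials; each list could be passed to subprocess.run once the placeholders are replaced.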
This all worked fine except for the last step, as I had trouble getting the CLI install script to work under Mac OS X. That, and the fact that I have a distaste for “clicked infrastructure” of any kind, led me to work on a better solution. Given that it only takes one command to install software under DCOS, I wondered what it would take to make bootstrapping the DCOS itself just as easy.
This is how dcos-bootstrap came to be. The project automates all steps involved in bringing up and tearing down a DCOS cluster on AWS, without ever touching the AWS Management Console. Getting started with the DCOS has never been easier!
Internally, dcos-bootstrap uses the official CloudFormation templates provided by Mesosphere for installing the cluster. On top of that, it automates the remaining tasks I listed above with Ansible and glues everything together in a convenient Makefile.
Now your own DCOS cluster is just a make bootstrap away. Conversely, make destroy will delete the cluster and all of its resources from AWS again.
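As a rough illustration of what a teardown like make destroy has to do on the AWS side, the following sketch deletes the CloudFormation stack, waits for the deletion to finish, and removes the key pair. This is not the actual dcos-bootstrap implementation (which uses Ansible); the function names are hypothetical, and whether the project also removes the key pair is an assumption.

```python
import shlex
import subprocess


def destroy_commands(stack_name, key_name):
    """Return AWS CLI calls that remove the cluster's AWS resources."""
    return [
        # Deleting the stack removes everything CloudFormation created.
        ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name],
        # Block until the stack is actually gone.
        ["aws", "cloudformation", "wait", "stack-delete-complete",
         "--stack-name", stack_name],
        # Assumption: the key pair was created for this cluster only.
        ["aws", "ec2", "delete-key-pair", "--key-name", key_name],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    printed = []
    for command in commands:
        line = shlex.join(command)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)
    return printed


if __name__ == "__main__":
    # Placeholder names; dry_run=True only prints the commands.
    run_all(destroy_commands("dcos-demo", "dcos-demo-key"), dry_run=True)
```

The dry-run default is deliberate: deleting a stack is irreversible, so the commands are only executed when dry_run=False is passed explicitly.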
The best thing: After running make bootstrap, you will also get a preconfigured DCOS CLI for instant use. So you can start deploying your own applications and manage other aspects of your cluster right away.
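In practice, "preconfigured" mostly means that the CLI already knows where the cluster lives. A minimal sketch of the equivalent manual steps follows, assuming the CLI setting is called core.dcos_url (as I recall it from the CLI of that time; dcos config --help will tell you for sure). The hostname and the function name are hypothetical placeholders.

```python
import shlex


def cli_commands(dcos_hostname, package):
    """Return DCOS CLI calls to point the CLI at a cluster and install a package."""
    dcos_url = f"http://{dcos_hostname}"
    return [
        # Tell the CLI which cluster to talk to.
        # Assumption: the setting is named core.dcos_url.
        ["dcos", "config", "set", "core.dcos_url", dcos_url],
        # Install a datacenter service with a single command.
        ["dcos", "package", "install", package],
    ]


if __name__ == "__main__":
    # The hostname is a placeholder for the one shown in CloudFormation.
    for command in cli_commands("dcos-demo.example.com", "spark"):
        print(shlex.join(command))
```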
One final note: dcos-bootstrap is ideal for playing around with DCOS CE. It is, however, nothing you should be using in production as-is! For instance, the DCOS web interface will be accessible by anyone who knows the URL. You’ve been warned.
In any case, check out the README of dcos-bootstrap if you want to learn more about it.
To be honest, I like what Mesosphere has achieved with the DCOS. Unfortunately, there are currently some things that discourage me from doing more with it.
As a matter of fact, the Community Edition is neither open source nor downloadable.
On the one hand, Mesosphere grants a “free cloud license” without any limits or lock-in. On the other hand, what you actually get is access to some CloudFormation templates that are difficult to understand (over 2000 lines of JSON) and hard to modify (there are only a handful of parameters) – certainly nothing you would blindly deploy next to your production stack.
What this means in practice: You can use these templates for installing the DCOS to your own infrastructure on AWS, but you cannot download it directly, let alone inspect it in any other way. Of course, the EULA prohibits reverse engineering, but you don’t have to be a genius to understand what’s going on on your servers either (just run dcos node ssh and figure things out). This makes no sense to me.
Now, I’m not exactly sure how the community is supposed to use the Community Edition without violating any terms. I asked some people at Mesosphere whether the license permits taking the current CloudFormation templates and adapting them to whatever needs I might have. I haven’t received a definite answer yet; they’re still evaluating their options.
In the end, all I want to know is how we can make the most out of DCOS CE without stepping on somebody’s toes. I know that Mesosphere loves open source. I’m sure they will sort things out eventually. Otherwise, I’m afraid that growing a community around DCOS would be really hard.
Update (November 19, 2015)
Tobi Knaup, the CTO of Mesosphere, emailed me about the licensing questions. Here’s the gist of it:
While not all pieces of DCOS are open source today, feel free to use what we open sourced at https://github.com/mesosphere. It’s mostly Apache 2.0 licensed. Specifically, feel free to modify the CloudFormation template for your purposes.
As for downloading DCOS using something other than CloudFormation:
If you’re just downloading and installing the original DCOS tarball using a different provisioning tool that’s totally fine.
That is good news, indeed. I’m glad that Mesosphere took this very seriously and clarified the situation in a timely manner.