Decoding YAML in Go


I originally developed chef-runner as a fast alternative to the painfully slow vagrant provision. The tool has since evolved and can now be used to rapidly provision not only local Vagrant machines but also remote hosts like EC2 instances (in fact, any system reachable over SSH).

Because Test Kitchen, a CI tool for testing infrastructure code, is popular in the Chef world, I also added support for it. Test Kitchen happens to store its configuration as YAML files. To provision an instance managed by Test Kitchen, chef-runner parses the respective YAML file on disk and extracts SSH connection settings from it.

This was the first time I had to decode YAML data in Go. Decoding itself is easy. There are, however, some more advanced techniques I’d like to write about today. I’m going to show different iterations of the code I wrote for chef-runner. To make sense of the examples, though, it’s worth taking a short look at Test Kitchen first.

Test Kitchen

Test Kitchen comes as a Ruby gem. You can install it this way:

$ gem install test-kitchen

Once installed, use the kitchen command-line tool to create a project skeleton with two Test Kitchen instances:

$ kitchen init
...
$ kitchen list
Instance             Driver   Provisioner  Last Action
default-ubuntu-1204  Vagrant  ChefSolo     <Not Created>
default-centos-64    Vagrant  ChefSolo     <Not Created>

For our purposes here, it’s enough to boot up one of the instances, e.g., the one based on Ubuntu 12.04:

$ kitchen create default-ubuntu-1204
-----> Starting Kitchen (v1.3.1)
-----> Creating <default-ubuntu-1204>...
       Bringing machine 'default' up with 'virtualbox' provider...
       ==> default: Importing base box 'opscode-ubuntu-12.04'...
       ...
       Vagrant instance <default-ubuntu-1204> created.
       Finished creating <default-ubuntu-1204> (0m33.90s).
-----> Kitchen is finished. (0m34.14s)

For each instance, Test Kitchen will store a YAML-encoded configuration file in .kitchen/. The file looks like what you see below:

$ cat .kitchen/default-ubuntu-1204.yml
---
hostname: 127.0.0.1
username: vagrant
ssh_key: "/long/path/to/private_key"
port: '2222'
last_action: create

Bingo! I was pleased to find out that this file stores all information required to access the instance via SSH. Now I only had to process the data in Go for chef-runner to be able to provision Test Kitchen instances…

Decoding YAML in 3 steps

When it comes to working with YAML in Go, there is no better package than, well, the yaml package:

The yaml package enables Go programs to comfortably encode and decode YAML values. It […] is based on a pure Go port of the well-known libyaml C library to parse and generate YAML data quickly and reliably.

For the sake of this post, I’m only going to focus on decoding YAML (encoding really isn’t that different). With the yaml package, the whole decoding process typically boils down to three steps:

Step 1: Declare a struct type with fields mapping to YAML values

After looking at the contents of the Test Kitchen YAML file again, it’s easy to come up with a struct type that contains all the fields – with proper name and type – we need:

// Test Kitchen instance configuration
type instanceConfig struct {
	Hostname string
	Username string
	SSHKey   string `yaml:"ssh_key"`
	Port     string
}

Note that the yaml package only decodes exported struct fields. By default, it uses the lowercased field name as the YAML key, so Hostname maps to hostname. Sometimes keys don’t map nicely to field names: SSHKey would map to sshkey, not ssh_key. In that case, you can define a different key via a field tag, as I did for SSHKey. Last but not least, if you don’t care about some YAML value, simply omit it from the struct type.

Step 2: Add a method for decoding

Now that we have our instanceConfig type, let’s add a Parse method to it. In its simplest form, this method is just a wrapper around yaml.Unmarshal, which decodes the YAML data within the passed byte slice into our struct:

import "gopkg.in/yaml.v2"

func (c *instanceConfig) Parse(data []byte) error {
	return yaml.Unmarshal(data, c)
}

Since Parse just wraps yaml.Unmarshal, you might be wondering why the method needs to exist at all. There are actually two reasons. First, the caller doesn’t have to know anything about YAML decoding (separation of concerns). Second, we can extend Parse without having to change its signature. For example, in chef-runner I also check that each struct field has a valid (non-zero) value and otherwise return an error:

func (c *instanceConfig) Parse(data []byte) error {
	if err := yaml.Unmarshal(data, c); err != nil {
		return err
	}
	if c.Hostname == "" {
		return errors.New("Kitchen config: invalid `hostname`")
	}
	// ... same check for Username, SSHKey, and Port ...
	return nil
}
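The repeated checks could also be folded into a small loop. The following is only a sketch: the validate method is a hypothetical helper, not the code chef-runner actually uses, and it assumes that all four fields are still plain strings.

```go
package main

import (
	"fmt"
	"log"
)

type instanceConfig struct {
	Hostname string
	Username string
	SSHKey   string
	Port     string
}

// validate is a hypothetical helper: it returns an error naming the first
// field that still holds its zero value (the empty string).
func (c *instanceConfig) validate() error {
	fields := []struct{ key, value string }{
		{"hostname", c.Hostname},
		{"username", c.Username},
		{"ssh_key", c.SSHKey},
		{"port", c.Port},
	}
	for _, f := range fields {
		if f.value == "" {
			return fmt.Errorf("Kitchen config: invalid `%s`", f.key)
		}
	}
	return nil
}

func main() {
	// SSHKey is left empty on purpose, so validate reports ssh_key.
	c := instanceConfig{Hostname: "127.0.0.1", Username: "vagrant", Port: "2222"}
	if err := c.validate(); err != nil {
		log.Println(err)
	}
}
```

Parse would then call c.validate() right after yaml.Unmarshal succeeds and return its result.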

Step 3: Put it together

Finally, if you put the pieces together – reading the YAML file, decoding the data, printing the result – you will end up with something like this:

import (
	"fmt"
	"io/ioutil"
	"log"
)

func main() {
	data, err := ioutil.ReadFile(".kitchen/default-ubuntu-1204.yml")
	if err != nil {
		log.Fatal(err)
	}
	var config instanceConfig
	if err := config.Parse(data); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", config)
}

The program’s output shows that we successfully decoded the configuration file of the Test Kitchen instance:

{Hostname:127.0.0.1 Username:vagrant SSHKey:/long/path/to/private_key Port:2222}

With this information at hand, chef-runner is in a position to log in to the instance and do its magic via some third-party SSH library.

Auxiliary structs

While the code from the previous section already does the job, one thing bothered me: instanceConfig stores the SSH port as a string. This is a direct result of Test Kitchen quoting the YAML value like this: port: '2222'. I wanted the port to be an integer since I had to pass it to an SSH library that way. Having to convert the string after decoding the YAML seemed like a poor solution.

Fortunately, I stumbled upon the slides of what must have been an excellent presentation by Francesc Campoy. In it, Francesc suggests using an auxiliary struct type to decode JSON that cannot be decoded the usual way. After I applied this idea to the Parse method, the Port field finally ended up being an integer:

type instanceConfig struct {
	Hostname string
	Username string
	SSHKey   string
	Port     int
}

func (c *instanceConfig) Parse(data []byte) error {
	var aux struct {
		Hostname string
		Username string
		SSHKey   string `yaml:"ssh_key"`
		Port     string
	}
	if err := yaml.Unmarshal(data, &aux); err != nil {
		return err
	}
	if aux.Hostname == "" {
		return errors.New("Kitchen config: invalid `hostname`")
	}
	// ... same check for Username and SSHKey ...
	port, err := strconv.Atoi(aux.Port)
	if err != nil {
		return errors.New("Kitchen config: invalid `port`")
	}
	c.Hostname = aux.Hostname
	c.Username = aux.Username
	c.SSHKey = aux.SSHKey
	c.Port = port
	return nil
}
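The string-to-integer step can be tried in isolation. The sketch below uses a hypothetical parsePort helper. The range check is an extra assumption on my part and is not something the Parse method above performs.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: it converts the quoted YAML value to an
// int and, as an added assumption, rejects numbers outside the TCP port range.
func parsePort(s string) (int, error) {
	port, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("Kitchen config: invalid `port`: %v", err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("Kitchen config: `port` %d out of range", port)
	}
	return port, nil
}

func main() {
	// One valid value followed by three values that should be rejected.
	for _, s := range []string{"2222", "", "ssh", "70000"} {
		port, err := parsePort(s)
		fmt.Println(port, err)
	}
}
```

Because strconv.Atoi already fails on an empty string, a missing port needs no separate non-zero check.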

The Unmarshaler interface

One more tip: it is also possible to decode YAML data directly into instanceConfig by implementing the yaml.Unmarshaler interface. For this, rename and modify the Parse method to look like this:

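func (c *instanceConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
	var aux struct {
		// ... same fields as in Parse ...
	}
	// unmarshal decodes the YAML value at hand into the auxiliary struct.
	if err := unmarshal(&aux); err != nil {
		return err
	}
	// ... same checks, conversion, and assignments as in Parse ...
	return nil
}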

Now this code works as expected:

func main() {
	// ...
	var config instanceConfig
	if err := yaml.Unmarshal(data, &config); err != nil {
		log.Fatal(err)
	}
	// ...
}

If you’d like to learn more about the actual implementation in chef-runner, you can view the final code here.
