November 12, 2015
Recently I have been discussing how to set up and test a Highly Available RabbitMQ cluster inside Docker containers. As a few readers figured out, the next post in that series will integrate a distributed Spring XD environment with the RabbitMQ cluster, all running inside Docker containers. Since Spring XD has been docker-ized (https://hub.docker.com/u/springxd/) since 1.0, we can use Docker Compose to natively link RabbitMQ and Spring XD containers together for connectivity. I plan on discussing what Spring XD is and what it does well in the upcoming article, but for today that is beyond the scope of this post. If you want to learn more you can read about it here.
Today I want to take a step back and address a few questions about running a Docker RabbitMQ cluster (and Spring XD) on a single host. Deploying to a single host is not ideal for true HA or anywhere a production SLA is required, so the questions that came out of this limitation are about supporting extensible and scalable deployments beyond the single-host scenario. Here are some I want to discuss today:
Until Docker 1.9, there was no simple way to answer all of these questions. Docker did not have native support for deploying containers across multiple hosts for container redundancy or placement strategies. Now that 1.9 has been released, we can re-evaluate these great questions.
Today’s post will cover how to set up a distributed Docker Swarm using the new production-ready Docker 1.9, Docker Swarm 1.0, and Docker Compose 1.5.1. Specifically, I want to share how to build a development Swarm (running on a single host), a distributed Swarm (3 hosts), and a production Swarm environment (9 hosts).
Let’s get started!
Users who are new to Docker Swarm should read over the documentation in case feature drift has invalidated some of the following sections. Docker is being actively developed, and that’s a good thing.
Docker Swarm's hosted discovery service on Docker Hub lets you identify a cluster with token://<your token string> instead of consul://<consul uri>/<swarm name e.g. myswarm>. The advantage is simplicity when you do not need to integrate with a service discovery tool to get going. The disadvantage is that each Swarm Node in the cluster needs to know the token after a new Swarm Cluster has been created. I found assigning the cluster token to the EC2 display name as a tag to be a simple way to sync multiple Swarm Nodes, but it also meant the environment had to have an initial starting Node up and running that was responsible for creating the first Swarm Cluster before the other Swarm Nodes could be provisioned. Readers familiar with syncing the RabbitMQ Erlang cookie file to start a cluster will probably write the file to disk after checking the initial EC2 host's display name on startup (or some other persistent location like S3). Additionally, consul can be started in bootstrap single-server mode for hosting a self-contained single-host environment (consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul). For demonstration purposes I will just use Docker Hub.
Since 1.9 is fairly new, figuring out how to install each component can seem a little overwhelming. Here is how to install each component on Fedora or an AWS AMI.
Docker Daemon
curl -sSL -O https://get.docker.com/builds/Linux/x86_64/docker-1.9.0
chmod +x docker-1.9.0
mv docker-1.9.0 /usr/local/bin/docker
Docker Machine
curl -L https://github.com/docker/machine/releases/download/v0.5.0/docker-machine_linux-amd64.zip > machine.zip
unzip machine.zip
rm machine.zip
mv -f docker-machine* /usr/local/bin
Docker Swarm
export GOPATH="<go workspace folder, I used /opt/goworkspace/>"
go get github.com/docker/swarm
chmod 777 $GOPATH/bin/swarm
rm -f /usr/local/bin/swarm >> /dev/null
ln -s $GOPATH/bin/swarm /usr/local/bin/swarm
Docker Compose
curl -L https://github.com/docker/compose/releases/download/1.5.1/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
Consul
wget https://releases.hashicorp.com/consul/0.5.2/consul_0.5.2_linux_amd64.zip -O /tmp/consul.zip
unzip /tmp/consul.zip -d /tmp
cp /tmp/consul /usr/local/bin
cp /tmp/consul /usr/bin
rm -f /tmp/consul.zip
rm -f /tmp/consul
I created this GIST for installing all the components on a host at once:
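The gist itself is not reproduced inline here, but a rough equivalent that simply strings the commands above together into one script would look something like this:
#!/bin/bash
# consolidated install sketch for Docker 1.9, Docker Machine, Docker Swarm, Docker Compose, and Consul
set -e
# Docker Daemon
curl -sSL -O https://get.docker.com/builds/Linux/x86_64/docker-1.9.0
chmod +x docker-1.9.0
mv docker-1.9.0 /usr/local/bin/docker
# Docker Machine
curl -L https://github.com/docker/machine/releases/download/v0.5.0/docker-machine_linux-amd64.zip > machine.zip
unzip machine.zip && rm machine.zip
mv -f docker-machine* /usr/local/bin
# Docker Swarm (built from source with go get)
export GOPATH=/opt/goworkspace
go get github.com/docker/swarm
ln -sf $GOPATH/bin/swarm /usr/local/bin/swarm
# Docker Compose
curl -L https://github.com/docker/compose/releases/download/1.5.1/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
# Consul
wget https://releases.hashicorp.com/consul/0.5.2/consul_0.5.2_linux_amd64.zip -O /tmp/consul.zip
unzip /tmp/consul.zip -d /tmp
cp /tmp/consul /usr/local/bin && cp /tmp/consul /usr/bin
rm -f /tmp/consul.zip /tmp/consul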
Most of the documents and tutorials I found use docker-machine to provision a VirtualBox host for the Swarm. I do not want any unnecessary components or possible issues to debug in a production environment, so I started by provisioning a single AWS EC2 instance (a t2.small built from an Amazon Linux AMI) and then running the installation script to get everything installed. When creating a single host development environment I found it beneficial to consider the following reference diagram:
Here is how to set up the environment, in order:
Start the Docker Daemon running with:
nohup /usr/local/bin/docker daemon -H tcp://INTERNAL_IP_ADDRESS:2375 &
Create the Swarm Cluster and store the Token in a file:
mkdir -p /opt/swarm
/usr/local/bin/swarm create > /opt/swarm/clustertoken
echo "Cluster Token: "
cat /opt/swarm/clustertoken
chmod 666 /opt/swarm/clustertoken
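If you later provision more than one host with the token backend, the EC2 tag approach mentioned earlier needs a way to read the token back on each new node. Here is a minimal sketch of one variation (assuming the AWS CLI is configured and the token was stored in a hypothetical ClusterToken tag on the instance, rather than in the initial host's display name as I described above):
# look up this instance's ID from the EC2 metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
# pull the Swarm cluster token out of the ClusterToken tag and write it where the join expects it
aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=ClusterToken" --query "Tags[0].Value" --output text > /opt/swarm/clustertoken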
Start the Swarm Join by running:
nohup /usr/local/bin/swarm join --addr=INTERNAL_IP_ADDRESS:2375 token://$(cat /opt/swarm/clustertoken) &
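Before starting the manager, you can ask the token discovery backend which nodes have registered, which is a quick sanity check that the join worked:
/usr/local/bin/swarm list token://$(cat /opt/swarm/clustertoken)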
Start the Swarm Manage by running:
nohup swarm manage -H INTERNAL_IP_ADDRESS:4000 token://$(cat /opt/swarm/clustertoken) &
Point the Docker command line interface at the Swarm Manager:
export DOCKER_HOST=INTERNAL_IP_ADDRESS:4000
View the Swarm info and ensure the host is registered in the Docker Swarm:
# docker info
Containers: 0
Images: 0
Role: primary
Primary: 10.0.0.137:4000
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 1
swarm1.internallevvel.com: 10.0.0.137:2375
└ Containers: 0
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
Start a container that is deployed on the Swarm
# docker run -itd --name=singletest busybox
db0af98e5b13bc24801047594f989437d33956bcf27aa70702e805ec4cc16a1b
#
View the container
docker ps
Now that the single host environment is working, we can break out each component for redundancy and prepare for a production environment. We will be removing the Swarm Cluster Token and using consul from here on. In this configuration, each Swarm Node is a replica running on its own t2.small. Here is a reference diagram for the Swarm I am running:
I find looking at the running processes to be a good way to figure out connectivity and ordering on a new platform.
Here are the processes running on Swarm Node 1 (notice how the increasing PIDs indicate the startup order):
# ps x | grep consul
3510 ? Sl 1:18 /usr/local/bin/consul agent -server -config-dir=/etc/consul.d/server -data-dir=/opt/consul/data -bind=10.0.0.137
3583 ? Sl 2:28 /usr/local/bin/docker daemon -H 10.0.0.137:2375 --cluster-advertise 10.0.0.137:2375 --cluster-store consul://10.0.0.137:8500/swarmnodes --label=com.docker.network.driver.overlay.bind_interface=eth0
3672 ? Sl 0:01 /usr/local/bin/swarm join --addr=10.0.0.137:2375 consul://10.0.0.137:8500/swarmnodes
3690 ? Sl 0:03 /usr/local/bin/swarm manage -H tcp://10.0.0.137:4000 --replication --advertise 10.0.0.137:4000 consul://10.0.0.137:8500/swarmnodes
Here are the processes running on Swarm Node 2:
# ps x | grep consul
3506 ? Sl 2:38 /usr/local/bin/consul agent -server -config-dir=/etc/consul.d/server -data-dir=/opt/consul/data -bind=10.0.0.54
3566 ? Sl 2:29 /usr/local/bin/docker daemon -H 10.0.0.54:2375 --cluster-advertise 10.0.0.54:2375 --cluster-store consul://10.0.0.54:8500/swarmnodes --label=com.docker.network.driver.overlay.bind_interface=eth0
3585 ? Sl 0:00 /usr/local/bin/swarm join --addr=10.0.0.54:2375 consul://10.0.0.54:8500/swarmnodes
3652 ? Sl 0:02 /usr/local/bin/swarm manage -H tcp://10.0.0.54:4000 --replication --advertise 10.0.0.54:4000 consul://10.0.0.54:8500/swarmnodes
Here are the processes running on Swarm Node 3:
# ps x | grep consul
3507 ? Sl 1:14 /usr/local/bin/consul agent -server -config-dir=/etc/consul.d/server -data-dir=/opt/consul/data -bind=10.0.0.146
3568 ? Sl 2:31 /usr/local/bin/docker daemon -H 10.0.0.146:2375 --cluster-advertise 10.0.0.146:2375 --cluster-store consul://10.0.0.146:8500/swarmnodes --label=com.docker.network.driver.overlay.bind_interface=eth0
3589 ? Sl 0:00 /usr/local/bin/swarm join --addr=10.0.0.146:2375 consul://10.0.0.146:8500/swarmnodes
3653 ? Sl 0:03 /usr/local/bin/swarm manage -H tcp://10.0.0.146:4000 --replication --advertise 10.0.0.146:4000 consul://10.0.0.146:8500/swarmnodes
Here are the ordered commands to start the Swarm. Please run all of them on each Node separately.
Set up a consul configuration file on each host (make sure to create the /etc/consul.d/server and /opt/consul/data directories):
cat /etc/consul.d/server/config.json
{
    "datacenter" : "<a name for your data center>",
    "bootstrap" : false,
    "bootstrap_expect" : 3,
    "server" : true,
    "data_dir" : "/opt/consul/data",
    "log_level" : "INFO",
    "enable_syslog" : false,
    "start_join" : [],
    "retry_join" : [],
    "client_addr" : "0.0.0.0"
}
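For reference, the directories mentioned above can be created on each host with:
mkdir -p /etc/consul.d/server /opt/consul/data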
Start consul
nohup /usr/local/bin/consul agent -server -config-dir=/etc/consul.d/server -data-dir=/opt/consul/data -bind=INTERNAL_IP_ADDRESS &
Have consul join the consul cluster
/usr/local/bin/consul join swarm1.internallevvel.com swarm2.internallevvel.com swarm3.internallevvel.com
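Before starting the Docker Daemon, it is worth confirming that all three consul servers see each other; each node should show up with a Status of alive:
/usr/local/bin/consul members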
Start the Docker Daemon
nohup /usr/local/bin/docker daemon -H INTERNAL_IP_ADDRESS:2375 --cluster-advertise INTERNAL_IP_ADDRESS:2375 --cluster-store consul://INTERNAL_IP_ADDRESS:8500/<a name like devswarm> --label=com.docker.network.driver.overlay.bind_interface=eth0 &
Start the Swarm Join
nohup /usr/local/bin/swarm join --addr=INTERNAL_IP_ADDRESS:2375 consul://INTERNAL_IP_ADDRESS:8500/<a name like devswarm> &
Start the Swarm Manager
nohup /usr/local/bin/swarm manage -H tcp://INTERNAL_IP_ADDRESS:4000 --replication --advertise INTERNAL_IP_ADDRESS:4000 consul://INTERNAL_IP_ADDRESS:8500/<a name like devswarm> &
Point each host at the Swarm Manager by setting the environment variable:
export DOCKER_HOST=INTERNAL_IP_ADDRESS:4000
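The export above only applies to the current shell. If you want new sessions to pick it up automatically, one option (the file path here is just an example) is:
echo "export DOCKER_HOST=INTERNAL_IP_ADDRESS:4000" >> /etc/profile.d/docker-swarm.sh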
Once all the Swarm Nodes are installed and set up, you can confirm the Swarm is ready with:
# docker info
Containers: 0
Images: 0
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
swarm1.internallevvel.com: 10.0.0.137:2375
└ Containers: 0
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
swarm2.internallevvel.com: 10.0.0.54:2375
└ Containers: 0
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
swarm3.internallevvel.com: 10.0.0.146:2375
└ Containers: 0
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
CPUs: 3
Total Memory: 6.163 GiB
Name: swarm2.internallevvel.com
Deploy one app to each Swarm Node
# docker run -itd --name=AppDeployedToNode1 --env="constraint:node==swarm1.internallevvel.com" busybox
5e5d3e056aee3a5e621ed9775245392f31e2b908922ee6087706bafbd665df08
# docker run -itd --name=AppDeployedToNode2 --env="constraint:node==swarm2.internallevvel.com" busybox
e01065a81645a35c7c3d71e7796a6804a1092d1d038f6b3df8fa7c9f72567b01
# docker run -itd --name=AppDeployedToNode3 --env="constraint:node==swarm3.internallevvel.com" busybox
88bb54327a910f0fb6ce3a502f820746c48e8b712b34ec72a8ce34c09605ad75
Confirm the apps were deployed to and are running on the correct hosts
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
88bb54327a91 busybox "sh" 29 seconds ago Up 28 seconds swarm3.internallevvel.com/AppDeployedToNode3
e01065a81645 busybox "sh" 36 seconds ago Up 35 seconds swarm2.internallevvel.com/AppDeployedToNode2
5e5d3e056aee busybox "sh" 43 seconds ago Up 43 seconds swarm1.internallevvel.com/AppDeployedToNode1
Inspect the Swarm and confirm there is a container on each Node
# docker info
Containers: 3
Images: 3
Role: primary
Primary: 10.0.0.54:4000
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
swarm1.internallevvel.com: 10.0.0.137:2375
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
swarm2.internallevvel.com: 10.0.0.54:2375
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
swarm3.internallevvel.com: 10.0.0.146:2375
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.054 GiB
└ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.1.10-17.31.amzn1.x86_64, operatingsystem=Amazon Linux AMI 2015.09, storagedriver=devicemapper
CPUs: 3
Total Memory: 6.163 GiB
Name: swarm2.internallevvel.com
At this point the Swarm is able to deploy containers across the Swarm Nodes. Now we can confirm the overlay network can link containers across multiple hosts.
Create a docker-compose.yml file for deploying the HA RabbitMQ cluster containers from the previous post (I will be posting a public version on docker hub soon). Here are the contents from mine:
$ cat cluster/docker-compose.yml
rabbit1:
  image: jayjohnson/rabbitclusternode
  hostname: cluster_rabbit1_1
  cap_add:
    - ALL
    - NET_ADMIN
    - SYS_ADMIN
  ports:
    - "1883:1883"
    - "5672:5672"
    - "8883:8883"
    - "15672:15672"
rabbit2:
  image: jayjohnson/rabbitclusternode
  hostname: cluster_rabbit2_1
  cap_add:
    - ALL
    - NET_ADMIN
    - SYS_ADMIN
  environment:
    - CLUSTERED=true
    - CLUSTER_WITH=cluster_rabbit1_1
    - RAM_NODE=true
  ports:
    - "1884:1883"
    - "5673:5672"
    - "8884:8883"
    - "15673:15672"
rabbit3:
  image: jayjohnson/rabbitclusternode
  hostname: cluster_rabbit3_1
  cap_add:
    - ALL
    - NET_ADMIN
    - SYS_ADMIN
  environment:
    - CLUSTERED=true
    - CLUSTER_WITH=cluster_rabbit1_1
  ports:
    - "1885:1883"
    - "5674:5672"
    - "8885:8883"
    - "15674:15672"
Now use Docker Compose to deploy the containers as a RabbitMQ cluster according to the docker-compose.yml configuration. Make sure to run this in the same directory as the yml file and specify the new overlay networking.
docker-compose --x-networking --x-network-driver overlay up -d
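If you want to confirm the overlay network was created (with the experimental Compose networking it is named after the project, which defaults to the directory name, so cluster in my case), list the networks through the Swarm Manager:
docker network ls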
Confirm the RabbitMQ cluster containers are running and distributed across the Swarm Nodes
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
377cfd780f9d jayjohnson/rabbitclusternode "/bin/sh -c /opt/rabb" 8 seconds ago Up 6 seconds 10.0.0.146:1883->1883/tcp, 10.0.0.146:5672->5672/tcp, 4369/tcp, 10.0.0.146:8883->8883/tcp, 9100-9105/tcp, 10.0.0.146:15672->15672/tcp, 25672/tcp swarm3.internallevvel.com/cluster_rabbit1_1
45b2111b35de jayjohnson/rabbitclusternode "/bin/sh -c /opt/rabb" 8 seconds ago Up 7 seconds 4369/tcp, 9100-9105/tcp, 25672/tcp, 10.0.0.54:1884->1883/tcp, 10.0.0.54:5673->5672/tcp, 10.0.0.54:8884->8883/tcp, 10.0.0.54:15673->15672/tcp swarm2.internallevvel.com/cluster_rabbit2_1
52f8fbad2f98 jayjohnson/rabbitclusternode "/bin/sh -c /opt/rabb" 9 seconds ago Up 7 seconds 4369/tcp, 9100-9105/tcp, 25672/tcp, 10.0.0.137:1885->1883/tcp, 10.0.0.137:5674->5672/tcp, 10.0.0.137:8885->8883/tcp, 10.0.0.137:15674->15672/tcp swarm1.internallevvel.com/cluster_rabbit3_1
88bb54327a91 busybox "sh" 14 minutes ago Up 14 minutes swarm3.internallevvel.com/AppDeployedToNode3
e01065a81645 busybox "sh" 14 minutes ago Up 14 minutes swarm2.internallevvel.com/AppDeployedToNode2
5e5d3e056aee busybox "sh" 14 minutes ago Up 14 minutes swarm1.internallevvel.com/AppDeployedToNode1
Try connecting to one of the RabbitMQ brokers
[root@swarm3 ~]# telnet 0.0.0.0 5672
Trying 0.0.0.0...
Connected to 0.0.0.0.
Escape character is '^]'.
AMQP Connection closed by foreign host.
[root@swarm3 ~]#
Login to one of the containers
[root@swarm3 ~]# docker exec -t -i cluster_rabbit1_1 /bin/bash
[root@cluster_rabbit1_1 /]#
Confirm the overlay network set the /etc/hosts for connectivity between the RabbitMQ containers running on different Swarm Nodes
[root@cluster_rabbit1_1 /]# cat /etc/hosts
10.0.0.4    cluster_rabbit1_1
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
fe00::0     ip6-localnet
ff00::0     ip6-mcastprefix
ff02::1     ip6-allnodes
ff02::2     ip6-allrouters
10.0.0.2    cluster_rabbit3_1
10.0.0.2    cluster_rabbit3_1.cluster
10.0.0.3    cluster_rabbit2_1
10.0.0.3    cluster_rabbit2_1.cluster
[root@cluster_rabbit1_1 /]#
Check the RabbitMQ cluster status
[root@cluster_rabbit1_1 /]# rabbitmqctl cluster_status
Cluster status of node rabbit@cluster_rabbit1_1 ...
[{nodes,[{disc,[rabbit@cluster_rabbit1_1,rabbit@cluster_rabbit3_1]},
{ram,[rabbit@cluster_rabbit2_1]}]},
{running_nodes,[rabbit@cluster_rabbit2_1,rabbit@cluster_rabbit3_1,
rabbit@cluster_rabbit1_1]},
{cluster_name,<<"rabbit@cluster_rabbit1_1">>},
{partitions,[]}]
[root@cluster_rabbit1_1 /]#
Logout and stop the RabbitMQ cluster with Docker Compose
[root@cluster_rabbit1_1 /]# exit
[root@swarm3 ~]#
[root@swarm3 ~]# docker-compose --x-networking stop
Stopping cluster_rabbit1_1 ... done
Stopping cluster_rabbit2_1 ... done
Stopping cluster_rabbit3_1 ... done
[root@swarm3 ~]#
At this point we have used Docker Compose to demonstrate automatic placement of containers across the Swarm Nodes, but what if you want to ensure each Swarm Node gets the appropriate container every time?
If you do not want to deploy across the Swarm using Docker Compose you can create a custom overlay network and then manually deploy containers with the docker run command specifying to use that overlay network. Here is how I am deploying another RabbitMQ cluster across the Swarm:
This script creates the ‘testoverlay’ network and then deploys the same RabbitMQ container from Docker Hub across the Swarm Nodes one at a time. So far I have not been able to build a Docker Compose configuration file that can handle exact placement, so I had to create this script to make the deployment consistent every time.
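The original script is not embedded here, but a rough sketch of the flow it describes (the container names are illustrative, and the cap_add and port flags from the compose file are omitted for brevity) might look like this:
# create a user-defined overlay network that every Swarm Node can attach containers to
docker network create --driver overlay testoverlay
# pin one RabbitMQ container to each Swarm Node with a constraint
docker run -itd --name=overlay_rabbit1 --hostname=overlay_rabbit1 --net=testoverlay --env="constraint:node==swarm1.internallevvel.com" jayjohnson/rabbitclusternode
docker run -itd --name=overlay_rabbit2 --hostname=overlay_rabbit2 --net=testoverlay --env="constraint:node==swarm2.internallevvel.com" --env="CLUSTERED=true" --env="CLUSTER_WITH=overlay_rabbit1" --env="RAM_NODE=true" jayjohnson/rabbitclusternode
docker run -itd --name=overlay_rabbit3 --hostname=overlay_rabbit3 --net=testoverlay --env="constraint:node==swarm3.internallevvel.com" --env="CLUSTERED=true" --env="CLUSTER_WITH=overlay_rabbit1" jayjohnson/rabbitclusternode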
Every production environment is going to have nuances that need to be handled carefully. This reference architecture diagram is an example deployment topology for running a production Swarm environment. Not everyone is going to fit into the same shoe, and we want to hear your feedback on what would not work in your environment. With that consideration, here’s a starting point for building out a Production Docker Swarm Environment:
From the diagram you can see that all we have to do is change where the Swarm Managers and Service Discovery Managers run. This allows for a consistent Swarm Node build that only hosts the Docker containers running your applications. Developers or the DevOps team driving production publishes will only interface with the Swarm Managers, and for security it makes sense to lock down access to the Container Nodes to only the ports necessary for hosting the applications and the management ports for the Swarm.
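As one hedged example of that lockdown on AWS (the security group IDs below are placeholders), the Docker daemon port on the Container Nodes could be restricted so only the Swarm Manager hosts can reach it, with similar rules for the Swarm manage port (4000) and the consul ports:
# only the Swarm Managers' security group may reach the Docker daemon port on the Container Nodes
aws ec2 authorize-security-group-ingress --group-id sg-CONTAINERNODES --protocol tcp --port 2375 --source-group sg-SWARMMANAGERS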
In the future, I plan on releasing the repository with the provisioning scripts, installers, setup scripts, and tooling for deploying to a targeted AWS VPC within a few minutes. Even with the lack of troubleshooting documentation, Docker Swarm is still significantly easier to set up than some of the other enterprise on-premise PaaS offerings I have installed and run in production before, and it has a great community of developers supporting it.
With the ability to run a Docker Swarm on your own computer, in your own data center, or on a major cloud provider like AWS, the toolset for managing the container lifecycle from development to production is getting easier with each release. Developers can run their own environments, they can share environments using the overlay network, QA can spin up Swarm environments for testing and shut them down when they are done, and IT now has the ability to cut costs by dynamically adjusting the number of running Swarm Nodes based on application traffic demand. All in all, I think this release is a huge success for Docker. I will be opening up some PRs in the hopes of improving the documentation and adding some debugging tips soon.
Lastly, here are some of my final considerations after running Swarm on AWS:
Let’s recap what we have done in this post. We have:
- installed Docker 1.9, Docker Machine, Docker Swarm, Docker Compose, and Consul
- built a single host development Swarm using the Docker Hub token discovery backend
- built a distributed three-node Swarm backed by a consul cluster
- deployed containers to specific Swarm Nodes using constraints
- deployed an HA RabbitMQ cluster across the Swarm with Docker Compose and the new overlay networking
- walked through a reference topology for a production Swarm environment
Well that is all for now! Thanks for reading and I hope you found this post valuable. There is a lot to talk about in this new Docker Swarm release, and we are excited to hear your feedback on running Docker Swarm. If your organization would like assistance determining your Docker and Docker Swarm strategy, please reach out to us at Levvel and we can get you started.
Until next time,
- Jay
Authored By
Jay Johnson