Foreword
This explanation of clustering Rabbit-MQ assumes that you’ve had some experience with Rabbit-MQ, at least to the point of being able to get it up and running and processing messages. I will be using CentOS Linux; other Linux distributions may require slight modifications to the setup process. You will need at least two machines or virtual instances up and running, with Rabbit-MQ installed on each.
Overview
Clustering Rabbit-MQ is straightforward once you understand how it works. There is no need for a load balancer or any other hardware or software component. The idea is simple: send all messages to the master queue and let the master distribute the messages down to the slaves.
Create Short Names
First, we need to change the host name and host entries of our machines to something short. Rabbit-MQ has trouble clustering queues with fully qualified DNS names, so each machine needs a single-word host name that resolves to it. For now, let’s use “master” for the master head and “slave1”, “slave2” … “slaveN” for the rest.
Set the master host name to “master”:
echo "master" > /proc/sys/kernel/hostname
Next we need to set the entries in the /etc/hosts file to allow the short names to be aliased to machine or instance IPs. Open the /etc/hosts file in your favorite editor and add the following lines:
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain
192.168.0.100 master master.localdomain
192.168.0.101 slave1 slave1.localdomain
192.168.0.102 slave2 slave2.localdomain
Please note: your particular /etc/hosts file will look different from the above. You’ll need to substitute your actual IP addresses and domain suffix for each entry.
Make sure each slave you plan to add has an entry in the /etc/hosts file of the master. To verify your settings for each of the entries you provide, try pinging them by their short name.
ping master
PING master (192.168.0.100) 56(84) bytes of data.
64 bytes from master (192.168.0.100): icmp_seq=1 ttl=61 time=0.499 ms
64 bytes from master (192.168.0.100): icmp_seq=2 ttl=61 time=0.620 ms
64 bytes from master (192.168.0.100): icmp_seq=3 ttl=61 time=0.590 ms
64 bytes from master (192.168.0.100): icmp_seq=4 ttl=61 time=0.494 ms
If you get something like the above, you’re good to go. If not, take a good look at your settings and adjust them until you do.
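Before pinging, it can also help to confirm that every short name actually has an entry in the hosts file. The following is a minimal sketch of such a check; the function name missing_hosts is made up for illustration, and the demo runs against a temporary file rather than the real /etc/hosts.

```shell
#!/bin/sh
# missing_hosts FILE NAME...: print each NAME that has no entry in FILE.
# (missing_hosts is a hypothetical helper, not part of Rabbit-MQ or CentOS.)
missing_hosts() {
  hosts_file="$1"; shift
  for name in "$@"; do
    # Skip comment lines, then look for NAME as a whole field after the IP column.
    awk -v n="$name" '
      /^[[:space:]]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit found ? 0 : 1 }
    ' "$hosts_file" || echo "$name"
  done
}

# Demo against a temporary copy of the example entries (not the real /etc/hosts).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
127.0.0.1 localhost localhost.localdomain
192.168.0.100 master master.localdomain
192.168.0.101 slave1 slave1.localdomain
EOF
missing_hosts "$tmp" master slave1 slave2   # prints: slave2
rm -f "$tmp"
```

On a real machine you would pass /etc/hosts as the first argument. An empty result only means the names are listed; ping is still the test that the addresses are reachable.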
Once your short names are set up in the master /etc/hosts file, copy the file to every slave so that all machines have the same hosts file entries, or, more specifically, so that each machine has the master and slave routes. If you’re familiar with routing, feel free to just add the missing routes.
Then update the host name on each slave, running the matching command on each machine:
echo "slave1" > /proc/sys/kernel/hostname
echo "slave2" > /proc/sys/kernel/hostname
echo "slave3" > /proc/sys/kernel/hostname
Synchronize Erlang Cookie
Next we need to synchronize our Erlang cookie. Rabbit-MQ needs this to be the same on all machines for them to communicate properly. The file we need is located on the master at /var/lib/rabbitmq/.erlang.cookie. We’ll cat this value, then update the cookie on every slave.
cat /var/lib/rabbitmq/.erlang.cookie
DQRRLCTUGOBCRFNPIABC
Copy the value displayed by the cat.
Please notice that the file stores the value without a carriage return or a line feed. The value needs to go into the slaves the same way, which is why the echo command below uses the “-n” flag.
First, stop the rabbitmq-server on each slave before updating the Erlang cookie:
service rabbitmq-server stop
Next, update the cookie and start the service back up. Run both commands on each slave:
echo -n "DQRRLCTUGOBCRFNPIABC" > /var/lib/rabbitmq/.erlang.cookie
service rabbitmq-server start
Once again, substitute your actual Erlang cookie value for “DQRRLCTUGOBCRFNPIABC”.
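Since a stray newline in the cookie file is easy to miss, here is a small sketch that writes a value with no trailing newline and then checks the byte count. It uses printf '%s' as an alternative to echo -n, because echo -n is not handled the same way by every shell. The helper names write_cookie and cookie_size are made up for illustration, and the demo writes to a temporary file, not to the real cookie path.

```shell
#!/bin/sh
# write_cookie FILE VALUE: store VALUE in FILE with no trailing newline.
# (write_cookie and cookie_size are hypothetical helpers for this sketch.)
write_cookie() {
  printf '%s' "$2" > "$1"
}

# cookie_size FILE: print the number of bytes in FILE.
cookie_size() {
  wc -c < "$1" | tr -d ' '
}

# Demo on a temporary file; the example cookie is 20 characters long,
# so a correctly written file is exactly 20 bytes.
tmp=$(mktemp)
write_cookie "$tmp" "DQRRLCTUGOBCRFNPIABC"
cookie_size "$tmp"     # prints: 20
rm -f "$tmp"
```

If the size on a slave is one byte larger than on the master, a newline most likely slipped in when the file was written.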
Create The Cluster
Now we cluster the queues together. Starting with the master, issue the following commands:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
Next we cluster the slaves to the master. For each slave execute the following commands:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit@master
rabbitmqctl start_app
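If you have several slaves, the four commands can be wrapped in a small script. The sketch below is one way to do that, under a few assumptions: the run and cluster_slave helpers and the RUN variable are made up for illustration, and by default the script only prints the commands so you can review them first. It uses the cluster subcommand exactly as shown in this article; later Rabbit-MQ releases renamed it to join_cluster, so check what your installed rabbitmqctl accepts.

```shell
#!/bin/sh
# run CMD...: print the command; execute it as well only when RUN=1.
# (run, cluster_slave and RUN are hypothetical names for this sketch.)
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# cluster_slave MASTER_NODE: the four steps from this section,
# stopping at the first command that fails.
cluster_slave() {
  run rabbitmqctl stop_app &&
  run rabbitmqctl reset &&
  run rabbitmqctl cluster "$1" &&
  run rabbitmqctl start_app
}

# Dry run by default: prints the four commands for the master node name.
cluster_slave rabbit@master
```

Saved as a file on a slave, running it with RUN=1 set in the environment would execute the commands instead of only printing them.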
These commands actually do the clustering of the slaves to the master. To verify that everything is in working order issue the following command on any master or slave instance:
rabbitmqctl status
Status of node rabbit@master …
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA CXC 138 12","4.4.3"},
                        {os_mon,"CPO CXC 138 46","2.1.6"},
                        {sasl,"SASL CXC 138 11","2.1.5.3"},
                        {stdlib,"ERTS CXC 138 10","1.15.3"},
                        {kernel,"ERTS CXC 138 10","2.12.3"}]},
 {nodes,[rabbit@slave1,rabbit@slave2,rabbit@slave3,rabbit@master]},
 {running_nodes,[rabbit@slave1,rabbit@slave2,rabbit@slave3,rabbit@master]}]
…done.
Notice the lines containing nodes and running_nodes. They should list all of the master and slave entries. If they do not, you may have done something wrong; go back and try executing all the steps again. Otherwise, you’re good to go. Start sending messages to the master and watch as they are distributed to each of the slave nodes.
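Reading the running_nodes list by eye gets tedious as the cluster grows. The sketch below pulls that list out of the status text and checks it for specific node names. It assumes the output has the same shape as the status output shown above; the running_nodes function name is made up for illustration, and the demo uses a shortened sample string instead of calling rabbitmqctl.

```shell
#!/bin/sh
# running_nodes: read `rabbitmqctl status` output on stdin and print
# one running node per line. (Hypothetical helper for this sketch.)
running_nodes() {
  tr -d ' \n' |
    sed -n 's/.*{running_nodes,\[\([^]]*\)]}.*/\1/p' |
    tr ',' '\n'
}

# Shortened sample in the same shape as the status output shown above.
sample='[{nodes,[rabbit@slave1,rabbit@master]},
 {running_nodes,[rabbit@slave1,rabbit@master]}]'

# Report whether each expected node appears in the running_nodes list.
for node in rabbit@master rabbit@slave1 rabbit@slave2; do
  if printf '%s\n' "$sample" | running_nodes | grep -qxF "$node"; then
    echo "$node: running"
  else
    echo "$node: NOT running"
  fi
done
```

On a real node you would pipe the live output in instead, for example: rabbitmqctl status | running_nodes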
You can always dynamically add more slave nodes. To do this, update the /etc/hosts file on all the machines with the new entry, copy the master Erlang cookie value to the new slave, then execute the commands to cluster the slave to the master and verify.
Troubleshooting
If you accidentally update the cookie value before you’ve stopped the service, you could get strange errors the next time you start the rabbitmq-server. If this happens just issue the following command:
rm -f /var/lib/rabbitmq/mnesia/rabbit/*
This removes the node’s mnesia storage, including anything Rabbit-MQ had persisted there, and allows you to restart the rabbitmq-server without errors.