Howto: Highly available Zimbra cluster using Heartbeat and DRBD

This morning I successfully set up a clustered, high-availability pair of Zimbra servers (VMware virtual machines), synced with DRBD and using Heartbeat to fail over to the secondary standby server.

This is a howto that tries to cover *all* the steps. There are plenty of howtos on the subject, but in one way or another each of them leaves something out with an ‘I am assuming you already have (insert service here) working and will not cover this’ clause. In particular I ran into a few small hurdles with DRBD and hostnames, so I have tried to document what I needed to do to make it work.


In this setup, I install Debian Etch 4.0 on a VMware VM using the netinstall ISO image. Along the way (after installing Zimbra itself) I clone the machine by copying its vmdk disk image, to save time and avoid duplicating too many steps.

In this howto, the one Zimbra ‘domain’ that both servers believe themselves to be is ‘zimbra.yourdomain.com’. You’ll notice a bit of hostname fiddling from time to time: this is required to keep Zimbra happy at install time, and things change again later during DRBD configuration. In the end, the two VMs are ‘zimbra-1’ and ‘zimbra-2’ with respective IPs of 192.168.1.11 and 192.168.1.12. The ‘virtual’ IP of ‘zimbra.yourdomain.com’ is 192.168.1.10. Heartbeat brings this virtual IP up as a virtual ethernet interface on whichever server is taking over the running of Zimbra.
Please replace zimbra.yourdomain.com, zimbra-1, zimbra-2 and the IP addresses with whatever suits your environment.

Howto

1) First steps – DNS
I edited the DNS server authoritative for the domain ‘yourdomain.com’ (in my case, an internal DNS server on the same LAN) to add these entries:

zimbra IN A 192.168.1.10

zimbra MX 10 zimbra

zimbra-1 IN A 192.168.1.11

zimbra-1 MX 10 zimbra-1

zimbra-2 IN A 192.168.1.12

zimbra-2 MX 10 zimbra-2

as well as the reverse PTR entries.
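
The PTR entries themselves are not shown above. As a rough sketch only, and assuming your reverse zone is 1.168.192.in-addr.arpa (which follows from the example addresses but is an assumption about your DNS layout), this shell helper prints one PTR line per host. The names `ptr_record` and `DOMAIN` are made up for this example.

```shell
#!/bin/sh
# Print PTR records relative to a reverse zone such as 1.168.192.in-addr.arpa.
# DOMAIN and the addresses below are this howto's placeholders (assumptions).
DOMAIN="yourdomain.com"

ptr_record() {
    # $1 = IPv4 address, $2 = short host name
    last_octet="${1##*.}"
    printf '%s\tIN\tPTR\t%s.%s.\n' "$last_octet" "$2" "$DOMAIN"
}

ptr_record 192.168.1.10 zimbra
ptr_record 192.168.1.11 zimbra-1
ptr_record 192.168.1.12 zimbra-2
```

Paste the output into the reverse zone on your DNS server and adjust it if your zone is delegated differently.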

2) Debian Install – Manual Partitioning
I did a standard netinstall of Debian Etch on the zimbra-1 VM, but manually set up the partitioning as follows. Note the low specs of these machines: this was only a test after all, not a production server 🙂

/boot   /dev/sda1     100MB     (primary) (bootable flag on)
/       /dev/sda5     3GB       (logical) (ext3)
swap    /dev/sda6     512MB     (logical)
(unmounted) /dev/sda7 150MB     (logical) (ext3) # this'll be the DRBD meta-disk
(unmounted) /dev/sda8 7GB       (logical) (ext3) # this'll be the /opt partition used by DRBD

Note that sda7 and sda8 are not mounted. Debian will try to warn you about this, but just ignore the warnings and continue with the installation. We will let Heartbeat mount these devices through /dev/drbd0 when needed.

3) Remove exim4
If you installed Debian with a network mirror and ‘Standard System’ checked in tasksel, Debian will have installed exim4, which we don’t want since Zimbra will be using its own Postfix installation.

apt-get remove --purge exim4 exim4-base exim4-config exim4-daemon-light

4) Install extra packages
These packages are required to install Zimbra. We also throw in DRBD for use later on.

apt-get install ntp ntpdate libc6-i686 sudo libidn11 curl fetchmail libgmp3c2 libexpat1 libgetopt-mixed-perl libxml2 libstdc++6 libpcre3 libltdl3 ssh drbd0.7-module-source drbd0.7-utils linux-headers-`uname -r`

5) Edit (fudge) the hostname to keep Zimbra happy
To install Zimbra successfully, we must trick the server into thinking it is the ‘real’ domain zimbra.yourdomain.com when in fact it is zimbra-1.

echo zimbra.yourdomain.com > /etc/hostname

6) Reboot the server

reboot

7) Mount /opt
We will now temporarily mount /dev/sda8 as /opt so that we can do a Zimbra installation.

mount -t ext3 /dev/sda8 /opt

8) Download, extract and install Zimbra Collaboration Suite (Open Source edition)
At the time of writing, ZCS was at version 5.0.9 and we are downloading the Open Source Edition Debian package.

cd /tmp/
wget "http://h.yimg.com/lo/downloads/5.0.9_GA/zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219.tgz"
tar zxfv zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219.tgz
cd zcs-5.0.9_GA_2533.DEBIAN4.0.20080815215219
./install.sh -l

This install should go fine if your hostname is set to zimbra.yourdomain.com. Zimbra will alert you to a DNS MX record error, because the MX record for zimbra.yourdomain.com points to the virtual IP (192.168.1.10) and not zimbra-1’s IP (192.168.1.11). That’s ok, we want it like that, so ignore the error and say ‘No’ to the ‘Change domain’ question (or however it is worded).

9) Remove Zimbra startup scripts
We want to remove the Zimbra startup scripts because Heartbeat will be handling the starting of Zimbra when it needs to.
This command will probably work:

update-rc.d -f zimbra remove

But I did it the long, and probably non-Debian way, because I was not thinking straight:

rm /etc/rc2.d/S99zimbra
rm /etc/rc3.d/S99zimbra
rm /etc/rc4.d/S99zimbra
rm /etc/rc5.d/S99zimbra
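
Whichever way you remove them, it is worth confirming that no start links are left behind. Here is a minimal sketch, assuming the links follow the usual S<two digits>zimbra naming in runlevels 2 to 5; `zimbra_start_links` is a hypothetical helper name, not something Zimbra or Debian ships.

```shell
#!/bin/sh
# List any remaining Zimbra start links under the given rc directory root.
zimbra_start_links() {
    # $1 = directory that holds rc2.d ... rc5.d, normally /etc
    for link in "$1"/rc[2-5].d/S[0-9][0-9]zimbra; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$link" ] || [ -L "$link" ] || continue
        echo "$link"
    done
}

# Prints nothing when Heartbeat is the only thing left that can start Zimbra.
zimbra_start_links /etc
```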

10) Change hostname back for DRBD, modify /etc/hosts
Now that we have Zimbra installed, we need to change the hostname again to make DRBD work. Note that you can’t just edit /etc/hosts and fudge the local hostname because DRBD is smarter and will report a mismatch if /etc/hostname and /etc/hosts don’t agree.

echo zimbra-1 > /etc/hostname

Nonetheless we will now edit /etc/hosts and tell zimbra-1 that it is also zimbra.yourdomain.com, and also that there is a zimbra-2 at 192.168.1.12 (although there isn’t just yet). Your /etc/hosts on zimbra-1 should now look like this:

127.0.0.1       zimbra.yourdomain.com localhost.localdomain localhost
192.168.1.11    zimbra-1 zimbra.yourdomain.com
192.168.1.12    zimbra-2

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

11) Shutdown and clone zimbra-1 to make a zimbra-2
At this point I cloned zimbra-1’s vmdk image and created zimbra-2. (Edit, per a comment: I glossed over this, but being a clone, zimbra-2 starts out with the same IP and hostname as zimbra-1.) To make zimbra-2 the equivalent of zimbra-1, set its IP to 192.168.1.12 instead of zimbra-1’s 192.168.1.11. Since it is a virtual machine, you may have trouble bringing up the eth interface at all until you do, so edit /etc/network/interfaces from within the VMware Console. Then change the hostname:

echo zimbra-2 > /etc/hostname

And edit the hosts file to look like this:

127.0.0.1       zimbra.yourdomain.com localhost.localdomain localhost
192.168.1.12    zimbra-2 zimbra.yourdomain.com
192.168.1.11    zimbra-1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
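
Typos in /etc/hosts are easy to make when editing two near-identical files. The following is a small sketch of a sanity check, assuming the addresses used in this howto; `hosts_maps` is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Succeed if the hosts file has a line mapping the IP address to the name.
hosts_maps() {
    # $1 = hosts file, $2 = IP address, $3 = host name
    awk -v ip="$2" -v name="$3" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}

# Check that both nodes are listed; run this on zimbra-1 and on zimbra-2.
if [ -r /etc/hosts ]; then
    hosts_maps /etc/hosts 192.168.1.11 zimbra-1 || echo "zimbra-1 missing from /etc/hosts"
    hosts_maps /etc/hosts 192.168.1.12 zimbra-2 || echo "zimbra-2 missing from /etc/hosts"
fi
```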

12) Reboot both servers, install DRBD and configure
On both zimbra-1 and zimbra-2:

cd /usr/src/
tar xvfz drbd0.7.tar.gz
cd modules/drbd/drbd
make
make install
mv /etc/drbd.conf /etc/drbd.conf.orig

Make a new /etc/drbd.conf that looks like this:

resource r0 {
 protocol C;
 incon-degr-cmd "halt -f";
 startup {
  degr-wfc-timeout 120; # 2 minutes
 }
 disk {
  on-io-error detach;
 }
 net {
 }
 syncer {
  rate 10M;
  group 1;
  al-extents 257;
 }
 on zimbra-1 {
  device /dev/drbd0;
  disk /dev/sda8;
  address 192.168.1.11:7788;
  meta-disk /dev/sda7[0];
 }
 on zimbra-2 {
  device /dev/drbd0;
  disk /dev/sda8;
  address 192.168.1.12:7788;
  meta-disk /dev/sda7[0];
 }
}

13) Get the first DRBD sync going
On zimbra-1 and zimbra-2

modprobe drbd
drbdadm up all

If you get a heap of complaints about misspelt or mismatching hostnames, check that you changed the hostname on each server to zimbra-1 or zimbra-2 respectively, as per above, and that you rebooted each of them.
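
DRBD picks its own ‘on’ section by comparing the node name (what `uname -n` prints) against the names in drbd.conf, so a quick check on each node can save a round of head-scratching. This is a rough sketch, assuming the drbd.conf layout shown above; `drbd_hosts` and `host_in_drbd_conf` are names made up for this example.

```shell
#!/bin/sh
# List the host names declared in "on <host> {" sections of a drbd.conf.
drbd_hosts() {
    # $1 = path to drbd.conf
    awk '$1 == "on" { print $2 }' "$1"
}

# Succeed if the given name is one of those hosts (exact, whole-line match).
host_in_drbd_conf() {
    # $1 = host name, $2 = path to drbd.conf
    drbd_hosts "$2" | grep -qxF -- "$1"
}

if [ -r /etc/drbd.conf ]; then
    host_in_drbd_conf "$(uname -n)" /etc/drbd.conf ||
        echo "$(uname -n) has no 'on' section in /etc/drbd.conf"
fi
```

If this prints a complaint, the hostname on that node is most likely still zimbra.yourdomain.com.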

Otherwise with no errors:
On zimbra-1:

drbdadm -- --do-what-I-say primary all
drbdadm -- connect all

When I ran the second command (‘connect all’), I got an odd error about a DRBD child process that couldn’t terminate. That was strange, because I didn’t get it when I set up HA NFS using DRBD and Heartbeat the previous day on other servers! Nonetheless, I ran:

cat /proc/drbd

And I could see that the sync between the two servers was taking place anyway. It looked something like this (I borrowed this output from an NFS howto, but yours will look similar):

version: 0.7.20 (api:77/proto:74)
SVN Revision: 1743 build by phil@mescal, 2005-01-31 12:22:07
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:13441632 nr:0 dw:0 dr:13467108 al:0 bm:2369 lo:0 pe:23 ua:226 ap:0
        [==>..............] sync'ed: 3.1% (7000/7168)M
        finish: 1:14:16 speed: 2,644 (2,204) K/sec
 1: cs:Unconfigured

Let this process run before doing anything else. What is happening is that DRBD is syncing the data on /dev/sda8 between the two servers. On my 7 GB partitions, this took about 1 hour (slow VMs, could be faster or slower on yours). Just keep running `cat /proc/drbd` until you see that the progress is complete.
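
If you would rather not keep typing that by hand, here is a minimal sketch of a polling loop. It assumes the progress line looks like the ‘sync'ed: 3.1%’ line in the sample output above; `drbd_sync_percent` and `wait_for_drbd_sync` are hypothetical helper names.

```shell
#!/bin/sh
# Print the sync percentage found in /proc/drbd-style output, or nothing
# when no resync is being reported.
drbd_sync_percent() {
    # $1 = file to read, normally /proc/drbd
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p" "$1"
}

# Poll until the file no longer reports a resync, printing progress as it goes.
wait_for_drbd_sync() {
    # $1 = file to read, $2 = seconds to sleep between checks
    while pct=$(drbd_sync_percent "$1") && [ -n "$pct" ]; do
        echo "DRBD sync at ${pct}%"
        sleep "$2"
    done
    echo "no resync in progress"
}

# Usage on zimbra-1 once the sync has started:
#   wait_for_drbd_sync /proc/drbd 30
```

The loop only looks for the progress line, so it also returns straight away if the nodes never connected. Have a final look at `cat /proc/drbd` yourself before moving on.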

We’re almost there!!

14) Install and configure Heartbeat
On zimbra-1 and zimbra-2:

apt-get install heartbeat

You’ll see some sort of error after the package is installed. Heartbeat doesn’t install a ha.cf, haresources or authkeys file by default; you need to create these before Heartbeat will run.
On zimbra-1 and zimbra-2, create these three files:

/etc/heartbeat/ha.cf

logfacility local0
keepalive 2
deadtime 20 # timeout before the other server takes over
bcast eth0
node zimbra-1 zimbra-2 # our two zimbra VMs
auto_failback on # zimbra-1 takes the resources back when it returns

/etc/heartbeat/haresources

zimbra-1 IPaddr::192.168.1.10/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/opt::ext3 zimbra

Note that the above defines the primary node zimbra-1: do not change this to zimbra-2 when you make the file on zimbra-2. The last word ‘zimbra’ is not a typo for one of the servers: this tells Heartbeat what service to start when it does its magic.

Finally, create /etc/heartbeat/authkeys on both servers.
This file needs an md5 string, which each heartbeat daemon uses to authenticate with the other. I ran a quick PHP ‘echo md5("my password");’ to get an md5 string.

auth 3
3 md5 yourrandommd5string

Restrict the permissions of the authkeys file on both servers:

chmod 600 /etc/heartbeat/authkeys
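
If you don’t have PHP handy, the key can be generated from the shell as well. This is a sketch only, assuming `md5sum` from coreutils is installed; `random_key` and `write_authkeys` are hypothetical helpers, and the file layout is the same ‘auth 3’ layout shown above.

```shell
#!/bin/sh
# Print 32 hex characters: the md5 of 64 bytes read from /dev/urandom.
random_key() {
    dd if=/dev/urandom bs=64 count=1 2>/dev/null | md5sum | cut -d ' ' -f 1
}

# Write an authkeys file in the layout shown above. The umask applies only
# inside the subshell and only affects a newly created file.
write_authkeys() {
    # $1 = destination path, $2 = shared key
    (
        umask 077
        printf 'auth 3\n3 md5 %s\n' "$2" > "$1"
    )
}

# Usage: generate the key once on zimbra-1, then copy the same file to zimbra-2.
#   write_authkeys /etc/heartbeat/authkeys "$(random_key)"
```

Both servers must end up with the same key. If the file already existed, its old permissions are kept, so the chmod above is still worth running.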

15) Reboot!
At this point Zimbra should fire up on zimbra-1 as normal. Do a ‘df -h’ on zimbra-1 and you’ll see that /dev/drbd0 is mounted on /opt, and if you run ifconfig you’ll see an eth0:0 entry that holds the virtual IP 192.168.1.10. You should be able to visit http://zimbra.yourdomain.com or http://192.168.1.10 and see a working Zimbra system running off zimbra-1.
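
The mount check can also be scripted, which is handy for telling at a glance which node is active. This is a minimal sketch that reads /proc/mounts rather than parsing ‘df -h’; `mounted_from` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the mounts table lists the device on the given mount point.
mounted_from() {
    # $1 = mounts file (normally /proc/mounts), $2 = device, $3 = mount point
    awk -v dev="$2" -v mp="$3" '
        $1 == dev && $2 == mp { found = 1 }
        END { exit !found }
    ' "$1"
}

if [ -r /proc/mounts ]; then
    if mounted_from /proc/mounts /dev/drbd0 /opt; then
        echo "/opt is mounted from /dev/drbd0: this node is active"
    else
        echo "/opt is not on /dev/drbd0: this node is standby"
    fi
fi
```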

16) Test the failover
Shut down zimbra-1. If you tail -f /var/log/messages on zimbra-1 as it shuts down, you should see it release DRBD and Heartbeat, and running tail -f /var/log/messages on zimbra-2 will show it pick up the virtual IP, mount /dev/drbd0 and kick off the Zimbra startup scripts.

When the startup scripts have finished, visit http://zimbra.yourdomain.com just like you did before and everything should appear to still be running, except now we’re running off zimbra-2!

Fire up zimbra-1 again and it will take back control from zimbra-2.

Congratulations, you have automatic failover and high availability of your Zimbra service!

Feel free to leave comments, feedback, or corrections in the event that I’ve done something wrong, but this worked for me with no problems. I hope it works for you.
