Create a Linux Failover Router

Now that the web server is running and the database is being backed up, it is time to create a failover web server. I had grandiose dreams back in 2006 when my main web server took a dive and I needed to buy a new server. Why not get two identical servers and run them in parallel? Because I don’t have that kind of money, that’s why! So along came this tiger catalog (I won’t say their full name…). Look, I can get two computers for the price of one Dell! BIG MISTAKE! One system went up without a problem, but the other has had a new power supply, memory, motherboard and CPU and still sits in my closet. I believe the motherboard is giving another error code; I will never buy an ABIT motherboard again.

After investigating how to create failover systems, I came upon the two articles below, which are good foundations for the principle. Linux-HA is a great website, but it reads like a dictionary, not a how-to site, so I went looking for a few articles that would guide me through the process. Here they are:



Apache failover with heartbeat and mon

Welcome to this mini howto on how to install a two node Apache failover using heartbeat and mon.
This is probably the base setup you want to start with.
I’ll cover a base Linux installation, but it’s mostly the same on other UNIX distributions.
I’ve left out areas such as NFS mounted web roots and IP based heartbeat, for simplicity.


Introduction

Failover means you have at least two nodes: a server (master) and a hot standby server (slave), which takes over the resources when the master fails.

Only one node is active at a time. The active machine will set up a virtual IP address, which the clients connect to. They won’t know which server they are actually talking to.

In this example, an Apache server runs on the active node.

Heartbeat provides the backbone for making sure that the nodes know who’s online. Only one node can be active at a time. If you want load balancing, read the Resources section.

Heartbeat will monitor that the two machines can talk to each other via serial cable and/or network cables.

To achieve resource monitoring, i.e. checking that Apache runs and answers requests, you need another package: Mon.

A common misunderstanding is that heartbeat will do the resource monitoring for you.
My experience is that the software or network connections are more likely to crash than the machine itself.


Goal

During normal operation, the master (active) owns the resources and the IP address.

In a failover scenario, the slave takes over the master’s resources:
it starts the same processes (Apache in our case) and configures its network card to use the same virtual IP address (172.17.10.30).

Heartbeat runs on both machines, but Apache and Mon run only on the active one. If Mon decides a service is down, it will shut down heartbeat gracefully on that node.

In essence, if any of the following happens, you want the slave to take over the resources:

  • The master goes down (either physically crashes or no heartbeats are sent via serial or the NIC)
  • Mon on the running machine decides a service is down

But what if Mon can’t contact anything, the IP network is completely down, or some server room operator has yanked a cable? That’s when heartbeat itself will sense that a member is down and will initiate the takeover.


Setup

Hardware

You need two machines with at least one free serial port.
Get a null modem cable and hook it up between the machines.

To test the cable, listen on the serial port on the master:

# cat < /dev/ttyS0

On the slave:

# echo "hello heartbeat" > /dev/ttyS0

You should see messages getting echoed on the master.

Configure the first machine (master) with 172.17.10.10 and the second one (slave) with 172.17.10.20. Also make sure both machines know how to reach each other by name, by editing /etc/hosts.

Excerpt from /etc/hosts
172.17.10.10 master
172.17.10.20 slave
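
One way to confirm that both entries are in place is to look each name up in the hosts file. The sketch below assumes the node names master and slave from this howto; the function name check_hosts is only an illustration and is not part of heartbeat.

```shell
# check_hosts FILE NAME...
# Succeeds only if every NAME appears as a hostname in FILE
# (hosts-file format: an address followed by one or more names).
check_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for NAME in any column after the address.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$file"; then
            echo "missing hosts entry for $name" >&2
            return 1
        fi
    done
}

# Example: check_hosts /etc/hosts master slave
```

Run it on both machines; each one needs an entry for the other.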

If you have two network cards on the machines, hook up a cross over network cable to provide non-serial heartbeat too. If you don’t have a cross over, you can use regular cables, but go via a hub/switch. I’ll leave that out for simplicity.

Heartbeat

An easy getting started guide for Heartbeat can be read here. I explain the most basic setup below:
Install heartbeat on both the master and slave.

# rpm -i heartbeat-0.4.9.2-1.i386.rpm

[Don’t try 1.0.1 yet, because it requires stonith, which requires pils, which requires ucd-snmp, which requires libcrypto…]

Create a file /etc/ha.d/ha.cf on both the master and the slave. The logfile line sends heartbeat’s log to /var/log/ha-log.

/etc/ha.d/ha.cf
logfile /var/log/ha-log
keepalive 2
deadtime 10
serial  /dev/ttyS0
node master
node slave
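
The two timing values work together: keepalive is the interval between heartbeats and deadtime is how long a node may stay silent before it is declared dead. As a rough sketch of that arithmetic (the helper name missed_beats is made up for illustration):

```shell
# missed_beats KEEPALIVE DEADTIME
# Prints how many consecutive heartbeats must be lost before the peer
# is declared dead (both values in whole seconds).
missed_beats() {
    echo $(( $2 / $1 ))
}

# With the values from ha.cf above this prints 5.
missed_beats 2 10
```

A shorter deadtime gives a faster takeover, but also a greater chance of a false takeover on a busy machine or a noisy link.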

Create a file /etc/ha.d/haresources that looks exactly the same on both master and slave.
This is where you define the virtual IP. The file names master as the preferred owner, so when the slave reads it, it knows it is not supposed to take over the resources until the master is down.

/etc/ha.d/haresources
master 172.17.10.30 apache mon

Create an /etc/ha.d/authkeys file (same on both nodes):

/etc/ha.d/authkeys
auth 1
1 crc
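
Heartbeat complains if the key file is readable by others (see the troubleshooting section), so it helps to create it with tight permissions from the start. This is a minimal sketch; write_authkeys is a hypothetical helper, and /etc/ha.d is assumed as the target directory because that is the path shown in the heartbeat error message later on.

```shell
# write_authkeys DIR
# Writes the authkeys file shown above into DIR and restricts it to the owner.
write_authkeys() {
    keyfile=$1/authkeys
    # Create the file under a tight umask so it is never world-readable.
    ( umask 077 && printf 'auth 1\n1 crc\n' > "$keyfile" ) || return 1
    chmod 600 "$keyfile"
}

# Example (as root, on both nodes): write_authkeys /etc/ha.d
```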

Mon

Obtain the mon packages here. Do all these steps on both master and slave.
Install them under /etc/ha.d/mon

# cd /etc/ha.d
# tar xzvf mon-0.99.2.tar.gz
# mv mon-0.99.2 mon

Mon requires some external Perl modules. Use your nearest CPAN mirror; I’ve provided links so that you know where to start looking.

Unpack them and follow the install instructions, which are usually:

# perl Makefile.PL
# make
# make test
# make install

Create a /etc/ha.d/mon/mon.cf.
This configuration will try to get the page /test.html from the web server every 30 seconds. It also tries to ping the router (to see if the node has “connectivity”). Replace 172.17.10.254 with your router address, or comment out the whole watch routers section if you don’t want that.
Replace operator@yourdomain.com with your email address.

/etc/ha.d/mon/mon.cf
cfbasedir   = /etc/ha.d/mon/etc
alertdir    = /etc/ha.d/mon/alert.d
mondir      = /etc/ha.d/mon/mon.d
statedir        = /etc/ha.d/mon/state.d
logdir = /var/log/
maxprocs    = 20
histlength = 100
randstart = 10s

authtype = getpwnam

hostgroup web-fe 172.17.10.30
hostgroup routers 172.17.10.254

watch web-fe
    service http
        interval 30s
        monitor http.monitor -p 80 -u /test.html
        allow_empty_group
        period wd {Mon-Sun}
            alert bring-ha-down.alert -S "web server node member down" \
                operator@yourdomain.com
            upalert mail.alert -S "web server is back up" \
                operator@yourdomain.com
            alertevery 600s
            alertafter 2
watch routers
    service ping
        interval 10s
        monitor ping.monitor
        allow_empty_group
        period wd {Mon-Sun}
            alert bring-ha-down.alert -S "node member NIC down" \
                operator@yourdomain.com
            upalert mail.alert -S "web server is back up" \
                operator@yourdomain.com
            alertevery 10s

As you see in the alert section above, I’ve defined an alert script, bring-ha-down.alert.
Create such a script here: /etc/ha.d/mon/alert.d/bring-ha-down.alert.
It will call the mail alert and then bring down heartbeat on that node. Make it executable.

/etc/ha.d/mon/alert.d/bring-ha-down.alert
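#!/bin/sh
# Mail the operator (arguments passed through unchanged), then stop heartbeat on this node.
/etc/ha.d/mon/alert.d/mail.alert "$@"
/etc/rc.d/heartbeat stop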

At this point, I usually copy the whole /etc/ha.d/mon directory to the slave.

Apache

Edit your httpd.conf (normally /usr/local/apache/conf/httpd.conf or /etc/httpd/httpd.conf).
Create a virtual host like this: (replace with your settings)

excerpt from httpd.conf
NameVirtualHost 172.17.10.30
<VirtualHost 172.17.10.30>
    ServerAdmin operator@yourdomain.com
    DocumentRoot /usr/local/apache2/htdocs/
    ServerName yourserver.yourdomain.com
    ErrorLog logs/test-errors_log
    CustomLog logs/test-access_log common
</VirtualHost>

When this node is active, Apache will listen on 172.17.10.30

test.html

Create a test.html containing the text MASTER on the master and SLAVE on the slave.
Put it in your webroot (/usr/local/apache/htdocs or similar).

This way you can easily tell which node is active.
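
A small helper keeps the two pages consistent. This is only a sketch: write_test_page is a made-up name, and the document root in the examples is the one mentioned above, so adjust it to your installation.

```shell
# write_test_page ROLE DOCROOT
# Writes a one-line test.html containing ROLE (for example MASTER or SLAVE).
write_test_page() {
    printf '%s\n' "$1" > "$2/test.html"
}

# On the master: write_test_page MASTER /usr/local/apache/htdocs
# On the slave:  write_test_page SLAVE  /usr/local/apache/htdocs
```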

Start scripts

Create /etc/ha.d/mon/mon-start.

/etc/ha.d/mon/mon-start

#!/bin/bash
# Start/stop wrapper so that heartbeat can manage mon as a resource.
MON_HOME=/etc/ha.d/mon

case "$1" in
    start)
        if [ -f "$MON_HOME/mon.pid" ]; then
            echo "mon already started"
            exit 0
        fi
        echo "Starting Mon"
        "$MON_HOME/mon" -c "$MON_HOME/mon.cf" -L "$MON_HOME" -P "$MON_HOME/mon.pid" &
        ;;
    stop)
        if [ -f "$MON_HOME/mon.pid" ]; then
            echo "Stopping Mon"
            kill -9 "$(cat "$MON_HOME/mon.pid")"
            rm -f "$MON_HOME/mon.pid"
        else
            echo "no server pid, server doesn't seem to run"
        fi
        ;;
    status)
        # Placeholder: always reports success.
        echo "doing good"
        ;;
    *)
        # Only the actions implemented above are advertised.
        echo "Usage: $0 {start|stop|status}"
        exit 1
        ;;
esac
exit 0

Upon startup, Heartbeat will look in /etc/rc.d/init.d or /etc/ha.d/resource.d for its resource start scripts.
Create links in /etc/ha.d/resource.d:

# cd /etc/ha.d/resource.d
# ln -s ../mon/mon-start mon
# ln -s /etc/rc.d/apache apache

OK, now heartbeat knows how to start/stop mon and apache. Check that both start scripts are in place.
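
The check can be scripted. The following sketch (check_resource_scripts is a hypothetical name) reports any resource script that is missing, is a dangling link, or is not executable, which are the conditions behind the "Cannot locate resource script" error covered in the troubleshooting section.

```shell
# check_resource_scripts DIR NAME...
# Reports every NAME that is missing from DIR or not executable there.
# The -x test follows symbolic links, so a dangling link also fails.
check_resource_scripts() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "not executable or missing: $dir/$name" >&2
            status=1
        fi
    done
    return $status
}

# Example: check_resource_scripts /etc/ha.d/resource.d apache mon
```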


Start the cluster

As we’ve configured Heartbeat, it will start apache and mon on the master.
Starting heartbeat on the slave while the master is healthy puts the slave in standby, where it patiently waits for the master to go down.
Start heartbeat on the master:

# /etc/rc.d/heartbeat start

Do the same on the slave.

Check out the master’s log, /var/log/ha-log. A healthy startup looks like this:

/var/log/ha-log
heartbeat: ... info: Configuration validated. Starting heartbeat 0.4.9.2
heartbeat: ... info: heartbeat: version 0.4.9.2
heartbeat: ... info: Heartbeat generation: 7
heartbeat: ... notice: Starting serial heartbeat on tty /dev/ttyS0
heartbeat: ... info: Local status now set to: 'up'
heartbeat: ... info: Local status now set to: 'active'
heartbeat: ... info: Heartbeat restart on node master
heartbeat: ... info: Node master: status up
heartbeat: ... info: Running /etc/ha.d/rc.d/status status
heartbeat: ... info: Running /etc/ha.d/resource.d/IPaddr 172.17.10.30 status
heartbeat: ... info: Node master: status active
heartbeat: ... info: Resource acquisition completed.
heartbeat: ... info: Running /etc/ha.d/rc.d/status status
heartbeat: ... info: Running /etc/ha.d/rc.d/ip-request ip-request
heartbeat: ... info: Running /etc/ha.d/resource.d/IPaddr 172.17.10.30 status
heartbeat: ... info: Acquiring resource group: master 172.17.10.30 apache mon
heartbeat: ... info: Running /etc/ha.d/resource.d/IPaddr 172.17.10.30 start
heartbeat: ... info: ifconfig eth0:0 172.17.10.30 netmask 255.255.0.0  \
 broadcast 172.17.255.255
heartbeat: ... info: Sending Gratuitous Arp for 172.17.10.30 on eth0:0 [eth0]
heartbeat: ... info: Running /etc/ha.d/resource.d/apache  start
heartbeat: ... info: Running /etc/ha.d/resource.d/mon  start

Here are some common troubleshooting scenarios:

heartbeat: ... ERROR: Bad permissions on keyfile [/etc/ha.d/authkeys], 600 recommended.

Fix that with chmod 600 /etc/ha.d/authkeys.

heartbeat: ... ERROR: Current node [mastah] not in configuration

Check your hostname, which has to be the same as the one you defined in /etc/ha.d/ha.cf and /etc/ha.d/haresources.
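
Heartbeat compares the configured node names against the output of uname -n, so the mismatch can be caught before starting the daemon. This is a sketch only; node_in_config is an illustrative helper, not a heartbeat tool.

```shell
# node_in_config NAME FILE
# Succeeds if FILE (ha.cf format) contains a "node" line that lists NAME.
node_in_config() {
    awk -v n="$1" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }' "$2"
}

# Example:
# node_in_config "$(uname -n)" /etc/ha.d/ha.cf || echo "hostname not listed in ha.cf"
```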

heartbeat: ... Cannot locate resource script apache
heartbeat: ... Cannot locate resource script mon

Make sure you have start scripts in /etc/rc.d/init.d or /etc/ha.d/resource.d that take the start/stop arguments and are executable. Try them manually first.

If mon shuts down heartbeat right away, it probably means you haven’t configured the http ping or router ping properly (thus making Mon think it’s not answering).

Check that heartbeat brings up the virtual IP. This is what the master should look like:

# ifconfig -a
eth0:0    Link encap:Ethernet  HWaddr ....
          inet addr:172.17.10.30  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x1000
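
For a scripted check, the interface listing can be searched for the virtual address. The sketch below (has_vip is a made-up name) reads the listing on standard input, so it works with the ifconfig output shown above; it also accepts the "inet A.B.C.D/len" form, which is an assumption about tools other than the ifconfig used in this howto.

```shell
# has_vip ADDRESS
# Reads an interface listing on stdin and succeeds if ADDRESS is configured.
# Matches both "inet addr:A.B.C.D" and the plain "inet A.B.C.D" forms.
has_vip() {
    # Escape the dots so they match literally in the regular expression.
    addr=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Eq "inet (addr:)?${addr}([ /]|\$)"
}

# Example: ifconfig -a | has_vip 172.17.10.30 && echo "virtual IP is up on this node"
```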

Test failover scenarios

If you’ve come this far, good! Now it’s time to try to shake down the master.
Apache is now listening on that virtual ip, so point a browser to http://172.17.10.30/test.html.
You should see the test page displaying “MASTER”.

1. Turn off the master, or do an init 1 to get out of the multiuser runlevel.
It should take about 10 seconds (the deadtime in ha.cf) for the slave to determine that the master is down.
Soon enough, you’ll see this in the slave’s log:

heartbeat: ... WARN: node master: is dead
heartbeat: ... info: Link master:/dev/ttyS0 dead.
heartbeat: ... info: Link master:eth1 dead.
heartbeat: ... info: Running /etc/ha.d/rc.d/status status
heartbeat: ... info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: ... info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: ... info: Taking over resource group 172.17.10.30
heartbeat: ... info: Acquiring resource group: master 172.17.10.30 apache mon
heartbeat: ... info: Running /etc/ha.d/resource.d/IPaddr 172.17.10.30 start
heartbeat: ... info: ifconfig eth0:0 172.17.10.30 netmask 255.255.0.0 \
 broadcast 172.17.255.255
heartbeat: ... info: Sending Gratuitous Arp for 172.17.10.30 on eth0:0 [eth0]
heartbeat: ... info: Running /etc/ha.d/resource.d/apache  start
heartbeat: ... info: Running /etc/ha.d/resource.d/mon  start

Do an ifconfig -a to see that the slave took over the IP address.

Point your browser to http://172.17.10.30/test.html. You should see “SLAVE”.
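
Instead of reloading the browser by hand, the test page can be polled from a third machine while the failover happens. This is a sketch under assumptions: active_node is a hypothetical helper that classifies the page body, and the polling example assumes curl is installed on the client.

```shell
# active_node
# Reads the body of test.html on stdin and prints which node served it.
active_node() {
    body=$(cat)
    case $body in
        *MASTER*) echo master ;;
        *SLAVE*)  echo slave ;;
        *)        echo unknown ;;
    esac
}

# Example, polling once a second during a failover test:
# while sleep 1; do curl -s -m 2 http://172.17.10.30/test.html | active_node; done
```

While neither node answers on the virtual IP, the loop prints unknown, which gives a rough feel for how long the takeover takes.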

Start the master again to get back to normal operation. Start heartbeat if necessary.

2. Do a /etc/rc.d/apache stop on the master.

After a while (defined in mon.cf) you should see this in /var/log/messages on the master:

Feb 21 18:10:13 master mon[28334]: failure for web-fe http 1045879813 172.17.10.30

Mon will sense this and call the alert script, which shuts down heartbeat.
This is equivalent to Apache having died: once Mon realizes that the web server no longer answers (how quickly depends on the interval and alertafter settings in mon.cf), it brings down heartbeat.
Check the log on the slave to see that it brings up the interface.
In production, you’d have to fix the master and then start heartbeat on it again, at which point the slave releases all the resources.

Now you’re ready to read about all the options of heartbeat and mon. Refer to the Resources section below.

If you have trouble getting heartbeat start at boot, send me an email.


Resources


thomas.olausson@home.se, 2003-02-05. Comments and questions are welcome.

Another Way…

Using a Linux failover router

By Preston St. Pierre on April 13, 2005 (8:00:00 AM)



Today, it’s hard to imagine an organization operating without taking advantage of the vast resources and opportunities that the Internet provides. The Internet’s role has become so significant that no organization can afford to have its Net connection go down for too long. Consequently, most organizations have some form of a secondary or backup connection ready (such as a leased line) in case their primary Net connection fails. However, the process of switching over from the primary to the backup connection, if done manually by the system administrator, can take some time, depending upon how ready the backup setup is and on the availability of the administrator at the right moment. The process can even become a costly affair if the organization must buy dedicated routers for the purpose of automatic switchover. But there is an easy and cost-effective alternative: setting up a Linux failover router.

In this article we will look at setting up an existing Linux machine as a failover router to provide quick and automatic switchover from a dead Internet connection (the primary connection) to one that is operational (the secondary connection).

To begin, you’ll need a PC with any recent GNU/Linux distro installed. You’ll also need three network cards to put into this Linux box. Two of the three network cards, say eth0 and eth1, will connect to the Internet routers/gateways of your primary ISP (say ISP1) and secondary ISP (say ISP2). The third network card, say eth2, will connect to your internal LAN.

Setting up the network

Begin by setting up your network based on the configuration information available to you. You can make the configurations from the X Window GUI using the Network utility. To do so, open the Network utility from Main Menu > System Settings > Network. This will open up a network configuration window displaying a list of all the network cards installed on your system. Double-click on the network card you wish to configure, select the Statically Set IP Addresses option, and assign the IP address along with the subnet mask. There is also a Default Gateway Address field; you can leave it blank for the time being, as it can be specified later on from the command line.

Assign the IP addresses provided to you by your ISPs to the two network cards, eth0 and eth1. In our setup, we assigned eth0=61.16.130.100 and eth1=200.15.110.101 (which are public IP addresses), along with the subnet mask 255.255.255.224.

Assign a private IP address based on your internal LAN subnet to your third card. We assigned eth2=10.0.0.1, where 10.0.0.0/24 was the address range for our internal LAN setup. Save your changes and exit.

Now turn on IP packet forwarding on the Linux box by changing the value of net.ipv4.ip_forward to 1 in the /etc/sysctl.conf file and executing the command:

# sysctl -p
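
The edit to /etc/sysctl.conf can also be scripted. The sketch below is one possible way to do it, not the article’s own method: enable_ip_forward is a hypothetical helper that replaces any existing net.ipv4.ip_forward line in the given file and leaves everything else untouched.

```shell
# enable_ip_forward FILE
# Ensures FILE contains exactly one active "net.ipv4.ip_forward = 1" line.
enable_ip_forward() {
    file=$1
    tmp=$(mktemp) || return 1
    # Drop any existing setting (commented-out lines are left alone), then append ours.
    grep -Ev '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$file" > "$tmp"
    echo 'net.ipv4.ip_forward = 1' >> "$tmp"
    # Write back through the original file so its owner and mode are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root): enable_ip_forward /etc/sysctl.conf && sysctl -p
```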

Next, you need to configure iptables by adding certain rules, so that your internal LAN can route packets to the Internet. For this, issue the following commands as root:

# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
# iptables -A FORWARD -s 10.0.0.0/24 -j ACCEPT
# iptables -A FORWARD -d 10.0.0.0/24 -j ACCEPT
# iptables -A FORWARD -s ! 10.0.0.0/24 -j DROP

The above commands turn on masquerading in the NAT table by appending a POSTROUTING rule (-A POSTROUTING) for all outgoing packets on the two Ethernet interfaces, eth0 and eth1. The next two lines accept forwarding of all packets to and from the 10.0.0.0/24 network. The last line drops the packets that do not come from the 10.0.0.0/24 network.

To make the iptables rules permanent, save them as follows:

# iptables-save > /etc/sysconfig/iptables

Now you must restart your network, as well as iptables:

# /etc/init.d/network restart
# /etc/init.d/iptables restart

To see if your new iptables rules have gone into effect, type iptables -L.

Enabling failover routing

After you have configured your network, the next step is to enable failover routing on your Linux box, so that if the first route dies the router will automatically switch over to the next route. To do so, you’ll need to add the default gateway routes provided to you by your ISPs for both your network cards:

# route add default gw 61.16.130.97 dev eth0
# route add default gw 200.15.110.90 dev eth1

Here, 61.16.130.97 is the gateway address given by ISP1 and 200.15.110.90 is the gateway address given by ISP2. Replace them with the addresses available to you. These routes will disappear every time you reboot the system. In order to make these routes permanent add the above two commands in the /etc/rc.d/rc.local file, which is run at boot time.

Also make sure that all the computers on your internal LAN (10.0.0.0/24) have their default gateway address set to the IP address of the eth2 Ethernet interface (i.e. 10.0.0.1) of your failover router.

Finally, modify the /proc/sys/net/ipv4/route/gc_timeout file. This file contains a numerical value that denotes the time in seconds after which the kernel declares a route to be inactive and automatically switches to the other route if available. Open the file in any text editor and change its default value of 300 to some smaller value, say 10 or 15. Save the changes and exit.
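
Files under /proc are kernel interfaces rather than ordinary files, so the value is usually written with a shell redirect, and it is lost at reboot unless it is set again. The following is a minimal sketch; set_gc_timeout is a made-up helper, and its optional second argument exists only so that it can be tried on a scratch file first.

```shell
# set_gc_timeout SECONDS [FILE]
# Writes SECONDS to the route timeout file. FILE defaults to the /proc path
# named above.
set_gc_timeout() {
    target=${2:-/proc/sys/net/ipv4/route/gc_timeout}
    case $1 in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    echo "$1" > "$target"
}

# Example (as root): set_gc_timeout 10
```

To keep the setting across reboots, the equivalent line net.ipv4.route.gc_timeout = 10 can be added to /etc/sysctl.conf, next to the ip_forward setting.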

Now your Linux machine is ready to serve as a failover router, automatically and quickly switching to the secondary route every time the primary route fails.

Preston St. Pierre is a computer information systems student at the University of the Fraser Valley in British Columbia, Canada.
