Cluster Part 1: Howto Build an Apache2 Cluster in Debian Lenny

After spending many hours reviewing several tutorials, I decided to write my own procedure in hopes that others will find it successful the first time around, avoiding the same problems I encountered. There is no guarantee this will work for you. However, it is working for me.

About the Apache2 Cluster

The concept of an Apache cluster is fairly simple. The three Apache nodes are identical except for their hostnames and IP addresses: the Apache2 configuration files and the web directories (/var/www/*) contain the same files on every node, and rsync can be used to keep them in sync. The real magic happens on the load balancers, where HAProxy monitors each Apache server and distributes requests evenly across all three. A single page view usually involves many requests, so in a cluster the parts of one page may come from different servers. One thing to keep in mind: if you plan to run a web application that uses sessions, you must store the session data in a database. Otherwise, a user's session will be lost when a later request lands on a different Apache node.

The 5 servers and a shared IP to be used in this tutorial are as follows:

loadb1.example.com:        192.168.1.1   [ loadb1 ]
loadb2.example.com:        192.168.1.2   [ loadb2 ]
apachenode1.example.com:   192.168.1.3   [ apachenode1 ]
apachenode2.example.com:   192.168.1.4   [ apachenode2 ]
apachenode3.example.com:   192.168.1.5   [ apachenode3 ]
floating_ip_address:       192.168.1.99

In this example, we will use 5 Debian Lenny servers. You should start with a clean operating system installation for this tutorial. All steps below should be performed as root unless otherwise specified.

Apache2 Web Server Installation & Configuration [apachenode1,2,3]

Install latest Apache 2 packages

apt-get update
apt-get install apache2

Configure the Apache servers to log the original IP address of the visiting user. Because requests arrive through the load balancers, Apache would otherwise log the load balancer's IP address for every request, which causes problems when we want to log visits for statistics and other purposes.

nano /etc/apache2/apache2.conf

Locate the following line, comment it out with # and add the new line below it. The new LogFormat directive must be on a single line in the file:

#LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\"...
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Create a haproxytest.txt file in the webroot directory. HAProxy will use this file as a test to see if the server is running and accessible. If it is inaccessible, HAProxy will know the server is down and will redirect traffic to the remaining Apache nodes.

touch /var/www/haproxytest.txt

Since this file will be accessed many times per minute, we don't want to log access to it. Therefore, we need to edit the virtual host file.

nano /etc/apache2/sites-available/default

Add the following and comment out any other "CustomLog" lines in the file.

SetEnvIf Request_URI "^/haproxytest\.txt$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog

Restart apache for your settings to take effect.

/etc/init.d/apache2 restart
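
Before moving on, it can help to confirm that each node actually serves the test file. The sketch below approximates the check HAProxy will perform later. It assumes curl is installed on the machine you run it from (curl is not installed by the steps above), and the function names are made up for illustration. HAProxy's HTTP check treats 2xx and 3xx responses as healthy, which is what status_is_up mirrors.

```shell
# Hypothetical helpers for illustration; not part of Apache or HAProxy.

# Succeeds when an HTTP status code is in the 2xx or 3xx range,
# which an HAProxy HTTP check counts as "server is up".
status_is_up() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Sends a HEAD request for the test file to one node and reports the result.
# Assumes curl is available; a failed connection yields status 000.
check_node() {
    code="$(curl -s -o /dev/null -I -m 5 -w '%{http_code}' "http://$1/haproxytest.txt")"
    if status_is_up "$code"; then
        echo "$1 is up (HTTP $code)"
    else
        echo "$1 is DOWN (HTTP $code)"
    fi
}
```

For example, running check_node 192.168.1.3 prints one line saying whether apachenode1 answered the request for /haproxytest.txt.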

Configure the load balancers [loadb1, loadb2]

Unless specified otherwise, all steps below should be completed on both load balancers.

Install HAProxy

apt-get install haproxy

Back up your config file and create a new one

cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg_orig
rm -f /etc/haproxy/haproxy.cfg
nano /etc/haproxy/haproxy.cfg

Add the following to haproxy.cfg

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        #log loghost    local0 info
        maxconn 4096
        #debug
        #quiet
        user haproxy
        group haproxy

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        redispatch
        maxconn 2000
        contimeout      5000
        clitimeout      50000
        srvtimeout      50000

listen webfarm 192.168.1.99:80
       mode http
       stats enable
       stats auth someuser:somepassword
       balance roundrobin
       cookie JSESSIONID prefix
       option httpclose
       option forwardfor
       option httpchk HEAD /haproxytest.txt HTTP/1.0
       server apachenode1 192.168.1.3:80 cookie A check
       server apachenode2 192.168.1.4:80 cookie B check
       server apachenode3 192.168.1.5:80 cookie C check
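
With "cookie JSESSIONID prefix", HAProxy prepends each server's cookie value to the application's JSESSIONID cookie so that later requests can be routed back to the same server. That only works if every server line has its own cookie value. The helper below is a small sanity check for that, written under the assumption that the server lines follow the layout used above (server name address cookie value ...). The function name is hypothetical.

```shell
# Hypothetical helper: print every cookie value that is assigned to more
# than one "server" line in the given haproxy.cfg. No output means all
# server cookie values are unique.
find_duplicate_cookies() {
    awk '$1 == "server" {
             # Look for the word "cookie" and count the value that follows it.
             for (i = 2; i < NF; i++)
                 if ($i == "cookie") seen[$(i + 1)]++
         }
         END { for (c in seen) if (seen[c] > 1) print c }' "$1"
}
```

Running find_duplicate_cookies /etc/haproxy/haproxy.cfg should print nothing for a correct file. This checks cookie values only. If your HAProxy version supports check mode, haproxy -c -f /etc/haproxy/haproxy.cfg can be used to validate the syntax of the file as a whole.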

Enable haproxy on startup

nano /etc/default/haproxy

Set ENABLED to 1 and uncomment the line if commented out

ENABLED=1
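
If you prefer not to edit the file by hand on both load balancers, the same change can be scripted. This is a minimal sketch, assuming the file contains at most a commented or uncommented ENABLED= line as in the stock Debian file. The function name is made up for illustration.

```shell
# Hypothetical helper: make sure the given defaults file contains ENABLED=1.
# An existing ENABLED= line (commented out or not) is rewritten in place;
# if there is none, the line is appended.
enable_haproxy() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    if grep -q '^#*[[:space:]]*ENABLED=' "$conf"; then
        sed 's/^#*[[:space:]]*ENABLED=.*/ENABLED=1/' "$conf" > "$tmp"
    else
        { cat "$conf"; echo 'ENABLED=1'; } > "$tmp"
    fi
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

On a load balancer this would be called as enable_haproxy /etc/default/haproxy. Running it a second time leaves the file unchanged.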

Install Heartbeat so that the load balancers can share the floating IP address

apt-get install heartbeat

Enable HAProxy binding to the floating IP address

nano /etc/sysctl.conf

Add the following line, or uncomment and change it if it already exists:

net.ipv4.ip_nonlocal_bind=1

Apply changes by running:

sysctl -p
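
This setting matters because only one load balancer holds 192.168.1.99 at any moment, yet HAProxy on both machines is configured to listen on it. With ip_nonlocal_bind set to 1, the kernel lets HAProxy bind to an address the machine does not currently own. A quick way to confirm the value is active is sketched below. The function name is hypothetical, and the optional argument exists only so the check can be pointed at another file.

```shell
# Hypothetical helper: succeeds if non-local binding is switched on.
# Reads the live kernel value from /proc unless another path is given.
nonlocal_bind_enabled() {
    f="${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}"
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}
```

For example, nonlocal_bind_enabled || echo "ip_nonlocal_bind is still 0" prints a warning if the change has not taken effect.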

Create three config files. We will edit each individually.

touch /etc/ha.d/authkeys
touch /etc/ha.d/ha.cf
touch /etc/ha.d/haresources

Edit authkeys.

nano /etc/ha.d/authkeys

Add the following and replace "somerandomstring" with a string of unique characters. This file must be identical on loadb1 and loadb2.

auth 3
3 md5 somerandomstring

Secure this file

chmod 600 /etc/ha.d/authkeys
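
If you would rather not invent the random string yourself, it can be generated. The sketch below writes an authkeys file with the same two lines as above, using 16 bytes from /dev/urandom printed as hex. The function name is made up for illustration. Generate the file on one load balancer only and copy it to the other, since both need the same key.

```shell
# Hypothetical helper: write an authkeys file with a random md5 key.
write_authkeys() {
    target="$1"
    # 16 random bytes as 32 hex characters.
    secret="$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')"
    # Create the file with restrictive permissions from the start.
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$secret" > "$target" )
    chmod 600 "$target"
}
```

On loadb1 this would be called as write_authkeys /etc/ha.d/authkeys, followed by copying the resulting file to loadb2.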

loadb1: Edit High Availability config file

nano /etc/ha.d/ha.cf

Add the following:

#
#       keepalive: how many seconds between heartbeats
#
keepalive 2
#
#       deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#       What UDP port to use for udp or ppp-udp communication?
#
udpport        694
bcast  eth0
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.1.2
#       What interfaces to heartbeat over?
udp     eth0
#
#       Facility to use for syslog()/logger (alternative to log/debugfile)
#
logfacility     local0
#
#       Tell what machines are in the cluster
#       node    nodename ...    -- must match uname -n
node    loadb1.example.com
node    loadb2.example.com

loadb2: Edit High Availability config file

nano /etc/ha.d/ha.cf

Add the following:

#
#       keepalive: how many seconds between heartbeats
#
keepalive 2
#
#       deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#       What UDP port to use for udp or ppp-udp communication?
#
udpport        694
bcast  eth0
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.1.1
#       What interfaces to heartbeat over?
udp     eth0
#
#       Facility to use for syslog()/logger (alternative to log/debugfile)
#
logfacility     local0
#
#       Tell what machines are in the cluster
#       node    nodename ...    -- must match uname -n
node    loadb1.example.com
node    loadb2.example.com
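
As the comment in ha.cf says, the node names must match the output of uname -n on each machine. A mismatch is easy to miss, so a small check like the following may save some debugging. It is only a sketch, and the function name is hypothetical.

```shell
# Hypothetical helper: succeeds if the hostname given as the second
# argument appears on a "node" line of the ha.cf file given as the first.
hacf_lists_node() {
    awk -v want="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit found ? 0 : 1 }' "$1"
}
```

On each load balancer, hacf_lists_node /etc/ha.d/ha.cf "$(uname -n)" || echo "hostname not listed in ha.cf" prints a warning when the local hostname is missing from the file.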

loadb1: Edit High Availability resources file.

nano /etc/ha.d/haresources

Enter the floating IP address:

loadb1.example.com 192.168.1.99

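loadb2: Edit High Availability resources file.

nano /etc/ha.d/haresources

Enter exactly the same line as on loadb1. Heartbeat expects haresources to be identical on both nodes. The hostname names the preferred owner of the floating IP address, not the local machine:

loadb1.example.com 192.168.1.99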

Start Heartbeat and HAProxy on both load balancers

/etc/init.d/heartbeat start
/etc/init.d/haproxy start

Testing Load Balancers [loadb1, loadb2]

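We can test the cluster by bringing down nodes. For example, if we bring down loadb1, then loadb2 should pick up the load. In this setup Heartbeat manages only the floating IP address and does not monitor HAProxy, so stop Heartbeat on loadb1 to trigger the failover:

/etc/init.d/heartbeat stop

Now try loading your web site. If it loads, the floating IP address has failed over to the secondary load balancer.

Let's start the primary back up.

/etc/init.d/heartbeat start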
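
To see which load balancer currently holds the floating IP address, you can look at the addresses configured on each machine. The sketch below assumes the ip command from the iproute package is installed. The function name is made up for illustration, and it reads the address list on standard input so it does not depend on a particular interface name.

```shell
# Hypothetical helper: reads "ip -o -4 addr show" output on stdin and
# succeeds if the given IPv4 address is configured on this machine.
has_address() {
    grep -Fq "inet $1/"
}
```

For example, ip -o -4 addr show | has_address 192.168.1.99 && echo "this node holds the floating IP" should print the message on exactly one of the two load balancers.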

Testing the Web Servers [apachenode1, apachenode2, apachenode3]

The same principle can be used for testing the load balancing between the Apache web servers. We can bring down two nodes, and the third node should still serve up web pages.

apachenode1: Stop apache2.

/etc/init.d/apache2 stop

apachenode2: Stop apache2.

/etc/init.d/apache2 stop

You should still be able to browse your site even though the first 2 nodes are down, because the third node is handling the requests. Let's start our web servers back up:

apachenode1: Start apache2.

/etc/init.d/apache2 start

apachenode2: Start apache2.

/etc/init.d/apache2 start

Status using Web Interface

Earlier in the haproxy.cfg file, we set two lines (stats enable and stats auth someuser:somepassword) that turn on a web interface showing the health of your Apache web servers and define the username and password used to log in to it. To access this interface, go to your floating IP address like this: http://192.168.1.99/haproxy?stats

Conclusion

Configuring an Apache2 cluster in this way eliminates single points of failure and lets you take servers down for maintenance without any downtime. One critical item is not covered in this article: your three Apache servers should be mirrors of each other. You therefore need a way to keep files consistent on all three web servers, including the Apache configuration files. rsync is one way to do this. Also, if your application uses sessions for logins, you must store the sessions in a database. If you do not, sessions will be lost as users are bounced between the different Apache nodes.
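
As a starting point for the file synchronization, the sketch below pushes the web root and the Apache configuration from one node to the other two with rsync. Everything in it is an assumption rather than part of the procedure above: it treats apachenode1 as the master copy, it expects rsync to be installed on all three nodes and root to have SSH key access to the others, and the function and variable names are made up. Setting DRY_RUN=1 only prints the commands.

```shell
# Hypothetical sync helper; run on the node you treat as the master copy.
NODES="apachenode2.example.com apachenode3.example.com"
PATHS="/var/www/ /etc/apache2/"

# Prints the rsync command for one node and one path; nothing is executed.
sync_command() {
    echo "rsync -az --delete $2 root@$1:$2"
}

# Runs the sync for every node and path, or only prints the commands
# when DRY_RUN=1 is set.
sync_all() {
    for node in $NODES; do
        for path in $PATHS; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                sync_command "$node" "$path"
            else
                rsync -az --delete "$path" "root@$node:$path"
            fi
        done
    done
}
```

Running DRY_RUN=1 sync_all first shows what would be copied. Note that --delete removes files on the target that no longer exist on the master, and that Apache on the target nodes needs a restart after its configuration changes.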
