Cluster Part 1: Howto Build an Apache2 Cluster in Debian Lenny
After spending many hours reviewing several tutorials, I decided to write my own procedure in the hope that others will succeed the first time around and avoid the problems I encountered. There is no guarantee this will work for you. However, it is working for me.
« Back to Introduction | Cluster Part 2: Howto Build a MySQL Cluster in Debian Lenny »
About the Apache2 Cluster
The concept of an Apache cluster is fairly simple. The three Apache nodes are identical except for their hostnames and IP addresses: the Apache2 configuration files and the web directories (/var/www/*) contain the same files on every node. A tool such as rsync can be used to keep the files the same on all servers. The real work happens on the load balancers, which monitor each Apache server and distribute requests across all three servers in turn. Because a single web page usually involves many requests (the HTML plus its images, stylesheets and scripts), the parts of the page you request may actually be served by several different servers. One thing to keep in mind: if you plan on using a web application that uses sessions, you must store the session data in a database. Otherwise, a user's session will be lost when their requests reach a different Apache node.

The 5 servers and a shared IP to be used in this tutorial are as follows:
loadb1.example.com: 192.168.1.1 [ loadb1 ]
loadb2.example.com: 192.168.1.2 [ loadb2 ]
apachenode1.example.com: 192.168.1.3 [ apachenode1 ]
apachenode2.example.com: 192.168.1.4 [ apachenode2 ]
apachenode3.example.com: 192.168.1.5 [ apachenode3 ]
floating IP address: 192.168.1.99
In this example, we will use 5 Debian Lenny servers. You should start with a clean operating system installation for this tutorial. All steps below should be performed as root unless otherwise specified.
Apache2 Web Server Installation & Configuration [apachenode1,2,3]
Install the latest Apache 2 packages:
apt-get update
apt-get install apache2
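Configure the Apache servers to log the original IP address of the visiting user. Because every request arrives through a load balancer, Apache would by default log the load balancer's IP address instead, which causes problems when we want to log visits for statistics and other purposes. HAProxy passes the client's address to Apache in the X-Forwarded-For header (enabled by option forwardfor in haproxy.cfg below), so we log that header instead.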
nano /etc/apache2/apache2.conf
Locate the existing LogFormat line that ends in combined (it begins with LogFormat "%h %l %u %t"), comment it out with #, and add the following new line. It must be a single line in the file:
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
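With this LogFormat, the first field of each access-log line is the visitor's address instead of a load balancer's. As a minimal sketch of the statistics this enables, the hypothetical count_visitors function below (not part of Apache) tallies requests per client IP. It assumes the log is written with the format above, and it counts only the first address when X-Forwarded-For carries a list.

```shell
# count_visitors: hypothetical helper, not part of Apache.
# Usage: count_visitors /var/log/apache2/access.log
# Field 1 is the X-Forwarded-For value. If the header held several addresses
# ("a, b"), field 1 is "a," and the trailing comma is stripped so only the
# original client is counted. Requests that bypass HAProxy carry no header
# and are logged (and counted) as "-".
# Output: one "COUNT IP" line per client, busiest first.
count_visitors() {
    awk '{ ip = $1; sub(/,$/, "", ip); hits[ip]++ }
         END { for (ip in hits) print hits[ip], ip }' "$1" | sort -rn
}
```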
Create a haproxytest.txt file in the webroot directory. HAProxy will use this file as a test to see if the server is running and accessible. If it is inaccessible, HAProxy will know the server is down and will redirect traffic to the remaining Apache nodes.
touch /var/www/haproxytest.txt
Since this file will be accessed many times per minute, we don't want to log access to it. Therefore, we need to edit the virtual host file.
nano /etc/apache2/sites-available/default
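Add the following inside the <VirtualHost> section and comment out any other "CustomLog" lines in the file. The SetEnvIf line sets the dontlog variable for requests to the health-check file, and env=!dontlog tells CustomLog to skip those requests:
SetEnvIf Request_URI "^/haproxytest\.txt$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog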
Restart apache for your settings to take effect.
/etc/init.d/apache2 restart
Configure the load balancers [loadb1, loadb2]
Unless specified otherwise, all steps below should be completed on both load balancers.
Install HAProxy
apt-get install haproxy
Back up your config file and create a new one:
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg_orig
rm -f /etc/haproxy/haproxy.cfg
nano /etc/haproxy/haproxy.cfg
Add the following to haproxy.cfg
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #debug
    #quiet
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 192.168.1.99:80
    mode http
    stats enable
    stats auth someuser:somepassword
    balance roundrobin
    cookie JSESSIONID prefix
    option httpclose
    option forwardfor
    option httpchk HEAD /haproxytest.txt HTTP/1.0
    server apachenode1 192.168.1.3:80 cookie A check
    server apachenode2 192.168.1.4:80 cookie B check
    server apachenode3 192.168.1.5:80 cookie C check
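Each server line needs its own cookie value: HAProxy prefixes the application's JSESSIONID cookie with that value and uses it to send a returning client back to the same server, so two servers sharing a value (for example, both using cookie B) cannot be told apart. A quick check for this kind of copy-and-paste slip is sketched below; duplicate_cookies is a hypothetical helper that assumes the one-server-per-line layout used in this file.

```shell
# duplicate_cookies: hypothetical helper, not part of HAProxy.
# Usage: duplicate_cookies /etc/haproxy/haproxy.cfg
# Prints every cookie value that appears on more than one "server" line.
# Empty output means each backend server has a unique cookie value.
duplicate_cookies() {
    awk '$1 == "server" {
             # Look for the "cookie" keyword and print the value after it.
             for (i = 3; i < NF; i++)
                 if ($i == "cookie") print $(i + 1)
         }' "$1" | sort | uniq -d
}
```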
Enable haproxy on startup
nano /etc/default/haproxy
Set ENABLED to 1 and uncomment the line if commented out
ENABLED=1
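If you prefer not to edit the file interactively, the same change can be made with sed. This is a minimal sketch with a hypothetical helper name; it relies on GNU sed's -i option (standard on Debian) and appends the setting if the file has no ENABLED line at all.

```shell
# enable_haproxy_default: hypothetical helper.
# Usage: enable_haproxy_default /etc/default/haproxy
# Rewrites any ENABLED= line, commented out or not, to ENABLED=1.
# Comment lines that merely mention ENABLED (e.g. "# Set ENABLED to 1 ...")
# are left alone because they do not contain "ENABLED=".
enable_haproxy_default() {
    if grep -q '^[#[:space:]]*ENABLED=' "$1"; then
        sed -i 's/^[#[:space:]]*ENABLED=.*/ENABLED=1/' "$1"
    else
        echo 'ENABLED=1' >> "$1"
    fi
}
```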
Install Heartbeat so that the load balancers can share the floating IP address:
apt-get install heartbeat
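Allow HAProxy to bind to the floating IP address even on the load balancer that does not currently hold it; without this, HAProxy cannot start on the standby node:
nano /etc/sysctl.conf
Add or change the following line: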
net.ipv4.ip_nonlocal_bind=1
Apply changes by running:
sysctl -p
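The sysctl.conf change can also be scripted so that running it twice does not add duplicate lines. A minimal sketch, assuming GNU sed; the function name is hypothetical. After running it and sysctl -p, cat /proc/sys/net/ipv4/ip_nonlocal_bind should print 1.

```shell
# set_nonlocal_bind: hypothetical helper.
# Usage: set_nonlocal_bind /etc/sysctl.conf && sysctl -p
# An existing (possibly commented-out) net.ipv4.ip_nonlocal_bind line is
# rewritten in place; otherwise the setting is appended to the file.
set_nonlocal_bind() {
    conf=$1
    if grep -q '^[#[:space:]]*net\.ipv4\.ip_nonlocal_bind' "$conf"; then
        sed -i 's/^[#[:space:]]*net\.ipv4\.ip_nonlocal_bind.*/net.ipv4.ip_nonlocal_bind=1/' "$conf"
    else
        echo 'net.ipv4.ip_nonlocal_bind=1' >> "$conf"
    fi
}
```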
Create three config files. We will edit each individually.
touch /etc/ha.d/authkeys
touch /etc/ha.d/ha.cf
touch /etc/ha.d/haresources
Edit authkeys.
nano /etc/ha.d/authkeys
Add the following and replace "somerandomstring" with a long random string. This file must be identical on loadb1 and loadb2.
auth 3
3 md5 somerandomstring
Secure this file. Heartbeat refuses to start if authkeys can be read by other users.
chmod 600 /etc/ha.d/authkeys
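One way to produce a suitably random key is to hash some bytes from /dev/urandom. The hypothetical write_authkeys helper below sketches this: it writes the two-line authkeys file with method 3 (md5) and sets mode 600. Run it on loadb1 only and copy the resulting file to loadb2 (for example with scp -p, which preserves the file mode), since both nodes need the same key.

```shell
# write_authkeys: hypothetical helper.
# Usage: write_authkeys /etc/ha.d/authkeys
# Writes a Heartbeat authkeys file selecting method 3 (md5) with a random
# 32-hex-digit key derived from 512 bytes of /dev/urandom.
write_authkeys() {
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    # Create the file under a restrictive umask so the key is never world-readable.
    ( umask 077 && printf 'auth 3\n3 md5 %s\n' "$key" > "$1" )
    # A file that already existed keeps its old mode, so set it explicitly too.
    chmod 600 "$1"
}
```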
loadb1: Edit High Availability config file
nano /etc/ha.d/ha.cf
Add the following:
#
# keepalive: how many seconds between heartbeats
#
keepalive 2
#
# deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
# What UDP port to use for udp or ppp-udp communication?
#
udpport 694
bcast eth0
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.1.2
# What interfaces to heartbeat over?
udp eth0
#
# Facility to use for syslog()/logger (alternative to log/debugfile)
#
logfacility local0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node loadb1.example.com
node loadb2.example.com
loadb2: Edit High Availability config file
nano /etc/ha.d/ha.cf
Add the following:
#
# keepalive: how many seconds between heartbeats
#
keepalive 2
#
# deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
# What UDP port to use for udp or ppp-udp communication?
#
udpport 694
bcast eth0
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.1.1
# What interfaces to heartbeat over?
udp eth0
#
# Facility to use for syslog()/logger (alternative to log/debugfile)
#
logfacility local0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node loadb1.example.com
node loadb2.example.com
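The node lines must match the output of uname -n on each load balancer exactly. On an install where the hostname was set to the short name, uname -n prints loadb1 rather than loadb1.example.com, and Heartbeat will typically refuse to start because it cannot find the local node in its configuration. The hypothetical node_listed helper below is a minimal sketch of a check you can run on each load balancer before starting Heartbeat.

```shell
# node_listed: hypothetical helper.
# Usage: node_listed /etc/ha.d/ha.cf [NAME]
# Succeeds (exit 0) if NAME, defaulting to the output of `uname -n`, appears
# on a "node" line of the given ha.cf. Comment lines start with "#" and are
# therefore ignored.
# Example: node_listed /etc/ha.d/ha.cf || echo "$(uname -n) is not in ha.cf"
node_listed() {
    name=${2:-$(uname -n)}
    awk -v n="$name" '$1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                      END { exit !found }' "$1"
}
```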
loadb1: Edit High Availability resources file.
nano /etc/ha.d/haresources
Enter the floating IP address:
loadb1.example.com 192.168.1.99
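loadb2: Edit High Availability resources file.
nano /etc/ha.d/haresources
Enter exactly the same line as on loadb1. The haresources file must be identical on both load balancers; the node name at the start of the line names the preferred owner of the floating IP address, not the local host:
loadb1.example.com 192.168.1.99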
Start Heartbeat and HAProxy on both load balancers:
/etc/init.d/heartbeat start
/etc/init.d/haproxy start
Testing Load Balancers [loadb1, loadb2]
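We can test the cluster by bringing down nodes. For example, if we bring down loadb1, loadb2 should take over the floating IP address and pick up the load. Heartbeat manages only the floating IP address in this setup, so stopping HAProxy on loadb1 would not trigger a failover; stop Heartbeat instead.
loadb1: Stop Heartbeat.
/etc/init.d/heartbeat stop
Now try loading your web site. If it loads, the floating IP address has failed over to the secondary load balancer.
Let's start the primary back up.
loadb1: Start Heartbeat.
/etc/init.d/heartbeat start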
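To see which load balancer currently owns the floating IP address, run ip -o -4 addr show dev eth0 on each of them; Heartbeat typically adds 192.168.1.99 as an extra (alias) address on eth0. The hypothetical holds_floating_ip helper wraps that check in a minimal sketch: while Heartbeat is stopped on loadb1, it should succeed on loadb2 and fail on loadb1.

```shell
# holds_floating_ip: hypothetical helper.
# Reads `ip -o -4 addr show` output on stdin and succeeds if the IPv4 address
# given as $1 is assigned. The trailing "/" stops 192.168.1.9 from matching
# 192.168.1.99, and -F treats the dots literally.
# Example:
#   ip -o -4 addr show dev eth0 | holds_floating_ip 192.168.1.99 \
#       && echo "$(uname -n) holds 192.168.1.99"
holds_floating_ip() {
    grep -qF "inet $1/"
}
```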
Testing the Web Servers [apachenode1, apachenode2, apachenode3]
The same principle can be used for testing the load balancing between the Apache web servers. We can bring down two nodes, and the third node should still serve up web pages.
apachenode1: Stop apache2.
/etc/init.d/apache2 stop
apachenode2: Stop apache2.
/etc/init.d/apache2 stop
You should still be able to browse your site even though the first two nodes are down; the third node is now handling all requests. HAProxy may need a few seconds of failed health checks before it marks the stopped nodes as down. Let's start our web servers back up:
apachenode1: Start apache2.
/etc/init.d/apache2 start
apachenode2: Start apache2.
/etc/init.d/apache2 start
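To observe these failovers from the client's side, it helps to send a steady stream of requests to the floating IP address while you stop and start nodes. The hypothetical watch_site helper below is a minimal sketch built on curl: it prints one HTTP status code per request, so a run of 200s means the cluster kept serving pages, while 503 (HAProxy has no server available) or 000 (no response at all) marks requests that failed during the switch.

```shell
# watch_site: hypothetical helper for the failover tests.
# Usage: watch_site URL [COUNT] [INTERVAL_SECONDS]
# Prints one HTTP status code per request; curl prints 000 when it gets
# no HTTP response at all (connection refused, timeout, ...).
# Example, run from a workstation while stopping nodes:
#   watch_site http://192.168.1.99/ 30
watch_site() {
    url=$1
    count=${2:-10}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$url"
        i=$((i + 1))
        sleep "$interval"
    done
}
```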
Status using Web Interface
Earlier in the haproxy.cfg file, we enabled statistics (stats enable) and set a username and password (stats auth someuser:somepassword). These credentials are used to log in to a web interface showing the health of your Apache web servers. To access this interface, go to your floating IP address like this: http://192.168.1.99/haproxy?stats
Conclusion
Configuring an Apache2 cluster in this way eliminates a single point of failure and lets you take servers down for maintenance without any downtime. One critical topic is not covered in this article: your three Apache servers must be mirrors of each other. You therefore need a way to keep the files consistent on all three web servers, including your Apache configuration files; rsync is one way to do this. Also, if your application uses sessions for logins, you must store the sessions in a database. If you do not, sessions will be lost as users are bounced between the different Apache nodes.
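A minimal sketch of the rsync approach, assuming apachenode1 holds the master copy and root can log in to the other nodes over SSH with keys. The NODES list and the sync_webnodes function are hypothetical; set DRY_RUN=1 to print the commands instead of running them. Note that --delete removes files on the other nodes that no longer exist on apachenode1.

```shell
# sync_webnodes: hypothetical push from apachenode1 (the master copy) to the
# other Apache nodes. Assumes key-based root SSH access to each node.
# Usage: sync_webnodes            (run the commands)
#        DRY_RUN=1 sync_webnodes  (only print them)
NODES="apachenode2.example.com apachenode3.example.com"

sync_webnodes() {
    for node in $NODES; do
        # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
        # --delete removes files on the target that are gone from apachenode1.
        ${DRY_RUN:+echo} rsync -az --delete /var/www/ "root@$node:/var/www/"
        ${DRY_RUN:+echo} rsync -az --delete /etc/apache2/ "root@$node:/etc/apache2/"
        # Reload Apache so configuration changes take effect on the target.
        ${DRY_RUN:+echo} ssh "root@$node" /etc/init.d/apache2 reload
    done
}
```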
