High availability load balancing using HAProxy on Ubuntu (part 1)
In this post we will show you how to easily set up load balancing for your web application. Imagine you currently have your application on one webserver called web01:
+---------+
| uplink  |
+---------+
     |
+---------+
|  web01  |
+---------+
But traffic has grown and you’d like to increase your site’s capacity by adding more webservers (web02 and web03), as well as eliminate the single point of failure in your current setup (if web01 has an outage, the site will be offline).
              +---------+
              | uplink  |
              +---------+
                   |
     +-------------+-------------+
     |             |             |
+---------+   +---------+   +---------+
|  web01  |   |  web02  |   |  web03  |
+---------+   +---------+   +---------+
To spread traffic evenly over your three web servers, we can install an extra server that proxies all traffic and balances it over the webservers. In this post we will use HAProxy, an open source TCP/HTTP load balancer (see http://haproxy.1wt.eu/), to do that:
              +---------+
              | uplink  |
              +---------+
                   |
                   +
                   |
              +---------+
              | loadb01 |
              +---------+
                   |
     +-------------+-------------+
     |             |             |
+---------+   +---------+   +---------+
|  web01  |   |  web02  |   |  web03  |
+---------+   +---------+   +---------+
So our setup now is:
- Three webservers, web01 (192.168.0.1), web02 (192.168.0.2) and web03 (192.168.0.3), each serving the application
- A new server (loadb01, IP: 192.168.0.100) with Ubuntu installed.
All right, now let’s get to work:
Start by installing haproxy on your loadbalancing machine:
loadb01$ sudo apt-get install haproxy
Now let’s back up the original HAProxy configuration file and create a new one with our config, which will tell HAProxy to listen for incoming HTTP requests on port 80 and balance them between the three webservers:
loadb01$ sudo mv /etc/haproxy/haproxy.cfg /etc/haproxy/backup_haproxy.cfg
loadb01$ sudo vi /etc/haproxy/haproxy.cfg
Paste the following configuration there:
global
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    # Timeouts in milliseconds (contimeout, clitimeout and srvtimeout are the deprecated spellings)
    timeout connect 5000
    timeout client 50000
    timeout server 50000

listen webcluster *:80
    mode http
    stats enable
    stats auth us3r:passw0rd
    balance roundrobin
    option httpchk HEAD / HTTP/1.0
    option forwardfor
    cookie LSW_WEB insert
    option httpclose
    server web01 192.168.0.1:80 cookie LSW_WEB01 check
    server web02 192.168.0.2:80 cookie LSW_WEB02 check
    server web03 192.168.0.3:80 cookie LSW_WEB03 check
Enable HAProxy by editing the /etc/default/haproxy file:
loadb01$ sudo nano /etc/default/haproxy
and setting ENABLED to 1:
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"
Then, start HAProxy:
loadb01$ sudo /etc/init.d/haproxy start
Now open your web browser and browse to http://192.168.0.100/ (or whatever IP you have set for loadb01). You should be served a file from one of the webservers! The load balancing is now working, but let’s take a closer look at some of the things we configured in the HAProxy configuration:
listen webcluster *:80
Listen for incoming connections on all interfaces, port 80 (the * can also be replaced with a single IP address).
stats enable
stats auth us3r:passw0rd
This enables HAProxy’s statistics interface, which you can access by browsing to http://192.168.0.100/haproxy?stats. Log in with the username and password given in the stats auth line and you should see a statistics report listing each backend server and its current state.
balance roundrobin
This line sets HAProxy’s balancing algorithm to ‘roundrobin’ (which is also the default one). It makes sure each subsequent request is handled by the next server in line. For other possible algorithms to use here, please check section 4.2 of HAProxy’s configuration manual: http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
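As a rough illustration of what round-robin selection means, here is a minimal Python sketch. It is not HAProxy’s actual implementation (which also takes server weights into account); it only shows the idea that each new request goes to the next server in a repeating rotation. The server names come from the configuration above, and the function name is made up for this example.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in a fixed rotation."""
    return cycle(servers)


# Backend names as used in the configuration above.
rotation = round_robin(["web01", "web02", "web03"])

# Six consecutive requests walk through the list twice.
first_six = [next(rotation) for _ in range(6)]
print(first_six)
# ['web01', 'web02', 'web03', 'web01', 'web02', 'web03']
```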
option httpchk HEAD / HTTP/1.0
This option enables HTTP health checking of the web servers. HAProxy will issue HEAD requests to / and check for a valid response. If a webserver does not give a valid response (for example when it’s down), HAProxy will mark the server as down and stop sending requests to it. You can also see this in the statistics interface, where a stopped webserver (web02, for example) is shown as down.
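To make the health check logic a bit more concrete, here is a simplified Python model of it. It is a sketch under assumptions, not HAProxy’s code: it treats 2xx and 3xx status codes as a passing check, and it assumes the documented defaults for the rise and fall server parameters (2 and 3), which the configuration above does not set explicitly. The function names are hypothetical.

```python
def check_passed(status_code):
    """A check passes on a 2xx or 3xx response; None means no response at all."""
    return status_code is not None and 200 <= status_code < 400


def server_is_up(check_results, rise=2, fall=3, initially_up=True):
    """Replay a list of check results (True = passed) and return the final state.

    A server that is up goes down after `fall` consecutive failed checks;
    a server that is down comes back after `rise` consecutive passed checks.
    """
    up = initially_up
    streak = 0  # consecutive results that contradict the current state
    for passed in check_results:
        if passed == up:
            streak = 0
            continue
        streak += 1
        if (up and streak >= fall) or (not up and streak >= rise):
            up = not up
            streak = 0
    return up


# web02 stops answering: three failed checks in a row take it out of rotation.
statuses = [200, None, None, None]
print(server_is_up([check_passed(s) for s in statuses]))  # False
```

Requiring several consecutive failures before marking a server as down means a single slow or dropped check does not immediately remove a healthy server.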
cookie LSW_WEB insert
server web01 192.168.0.1:80 cookie LSW_WEB01 check
server web02 192.168.0.2:80 cookie LSW_WEB02 check
server web03 192.168.0.3:80 cookie LSW_WEB03 check
The first line in this block enables cookie-based persistence. When a user first reaches the webcluster group, HAProxy creates the cookie LSW_WEB and stores the server id (LSW_WEB01, LSW_WEB02 or LSW_WEB03) in it. For all subsequent requests in the same session, HAProxy looks at the cookie and routes that user to the same webserver (unless it’s down).
The last three lines define the backend webservers that HAProxy will use. You can easily add more lines here as the infrastructure grows.
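The interplay between the cookie and the balancing algorithm can be sketched in Python as follows. This is only an illustration of the decision described above, not how HAProxy is implemented: the cookie values are the ones from the configuration, while the function names and the idea of passing in a set of servers that are currently up are assumptions made for this example.

```python
from itertools import cycle

# Cookie values from the server lines in the configuration above.
SERVER_TO_COOKIE = {"web01": "LSW_WEB01", "web02": "LSW_WEB02", "web03": "LSW_WEB03"}
COOKIE_TO_SERVER = {cookie: server for server, cookie in SERVER_TO_COOKIE.items()}


def new_rotation():
    """Round-robin iterator over all configured servers."""
    return cycle(SERVER_TO_COOKIE)


def pick_server(cookie_value, up_servers, rotation):
    """Return (server, cookie_to_set) for one request.

    cookie_to_set is None when the client's existing LSW_WEB cookie
    still points at a server that is up.
    """
    sticky = COOKIE_TO_SERVER.get(cookie_value)
    if sticky in up_servers:
        return sticky, None
    # No usable cookie: fall back to round robin, skipping servers that are down.
    for _ in range(len(SERVER_TO_COOKIE)):
        candidate = next(rotation)
        if candidate in up_servers:
            return candidate, SERVER_TO_COOKIE[candidate]
    raise RuntimeError("no backend server is up")


rotation = new_rotation()
all_up = {"web01", "web02", "web03"}
print(pick_server(None, all_up, rotation))         # first visit, cookie gets set
print(pick_server("LSW_WEB02", all_up, rotation))  # returning visitor stays on web02
```

When the server named in the cookie is down, the request is balanced like a first visit and the client receives a new cookie value.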
All right, the load balancing is working and we are almost there. Just one thing is left to do in this article: fixing your webserver logs on the web01/web02/web03 servers. Since requests have now changed from:
user --> webserver
To:
user --> HAProxy --> webserver
You will see the load balancer’s IP in the access log on your webservers. To fix this when you are using the Apache webserver, open your /etc/apache2/apache2.conf file and replace this line:
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
with:
#LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Then restart or reload Apache and the logging should be fixed. The log will now include the IP address that is sent in the X-Forwarded-For header, which HAProxy adds to all requests to the backend webservers and which contains the client’s IP address. We enabled that earlier by setting the
option forwardfor
option in the HAProxy configuration.
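If your application itself needs the client address, it has to read the same header. One caveat: a client can send its own X-Forwarded-For header, and HAProxy appends its entry after any existing ones, so only the last entry is the one written by the load balancer. The Python sketch below shows that rule; the function name is hypothetical and the sample addresses are documentation placeholders.

```python
def client_ip_from_xff(header_value):
    """Return the last address in an X-Forwarded-For value, or None if empty.

    The last entry is the one added by the nearest proxy (here: loadb01);
    anything before it was supplied by the client or by earlier proxies.
    """
    if not header_value:
        return None
    parts = [part.strip() for part in header_value.split(",") if part.strip()]
    return parts[-1] if parts else None


print(client_ip_from_xff("203.0.113.7"))               # 203.0.113.7
print(client_ip_from_xff("10.1.2.3, 203.0.113.7"))     # 203.0.113.7
```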
That’s it! Over the course of the next weeks we will be posting some more articles on this subject, covering:
- Adding high availability for the load balancer (as it’s now a single point of failure)
- MySQL database scalability options.
If there’s anything else you’d like us to cover, or if you have any questions, please leave a comment!



MySQL on a different server!
We’ve just migrated our application to a new server cluster last week, coincidentally on dedicated servers from LeaseWeb. We’re now using the HAProxy software to load balance all our HTTP traffic, which replaces our previous installation’s ServerIron hardware load balancer. So far we’re very impressed with the performance, features and cost savings that HAProxy delivers!
In terms of the Apache part of the setup, it’s worth mentioning that we’ve opted to use a backport of Apache 2.3’s ‘mod_remoteip’ module to monitor the ‘X-Forwarded-For’ HTTP header and then correctly pass on the client’s IP address, which is more useful than merely adjusting Apache’s LogFormat directive as shown in this post. The ‘mod_remoteip’ module will, for example, allow you to perform access control in Apache based on the client’s IP address. This is of course a good option if you are compiling Apache 2.2.x from source.
Finally, I hope I’m not being too premature considering that you are planning a follow-up blog post, but in the interests of helping others on the high-availability or failover side of things, we’ve used the lightweight Keepalived VRRP daemon, as recommended by Willy Tarreau, the author of HAProxy, to manage a floating IP address across master and standby load balancer servers. I believe others use software called Heartbeat instead.
See:
http://www.gossamer-threads.com/lists/apache/users/390137
http://blog.loadbalancer.org/configure-haproxy-with-tproxy-kernel-for-full-transparent-proxy/
http://www.keepalived.org/
@Warren:
Thanks for your comment! Good to see you’re getting impressive results using HAProxy on our infrastructure.
The additional information is definitely useful for the readers, so don’t hold back in posting it. There are quite a few options for high availability (be it Heartbeat or Keepalived), but we’ll try to cover at least some of them.
Robert
In lighttpd the LogFormat part could be:
accesslog.format = "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
in /etc/lighttpd/lighttpd.conf
Interested in load-balancing setup for Postgres 9.1 server as well…
[...] If you still want to deploy cross-region load balancing, you can use an open source tool such as HAPROXY which operates on the same architecture as [...]
[...] http://www.leaseweblabs.com/2011/07/high-availability-load-balancing-using-haproxy-on-ubuntu-part-1/ [...]