Setting up squid as a reverse proxy / web accelerator
Getting Squid to work as a reverse proxy proved quite a challenge, as I had no prior experience with squid or HTTP caching and was starting from scratch. The documentation for squid consists of a FAQ, a user's guide and a configuration manual; the manual repeats much of the information that appears in squid.conf as comments above each option. The best documentation on reverse proxying using squid is here. This alone did not provide enough information to get reverse proxying working with midgard, and the squid mailing lists, which can be searched using the search function on the squid website, proved an invaluable resource.
Although squid is available as a debian package, the easiest way to get it working as a reverse proxy is to compile it with the --disable-internal-dns option, so I ended up downloading the source from the website and compiling it with this option. Also, if you want to be able to produce custom logs for statistical analysis by webalizer and similar programs, you may want to apply the custom_log patch. This lets you produce logs which contain user agent and referer stats without having to log everything and then filter it with a perl script. To get the custom_log patch to work you will need to use the latest snapshot of the stable squid version, which you can get from here. Unfortunately the custom logging options don't seem to correspond exactly to those in apache. This makes it more difficult to set up separate logging for the different virtual hosts you might be running, as you cannot just apply split-logfile.pl to your combined log file. I am currently trying to learn some perl so I can figure out how to adapt the script, or alternatively to understand better how the logging options in apache and squid work so I can adjust the squid ones to produce the same output as apache.
Once you have downloaded and unzipped the latest stable squid source, cd into the source directory and run the following command:
$ patch -p1 < path-to-customlog-2.5.patch
Patch then asked me for the name of each file it should patch. I just typed in the names of all the files listed in the patch.
Then we compile the new source.
$ ./configure --prefix=/usr/local/squid --disable-internal-dns --enable-async-io
$ make all
$ sudo make install
Squid should now be installed in /usr/local/squid.
The next step is to edit the squid.conf file, which is in /usr/local/squid/etc/. This file is very well commented and very long. Many of the options are not relevant if you are only using squid as a reverse proxy. If you are interested, though, I recommend reading all the comments and also referring to the documentation mentioned above. To view my squid.conf file click here. Refer to the documentation on reverse proxying for more information on the various options that I have used.
We now have to edit the httpd.conf file and the midgard-data.conf file so that apache will now only listen on port 80 on the loopback device. This will let us have squid listening on port 80 on our external device. By keeping apache listening on port 80 we don't have to worry about changing any of the settings for our midgard websites.
The only change to make in the httpd.conf file is to uncomment the BindAddress line and change it to:
BindAddress 127.0.0.1
Now for the midgard-data.conf file, we have to move the virtual hosts that are currently on port 80 on the external device to the loopback device. Rather than run through all the changes, you can view my midgard-data.conf file here. Once we have done this, the last step is to edit the /etc/hosts file so that squid passes requests for the different virtual hosts, e.g. test.wilderness.org.au or www.sydney.wilderness.org.au, to the apache server listening on the loopback device. This relies on squid resolving host names through the system resolver, which consults /etc/hosts; that is what the --disable-internal-dns build option is for. Add your different virtual domains to the hosts file like this:
127.0.0.1 test.wilderness.org.au
127.0.0.1 www.sydney.wilderness.org.au
127.0.0.1 sydney.wilderness.org.au
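With more than a handful of virtual hosts it is easy to forget one. The following is a minimal Python sketch for checking a hosts file against a list of virtual hosts. The function names loopback_names and missing_vhosts are made up for this example, and the sample text stands in for the contents of /etc/hosts.

```python
LOOPBACK = "127.0.0.1"


def loopback_names(hosts_text):
    """Return the set of host names that hosts-format text maps to 127.0.0.1."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        address, *aliases = line.split()
        if address == LOOPBACK:
            names.update(aliases)
    return names


def missing_vhosts(hosts_text, vhosts):
    """Return the virtual hosts that are not yet pointed at the loopback device."""
    return sorted(set(vhosts) - loopback_names(hosts_text))


# Sample text standing in for /etc/hosts (hypothetical contents).
sample_hosts = """\
127.0.0.1 localhost
127.0.0.1 test.wilderness.org.au
127.0.0.1 www.sydney.wilderness.org.au
"""

wanted = [
    "test.wilderness.org.au",
    "www.sydney.wilderness.org.au",
    "sydney.wilderness.org.au",
]
print(missing_vhosts(sample_hosts, wanted))
```

With this sample the script reports sydney.wilderness.org.au as missing, because only the www name has an entry.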
OK, now that we have done all this, we have to get squid to create its swap directories.
$ cd /usr/local/squid/sbin
$ sudo ./squid -z
Now restart apache to get the changes to the config files loaded.
$ sudo /etc/init.d/apache restart
Check that you can still access your staging website on port 8001, or wherever you have it, and then check that you can access the webserver from the localhost:
$ lynx http://127.0.0.1/
Note that you will not be able to access your midgard websites this way, as mod_midgard will not recognise 127.0.0.1 as a domain that it should serve. This is just to test that apache is working on the loopback device. Now that apache is up, start squid:
$ sudo ./squid
You can check what parameters squid has started with by looking at cache.log in /usr/local/squid/var/logs/. I recommend opening two shells on your server and watching the squid and apache access logs to see what happens when you request a page.
$ cd ../var/logs/
$ sudo tail -f access.log
$ sudo tail -f /var/log/apache/access.log
At the moment squid should be passing all requests straight to apache and none of the pages will be cached. In order to get caching to work for your PHP pages (or in fact HTML pages or images) you have to send the appropriate HTTP headers with the pages that you want to be cached.
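To put a number on what the squid access log is showing, the result codes can be tallied. The following is a minimal Python sketch, assuming squid's default native access.log format, in which the fourth field looks like TCP_MISS/200. If you have switched to a custom log format the field position may differ. The function names and the two sample lines are made up for illustration.

```python
from collections import Counter


def count_result_codes(log_lines):
    """Tally squid result codes (TCP_HIT, TCP_MISS, ...) from native-format lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[3]:
            continue  # not a native-format line, skip it
        code = fields[3].split("/", 1)[0]
        counts[code] += 1
    return counts


def hit_ratio(counts):
    """Fraction of requests whose result code contains HIT."""
    total = sum(counts.values())
    hits = sum(n for code, n in counts.items() if "HIT" in code)
    return hits / total if total else 0.0


# Two invented lines in the native format, one miss and one hit.
sample_log = [
    "1061280945.123 85 10.0.0.5 TCP_MISS/200 4512 GET http://test.wilderness.org.au/ - DIRECT/127.0.0.1 text/html",
    "1061280950.456 2 10.0.0.5 TCP_HIT/200 4512 GET http://test.wilderness.org.au/ - NONE/- text/html",
]

counts = count_result_codes(sample_log)
print(dict(counts), hit_ratio(counts))
```

Before any caching headers are sent you would expect the ratio to stay at or near zero. Once the headers are in place it should climb as pages are requested repeatedly.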
A fantastic tutorial on the whole caching thing is at:
Caching Tutorial for Web Authors and Webmasters
For some excellent examples of how caching headers can be sent from PHP pages see the following.
Supporting Conditional GET in PHP
The ultimate caching solution :)
PHP Cache Control
Making PHP Applications Cache-Friendly
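The articles above are PHP-specific, but the headers themselves are language-neutral. As a rough illustration only (in Python rather than PHP, and not the solution we use for midgard), the sketch below builds the three headers a cache needs in order to keep a page for a fixed time. The function name cache_headers and its parameters are assumptions made for this example.

```python
from email.utils import formatdate


def cache_headers(last_modified, max_age, now):
    """Build headers that let a cache keep a page for max_age seconds.

    last_modified and now are Unix timestamps; max_age is in seconds.
    """
    return {
        # When the content last changed, used for revalidation.
        "Last-Modified": formatdate(last_modified, usegmt=True),
        # Absolute expiry time, understood by HTTP/1.0 caches.
        "Expires": formatdate(now + max_age, usegmt=True),
        # Relative expiry time, understood by HTTP/1.1 caches.
        "Cache-Control": "max-age=%d" % max_age,
    }


# Fixed timestamps so the output is reproducible.
for name, value in cache_headers(last_modified=0, max_age=3600, now=0).items():
    print("%s: %s" % (name, value))
```

Sending both Expires and Cache-Control is a belt-and-braces choice: older caches only look at Expires, while newer ones prefer max-age.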
You can see our solution to adding caching headers to midgard pages in the page below
To set caching for anything that is served directly by apache rather than by midgard, for example non-Midgard images, add the following to your httpd.conf file.
#Set up caching for images in /var/www/images
<Directory /var/www/images/>
AllowOverride None
order allow,deny
Allow from all
ExpiresActive On
ExpiresDefault "access plus 1 day"
</Directory>
Images stored in the images folder will now be cached for 1 day from the time of access.
Once you've added your headers, test that your pages are caching properly by using the Cacheability Engine.
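You can also estimate from the headers alone how long a cache may keep a response. The following is a minimal Python sketch using simplified HTTP/1.1 rules: no-store, no-cache or private is treated as not cacheable, otherwise max-age wins, otherwise Expires minus Date is used. The function name freshness_lifetime is made up for this example, and squid's real decision also depends on its refresh_pattern rules and other headers, which are ignored here.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the seconds a shared cache may serve the response, or None if unstated."""
    directives = [
        d.strip().lower()
        for d in headers.get("Cache-Control", "").split(",")
        if d.strip()
    ]
    names = [d.partition("=")[0] for d in directives]
    if any(name in ("no-store", "no-cache", "private") for name in names):
        return 0  # treated here as not cacheable by a shared cache
    for directive in directives:
        name, _, value = directive.partition("=")
        if name == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None  # no explicit expiry information


# Headers like those produced by ExpiresDefault "access plus 1 day".
one_day = {
    "Date": "Tue, 19 Aug 2003 10:00:00 GMT",
    "Expires": "Wed, 20 Aug 2003 10:00:00 GMT",
}
print(freshness_lifetime(one_day))
```

A result of None means the response carries no explicit expiry, which is the situation described above where squid passes every request straight through to apache.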
I ran stress tests using siege and got the following results. I used the following command:
$ bombardment twsurls.txt 10 30 3 5
This meant that siege would hit the front page of the website in three runs of 1 minute each, simulating 10 users each requesting the page every 5 seconds, then 40 users, then 70 users. The first test hit the site without caching, so that every page had to be served by midgard. Looking at top, I could see CPU usage go as high as 60%, with mysql being by far the biggest user.
| Date & Time | Trans | Elap Time (s) | Data Trans (bytes) | Resp Time (s) | Trans Rate (trans/s) | Throughput (bytes/s) | Concurrent | OKAY | Failed |
| 08/19/03 19:55:45 | 98 | 59.67 | 3e+06 | 3.85 | 1.64 | 56596 | 6.33 | 98 | 0 |
| 08/19/03 19:57:15 | 92 | 59.97 | 3e+06 | 18.93 | 1.53 | 46152 | 29.04 | 92 | 0 |
| 08/19/03 19:58:45 | 132 | 59.98 | 3e+06 | 20.18 | 2.2 | 46322 | 44.41 | 132 | 1 |
The following are the results for exactly the same test, but this time with squid running and the caching headers set. During that time CPU usage reached a high of approximately 8%, of which my intrusion detection program snort was the biggest user, followed by squid.
| Date & Time | Trans | Elap Time (s) | Data Trans (bytes) | Resp Time (s) | Trans Rate (trans/s) | Throughput (bytes/s) | Concurrent | OKAY | Failed |
| 08/18/03 17:13:37 | 192 | 60.25 | 6e+06 | 1.08 | 3.19 | 105949 | 3.43 | 192 | 0 |
| 08/18/03 17:15:07 | 677 | 59.97 | 2e+07 | 1.34 | 11.29 | 375325 | 15.13 | 677 | 0 |
| 08/18/03 17:16:37 | 841 | 59.98 | 3e+07 | 2.7 | 14.02 | 466168 | 37.89 | 841 | 0 |
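To make the comparison between the two tables concrete, the ratios can be worked out directly. The following short Python sketch uses the Trans, Resp Time and Trans Rate columns copied from the tables above. The user counts of 10, 40 and 70 per row are taken from the description of the bombardment run, and the function name speedups is made up for this example.

```python
# (users, transactions, mean response time in s, transactions per second)
uncached = [(10, 98, 3.85, 1.64), (40, 92, 18.93, 1.53), (70, 132, 20.18, 2.2)]
cached = [(10, 192, 1.08, 3.19), (40, 677, 1.34, 11.29), (70, 841, 2.7, 14.02)]


def speedups(before, after):
    """Return (users, rate gain, response-time gain) for each pair of runs."""
    rows = []
    for (users, _, resp_b, rate_b), (_, _, resp_a, rate_a) in zip(before, after):
        rows.append((users, round(rate_a / rate_b, 1), round(resp_b / resp_a, 1)))
    return rows


for users, rate_gain, resp_gain in speedups(uncached, cached):
    print("%d users: %.1fx the transaction rate, responses %.1fx faster"
          % (users, rate_gain, resp_gain))
```

On these figures squid served roughly 1.9, 7.4 and 6.4 times as many transactions per second at 10, 40 and 70 users, with response times about 3.6, 14.1 and 7.5 times shorter.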
