May 18, 2011
Usu

Simulate real user load on a web server using Siege and Sproxy

We all know ApacheBench, which is a really great tool for “brute-force” benchmarking, but what if we need to simulate n real users browsing a website in a plausible way?

Well, Siege and Sproxy serve this exact purpose!
Sproxy is a proxy that we can use to collect a list of urls to feed to Siege. Let’s see how.

First, we need to install the required software.

Make sure you have make, g++, perl-URI and perl-libwww-perl installed (or liburi-perl and libwww-perl on Debian-based distros).

1) Sproxy:

wget ftp://ftp.joedog.org/pub/sproxy/sproxy-latest.tar.gz
tar xzvf sproxy-latest.tar.gz
cd sproxy-*/
./configure
make
make install

2) Siege:

wget ftp://ftp.joedog.org/pub/siege/siege-latest.tar.gz
tar xzvf siege-latest.tar.gz
cd siege-*/
./configure
make
make install

We now need to collect some urls using Sproxy.

Launch sproxy and take note of the port it is bound to; we will use it to connect to sproxy as a standard proxy. Here comes the cool part: every url we visit through sproxy is written to a file in a format Siege understands.

The url list is written to ~/urls.txt; you can override this with the -o flag, for example:

sproxy -o /home/user/benchmark/urls.txt

Collecting urls manually using a browser (e.g. Firefox) can be a long process, especially for a forum or for websites with a lot of content. Wouldn’t it be cool to let an automatic tool do the hard work for us? Of course, so wget comes to the rescue!
Yes, you read that right: wget is a really powerful program, even if you’ve only ever used it to download files. Here’s an example of how we can crawl a website through sproxy using wget:

wget -r -o verbose.txt -l 0 -t 1 --spider -w 1 -e robots=on -e "http_proxy = http://127.0.0.1:9001" "http://www.example.com"

-w: the number of seconds to wait between requests, to avoid overloading the webserver.
-l: the maximum number of levels to descend; 0 means infinite.

For other options such as excluding some directories/file extensions please refer to “man wget”.
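
As a rough sketch of such exclusions, the script below assembles the same crawl command with two extra wget options: -X (exclude directories) and -R (reject file name suffixes). The directory names and the extension list are made up for illustration, and the proxy port is assumed to be the one used in the example above. The script only prints the command so you can review it first; it may still be worth filtering the resulting url list afterwards.

```shell
#!/bin/sh
proxy="http://127.0.0.1:9001"        # port taken from the example above
site="http://www.example.com"
exclude_dirs="/admin,/downloads"     # hypothetical directories to skip
reject_ext="jpg,png,gif,css,js"      # hypothetical suffixes to reject

# Build the argument list: same flags as above, plus -X and -R.
set -- wget -r -l 0 -t 1 --spider -w 1 -o verbose.txt \
    -X "$exclude_dirs" -R "$reject_ext" \
    -e robots=on -e "http_proxy = $proxy" "$site"

# Print the command for review (quoting is not shown in the printed form).
cmd="$*"
echo "$cmd"

# "$@"   # uncomment to actually start the crawl
```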

After some minutes or hours of crawling we should have a lot of urls in our sproxy file. Let’s filter out all the duplicates:

sort -u -o uniq_urls.txt urls.txt
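
To see what this step does, here is a small self-contained sketch. It builds a tiny sample list (the urls and file names are invented for illustration), removes the duplicates with the same sort command, and then, as an optional extra that is not part of the procedure above, drops common static files so the test concentrates on pages.

```shell
#!/bin/sh
# Sample data standing in for the file written by sproxy (hypothetical urls).
workdir=$(mktemp -d)
cat > "$workdir/sample_urls.txt" <<'EOF'
http://www.example.com/
http://www.example.com/forum/
http://www.example.com/style.css
http://www.example.com/forum/
http://www.example.com/logo.png
http://www.example.com/
EOF

# Remove duplicates, using the same sort invocation as above.
sort -u -o "$workdir/sample_uniq.txt" "$workdir/sample_urls.txt"

# Optional: keep only pages by dropping common static file extensions.
grep -v -i -E '\.(css|js|png|jpg|gif|ico)$' "$workdir/sample_uniq.txt" \
    > "$workdir/sample_pages.txt"

# Count the lines in each file (tr strips the padding some wc versions add).
total=$(wc -l < "$workdir/sample_urls.txt" | tr -d ' ')
unique=$(wc -l < "$workdir/sample_uniq.txt" | tr -d ' ')
pages=$(wc -l < "$workdir/sample_pages.txt" | tr -d ' ')
echo "collected=$total unique=$unique pages=$pages"
```

Whether to keep images and stylesheets in the list depends on what you want to measure: a real browser requests them too, so leaving them in gives a heavier but arguably more realistic load.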

Finally, we can launch the actual siege test, which will simulate real user visits across the urls we previously collected:

siege -v -c 50 -i -t 3M -f uniq_urls.txt -d 10

-v prints more information about what siege is doing during the test.
-c specifies the number of concurrent simulated users.
-i tells siege to hit the urls in random order.
-t specifies how long siege runs; you can use S, M or H as the unit (seconds, minutes or hours).
-f specifies the file containing the urls to hit.
-d specifies the maximum delay, in seconds, between a simulated user’s requests.

The last one may be a little tricky to understand, so I’ll try to explain it better: in order to simulate real user load, siege randomizes the time each user waits between page requests, and -d sets the maximum number of seconds between a user’s “clicks”. 30 seconds is a good value if you think of how users behave on a website, but of course you can set whatever you think is appropriate.
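
To get a feel for the load that a given -c and -d produce, here is a back-of-the-envelope sketch. It assumes each user pauses for a uniformly random time between 0 and the -d value, so the average pause is half of it, and that the server answers in about 0.5 seconds on average. Both are assumptions for illustration, not measurements.

```shell
#!/bin/sh
users=50        # value passed to -c
max_delay=10    # value passed to -d, in seconds
resp_time=0.5   # assumed average response time in seconds (hypothetical)

# Each user completes one request every (average pause + response time)
# seconds, so the total rate is users divided by that cycle time.
rate=$(LC_ALL=C awk -v c="$users" -v d="$max_delay" -v r="$resp_time" \
    'BEGIN { printf "%.1f", c / (d / 2 + r) }')
echo "approximate requests per second: $rate"
```

With the values from the siege command above this gives roughly 9 requests per second; raising -d to 30 with the same number of users would bring it down to about 3.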

That’s all. Feel free to drop a comment if you are having trouble, if there’s something wrong with the article, or if you just want to say thanks.
