What is proxy traffic?

When dealing with website traffic, sooner or later you will run into the concept of proxy traffic.
Proxy traffic is, as the name suggests, web traffic that has been relayed through one or more proxy servers.

Why would a surfer use a proxy server in the first place?
There are many reasons for using a proxy server while browsing. Some are perfectly good reasons, while others are more devious. Below we’ll go over some of them.

Initially, caching proxy servers were a great way to optimize the web experience for surfers on slow dial-up and similar connections.
With a caching proxy, frequently visited websites would have most of their resources, such as graphics, JavaScript and stylesheets, cached. The proxy server would return the cached copy instead of fetching it from the webserver over the slow internet connection.
If the resource was not cached, or the webserver requested that the content should not be cached, the proxy server would fetch the content on behalf of the surfer and return it. If caching was allowed, it would also keep a copy in case someone else wanted the same resource.
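The cache-or-fetch behaviour described above can be sketched in a few lines. This is only an illustration and not how any particular proxy product works: the CachingProxy class and the fetch_from_origin callback are hypothetical names, and the origin is assumed to report whether a response may be cached.

```python
from typing import Callable, Dict, Tuple

# Hypothetical origin fetcher: takes a URL and returns (body, cacheable).
Fetcher = Callable[[str], Tuple[bytes, bool]]


class CachingProxy:
    """Minimal in-memory sketch of a caching proxy (no expiry, no size limit)."""

    def __init__(self, fetch_from_origin: Fetcher) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Cache hit: answer without contacting the webserver at all.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: fetch on behalf of the surfer.
        body, cacheable = self._fetch(url)
        # Only keep a copy if the webserver allows caching.
        if cacheable:
            self._cache[url] = body
        return body
```

A real caching proxy would also honour expiry times and revalidation, which this sketch leaves out.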

With faster connections this is less common, but organizations and offices might still force their users to browse via a proxy server in order to monitor web activity or block certain websites, such as social networks, that are not considered relevant for the users.

In more recent times we have seen a new use for proxy servers, where companies want to optimize the mobile experience. Opera Mini implemented a proxy to help make websites more mobile friendly and to reduce the amount of data that needs to be passed to the device over slower mobile connections.

Google has implemented a similar feature in Chrome for Android called “Data Saver”.
With this feature enabled, all your (non-SSL) traffic goes through Google’s servers, which optimize the data before your browser receives it.

Why do I need to think about proxy traffic?
As the proxy server requests the webpage on behalf of the surfer, the webserver receives the request from the proxy server and not from the surfer. In the raw access logs, the IP logged is therefore the IP of the proxy server.

If you use a system on your website to detect whether credentials are shared, you might end up flagging proxy users, since multiple users can use the same proxy and therefore have the same IP address.
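To make the false positive concrete, here is a small sketch of a naive check. It assumes, purely for illustration, that the detection rule flags any IP address from which more than a set number of distinct accounts have logged in. The function name ips_over_threshold and the rule itself are hypothetical, and the addresses are documentation examples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ips_over_threshold(logins: Iterable[Tuple[str, str]], max_accounts: int) -> Set[str]:
    """Return the IPs from which more than max_accounts distinct accounts logged in.

    logins is an iterable of (account, ip) pairs taken from the access logs.
    """
    accounts_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for account, ip in logins:
        accounts_by_ip[ip].add(account)
    return {ip for ip, accounts in accounts_by_ip.items() if len(accounts) > max_accounts}


# Three unrelated users behind one office proxy share the proxy's IP.
logins = [
    ("alice", "198.51.100.1"),
    ("bob", "198.51.100.1"),
    ("carol", "198.51.100.1"),
    ("dave", "203.0.113.9"),
]
flagged = ips_over_threshold(logins, max_accounts=2)
```

With these sample logins the proxy address 198.51.100.1 ends up in flagged, even though nobody shared anything.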

If you are using a GeoIP system like Maxmind.com to find the location of the user based on the IP address, you might end up with the location of the proxy server and not the actual surfer.

Regular proxy servers will add a header so the server can tell it’s a proxied request. The header field and format can vary, but the de facto standard is X-Forwarded-For:
X-Forwarded-For: client, proxy1, proxy2
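As a rough sketch of how a website could read this header, the snippet below splits the value into its hops and takes the leftmost entry as the original surfer. The function names are hypothetical, and the code assumes the header arrives as a plain comma-separated string. Keep in mind that the header is set by the sender and can be forged, so it should only be trusted when the request comes from a proxy you know.

```python
from typing import List, Optional


def parse_x_forwarded_for(header_value: str) -> List[str]:
    """Split an X-Forwarded-For value into its hops: client first, then proxies."""
    return [part.strip() for part in header_value.split(",") if part.strip()]


def original_client(header_value: Optional[str], remote_addr: str) -> str:
    """Return the leftmost X-Forwarded-For entry, or remote_addr if there is none.

    remote_addr is the IP the connection actually came from (what ends up in
    the raw access log). The header value is NOT verified in any way here.
    """
    if not header_value:
        return remote_addr
    hops = parse_x_forwarded_for(header_value)
    return hops[0] if hops else remote_addr
```

For example, original_client("203.0.113.7, 198.51.100.1", "198.51.100.2") returns "203.0.113.7", while a request without the header falls back to the address in the access log.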

So to sum up, you might think most of your traffic is from Tier 1 countries, yet find that your WebClicks stats show mostly countries such as Brazil, India and China. This is because the WebClicks software looks at where the surfer actually comes from, while your own website stats might not.

Blackhats
There are, however, special proxy servers that completely hide the fact that they are proxying requests. These proxies are known as “elite proxies”.
They are mostly used by someone who wants to scrape a site that has throttling, or to create a large number of users on a website without tying them together through a shared IP.

Don’t hesitate to get in touch with us if you are in doubt about the origin of any traffic source. We are always happy to pull the log data for you.