Over the last few weeks I’ve been looking at the discrepancies in various site statistics reported by (i) Google Analytics, and (ii) AWStats. Many forums report that Google Analytics shows 30% of the visitor numbers and page views of AWStats, and the figures on my sites are more or less in line with this.
So the question is… which is correct?
The answer, as far as I can see, is that neither web analytics approach wins. They both have their advantages and disadvantages, and both provide valuable information for anyone interested in understanding how users interact with their website. They both have a place.
On a side note, Webalizer is sometimes touted as another log analyzer solution. But it seems development on Webalizer stopped in 2002, so I’m leaving that one alone. AWStats, on the other hand, is at release 6.8, which came out in November 2007. Best of all, AWStats is either installed by default with your web hosting account, or can be installed manually (usually with a little technical help) under the free GNU General Public License.
Page Tagging
Google Analytics is typical of page tagging analytics. It requires your web designer to place a small piece of JavaScript on every page you want analyzed on your website. Additionally, you need to set up an account with Google and verify that data is being received. None of this is hard (in fact it’s ridiculously easy), but the fact remains that you need to change every page on your website for it to work.
If you use an HTML editor like Dreamweaver with templates, it’s as easy as inserting the code in your page template and uploading the updated pages. If you use WordPress or a similar CMS, you just place the code in the appropriate PHP file and you’re done.
One other thing: your reader’s browser should be configured to accept cookies. GA will still work if it isn’t, but the data received is a little less accurate.
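If your pages are plain HTML files with no template system, the same tagging step can be scripted. Here’s a minimal sketch, assuming the snippet is inserted just before the closing body tag. The function name and the snippet are placeholders of mine; the real tracking code is whatever your Google Analytics account gives you.

```python
import re

# Placeholder only: paste the real snippet from your Google Analytics account.
TRACKING_SNIPPET = (
    '<script type="text/javascript" src="/js/tracking-placeholder.js"></script>'
)


def add_tracking_snippet(html, snippet=TRACKING_SNIPPET):
    """Return html with snippet inserted just before the last </body> tag."""
    if snippet in html:
        return html  # already tagged; a second copy would double-count views
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html  # no closing body tag found; leave the page untouched
    pos = matches[-1].start()
    return html[:pos] + snippet + "\n" + html[pos:]


page = "<html><body><p>Hello</p></body></html>"
print(add_tracking_snippet(page))
```

Running it twice over the same page is harmless, because a page that already contains the snippet is returned unchanged.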
Logfile Analyzers
AWStats is a typical log analyzer. It’s a standalone application that looks at your web server logfiles to process, collate and aggregate the data they hold. Web servers generate logfiles, and they are processed at regular intervals (12 midnight, usually) so that data is available to AWStats users on demand.
No changes to your web pages are needed for logfile analyzers to work, and cookies are not required.
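To give a feel for what “process, collate and aggregate” means, here’s a rough sketch of the first step any log analyzer performs: parsing each log line and counting page views. It assumes the Apache “combined” log format and a simple rule of my own for what counts as a page (a path ending in .html, .php or /). It is not how AWStats itself is written, and the sample lines are made up.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; your server may log differently).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def page_views(lines, page_suffixes=(".html", ".php", "/")):
    """Count successful requests for pages, ignoring images and other files."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that aren't in the expected format
        path = entry["path"].split("?", 1)[0]  # drop any query string
        if entry["status"] == "200" and path.endswith(page_suffixes):
            counts[path] += 1
    return counts


sample = [
    '203.0.113.5 - - [15/Apr/2008:10:00:00 +1000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [15/Apr/2008:10:00:01 +1000] "GET /images/logo.png HTTP/1.1" '
    '200 2048 "http://example.com/index.html" "Mozilla/5.0"',
    '198.51.100.7 - - [15/Apr/2008:10:05:00 +1000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]
print(page_views(sample))
```

The image request is logged by the server just like the page requests, which is why a log analyzer has to decide for itself what counts as a page.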
Advantages and Disadvantages
Google Analytics is steamrolling all other analytics offerings and, if it isn’t already, is destined to become the gorilla app in the field. However, there are a few very good reasons not to rely 100% on GA (or on any page tagging technology). Here they are:
#1 - User Count Accuracy
Because GA requires a user’s browser to run JavaScript, there’s a very small minority of users (the ones who turn JavaScript off) who won’t be seen by GA. It’s also good programming practice to put the JavaScript code right at the very end of the web page, just before the </body> tag; in fact Google recommends it. If the page doesn’t load completely for whatever reason, the visit won’t get counted.
AWStats also has accuracy problems, but for a different reason. Web server logfiles don’t intrinsically differentiate between human and non-human visitors. Non-human visitors are, for example, search engine spiders, content crawlers, email address leechers, and various worms looking for ways into your site. AWStats uses a list of known search engine spiders (403 of them, to be exact) to separate out search engine activity (it also uses hits to robots.txt, which is a good reason to have one if you don’t already), but the more nefarious crawlers go unnoticed and are counted as human visitors.
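The two signals described above (a known-spider list and hits to robots.txt) can be sketched in a few lines. This is an illustration under my own assumptions, not AWStats’s actual code: the token list is a tiny stand-in for the real one, and treating every request from a host that fetched robots.txt as robot traffic is a simplification I’ve chosen.

```python
# A tiny illustrative list; a real analyzer ships a far longer one.
KNOWN_SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")


def find_robots(entries, spider_tokens=KNOWN_SPIDER_TOKENS):
    """Return the set of hosts that look like robots.

    entries: iterable of dicts with 'host', 'path' and 'agent' keys.
    A host is flagged if its user agent contains a known spider token,
    or if it ever requested /robots.txt.
    """
    robots = set()
    for entry in entries:
        agent = entry["agent"].lower()
        known_spider = any(token in agent for token in spider_tokens)
        if known_spider or entry["path"] == "/robots.txt":
            robots.add(entry["host"])
    return robots


def human_entries(entries):
    """Return the entries left over once flagged robot hosts are removed."""
    entries = list(entries)
    robots = find_robots(entries)
    return [entry for entry in entries if entry["host"] not in robots]
```

A crawler that sends a browser-like user agent and never touches robots.txt passes straight through both checks, which is exactly how the nefarious ones end up in the human totals.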
The real figure is going to lie somewhere between those reported by GA and AWStats. Sorry… that’s as good as it gets for free.
While I’m on this topic, there’s a discussion that will be covered in a later post about measuring ‘unique’ visitors. In this, GA wins hands down. When I write this post I’ll place a link here.
#2 - Page Count Accuracy
If you’ve tagged every single page on your website, the figure reported by GA is going to be reasonably accurate. Once again, this is subject to the limits of having your tracking code placed at the end of every page. The big benefit of GA is that your web designer gets to decide what exactly a page is. There’s no confusion about ‘hits’ (a discredited and useless measure that still gets referred to as if it means something), and a measured page view means that the page has fully loaded and is displayed in the user’s browser.
AWStats is also pretty good; in fact, it gives a little more information. Web server logs can differentiate between fully served pages (status code 200) and partially served ones (status code 206, Partial Content), and AWStats processes these as you’d expect. The extra information you get is which pages didn’t load fully. This is good information that flags problems with your website, or pages that load too slowly for humans. Either way it’s something easily identified that can be fixed, and fixing it will improve your reader’s web experience.
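Here’s a small sketch of that tally, following the reading used above (200 counts as a full load, 206 as a partial one). The function names and the input shape, a list of (path, status) pairs already pulled out of the log, are my own assumptions.

```python
from collections import Counter, defaultdict


def load_summary(requests):
    """Tally full (200) and partial (206) responses per page.

    requests: iterable of (path, status) pairs, with status as an int.
    Other status codes are ignored.
    """
    summary = defaultdict(Counter)
    for path, status in requests:
        if status == 200:
            summary[path]["full"] += 1
        elif status == 206:
            summary[path]["partial"] += 1
    return summary


def pages_with_partial_loads(summary):
    """Return (path, partial_count) pairs, worst offenders first."""
    flagged = [
        (path, counts["partial"])
        for path, counts in summary.items()
        if counts["partial"] > 0
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Pages near the top of that list are the ones worth checking first for size or speed problems.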
Both GA and AWStats suffer from a perceived problem caused by web page caching. This is talked about in forums but I don’t think it’s a major issue. HTTP cache-control directives (the Cache-Control and Expires response headers) are pretty well supported and understood these days, and correct web server configuration bypasses the issue.
#3 - Load Speed
For GA to record a visit, it needs the JavaScript to execute and data needs to be processed on an external site (Google’s servers).
This takes time. If Google’s servers are slow, it can take a lot of time. I personally haven’t experienced this, but there are reports on the web about slowdowns attributable to Google’s remote code execution.
On the other hand, AWStats does its thing offline, once a day. The web server writes logfiles as part of its normal operations so they don’t cause any speed issue at all.
I really think this is a minor point, but it’s worth mentioning.
#4 - Search Engine Activity
Search engine spiders don’t currently execute JavaScript, so visits by search engines are not recorded by GA. If you want to know which spiders have visited, and when and how often, the only way to get this information is via a log analyzer like AWStats.
This is, to me, the biggest reason to keep using AWStats. This is crucial information, and 100% relevant to anyone who’s doing any form of search engine optimization.
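As a rough idea of how the “which spiders, when, and how often” question gets answered from a logfile, here’s a sketch that counts visits per spider per day. It assumes log entries have already been parsed into dicts with the Apache-style timestamp and the user agent; the spider names and tokens are a small illustrative set of mine, not the list AWStats uses.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative user-agent tokens mapped to display names (an assumption).
SPIDER_TOKENS = {
    "googlebot": "Googlebot",
    "slurp": "Yahoo! Slurp",
    "msnbot": "MSNBot",
}


def spider_activity(entries, spider_tokens=SPIDER_TOKENS):
    """Return {spider name: {date: hit count}} for recognised spiders.

    entries: iterable of dicts with 'time' (e.g. '15/Apr/2008:10:00:00 +1000')
    and 'agent' keys.
    """
    activity = defaultdict(lambda: defaultdict(int))
    for entry in entries:
        agent = entry["agent"].lower()
        for token, name in spider_tokens.items():
            if token in agent:
                stamp = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
                activity[name][stamp.date()] += 1
                break  # count each request against one spider only
    return activity
```

A sudden drop in a spider’s daily count is often the first visible sign that something on the site is blocking it.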
#5 - Bandwidth Theft
AWStats provides statistics on both page and non-page elements served by the web server. This makes it simple to isolate images or files that are linked to by other websites and served to users whenever those sites’ pages are loaded. Essentially, they steal your bandwidth.
When this happens, you simply block the offending website. Problem solved.
GA doesn’t allow you to identify when this happens (you operate in blissful ignorance).
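For the curious, this is roughly how hotlinking shows up in a logfile: image requests whose referrer is somebody else’s site. The sketch below is my own illustration, assuming entries already parsed into dicts with the request path and referrer, and a hypothetical set of hostnames that count as “yours”.

```python
from collections import Counter
from urllib.parse import urlparse

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


def hotlinkers(entries, own_hosts):
    """Count image requests referred by hosts other than your own.

    entries: iterable of dicts with 'path' and 'referrer' keys.
    own_hosts: set of hostnames that belong to your site.
    """
    counts = Counter()
    for entry in entries:
        path = entry["path"].split("?", 1)[0].lower()
        if not path.endswith(IMAGE_SUFFIXES):
            continue  # only image files are of interest here
        referrer_host = urlparse(entry["referrer"]).hostname
        # A missing referrer ("-" or empty) parses to None and is skipped.
        if referrer_host and referrer_host not in own_hosts:
            counts[referrer_host] += 1
    return counts
```

The hosts with the highest counts are the candidates for blocking.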
Conclusion
AWStats offers capabilities that Google Analytics, by its nature, can’t match. On the other hand, GA has heaps of features (that I haven’t covered here) that virtually demand its use for any website operator that cares about improving their readers’ web experience.
I think there’s a strong case to use them both. They are both free to use, and together they give you a very complete and reasonably accurate snapshot of how your website meets the needs of your readers.

2 Vulpine Mobile // Apr 17, 2008 at 1:22 pm
Oranges are not the only fruit. That said, people seem to neglect the growing community of mobile phones. Even though most modern phones now have JavaScript (ECMAScript), it’s probably not nice to use up someone’s paid bandwidth just for the sake of bean counting. Thus, AWStats wins hands down compared to GA in this area.
3 Mark // Apr 17, 2008 at 1:29 pm
Vulpine, agreed. And GA does introduce a small but noticeable delay on page load completion, which most people don’t pick up on.
I’ve only had one customer that cared about web sites on mobile phones… but that doesn’t detract from your argument. Since mobile phone bandwidth is the most expensive internet access on the planet, regardless of supplier, it does indeed make a whole lot of sense to keep the beancounting off the client.