
The AWStats Dashboard: Period and Summary

April 21st, 2008

This post is the fifth in a series discussing the effective use of AWStats alongside Google Analytics. If you like what you read, consider subscribing to my full feed RSS.

Let’s dive straight into AWStats. I’m going to assume you have AWStats operational on your web hosting account, and that you know how to access the AWStats dashboard. If not, check out my earlier post which should have all the details you need.

Here’s what the dashboard looks like:

First thing you notice is that the dashboard is divided up into two columns. On the left side, you have an independently-scrollable menu that gives you access to every report AWStats can produce. To be honest I never use this – everything is also available from the right side, where the actual data is, and because there’s context to the links you have a better idea of what they actually mean.

By the way, the website these stats are drawn from is shown in the top left corner – for the purposes of this exercise I have blanked it out (client site, and all that).

On the right side you find all the interesting stuff, and it’s organised into grouped windows of data.

The First Window – What Data Am I Looking At?

At the very top, you see the Last Update: field. This, as you’d expect, tells you how current the data is that’s being displayed. I took this screenshot at 1.59PM on 15 April, 2008. My local time is AEST, which is UTC + 10. I know my server (it’s located in Canada) runs at UTC - 7. So the data I’m looking at is just under 24 hours old (23 hours 53 minutes, to be exact). If I had waited an hour, it’s likely I would have had another day of data to look at.

How do I calculate the age? First I figure out the time difference between now, local time, and when the data was updated. This works out to be 40 hours and 53 minutes. Then I subtract the 17 hour time difference, and that leaves me with the real value of 23 hours, 53 minutes.
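If you’d rather let the computer do the subtraction, here is a minimal Python sketch of the same sum. One assumption to flag: the post doesn’t quote the Last Update value itself, so the 21:06 on 13 April figure below is worked backwards from the 40 hour 53 minute gap and is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the post: AEST is UTC+10, the server runs at UTC-7.
AEST = timezone(timedelta(hours=10))
SERVER_TZ = timezone(timedelta(hours=-7))

# When the screenshot was taken, in local time.
now_local = datetime(2008, 4, 15, 13, 59, tzinfo=AEST)

# ASSUMPTION: not quoted in the post; inferred from the 40 h 53 min clock gap.
last_update = datetime(2008, 4, 13, 21, 6, tzinfo=SERVER_TZ)

# Step 1: the naive gap between the two clock readings, ignoring time zones.
naive_gap = now_local.replace(tzinfo=None) - last_update.replace(tzinfo=None)

# Step 2: the real age. Subtracting timezone-aware values applies the offsets.
real_age = now_local - last_update


def hours_minutes(delta):
    """Split a timedelta into whole hours and leftover minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


print(hours_minutes(naive_gap))  # clock-reading gap
print(hours_minutes(real_age))   # true age of the data
```

The difference between the two results is the 17 hour offset between the server clock and my local clock.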

How do I know what time my server is running on? I could ask my web host manager, but it’s just as easy to have a look at my raw server logs. For details on this have a look at my earlier post about log files; the information you need is sitting right there, in the Date/Time field.

Although it’s not shown on the screen grab above, on some installations you will see the text Update now. It’s hyperlinked and when you click it, it forces AWStats to go grab the latest log files, churn through them, and update all the stats you see. Not every implementation has this ability (mine doesn’t), because it’s a configurable item and my web host has decided not to offer this. But you might be lucky.

Does it matter? Nah, not much. If it doesn’t impact your actions, it doesn’t matter. This is a philosophical point I blogged about yesterday. But if your site gets lots of traffic, and your boss needs current stats like five minutes ago, it’s a good thing to have.

Next you see the Reported period: field. It always starts off showing the current reporting period (always a month/ year) but you can change that to any period for which log file data exists. You can go back years, if you want, and this can be incredibly valuable. AWStats does pre-process raw log file data into its own (text) format, so it’s very quick to change periods. It’s not like AWStats has to analyse files every time. This is a great feature, but you can only access data in month-size snapshots.

Finally, to the right is the AWStats logo which, like all good logos, will take you to the AWStats product home page. Immediately below is a little row of flags – select one of these if English is not to your liking – my choices are French, German, Italian, Dutch and Spanish.

The Second Window – Summary Data

Next window down gives us the first of many views into actual web site activity. Reported period:, First visit: and Last visit: are self explanatory, although if you saw the reported period as April 2008, and the first visit as (say) April 6, it’s a pretty good indication that traffic levels are not exactly ‘satisfactory’. Don’t laugh, it happens.

The data in the following table is organised into two rows:

  • Traffic viewed
  • Traffic not viewed

… and five columns:

  • Unique visitors
  • Number of visits
  • Pages
  • Hits
  • Bandwidth

Traffic Viewed vs Traffic Not Viewed

Traffic viewed is AWStats’ best guess as to the level of eyeball traffic to your web site. One of the data items that’s included in log files is HTTP User Agent, which (among other things) tells us the browser that’s being used to access your web site. If this is one of the recognised browsers, it will get recorded as traffic viewed.

Traffic not viewed is AWStats’ best guess as to the level of non-eyeball traffic. Non-eyeball traffic is mostly generated by search engines, but it could be nasty stuff like viruses, worms, link and email harvesters, and so on. It could also be someone linking to images on your web site, effectively stealing your bandwidth for their own purposes (more on this in a later post).

To make it on this list, first AWStats looks at the HTTP User Agent value and compares it against values for known search engines (there are over 300 of these). Next it looks for a request from each IP address for the robots.txt file, which is a special file located in the root directory of your web site containing search engine directives. If it wants robots.txt, there’s a very good chance it’s a search engine making the request.
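To make the two checks concrete, here is a rough Python sketch of the idea. It is not AWStats’ actual code: the three signatures are a tiny illustrative stand-in for its much longer list, and the function names and the (ip, path, user_agent) tuple layout are my own assumptions.

```python
# HYPOTHETICAL signature list for illustration; AWStats ships its own, far longer one.
ROBOT_SIGNATURES = ("googlebot", "slurp", "msnbot")


def find_robot_ips(requests):
    """Return the IPs that look like robots.

    requests is a list of (ip, path, user_agent) tuples.
    """
    robot_ips = set()
    for ip, path, user_agent in requests:
        ua = user_agent.lower()
        # Check 1: the user agent matches a known search engine signature.
        if any(signature in ua for signature in ROBOT_SIGNATURES):
            robot_ips.add(ip)
        # Check 2: the IP asked for robots.txt, which people rarely do.
        elif path == "/robots.txt":
            robot_ips.add(ip)
    return robot_ips


def split_traffic(requests):
    """Split requests into (viewed, not_viewed) using the two checks above."""
    requests = list(requests)
    robot_ips = find_robot_ips(requests)
    viewed = [r for r in requests if r[0] not in robot_ips]
    not_viewed = [r for r in requests if r[0] in robot_ips]
    return viewed, not_viewed


sample = [
    ("10.0.0.1", "/", "Mozilla/5.0 (Windows; U) Firefox/2.0"),
    ("10.0.0.2", "/", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("10.0.0.3", "/robots.txt", "SomeUnknownAgent/1.0"),
    ("10.0.0.3", "/about.html", "SomeUnknownAgent/1.0"),
]
viewed, not_viewed = split_traffic(sample)
print(len(viewed), len(not_viewed))
```

Note that once an IP address asks for robots.txt, this sketch moves every request from that address into the not viewed pile, including requests for ordinary pages.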

Back to traffic viewed. It really is a best guess figure, for a few reasons. First, the only data that AWStats has to go on is log file entries. It could be an individual, tapping on their keyboard, that’s looking at your web site. That is a valid visitor and you want it reported as such. But it could be the visitor has one or more proxy servers between him/her and your web server (in fact it’s a certainty) and unless the proxy server is set up correctly, you may not ever see that request at your web server and in your stats.

You see, proxies can operate in ‘transparent’ mode, as they should, and the right information is passed through to your web server and recorded correctly, even though your server didn’t have to actually serve the information (it saves you bandwidth). But many proxies operate at levels of ‘anonymity’, and this information is shielded from all upstream servers. In this case you will get erroneous data and there’s nothing you can do about it. The errors creep in because anonymous proxy networks exist that don’t even complete a single page from the one IP address. Some parts of a page are requested by one IP address, others from a second IP address, and so on. You may get inflated visitor counts, or they may be deflated (ie, never counted in the first place).

Stuff like this makes it really tough to get accuracy, but does it really matter? If you’re getting 300 visitors a day, are you going to change anything when one day is up or down by 10%? Didn’t think so. Analytics is good for aggregated data, and minor fluctuations are to be expected.

Anonymous proxy networks are set up specifically to hide end user identities, and to ensure their Internet activity can’t be intercepted somewhere along the track and usage tracked back to them. There are surely plenty of good legitimate reasons to want this level of anonymity, but I can’t think of any right now. But if you’re a privacy freak, anonymous access is available for not much money each month.

To cap this point off, traffic viewed vs traffic not viewed are estimates, not precise measurements. But they are close enough for our purposes and when viewed alongside results from Google Analytics (which uses another data collection method entirely), we can put some tight bounds on the real figure.

Unique Visitors vs Visits

Again, these figures are estimates. Proxies get in the way, as described above. But we also need to factor in the effect of firewalled networks, where tens or even hundreds of users can share a single IP address through the magic (it’s in every home network router) called network address translation (NAT). Basically, the router keeps track of whose browser made what request, and sorts incoming and outgoing traffic correctly. But from outside the network, all that’s visible is that single IP address. The IP address is all that AWStats has to go on to count individual visitors, so in this case traffic will be understated.

There’s another problem caused by home networks, since many operate with a dynamic IP address. That is, the same external IP address is not kept constant for any length of time. So a returning visitor may have one IP address now, and another the next time they log in. Since AWStats looks at constancy of IP addresses to figure out who is a new visitor and who is a returning visitor (and from this, the number of unique visitors) errors are inevitable.

A unique individual can log in from different locations – home, school, work, and so on. Although they are the same person and you want to record their visits as such, it’s not possible (with any technology) to correctly grab this data.

Finally, it’s standard throughout the analytics world that the gap between requests within a single visit should not exceed 30 minutes (it’s configurable in AWStats, but this is the figure most people use). What this means is that if someone accesses your web site, then wanders off to make a cup of coffee and doesn’t come back to your web site for 45 minutes, AWStats will count their access as two separate visits. That’s the way it is!
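Here’s a minimal Python sketch of that timeout rule, counting visitors by IP address alone as described above. It is an illustration rather than AWStats’ real implementation; the function name and the (ip, timestamp) input format are my own assumptions, and the 30 minute default simply follows the figure used in this post.

```python
from datetime import datetime, timedelta


def count_visits(requests, timeout=timedelta(minutes=30)):
    """Return (unique_visitors, visits) for a list of (ip, timestamp) pairs.

    A new visit starts whenever an IP has been silent for longer than timeout.
    """
    last_seen = {}  # ip -> time of that IP's most recent request
    visits = 0
    for ip, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return len(last_seen), visits


# The coffee break example: one person, back after a 45 minute gap.
coffee_break = [
    ("203.0.113.5", datetime(2008, 4, 15, 9, 0)),
    ("203.0.113.5", datetime(2008, 4, 15, 9, 10)),
    ("203.0.113.5", datetime(2008, 4, 15, 9, 55)),
]
print(count_visits(coffee_break))  # (1, 2): one unique visitor, two visits
```

Because the only key is the IP address, the same sketch also shows the earlier problems: two people behind one NAT router collapse into one visitor, and one person on a dynamic IP address shows up as several.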

Nonetheless, visitor data is, for me, the most important data you can get from analytics and there’s surprising consistency between analytics packages like AWStats and Google Analytics. It’s true that AWStats visitor numbers are usually a bit higher than Google’s, but they fall within a predictable band that’s about 20% wide. My best estimate is that the true figure lies about halfway between them.

The metric (visits/visitor) is a really important one that should be monitored regularly. It is a great indicator of the value your web site gives to visitors (subject to the limitations described above). If they like your web site, they will return for more – it’s that simple. To be honest, Google’s cookie-based approach provides a more accurate figure but for a quick indicator it’s a good one.

Pages

This is an accurate figure, but it’s not exactly what you think it is. In AWStats, pages is actually the number of page requests made of your web server. A page can be a few different things, and it may not actually get viewed by the end user (only the end user knows that, and they have no way of telling AWStats).

To AWStats, a page means:

  • A static HTML or XML page
  • A dynamic PHP or CGI page
  • Certain executable file types, like .com

AWStats excludes these from its page count:

  • Cascading style sheets
  • Images

Executable files are included in the page count, because they may generate a displayable page. It’s an assumption the designers of AWStats made. Typically the count for these types of pages is very small (or non-existent) and can be ignored, but if your web site uses files like this it’s good to have them counted.
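As a rough illustration of the page versus non-page split, here is a small Python sketch. The extension list is an assumption based only on the exclusions named above (style sheets and a few common image types); it is not AWStats’ actual configuration, and the function names are mine.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# ASSUMPTION: illustrative exclusions only (style sheets and common image types).
NOT_PAGE_EXTENSIONS = {".css", ".gif", ".jpg", ".jpeg", ".png"}


def is_page(url):
    """Treat a request as a page unless its file extension is excluded."""
    path = urlsplit(url).path  # drop any query string first
    return PurePosixPath(path).suffix.lower() not in NOT_PAGE_EXTENSIONS


def count_pages_and_hits(urls):
    """Every request is a hit; only some of them are pages."""
    urls = list(urls)
    pages = sum(1 for url in urls if is_page(url))
    return pages, len(urls)


requested = ["/", "/index.php?id=3", "/style.css", "/img/logo.PNG", "/about.html"]
print(count_pages_and_hits(requested))  # (3, 5): three pages out of five hits
```

Anything not on the exclusion list counts as a page here, which is why an unusual file type like .com would end up in the page count too.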

The (pages/visit) metric immediately under Pages is really useful. If you’ve gone to the trouble of attracting a visitor to your web site you want them to explore not just the page they landed on, but other areas of your site as well. This is a key metric of site stickiness and it’s something you should look at often. If the figure goes up or down markedly after a marketing campaign or a code change, you should figure out why and do more (or less) of it.

Hits

This is one of those stupid measurements that have no meaning in real life, and no business value to impart. A hit is a single request on your web server. First the page itself is requested, and then the visitor’s browser sends out a stream of requests for all the elements that make up the page. Cascading style sheets, JavaScript files, images, externally referenced files, etc, etc. On the screenshot above you’ll see there’s around 35 hits per visit, implying around 15 hits per displayed page.

So what? Am I going to cut out images to get the figure down, put more in to get it higher? The business, the business, it’s all about the business. Not how many objects I can squeeze on a single web page.

Here’s the Stratify theory on hits. Some time, early in the Internet’s history, some insecure web developer wanted to find the biggest number he could, so it looked like he was doing something worthwhile. The biggest number he could find was ‘hits’ and dammit, it sounded really punchy, really tough. Real manager-speak. So down at the Golf Club the manager could say to his envious buddies ‘My new web site took 10,000 hits today, what did yours do?’… and the hit culture was born. Totally brain-dead.

The bottom line on hits. Ignore it, unless you need to impress your clueless boss.

Bandwidth

This is accurate and has some usefulness. Bandwidth as such is much less of a problem than it used to be from a web server perspective, as most hosting plans offer hundreds of gigabytes transfer every month for only a few dollars. But as a web manager it’s important to realise that not everyone has broadband access. In fact, around the world the majority of access is still via dial-up modem, with all the limitations of dropouts, slow speed, and frustration.

If your web site is accessed by people with dial-up access you need to think real hard about how much bandwidth per page your web site serves. At dial-up speeds, your crafted Flash, multimedia audio/ video and hi-res banners will not be welcome.
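To put a number on it, here is a back-of-envelope Python sketch. The 56 kbit/s figure is the nominal top speed of a dial-up modem, the 500 KB page weight is a made-up example, and the function name is mine; real connections are slower, so treat the result as a best case.

```python
def download_seconds(page_kilobytes, link_kbps=56):
    """Best-case transfer time; ignores latency and protocol overhead."""
    bits = page_kilobytes * 1024 * 8
    return bits / (link_kbps * 1000)


# HYPOTHETICAL page weight of 500 KB over a 56 kbit/s modem.
print(round(download_seconds(500), 1))  # 73.1 seconds
```

More than a minute for one page, before a single dropout. Divide the Bandwidth figure by Pages to get a rough average weight for your own site, then run it through the same sum.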

The Next Post

So there we have it – 2,539 words and only the first two windows covered. The remaining ones in AWStats will need much less explanation, and some can be covered in a line or two. I’ll continue this series over the next few days.


Tags: Analytics
