
So What Is A Log File, Anyway?

April 15th, 2008 · 2 Comments

This post is the second in a series discussing the effective use of AWStats alongside Google Analytics. If you like what you read, consider subscribing to my full feed RSS.

Evan, the first commenter on my previous post, gave me a virtual kick in the butt for getting too technical. So Evan, I hope this post helps you understand a little better what a log file is, and what it can be used for.

First, Some Definitions

According to Wikipedia, a web server log file (I’ve left the links intact):

‘… maintains a history of page requests. The W3C maintains a standard format[1] for web server log files, but other proprietary formats exist. More recent entries are typically appended to the end of the file. Information about the request, including client IP address, request date/time, page requested, HTTP code, bytes served, user agent, and referer are typically added. These data can be combined into a single file, or separated into distinct logs, such as an access log, error log, or referer log. However, server logs typically do not collect user-specific information.’

What does that all mean? A web server log file records data about what was requested (by the user's browser) and what was sent back (by the server). It is a plain text file, which means it can be opened and read in something like Microsoft Notepad. The data is organised in rows: each line of the file corresponds to one request, and each new request just adds another line to the bottom of the file. Within a line, each piece of data is separated by a space, and text data that may itself contain spaces is surrounded by quotes.
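To make the "separated by a space, text in quotes" rule concrete, here is a minimal Python sketch that splits one line into its pieces. The sample line, its IP address and its path are made up for illustration. It also assumes the date and time sit inside square brackets, as they do in the common Apache-style layout, so that the space inside the timestamp does not split it in two.

```python
import re

# A piece is a [bracketed] timestamp, a "quoted" text value,
# or a plain run of characters with no spaces in it.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

def split_log_line(line):
    """Split one log line into its pieces, keeping quoted text together."""
    return TOKEN.findall(line)

# Hypothetical sample line (made-up address and path).
sample = ('192.0.2.1 - - [15/Apr/2008:00:55:52 -0700] '
          '"GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (Windows)"')

pieces = split_log_line(sample)
for number, piece in enumerate(pieces, start=1):
    print(number, piece)
```

Splitting on spaces alone would break the quoted request and the user agent into several fragments, which is why the quotes matter.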

Wikipedia goes on to say:

‘A statistical analysis of the server log may be used to examine traffic patterns by time of day, day of week, referrer, or user agent. Efficient web site administration, adequate hosting resources and the fine tuning of sales efforts can be aided by analysis of the web server logs. Marketing departments of any organization that owns a website should be trained to understand these powerful tools.’

AWStats is a piece of software that does exactly this – statistical analysis of web server log files. Log files can grow by thousands of lines every day, so it’s simply not practical to open one in (say) Notepad and expect to get any usable information from it in its raw form.

What Does A Log File Look Like?

I went into my own log file a few minutes ago, and here are the first two entries for Evan’s access of the previous post (I’ve changed his IP address):

199.199.99.99 - - [15/Apr/2008:00:55:52 -0700] "GET /analytics/google-analytics-awstats-work-really-well-together/ HTTP/1.1" 200 25762 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"

199.199.99.99 - - [15/Apr/2008:00:55:52 -0700] "GET /wp-content/themes/Cutline11/style.css HTTP/1.1" 200 12021 "http://stratify.com.au/analytics/google-analytics-awstats-work-really-well-together/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"

Could you imagine wading through 50,000 lines like these, trying to get insight into your web site? Don’t think so. That’s why apps like AWStats are so useful.
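To give a feel for what "analysis" means here, the following Python sketch counts how often each page was requested and how often each status code came back. It is only an illustration of the idea, not how AWStats itself is implemented, and the four sample lines (addresses, paths and sizes) are invented.

```python
import re
from collections import Counter

# Picks the requested path and the three-digit status code out of a line.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) ')

def summarise(lines):
    """Count requests per path and per status code."""
    pages = Counter()
    statuses = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # skip lines that do not look like a request
        path, status = match.groups()
        pages[path] += 1
        statuses[status] += 1
    return pages, statuses

# Hypothetical sample data standing in for a real log file.
sample_lines = [
    '192.0.2.1 - - [15/Apr/2008:00:55:52 -0700] "GET /a/ HTTP/1.1" 200 25762 "-" "Firefox"',
    '192.0.2.1 - - [15/Apr/2008:00:55:52 -0700] "GET /style.css HTTP/1.1" 200 12021 "http://example.com/a/" "Firefox"',
    '192.0.2.7 - - [15/Apr/2008:01:10:03 -0700] "GET /a/ HTTP/1.1" 200 25762 "-" "Firefox"',
    '192.0.2.9 - - [15/Apr/2008:01:12:40 -0700] "GET /missing HTTP/1.1" 404 512 "-" "Firefox"',
]

pages, statuses = summarise(sample_lines)
print(pages.most_common(1))
print(dict(statuses))
```

For a real file, the list could be replaced with an open file object, since `summarise` only needs something it can loop over line by line.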

What Does It Mean?

As you no doubt figured out already, the data on each line is predefined to mean certain things. So I’m going to go through the second line, one item at a time, and explain it. Remember each piece of data is separated by a space, and text data has quotes.

  1. IP Address (199.199.99.99) – This is the IP address of Evan’s PC. If Evan has a broadband router between his PC and the Internet (like most of us), this is actually the wide area network (WAN) IP address of his router, not the local area network (LAN) IP address.
  2. Ident (-) – This piece of data isn’t logged by my web server, so its ‘no value’ is shown in the log file as a simple dash. In fact, this is a throwback to the early days of the Internet and I’ve never seen it used.
  3. UserID (-) – If my web site required Evan to log in to access my content, his login username would be shown here. But my web site is not protected; it’s there for you all, so again it’s a ‘no value’ dash.
  4. Date Time (15/Apr/2008:00:55:52 -0700) – This records the date and time of the request – no surprises here. It also shows that my web server’s clock is set to UTC -0700, or 7 hours behind GMT.
  5. Requested File (GET /wp-content/themes/Cutline11/style.css HTTP/1.1) – Good information here. GET is the HTTP request method; it simply means that Evan’s browser is asking the web server to send it something. Next is the actual file requested, which in this case is my cascading style sheet (CSS). The final piece tells me that the request was made using version 1.1 of the hypertext transfer protocol (HTTP).
  6. Status Field (200) – This is a three digit code that indicates the success (or otherwise) of the request. A code of 200 is good – it means the request was filled successfully, with no errors.
  7. File Size (12021) – This is simply the number of bytes sent by the web server to Evan’s browser.
  8. Referrer (http://stratify.com.au/analytics/google-analytics-awstats-work-really-well-together/) – This is the page that triggered the request. To understand this, have a look at the first log file line I’ve shown above. In that line, the Referrer field is a dash, or ‘no value’. That’s because Evan reached my post in a way that doesn’t pass this information, most likely by clicking a link in a Yahoo! Groups email I sent out (a bookmark or a typed-in address would leave the same dash). Once Evan clicks the link, my post is loaded into his browser. His browser then goes through the web page code, one line at a time, and either shows what’s there on his PC screen or scoots off to retrieve any files that are needed, naming my post as the Referrer for each of those requests. In this case the first file requested was my CSS file, which governs how my web page should render, so that’s a necessary file.
  9. User Agent (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13) – This final piece of data tells my web server about the software running on Evan’s PC. It’s a Windows XP machine, and he’s using the latest version of the Firefox browser to check my page out. This is good information to have because all browsers have quirks in the way they render a web page, and this information lets canny webmasters serve up slightly different code to different browsers (my WordPress theme, Cutline, actually does this for Internet Explorer 6 and 7).
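The nine items above can be pulled apart in code. Here is a minimal Python sketch, assuming the log uses the layout shown in this post with plain straight quotes. The field names (`ip`, `ident`, `user` and so on) are my own labels, not anything the web server defines. Note also that `%b` in the date pattern expects English month abbreviations such as `Apr`.

```python
import re
from datetime import datetime

# One named group per item in the list above; the request is
# further split into its method, path and protocol version.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Turn the text timestamp into a timezone-aware datetime.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A dash means no size was recorded; treat it as zero bytes here.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

# The second log line from this post.
line = ('199.199.99.99 - - [15/Apr/2008:00:55:52 -0700] '
        '"GET /wp-content/themes/Cutline11/style.css HTTP/1.1" 200 12021 '
        '"http://stratify.com.au/analytics/google-analytics-awstats-work-really-well-together/" '
        '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"')

entry = parse_line(line)
for name, value in entry.items():
    print(f"{name}: {value}")
```

A line that does not fit the pattern simply returns `None`, which is a deliberately cautious choice: real logs often contain the odd malformed entry.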

So Evan, that’s a short introduction to the fascinating world of web server log files. However, the only two takeaway points of any importance are:

  • Lots of data about every request gets recorded, and
  • Apps like AWStats analyse it into useful information for webmasters.

What do webmasters actually do with that information? That’s a series of posts for another day.


Tags: Analytics

2 responses so far ↓

  • 1 AmyL // Apr 21, 2008 at 12:27 pm

    Ohhh, very good! I’m learning quite a bit here.

  • 2 The AWStats Dashboard - Period and Summary | Stratify Pty Ltd // Apr 21, 2008 at 8:24 pm

    […] it’s just as easy to have a look at my raw server logs. For details on this have a look at my earlier post about log files, and that information your need is sitting right there, in the Date/ Time […]
