
Ultraseek and Documents

July 28th, 2006

This week I had a company ask me what was meant by a “single document” in Ultraseek’s index. Ultraseek is licensed by document count tiers (starting at 5,000 documents) and this particular company was worried that the license they wanted wouldn’t handle their situation.

So here’s the answer:

The basic answer, and the one that fits most situations, is that an Ultraseek document corresponds to a single URL (when web sites are being indexed) or to a single file (when we’re talking about spidering a file directory). So this post is one document. A Microsoft Word file is one document. An Adobe PDF file is one document. Here’s a surprise, though: some site search engines break a multi-page PDF file into single pages and count each page as a document. The Ultraseek Admin Guide would therefore be one document in Ultraseek but 296 documents in one of those search engines.

When a web page uses frames or <iframe /> tags, the content of that page comes from more than one URL. In this case, each separate URL is counted as a document.
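To get a rough feel for how many URLs a framed page pulls in, here is a minimal Python sketch. It is not how Ultraseek itself works; it just lists the URLs behind a page's <frame> and <iframe> tags using the standard library. The names FrameSourceCollector and urls_for_page are made up for this illustration, and it assumes the containing page counts as its own URL alongside the framed ones.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceCollector(HTMLParser):
    """Collects the URLs referenced by <frame> and <iframe> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <iframe ... />.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the containing page.
                absolute = urljoin(self.base_url, src)
                if absolute not in self.urls:
                    self.urls.append(absolute)


def urls_for_page(page_url, html):
    """Return the page URL followed by each distinct framed URL.

    Assumption: the containing page is counted as a URL of its own.
    """
    collector = FrameSourceCollector(page_url)
    collector.feed(html)
    collector.close()
    return [page_url] + [u for u in collector.urls if u != page_url]
```

Calling urls_for_page on a page with two iframes would return three URLs, which under the rule above would mean three documents (before any deduping).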

Where pages are dynamically generated, such as .php, .jsp or .asp pages, an Ultraseek document is whatever is generated by a unique URL.

When pages have identical content but different URLs or filenames, they are seen as duplicates and only counted once. So dupes don’t add to Ultraseek’s document count. Ultraseek’s deduping capabilities are effective and can be tuned if necessary.
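If you want a ballpark figure before picking a license tier, the rules above can be sketched in a few lines of Python. This is only an estimate under stated assumptions: it treats two items as duplicates when their bytes are exactly identical, which is a stand-in for Ultraseek's own (tunable) deduping and not a description of it. The function name estimate_document_count and the shape of its input are made up for this example.

```python
import hashlib


def estimate_document_count(pages):
    """Estimate a document count from a mapping of URL or file path -> content bytes.

    Assumption: exact byte-for-byte matches are the only duplicates detected.
    A real search engine's deduping may be more or less aggressive than this.
    """
    seen_digests = set()
    for content in pages.values():
        # Identical content under different URLs or filenames hashes to the same digest.
        seen_digests.add(hashlib.sha256(content).hexdigest())
    return len(seen_digests)


if __name__ == "__main__":
    sample = {
        "http://example.com/index.html": b"<html>Home</html>",
        "http://example.com/home.html": b"<html>Home</html>",  # same content, different URL
        "http://example.com/guide.pdf": b"%PDF-1.4 sample bytes",  # one file, one document
    }
    print(estimate_document_count(sample))
```

In the sample, the two home pages collapse into one, so three URLs come out as two documents.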

Hopefully this clears it all up.

Tags: Site search