Tech Update
Web traffic analysis could save your site
By Kevin Ferguson
August 22, 2001

Craig Macfarlane, chief technology officer of StudentAdvantage.com, smiles incredulously as he studies the numbers: 4.7 million unique users logged onto his company's FansOnly.com Web site in December. No, wait--make that 1.75 million. Or rather, 1.13 million.

The first of the three numbers from which Macfarlane has to choose was generated using a homegrown Web traffic-analysis program; the second was offered by PC Data Online; and the third was provided by Media Metrix. At least two of them are inaccurate. But most likely, all three are off the mark. "It's scientific, but it's messy," says Macfarlane.

It's also a hassle. "If we are undercounted and don't appear in the top-tier sites, we don't get to meet with agencies or significant advertisers," he says. "This certainly does affect our ability to sell advertising." So the Web site has to scrutinize the three numbers to figure out which ones are closest to reality.

Despite considerable improvements to the methodologies employed by Web traffic-analysis tools over the past two years, it seems that deriving meaningful results with them is still often two parts art and one part science. Regardless, the results they produce are indispensable.

Editor's Note:
At press time, PC Data Online had ceased doing business. Certain assets of the company had been acquired by comScore Networks, which is honoring the contracts of former PC Data Online customers.

This story originally appeared in CNET Enterprise on 4/9/01.

Despite the lack of certainty in Web traffic numbers, analyzing your traffic can be invaluable. For example, do you want to know why your customers abandon their virtual shopping carts before hitting the checkout line? Look at the page-navigation statistics. Can't understand why visitors never make it to the fourth page of your online catalog? See how much time they're forced to spend on the first three pages. Sensing a dramatic shift in the demographics of your visitors? Analysis tools can show you which Web sites they visited before coming to you.

Broadly speaking, you have three ways to go about measuring and analyzing Web traffic. You can install traffic-analysis software, such as that sold by NetGenesis, WebTrends, and Accrue Software, on your own servers (regardless of whether you host your own site or use an ISP for that purpose). You can outsource this task to a service provider, such as WebSideStory, that specializes in traffic reporting. Or you can subscribe to an independent tracking service, such as Nielsen/NetRatings or Media Metrix.

Most recently, traffic-analysis vendors have been offering a mishmash of products to attract a broader audience. For example, in March 2001, WebSideStory began selling its first packaged software, an analytical tool called HitBox DataWise, while software developer WebTrends now offers a hosted service called WebTrends Live.

Each product has its pros and cons. WebSideStory's HitBox Enterprise, for instance, is very customizable and good at tracking larger historical Web trends, but requires labor-intensive steps in tracking traffic minutiae, such as a specific page's ebb and flow of visitors over several weeks. (Such trends are easier to track using WebSideStory's recent release, DataWise.) And PC Data Online captures useful details about Web usage--for example, repeated visits to the same site--but it generates such details by collecting data from only a sample of consumer-oriented Web users. While the consumer sample is vast--120,000 and growing by 3,000 a month--specialized sites such as FansOnly.com's Notre Dame souvenir shop are likely underrepresented.

A bigger technical watershed separating traffic-reporting tools, though, may be their ability to track cached pages. When PC users request Web pages from Internet service providers, they are often viewing pages that were cached in the ISP's data centers once and served to multiple users. The result: some cached pages are never counted and can throw off Web-traffic reports. How many cached pages are missed? It's difficult to say. Critics of traffic-analysis tools put the number as high as 10 percent for some sites, but the vendors of those tools say the number is negligible.

Web traffic analysis tools

Web traffic measurement services
-- ComScore Networks NetScore
-- HitBox Enterprise 6.3
-- HitBox DataWise
-- Media Metrix
-- Nielsen/NetRatings

Web traffic analysis software
-- Accrue Insight 5.0
-- NetGenesis 5.0
-- WebTrends Live Enterprise Edition
-- WebTrends Enterprise Reporting Server

The Web traffic tools that are most susceptible to missing cached pages, critics argue, are those that use data-collection methods known as log-file analysis and network packet sniffing, such as WebTrends and NetGenesis. Network packet sniffers, which usually reside on standalone servers between the Web server and the firewall, scan Web server data packets that stream past, copy them, and then forward them to a database. Log-file analysis records the requests made from Web, proxy, and other Internet servers--noting such things as the visitor's IP address and the time it took to process the request--and sends them to a database for subsequent number crunching.
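
To make log-file analysis concrete, here is a minimal Python sketch of the number crunching described above. It is an illustration, not how WebTrends or NetGenesis work internally. It assumes the server writes the Common Log Format, which records the visitor's IP address and the time of the request but not how long the request took to process; the sample lines and the summarize function are invented for the example.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status bytes
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Return (page views per path, set of client IPs) for successful GETs."""
    views = Counter()
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["method"] != "GET" or match["status"] != "200":
            continue  # ignore errors and non-page requests
        views[match["path"]] += 1
        ips.add(match["ip"])
    return views, ips


# Invented sample data for illustration only.
SAMPLE = [
    '10.0.0.1 - - [09/Apr/2001:10:00:00 -0700] "GET /catalog.html HTTP/1.0" 200 5120',
    '10.0.0.2 - - [09/Apr/2001:10:00:05 -0700] "GET /catalog.html HTTP/1.0" 200 5120',
    '10.0.0.1 - - [09/Apr/2001:10:01:00 -0700] "GET /cart.html HTTP/1.0" 200 2048',
    '10.0.0.3 - - [09/Apr/2001:10:02:00 -0700] "GET /missing.html HTTP/1.0" 404 312',
]

views, ips = summarize(SAMPLE)
for path, count in views.most_common():
    print(path, count)
print("distinct client IPs:", len(ips))
```

The sketch also shows where the caching problem comes from: a page served out of an ISP's cache never reaches the Web server, so it never produces a log line to count.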

Proponents of log-file analysis insist that tracking cached pages is not the problem it was with the Web design tools available just a few years ago. "I understand why advertisers are concerned about this, but it's a bit of an urban legend now," says Kevin Epstein, director of product management in Inktomi's networks products division. "All you need to do is put a noncacheable object in your page, like a piece of text." That way, even if graphics and banner ads are served from cache memory, the pages will still be tracked.
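
What a noncacheable object might look like in practice can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than Inktomi's or any vendor's method: a tiny WSGI application (the name counter_app, the page query parameter, and the in-memory HIT_LOG are all hypothetical) that returns a short piece of text with headers asking caches not to store or reuse it, so every page view that embeds the object produces a request the site can count.

```python
from urllib.parse import parse_qs

HIT_LOG = []  # in-memory stand-in for the database a real tool would use

# Headers that ask browsers and proxy caches not to store or reuse the response.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-store, no-cache, must-revalidate"),  # HTTP/1.1 caches
    ("Pragma", "no-cache"),                                    # HTTP/1.0 caches
    ("Expires", "0"),                                          # treated as already expired
]


def counter_app(environ, start_response):
    """Record which page embedded the object, then return a non-cacheable reply."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    page = query.get("page", ["unknown"])[0]
    HIT_LOG.append(page)

    body = b"ok"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers + NO_CACHE_HEADERS)
    return [body]
```

For a quick local trial, the application can be served with the standard library's wsgiref.simple_server.make_server. Whether intermediate caches honor the headers is up to the caches themselves, which is part of why the two camps still disagree about how many pages go uncounted.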

But even vendors that consistently count cached pages aren't always on the same page. WebSideStory and PC Data Online, for example, do capture traffic routed through caching servers, but their services still report different traffic numbers for the same Web pages during the same time. Why? Again, different methods. WebSideStory uses a technique known as page tagging by which the company's clients place a few lines of code at the bottom of each Web page they want tracked. Each time that page is requested, whether or not the page has been cached, WebSideStory is notified.
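
The idea behind page tagging can be sketched as follows. This is not WebSideStory's actual tag, which the article does not reproduce. The collector address stats.example.com, the page parameter, and the function names are hypothetical, and the tag here is a plain one-pixel image request placed just before the closing body tag.

```python
import re
from urllib.parse import quote

# Hypothetical collector address; a real service would supply its own.
COLLECTOR_URL = "http://stats.example.com/hit"


def build_tag(page_id):
    """Return an HTML snippet that requests a 1x1 image from the collector."""
    encoded = quote(page_id, safe="")
    return (
        '<img src="' + COLLECTOR_URL + "?page=" + encoded + '" '
        'width="1" height="1" alt="">'
    )


def tag_page(html, page_id):
    """Insert the tracking tag just before the last closing body tag."""
    tag = build_tag(page_id)
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html + "\n" + tag  # no body tag found: append at the end
    idx = matches[-1].start()
    return html[:idx] + tag + "\n" + html[idx:]


page = "<html><body><p>Fall catalog, page 4</p></body></html>"
print(tag_page(page, "/catalog/page 4"))
```

Because the browser fetches the image from the collector every time it renders the page, the hit is reported even when the page itself came from an ISP's cache, provided the collector marks its own response as non-cacheable.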

PC Data Online, on the other hand, doesn't code each page, but captures the URL requested by placing tracking software on survey participants' hard drives. (PC Data's tracking software, @PC Data, starts tracking Internet usage as soon as users open up their browsers. @PC Data collects and temporarily stores a log of participants' Web activities for 15 minutes. The data is then sent in real time in an encrypted message to PC Data.) Caching, therefore, is not an issue for either.

The best bet is to use a combination of third-party auditing tools, such as those offered by Media Metrix or PC Data, and analysis tools from NetGenesis, WebTrends, and the like. The auditing tools will help you compare your site to others in your market segment. The analysis tools will give you more specifics on your site.

What are the characteristics of the best traffic-analysis tools? Enterprise users suggest you consider these five points:

Scalability. Pick software that handles quickly expanding sites; busy Web sites can generate gigabytes of traffic reports each day. "Most vendors' software can't handle the volume," says Dan Vesset, a senior analyst at IDC. "That's the biggest reason why businesses change software vendors." Case in point: ABC Distribution switched from WebTrends to WebSideStory 15 months ago because WebTrends couldn't handle the 6GB of data generated each day by the online gift catalog's more than 100,000 visitors. WebTrends has since released software designed for high-traffic sites.

Available reports. There are hundreds of reports that measure different types of activities to choose from. Some show the amount of time a user spends on each Web page; others show the paths users take to navigate your site; and still others note how much time a user spends with offline applications, such as Microsoft Word, before returning to the Web. You won't need 80 percent of the available reports, but the ones you pick can be crucial. It all depends on your business and the context in which the numbers are read. For example, a report that shows that users spend an average of 20 minutes per visit sounds wonderful--unless you also look at the paths they take in navigating your site. You might find that they spend so much time not because they love your site, but because they keep getting lost.
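
The 20-minute example is easy to reproduce with a rough Python sketch. The visit data below is invented, and the "revisit share" measure (the fraction of page views that land on a page already seen in the same visit) is an assumption made for illustration, not a report offered by any of the products named here.

```python
def visit_summary(page_views):
    """Summarize one visit given (seconds_since_start, path) pairs in order."""
    if not page_views:
        return {"minutes": 0.0, "pages": 0, "revisit_share": 0.0}
    times = [t for t, _ in page_views]
    paths = [p for _, p in page_views]
    total = len(paths)
    distinct = len(set(paths))
    return {
        # Elapsed time between the first and last page view of the visit.
        "minutes": (times[-1] - times[0]) / 60.0,
        "pages": total,
        # Share of page views that returned to an already-seen page.
        "revisit_share": (total - distinct) / total,
    }


# Invented visit: 20 minutes long, but half the views are backtracking.
VISIT = [
    (0, "/"),
    (120, "/catalog"),
    (300, "/"),
    (420, "/catalog"),
    (700, "/"),
    (1200, "/help"),
]

print(visit_summary(VISIT))
```

Read alone, the duration looks like an engaged visitor. Read next to the path, with the visitor bouncing between the home page and the catalog before ending up on a help page, it looks more like someone who could not find what they wanted.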

Customization. Static monthly reports can take you only so far. Consider those that let you customize online reports and easily integrate data into other applications, such as eCRM programs.

Price. Dust off your wallet. Typical of software in its class, NetGenesis 5.0 will cost enterprises about $160,000, which includes approximately $60,000 for NetGenesis consultants to spend six weeks analyzing your business and deploying the product. WebTrends Enterprise Reporting Server, a browser-based program that exemplifies the middle tier of Web-traffic products, starts at $4,100 for one server. The average customer eventually spends about $30,000, says the company. Hosted solutions, such as those from WebSideStory, will vary in cost, depending on the volume of traffic analyzed. But expect to pay $2,000 to $5,000 per month.

Platform. This isn't the headache it was 18 months ago. Previously, some tools were available only for Windows NT, requiring Herculean efforts by larger Web site hosts to port data over to Unix-based servers. Now, in their efforts to attract larger enterprises, vendors have released Unix-compatible applications. Not many Linux tools are yet available, however. Web server support is often not an issue either. Most traffic tools now support the usual suspects, including Apache, Microsoft IIS, and Netscape Enterprise.

Glossary of Web-analysis terms
Crawlers: Also called spiders or bots (short for robots), these programs automatically visit Web sites, read pages, and collect information. Used often in search engines, crawlers can artificially inflate the number of page visits for a particular site up to 30 percent. The better traffic-analysis tools filter such visits out when creating traffic reports.
Page Views: The number of times a Web page is opened, typically measured per person. Page-view statistics often do not include the specifics for frames within those pages. Also, the page-view count generally does not distinguish between unique and repeat visitors.
Paths: The navigation routes visitors take on a site--a particularly useful measurement of how difficult a site is to maneuver and the popularity of specific pages.
Reach: The portion, usually given as a percentage, of a target audience (e.g., 18- to 34-year-old males, or small businesses) that has opened a particular page or site.
Referrers: URLs denoting the portals or Web sites through which another site is reached.
Retention: The measurement of unique users who return to the same site or page over a given time. PC Data Online, for example, measures it as "the percentage of a site's traffic during the previous month that also came back during the current month."
Unique Users: Individuals, often identified through the use of cookies, IP addresses, or passwords, who visit a site. Compare with visitors, below.
Visitors: Number of persons who visit a site. An individual who visits a site three times in one day is typically counted as three visitors.
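
How several of these terms relate can be shown with a small Python sketch. Everything in it is illustrative: the Hit record, the sample data, and the crawler check based on a few user-agent keywords are assumptions, and real tools identify users and robots in more elaborate ways. Retention follows the PC Data Online definition quoted above, and visitors are counted per visit, as in the glossary.

```python
from collections import namedtuple

# One record per page request; "user" stands in for a cookie or login ID.
Hit = namedtuple("Hit", "user visit month path agent")

BOT_MARKERS = ("bot", "crawler", "spider")  # illustrative keywords only


def is_crawler(agent):
    """Guess whether a user-agent string belongs to a crawler."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def human_hits(hits, month):
    """Hits for one month with crawler traffic filtered out."""
    return [h for h in hits if h.month == month and not is_crawler(h.agent)]


def monthly_metrics(hits, month):
    human = human_hits(hits, month)
    return {
        "page_views": len(human),
        "visitors": len({(h.user, h.visit) for h in human}),  # one per visit
        "unique_users": len({h.user for h in human}),         # one per person
    }


def retention(hits, previous_month, current_month):
    """Percent of last month's users who also came back this month."""
    before = {h.user for h in human_hits(hits, previous_month)}
    after = {h.user for h in human_hits(hits, current_month)}
    if not before:
        return 0.0
    return 100.0 * len(before & after) / len(before)


# Invented sample data.
HITS = [
    Hit("ann", 1, "2000-11", "/", "Mozilla/4.0"),
    Hit("ann", 1, "2000-11", "/shop", "Mozilla/4.0"),
    Hit("ann", 2, "2000-11", "/", "Mozilla/4.0"),
    Hit("bob", 1, "2000-11", "/", "Mozilla/4.0"),
    Hit("crawler-1", 1, "2000-11", "/", "ExampleBot/1.0"),
    Hit("ann", 3, "2000-12", "/", "Mozilla/4.0"),
    Hit("cy", 1, "2000-12", "/", "Mozilla/4.0"),
]

print(monthly_metrics(HITS, "2000-11"))
print(retention(HITS, "2000-11", "2000-12"))
```

In the sample, one person who visits twice counts as two visitors but one unique user, and the crawler's request is left out of every figure.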





