In search of better search results
By Dan Farber, Tech Update
May 10, 2004

Moore's Law posits that transistor density on integrated circuits doubles every couple of years. Similarly, the amount of data doubles every nine months, and the World Wide Web continues to grow exponentially. Fortunately, Moore's Law and assumed, unnamed corollaries to it keep storage on track to handle the increasing load at consistently improving speeds and costs. On the other hand, finding what you want on the Web or in data warehouses is not improving rapidly, despite the increasing speed at which terabytes and even petabytes can be scanned for relevant results.

Clearly, the rate of improvement in delivering high quality search results isn't keeping up with Moore's Law in terms of doubling every couple of years. In fact, the "law of search results" could be expressed as an inverse to the growth in the size and complexity of the data.

The Nielsen Norman Group published its Web Usability 2004 survey results last week, citing search as one of the major stumbling blocks to successful use of the Web. Using a search engine is the first action taken in 88 percent of Web sessions, according to the study, with users visiting an average of 3.2 sites per session (other than search engines).

In terms of search success, all types of users (both the casual and more experienced users surveyed) were satisfied only 42 percent of the time with their search results. More experienced users were satisfied 50 percent of the time, which equates to a failing grade.

The worst search experiences come from internal site searches rather than the mega search engines. Usability guru and Nielsen Norman Group principal Jakob Nielsen characterized internal, intranet search implementations as "beneath contempt." Many users avoid corporate-sanctioned internal search engines and use Web search engines to find information outside the firewall related to their company.

Part of the problem is that search is inherently an input/output mechanism. About 60 percent of users surveyed typed in a single word to initiate a search, and another 20 percent used two words. Advanced search capabilities were accessed by 1 percent of those surveyed, and only 3 percent used quotation marks or other query syntax to refine searches. In addition, the survey showed that the first link in a search result page got 51 percent of total clicks and the second link, 16 percent. The same could be said of searching corporate databases, leading to another law of search: Don't expect users to apply more than the basic tools and techniques to acquire information from a search engine.

Nielsen recommends some basic strategies to modify user behavior, such as an easily visible search box that is at least 27 characters wide (which encourages multi-word queries), spell checking, and manual tweaking of query terms.

"You can add editorial judgment to search engines by taking the top 1,000 query terms and deciding what are the top places to go for those words or phrases," Nielsen said. "You can also look at search logs to see the most common entries and what vocabulary is used to search for those entries, and then add synonyms to tweak the search engine."

In addition, applying the appropriate metadata to content, such as page titles, headlines and summary text, will improve search results, Nielsen said. These steps precipitate another law of searching: Automation alone will not yield satisfactory search results.
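Nielsen's advice about hand-picked destinations and synonyms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the study: the BEST_BETS and SYNONYMS tables, their entries, the page paths and the engine callable are all hypothetical stand-ins for a site's own editorial data and search back end.

```python
from collections import Counter

# Hypothetical editorial tables; every entry is invented for illustration.
BEST_BETS = {
    "vacation policy": ["/hr/time-off"],
    "expense": ["/finance/expense-reports"],
}
SYNONYMS = {
    "holiday": ["vacation"],
    "pto": ["vacation"],
}


def normalize(query):
    """Lowercase a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def top_queries(log, n=1000):
    """Return the n most frequent normalized queries from a search log."""
    counts = Counter(normalize(q) for q in log)
    return [q for q, _ in counts.most_common(n)]


def expand(query):
    """Append editor-supplied synonyms to the terms of a query."""
    terms = normalize(query).split()
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def search(query, engine):
    """List hand-picked pages first, then the engine's other results.

    `engine` is assumed to be any callable that takes a list of terms
    and returns a list of page paths.
    """
    picks = BEST_BETS.get(normalize(query), [])
    results = engine(expand(query))
    return picks + [r for r in results if r not in picks]
```

An editor would run top_queries over the search logs, decide on destinations for the most common entries, and record them in BEST_BETS. The automated engine still handles everything else, which is the point of the law above: the manual layer supplements automation rather than replacing it.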

Corporate data also exists in both structured and unstructured forms (e.g., e-mail, Office documents, Web pages, audio files) and lives in silos that aren't well integrated for searching. Another law of search: You can't search on what you can't get at.

Given the results of the Nielsen Norman Group's research and the patterns of usage among those surveyed, search engines have to get a lot smarter, more focused and more contextual to deliver satisfactory results to a majority of users. The vast majority of people aren't going to mess with advanced search techniques to overcome the current limitations of search. At a minimum, most companies need to overhaul their search, invest in taxonomies and metadata, and employ a professional search engine that has a team constantly tuning and improving it.

Companies such as Autonomy, FAST, Google, Northern Light, Verity and Vivisimo offer enterprise search engines that take various technical approaches. Google, for example, has its PageRank and text-matching techniques. Autonomy applies concept matching and Bayesian inference technology.

Specialized search engines that focus on a specific domain, such as GlobalSpec for engineering and technical information, can deliver better results than the brute force engines. IBM is working on WebFountain, a search engine that runs thousands of programs continuously to index and metatag content and applies natural language analysis to provide contextual reference. WebFountain is designed to deal with sophisticated queries, such as tracking the reputation of a company or product.

Don't expect major breakthroughs in search any time soon, however. According to Gary Flake, principal scientist at Yahoo Research Labs, "search engines today are like the 8-track tapes of the music industry." This leads to a final, for now, generalized law of search results, courtesy of William Shakespeare: "...you shall seek all day ere you find them, and when you have them, they are not worth the search."

You can write to me at dan.farber@cnet.com. If you're looking for my commentaries on other IT topics, check the archives.