David Berlind's Reality Check
Taxonomy today, ROI tomorrow
By David Berlind
April 11, 2004

Among the top ten strategic technologies for 2005 that Gartner analysts are advising enterprises to act on as soon as possible are those that address the thorny problem of enterprise taxonomies.

According to Gartner analyst Rita Knox, wrapping data and knowledge in metadata-based frameworks reduces the time needed to retrieve that information, thereby freeing up enterprises' most precious resource -- time -- for allocation to tasks that deliver real value, such as creating, servicing, and selling.

Mining legacy data presents three challenges: determining where those bits are stored, what stored them there (for example, e-mail clients and servers and their largely proprietary repositories), and how (format, security, etc.) they're stored. The corporate data and knowledge that rests on PCs' hard drives (where it's virtually undiscoverable and inaccessible), and that could be beneficial to other users, accounts for the vast majority of an organization's information assets. The remaining information, which is found on servers, may be more accessible, but is still just as difficult to relate and mine.

So difficult is relating, mining, and even replicating existing information to other repositories that Microsoft is creating a new file system (WinFS) for Windows that promises not only to divorce data from the applications that create it, but also to provide a metadata framework for speeding the information's discovery, retrieval, and marriage to other related information. WinFS, which is based on Microsoft's next version of SQL Server (code-named Yukon), is expected to make its appearance in the next version of Windows (code-named Longhorn), which isn't due until 2006 at the earliest. So far, metadata functionality is not a part of the public roadmap for any competing operating systems -- including Linux.


What's the meta, data?
Metadata is widely known as data about data. For example, the title of a document, who created it, when it was created, and the language it is written in are all forms of metadata. Even corporate data, such as the data generated in the course of a transaction, may have some metadata attached to it. A call center, for example, may generate a lot of transactional data relating to incoming calls, but attaching an operator's name (or code) to each of the transactions qualifies as metadata, much as an author's name attached to a document does.
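
To make the idea concrete, here is a minimal Python sketch of content carrying its own metadata. The Record class, the find_by helper, and all of the sample values (titles, dates, the operator code) are hypothetical and exist only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    """A piece of content plus the metadata that describes it."""
    content: str                                   # the data itself
    metadata: dict = field(default_factory=dict)   # data about the data


# A document described by title, creator, creation date, and language.
# All values are made up for illustration.
memo = Record(
    content="Benefits enrollment opens next month.",
    metadata={
        "title": "Benefits enrollment",
        "creator": "HR department",
        "created": date(2004, 4, 1),
        "language": "en",
    },
)

# A call-center transaction tagged with the operator who handled it.
call = Record(
    content="Incoming call, 4 minutes, billing question",
    metadata={"operator": "OP-117", "created": date(2004, 4, 2)},
)


def find_by(records, key, value):
    """Return the records whose metadata has the given key/value pair."""
    return [r for r in records if r.metadata.get(key) == value]


# Retrieval works on the metadata, not on the content itself.
matches = find_by([memo, call], "operator", "OP-117")
```

Because the lookup touches only the metadata, a document and a transaction can be searched the same way even though their contents have nothing in common.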

Another form of metadata that could be assigned to a document or a transaction is its category or classification. This is where taxonomy comes into play. Taxonomies are generally hierarchical and come in handy for assigning knowledge and data to categories. They involve the selection of the standard vocabulary that represents categories, sub-categories, sub-sub-categories, and so on. Taxonomies also act as thesauri because they map synonyms to that terminology. As more data, documents, and knowledge are categorized according to any given organization's taxonomy, rich connections can be made between dissimilar data types that were already related but previously impossible to connect technologically, because there wasn't a taxonomical superstructure sitting across or between the various stove-pipes for making those connections.
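
As a rough illustration of the two roles described above (a hierarchy of standard terms plus a thesaurus of synonyms), here is a small Python sketch. The category names, the synonym list, and the helper functions are all hypothetical; a real taxonomy would be far larger and maintained by specialists.

```python
# Hypothetical taxonomy: each category maps to its parent (None = top level).
PARENT = {
    "Human Resources": None,
    "Benefits": "Human Resources",
    "Health Insurance": "Benefits",
    "Paid Leave": "Benefits",
}

# Hypothetical thesaurus: synonyms mapped to the standard vocabulary term.
SYNONYMS = {
    "medical plan": "Health Insurance",
    "health coverage": "Health Insurance",
    "vacation": "Paid Leave",
    "pto": "Paid Leave",
}


def normalize(term):
    """Map a user's term to the standard category name, or None if unknown."""
    cleaned = term.strip().lower()
    if cleaned in SYNONYMS:
        return SYNONYMS[cleaned]
    for category in PARENT:
        if category.lower() == cleaned:
            return category
    return None


def path(category):
    """Return the chain of categories from the top level down to `category`."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return list(reversed(chain))


def classify(term):
    """Resolve a free-form term to its full category path (empty if unknown)."""
    category = normalize(term)
    return path(category) if category else []
```

With these sample tables, classify("vacation") and classify("PTO") both resolve to the same path: Human Resources, Benefits, Paid Leave. That shared path is what lets differently worded items be connected.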

In the meantime, there's no shortage of vendors with solutions to wrap some or most of an enterprise's data and knowledge in a metadata layer, regardless of where it lives or what's creating it.

Getting going
"For humans, it's obvious how to organize things. But, learning to be explicit so that computers know what it is so we can find it is a tremendous challenge," according to Knox. The three challenges to programming explicitness into our information technology are understanding the differences between metadata and a taxonomy and what they're each good for; getting executive buy-in for metadata and taxonomy-related projects; and properly anticipating the need for the specialized talent required for such projects.

Getting executive buy-in can be difficult. According to Knox, enterprises have frequently cited complexity and ROI as issues that scare them away from taking on such ambitious projects. Though taxonomies can improve employee productivity (less time searching, more time on the front lines) and ultimately customer satisfaction (especially if customers are working directly with external-facing systems), quantifying those gains in terms of ROI can be difficult. Knox recommends tying taxonomy projects to very specific business goals such as improving response rate or customer acceptance. One application mentioned by Knox that could be a hot button for CEOs and corporate attorneys has to do with regulatory compliance and the ability to mine archives for e-mails that might be relevant to the discovery phase of a particular investigation.




Knox provided an example of how the lack of a good taxonomy hurts the productivity of people looking for information as well as those serving it. Employees unable to find a certain bit of information on the intranet of their human resources department will end up calling the HR department. In turn, the person who takes the call simply provides the URL to the information and, sure enough, it was there on the intranet all along. It was just impossible to find. This is the sort of simple problem that a good taxonomy can address. But the answer requires the organization to understand that very often, the group serving the data has a different view of the data than does the employee who seeks that information.

Developing such understandings of how differently certain constituents may view the same information and knowledge, and then acting on that in a way that produces a successful taxonomy design and ongoing maintenance, also requires someone specializing in library science. Knox's discussion of why library scientists are so critical to the success of a taxonomy project drew many nodding heads from the Symposium/ITxpo showgoers who attended Knox's session.

A lot of taxonomy work already done by others can be downloaded or purchased. Enterprises in particular vertical industries (healthcare, for example) may find that a suitable taxonomy is already widely available.

One thing I wanted but didn't get from Knox's presentation at Gartner's Symposium/ITxpo was more of a strategic roadmap that takes into account (1) technologies that, like WinFS, represent sea changes in metadata or taxonomy deployments and (2) relevant standards, like the Resource Description Framework (RDF) specification from the World Wide Web Consortium, that could prove to be disruptive to today's conventional wisdom (and the solutions to match).
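
For readers unfamiliar with RDF, its data model expresses metadata as subject-predicate-object statements ("triples"). The Python sketch below mimics that model with plain tuples rather than an RDF library. The document URI and the values are hypothetical; the predicate names come from the Dublin Core element set, which is one common vocabulary and only an assumption about what an enterprise might choose.

```python
# Dublin Core element namespace, a commonly used metadata vocabulary.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical document URI, used only for illustration.
DOC = "http://intranet.example.com/hr/benefits-enrollment"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (DOC, DC + "title", "Benefits enrollment"),
    (DOC, DC + "creator", "HR department"),
    (DOC, DC + "language", "en"),
    (DOC, DC + "subject", "Health Insurance"),
]


def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


titles = objects(triples, DOC, DC + "title")
```

Because every statement has the same three-part shape, metadata from unrelated systems can be pooled into one list and queried the same way, which is part of why a standard like this could unsettle proprietary approaches.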

For now, I recommend doing your own homework on these developments and, before buying into any solutions, asking the providers of those solutions how their product plans map to other longer term trends, developments, and standards.

You can write to me at david.berlind@cnet.com. If you're looking for my commentaries on other IT topics, check the archives.



