Ten Taxonomy Myths

The Montague Institute offers ten myths that need to be dispelled before embarking on a taxonomy project. They've got a *really* broad definition of taxonomy (think "classification system") but the myths are still useful to deflate before your client or boss goes taxonomy-happy.

How to create a Controlled Vocabulary

Over at Boxes and Arrows, Karl Fast, Fred Leise, and Mike Steckel deliver a great "how-to" tutorial on creating controlled vocabularies. It's one thing to talk about how great CVs are; it's even better to show how to build them.

Card-based Classification Evaluation method

Donna Maurer shares her technique for evaluating classification schemes over at Boxes and Arrows. Ten minutes from twenty users means that it's pragmatic, and it addresses classification specifically, instead of being part of a prototype with other issues to evaluate. Here's what you need to do this kind of evaluation:

  • A proposed classification system or proposed changes to an existing system. Some uncertainty, mess, and duplication are OK.
  • A large set of scenarios that will cover information-seeking tasks using the classification.
  • A pile of index cards and a marker.
  • Someone to scribe for you.

Looks great - thanks Donna!

Why you need your very own taxonomy.

Tom Smith has a great introductory article on Why You Need Your Own Taxonomy. Useful for explaining taxonomies and facets to management or clients.

David Riecks pointed me to his site on controlled vocabularies. David discusses the benefits of using CVs and offers a lot of examples of heavily-used controlled vocabularies and thesauri. Since David is a photographer, he also has a special interest in image indexing and devotes a special section to image databases and CVs.

What Is A Controlled Vocabulary?

Karl Fast, Fred Leise and Mike Steckel in Boxes and Arrows.

    Finding the right words to communicate the message of your website can be one of the most difficult parts of developing it. Our authors guide you through the concepts behind a well-designed controlled vocabulary and discuss the pros and cons of its development.

All About Facets & Controlled Vocabularies

Karl Fast, Fred Leise and Mike Steckel have started a series of articles on Boxes and Arrows to make faceted classification and controlled vocabularies accessible to practicing IAs without LIS backgrounds. Look forward to it.

The Montague Institute gives us 10 taxonomy myths to dispel, so you can get past the hype and correctly grok how taxonomies will really work for you.

    Taxonomies have recently emerged from the quiet backwaters of biology, book indexing, and library science into the corporate limelight. They are supposed to be the silver bullets that will help users find the needle in the intranet haystack, reduce "friction" in electronic commerce, facilitate scientific research, and promote global collaboration. But before this can happen, practitioners need to dispel the myths and confusion, created in part by the multi-disciplinary nature of the task and the hype surrounding content management technologies.

Way finding/losing in a digital library

I am in a discussion with a programmer about ways to offer navigation using a poly-hierarchical arrangement of nodes. He brought up the concept of directed acyclic graphs (DAG), which comes from mathematics. I learned from the Free Online Dictionary of Computing that the idea is that a directed graph would contain no cycles, i.e. if there is a route from node A to node B then there is no way to cycle or loop back from B to A. I can see some applications benefitting from this structure, such as forward citation searching. I think I may not understand the concept entirely, but I am guessing that in an information environment, this means that you'd lose context the deeper you find yourself in a directed path. Or perhaps it simply means you navigate forward to point A from point B, and it has nothing to do with providing backward movement.
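
To make the DAG idea concrete, here is a minimal sketch in Python. It assumes the hierarchy is stored as (parent, child) pairs; the function name `has_cycle` and the sample terms are made up for illustration and do not come from any real system. Note that "acyclic" only rules out loops: a node may still have several parents, which is exactly what a poly-hierarchy needs.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph of (parent, child) pairs contains a cycle."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Depth-first search with three states per node.
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in children[node]:
            if state[nxt] == IN_PROGRESS:
                return True  # we walked back onto our own path: a loop
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    nodes = {n for edge in edges for n in edge}
    return any(state[n] == UNVISITED and visit(n) for n in nodes)


# Hypothetical poly-hierarchy: "Wireless" sits under two parents.
poly = [("Telecom", "Wireless"), ("Technology", "Wireless"), ("Wireless", "3G")]

print(has_cycle(poly))                        # still a DAG despite two parents
print(has_cycle(poly + [("3G", "Telecom")]))  # a child pointing back up creates a loop
```

Running a check like this whenever a term is added or moved would be one way to keep a poly-hierarchy free of loops.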

The problem we're experiencing is that we have been dealing with a legacy of organizing by collections/products/services, which is reinforced in our site navigation. Oddly, we don't have problems post-coordinately displaying term combinations in database search results. Rather, in search results we display other terms from the subject taxonomy to narrow results by subject. The problem we have is with the legacy of hierarchical arrangements of access points organized by: collections, services, topics (this uses slices of the subject taxonomy). It's a very library-centric view that we've been dealing with changing, and if you ever worked in a library (corporate, private, special or public) you might know how difficult it is to create this type of change.

I've pointed out that the concept of surfacing more facets of index terms would be helpful for browsing. Jim Anderson at Rutgers helped me to buy into this idea while I was in library school, and before I knew much about the web, I advocated this idea in an image index I proposed in 1997. That naive and over-ambitious FileMaker Pro screen shows how I envisioned it. It's funny. Today, I'm wondering how we can support the display of polyhierarchical classifications such as our subject taxonomy and other database fields. We have some ideas floating around, but I feel like a toddler trying to topple an elephant.


Some follow-up. We're kicking around the idea of a) showing multiple breadcrumbs, and b) showing local navigation for one of the hierarchies where the node exists. With the local navigation, we're going to check where the user came from in order to determine which tree to show. If they came from a bookmark or an email (most of our pages are also linked to from email alerts) we will show nothing, unless the node only has one parent, in which case we will show that tree. This is the theory. We need to test, but we're interested in opinions. Have you done something like this in a better way?
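
The theory above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `PARENTS` mapping, the node names, and the functions `breadcrumbs` and `local_nav_parent` are all hypothetical, and "where the user came from" is simplified to the name of the referring node (None for a bookmark or an email link).

```python
# Hypothetical poly-hierarchy: each node maps to its list of parents.
PARENTS = {
    "Subjects": [],
    "Services": [],
    "Wireless": ["Subjects"],
    "Alerts": ["Services"],
    "Wireless Alerts": ["Wireless", "Alerts"],
}


def breadcrumbs(node, parents):
    """Return every root-to-node trail, one per route through the hierarchy."""
    node_parents = parents.get(node, [])
    if not node_parents:
        return [[node]]
    trails = []
    for parent in node_parents:
        for trail in breadcrumbs(parent, parents):
            trails.append(trail + [node])
    return trails


def local_nav_parent(node, parents, came_from=None):
    """Pick the parent whose tree is shown as local navigation, or None."""
    node_parents = parents.get(node, [])
    if len(node_parents) == 1:
        return node_parents[0]   # only one tree, so always show it
    if came_from in node_parents:
        return came_from         # show the tree the user arrived through
    return None                  # bookmark or email with several parents: show nothing


for trail in breadcrumbs("Wireless Alerts", PARENTS):
    print(" > ".join(trail))
```

The sketch assumes the hierarchy has no loops; with a cycle in `PARENTS` the recursion in `breadcrumbs` would never finish.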

One Source taxonomy for company information

One Source has announced that they've started selling their Global Business Taxonomy, a business information classification system. I've used the One Source Business Browser in the past and have been impressed with how they index company information and present company profiles. If you've ever compared Factiva's (née Dow Jones Interactive) company profiles to One Source you'll know what I mean.

Taxonomy & Content Classification

Mike Lee points to and discusses the Delphi white paper, "Taxonomy & Content Classification" (1.3 MB PDF), which is apparently licensed to every vendor mentioned in the paper -- my office mate Dave (the taxonomy guy) has seen three differently branded versions of the paper. It's apparently a good summary of why you should employ a taxonomy in your CMS. Mike says, "sheds some light on the misconceptions on the definition of a taxonomy, describes the benefits of systematic content classification, and surveys the currently available technology tools". They apparently also give some kind of seminar, "Proving Ground for Taxonomy & Information Architecture", but when I looked at the

B&N browser: Browsing book facets

I blogged the newish B&N book browser earlier today. Can't remember what I said about it. Mainly that it reminds me of Flamenco and FacetMap, I think. Perhaps I said something about facet classification being surfaced on the UIs of big ecommerce sites or some stuff.

B&N Book Browser: Browsing facets that describe books

I just looked at Barnes & Noble's Book Browser feature, which offers a way to browse books by subject and type of literature. The browser start page shows headings categorized under the different major sections you might find in the book store -- Fiction, Non-Fiction, Business. Each major section has subsections that closely match what I've seen in B&N stores.

I've read in a few places that people don't think that there have been good implementations employing the concepts of Ranganathan. I don't agree with that. This is an example of how the business world is employing the concept of categories for browsing and refining. Are these facets? In a broad sense of the word, yes. Like the Flamenco interface, the Book Browser allows you to see terms surfaced from several facets and then iteratively select terms or drill down until a string is formed that describes the information you find.
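
As a rough illustration of that iterative select-and-refine loop, here is a small Python sketch. The records, facet names, and the functions `refine` and `facet_counts` are invented for the example; this is not how the B&N browser or Flamenco is actually implemented.

```python
from collections import Counter

# Made-up records, each described by a value in several facets.
BOOKS = [
    {"title": "Book A", "section": "Fiction", "genre": "Mystery", "format": "Paperback"},
    {"title": "Book B", "section": "Fiction", "genre": "Romance", "format": "Hardcover"},
    {"title": "Book C", "section": "Non-Fiction", "genre": "History", "format": "Paperback"},
    {"title": "Book D", "section": "Fiction", "genre": "Mystery", "format": "Hardcover"},
]


def refine(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facets):
    """Count the values still available in each facet, to surface as next choices."""
    return {f: Counter(item[f] for item in items if f in item) for f in facets}


# Drill down one term at a time; the selections form the descriptive string.
selections = {"section": "Fiction"}
print(facet_counts(refine(BOOKS, selections), ["genre", "format"]))

selections["genre"] = "Mystery"
print(" > ".join(selections.values()), [b["title"] for b in refine(BOOKS, selections)])
```

Because the selections are combined at browse time rather than baked into one fixed tree, the same records can be reached through any order of facets.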

Context-Aware Computing: The Return of Ranganathan?

Peterme, musing on how we'll make sense of information offered in context-aware mobile devices, discusses facet-based description as a solution. I logged some thoughts of my own on his site because he makes sense to me.

    Our devices will know where we are, and be able to augment our experience in that area in various ways--the ability to write and read notes that others have placed; read the history of the spot you're standing on; find out about any activities of interest occurring nearby, etc. etc. ... I can pretty much guarantee that frustration will be the norm if we develop context-aware computing in the same fashion as we have most of our information-rich internet technologies.

I'm glad there's a Peterme RSS feed now. It's aggregated here. Now we just need a blackbeltjones feed.

Controlled Vocabularies in the Trenches

Victor jots down some thoughts about creating controlled vocabularies within the context of the design of a project he's working on. He discusses some real considerations and dependencies related to the development of a controlled vocabulary and implications for systems design. Here are some of my own thoughts/reactions, based on experience.

I've watched the controlled vocabularies of subject headings and company information grow within my organization (a corporate library services org.) over the last four years. The approach we've taken is sort of like a web services model or much like a vendor service, such as those where data aggregators provide indexed content with their own proprietary controlled vocabulary (e.g. Factiva). This seems to me to be a good model because it centralizes semantic tagging and creation of indexing terms in one place, while allowing enterprise use at different levels of granularity. When following this model, you're still confronted with the issues of knowledge representation when developing your terminology, but the system considerations are separated. The design of IR systems using indexes benefits from documenting scope, domain, documentary units, indexable matter, etc. prior to implementation. I have this great unpublished text by Jim Anderson that serves as a framework for such documentation.

Here's a short description of our approach, which has been top down and bottom up. Our people created our CVs starting with close relationships with business units to develop a set of subject headings and a company authority list. They iterated through these lists using the top-down approach, informing the list with their subject area expertise. Then they took the bottom-up approach, adding and modifying terms to reflect subject headings identified while doing the daily work of indexing (knowledge representation). For my org., this is a daily process since a team of indexers sifts through machine-filtered data and applies more granular indexing or alters machine-applied terms. As the telecom landscape changes or as our indexing needs require, terms are added to the vocabularies. We have one person who manages/develops them, and a few additional subject area experts who work on development of new terms in new subject areas. User feedback informs changes along the way. The controlled vocabularies are offered up for use by disparate systems within our company to represent that corpus of indexed data, or slices of it, as desired.

As an IA, I generally work with our taxonomy specialists to create page inventories -- sort of like microscopic content inventories on steroids -- that specify combinations of index terms used to build content modules. As an example, I show a small piece of one of these inventories on my old and dated portfolio. This use of the term content inventory is not typical in our field, I know. What this really is, is a design document showing such things as rubrics of content modules with their associated labels, and database searches that use terms from a controlled vocabulary. Maybe I should present something on this process some day. It's really a hybrid IA and technical document, but it's a format my entire team uses on all data-dense sections of our site.

Incidentally, the taxonomy guys I'm talking about are presenting on this topic at an ARK seminar in NYC in November in case you're interested. They're really smart. Hopefully they will get to network a bit at this thing, because everyone in our group could get pink slips if the cost-cutting winds decide to blow in our direction.

Practical Taxonomies: Stop Searching - Start Finding

My office mates, Dave Goessling and Raphael Lasar, are giving the presentation, "Creating and implementing an effective taxonomy" at ARK Group's taxonomy seminar at Le Parker Meridien in New York, NY on 18-20 November 2002. A PDF with the rest of the program for the "Practical Taxonomies" seminar is available from ARK's conferences page. Other speakers include Amy Warner and knowledge managers from various financial institutions, government agencies, and other large corporations.


Is it me, or does anyone else find it interesting that everyone's so interested in Ranganathan lately? Seen in the news aggregator in the last few weeks:

  • Ranganathan for IAs -- Facet analysis is the term that everyone's dying to use. But, the basic idea of facets can be grokked in about 2 minutes.
  • Peter V pointed to Fred Leise's article, "Using Faceted Classification to Assist Indexing", which is one of the best introductions to facet analysis and its practical approach in indexing that I've seen next to Louise Spiteri's articles. Leise compares enumerative to facet-based classification, defines facets in simple terms and provides very practical tips for developing your facets and using them in indexing.
  • Prolegomena to Library Classification -- It's more interesting that people are reading this sort of material. I wasn't surprised to find that Peter was reading it. He appears to be making his way through a lot of classification literature. I wonder how people might apply what they learn from examining Ranganathan's ideas around colon classification. I think the general idea here has to do with the flexibility of classification using his system rather than using a rigid system like the Dewey Decimal system. Also I think the concepts behind his system can better be used for post-coordination of classes.

A Taxonomy Primer

I came across Amy Warner's article "A Taxonomy Primer" on her consulting site. Should be a helpful primer for people being introduced to the concepts associated with using thesauri.


Catalogablog is David Bigwood's weblog. I presume he's a cataloger since he's talking about MARC fields. He also discusses metadata more generally for you non-LIS types.

Faceted metadata authoring tool

XFMLManager is a free authoring tool for hierarchical, faceted metadata. It is not yet available. We will also host the upcoming Hierarchical Faceted Metadata Authoring Experiment.

