Indexes v. full text searches

October 4, 2008

Comment from an email discussion list member

In my opinion, indexing is pointless. Searching is much more important than indexing.

I think you should provide the simplest automatic indexing, based on topic headings, and rely on your users to use full-text search to actually find anything.

Full-text search is easy and it works perfectly. Indexing, on the other hand, depends on the skill and commitment of the indexer. For myself, I feel that my life is too short to spend much of it on indexing.

My response

Yes, full-text search (FTS) has its place (and Google et al. have a lot to answer for in this regard).

BUT FTS brings with it a lot of garbage, as it finds any matching words without any reference to their context. You can narrow the results list by judicious use of Boolean operators and syntax such as quote marks, parentheses and the like, but most users don’t know about them or how to use them. And even then, the words still aren’t put into context. In many ways FTS is like a concordance: ‘let’s list every word we find in the document, no matter how important it is’. Concordances treat every word equally.
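To make the concordance comparison concrete, here is a minimal Python sketch. It is my own illustration under simple assumptions (lowercase single-word matching, no stemming, no ranking), not how any particular search engine works. It maps every word to the positions where it occurs, so “the” is recorded just as faithfully as “file”, and a search for “edit” finds nothing because only exact words match.

```python
import re
from collections import defaultdict


def build_concordance(text: str) -> dict[str, list[int]]:
    """Map every word (lowercased) to the word positions where it occurs.

    Like a concordance, this treats every word equally: no stop words,
    no weighting, no notion of what the passage is actually about.
    """
    concordance: dict[str, list[int]] = defaultdict(list)
    for position, word in enumerate(re.findall(r"[a-z']+", text.lower())):
        concordance[word].append(position)
    return dict(concordance)


def full_text_search(concordance: dict[str, list[int]], query: str) -> list[int]:
    """Return positions of exact (case-insensitive) matches for one word."""
    return concordance.get(query.lower(), [])


doc = "Save the file before editing. Editing a locked file fails. The file name is editable."
conc = build_concordance(doc)

print(full_text_search(conc, "file"))  # [2, 8, 11]
print(full_text_search(conc, "the"))   # [1, 10]: a trivial word, indexed just as carefully
print(full_text_search(conc, "edit"))  # []: "editing" and "editable" are not matched
```

Real search engines add stemming, stop-word lists and relevance ranking on top of this, but the underlying structure is still a list of where words occur, not of what the text is about.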

Indexes, on the other hand (at least, well-constructed indexes), serve a different purpose: they focus the user’s attention on the words most likely to be relevant.

Well-constructed indexes are created by humans who can detect the nuances and meaning of words in the context of their usage. Indexes created by software (of any kind) just don’t have this sophistication at the moment. A good indexer can create “see” references from unused terms to used terms, and “see also” references to similar terms. Complex indexes also refer users to broader and narrower terms.

Others have mentioned synonyms, and they are one of the prime reasons why human indexers are much better at this than software. An FTS can’t recognize that words such as “editing”, “amending”, “changing”, “altering” and “modifying” (and their variations) all refer to the same idea, yet they could all be used in the document. A human indexer can set one term as the preferred term and refer all other uses to that term, thus offering the user the FULL range of topics that cover anything to do with changing something.
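The preferred-term approach can be sketched as a tiny data structure. This is a minimal, hypothetical Python illustration: the `SEE_REFERENCES` and `INDEX_ENTRIES` tables and the page numbers are invented for the example, and the layout is only one possible style. Each synonym carries a “see” reference to the preferred term, so a reader who looks up any of the words ends up at the same set of pages.

```python
# Hypothetical index data: the indexer picks "editing" as the preferred
# term and points every synonym at it with a "see" reference.
SEE_REFERENCES: dict[str, str] = {
    "amending": "editing",
    "changing": "editing",
    "altering": "editing",
    "modifying": "editing",
}

# Preferred terms map to page locators (invented page numbers).
INDEX_ENTRIES: dict[str, list[int]] = {
    "editing": [12, 27, 41],
}


def look_up(term: str) -> tuple[str, list[int]]:
    """Follow a "see" reference (if any) and return the preferred term's pages."""
    preferred = SEE_REFERENCES.get(term.lower(), term.lower())
    return preferred, INDEX_ENTRIES.get(preferred, [])


def print_index() -> None:
    """Print all headings alphabetically, as they might appear in a printed index."""
    for heading in sorted(set(INDEX_ENTRIES) | set(SEE_REFERENCES)):
        if heading in SEE_REFERENCES:
            print(f"{heading}, see {SEE_REFERENCES[heading]}")
        else:
            pages = ", ".join(str(page) for page in INDEX_ENTRIES[heading])
            print(f"{heading}, {pages}")


print(look_up("Modifying"))  # ('editing', [12, 27, 41])
print_index()
```

The point of the design is that the judgement (which words mean the same thing, and which one is preferred) lives in the `SEE_REFERENCES` table, and a human has to fill that table in.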

Others have also mentioned that print output (NOT screen versions of print, like PDF) doesn’t have FTS, so a good TOC and index are the way users access the information. I believe there is a place for both… and I would be very sad to see the day when indexes are relegated to the trash.

One comment

  1. > Well-constructed indexes are done by humans who can detect nuances and meaning of words—in the context of their usage.

    But Rhonda, you forget something important. Software, including search engines, is written by people. It is up to program designers to create sensible human interfaces that work effectively. There will come a time, in the not too distant future, when FTS will be as powerful as indexing. It has come a long way already. Google search is able to suggest alternatives to typed phrases in a way that appears to take context into consideration. It is probably not very advanced, but I am sure it is evolving and improving. Software should be able to do almost anything humans can do, and that is what is so special about software.
