Word: Concordance/word list creators

April 6, 2010

Microsoft Word is a pretty powerful tool, but for some reason you cannot get a list of all the words that have been used in a document. You can get the number of words, number of characters, even reading level scores, but not a simple concordance listing all the words used in the document, with the frequency of use. The only ‘concordance’ Word has is a manually created one, related to indexing — definitely not what I wanted. I wanted a list of all words in the document so that I could check them for inconsistencies (e.g. sub-tropical versus subtropical).

So off to the internet I went — only to find that there aren’t many tools out there that do this, and of those, I only found one that is free. Also, some of these tools are pretty complex — possibly unnecessarily so for my purposes — and some are lacking any sort of Help to get you started.

I downloaded and tested three tools, and of those I can only recommend one for generating a list of all words in a document, along with their frequency of use.

The tools I tested were WordSmith, WordListCreator and TextSTAT. (I also checked StyleWriter, a tool I already own, to see if it created a word list; it doesn’t.)

WordSmith

From http://www.lexically.net/wordsmith/index.html

  • The interface was NOT at all intuitive; there was some Help, but it didn’t help much.
  • It doesn’t load Word documents — you have to convert them to plain text (TXT) files first (it took a while to find this out).
  • When I finally did get a document loaded, the results and options were confusing.
  • The trial version has a 20 (?) word limit on what it displays.
  • Overall assessment — confusing!
  • Cost: GBP50 for a single license.
  • Note to any budding technical writer wanting a project to tackle — offer to rewrite the Help!

WordListCreator

From http://wordlistcreator.com/

  • The interface was fairly intuitive, so I didn’t need to refer to the Help.
  • Many functions were limited in the trial version and so I couldn’t test them — instead, I got little pop-up windows telling me to buy the product whenever I tried some of these functions.
  • Overall assessment — I wasn’t prepared to part with my money when I couldn’t test all functions, particularly the ones I needed. Cripple the output/results, but not the functions.
  • Cost: US$19.95

TextSTAT

From http://neon.niederlandistik.fu-berlin.de/en/textstat/ (After several hours of not being able to access their website [hopefully this was a small glitch on the day I tried], I FINALLY got to their web page and downloaded their software.)

  • Simple to install and to run.
  • The interface was pretty intuitive and I rarely needed to access the Help, except to get started (you have to create a ‘corpus’ first, which is like a container file into which you can add your files; you then click the Frequency button to get a word list).
  • It deals with Word documents (no need to convert them to text).
  • Double-clicking a word in the Word List opens the Concordance tab showing that word in context. Double-clicking the line in the Concordance tab opens the Citation tab where you can see the word/line inside its paragraph in the document; you also get a link to the document so you can open it from the TextSTAT application.
  • You can export your word list to a text or CSV file, or to Word or Excel.
  • Overall assessment — it did everything I wanted. It’s a keeper!
  • Cost: Free!
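
For anyone curious about what a concordance view like the one described above involves, here is a minimal Python sketch of a keyword-in-context listing. It is not how TextSTAT works internally; the function name `concordance`, the `width` parameter and the sample sentence are all made up for illustration, and it assumes the document has already been saved as plain text.

```python
import re

def concordance(text, keyword, width=30):
    """Return one line per occurrence of keyword, with up to
    `width` characters of context on either side (hypothetical helper)."""
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        left = text[start:match.start()]
        right = text[match.end():end]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines

# Made-up sample text, not taken from any real document.
sample = "The subtropical zone is wet. Rain falls in the sub-tropical north."
for line in concordance(sample, "subtropical"):
    print(line)
```

Because the match is on whole words, searching for "subtropical" does not find "sub-tropical"; each spelling has to be looked up separately, much as you would double-click each entry in a word list.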

In the example screenshot below, I’ve loaded four similar documents into the corpus, then sorted the words alphabetically. In addition to inconsistent (but likely correct) capitalization, I can see that two variations of the same term are used (sub-tropics and subtropical, highlighted in yellow). I can double-click on each word to find out which document it is used in.
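
The same kind of check can be roughed out in a few lines of Python. This is only a sketch under some assumptions: the text is already plain text (a Word document would need converting first), words are treated case-insensitively, and the names `word_frequencies` and `hyphen_variants` are invented for this example. It counts every word, then groups words that become identical once hyphens are removed.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by hyphens or apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'’][A-Za-z]+)*")

def word_frequencies(text):
    """Count each word in text, ignoring case (hypothetical helper)."""
    return Counter(word.lower() for word in WORD_RE.findall(text))

def hyphen_variants(freqs):
    """Group words that are spelled the same once hyphens are removed."""
    groups = {}
    for word in freqs:
        groups.setdefault(word.replace("-", ""), []).append(word)
    # Keep only the groups with more than one spelling.
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}

# Made-up sample text for illustration.
sample = ("Sub-tropical fruit grows in subtropical regions. "
          "Subtropical storms are common.")
freqs = word_frequencies(sample)
for word, count in sorted(freqs.items()):
    print(f"{word}\t{count}")   # alphabetical word list with frequencies
print(hyphen_variants(freqs))   # {'subtropical': ['sub-tropical', 'subtropical']}
```

This only catches spellings that match exactly once the hyphen is dropped. A pair such as sub-tropics and subtropical has different endings, so it would still need to be spotted by eye in the alphabetical list.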

[Links checked April 2010]

6 comments

  1. Thank you.

    I appreciate the fact that you took the time to evaluate a number of tools including mine (Word List Creator). I am currently doing research myself on the need for and uses of such tools. The ways others use my software are varied – and very different from my original thoughts when I first created it. I wanted a way to help my children read better. I thought that if I could extract distinct vocabulary words then I could make flash cards and quiz them!

    At any rate, I am most interested in how you and others might use this sort of software. This way I can create relevant updates. Other input, such as on the price, is good as well.

    Thanks for a good article.

    Jared


  2. Think this is just what I need for my book, thanks!


  3. Thank you. I had been looking for hours to find this tool to build a word list for creating an automatic index in Word. I was even experimenting with Word macros. Nothing was working. Then thanks to you I found TextSTAT here, what a great little program! It is exactly what I was looking for and then some!


  4. Thanks for your research. My Word5 had this function which I loved. But not on upgrades… no surprise. Will give TextSTAT a try.
    Note to Jared: hope you can use the research. As a writer, I have always found a word count function to be one of the most valuable tools.


  5. THANK YOU VERY MUCH!!!
    This is incredible. I can’t tell you how many hours I spent looking for a tool that would help me create simple lists of words based on a doc file for further processing in traditional CAT tools. Commercial terminology creation tools were very expensive and too complex. The tool you suggested just does the right job, period. It extracts words, sorts them in a list, offers a link to each word’s actual context and, on top of all this, exports the generated list to an XLS file. Dream come true! Thank you.


  6. This looks great… but only halfway there. I need a tool like this one that will list all the words… and the page number(s) on which they appear!!


