Word: Concordance/word list creators
April 6, 2010
Microsoft Word is a pretty powerful tool, but for some reason you cannot get a list of all the words that have been used in a document. You can get the number of words, number of characters, even reading level scores, but not a simple concordance listing all the words used in the document, with the frequency of use. The only ‘concordance’ Word has is a manually created one, related to indexing — definitely not what I wanted. I wanted a list of all words in the document so that I could check them for inconsistencies (e.g. sub-tropical versus subtropical).
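If you're comfortable running a small script, the basic idea of a word list with frequencies is easy to sketch. Below is a minimal Python illustration. It is not how any of the tools reviewed here actually work, and the function names are hypothetical. It assumes the document has already been saved as plain text, and it treats a "word" as a run of letters, optionally joined by hyphens or apostrophes, so that sub-tropical stays a single entry instead of splitting into "sub" and "tropical".

```python
import re
from collections import Counter

# Assumed definition of a "word": a run of letters (any alphabet), optionally
# joined by internal hyphens or apostrophes, so "sub-tropical" stays whole.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def word_frequencies(text: str) -> Counter:
    """Count every word in text, keeping the original capitalization."""
    return Counter(WORD_RE.findall(text))


def sorted_word_list(freqs: Counter) -> list[tuple[str, int]]:
    """Return (word, count) pairs sorted alphabetically, ignoring case first."""
    return sorted(freqs.items(), key=lambda item: (item[0].lower(), item[0]))


def print_word_list(freqs: Counter) -> None:
    """Print one tab-separated 'word<TAB>count' line per distinct word."""
    for word, count in sorted_word_list(freqs):
        print(f"{word}\t{count}")
```

Counts stay case-sensitive on purpose, so "The" and "the" appear as separate entries that sort next to each other. For a hypothetical file report.txt, `print_word_list(word_frequencies(open("report.txt", encoding="utf-8").read()))` would print the whole list.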
So off to the internet I went — only to find that there aren’t many tools out there that do this, and of those, I only found one that is free. Also, some of these tools are pretty complex — possibly unnecessarily so for my purposes — and some are lacking any sort of Help to get you started.
I downloaded and tested three tools, and of those I can only recommend one for generating a list of all words in a document, along with their frequency of use.
The tools I tested were WordSmith, WordListCreator and TextSTAT. (I also checked StyleWriter, a tool I already own, to see whether it creates a word list; it doesn’t.)
WordSmith
- The interface was NOT at all intuitive; there was some Help, but it didn’t help much.
- It doesn’t load Word documents — you have to convert them to plain text (TXT) files first (it took a while to find this out).
- When I finally did get a document loaded, the results and options were confusing.
- The trial version limits how many words it displays (to 20, I think).
- Overall assessment — confusing!
- Cost: GBP50 for a single license.
- Note to any budding technical writer wanting a project to tackle — offer to rewrite the Help!
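Because WordSmith wants plain text, it may help to know that a .docx file is just a zip archive whose body text lives in word/document.xml. The Python sketch below uses only the standard library to pull out that text, one paragraph per line. The helper names are hypothetical and are not part of WordSmith. It assumes the newer .docx format; older binary .doc files need Word’s own Save As > Plain Text instead. Headers, footers and footnotes are stored in other parts of the file and aren’t read, and complex layouts such as text boxes may not come out cleanly.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for the elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path) -> str:
    """Return the main body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Word splits a paragraph into "runs"; join every <w:t> text piece.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def convert_docx_to_txt(docx_path, txt_path) -> None:
    """Write the extracted body text to a UTF-8 plain-text (TXT) file."""
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(docx_to_text(docx_path))
```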
WordListCreator
- The interface was fairly intuitive, so I didn’t need to refer to the Help.
- Many functions were restricted in the trial version, so I couldn’t test them; instead, whenever I tried one, a little pop-up window told me to buy the product.
- Overall assessment — I wasn’t prepared to part with my money when I couldn’t test all functions, particularly the ones I needed. Cripple the output/results, but not the functions.
- Cost: US$19.95
TextSTAT
From http://neon.niederlandistik.fu-berlin.de/en/textstat/ (it took several hours before I could reach the website, hopefully just a glitch on the day I tried, but I FINALLY got there and downloaded the software).
- Simple to install and to run.
- The interface was pretty intuitive, and I rarely needed the Help except to get started: you first create a ‘corpus’ (a container file to which you add your documents), then click the Frequency button to get a word list.
- It deals with Word documents (no need to convert them to text).
- Double-clicking a word in the Word List opens the Concordance tab showing that word in context. Double-clicking the line in the Concordance tab opens the Citation tab where you can see the word/line inside its paragraph in the document; you also get a link to the document so you can open it from the TextSTAT application.
- You can export your word list to a text or CSV file, or to Word or Excel.
- Overall assessment — it did everything I wanted. It’s a keeper!
- Cost: Free!
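TextSTAT’s Concordance tab is a classic keyword-in-context (KWIC) view: every occurrence of a word, shown with a little of the surrounding text. Here is a rough Python sketch of the idea. The function name and the 30-character window are arbitrary choices, not anything taken from TextSTAT. Matching ignores case, and hyphens count as part of a word, so a search for tropical doesn’t match inside sub-tropical or subtropical.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for each whole-word match of keyword.

    Each line shows up to `width` characters on either side of the match,
    with line breaks and runs of spaces collapsed to single spaces.
    """
    flat = " ".join(text.split())
    # Hyphens and word characters on either side mean "not a whole word".
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(keyword) + r"(?![\w-])", re.IGNORECASE
    )
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines
```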
In the example screenshot below, I’ve loaded four similar documents into the corpus, then sorted the words alphabetically. Apart from some inconsistent (but probably correct) capitalization, I can see that the same term appears both hyphenated and unhyphenated (sub-tropics and subtropical, highlighted in yellow). I can double-click each word to find out which document it’s used in.
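Once you have a word list, spotting hyphenation variants can be partly automated. This hypothetical Python helper takes word counts (for example, a Counter built from a plain-text file), groups words that become identical when hyphens are removed and case is ignored, and reports only the groups that contain more than one hyphenation. It has clear limits: it would flag sub-tropical versus subtropical, but not the sub-tropics/subtropical pair in the screenshot, because the endings differ. Reading through the alphabetical list is still worthwhile.

```python
from collections import Counter


def hyphenation_variants(freqs: Counter) -> dict[str, list[str]]:
    """Find words that match once hyphens are removed and case is ignored.

    Returns {normalized form: sorted variants}, keeping only groups whose
    members differ in hyphenation (not merely in capitalization).
    """
    groups: dict[str, set[str]] = {}
    for word in freqs:
        key = word.lower().replace("-", "")
        groups.setdefault(key, set()).add(word)
    result = {}
    for key, forms in groups.items():
        # "The"/"the" collapse to one lowercase form, so they are skipped.
        if len({form.lower() for form in forms}) > 1:
            result[key] = sorted(forms)
    return result
```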
[Links checked April 2010]