Word: Concordance/word list creators

April 6, 2010

Microsoft Word is a pretty powerful tool, but for some reason you cannot get a list of all the words that have been used in a document. You can get the number of words, number of characters, even reading level scores, but not a simple concordance listing all the words used in the document, with the frequency of use. The only ‘concordance’ Word has is a manually created one, related to indexing — definitely not what I wanted. I wanted a list of all words in the document so that I could check them for inconsistencies (e.g. sub-tropical versus subtropical).
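For readers who would rather script this than install a tool, the kind of list described above can be sketched in a few lines of Python. This is only a rough illustration, not how any of the tools below work: the `WORD_RE` pattern, the `word_frequencies` name and the sample sentence are all assumptions made for the example, and the text is assumed to already be plain text.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters/digits, optionally joined by
# internal hyphens or apostrophes, so "sub-tropical" counts as one word.
WORD_RE = re.compile(r"[^\W_]+(?:[-'’][^\W_]+)*")


def word_frequencies(text):
    """Count each distinct word form exactly as it is spelled in the text."""
    return Counter(WORD_RE.findall(text))


# Hypothetical sample text, standing in for a real document.
sample = "The sub-tropical zone borders the subtropical zone. The zone is warm."

# Print the list alphabetically (ignoring case), one word and count per line.
for word, count in sorted(word_frequencies(sample).items(),
                          key=lambda item: item[0].lower()):
    print(f"{word}\t{count}")
```

Keeping capitalization and hyphens intact is deliberate here: the point of the list is to make variants such as sub-tropical and subtropical show up as separate entries.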

So off to the internet I went — only to find that there aren’t many tools out there that do this, and of those, I only found one that is free. Also, some of these tools are pretty complex — possibly unnecessarily so for my purposes — and some are lacking any sort of Help to get you started.

I downloaded and tested three tools, and of those I can only recommend one for generating a list of all words in a document, along with their frequency of use.

The tools I tested were WordSmith, WordListCreator and TextSTAT. (I also checked StyleWriter, a tool I already own, to see if it created a word list; it doesn’t.)


WordSmith

From http://www.lexically.net/wordsmith/index.html

  • The interface was NOT at all intuitive; there was some Help, but it didn’t help much.
  • It doesn’t load Word documents — you have to convert them to plain text (TXT) files first (it took a while to find this out).
  • When I finally did get a document loaded, the results and options were confusing.
  • The trial version has a 20 (?) word limit on what it displays.
  • Overall assessment — confusing!
  • Cost: GBP50 for a single license.
  • Note to any budding technical writer wanting a project to tackle — offer to rewrite the Help!
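The plain-text conversion step mentioned in the list above can also be scripted. A .docx file is a ZIP archive whose body text lives in `word/document.xml`, so a rough sketch using only the Python standard library might look like the following. The function name `docx_to_text` is made up for this example. The sketch ignores headers, footers, footnotes and the older binary .doc format, and it has not been tested against documents with tracked changes, so working on a copy with all changes accepted is still the safer route.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text runs that make up this paragraph.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling `docx_to_text("report.docx")` (a hypothetical file name) would return a string that could then be saved as a TXT file.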


TextSTAT

From http://neon.niederlandistik.fu-berlin.de/en/textstat/ (I couldn’t access their website for several hours, which was hopefully just a small glitch on the day I tried, but I FINALLY got to their web page and downloaded their software.)

  • Simple to install and to run.
  • The interface was pretty intuitive and I rarely needed to access the Help, except to get started (you have to create a ‘corpus’ first, which is like a container file into which you can add your files, then you click the Frequency button to get a Word List).
  • It deals with Word documents (no need to convert them to text; however, if you use track changes, work on a copy of the doc with all tracked changes accepted, otherwise you’ll get some odd results).
  • Double-clicking a word in the Word List opens the Concordance tab showing that word in context. Double-clicking the line in the Concordance tab opens the Citation tab where you can see the word/line inside its paragraph in the document; you also get a link to the document so you can open it from the TextSTAT application.
  • You can export your word list to a text or CSV file, or to Word or Excel.
  • Overall assessment — it did everything I wanted. It’s a keeper!
  • Cost: Free!
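To give a feel for what a concordance view like the one described above does, here is a minimal keyword-in-context sketch in Python. It is not TextSTAT’s implementation. The `concordance` function, its `width` parameter and the bracket formatting are assumptions made for this illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return one line of context for each whole-word match of keyword."""
    # Collapse line breaks and repeated spaces so each result fits on one line.
    text = " ".join(text.split())
    # Match the keyword only when it is not part of a longer or hyphenated word.
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(keyword) + r"(?![\w-])", re.IGNORECASE
    )
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


# Hypothetical sample text, standing in for a real document.
sample = ("Rain falls in the subtropical north. The sub-tropical south "
          "is drier. Subtropical storms form offshore.")
for line in concordance(sample, "subtropical"):
    print(line)
```

Matching is case-insensitive but otherwise exact, so the hyphenated form has to be looked up separately, much like double-clicking each variant in the word list.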

In the example screen shot below, I’ve loaded four similar documents into the corpus, then arranged the words found alphabetically. In addition to inconsistent (but likely correct) capitalization, I can see that two variations of the same term are used (sub-tropics and subtropical, highlighted in yellow).  I can double-click on each word to find out which document it is used in.
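The same sort of check can be roughly automated once you have the text. The Python sketch below groups words that differ only by hyphenation, ignoring capitalization, and reports any group with more than one spelling. The names `WORD_RE` and `spelling_variants` and the sample text are assumptions for this example, and it only catches the same word written with and without a hyphen (sub-tropical versus subtropical), not related forms such as sub-tropics.

```python
import re
from collections import Counter, defaultdict

# Assumption: a "word" is a run of letters/digits, optionally joined by
# internal hyphens or apostrophes.
WORD_RE = re.compile(r"[^\W_]+(?:[-'’][^\W_]+)*")


def spelling_variants(text):
    """Map each de-hyphenated word to its spellings, where more than one exists."""
    counts = Counter(WORD_RE.findall(text))
    groups = defaultdict(dict)
    for word, count in counts.items():
        form = word.lower()            # capitalization differences are ignored
        key = form.replace("-", "")    # hyphenation differences are what we want
        groups[key][form] = groups[key].get(form, 0) + count
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


# Hypothetical sample text, standing in for a real document.
sample = "3-D films and 3D films; 3D again. Sub-tropical and subtropical."
for key, forms in spelling_variants(sample).items():
    print(key, forms)
```

Each reported group shows the competing spellings with their counts, which is usually enough to decide which form to standardize on.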

See also: Blog post on how to get word frequency using Calibre: https://cybertext.wordpress.com/2021/06/30/use-calibre-to-get-a-word-frequency-list/

[Links last checked June 2021]


  1. Thank you.

    I appreciate the fact that you took the time to evaluate a number of tools including mine (Word List Creator). I am currently doing research myself on the need for and uses of such tools. The ways others use my software are varied – and very different from my original thoughts when I first created it. I wanted a way to help my children read better. I thought that if I could extract distinct vocabulary words then I could make flash cards and quiz them!

    At any rate, I am most interested in how you and others might use this sort of software. This way I can create relevant updates. Other input, such as on the price, is good as well.

    Thanks for a good article.


  2. Think this is just what I need for my book, thanks!

  3. Thank you. I had been looking for hours to find this tool to build a word list for creating an automatic index in Word. I was even experimenting with Word macros. Nothing was working. Then thanks to you I found TextSTAT here, what a great little program! It is exactly what I was looking for and then some!

  4. Thanks for your research. My Word5 had this function which I loved. But not on upgrades… no surprise. Will give TextSTAT a try.
    Note to Jarad: hope you can use the research, as I have, as a writer, always found a word count function to be one of the most valuable tools.

  5. This is incredible. I can’t tell you how many hours I spent looking for a tool that would help me create simple lists of words based on a doc file for further processing in traditional CAT tools. Commercial terminology creation tools were very expensive and too complex. The tool you suggested just does the right job, period. It extracts words, sorts them in a list, offers a link to each word’s actual context and, on top of all this, exports the generated list to an xls file. Dream come true! Thank you.

  6. This looks great… but only halfway there. I need a tool like this one that will list all the words… and the page number(s) on which they appear!!

  7. Thank you! TextStat is exactly what I was looking for. I export to Excel and remove all conjunctions and prepositions… amazing results!

  8. Thank you! You’re awesome!

  9. The ‘Calibre’ epub editor has a word list feature. Calibre’s primary function is ebook library management, however the ePub editor is a separate program within the larger ‘package’ – ebook.editor.exe. You can convert the DOCX to ePub, by importing it into the editor.

    The word list is available from the editor’s menu via Tools/Reports. It shows occurrence counts and language (assuming ‘foreign’ words in the DOCX are set appropriately), the three columns (word, language and times used) are sortable, and you can filter the list using regular expressions. The word list can be saved to a CSV file. Double clicking on a word will step through the occurrences in the XHTML code window.

    Calibre is open source and available for Windows, Linux and OSX; there is also a Portable version for Windows. Website ==>> http://calibre-ebook.com/

    I have tried all the tools mentioned here, and a couple of others whose names escape me, and I find them too elaborate.

  10. I went to “wordlistcreator.com” and it says “Home Improvement Junkie” and no sign of software.

    I have a way to generate a word list in an RTF -> TXT file, but it doesn’t keep a count of usages, which can be invaluable when you’re checking a book for spelling consistency, particularly of names. I’ve downloaded TextStat to check it out, and I am grateful to rightpaddock for pointing out that the calibre book editor can do the same job.

    Thank you, cybertext.

  11. Thanks Ed Bear — I’ll update the post. Those links were last checked when I wrote it in 2010.

  12. You are most welcome. I just ran a quick test of Calibre’s edit book “Reports” and exported the wordlist to a CSV file. rightpaddock has just made my week! I’m proofreading a science fiction book, and the word count shows one occurrence of “3-D” and three occurrences of “3D”. My spelling and usage consistency checks are going to be much easier uptime.

  13. For consistency checking like this, I HIGHLY recommend PerfectIt. You can add your own customisations, and it’s been worth every penny I’ve paid for it. I’m a very happy user. It’s available from http://www.intelligentediting.com and you can try it for free. Check out the video tutorials to get a feel for how it works and how you can customise it to your own needs.

  14. Thanks. As a longtime user of WordPerfect (the superior word processor), which already contains a concordance function, I was glad to see that someone had corrected Microsoft’s failure to meet its users’ needs (not unusual for them). Now, if I could only get them to create their own “Reveal Codes” feature, I’d consider using Word (sometimes).

  15. […] decade ago I investigated some software for getting a concordance or word frequency list (https://cybertext.wordpress.com/2010/04/06/word-concordanceword-list-creators/). The tools listed in that post are still available and TextSTAT is quick and easy to download and […]
