Mapping changing language

July 30, 2015

Based on a Writing Tip I wrote for my team…


English is a changing language. Some changes take centuries; others are much quicker. Changes that apply in Australia may not apply in the UK or US, and vice versa.

This Writing Tip discusses some language changes and shows how computer algorithms can plot these graphically. My focus is on the paired words that start as separate words, morph into a hyphenated form, and then become a single word.

I’ll start with an easy one that changed very quickly: database. At one point in the 1960s and 70s it was two separate words (‘data base’), then briefly went through a hyphenated phase (‘data-base’), before settling down and becoming the single word we’re all familiar with (‘database’).


Likewise, ‘email’ – it started as ‘e mail’, became ‘e-mail’ for quite a while, and is now accepted by most style guides as the single word, ‘email’, though ‘e-mail’ is still hanging on in usage.


Many words are at different stages of morphing into a single word – some will never get there, others may remain hyphenated for a long time (decades, centuries), while others leap into single word status very easily. Who decides when a word pair becomes one? Lexicographers (the people who write dictionaries), who follow usage patterns when deciding whether to hyphenate, join, or leave separate a word pair. But lexicographers can’t keep up with usage patterns that change very quickly, so dictionaries are just one guide as to how to deal with a word pair.

More recently, computer algorithms (such as the one behind Google Books Ngram Viewer: https://books.google.com/ngrams/) have been plotting usage across centuries of writing (since 1800). When I first wrote this, I couldn’t see any way for casual users of the Ngram Viewer to separate books written in American English from those written in British English (and forget about Australian English – I doubt many books in the Google Books database use Australian English). Update: as a commenter points out below, you can choose the dataset (American, British, etc.) in the ‘from the corpus’ dropdown box when you set up the query, though Australian English isn’t listed. This means that the results, while fascinating to some, don’t take the place of dictionaries in the ‘home’ language. Usage graphs also don’t take the place of style guides put out by professional associations and societies (see this article on whether a honey bee should be a ‘honeybee’ or not: http://entomologytoday.org/2014/05/06/is-it-honey-bee-or-honeybee-bed-bug-or-bedbug-house-fly-or-housefly/).

I used Google Books Ngram Viewer to track down a few words that we use, which have several variations, to see when usage changed. For each one, I searched for three forms: separate words, hyphenated, and joined. What these graphs don’t show is whether the word is used as a noun, adjective, verb, or something else – context can dictate how you deal with a word pair (e.g. ‘to start up’ something [two words, verb] is quite different to ‘start-up activities’ [adjectival phrase]). Update: I’ve since found out that you can specify the part of speech; see the ‘Part-of-speech’ information on this page: http://books.google.com/ngrams/info
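For anyone who would rather generate the search terms than type them, here is a minimal Python sketch of the three-form search described above. The function names are my own invention for illustration and are not part of any Google tool. The part-of-speech tags are the ones I understand the Ngram Viewer info page to list (a word followed by an underscore and a tag, such as `email_NOUN`), so check that page before relying on them.

```python
# Part-of-speech tags as I understand them from the Ngram Viewer info page
# (an assumption worth re-checking against that page).
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "NUM", "CONJ", "PRT"}


def word_pair_variants(first, second):
    """Return the separate, hyphenated, and joined forms of a word pair."""
    return [f"{first} {second}", f"{first}-{second}", f"{first}{second}"]


def with_pos(word, tag):
    """Append a part-of-speech tag to a single word, e.g. 'email_NOUN'."""
    tag = tag.upper()
    if tag not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {tag}")
    if " " in word:
        raise ValueError("this helper only tags single words")
    return f"{word}_{tag}"


def search_box_text(terms):
    """Join search terms with commas, ready to paste into the search box."""
    return ",".join(terms)


if __name__ == "__main__":
    print(search_box_text(word_pair_variants("data", "base")))
    print(with_pos("email", "noun"))
```

Pasting the first printed line into the search box compares all three forms of ‘database’ on one graph.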

If you’re interested in language, try the Ngram Viewer with your own words to see when words came into usage (just enter one word), how they changed (enter variations of the word), etc.
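If you want to share a search with someone, the following sketch builds a link for one word or several variations. It assumes the graph lives at `https://books.google.com/ngrams/graph` and accepts `content`, `year_start`, and `year_end` parameters, which is what I see in the browser address bar rather than anything documented here; the default years are arbitrary choices of mine.

```python
from urllib.parse import urlencode

# Assumed address of the graph page (taken from the browser address bar).
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def ngram_url(terms, year_start=1800, year_end=2000):
    """Build a link that plots the given terms on one graph."""
    params = {
        "content": ",".join(terms),  # comma-separated, as in the search box
        "year_start": year_start,
        "year_end": year_end,
    }
    # urlencode escapes spaces and commas so the link is safe to paste.
    return f"{NGRAM_GRAPH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(ngram_url(["e mail", "e-mail", "email"]))
    print(ngram_url(["groovy"], year_start=1840))
```

Entering one term shows when a word came into usage; entering several shows how the variations compare over time.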

This last one compares ‘color’ and ‘colour’, but doesn’t separate out the British and American English instances. Surprisingly, ‘colour’ was the more common spelling until about 1890, when ‘color’ came into prominence. Was this a reflection of language usage, or did the Google Books project simply scan more British English books than American English ones from the 1800s, and more American than British ones after the 1890s?
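Reading the changeover year off a graph by eye is a bit rough. If you had the yearly figures for two spellings, a sketch like this would find the first year in which one overtakes the other. The function name is hypothetical, and the numbers in the example are made up purely to show the mechanics; they are not real Ngram Viewer data.

```python
def first_crossover_year(years, series_a, series_b):
    """Return the first year in which series_b overtakes series_a.

    'Overtakes' means series_b is higher in that year after being
    equal or lower in the previous one. Returns None if it never happens.
    """
    previous_b_ahead = None
    for year, a, b in zip(years, series_a, series_b):
        b_ahead = b > a
        if previous_b_ahead is False and b_ahead:
            return year
        previous_b_ahead = b_ahead
    return None


if __name__ == "__main__":
    # Invented illustrative values, NOT real frequencies from Google Books.
    years = [1860, 1870, 1880, 1890, 1900]
    older_spelling = [5, 5, 4, 3, 3]
    newer_spelling = [2, 3, 3, 4, 5]
    print(first_crossover_year(years, older_spelling, newer_spelling))
```

A crossover found this way still can’t tell you why it happened; the question of which books were scanned remains.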


For fun, I entered a single four-letter word and came up with this graph, which clearly shows that the word was used in printed books until the 1820s and then did not appear again in books until the late 1950s! Those Victorians had a lot of influence…



And I entered another word (‘gay’) that has changed meaning over the past 100 years, dropping out of usage as that change occurred (from the 1940s to 1980s), then picking up its current usage from the early 1980s:


I also checked the stats for ‘awesome’, which show a big swing upwards from the 1960s onwards:


And in the same vein, I checked ‘groovy’, which surprisingly was used as far back as the 1840s, but had its heyday in the 1960s and 70s, with a resurgence in the late 1990s/early 2000s:


Fascinating stuff! I could play all day…

See also: http://www.brainpickings.org/2014/01/17/uncharted-big-data/

[Links last checked Aug 2015]


  1. You choose what dataset to use (American, British, French, etc) in the dropdown box “from the corpus” when you set up the query

  2. Yes, but Australian English isn’t listed.
