Archive for July, 2018


Word: Wildcard find and replace for numbers inside parentheses

July 22, 2018

In a comment on another post (https://cybertext.wordpress.com/2015/07/14/word-wildcard-find-and-replace-for-numbers-and-trailing-punctuation/), AVi asked if there was a way to find percentage numbers (e.g. 56%) that were inside parentheses, and replace them with the same number but without the parentheses — i.e. (56%) becomes 56%.

There is, but it’s a bit trickier than usual because parentheses are also special characters in Word’s find/replace lexicon—these have to be ‘escaped’ for Word to treat them as normal characters and not as special characters.

In figuring this out, I also took into account that there might be single numerals (e.g. 4%), triple numerals (e.g. 125%), and numerals with one or more decimals (e.g. 75.997%).

Here’s what I came up with that worked for all those scenarios:

  1. Press Ctrl+H to open the Find and Replace dialog.
  2. Click More, then select the Use wildcards check box.
  3. In Find What, type: ([\(])([0-9]*%)([\)])
  4. In Replace With, type: \2
  5. Click Find Next, then click Replace once the first is found. Once you’re happy that it works, repeat until you’ve replaced them all.

What the find and replace ‘codes’ mean:

The three elements (each is enclosed in parentheses) of the Find are:

  1. ([\(]): This finds the opening parenthesis. Each element of the Find is enclosed in parentheses, but because parentheses are special wildcard characters in their own right, you need to tell Word to treat the one you’re looking for as a normal text character and not as a special character. To do that, put a backslash ‘\’ (also known as an ‘escape’ character) before the (, AND surround that string in square brackets [ ] (otherwise, it won’t work).
  2. ([0-9]*%): The [0-9] represents any single digit from 0 to 9; the * represents any further characters immediately after that digit (more digits, or a decimal point), so the find isn’t limited to single-digit numbers; and the % symbol says the string found must finish with that symbol. This finds numbers like 2%, 25%, 283%, 25.4%, etc. Because * matches any characters and not just digits, it would also match something like (3 of 50%), which is another reason to check each find before replacing it.
  3. ([\)]): This finds the closing parenthesis. As with the first element, the closing parenthesis is a special wildcard character in its own right, so you ‘escape’ it with a backslash ‘\’ before the ), then surround that string in square brackets.

There are no spaces preceding or trailing any of these elements, or in between them, so if you copy the code from this blog post, get rid of any preceding spaces, otherwise it won’t work.

For the Replace: \2 tells Word to replace everything the Find matched (parentheses included) with just the second element of the Find (i.e. the number followed by the % symbol).
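If you ever need to do the same clean-up on plain text outside Word, the Find and Replace above can be approximated with a regular expression. This is only a rough sketch in Python: Word’s wildcards are not regular expressions, and the names PAREN_PERCENT and strip_percent_parens are made up for this illustration. One deliberate difference is that this pattern only allows digits and an optional decimal part before the %, so it is stricter than Word’s * wildcard.

```python
import re

# Rough regex analogue of the Word wildcard Find ([\(])([0-9]*%)([\)]).
# Assumption: only digits and an optional decimal part may sit between
# the opening parenthesis and the % sign (stricter than Word's *).
PAREN_PERCENT = re.compile(r"\((\d+(?:\.\d+)?%)\)")


def strip_percent_parens(text: str) -> str:
    """Replace e.g. "(56%)" with "56%", leaving other parentheses alone."""
    # \1 here plays the role of \2 in the Word version, because this
    # pattern captures only the number and the % sign.
    return PAREN_PERCENT.sub(r"\1", text)


sample = "Growth was (4%) in May, (125%) in June and (75.997%) overall (see note)."
print(strip_percent_parens(sample))
```

Parentheses around ordinary text, such as (see note), are left untouched because they don’t match the pattern.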

AVi: I hope this solves your problem. Donations to keeping this blog ad-free gratefully accepted (see the link at the top right of the page).

 


Word: Finding duplicate words

July 11, 2018

I had a long list (57 pages!) of Latin species names, sorted into alphabetical order. I’d separated the words so that there was only one word on each line. My next task was to go through and remove all the duplicates (i.e. a word immediately followed by the same word) so I could add the final list to my custom dictionary for species in Microsoft Word. I started doing it manually—it’s easy enough to find duplicates when the words are familiar, but for Latin words, my brain just wasn’t coping well and I was missing subtle differences like a single or double ‘i’ at the end of a word. There had to be a better way…

And there is! Good old Dr Google came to the rescue, and with a bit of fiddling to suit my circumstances (one word on each line), I got a wildcard find and replace routine to find the duplicates.

NOTE: DO NOT do a ‘replace all’ with this, in case Word makes unwanted changes. In my case it didn’t treat the second word as a whole word for matching purposes (e.g. it thought banksi and banksii were duplicates). Even though I had to skip some of these, it was still worth it to automate much of the process. Another caveat—if you have several lines of the same word, each pair will be found, but you’ll have to run the find several times to get them all. Much better to move your cursor into Word and delete the excess multiple duplicates when you find them. You may still have to do a couple of passes over the document, but the heavy lifting will have been done for you.

Here’s what I did to get it to work:

  1. Press Ctrl+H to open the Find and Replace window.
  2. Click More, then select the Use Wildcards checkbox.
  3. In the Find What field, type (<*>)^013\1 (there are no spaces in this string).
  4. In the Replace With field, type \1 (there are no spaces in this string either).
  5. Click Find Next.
  6. When a pair of matching whole words is found, click Replace. NOTE: If the second word is only a partial match for the first word, click Find Next.
  7. Repeat steps 5 and 6 until you’re satisfied you’ve found them all.

How this works:

  • (<*>) is the first element (later represented by \1) of the find. The angle brackets specify the start and end of a word, and the ‘word’ is anything (represented by the *). In other words, you’re looking for a whole ‘word’ of any length and made up of any characters (including numbers).
  • ^013 is the paragraph marker at the end of the line. In my situation, each word was on its own line with a paragraph mark at the end of the line. If you don’t have this situation, leave this out and replace it with a space (two repeated words in the same line are separated by a space). NOTE: Normally you can find a paragraph mark in a Find with ^p, but not with a wildcard Find—you have to use ^013.
  • \1 is the first element. In the Find, it means the duplicate of whatever was found by (<*>); in the Replace, it means replace the duplicated word with the first word found.
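For a list that has been saved as plain text (one word per line), the same idea can be sketched in a few lines of Python. This is an illustration under that assumption, not a replacement for the Word routine, and the function name remove_adjacent_duplicates is hypothetical. Because it compares whole lines, it avoids the partial-match problem noted above, and it collapses runs of three or more identical words in a single pass.

```python
def remove_adjacent_duplicates(lines):
    """Return the words with each run of identical neighbours collapsed to one."""
    result = []
    for line in lines:
        word = line.strip()
        # Whole-line comparison: "banksi" and "banksii" stay distinct.
        if result and result[-1] == word:
            continue  # same as the previous word, so skip it
        result.append(word)
    return result


species = ["alba", "alba", "banksi", "banksii", "banksii", "banksii", "coccinea"]
print(remove_adjacent_duplicates(species))
```

As with the Word version, this only removes duplicates that sit next to each other, so the list needs to be sorted first.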

 


Word: Find all words with two or more capital letters

July 9, 2018

Someone in one of my online editing groups wanted to find all the acronyms and initialisms in their document—any word comprising two or more capital (‘cap’) letters (e.g. AB, CDEF, GHIJK, etc.). They wanted a command that would find each one so they could check it (possibly against a glossary), then click Find Next to jump to the next one.

Wildcards to the rescue!

Here’s how:

  1. Press Ctrl+H to open the Find and Replace window.
  2. Click the Find tab (we only want to find these, not replace them with anything else).
  3. Click More to show further options.
  4. Select the Use wildcards checkbox.
  5. In the Find what field, type: <[A-Z]{2,}>
  6. Click Find next to find the first string of two or more caps.
  7. Keep clicking Find next to jump to the next string of two or more caps.

How this works:

  • The opening and closing angle brackets (< and >) specify that you want a single whole word, not parts of a word. Without these, you would find each set of caps (e.g. in the string ABCDEF, you would find ABCDEF, then BCDEF, then CDEF, then DEF, then EF, before moving on to the next set of caps).
  • [A-Z] specifies that you want a range (the [ ] part) of caps that fall somewhere in the alphabet (A-Z). If you only wanted capped words that started with, say, H through to M, then you’d change the range to [H-M] and all other capped words starting with other letters would be ignored.
  • {2,} means you want to find capped words with at least two letters in the specified range (i.e. A-Z). If you only wanted to find two- and three-letter capped words, then you’d change this to {2,3}, and all capped words of four or more letters would be ignored. By not specifying a number after the comma, the ‘find’ will find capped words of any length containing at least two letters.
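The same search can be approximated on plain text with a regular expression, where \b (a word boundary) stands in for Word’s < and > markers. This is a hedged sketch in Python rather than anything Word itself runs; the names ALL_CAPS_WORD and find_caps_words are invented for the example. Note that Python counts digits and underscores as word characters, so a string like AB12 would not be reported here.

```python
import re

# Rough regex analogue of the Word wildcard Find <[A-Z]{2,}>.
# \b marks a word boundary, standing in for Word's < and >.
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def find_caps_words(text: str) -> list:
    """Return every whole word made up of two or more capital letters."""
    return ALL_CAPS_WORD.findall(text)


sample = "The WHO and UNESCO met in Perth; A draft went to the EU, not the Uk."
print(find_caps_words(sample))

# Limiting the length to two or three letters, as with {2,3} in Word:
print(re.findall(r"\b[A-Z]{2,3}\b", sample))
```

Once you have the list, you could compare it against a glossary in whatever way suits your workflow.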