Archive for the ‘Word’ Category


Word: Finding duplicate words

July 11, 2018

I had a long list (57 pages!) of Latin species names, sorted into alphabetical order. I’d separated the words so that there was only one word on each line. My next task was to go through and remove all the duplicates (i.e. a word immediately followed by the same word) so I could add the final list to my custom dictionary for species in Microsoft Word. I started doing it manually—it’s easy enough to find duplicates when the words are familiar, but for Latin words, my brain just wasn’t coping well and I was missing subtle differences like a single or double ‘i’ at the end of a word. There had to be a better way…

And there is! Good old Dr Google came to the rescue, and with a bit of fiddling to suit my circumstances (one word on each line), I got a wildcard find and replace routine to find the duplicates.

NOTE: DO NOT do a ‘replace all’ with this, in case Word makes unwanted changes. In my case it didn’t treat the second word as a whole word for matching purposes (e.g. it thought banksi and banksii were duplicates). Even though I had to skip some of these, it was still worth it to automate much of the process. Another caveat: if you have several lines of the same word, each pair will be found, but you’ll have to run the find several times to get them all. It’s much better to click into the document and delete the extra duplicates yourself when you find them. You may still have to do a couple of passes over the document, but the heavy lifting will have been done for you.

Here’s what I did to get it to work:

  1. Press Ctrl+H to open the Find and Replace window.
  2. Click More, then select the Use Wildcards checkbox.
  3. In the Find What field, type (<*>)^013\1 (there are no spaces in this string).
  4. In the Replace With field, type \1 (there are no spaces in this string either).
  5. Click Find Next.
  6. When a pair of matching whole words is found, click Replace. NOTE: If the second word is only a partial match for the first word, click Find Next.
  7. Repeat steps 5 and 6 until you’re satisfied you’ve found them all.

How this works:

  • (<*>) is the first element (later represented by \1) of the find. The angle brackets specify the start and end of a word, and the ‘word’ is anything (represented by the *). In other words, you’re looking for a whole ‘word’ of any length and made up of any characters (including numbers).
  • ^013 is the paragraph mark at the end of the line. In my situation, each word was on its own line with a paragraph mark at the end of the line. If your duplicates are on the same line instead, replace ^013 with a space (two repeated words on the same line are separated by a space). NOTE: Normally you can find a paragraph mark in a Find with ^p, but not in a wildcard Find; you have to use ^013.
  • \1 is the first element. In the Find, it means the duplicate of whatever was found by (<*>); in the Replace, it means replace the duplicated word with the first word found.
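If your list also exists as plain text, the same idea can be sketched outside Word. The Python below is only a rough analogue of the wildcard find (Word’s wildcards aren’t regular expressions, and this isn’t something Word runs). It assumes one word per line with \n line breaks, and the function name is hypothetical. Because it checks that the whole next line matches, banksi and banksii are treated as different words, and a lookahead lets runs of three or more identical lines collapse in a single pass.

```python
import re


def remove_adjacent_duplicates(text: str) -> str:
    """Delete any line that is immediately followed by an identical line.

    Hypothetical helper: a rough plain-text analogue of the Word wildcard
    find/replace above, assuming one word per line and newline line breaks.
    """
    # ^(.+)\n  matches a whole line plus its line break.
    # (?=\1$)  only allows the match if the NEXT line is exactly the same
    #          word. It's a lookahead, so the next line isn't consumed and
    #          runs of three or more identical lines all collapse in one pass.
    return re.sub(r"^(.+)\n(?=\1$)", "", text, flags=re.MULTILINE)


species = "banksi\nbanksii\nbanksii\nbanksii\nrobusta\nrobusta"
print(remove_adjacent_duplicates(species))
# prints banksi, banksii and robusta, each on its own line
```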

 


Word: Find all words with two or more capital letters

July 9, 2018

Someone in one of my online editing groups wanted to find all the acronyms and initialisms in their document—any word comprising two or more capital (‘cap’) letters (e.g. AB, CDEF, GHIJK, etc.). They wanted a command that would find each one so they could check it (possibly against a glossary), then click Find Next to jump to the next one.

Wildcards to the rescue!

Here’s how:

  1. Press Ctrl+H to open the Find and Replace window.
  2. Click the Find tab (we only want to find these, not replace them with anything else).
  3. Click More to show further options.
  4. Select the Use wildcards checkbox.
  5. In the Find what field, type: <[A-Z]{2,}>
  6. Click Find Next to find the first string of two or more caps.
  7. Keep clicking Find Next to jump to the next string of two or more caps.

How this works:

  • The opening and closing arrow brackets (< and >) specify that you want a single whole word, not parts of a word. Without these, you would find each set of caps (e.g. in the string ABCDEF, you would find ABCDEF, then BCDEF, then CDEF, then DEF, then EF, before moving on to the next set of caps).
  • [A-Z] specifies that you want a range (the [ ] part) of caps that fall somewhere in the alphabet (A-Z). The range applies to every letter in the word, so if you changed it to [H-M], you’d only find capped words made up entirely of the letters H to M. If you only wanted capped words that start with, say, H through to M, you’d use <[H-M][A-Z]{1,}> instead, and capped words starting with other letters would be ignored.
  • {2,} means you want to find capped words with at least two letters in the specified range (i.e. A-Z). If you only wanted to find two- and three-letter capped words, then you’d change this to {2,3}, and all capped words of four or more letters would be ignored. By not specifying a number after the comma, the ‘find’ will find capped words of any length containing at least two letters.
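To see what <[A-Z]{2,}> is asking for, here’s a rough Python equivalent (a sketch, not something Word uses). \b plays the part of Word’s < and >, and the hypothetical min_len and max_len parameters correspond to the numbers in {2,} or {2,3}. Python’s idea of a word edge isn’t identical to Word’s (digits and underscores count as word characters), so treat it as an illustration of the pattern rather than a replacement for it.

```python
import re
from typing import List, Optional


def find_capped_words(text: str, min_len: int = 2, max_len: Optional[int] = None) -> List[str]:
    """Return whole words made up only of capital letters A-Z.

    Hypothetical helper mirroring Word's <[A-Z]{2,}> (or {2,3} when max_len is set).
    """
    upper = "" if max_len is None else str(max_len)
    # \b stands in for Word's < and > (start and end of a word).
    pattern = rf"\b[A-Z]{{{min_len},{upper}}}\b"
    return re.findall(pattern, text)


sample = "The CSIRO and NASA met the UN, but not McDONALD or I."
print(find_capped_words(sample))        # ['CSIRO', 'NASA', 'UN']
print(find_capped_words(sample, 2, 3))  # ['UN']
```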

 


Word: Random filler for confidential documents

March 1, 2018

Have you ever needed to send a Word document that needs fixing to a Word expert outside your organisation, but are concerned about confidentiality issues? If so, the good people at Office Watch have created a macro that replaces all text with randomly generated letters, but preserves all the formatting, numbering etc. that your Word guru needs to check.

The full VBA code and a description of how the macro works are here: https://office-watch.com/2018/replace-confidential-text-filler-word/

NOTE: DO NOT use this on your real document — make a copy, and run it on the COPY ONLY, then send the copy to the person outside your organisation.
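The Office Watch macro is VBA that works on the Word document itself, and it isn’t reproduced here. As a rough illustration of the underlying idea only, this Python sketch swaps each letter for a random letter of the same case (and each digit for a random digit) while leaving spaces, punctuation and line breaks alone, so the shape of the text survives. The function name and seed parameter are hypothetical, and it handles plain text only, not Word formatting.

```python
import random
import string
from typing import Optional


def scramble_letters(text: str, seed: Optional[int] = None) -> str:
    """Replace letters and digits with random ones of the same kind.

    Hypothetical plain-text helper; it knows nothing about Word formatting.
    """
    rng = random.Random(seed)  # pass a seed to get repeatable output
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            # Spaces, punctuation, accented letters and line breaks are kept,
            # so word lengths and layout stay the same.
            out.append(ch)
    return "".join(out)


print(scramble_letters("Confidential: Project X budget, 2018."))
```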

[Links last checked March 2018]


Word: Transpose Surname, Firstname to Firstname Surname

December 26, 2017

I came across a heap of names styled ‘Surname,<space>Firstname’ (e.g. Smith, Jane) and needed to change them to ‘Firstname Surname’ (i.e. Jane Smith).

As with any find/replace operation, identifying the pattern is the first step. Once you’ve done that, the rest is pretty easy. In this example, the pattern was clear — each surname and first name started with a capital letter followed by one or more lower case letters, there was a comma after the surname, and then a space before the start of the first name. Each surname only had a single first name. Because names vary in length, I needed to use wildcards to specify matching the pattern for any number of letters.

Below is what I came up with for this swap — others more clever than me may have a more elegant way to do this, but this worked for me.

CAUTIONS and WARNINGS:

  • I don’t advise doing a ‘replace all’ with this — if there’s anything else that matches the pattern that ISN’T a name, it will get changed too.
  • This find/replace only finds whole names with a single capital letter (i.e. it finds Smith, Jones, Haythornthwaite, Jane, Rosemary, Jonathan). It does NOT find names with more than one capital (e.g. McDonald, AnnMarie) or with an apostrophe (e.g. O’Malley).
  • Hyphenated words are found, but transpose incorrectly (e.g. Smith, Jane-Ann changes to Jane Smith-Ann not Jane-Ann Smith; similarly Jones-Brown, John changes to Jones-John Brown).
  • Names with a first and middle name or initial will be found but transposed incorrectly (e.g. Smith, Jane K. Susan will become Jane Smith K. Susan instead of Jane K. Susan Smith). Names with an initial instead of a first name will not be found (e.g. Philips, A. is not found).
  • Names separated with anything other than a comma, or that have two or more spaces between the comma and the first name will NOT be found.
  • Names with accents, umlauts, and other diacritical marks over letters (e.g. René) are found and transposed correctly.

Despite all the cautions and warnings above, if you have a long list of names to change, then you could run this find/replace, replacing one at a time and manually fixing the others that aren’t found or that will transpose incorrectly. It’s still quicker than doing them all manually.

Steps:

  1. Save your document.
  2. Press Ctrl+H to open the Find/Replace window.
  3. Click the More button.
  4. Select Use Wildcards.
  5. In the Find What field, enter this (copy it from here and paste as there’s a space in the string of characters): (<[A-Z])([a-z]@>)(, )(<[A-Z])([a-z]@>)
  6. In the Replace With field, enter this (again, copy/paste as there’s a space in here that’s hard to see): \4\5 \1\2
  7. Click Find Next.
  8. Click Replace if it finds a name you want to transpose; if not, click Find Next to go to the next one. (Note: Replace All is super powerful and you could change things you don’t want to, so err on the side of caution and click Find Next > Replace > Find Next until all are done.)

Explanation:

  • Parentheses surround each ‘element’ of the find. These are represented by numbers in the replace (i.e. the 4th set of parentheses in the find becomes \4 in the replace)
  • < indicates the beginning of a word; > indicates the end of a word
  • [A-Z] looks for any upper case letter; [a-z] looks for any lower case letter
  • @ looks for one or more of the character or expression immediately before it (e.g. [a-z]@> looks for one or more lower case letters up to the end of a word; this covers the varying length of names)
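Here’s the same pattern as a rough Python sketch (a plain-text analogue, not how Word does it; the helper name is hypothetical). It reproduces the hyphenation and McDonald caveats listed above. One difference: Python’s [a-z] only covers unaccented letters, so unlike the Word find, a name like René won’t be matched.

```python
import re

# Word's (<[A-Z])([a-z]@>)(, )(<[A-Z])([a-z]@>) translated to Python:
# \b stands in for < and >, and + stands in for Word's @ (one or more).
NAME_PATTERN = re.compile(r"\b([A-Z])([a-z]+)\b(, )\b([A-Z])([a-z]+)\b")


def transpose_names(text: str) -> str:
    """Turn 'Surname, Firstname' into 'Firstname Surname' (hypothetical helper)."""
    # Groups 4 and 5 are the first name; groups 1 and 2 are the surname.
    return NAME_PATTERN.sub(r"\4\5 \1\2", text)


for name in ["Smith, Jane", "Smith, Jane-Ann", "Jones-Brown, John", "McDonald, Ann"]:
    print(name, "->", transpose_names(name))
# Smith, Jane -> Jane Smith
# Smith, Jane-Ann -> Jane Smith-Ann
# Jones-Brown, John -> Jones-John Brown
# McDonald, Ann -> McDonald, Ann
```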

 


Word: Find duplicated words

December 6, 2017

This find/replace is based on Paul Beverley’s work, so full acknowledgement to him for teaching me how to do this via his YouTube videos and his free book.

********

Some of my authors inadvertently type the same word twice (e.g. is is, the the), and it’s often hard to pick these up when editing. If you run spellcheck, you may find them, but there’s no guarantee of that. The find and replace below uses wildcards to find any word that is repeated after a space or a common punctuation mark, and replaces the pair with a single copy of the word.

NOTE: This find/replace only finds words with the exact same case, so it will find ‘the the’, ‘THE THE’, and ‘The The’, but it won’t find instances where each word has the same letters but with different cases (e.g. ‘the The’, ‘The the’, ‘tHe thE’ etc.)

Steps:

  1. Press Ctrl+H to open the Find and Replace dialog box.
  2. Click More, then select the Use wildcards option.
  3. In the Find field, type: (<[A-Za-z]@)[ ,.;:]@\1>
    (Note: There’s a space in there, so I suggest you copy this Find string.)
  4. In the Replace field, type: \1
  5. Click Find Next then click Replace. Repeat.

 

How this works — at least how I *think* it works:

  • Find: Look for the start of any word (<) made up of one or more (@) letters ([A-Za-z]); this is the first element, so it’s in parentheses. Then look for one or more (@) spaces or punctuation marks ([ ,.;:]), followed by the first element again (\1), ending at the end of a word (>) so that, for example, ‘the theory’ isn’t matched.
  • Replace: Replace the whole match with the first element (that’s the \1 bit), which effectively deletes the separator and the second copy of the word.
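For anyone who works with text outside Word, this is roughly the same find translated into a Python regular expression (a sketch, not Paul Beverley’s method or anything Word runs; the function name is hypothetical). Like the Word version, it’s case-sensitive, and the closing word boundary stops it from matching ‘the theory’.

```python
import re

# Python version of Word's (<[A-Za-z]@)[ ,.;:]@\1>
# \b replaces < and >, and + replaces @ (one or more).
DUPLICATE_WORD = re.compile(r"\b([A-Za-z]+)[ ,.;:]+\1\b")


def remove_duplicate_words(text: str) -> str:
    """Collapse 'the the' to 'the' (hypothetical helper; case-sensitive like the Word find)."""
    return DUPLICATE_WORD.sub(r"\1", text)


print(remove_duplicate_words("This is is a test of the the finder."))
# This is a test of the finder.
```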

[Links last checked December 2017]


Word: Find a year followed by a comma and replace with a semicolon

November 22, 2017

Another early morning question posed on Facebook…

The person was trying to use Word’s wildcard find and replace to convert all strings of Authorname nnnn, Authorname nnnn, Authorname nnnn, Authorname nnnn (i.e. any author’s name, followed by a 4-digit number for a year, such as Smith 2005, Jones 1997, etc., followed by a comma, followed by another author’s name etc.). He wanted to convert all the comma separators to semicolons, ending up with Authorname nnnn; Authorname nnnn; Authorname nnnn; Authorname nnnn. (I’ve italicised the text for clarity — it wasn’t in his original.)

Wildcard find/replace is all about identifying the pattern, then working out how to express it so that you find exactly what you want and replace it with what you want.

In this example, an author name always ends in a lower case letter, is followed by a space, then four numbers for the year, a comma, a space, then an upper case letter for the next author’s name. The last item in the list doesn’t quite match the pattern (no comma, space, upper case letter following it), but that one doesn’t need to change so we can ignore that variation to the pattern. He wanted to keep everything except the comma, which he wanted to change to a semicolon.

Here’s how I solved it using Word’s wildcard find and replace (there may be a more elegant solution, but this one worked for me):

  • Find: ([0-9]{4})(,)( )([A-Z]) 
  • Replace: \1;\3\4

If you need to use this, I suggest you copy it as there’s a space in the third set of parentheses that you can’t see.

How this works:

  • Find: Look for any digit from 0 to 9 ([0-9]) repeated exactly 4 times ({4}); this is the first element and is surrounded by parentheses. Then look for a comma (another element, so also surrounded by parens). Next look for a space (wow, more parens), and finally look for any upper case letter ([A-Z]), and as it’s a unique element, surround it by parens too.
  • Replace: Replace the first element (the 4-digit number) with itself (that’s the \1 bit), then a semicolon, then replace the third and fourth elements of the find with themselves (i.e. \3\4).

You keep everything you don’t want to change (elements 1, 3, and 4) and only change the second element by typing a semicolon in between elements 1 and 3.
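The same find/replace carries over almost unchanged to Python’s re.sub, which makes a handy way to try the pattern on some sample text before running it in Word (a sketch; the function name is hypothetical). Note that it would also change something like ‘In 2005, Smith wrote’, so the one-at-a-time Find Next/Replace approach still applies.

```python
import re


def years_to_semicolons(text: str) -> str:
    """Change the comma after a 4-digit year to a semicolon when a capital letter follows."""
    # Group 1 = the year, group 2 = the comma (dropped), group 3 = the space,
    # group 4 = the first letter of the next author's name.
    return re.sub(r"([0-9]{4})(,)( )([A-Z])", r"\1;\3\4", text)


print(years_to_semicolons("Smith 2005, Jones 1997, Brown 2010"))
# Smith 2005; Jones 1997; Brown 2010
```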

 

 


Word: Wildcard replace with a backslash

November 22, 2017

This morning, well before I was properly awake, I solved a problem someone had posed on a Facebook group I’m in. They had an issue getting Word’s wildcard find and replace to do what they wanted and had asked members of the group to help. I’m writing this up for my own future reference as there’s some information in here about the peculiarities of the backslash character that I may need again. [Random fact: The backslash character is known by several names, including the reverse virgule and the reverse solidus.]

The person was trying to find an easy way to find all instances of 3x and replace them with 3\x\. Actually, she was trying to do more than that: if she’d only been looking for 3x, a normal find/replace would have worked. For the rest of the string, however, she really needed to use wildcards. Where she was getting stuck was defining the Find correctly, and then the Replace.

Here’s my solution (using wildcards):

  • Find: (3)(x) 
  • Replace: \1^92\2^92

How this works:

  • First, look for 3 followed immediately by x. I separated them in the Find string with parentheses so that I could treat them as separate elements in the Replace string.
  • Next, for the replace, type \1 to replace the first element (the 3) with itself, then type ^92 to add a backslash character (you can’t type a \ as that won’t work), then \2 to replace the second element of the Find with itself (i.e. the x), then another ^92 for a final backslash character.

Two things to note:

  • The backslash is an escape character in a Find, so if you need to find one, you need to surround it with square brackets and ‘escape’ it — i.e. [\\] in a Find.
  • The backslash is a special character in Replace too as it designates the element you want to replace with itself. Instead, you have to use ^92 in place of a \.
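Python’s regular expressions have the same quirk, for the same reason: the backslash is their escape character too. In this rough sketch (not Word, and the function name is hypothetical), a literal backslash is written as a doubled backslash in the replacement, where Word wants ^92; in a Python Find pattern you’d likewise double it, where Word wants [\\].

```python
import re


def add_backslashes(text: str) -> str:
    """Insert a backslash after the 3 and after the x in every '3x'."""
    # In a Python replacement string, a doubled backslash means one literal
    # backslash (Word uses ^92 for the same job); \1 and \2 are the groups.
    return re.sub(r"(3)(x)", r"\1\\\2\\", text)


print(add_backslashes("3x and 3x"))
# prints: 3\x\ and 3\x\ (a backslash after each 3 and after each x)
```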