Posts Tagged ‘find and replace’


Word: Add a character to all items in a long list, and style it in a colour

July 27, 2018

Here was a tricky one posed by my husband first thing this morning. He had a very long column in a table (some 2750+ rows), in which he had some sort of code, like a product identifier. Fortunately, the codes all followed the same pattern: they were all K-xxxx, where ‘xxxx’ was a 4-digit number (e.g. K-1234, K-5432, etc.).

He wanted to add a zero immediately after the hyphen. Easy enough. But he wanted this zero character to be blue! Hmmm… (and no, I have no idea why! Update: He had a list of catalogue numbers, but when the company used up all the 4-digit numbers, they changed to 5-digit numbers. To sort them correctly by catalogue number in Word, he needed to add a leading 0 to the 4-digit ones, but he wanted to show that the 0 wasn’t part of the original number, thus the colour.)

After a few minutes of testing I achieved what he wanted. I had to do three find and replace passes, with one of them a wildcard find/replace: the first pass added the 0, the second changed the colour of the hyphen and its trailing zero to blue, and the third changed the colour of the hyphen back to black, leaving just the 0 after the hyphen in blue text.

NOTE: If you’re doing something like this on your own document, either work on a COPY until you’ve refined the procedure and know you won’t inadvertently replace something you shouldn’t have, OR at the ‘Replace All’ steps below, click ‘Replace’ instead, followed by ‘Find Next’. You will have many more clicks to do, but it’s a safer option.

Here’s what I did:

First pass – add a zero after the hyphen:

  1. Open the Find/Replace window (Ctrl+H).
  2. In the Find What field, type K-
  3. In the Replace With field, type K-0
  4. Select the list (or column in a table) you want to apply this change to
  5. Click Replace All. This adds a zero after the K-, so you end up with codes like K-01234, K-05432, etc.
  6. Leave the Find and Replace window open.

Second pass – make the hyphen and the zero another colour:

  1. For the second pass, click outside the selection to position the cursor away from it (I had to do this because as soon as I entered the wildcard string for the colour, ALL the selected text changed to blue, without me even clicking Replace All — very strange).
  2. In the Find/Replace window, click More.
  3. Select the Use wildcards checkbox.
  4. In the Find What field, type (-)(0) (there are NO spaces in this string).
  5. In the Replace With field, type \1\2 (there are NO spaces in this string).
  6. With your cursor still in the Replace With field, click Format.
  7. Select Font.
  8. Choose a colour from the Font color drop down.
  9. Click OK.
  10. Check the Replace With field — it should have Font color: <name of colour> below the field. The only thing below the Find What field should be Use wildcards. If you have something different, repeat these steps, and make sure you follow Step 6 exactly.
  11. Select the list (or column) again.
  12. Click Replace All. This changes the hyphen and trailing zero to the colour you selected.
  13. Leave the Find and Replace window open.

Third pass – remove the colour from the hyphen:

  1. For the third and final find and replace pass, click outside the selection to position the cursor away from it. Don’t forget to do this!
  2. Clear the Use wildcards checkbox.
  3. In the Find What field, delete the existing characters, then type a hyphen.
  4. In the Replace With field, delete the existing characters, then type a hyphen.
  5. With your cursor still in the Replace With field, click Format, select Font, then in the Font Color drop-down box, select Automatic (or another font colour).
  6. Select the list (or column) again.
  7. Click Replace All. This changes the hyphen colour to the colour you selected in Step 5.
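
If the same list lived in a plain text file rather than a Word table, the text part of the first pass could be sketched in Python. This is only an illustration using Python's `re` syntax (not Word's wildcards); the function name is made up, and requiring exactly four digits is my own assumption so that codes that are already five digits long are left alone. The colouring passes have no plain-text equivalent.

```python
import re

def add_leading_zero(text: str) -> str:
    """Insert a 0 after the hyphen in codes shaped like K-1234."""
    # \b ... \b limits the match to K- followed by exactly four digits,
    # so a five-digit code such as K-12345 is not touched.
    return re.sub(r"\bK-(\d{4})\b", r"K-0\1", text)

codes = "K-1234\nK-5432\nK-12345"
print(add_leading_zero(codes))
# K-01234
# K-05432
# K-12345
```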

 


Word: Wildcard find and replace for numbers inside parentheses

July 22, 2018

In a comment on another post (https://cybertext.wordpress.com/2015/07/14/word-wildcard-find-and-replace-for-numbers-and-trailing-punctuation/), AVi asked if there was a way to find percentage numbers (e.g. 56%) that were inside parentheses, and replace them with the same number but without the parentheses — i.e. (56%) becomes 56%.

There is, but it’s a bit trickier than usual because parentheses are also special characters in Word’s find/replace lexicon—these have to be ‘escaped’ for Word to treat them as normal characters and not as special characters.

In figuring this out, I also took into account that there might be single numerals (e.g. 4%), triple numerals (e.g. 125%), and numerals with one or more decimals (e.g. 75.997%).

Here’s what I came up with that worked for all those scenarios:

  1. Press Ctrl+H to open the Find and Replace dialog.
  2. Click More, then select the Use wildcards check box.
  3. In Find What, type: ([\(])([0-9]*%)([\)])
  4. In Replace With, type: \2
  5. Click Find Next, then click Replace once the first is found. Once you’re happy that it works, repeat until you’ve replaced them all.

What the find and replace ‘codes’ mean:

The three elements (each is enclosed in parentheses) of the Find are:

  1. ([\(]): You need to find a specific character (the opening parenthesis), and it goes in its own set of parentheses to make it an element of the Find. However, because parentheses are special wildcard characters in their own right, you need to tell Word to treat this one as a normal text character and not as a special character, so you put a backslash ‘\’ (also known as an ‘escape’ character) before the (, AND surround this string in square brackets [ ] (otherwise, it won’t work).
  2. ([0-9]*%): The [0-9] represents any single digit from 0 to 9; the * represents any further characters immediately after that digit (more digits, or a decimal point), so the find isn’t limited to single-digit numbers; and the % symbol says the string found must finish with that symbol. This finds numbers like 2%, 25%, 283%, 25.4%, etc.
  3. ([\)]): You need to find the closing parenthesis, and it also goes in its own set of parentheses. However, because the closing parenthesis is a special wildcard character in its own right, you need to tell Word to treat it as a normal text character and not as a special character, so you ‘escape’ it with a backslash ‘\’ before the ), then surround that string in square brackets.

There are no spaces preceding or trailing any of these elements, or in between them, so if you copy the code from this blog post, get rid of any preceding spaces, otherwise it won’t work.

For the Replace: \2 tells Word to replace everything that was found with just the second element of the Find (i.e. the number followed by a % symbol), which drops the parentheses.
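
For comparison, here is a rough Python equivalent of the same idea. It uses Python's `re` syntax rather than Word's wildcards, so the escaping differs (a plain backslash is enough), and I've assumed a stricter number pattern (digits with an optional decimal part) than the * used in Word. The function name is just for illustration.

```python
import re

# Literal "(", then digits with an optional decimal part and a "%", then literal ")".
PAREN_PERCENT = re.compile(r"\((\d+(?:\.\d+)?%)\)")

def strip_percent_parens(text: str) -> str:
    """Replace (56%) with 56%, leaving other parenthesised text alone."""
    return PAREN_PERCENT.sub(r"\1", text)

sample = "Growth was (4%) in May, (125%) in June and (75.997%) overall (see note)."
print(strip_percent_parens(sample))
# Growth was 4% in May, 125% in June and 75.997% overall (see note).
```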

AVi: I hope this solves your problem. Donations to keeping this blog ad-free gratefully accepted (see the link at the top right of the page).

 


Word: Finding duplicate words

July 11, 2018

I had a long list (57 pages!) of Latin species names, sorted into alphabetical order. I’d separated the words so that there was only one word on each line. My next task was to go through and remove all the duplicates (i.e. a word immediately followed by the same word) so I could add the final list to my custom dictionary for species in Microsoft Word. I started doing it manually—it’s easy enough to find duplicates when the words are familiar, but for Latin words, my brain just wasn’t coping well and I was missing subtle differences like a single or double ‘i’ at the end of a word. There had to be a better way…

And there is! Good old Dr Google came to the rescue, and with a bit of fiddling to suit my circumstances (one word on each line), I got a wildcard find and replace routine to find the duplicates.

NOTE: DO NOT do a ‘replace all’ with this, in case Word makes unwanted changes. In my case it didn’t treat the second word as a whole word for matching purposes (e.g. it thought banksi, banksia, and banksii were duplicates). Even though I had to skip some of these, it was still worth it to automate much of the process. Another caveat—if you have several lines of the same word, each pair will be found, but you’ll have to run the find several times to get them all. Much better to move your cursor into Word and delete the excess multiple duplicates when you find them. You may still have to do a couple of passes over the document, but the heavy lifting will have been done for you.

Here’s what I did to get it to work:

  1. Press Ctrl+H to open the Find and Replace window.
  2. Click More, then select the Use Wildcards checkbox.
  3. In the Find What field, type (<*>)^013\1 (there are no spaces in this string).
  4. In the Replace With field, type \1 (there are no spaces in this string either).
  5. Click Find Next.
  6. When a pair of matching whole words is found, click Replace. NOTE: If the second word is only a partial match for the first word, click Find Next.
  7. Repeat steps 5 and 6 until you’re satisfied you’ve found them all.

How this works:

  • (<*>) is the first element (later represented by \1) of the find. The angle brackets specify the start and end of a word, and the ‘word’ is anything (represented by the *). In other words, you’re looking for a whole ‘word’ of any length and made up of any characters (including numbers).
  • ^013 is the paragraph marker at the end of the line. In my situation, each word was on its own line with a paragraph mark at the end of the line. If you don’t have this situation, leave this out and replace it with a space (two repeated words in the same line are separated by a space). NOTE: Normally you can find a paragraph mark in a Find with ^p, but not with a wildcard Find—you have to use ^013.
  • \1 is the first element. In the Find, it means the duplicate of whatever was found by (<*>); in the Replace, it means replace the duplicated word with the first word found.
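
If the list were exported to plain text (one word per line), the same clean-up could be sketched in Python without any pattern matching at all. This is only an illustration with a made-up function name; because it compares whole lines, it doesn't have the partial-match problem described in the note above, and it removes runs of three or more identical lines in a single pass.

```python
def drop_adjacent_duplicates(lines):
    """Return the lines with any line that exactly repeats the previous one removed."""
    result = []
    for line in lines:
        # Keep the line only if it differs from the last line we kept.
        if not result or line != result[-1]:
            result.append(line)
    return result

words = ["banksi", "banksia", "banksia", "banksia", "banksii"]
print(drop_adjacent_duplicates(words))
# ['banksi', 'banksia', 'banksii']
```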

 


Word: Find all words with two or more capital letters

July 9, 2018

Someone in one of my online editing groups wanted to find all the acronyms and initialisms in their document—any word comprising two or more capital (‘cap’) letters (e.g. AB, CDEF, GHIJK, etc.). They wanted a command that would find each one so they could check it (possibly against a glossary), then click Find Next to jump to the next one.

Wildcards to the rescue!

Here’s how:

  1. Press Ctrl+H to open the Find and Replace window.
  2. Click the Find tab (we only want to find these, not replace them with anything else).
  3. Click More to show further options.
  4. Select the Use wildcards checkbox.
  5. In the Find what field, type: <[A-Z]{2,}>
  6. Click Find next to find the first string of two or more caps.
  7. Keep clicking Find next to jump to the next string of two or more caps.

How this works:

  • The opening and closing angle brackets (< and >) specify that you want a single whole word, not parts of a word. Without these, you would find each set of caps (e.g. in the string ABCDEF, you would find ABCDEF, then BCDEF, then CDEF, then DEF, then EF, before moving on to the next set of caps).
  • [A-Z] specifies that you want a range (the [ ] part) of caps that fall somewhere in the alphabet (A-Z). If you only wanted capped words that started with, say, H through to M, then you’d change the range to [H-M] and all other capped words starting with other letters would be ignored.
  • {2,} means you want to find capped words with at least two letters in the specified range (i.e. A-Z). If you only wanted to find two- and three-letter capped words, then you’d change this to {2,3}, and all capped words of four or more letters would be ignored. By not specifying a number after the comma, the ‘find’ will find capped words of any length containing at least two letters.
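
The same search can be sketched in Python for plain text. Note this is Python's `re` syntax, not Word's: the whole-word markers < and > become \b, while [A-Z] and {2,} happen to look the same. The function name is only for illustration.

```python
import re

# \b marks a word boundary on each side; [A-Z]{2,} is two or more capital letters.
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")

def find_caps_words(text: str) -> list[str]:
    """Return every whole word made up of two or more capital letters."""
    return ALL_CAPS_WORD.findall(text)

sample = "The WHO and UNICEF met at a CSIRO lab; A single cap and McDonald are ignored."
print(find_caps_words(sample))
# ['WHO', 'UNICEF', 'CSIRO']
```

Changing `{2,}` to `{2,3}` in the pattern limits the results to two- and three-letter capped words, as in the Word version.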

 


Word: Transpose Surname, Firstname to Firstname Surname

December 26, 2017

I came across a heap of names styled ‘Surname,<space>Firstname’ (e.g. Smith, Jane) and needed to change them to ‘Firstname Surname’ (i.e. Jane Smith).

As with any find/replace operation, identifying the pattern is the first step. Once you’ve done that, the rest is pretty easy. In this example, the pattern was clear — each surname and first name started with a capital letter followed by one or more lower case letters, there was a comma after the surname, and then a space before the start of the first name. Each surname only had a single first name. Because names vary in length, I needed to use wildcards to specify matching the pattern for any number of letters.

Below is what I came up with for this swap — others more clever than me may have a more elegant way to do this, but this worked for me.

CAUTIONS and WARNINGS:

  • I don’t advise doing a ‘replace all’ with this — if there’s anything else that matches the pattern that ISN’T a name, it will get changed too.
  • This find/replace only finds whole names with a single capital letter (i.e. it finds Smith, Jones, Haythornthwaite, Jane, Rosemary, Jonathan). It does NOT find names with more than one capital (e.g. McDonald, AnnMarie) or with an apostrophe (e.g. O’Malley).
  • Hyphenated words are found, but transpose incorrectly (e.g. Smith, Jane-Ann changes to Jane Smith-Ann not Jane-Ann Smith; similarly Jones-Brown, John changes to Jones-John Brown).
  • Surnames with a first and middle name or initial will be found but transposed incorrectly (e.g. Smith, Jane K. Susan will become Jane Smith K. Susan instead of Jane K. Susan Smith). Surnames with an initial letter instead of a first name will not be found (e.g. Philips, A. is not found).
  • Names separated with anything other than a comma, or that have two or more spaces between the comma and the first name will NOT be found.
  • Names with accents, umlauts, and other diacritical marks over letters (e.g. René) are found and transposed correctly.

Despite all the cautions and warnings above, if you have a long list of names to change, then you could run this find/replace, replacing one at a time and manually fixing the others that aren’t found or that will transpose incorrectly. It’s still quicker than doing them all manually.

Steps:

  1. Save your document.
  2. Press Ctrl+H to open the Find/Replace window.
  3. Click the More button.
  4. Select Use Wildcards.
  5. In the Find What field, enter this (copy it from here and paste as there’s a space in the string of characters): (<[A-Z])([a-z]@>)(, )(<[A-Z])([a-z]@>)
  6. In the Replace With field, enter this (again, copy/paste as there’s a space in here that’s hard to see): \4\5 \1\2
  7. Click Find Next.
  8. Click Replace if it finds a name you want to transpose; if not, click Find Next to go to the next one. (Note: Replace All is super powerful and you could change things you don’t want to, so err on the side of caution and click Find Next > Replace > Find Next until all are done.)

Explanation:

  • Parentheses surround each ‘element’ of the find. These are represented by numbers in the replace (i.e. the 4th set of parentheses in the find becomes \4 in the replace)
  • < indicates the beginning of a word; > indicates the end of a word
  • [A-Z] looks for any upper case letter; [a-z] looks for any lower case letter
  • @ looks for one or more occurrences of the item immediately before it (e.g. [a-z]@> looks for one or more lower case letters up to the end of a word, which covers the varying length of names)
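
For anyone working with plain text outside Word, the swap can be sketched in Python. This uses Python's `re` syntax (\b for word boundaries and + for ‘one or more’ instead of Word's <, > and @), and I've merged each capital-plus-lower-case pair into a single group, so the replacement is \2 \1 rather than \4\5 \1\2. The function name is made up for the example.

```python
import re

# Surname (capital + lower case letters), comma, space, first name (same shape).
NAME = re.compile(r"\b([A-Z][a-z]+), ([A-Z][a-z]+)\b")

def transpose_name(text: str) -> str:
    """Turn 'Surname, Firstname' into 'Firstname Surname'."""
    return NAME.sub(r"\2 \1", text)

for name in ["Smith, Jane", "Jones-Brown, John", "Philips, A.", "McDonald, Jane"]:
    print(name, "->", transpose_name(name))
```

Smith, Jane comes out as Jane Smith, while Philips, A. and McDonald, Jane pass through unchanged, and Jones-Brown, John is mangled to Jones-John Brown, mirroring the cautions listed above. One difference: in Python, [a-z] does not cover accented letters, so a name like René would not be matched by this sketch.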

 


Word: Find duplicated words

December 6, 2017

This find/replace is based on Paul Beverley’s work, so full acknowledgement to him for teaching me how to do this via his YouTube videos and his free book.

********

Some of my authors inadvertently type the same word twice (e.g. is is, the the), and it’s often hard to pick these up when editing. If you run spellcheck, you may find them, but there’s no guarantee of that. The find and replace below uses wildcards to find any word that is immediately repeated, with only spaces or common punctuation marks between the two, and then replaces the pair with a single word.

NOTE: This find/replace only finds words with the exact same case, so it will find ‘the the’, ‘THE THE’, and ‘The The’, but it won’t find instances where each word has the same letters but with different cases (e.g. ‘the The’, ‘The the’, ‘tHe thE’ etc.)

Steps:

  1. Press Ctrl+H to open the Find and Replace dialog box.
  2. Click More, then select the Use wildcards option.
  3. In the Find field, type: (<[A-Za-z]@)[ ,.;:]@\1>
    (Note: There’s a space in there, so I suggest you copy this Find string.)
  4. In the Replace field, type: \1
  5. Click Find Next then click Replace. Repeat.

 

How this works — at least how I *think* it works:

  • Find: Look for the start of any word (<) made up of one or more (@) letters ([A-Za-z]); this is the first element, so it’s surrounded by parentheses. Next, look for one or more (@) spaces or common punctuation marks ([ ,.;:]), then the same word again (\1), finishing at the end of a word (>).
  • Replace: Replace everything that was found (both words and whatever separates them) with the first element (that’s the \1 bit), which effectively deletes the duplicate.
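
Here’s a rough Python version of the same find/replace for plain text. It’s written in Python's `re` syntax, not Word's wildcards: \b stands in for < and >, and + stands in for @. The function name is only for illustration, and like the Word version it is case-sensitive.

```python
import re

# A whole word, then one or more spaces/punctuation marks, then the same word again.
DUPLICATE = re.compile(r"\b([A-Za-z]+)[ ,.;:]+\1\b")

def collapse_duplicates(text: str) -> str:
    """Replace an immediately repeated word with a single copy of it."""
    return DUPLICATE.sub(r"\1", text)

print(collapse_duplicates("It is is the the end."))
# It is the end.
```

As with the Word version, it’s safer to review each match than to trust it blindly: a legitimate repeat such as ‘had had’ would be collapsed too.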

[Links last checked December 2017]


Word: Find a year followed by a comma and replace with a semicolon

November 22, 2017

Another early morning question posed on Facebook…

The person was trying to use Word’s wildcard find and replace to convert all strings of Authorname nnnn, Authorname nnnn, Authorname nnnn, Authorname nnnn (i.e. any author’s name, followed by a 4-digit number for a year, such as Smith 2005, Jones 1997, etc., followed by a comma, followed by another author’s name etc.). He wanted to convert all the comma separators to semicolons, ending up with Authorname nnnn; Authorname nnnn; Authorname nnnn; Authorname nnnn. (I’ve italicised the text for clarity — it wasn’t in his original.)

Wildcard find/replace is all about finding the pattern and then figuring out how best to interpret that pattern in a meaningful way in how you search for what you want, and how you replace it with what you want.

In this example, an author name always ends in a lower case letter, is followed by a space, then four numbers for the year, a comma, a space, then an upper case letter for the next author’s name. The last item in the list doesn’t quite match the pattern (no comma, space, upper case letter following it), but that one doesn’t need to change so we can ignore that variation to the pattern. He wanted to keep everything except the comma, which he wanted to change to a semicolon.

Here’s how I solved it using Word’s wildcard find and replace (there may be a more elegant solution, but this one worked for me):

  • Find: ([0-9]{4})(,)( )([A-Z]) 
  • Replace: \1;\3\4

If you need to use this, I suggest you copy it as there’s a space in the third set of parentheses that you can’t see.

How this works:

  • Find: Look for exactly four ({4}) digits from 0 to 9 ([0-9]); this is the first element and is surrounded by parentheses. Then look for a comma (another element, so also surrounded by parens). Next look for a space (wow, more parens), and finally look for any upper case letter [A-Z] and as it’s a unique element, surround it by parens too.
  • Replace: Replace the first element (the 4-digit number) with itself (that’s the \1 bit), then a semicolon, then replace the third and fourth elements of the find with themselves (i.e. \3\4).

You keep everything you don’t want to change (elements 1, 3, and 4) and only change the second element by typing a semicolon in between elements 1 and 3.
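
The same four-element pattern carries over almost unchanged to Python for plain text. This sketch uses Python's `re` syntax, where \d{4} plays the part of [0-9]{4}; the function name is made up for the example.

```python
import re

# Element 1: four digits; 2: comma; 3: space; 4: upper case letter.
YEAR_COMMA = re.compile(r"(\d{4})(,)( )([A-Z])")

def commas_to_semicolons(text: str) -> str:
    """Change the comma after a 4-digit year to a semicolon when a new name follows."""
    # Keep elements 1, 3 and 4; a typed semicolon takes the place of element 2.
    return YEAR_COMMA.sub(r"\1;\3\4", text)

print(commas_to_semicolons("Smith 2005, Jones 1997, Brown 2010"))
# Smith 2005; Jones 1997; Brown 2010
```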