Word: Extract file names of images

August 30, 2018

In my post yesterday, I showed three ways of extracting file name information if it was entered in the Alt Text properties of the image. However, file names/file paths don’t automatically populate the Alt Text description field, except under some circumstances. In my testing:

  • if I dragged a saved image from Windows Explorer into Word 2010, the file path and name were added to the Alt Text description field automatically (NOTE: Dragging from Explorer did NOT add the file path/name information in Word 2016 [and possibly not in Word 2013]! It seems Microsoft changed how this works after Word 2010 for at least one of these reasons: large corporate clients didn’t want file paths visible to readers who knew where to look for them, or putting the file path in Alt Text went against accessibility guidelines, because a screen reader would read out the file path and NOT a considered description of what the image was about.)
  • if I dragged a saved image from SnagIt Editor, the file path and name were added to the Alt Text description field automatically
  • if I clicked Insert > Picture in Word, selected an image, then clicked Insert, none of the file information was carried across
  • if I clicked Insert > Picture in Word, selected an image, then clicked the drop-down arrow next to Insert and selected Insert and Link, the file path/name information was stored in Word, but NOT under Alt Text. You can only find it by going to File > Info, then clicking Edit Links to Files (bottom right of that Info window), then scrolling through the list of links to find those for images. Click an item in the list and its file path/name is displayed in the Source information section, but you can’t select it to copy it.
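Because the Edit Links to Files window won’t let you copy a linked path, a short script is one possible workaround. The following is a minimal Python sketch, not something I tested as part of this post. It assumes the document is saved as a .docx (which is a zip archive) and that the linked images sit in the main document body, whose links are recorded in the part named word/_rels/document.xml.rels. Images linked from headers or footers are recorded in separate .rels parts that this sketch does not read. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the relationship (.rels) parts inside a .docx package.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def linked_image_targets(rels_xml):
    """Return the targets of externally linked images in a .rels XML string."""
    root = ET.fromstring(rels_xml)
    targets = []
    for rel in root.iter(RELS_NS + "Relationship"):
        is_image = rel.get("Type", "").endswith("/image")
        # Linked (not embedded) pictures are marked TargetMode="External".
        is_external = rel.get("TargetMode") == "External"
        if is_image and is_external:
            targets.append(rel.get("Target"))
    return targets


def linked_images_in_docx(docx_path):
    """Read the main document's relationships part and list linked images."""
    with zipfile.ZipFile(docx_path) as docx:
        rels_xml = docx.read("word/_rels/document.xml.rels")
    return linked_image_targets(rels_xml)
```

Calling linked_images_in_docx with the path to your own .docx file should return a list of the linked image paths, which you can then print or paste elsewhere. Embedded images (inserted without linking) have no external target, so they won’t appear.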

An alternative way to find these file names (whether detailed in Alt Text or not) is to save the Word document as an XML file, then open the XML file in a text or HTML editing program and search for each image file extension (e.g. .gif, .jpg, .jpeg, .png, .emf, .svg, .bmp, .tif, .tiff, etc.). A plain text editor such as Notepad works, but you have to step through the matches one at a time. Ignore any matches with paths/names like /media/image1.png, as these are the image files Word creates automatically when you save as XML; they are not the original file names. When you find a legitimate file path/name, copy it to another document. Yes, this is tedious, but there is a marginally quicker way if you have an HTML editing program installed and have a lot of image file names to deal with.
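If you are comfortable running a script, the same search can be sketched in Python. This is a rough illustration only, under these assumptions: the file names you want appear inside double-quoted values in the saved XML (as Alt Text descriptions do), and anything ending in media/imageN.ext is one of Word’s auto-generated names. The function name find_image_names and the sample file name in the usage note are hypothetical.

```python
import re

# The image extensions listed in this post; add others if you need them.
IMAGE_EXTENSIONS = (
    ".gif", ".jpg", ".jpeg", ".png", ".emf", ".svg", ".bmp", ".tif", ".tiff",
)

# Any double-quoted value that does not run across a tag boundary.
QUOTED = re.compile(r'"([^"<>]*)"')

# Names Word generates itself, e.g. /word/media/image1.png or /media/image1.png.
AUTO_NAME = re.compile(r'(?:^|[\\/])media[\\/]image\d+\.\w+$', re.IGNORECASE)


def find_image_names(xml_text):
    """Return quoted values that end in an image extension, in document order."""
    found = []
    for value in QUOTED.findall(xml_text):
        value = value.strip()
        if not value.lower().endswith(IMAGE_EXTENSIONS):
            continue  # not an image file name
        if AUTO_NAME.search(value):
            continue  # auto-generated name, not the original file
        found.append(value)
    return found
```

To try it, read the saved XML file into a string (for example, with open("mydocument.xml", encoding="utf-8") as f: text = f.read()) and pass that string to find_image_names. The result will still contain duplicates, just as the manual search does.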

NOTE: You will never get the file name/path for anything inserted using Insert > Picture without linking.

I had Microsoft Expression Web installed on my computer so I used that—it allows you to search for all instances of a term (in my case, a file extension) and lists them in a Results pane. From there, you select them all, then copy the list and paste it into Excel. Once in Excel, you can delete the columns you don’t need and do find/replace to clean up the XML coding you don’t want. Hopefully, you’ll get enough file names to make this a worthwhile exercise.

Here are the steps I used with Expression Web:

  1. Open a new Excel document.
  2. Open the Word document, then save it as an XML file (File > Save As, select Word XML document (*.xml) as the file type). Close the Word document.
  3. Go to (Windows) Explorer, right-click on the XML file you saved in Step 2, then choose the program to open it—in my case, I opened it with Microsoft Expression Web 2.0, and the rest of these instructions are for that program.
  4. Go to Edit > Find, then enter the search term for an image format file extension (the most common ones are .gif, .jpg, .jpeg, .png, .emf, .svg, .bmp, .tif, and .tiff). You can make this quicker by entering .jp to find both .jpg and .jpeg formats, or .ti to find both .tif and .tiff formats.
  5. Click Find All.
  6. The Results list below the main window shows all lines where the search term was found. Click in the Results list, then press Ctrl+A to select all results.
  7. Right-click on the selected results, then choose Copy Results (Note: Ctrl+C will NOT copy them).
  8. Go to your blank Excel document, and paste the copied results into Excel.
  9. Repeat Steps 4 to 8 for all the other image file formats.
  10. When you have finished finding them all and putting them into the Excel document, save the Excel file.
  11. Now it’s time to clean up the Excel file:
    • delete any columns you don’t need (I only needed the Matched Text column, so I deleted the others)
    • delete any rows you don’t need (those with /word/media/image1.png or /media/image1.png in them, as these are the auto-generated file names created when you save the Word document as XML)
    • delete any duplicates
    • if you see patterns in the XML code, do several find/replace runs on the file to get rid of the unwanted XML code (e.g. find <wp:cNvGraphicFramePr><a:graphicFrameLocks noChangeAspect="1"… and replace with nothing)
    • manually delete any remaining XML code that you can’t delete easily with find/replace
  12. Once you’ve cleaned it up, your Excel file should list all the found file paths/names. Sort the list alphabetically so you can identify and delete any further duplicates. (Note: Some images may be listed twice: once with just the file name and once with the full file path.)
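The final tidy-up in Steps 11 and 12 (removing duplicates and sorting) can also be sketched in Python if your list of names is in a script instead of Excel. This is an illustration under stated assumptions, not the method I used. It treats names that differ only by upper/lower case as duplicates, and it drops a bare file name when the same file also appears with a full path; you may prefer to keep both. The function name tidy_image_list is hypothetical.

```python
import ntpath  # Windows-style path handling; works on any operating system


def tidy_image_list(names):
    """De-duplicate, drop bare names covered by a full path, and sort."""
    # Keep the first spelling seen for each name, ignoring case and blanks.
    unique = {}
    for name in names:
        cleaned = name.strip()
        if cleaned:
            unique.setdefault(cleaned.lower(), cleaned)

    # File names (lowercased) of every entry that includes a folder part.
    with_path = {
        ntpath.basename(value).lower()
        for value in unique.values()
        if ntpath.basename(value) != value
    }

    # Keep full paths, plus bare names that have no full-path twin.
    kept = [
        value
        for key, value in unique.items()
        if ntpath.basename(value) != value or key not in with_path
    ]
    return sorted(kept, key=str.lower)
```

Pass it any list of file paths/names, such as the rows copied from the Results pane after the XML code has been stripped out, and it returns the cleaned, alphabetically sorted list.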

[Links last checked August 2018]
