Acrobat: Convert scanned image of text to text

January 18, 2010

In the category of “I didn’t know I could do that!” comes a neat Adobe Acrobat trick I discovered a few days ago.

I had received a PDF document from a work colleague, who wanted me to convert it into an editable Word document as no one could locate the original. However, what she sent me wasn’t a ‘real’ PDF. It was a ‘scan to PDF’ of a paper document, and thus an image. So when I tried to copy the text to paste it into a Word document, the selection tool treated my selection as an image. All I could see ahead of me was many hours of retyping this 40+ page document…

But then I noticed an option on the right-click menu of the selection. It was Recognize text using OCR. Hmmm… What might that be? To be safe, I made a copy of the PDF, then ran the Recognize text option on the copy (you can also convert the entire document from this menu path: Document > OCR Text Recognition > Recognize Text Using OCR).

Because the original document was almost entirely text, with a few tables thrown in and one figure, the conversion was painless and quick. The best thing was that I then had text I could copy and paste into the new Word document I was creating.

According to the Acrobat Help:

You can use Acrobat to recognize text in previously scanned documents that have already been converted to PDF. Optical character recognition (OCR) software enables you to search, correct, and copy the text in a scanned PDF. To apply OCR to a PDF, the original scanner resolution must have been set at 72 dpi or higher.
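If you want to check a scan against that 72 dpi threshold before running OCR, the arithmetic is simple: PDF page sizes are measured in points (72 to the inch), so the scan resolution is the image's pixel count divided by the page size in inches. Here is a minimal Python sketch of that calculation. The function names are my own and are not part of Acrobat, and it assumes the scanned image fills the whole page; you would need to look up the pixel dimensions and page size yourself.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points
MIN_OCR_DPI = 72      # minimum scan resolution quoted in the Acrobat Help


def effective_dpi(pixels: int, page_points: float) -> float:
    """Resolution along one axis, assuming the image spans the full page."""
    if pixels <= 0 or page_points <= 0:
        raise ValueError("pixels and page_points must be positive")
    inches = page_points / POINTS_PER_INCH
    return pixels / inches


def meets_ocr_minimum(width_px: int, height_px: int,
                      page_width_pt: float, page_height_pt: float) -> bool:
    """True if the lower of the two axis resolutions is at least 72 dpi."""
    dpi_x = effective_dpi(width_px, page_width_pt)
    dpi_y = effective_dpi(height_px, page_height_pt)
    return min(dpi_x, dpi_y) >= MIN_OCR_DPI


if __name__ == "__main__":
    # Hypothetical example: a US Letter page (612 x 792 points)
    # scanned to an image of 2550 x 3300 pixels.
    print(effective_dpi(2550, 612))
    print(meets_ocr_minimum(2550, 3300, 612, 792))
```

Treat this as a rough pre-check only. Passing the minimum doesn't guarantee good results, and higher-resolution scans generally give OCR more to work with.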

I wasn’t aware this tool even existed, so I was pretty pleased to have found it, and even more impressed at how easy it was to use on the scanned document I had.

BTW, I’m using Acrobat Professional 9. I’m not sure how long this feature has been in Acrobat, or if it’s even in the Standard version.

5 comments

  1. I have used the Nuance (Scansoft) PDF suite for many years. It is more economical than Adobe’s equivalent, and this capability is readily at hand. It will even detect, for example, that a product was developed using a later version of Word, and its output will be backwards compatible.


  2. Your header doesn’t match the solution you wrote.
    The real question is whether you can convert a scanned or edited photo/image to plain text or a Word document. Can you do that? Please email me: debasish911@gmail.com


  3. As I stated in the opening paragraph, this is for a document that’s been scanned to PDF, likely via a fancy printer. The result is an ‘image’ of the text document. It’s this sort of PDF that can be converted with the method detailed above.

    Photos likely can’t be converted to text, even if they are PDFs and have text in them. I haven’t tried, but I would suspect that they wouldn’t convert well, if at all.


  4. As I understand it, Microsoft Word has this as an add-in: http://office.microsoft.com/en-us/word-help/scan-and-edit-a-document-HA010254879.aspx However, my company doesn’t allow me to install add-ins, so this was really useful! I can confirm that this OCR button is there in Adobe Acrobat Professional 8 as well.


  5. Acrobat Professional v7.1.4: Document > Recognize Text using OCR


