Acrobat: Convert scanned image of text to text
January 18, 2010
In the category of “I didn’t know I could do that!” comes a neat Adobe Acrobat trick I discovered a few days ago.
I had received a PDF document from a work colleague, who wanted me to convert it into an editable Word document because no one could locate the original. However, what she sent me wasn’t a ‘real’ PDF. It was a ‘scan to PDF’ of a paper document, and thus an image. So when I tried to copy the text to paste it into a Word document, the selection tool treated my selection as an image. All I could see ahead of me was many hours of retyping this 40+ page document…
But then I noticed an option on the right-click menu of the selection: Recognize text using OCR. Hmmm… What might that be? To be safe, I made a copy of the PDF, then ran the Recognize text option on the copy (you can also convert the entire document from this menu path: Document > OCR Text Recognition > Recognize Text Using OCR).
Because the original document was almost entirely text, with a few tables thrown in and one figure, the conversion was painless and quick. The best thing was that I then had text I could copy and paste into the new Word document I was creating.
According to the Acrobat Help:
You can use Acrobat to recognize text in previously scanned documents that have already been converted to PDF. Optical character recognition (OCR) software enables you to search, correct, and copy the text in a scanned PDF. To apply OCR to a PDF, the original scanner resolution must have been set at 72 dpi or higher.
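If you're not sure whether a scan meets that 72 dpi minimum, the arithmetic is simple: divide the image's pixel dimensions by the paper size in inches. Here's a minimal Python sketch of that check. The function names and the `MIN_OCR_DPI` constant are my own, for illustration only, and are not part of Acrobat. It assumes you already know the scan's pixel dimensions and the size of the paper that was scanned.

```python
MIN_OCR_DPI = 72  # minimum scanner resolution quoted in the Acrobat Help


def scan_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return the lower of the horizontal and vertical resolution, in dots per inch."""
    if page_width_in <= 0 or page_height_in <= 0:
        raise ValueError("page dimensions must be positive")
    horizontal = pixel_width / page_width_in
    vertical = pixel_height / page_height_in
    return min(horizontal, vertical)


def meets_ocr_minimum(pixel_width, pixel_height, page_width_in, page_height_in):
    """True if the scan is at or above the 72 dpi minimum (hypothetical helper)."""
    dpi = scan_dpi(pixel_width, pixel_height, page_width_in, page_height_in)
    return dpi >= MIN_OCR_DPI


if __name__ == "__main__":
    # A US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels
    print(scan_dpi(2550, 3300, 8.5, 11))
    print(meets_ocr_minimum(2550, 3300, 8.5, 11))
```

Meeting the minimum only tells you that OCR can run. It says nothing about how accurate the recognized text will be, so you'll still want to proofread the result.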
I wasn’t aware this tool even existed, so I was pretty pleased to have found it, and even more impressed at how easy it was to use on the scanned document I had.
BTW, I’m using Acrobat Professional 9. I’m not sure how long this feature has been in Acrobat, or if it’s even in the Standard version.