Sunday, January 18, 2009

Correct OCR text in PDFs

When you scan to Formatted Text & Graphics output, Acrobat analyzes bitmaps of text and substitutes words and characters for those bitmap areas. If Acrobat is uncertain of the best substitution, it marks the word as a suspect. A suspect still appears in the PDF as the original bitmap of the word, but the recognized text is placed on an invisible layer behind that bitmap, so the word is searchable even though it is displayed as a bitmap. You can accept suspects as they are, or use the TouchUp Text tool to correct them.
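To make the idea of a suspect concrete, here is a rough Python sketch of how a suspect word might be modeled. It is not Acrobat's implementation: the OcrWord class, the confidence score, and the SUSPECT_THRESHOLD cutoff are all assumptions made up for illustration. It only shows the behavior described above, where a suspect is displayed as a bitmap but its hidden text can still be found by a search.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration; Acrobat does not publish the rule it uses.
SUSPECT_THRESHOLD = 0.80


@dataclass
class OcrWord:
    text: str          # best-guess recognized text (the hidden layer)
    confidence: float  # hypothetical recognition score between 0 and 1

    @property
    def is_suspect(self) -> bool:
        # A word is a suspect when the recognizer is not confident enough.
        return self.confidence < SUSPECT_THRESHOLD

    def displayed_as(self) -> str:
        # Suspects keep showing the scanned bitmap; other words show substituted text.
        return "bitmap" if self.is_suspect else "text"


def search(words, query):
    """Return the indexes of words whose hidden text matches the query.

    Suspects are included, because their text sits on the invisible layer.
    """
    q = query.lower()
    return [i for i, w in enumerate(words) if w.text.lower() == q]


page = [OcrWord("Correct", 0.98), OcrWord("tcxt", 0.41), OcrWord("text", 0.55)]
```

In this sketch, page[2] is a suspect and is shown as a bitmap, yet search(page, "text") still finds it. The misread word "tcxt" is also a suspect, which is the kind of word you would fix with the TouchUp Text tool.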
Note: If you try to select text in a scanned PDF that does not have OCR applied, or try to perform a Read Out Loud operation on an image file, Acrobat asks if you want to run OCR. If you click OK, the Recognize Text dialog box opens and you can select options, which are described in detail under the previous topic.
  1. Do one of the following:
    • Choose Document > OCR Text Recognition > Find All OCR Suspects. All suspect words on the page are enclosed in boxes. Click any suspect word to show the suspect text in the Find Element dialog box.

    • Choose Document > OCR Text Recognition > Find First OCR Suspect.

      Note: If you close the Find Element window before correcting all suspect words, you can return to the process by choosing Document > OCR Text Recognition > Find First OCR Suspect, or by clicking any suspect word with the TouchUp Text tool.
  2. In the Find option, choose OCR Suspects.
  3. Compare the word in the Suspect text box with the actual word in the scanned document, and accept, correct, or ignore the word. If the suspect was incorrectly identified as text, click the Not Text button.
  4. Review and correct the remaining suspect words, and then close the Find Element dialog box.
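The review in steps 3 and 4 can be pictured as a simple loop over the suspect words. The Python sketch below is only a model of that process and does not call any Acrobat API. The Action names and the review function are hypothetical, and the assumption that an ignored word stays flagged as a suspect is mine, not something the steps above state.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"      # keep the recognized text as it is
    CORRECT = "correct"    # replace the recognized text
    IGNORE = "ignore"      # skip the word for now
    NOT_TEXT = "not_text"  # the area was wrongly identified as text


def review(suspects, decide):
    """Walk through suspect words and apply one decision to each.

    suspects: list of recognized strings
    decide:   callable taking a word and returning (Action, replacement or None)
    Returns a list of (text, still_suspect) pairs for the words that remain.
    """
    result = []
    for word in suspects:
        action, replacement = decide(word)
        if action is Action.CORRECT:
            result.append((replacement, False))
        elif action is Action.ACCEPT:
            result.append((word, False))
        elif action is Action.IGNORE:
            # Assumption: an ignored word keeps its text and stays flagged.
            result.append((word, True))
        # NOT_TEXT: drop the word, since the area is not text at all.
    return result


decisions = {
    "tcxt": (Action.CORRECT, "text"),
    "PDF": (Action.ACCEPT, None),
    "~~~": (Action.NOT_TEXT, None),
    "scan": (Action.IGNORE, None),
}
reviewed = review(["tcxt", "PDF", "~~~", "scan"], decisions.get)
```

Here the misread "tcxt" is corrected, "PDF" is accepted, the stray mark "~~~" is dropped as not text, and "scan" is left for a later pass.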
