Suggested Practices for Better OCR Results

July 19, 2004 | KB: 1000643
Laserfiche 6, Plus 6, Scanning 6

Summary

Optical Character Recognition (OCR) creates text from a scanned image. Follow these suggested practices to achieve better OCR results:

  • OCR only those images that have clear typewritten or printed text. Do not OCR images that contain handwritten text, photographs, or other artwork.
  • If your document has no typewritten or typeset text, you can use templates to carefully categorize the document for easy retrieval later.
  • Remember that image-only documents can still be retrieved by searching the fields.
  • OCR only those images with text that reads from left to right on the page. If you can read the text on an image clearly without rotating it, the image should translate properly.
  • Text that appears too dark or too light may still OCR, but results will vary. If the text appears too dark, zoom in on the questionable area: if the letters do not run together, the text should OCR. If the image appears too light, zoom in to check whether any characters have gaps in them: if there are no gaps, the text should OCR.
  • If the scanned image appears too dark or faded on your monitor, OCR may misinterpret the text. The clearer the image, the more accurate the text translation. As a general rule, if the letters on an image are complete but not touching when viewed in zoom mode, OCR should translate the text accurately.
  • To correct images that are too light or too dark, you must re-scan them. The lightness/darkness controls in the document window affect only the on-screen display, not the stored image.
  • The OCR engine recognizes most typed and typeset text between 6 and 24 points in size. Ultimately, the accuracy of the OCR depends largely on the clarity of the paper original.
  • Laserfiche cannot perform full-text indexing or full-text searches on documents with more than about one megabyte of text. If you are scanning or importing a long document, such as a book, you may want to break it up into several smaller documents.
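You can apply the one-megabyte guideline mechanically when you plan how to split a long document. The following Python sketch is hypothetical and is not part of Laserfiche. It makes three assumptions:

  • You already have the OCR text for each page, for example exported as plain text.
  • The limit is exactly 1,000,000 bytes of UTF-8 text. The article says only "about one megabyte," and Laserfiche may measure text size differently.
  • Consecutive pages are grouped greedily, so each smaller document stays under that limit.

```python
from typing import List

# Assumption: the article's "about one megabyte" limit, treated here as an
# exact byte count of UTF-8 encoded text.
TEXT_LIMIT_BYTES = 1_000_000


def split_pages(page_texts: List[str], limit: int = TEXT_LIMIT_BYTES) -> List[List[int]]:
    """Hypothetical helper: group consecutive pages (1-based page numbers)
    into smaller documents whose combined text stays within `limit` bytes."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for number, text in enumerate(page_texts, start=1):
        size = len(text.encode("utf-8"))
        if size > limit:
            # A single page this large cannot be fixed by splitting pages.
            raise ValueError(f"page {number} alone exceeds the limit")
        if current and current_size + size > limit:
            # Adding this page would exceed the limit: start a new document.
            groups.append(current)
            current, current_size = [], 0
        current.append(number)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Five pages of roughly 400 KB of text each.
    pages = ["x" * 400_000] * 5
    print(split_pages(pages))  # prints [[1, 2], [3, 4], [5]]
```

Each inner list is one proposed smaller document. The greedy approach keeps pages in their original order, which suits books and other documents that are read sequentially.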