Automatically Generate Text From PDF Files

January 7, 2005 | KB: 1000849
Snapshot 6

Summary

Several factors determine whether Snapshot will generate text from a PDF file. Each factor is listed below.

  • When creating a PDF file, make sure that you are creating it from a file that contains text. Creating a PDF file from an image will not associate text with the PDF file.
  • When printing the PDF file, make sure that the Write Text File check box is marked. This setting can be found on the File Formats tab of the Laserfiche Snapshot Properties dialog box.
  • When printing the PDF file, make sure that the Print as image check box is cleared. This setting can be found in Adobe 6 by clicking Advanced from the Print dialog box.
  • If the PDF file uses embedded fonts, then you should print the PDF file using the Snapshot printer from Adobe 5.
  • If fonts are not embedded in the PDF file, then you may use Adobe 6.
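
The factors above can be read as a checklist. The following Python sketch is only an illustration of that checklist and is not part of Snapshot or Adobe: the PrintSetup class, its field names, and the text_generation_problems function are hypothetical names made up for this example. It returns the list of factors that would keep Snapshot from generating text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrintSetup:
    """Hypothetical record of the settings described in the list above."""
    source_has_text: bool   # PDF was created from a file that contains text
    write_text_file: bool   # "Write Text File" check box is marked
    print_as_image: bool    # "Print as image" check box is marked
    fonts_embedded: bool    # PDF uses embedded fonts
    adobe_version: int      # major Adobe version used to print (5 or 6)


def text_generation_problems(setup: PrintSetup) -> List[str]:
    """Return the factors that would prevent text generation, if any."""
    problems = []
    if not setup.source_has_text:
        problems.append("PDF was created from an image and has no text")
    if not setup.write_text_file:
        problems.append("Write Text File check box is not marked")
    if setup.print_as_image:
        problems.append("Print as image check box is marked")
    if setup.fonts_embedded and setup.adobe_version != 5:
        problems.append("PDF uses embedded fonts; print from Adobe 5")
    return problems


if __name__ == "__main__":
    setup = PrintSetup(
        source_has_text=True,
        write_text_file=True,
        print_as_image=False,
        fonts_embedded=True,
        adobe_version=6,
    )
    for problem in text_generation_problems(setup):
        print(problem)
```

An empty list means that none of the listed factors stands in the way of text generation.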

More Information

Snapshot does not process printed images with OCR. It relies on the native application to send text to the Snapshot printer driver. If the application does not send any text to the printer driver when printing, Snapshot will not generate text for that file. In that case, you will need to process the image with OCR from Laserfiche.

One factor that determines whether text will be generated from a PDF file is the manner in which the PDF is created. Some PDFs are plain images wrapped in the PDF file format, so they technically do not contain any text. A PDF created from a scanned image is one example. Because such an image PDF has no text associated with it, Snapshot cannot generate text for it when the file is processed.
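
As a rough way to spot an image-only PDF before printing, the Python sketch below checks the raw file bytes for a /Font entry, which PDFs that contain text normally include. This is a crude heuristic under stated assumptions, not a Laserfiche or Adobe tool: the function name may_be_image_only is hypothetical, and a PDF that stores its objects in compressed streams can hide font entries, so a result of True is a hint rather than proof.

```python
from pathlib import Path


def may_be_image_only(pdf_bytes: bytes) -> bool:
    """Return True if no /Font entry appears in the raw PDF bytes.

    Assumption: text in a PDF is drawn with a font resource, so a file
    with no visible /Font entry is likely just an image wrapped in PDF.
    Compressed object streams can hide font entries, so this check can
    report True for a PDF that does contain text.
    """
    return b"/Font" not in pdf_bytes


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py document.pdf
    for name in sys.argv[1:]:
        data = Path(name).read_bytes()
        hint = "may be image-only" if may_be_image_only(data) else "has font entries"
        print(f"{name}: {hint}")
```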