Using plugins: OCR plugin

Optical character recognition (OCR) routines convert images of text into real characters, which can then supply text to EscapeE in the same way that a database supplies values to fields.

If you have the Tesseract optical character recognition routine installed, you can use the tesseract plugin. It supersedes the OCR plugin documented below, which uses the Microsoft Office Document Imaging (MODI) optical character recognition routine.

To configure the OCR plugin:

1.	Sweep out the part of the page containing the image of the text and select New field... from the pop-up menu to define a field as usual.

2.	Select OCR plugin from the list on the 'Advanced' page then click Configure. The "Include text when exporting?" dialog pops up.

3.	Choose either Yes, to include the "recognized" text in the document, or No, to exclude recognized text from the exported document. With Yes, the text is hidden under the image so that it is available for use (e.g. searching) without showing the actual characters.

4.	The "Leave blank if OCR failure?" dialog pops up. Choose No to show the "OCR failed" message whenever no text can be recognized. Choose Yes when, for example, some of the pages are likely to have no text in the field area, so that the warning would be superfluous.

5.	Click OK.
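The effect of the step 4 choice can be sketched in Python. This is only an illustration of the behavior described above, not EscapeE code: the function name `resolve_field_text` and its parameters are hypothetical, and treating whitespace-only output as a failure is an assumption.

```python
from typing import Optional

# Message text quoted in step 4 of the procedure above.
OCR_FAILED_MESSAGE = "OCR failed"


def resolve_field_text(recognized: Optional[str], leave_blank_on_failure: bool) -> str:
    """Return the text a field would hold after an OCR attempt.

    recognized: text returned by the OCR routine, or None/empty if nothing
        was recognized (assumption: whitespace-only counts as a failure).
    leave_blank_on_failure: True models answering Yes to
        "Leave blank if OCR failure?", False models answering No.
    """
    if recognized is not None and recognized.strip():
        return recognized
    # Nothing recognized: blank the field or show the warning.
    return "" if leave_blank_on_failure else OCR_FAILED_MESSAGE


if __name__ == "__main__":
    print(repr(resolve_field_text("ACME Ltd", False)))
    print(repr(resolve_field_text(None, True)))
    print(repr(resolve_field_text(None, False)))
```

Answering Yes suits fields that are legitimately empty on some pages; answering No makes genuine recognition failures visible.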

Example

A field named OCRfield could be used in a composite field whose value is {OCRfield}. The composite field could have a tag so that its action and sub-fields are conditional on OCRfield having a particular value.
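As a rough illustration of the idea, the following Python sketch expands a composite value such as {OCRfield} and tests the result against an expected value. It is not EscapeE's implementation: the helper names `expand_composite` and `tag_matches` are hypothetical, and replacing an unknown field name with empty text is an assumption.

```python
import re
from typing import Mapping


def expand_composite(template: str, fields: Mapping[str, str]) -> str:
    """Replace each {name} in the template with the value of that field.

    Assumption: a name with no matching field expands to empty text.
    """
    return re.sub(r"\{(\w+)\}", lambda m: fields.get(m.group(1), ""), template)


def tag_matches(template: str, fields: Mapping[str, str], expected: str) -> bool:
    """Return True when the expanded composite value equals the expected value."""
    return expand_composite(template, fields).strip() == expected


if __name__ == "__main__":
    page_fields = {"OCRfield": "INVOICE"}
    print(expand_composite("{OCRfield}", page_fields))
    print(tag_matches("{OCRfield}", page_fields, "INVOICE"))
```

In this sketch the condition holds only when the recognized text is exactly the expected value, so a page where OCR returned nothing would not trigger the action.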

Note

If MODI is not installed on your system, you may simulate it by means of a further dialog that pops up automatically:

6.	In the "Microsoft Office Document Imaging - not installed. Simulate OCR?" dialog, choose Yes.

Choosing Yes exports dummy text where OCR text would occur in the document. Up to 5 lines are generated so as to fill the field area at 1/6" line spacing. For example, a field named ADDRESS might appear as:

Line 1 of field ADDRESS
Line 2 of field ADDRESS

Choosing No instead causes the field to be blanked or to contain the error message, according to the configuration set in step 4 above. Choosing Cancel causes an error message box to appear each time OCR is attempted.
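The dummy text produced by choosing Yes can be approximated with a short Python sketch. The 5-line limit, the 1/6" spacing and the line wording come from the description above; the function name `simulated_ocr_lines`, the use of a field height in inches, and rounding down to whole lines are assumptions made for illustration.

```python
import math
from typing import List

LINES_PER_INCH = 6  # 1/6" line spacing, as stated in the text
MAX_LINES = 5       # "Up to 5 lines are generated"


def simulated_ocr_lines(field_name: str, field_height_inches: float) -> List[str]:
    """Return dummy lines that would fill a field of the given height.

    Assumption: only whole lines that fit in the field area are generated.
    The small epsilon guards against floating-point error in heights such
    as 1/3 inch.
    """
    lines_that_fit = math.floor(field_height_inches * LINES_PER_INCH + 1e-9)
    count = max(0, min(MAX_LINES, lines_that_fit))
    return [f"Line {i} of field {field_name}" for i in range(1, count + 1)]


if __name__ == "__main__":
    # A field half an inch high holds three lines at 1/6" spacing.
    for line in simulated_ocr_lines("ADDRESS", 0.5):
        print(line)
```

Under these assumptions, a field two inches high would still produce only five lines, because the cap applies before the field is full.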