'ocr' Service

Together with the text recognition software ABBYY FineReader, the 'ocr' service converts image documents into text documents that are used for full-text indexing. It can also be used to create PDF files with hidden text, various PDF/A formats, and highly compressed PDFs.

Configuration

The 'ocr' service works with the following default settings:

Setting                                   Default value
PDF profile: format                       PDF/A-1b
PDF profile: method                       Balanced
Text profile                              Predefined: TextExport.ini
File transfer to enaio® rendition-plus    Stream
Number of cores for ABBYY FineReader      1

These settings can be changed in the ocr-prod.yml configuration file, which is located in the \servicemanager\config\ directory.

Example of a configuration in the ocr-prod.yml file:

finereader:
  profile:
    pdfa: PDFA1bBalanced.ini
    text: TextExport.ini
  engine:
    numberOfCores: 1
rest:
  transferPolicy: stream

The example corresponds to the default settings.

Only the settings that differ from the default settings need to be specified.
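
As a minimal sketch of this rule, assume that only the number of cores should deviate from the defaults. The ocr-prod.yml file could then be reduced to the fragment below. The value 2 is an arbitrary illustration, not a recommendation. The keys are taken from the configuration example above.

```yaml
# Sketch: override only the number of cores (2 is an illustrative value).
# The profile and transfer settings are omitted, so their defaults apply.
finereader:
  engine:
    numberOfCores: 2
```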

Integrating a Profile File

You can customize the profile or create your own profile file with additional settings and integrate it via the ocr-prod.yml configuration file.

Integration example:

finereader:
  profile:
    pdfa: 'file://d:/enaio/OCRconfig/custom_ocr.ini'
  engine:
    numberOfCores: 4
rest:
  transferPolicy: 'auto'

Example of a profile file:

[PDFExportParams]
Scenario = PES_Balanced
PDFAComplianceMode = PCM_Pdfa_1b

[PrepareImageMode]
CorrectSkew = true

[PagePreprocessingParams]
CorrectOrientation=true
CorrectSkew=TSPV_No
CorrectGeometry=TSPV_No

[RecognizerParams]
TextLanguage = German,French,English
DetectLanguage = true
BalancedMode=true

[PageAnalysisParams]
DetectVerticalEuropeanText=true

[ObjectsExtractionParams]
DetectTextOnPictures=true

You can find information on the settings in the ABBYY FineReader documentation.

Examples of settings sections:

[PDFExportParams]            Parameters for exporting recognized text to PDF format.
[PagePreprocessingParams]    Parameters for preprocessing pages.
[RecognizerParams]           Recognizer parameters, such as language settings.
[PageAnalysisParams]         Parameters for layout analysis.
[ObjectsExtractionParams]    Parameters for extracting objects.