Text Recognition and PDF/A Generation with the 'ocrservice' Microservice

yuuvis® RAD 9.x

The 'ocrservice' microservice works together with the ABBYY FineReader text recognition software to generate text documents from image documents. These text documents are used for full-text indexing. The service can also generate PDF files with hidden text, various PDF/A formats, and highly compressed PDFs.

Configuration

The 'ocrservice' service works with the following default settings:

Setting                                       Default
PDF profile: format                           PDF/A-1b
PDF profile: procedure                        Balanced
Text profile                                  Default: TextExport.ini
File transfer to yuuvis® RAD rendition-plus   Stream
Number of cores for ABBYY FineReader          1

These settings can be modified in the ocr-prod.yml configuration file located in the \servicemanager\config directory.

Example of a configuration in the ocr-prod.yml file:

finereader:
  profile:
    pdfa: PDFA1bBalanced.ini
    text: TextExport.ini
  engine:
    numberOfCores: 1
rest:
  transferPolicy: stream

The example corresponds to the default settings.

Only the settings that differ from the default settings need to be entered.
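
For instance, a configuration that only raises the number of cores used by ABBYY FineReader might contain just the following entries. This is a minimal sketch: the value 4 is illustrative, and every setting not listed is assumed to keep its default.

```yaml
# ocr-prod.yml: override a single setting; all other defaults stay in effect
finereader:
  engine:
    numberOfCores: 4   # illustrative value, the default is 1
```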

Integrating the Profile File

You can customize a profile or create your own profile file with additional settings, and then reference it in the ocr-prod.yml configuration file.

Example of the integration:

finereader:
  profile:
    pdfa: 'file://d:/yuuvis/OCRconfig/custom_ocr.ini'
  engine:
    numberOfCores: 4
rest:
  transferPolicy: 'auto'

Example of a profile file:

[PDFExportParams]
Scenario = PES_Balanced
PDFAComplianceMode = PCM_Pdfa_1b

[PagePreprocessingParams]
CorrectOrientation = false
GeometryCorrectionMode = GCM_DontCorrect
CorrectSkew = TSPV_No

[RecognizerParams]
TextLanguage = German,French,English
DetectLanguage = true
BalancedMode = true

[PageAnalysisParams]
DetectVerticalEuropeanText = true

[ObjectsExtractionParams]
DetectTextOnPictures = true

For further information on the available settings, see the ABBYY FineReader documentation.

Examples of settings sections:

Section                      Purpose
[PDFExportParams]            Parameters for exporting recognized text to PDF format
[PagePreprocessingParams]    Parameters for page preprocessing
[RecognizerParams]           Recognition parameters, such as language settings
[PageAnalysisParams]         Parameters for layout analysis
[ObjectsExtractionParams]    Parameters for the extraction of objects