Text Recognition and PDF/A Generation with the 'ocrservice' Microservice
The 'ocrservice' microservice works with the text recognition software ABBYY FineReader to generate text documents from image documents; these text documents are used for full-text indexing. The service can also generate PDF files with a hidden text layer, various PDF/A formats, and highly compressed PDFs.
Configuration
The 'ocrservice' service works with the following default settings:
Setting | Default value |
---|---|
PDF profile: format | PDF/A1b |
PDF profile: procedure | Balanced |
Text profile | TextExport.ini |
File transfer to yuuvis® RAD rendition-plus | Stream |
Number of cores for ABBYY FineReader | 1 |
These settings can be modified in the ocr-prod.yml configuration file located in the \servicemanager\config directory.
Example of a configuration in the ocr-prod.yml file:
```yaml
finereader:
  profile:
    pdfa: PDFA1bBalanced.ini
    text: TextExport.ini
  engine:
    numberOfCores: 1
rest:
  transferPolicy: stream
```
The example corresponds to the default settings.
Only the settings that differ from the default settings need to be entered.
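Because only deviating settings are entered, the effective configuration can be thought of as the documented defaults with your entries laid over them. The following is a minimal sketch of that idea, assuming a simple recursive merge; how 'ocrservice' actually loads ocr-prod.yml is not described here, and `DEFAULTS` and `merge_settings` are hypothetical names. The settings are written as Python dictionaries instead of parsed YAML so the sketch needs no third-party library.

```python
from copy import deepcopy

# Documented default settings, mirroring the ocr-prod.yml structure above.
DEFAULTS = {
    "finereader": {
        "profile": {"pdfa": "PDFA1bBalanced.ini", "text": "TextExport.ini"},
        "engine": {"numberOfCores": 1},
    },
    "rest": {"transferPolicy": "stream"},
}


def merge_settings(defaults: dict, overrides: dict) -> dict:
    """Hypothetical helper: return a copy of defaults with overrides applied recursively."""
    result = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Nested section: merge key by key so untouched settings keep their defaults.
            result[key] = merge_settings(result[key], value)
        else:
            # Leaf value (or new key): the entered setting wins.
            result[key] = deepcopy(value)
    return result


if __name__ == "__main__":
    # Only numberOfCores differs from the defaults, so only it is entered.
    overrides = {"finereader": {"engine": {"numberOfCores": 4}}}
    effective = merge_settings(DEFAULTS, overrides)
    print(effective["finereader"]["engine"]["numberOfCores"])  # 4
    print(effective["rest"]["transferPolicy"])  # stream
```

The merge works on copies, so the defaults themselves are never modified.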
Customizing a PDF profile
To customize the PDF format and procedure, assign the finereader:profile:pdfa property a file name that combines a format and a procedure: <Format><Procedure>.ini (the default, for example, is PDFA1bBalanced.ini).
The following formats can be generated:
Format | Spelling in the file name |
---|---|
PDF | PDF |
PDF/A1a | PDFA1a |
PDF/A1b | PDFA1b |
PDF/A2a | PDFA2a |
PDF/A2u | PDFA2u |
PDF/A3a | PDFA3a |
PDF/A3u | PDFA3u |

The following procedures are available:
Procedure | Description |
---|---|
MaxQuality | Generates results with the best resolution. Speed and degree of compression are of secondary importance. |
MaxSpeed | Generates results as fast as possible. Quality and degree of compression are of secondary importance. |
MinSize | Generates results with the smallest file size. Speed and quality are of secondary importance. |
Balanced | Generates results that strike a balance between quality, speed, and degree of compression. |
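The naming rule above can be expressed as a small lookup that turns a format and a procedure into the value for finereader:profile:pdfa. This is a minimal sketch; `pdf_profile_name` is a hypothetical helper, and it assumes that every format/procedure combination from the two tables is available as a profile file.

```python
# Format spellings and procedures as listed in the tables above.
FORMATS = {
    "PDF": "PDF",
    "PDF/A1a": "PDFA1a",
    "PDF/A1b": "PDFA1b",
    "PDF/A2a": "PDFA2a",
    "PDF/A2u": "PDFA2u",
    "PDF/A3a": "PDFA3a",
    "PDF/A3u": "PDFA3u",
}
PROCEDURES = {"MaxQuality", "MaxSpeed", "MinSize", "Balanced"}


def pdf_profile_name(fmt: str, procedure: str) -> str:
    """Hypothetical helper: build <Format><Procedure>.ini from a format and a procedure."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if procedure not in PROCEDURES:
        raise ValueError(f"unknown procedure: {procedure!r}")
    return f"{FORMATS[fmt]}{procedure}.ini"


if __name__ == "__main__":
    print(pdf_profile_name("PDF/A1b", "Balanced"))  # PDFA1bBalanced.ini (the default)
    print(pdf_profile_name("PDF/A2u", "MinSize"))   # PDFA2uMinSize.ini
```

Rejecting unknown names early catches typos such as `PDF/A1B` before they end up in ocr-prod.yml.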
Customizing the text profile
TextExport.ini is currently the only available text profile.

Customizing the file transfer
To specify how files are transferred to yuuvis® RAD rendition-plus, assign one of the following values to the rest:transferPolicy property.
Value | Description |
---|---|
stream | Transfer via an HTTP stream |
fileref | Transfer via file system reference |
auto | The transfer type is selected automatically. The IP address of the yuuvis® RAD rendition-plus endpoint determines whether yuuvis® RAD rendition-plus and the 'ocrservice' service run on the same computer. If so, the transfer is carried out via file system references; if not, it is performed via an HTTP stream. |
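The decision made by the auto policy can be sketched as follows. This is a minimal illustration, not the service's implementation: how 'ocrservice' determines the local addresses and compares them with the endpoint is an assumption, and `local_addresses` and `choose_transfer` are hypothetical names.

```python
import ipaddress
import socket


def local_addresses() -> set[str]:
    """Hypothetical helper: best-effort set of IP addresses of this computer."""
    addrs = {"127.0.0.1", "::1"}
    try:
        for info in socket.getaddrinfo(socket.gethostname(), None):
            addrs.add(info[4][0])  # sockaddr[0] is the address string
    except OSError:
        pass  # host name not resolvable: fall back to loopback addresses only
    return addrs


def choose_transfer(endpoint_ip: str, local_ips: set[str]) -> str:
    """Mimic 'auto': fileref if the endpoint runs on this computer, otherwise stream."""
    ip = ipaddress.ip_address(endpoint_ip)
    if ip.is_loopback or str(ip) in local_ips:
        return "fileref"
    return "stream"


if __name__ == "__main__":
    print(choose_transfer("127.0.0.1", local_addresses()))  # fileref
```

Keeping the address lookup separate from the decision makes the decision itself easy to check with fixed address sets.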

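Customizing the number of cores
To set how many cores ABBYY FineReader uses, assign a value to the finereader:engine:numberOfCores property:
finereader:engine:numberOfCores: <number>
The maximum number of cores that ABBYY FineReader can use depends on the purchased license.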
Integrating a profile file
You can customize a profile, or create your own profile file with additional settings, and integrate it via the ocr-prod.yml configuration file by referencing it with a file:// URL.
Example of the integration:
```yaml
finereader:
  profile:
    pdfa: 'file://d:/yuuvis/OCRconfig/custom_ocr.ini'
  engine:
    numberOfCores: 4
rest:
  transferPolicy: 'auto'
```
Example of a profile file:
```ini
[PDFExportParams]
Scenario = PES_Balanced
PDFAComplianceMode = PCM_Pdfa_1b
[PagePreprocessingParams]
CorrectOrientation=false
GeometryCorrectionMode=GCM_DontCorrect
CorrectSkew=TSPV_No
[RecognizerParams]
TextLanguage = German,French,English
DetectLanguage = true
BalancedMode=true
[PageAnalysisParams]
DetectVerticalEuropeanText=true
[ObjectsExtractionParams]
DetectTextOnPictures=true
```
Further information on settings can be found in the documentation of ABBYY FineReader.
Examples of settings sections:
Section | Purpose |
---|---|
[PDFExportParams] | Parameters for exporting recognized text to PDF format |
[PagePreprocessingParams] | Parameters for page preprocessing |
[RecognizerParams] | Recognition parameters such as language settings |
[PageAnalysisParams] | Parameters for layout analysis |
[ObjectsExtractionParams] | Parameters for the extraction of objects |
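Before integrating a custom profile file, it can help to check that it parses as INI and to see which sections it contains. The sketch below uses Python's standard `configparser` and assumes that ABBYY FineReader reads profile files with compatible INI rules (that is an assumption, not something this documentation states). `read_profile` and `unknown_sections` are hypothetical helpers; sections not listed in the table above are only reported, because FineReader supports further sections.

```python
import configparser

# Sections listed in the table above; FineReader supports more, so others are only reported.
KNOWN_SECTIONS = {
    "PDFExportParams",
    "PagePreprocessingParams",
    "RecognizerParams",
    "PageAnalysisParams",
    "ObjectsExtractionParams",
}

SAMPLE = """\
[PDFExportParams]
Scenario = PES_Balanced
PDFAComplianceMode = PCM_Pdfa_1b
[RecognizerParams]
TextLanguage = German,French,English
DetectLanguage = true
"""


def read_profile(text: str) -> configparser.ConfigParser:
    """Hypothetical helper: parse profile text without altering names or values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names case-sensitive, e.g. "TextLanguage"
    parser.read_string(text)
    return parser


def unknown_sections(parser: configparser.ConfigParser) -> list[str]:
    """Return the section names that are not in the documented list."""
    return [s for s in parser.sections() if s not in KNOWN_SECTIONS]


if __name__ == "__main__":
    profile = read_profile(SAMPLE)
    print(profile["RecognizerParams"]["TextLanguage"].split(","))  # ['German', 'French', 'English']
    print(unknown_sections(profile))  # []
```

`configparser` raises an error for malformed lines or duplicate keys, which surfaces typos in the profile file early.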