Text Recognition and PDF/A Generation with ABBYY FineReader

yuuvis® RAD 10.x

Within an installation, the 'ocrservice' microservice works with the text recognition software ABBYY FineReader to generate text documents from image documents. These text documents are used for full-text indexing. The service can also generate PDF files with hidden text, various PDF/A formats, and highly compressed PDFs.

The 'Index file for full-text search' property is required for the object types.

Tesseract can be installed and integrated as an OCR component instead of ABBYY FineReader.

Configuration

The settings for the 'ocrservice' service with ABBYY FineReader must be specified in the ocr-prod.yml configuration file located in the \servicemanager\config\ directory.

Example of a configuration in the ocr-prod.yml file:

finereader:
  profile:
    pdfa: PDFA1bBalanced.ini
    text: TextExport.ini
  engine:
    numberOfCores: 1
rest:
  transferPolicy: stream

The parameters in the example have the following meaning:

Parameter	Value in the example
PDF profile: format	PDF/A1b
PDF profile: procedure	Balanced
Text profile	TextExport.ini (default)
File transfer to yuuvis® RAD rendition-plus	stream
Number of cores for ABBYY FineReader	1

Customizing a profile

Customizing a PDF profile

To customize the PDF format and the procedure, assign a value of the form <format><procedure>.ini to the finereader:profile:pdfa property. For example, PDFA1bBalanced.ini combines the PDF/A1b format with the Balanced procedure.

The following formats can be generated:

Format	Spelling
PDF	PDF
PDF/A1a	PDFA1a
PDF/A1b	PDFA1b
PDF/A2a	PDFA2a
PDF/A2u	PDFA2u
PDF/A3a	PDFA3a
PDF/A3u	PDFA3u

The following procedures are available:

Procedure	Description
MaxQuality	Generates results with the best resolution. Speed and degree of compression are of secondary importance.
MaxSpeed	Generates results using the fastest procedure. Quality and degree of compression are of secondary importance.
MinSize	Generates results with the smallest file size. Speed and quality are of secondary importance.
Balanced	Generates results that balance quality, speed, and degree of compression.
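
As an illustration of the naming scheme, the following sketch selects PDF/A2u output with the MinSize procedure. The file name PDFA2uMinSize.ini is derived here from the <format><procedure>.ini pattern and the two tables above. Whether this particular combination is available in your installation is an assumption, so check the available profiles before using it.

```yaml
# Sketch only: PDF/A2u format combined with the MinSize procedure.
# The profile name follows the <format><procedure>.ini pattern and is
# an illustration; confirm that the profile exists in your installation.
finereader:
  profile:
    pdfa: PDFA2uMinSize.ini
    text: TextExport.ini
```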

Customizing the text profile

TextExport.ini is currently the only available text profile.

Defining how files are transferred

To specify how the files are transferred, assign a value to the rest:transferPolicy property.

The following transfer types are available:

Transfer type	Description
stream	Transfer via an HTTP stream
fileref	Transfer via file system reference
auto	The transfer type is selected automatically. The IP address of the yuuvis® RAD rendition-plus endpoint determines whether yuuvis® RAD rendition-plus and the 'ocrservice' service run on the same computer. If so, files are transferred via file system references; if not, they are transferred via an HTTP stream.
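
A minimal sketch of setting a fixed transfer type instead of relying on automatic selection. It uses only the rest:transferPolicy property and the fileref value listed above. That both services can reach the same file system paths is an assumption of this example, not something the service checks for you in this mode.

```yaml
# Sketch only: always transfer files via file system references.
# Assumes yuuvis® RAD rendition-plus and the 'ocrservice' service
# can access the same file system; otherwise use stream or auto.
rest:
  transferPolicy: fileref
```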

If you want to store objects for long-term archiving and they are not yet available in PDF/A format, the service can be used with ABBYY FineReader to create a PDF/A file and save it as a new version of the object. To set this up, please contact the consulting team at OPTIMAL SYSTEMS.

Integrating the profile file

You can customize a profile or create your own profile file with additional settings, and then reference it in the ocr-prod.yml configuration file.

Example of the integration:

finereader:
  profile:
    pdfa: 'file://d:/yuuvis/OCRconfig/custom_ocr.ini'
  engine:
    numberOfCores: 4
rest:
  transferPolicy: 'auto'

Example of a profile file:

[PDFExportParams]
Scenario = PES_Balanced
PDFAComplianceMode = PCM_Pdfa_1b

[PagePreprocessingParams]
CorrectOrientation = true
CorrectSkew = TSPV_No
CorrectGeometry = TSPV_No

[PrepareImageMode]
CorrectSkew = false

[RecognizerParams]
TextLanguage = German,French,English
DetectLanguage = true
BalancedMode = true

[PageAnalysisParams]
DetectVerticalEuropeanText = true

[ObjectsExtractionParams]
DetectTextOnPictures = true

Further information on settings can be found in the documentation of ABBYY FineReader.

Examples of settings sections:

Section	Description
[PDFExportParams]	Parameters for exporting recognized text to PDF format
[PagePreprocessingParams]	Parameters for page preprocessing
[PrepareImageMode]	Parameters for image optimization before processing
[RecognizerParams]	Recognition parameters such as language settings
[PageAnalysisParams]	Parameters for layout analysis
[ObjectsExtractionParams]	Parameters for the extraction of objects