Installing an OCR Component

The 'ocrservice' microservice includes the text recognition software ABBYY FineReader, which generates text documents for full-text indexing and PDF files with hidden text from image documents. The 'Index file for full-text search' property is required for the object types.

Tesseract and ABBYY FineReader are available as OCR components.

FineReader/Tesseract in comparison

| | FineReader | Tesseract |
|---|---|---|
| License | SMUA license provided by OPTIMAL SYSTEMS | License-free (Apache license version 2.0) |
| Installation | Installation via a setup file included as part of the installation data | Part of the yuuvis® RAD service-manager installation |
| Languages | Additional fees may apply for further languages | Additional languages available free of charge |
| Supported image formats | FineReader documentation | Tesseract documentation |
| PDF rendition with hidden text | Yes | Yes |
| PDF/A rendition | Yes | No |
| Barcode recognition | Yes | No |
| Number of cores | May be subject to additional fees depending on the license | Not license-based/no additional costs, default: 4 |

Tesseract

Tesseract is installed as part of yuuvis® RAD service-manager. Tesseract is preconfigured if the corresponding option is activated.

The <service-manager>\config\ocr-prod.yml configuration file is created during installation. The file contains the languages that are specified during installation. The file can be edited to include additional or other languages.

Example:

tesseract:
  languages: deu,eng 

The 'ocrservice' service for Tesseract is included in the <service-manager>\config\servicewatcher-sw.yml configuration file:

- name: ocrservice
  type: microservice
  profiles: prod,cloud,red,tesseract
  instances: 1
  memory: 512M
  port: 7241-7250
  path: ${appBase}/ocrservice/ocrservice-app.jar
  env:
    ProgramData: null
    ALLUSERSPROFILE: null
    #OMP_THREAD_LIMIT: 4

The OCR engine must be configured in the route.properties configuration file located in the \rendition-plus\webapps\osrenditioncache\WEB-INF\classes\config\ directory:

ocr-engine=finereader

The value finereader activates OCR in general, whichever component is used (ABBYY FineReader or Tesseract). It is therefore also the setting to use for Tesseract.
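As a quick sanity check of this setting, the following minimal Python sketch reads key=value pairs from the text of a properties file and reports whether ocr-engine is set to finereader. The function names read_properties and ocr_enabled are hypothetical, and the parser is deliberately simplified (it ignores line continuations and other Java properties features); only the ocr-engine key and the finereader value come from this documentation.

```python
def read_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines; skip blank lines and comments."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def ocr_enabled(props: dict[str, str]) -> bool:
    """True if ocr-engine is set to the value named in this documentation."""
    return props.get("ocr-engine") == "finereader"


if __name__ == "__main__":
    # Sample content only; in practice, pass the text of route.properties.
    sample = "# OCR settings\nocr-engine=finereader\n"
    print(ocr_enabled(read_properties(sample)))
```

To check a real installation, read route.properties from the directory named above and pass its text to read_properties.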

Language for Tesseract

The following languages are available for Tesseract:

| Abbreviation | Language |
|---|---|
| chi_sim | Chinese (simplified) |
| chi_sim_vert | Chinese vertical (simplified) |
| deu | German |
| eng | English |
| fra | French |
| ind | Indonesian |
| ita | Italian |
| jpn | Japanese |
| jpn_vert | Japanese vertical |
| kor | Korean |
| kor_vert | Korean vertical |
| msa | Malay |
| spa | Spanish |
| tha | Thai |

The language files for these languages are installed in the \<service-manager>\data\tesseract_data directory. Other languages are available for download and must be copied to this directory.
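When languages are added to ocr-prod.yml, it can help to confirm that a matching language file exists. The sketch below is one possible check in Python, under two assumptions: that the languages entry is a single comma-separated line as in the example above, and that language files follow Tesseract's usual <code>.traineddata naming. The function names are hypothetical, and the demo uses a temporary directory instead of a real tesseract_data directory.

```python
import re
import tempfile
from pathlib import Path


def configured_languages(yaml_text: str) -> list[str]:
    """Extract the comma-separated codes from a 'languages:' line."""
    match = re.search(r"^[ \t]*languages:[ \t]*(.+?)[ \t]*$", yaml_text, re.MULTILINE)
    if not match:
        return []
    return [code.strip() for code in match.group(1).split(",") if code.strip()]


def missing_language_files(codes: list[str], data_dir) -> list[str]:
    """Return the codes without a <code>.traineddata file in data_dir (assumed naming)."""
    data_dir = Path(data_dir)
    return [c for c in codes if not (data_dir / f"{c}.traineddata").is_file()]


if __name__ == "__main__":
    config = "tesseract:\n  languages: deu,eng \n"
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a data directory that only contains the German file.
        (Path(tmp) / "deu.traineddata").write_bytes(b"")
        codes = configured_languages(config)
        print("configured:", codes)
        print("missing:", missing_language_files(codes, tmp))
```

For a real installation, pass the text of ocr-prod.yml and the path of the tesseract_data directory instead of the sample values.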

ABBYY FineReader

ABBYY FineReader requires an SMUA (Software Maintenance and Upgrade Assurance) license, which you purchase from OPTIMAL SYSTEMS and integrate during the ABBYY FineReader installation.

ABBYY FineReader must be installed on a workstation on which yuuvis® RAD service-manager is installed with the 'ocr', 'adminservice', 'discoveryservice', and 'renditionsidecar' services.

ABBYY FineReader is installed via the setup.exe application from the \finereader installation directory. Follow the installation dialogs.

After installation, you can adjust the settings, in particular those for PDF creation, in the ocr-prod.yml configuration file located in the \<service-manager>\config\ directory.

Integration into the Microservice Infrastructure

To integrate ABBYY FineReader, follow these steps:

  • Enter the number of instances in the servicewatcher-sw.yml configuration file located in the \<service-manager>\config\ directory:
    - name: ocrservice
      type: microservice
      profiles: prod,cloud,red
      instances: 0
      memory: 128M
      port: 7241-7250
      path: ${appBase}/ocrservice/ocrservice-app.jar
      env:
        ProgramData: null
        ALLUSERSPROFILE: null

  • If yuuvis® RAD rendition-plus is not installed on the same workstation, enter the IP address of the yuuvis® RAD rendition-plus host in the application-red.yml configuration file located in the \<service-manager>\config\ directory:
    yuuvis.rendition.server: <host>:8090
  • Configure ABBYY FineReader as the OCR component in the route.properties configuration file located in the \rendition-plus\webapps\osrenditioncache\WEB-INF\classes\config\ directory:
    ocr-engine=finereader
    In general, the finereader parameter activates an OCR component (ABBYY FineReader or Tesseract).
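
If rendition-plus runs on a different host, a simple TCP check can confirm that the address entered for yuuvis.rendition.server is reachable from the OCR workstation. The following Python sketch is only an illustration: the function names are hypothetical, the default port 8090 is taken from the example above, and a successful TCP connection does not prove that the rendition service itself is working.

```python
import socket


def split_host_port(value: str, default_port: int = 8090) -> tuple[str, int]:
    """Split '<host>:<port>'; fall back to default_port if no port is given."""
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, default_port
    return host, int(port)


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 'localhost:8090' is a sample value; use the host from application-red.yml.
    host, port = split_host_port("localhost:8090")
    print(f"{host}:{port} reachable:", is_reachable(host, port))
```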

Uninstalling

You can uninstall ABBYY FineReader via the Windows Control Panel.

Updates

For information on updating components, see Release Information.