'consistency check' Service
The 'consistency check' service ensures the consistency of the data in the Elasticsearch database so that manual intervention directly in Elasticsearch is no longer necessary.
The data in the 'osftslog' table from the enaio® database is used to determine whether renditions and full-text processing were successful or failed.
The current main functions of the 'consistency check' service:
- Display data from the 'osftslog' table and reprocess failed jobs.
- Re-indexing of Elasticsearch indexes.
- Comparison of entries between Elasticsearch and enaio® databases based on the object ID and triggering of new processing if there are differences.
Access to the User Interface
The 'consistency check' service can be accessed via enaio® services-admin or directly via the following URL: http://<service-manager-admin-IP>:<port>. Default port: 8047.
The navigation bar consists of five tabs:
- Jobs: A list of the jobs of the 'consistency check' service.
- Failed: Shows a list of failed jobs. The corresponding jobs have the status 'failed' in the 'osftslog' table and are at least one hour old.
- Elasticsearch: Displays the current status of the connected Elasticsearch database and allows you to re-index Elasticsearch indexes.
- Object: Enables entries for objects in the Elasticsearch and enaio® databases to be compared and objects to be reprocessed.
- Swagger: Access to the API of the 'consistency check' service.
The 'Jobs' Tab
The tab displays either all active jobs or all jobs. If a job is selected, the corresponding job data is displayed. The view can be updated.
Jobs can run periodically in the background to manage data. They can have either the 'unfinished' or 'finished' status.
Information about the job is displayed. The job type is either 'hotfix' or 'update':
- Hotfix: Repeatedly executed jobs, for example, the 'failed index job processing' job and the 'parent data updates' job of the 'index' service.
- Update: Jobs executed once. These jobs are executed in whole or in part until completion, for example, re-indexing jobs.
Completion is indicated by an end date as a time stamp and a result.
Jobs Currently Available
Automatically executed jobs:
- failed index job processing: Hotfix job of the 'consistency check' service, which is performed periodically. The job checks 'flag2' in the 'osftslog' table. Any unfinished or failed indexing jobs are sent for reprocessing.
- failed rendition job processing: Hotfix job of the 'consistency check' service, which is performed periodically. The job checks 'flag2' in the 'osftslog' table. Any unfinished or failed rendition jobs are sent for reprocessing.
- customer user addon fields re-index job: Update job of the 'consistency check' service, which is performed periodically until completion. The job converts a character string with semicolon-separated terms into an array in certain add-on fields.
- multi-selection catalog fields re-index job: Update job of the 'consistency check' service, which is performed periodically until completion. The job converts a character string with semicolon-separated terms into an array in certain catalog fields.
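The conversion performed by the two re-index jobs can be pictured with a minimal Python sketch. The function name `split_terms` is hypothetical, and trimming whitespace and dropping empty terms are assumptions; the service's actual handling of these cases is not described here.

```python
def split_terms(value: str) -> list[str]:
    """Split a semicolon-separated string into a list of terms."""
    # Assumption: surrounding whitespace is trimmed and empty terms are dropped.
    return [term.strip() for term in value.split(";") if term.strip()]


print(split_terms("red; green;blue"))  # ['red', 'green', 'blue']
```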
Jobs that can be executed once:
- (re)creation of autocomplete index: Manually executed job of the 'index' service. The job processes enaio® index data and creates and updates index data for auto-complete.
- (re)creation of autocomplete index: Update job of the 'migration' service that is executed during a full-text migration. The job transfers the auto-complete index data from the previous Elasticsearch or, for enaio® versions prior to 9.10, processes the enaio® index data and creates and updates the auto-complete index data.
- re-index elasticsearch index job: Hotfix job of the 'consistency check' service, which is performed manually. The job re-indexes the existing index and then sets the alias to the newly created index.
- check re-indexing job: Update job of the 'migration' service that is executed during a full-text migration. The job checks and transfers any missing data from the previous to the current Elasticsearch.
- re-indexing job: Update job of the 'migration' service that is executed during a full-text migration. The job transfers and partially converts data from the previous to the current Elasticsearch.
- parent data update: Hotfix job of the 'index' service, which is executed by request from the CP queue. If location data is changed, all affected higher-level data is updated in all objects.
Execution of Automatic Jobs
The following automatic jobs can run depending on the data consistency:
- failed index job processing, failed rendition job processing
Depending on the data consistency, the following automatic jobs can run after data has been migrated:
- failed index job processing, failed rendition job processing, customer user add-on fields re-index job, multi-selection catalog fields re-index job
No automatic jobs will run during the migration.
Batch Processing
The batch size can be adjusted via the ccservice-prod.yml configuration file located in the \config directory of enaio® service-manager in order to control the system load.
The batch size controls how many corrupt objects are placed in the CPB queue for processing at the same time.
Parameter: osfts.scrollsize, default value: 500
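The effect of the batch size can be illustrated with a short Python sketch that splits a list of object IDs into batches. This is not the service's implementation: the function `batches` and the sample IDs are hypothetical, and only the default value of 500 is taken from the parameter above.

```python
from typing import Iterator, Sequence

SCROLL_SIZE = 500  # default value of osfts.scrollsize


def batches(object_ids: Sequence[int], size: int = SCROLL_SIZE) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` object IDs."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(object_ids), size):
        yield object_ids[start:start + size]


ids = list(range(1200))  # hypothetical object IDs
print([len(batch) for batch in batches(ids)])  # [500, 500, 200]
```

A smaller value means fewer objects are queued at once, at the cost of more batches.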
The 'Failed' Tab
The 'Failed' tab displays all failed jobs from the 'osftslog' table of the enaio® database.
The view can be filtered by enaio® object ID and by date and time period. Date format: yyyyMMdd or yyyy-MM-dd. After entering filter values, update the view via Refresh.
A maximum of 1,000 entries can be displayed. The entries displayed are at least one hour old in order to avoid conflicts due to access to running processes.
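The two accepted date formats and the display limits can be sketched in Python as follows. The function names, and the idea of applying the one-hour age check to a list of timestamps, are assumptions made for illustration; how the service evaluates these rules internally is not documented here.

```python
from datetime import datetime, timedelta

DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d")  # yyyyMMdd and yyyy-MM-dd
MIN_AGE = timedelta(hours=1)           # entries must be at least one hour old
MAX_ENTRIES = 1000                     # display limit of the 'Failed' tab


def parse_filter_date(text: str) -> datetime:
    """Parse a filter date given in either accepted format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {text!r}")


def visible_entries(timestamps: list[datetime], now: datetime) -> list[datetime]:
    """Keep entries that are at least one hour old, capped at MAX_ENTRIES."""
    cutoff = now - MIN_AGE
    return [ts for ts in timestamps if ts <= cutoff][:MAX_ENTRIES]
```

Under these assumptions, `parse_filter_date("20240305")` and `parse_filter_date("2024-03-05")` yield the same date.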
The Flag2 column shows the following error numbers:
- Error numbers less than 1000 refer to indexing errors.
- Error numbers greater than 1000 refer to rendition errors.
Flag2 | Description |
---|---|
0 | No text; the object is a register, a folder, a document without pages. |
1 | Old 'OK status', no longer set from enaio® documentviewer 8.50. |
2 | Obsolete error number, no longer set. |
3 | Obsolete error number, no longer set. |
200 | Text exists; an empty string/empty text is valid. |
404 | The ID is unknown to the rendition service; there is no text yet. |
422 | An error occurred while extracting text; no text is returned. |
500–599 | Server error: enaio® server, Tomcat, ABBYY FineReader. |
1001 | Set when the rendition route begins. |
1002 | Set when the document is transferred to the OCR queue. |
1003 | Set when the document is picked up by the 'ocr' service. |
1004 | Set when processing has been successfully completed. |
1005 | Renditions for this object are deleted. |
1006 | Rendition generation terminated with errors. |
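The numeric split between the two groups of Flag2 values can be expressed as a small Python helper. The function name `flag2_area` is hypothetical, and since the value 1000 itself is not covered by the rule above, the sketch reports it as unspecified rather than guessing.

```python
def flag2_area(flag2: int) -> str:
    """Return the processing area a Flag2 value belongs to."""
    if flag2 < 1000:
        return "indexing"
    if flag2 > 1000:
        return "rendition"
    # Exactly 1000 is not described in the documentation.
    return "unspecified"


for value in (422, 1006):
    print(value, flag2_area(value))
```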
Reprocessing of Failed Jobs
Entries in the Index and Rendition columns can be selected on the 'Failed' tab and reprocessed using Reprocess: The objects are placed in the CPB queue.
- Index: Objects will be re-indexed in Elasticsearch. Re-indexing folders and registers does not automatically lead to the re-indexing of the documents they contain.
- Rendition: Reprocessing of objects in the rendition service. Reprocessing in the rendition service is always followed by re-indexing in Elasticsearch.
Entries can also be selected using the error number. Clicking the error number in the Flag2 column selects all entries with the same error number corresponding to the errors in the Index or Rendition column.
Re-indexing and reprocessing in the rendition service can significantly consume system resources. For this reason, errors should be analyzed in detail and the causes of errors rectified before a large number of objects are processed again.
The 'Elasticsearch' Tab
You can use the 'Elasticsearch' tab to view the status of Elasticsearch and to re-index specific Elasticsearch indexes.
Re-indexing in Elasticsearch
Select an Elasticsearch index for re-indexing from the list.
Elasticsearch indexes:
- autocomplete: Data for auto-complete
- enaioblue: All full-text data
- fieldinfo: Data from the index data fields
- locationinfo: Data on location
- systeminfo: Job processing data for running and completed jobs, hotfix jobs, and update jobs
- tmp: Object IDs of objects whose higher-level data needs to be updated
- searchlog: Data on requested searches of the 'search' service
Re-indexing is started via Reindex:
- Re-indexing is carried out asynchronously.
- Re-indexing can take time and can be viewed on the 'Jobs' tab via Active Jobs.
- An error message is displayed in the event of errors.
- The 'Jobs' tab opens once the re-indexing is successful.
- It is possible to re-index the same index multiple times. However, it does not take place at the same time, but rather one after the other.
- The naming of the index is managed by the service and takes place in a sequential format: index name + "" + sequence number. The old index is deleted once all the data has been moved, and the new one is given an alias: Index name without "" and sequence number.
The 'Object' Tab
The 'Object' tab can be used to compare the data of an object between the enaio® database and the data in Elasticsearch.
The 'consistency check' and 'index' services must run in the same discovery service zone/cluster for a comparison.
For the comparison, enter an enaio® object ID and start the comparison via Compare. If the entries match, no differences are displayed.
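Conceptually, such a comparison amounts to a field-by-field diff of two records for the same object ID. The following Python sketch only illustrates that idea; the function `diff_entries` and the field names in the example are hypothetical and do not reflect the actual data model of the enaio® database or of Elasticsearch.

```python
def diff_entries(db_entry: dict, es_entry: dict) -> dict:
    """Return {field: (db_value, es_value)} for every field that differs."""
    fields = db_entry.keys() | es_entry.keys()
    return {
        field: (db_entry.get(field), es_entry.get(field))
        for field in sorted(fields)
        if db_entry.get(field) != es_entry.get(field)
    }


# Hypothetical records for one object ID.
db = {"id": 4711, "title": "Invoice", "status": "archived"}
es = {"id": 4711, "title": "Invoice", "status": "open"}
print(diff_entries(db, es))  # {'status': ('archived', 'open')}
```

An empty result corresponds to the case in which no differences are displayed.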
On the 'Object' tab, an object can be re-indexed in Elasticsearch or forwarded to the rendition service for reprocessing via Reprocess.
An error message is displayed in the event of errors.