Importing Documents via Core API

1. Overview

This tutorial shows how to import documents into a yuuvis® API system via the Core API. In this tutorial, we develop a short Java application that implements the HTTP requests for importing documents. A JavaScript version of this tutorial is also available.

Check out our graphical overview of the architecture, which describes the basic use case flow for importing documents.

2. Requirements

To work through this tutorial, the following is required:

A set-up yuuvis® API system (see Installation Guide)
A user with at least read permissions on a document type in the system (see the tutorial for permissions: https://help.optimal-systems.com/yuuvis/Momentum/{short-version}/tutorials/login_to_the_core_api.html)
A simple Maven project

3. Maven Configuration

Our Java client will submit its requests to the Core API using OkHttp 3.12 by Square, Inc. Therefore, the following block must be added to the Maven dependencies in the pom.xml of the project:

pom.xml

<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>3.12.0</version>
</dependency>

Client Configuration

To interact with the yuuvis® API system via the Core API, we use an OkHttp3 client to send HTTP requests and read their responses.

OkHttp3 Client and Variables

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import okhttp3.OkHttpClient;

String baseUrl = "http://127.0.0.1"; // base URL of the gateway: "http://<host>:<port>"
String username = "clouduser";
String userpassword = "secret";
String tenant = "default";
// Base64-encoded "username:password" for the Basic Authorization header
String auth = Base64.getEncoder().encodeToString((username + ":" + userpassword).getBytes(StandardCharsets.UTF_8));

OkHttpClient client = new OkHttpClient.Builder().build();

For more information on setting up the OkHttp3 client with cookie handling, see the Login Tutorial.
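If you want to experiment without the OkHttp dependency, a comparable client setup can be sketched with the JDK's built-in java.net.http.HttpClient (Java 11 or later). This is a swapped-in alternative, not part of the original tutorial: a java.net.CookieManager takes the role of OkHttp's cookie handling and keeps session cookies between requests, and the hypothetical helper basicAuth builds the Authorization header value from the same credentials as above.

```java
import java.net.CookieManager;
import java.net.http.HttpClient;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class CoreApiClientSetup {

    // Hypothetical helper: builds the Authorization header value for HTTP Basic authentication
    static String basicAuth(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Client with an in-memory cookie store, so a session cookie set by the gateway is resent later
    static HttpClient newClientWithCookies() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newClientWithCookies();
        System.out.println(basicAuth("clouduser", "secret"));
        System.out.println("cookie handling enabled: " + client.cookieHandler().isPresent());
    }
}
```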

4. Importing a Single Document

To import a document using the Java client, we need the metadata and, optionally, the content of the document. Whether content is allowed depends on the schema definition: a document type may have content, must have content, or must not have content.

The metadata for an import has the following format:

metaData.json

{
    "objects": [{
        "properties": {
            "objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import"
            }
        },
        "contentStreams": [{
            "cid": "cid_63apple"
        }]
    }]
}

In our example, the schema contains an object type document with the Name property, which may or must have content. The content is referenced in the contentStreams list by specifying a cid (multipart content ID). In the example, the cid references the multipart content with the content ID cid_63apple.
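To generate such a metadata file in code instead of reading it from disk, a minimal sketch with plain string building could look as follows. The class and method names are hypothetical, and only the structure of metaData.json above is reproduced (objectTypeId, Name, and one contentStreams entry with a cid); in a real project, a JSON library such as Jackson is the more robust choice.

```java
class MetadataJson {

    // Escapes characters that must not appear unescaped inside a JSON string
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // remaining control characters are written as hex escapes
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the "objects" metadata for one document whose content is referenced by cid
    static String singleObjectMetadata(String objectTypeId, String name, String cid) {
        return "{\"objects\":[{"
                + "\"properties\":{"
                + "\"objectTypeId\":{\"value\":\"" + escape(objectTypeId) + "\"},"
                + "\"Name\":{\"value\":\"" + escape(name) + "\"}"
                + "},"
                + "\"contentStreams\":[{\"cid\":\"" + escape(cid) + "\"}]"
                + "}]}";
    }

    public static void main(String[] args) {
        System.out.println(singleObjectMetadata("document", "test import", "cid_63apple"));
    }
}
```

The resulting string can be passed to RequestBody.create with the application/json media type instead of a File.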

Content files can be in various file formats. We recommend specifying the content type correctly in both the metadata and the multipart request. If the content type is not specified, it is determined automatically during content analysis. If the content type cannot be determined unambiguously or content analysis is switched off, application/octet-stream is used.

In our example, we have chosen a text file (Content-Type: text/plain).
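Choosing the Content-Type on the client side can be sketched as follows. This is an assumption-based helper, not part of the Core API: it uses the JDK's URLConnection.guessContentTypeFromName, which only looks at the file extension, and falls back to application/octet-stream, matching the server-side default described above. The result could then be passed to MediaType.parse when adding the content part.

```java
import java.net.URLConnection;

class ContentTypes {

    // Fallback when no type can be derived from the file name
    static final String FALLBACK = "application/octet-stream";

    // Guesses a MIME type from the file extension using the JDK's built-in file-name map
    static String guessContentType(String fileName) {
        String type = URLConnection.guessContentTypeFromName(fileName);
        return type != null ? type : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(guessContentType("test.txt"));
        System.out.println(guessContentType("scan.unknownext"));
    }
}
```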

4.1. Request

For an import, a POST request must be sent to the endpoint /api/dms/objects with a multipart body consisting of metadata and, if applicable, content of the object to be imported. To construct such a request, we use a MultipartBody.Builder(), which allows us to build the request body from several form parts as follows.

Building the Multipart Body with OkHttp3


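import java.io.File;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.RequestBody;

RequestBody requestBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        // part "data": the metadata of the object(s) to be imported
        .addFormDataPart("data", "metaData.json",
            RequestBody.create(MediaType.parse("application/json; charset=utf-8"),
                new File("./src/main/resources/metaData.json")))
        // content part: its name must match the cid referenced in contentStreams
        .addFormDataPart("cid_63apple", "test.txt",
            RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                new File("./src/main/resources/test.txt")))
        .build();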
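To see roughly what a multipart/form-data body looks like on the wire, the following sketch assembles one by hand with JDK classes only. It is for illustration: the class name, the boundary string, and the in-memory placeholder content are hypothetical. Each part carries a Content-Disposition header whose name is either data (the metadata) or the cid referenced in contentStreams.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartSketch {

    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    MultipartSketch(String boundary) {
        this.boundary = boundary;
    }

    // Appends one form-data part: name is "data" or a cid, followed by the file name and type
    MultipartSketch addPart(String name, String fileName, String contentType, byte[] content) {
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n";
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes("\r\n".getBytes(StandardCharsets.UTF_8));
        return this;
    }

    // Writes the closing boundary and returns the complete body
    byte[] build() {
        out.writeBytes(("--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Value for the Content-Type header of the HTTP request itself
    String contentTypeHeader() {
        return "multipart/form-data; boundary=" + boundary;
    }

    public static void main(String[] args) {
        byte[] body = new MultipartSketch("example-boundary")
                // placeholder metadata, for illustration only
                .addPart("data", "metaData.json", "application/json; charset=utf-8",
                        "{\"objects\":[]}".getBytes(StandardCharsets.UTF_8))
                .addPart("cid_63apple", "test.txt", "text/plain; charset=utf-8",
                        "Hello".getBytes(StandardCharsets.UTF_8))
                .build();
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```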

We use a Request.Builder() to create a request object with the multipart body, the headers, and the URL. Two headers are required for the import because they identify the user accessing the endpoint: the Authorization header, which contains the user's Base64-encoded credentials, and the X-ID-TENANT-NAME header, which contains the user's tenant name. If the OkHttp client supports cookie handling, the Authorization header can be omitted after the client’s first request, since the login information is stored in a session cookie (see also Login Tutorial).

Building a POST Request for an Import

Request request = new Request.Builder()
        .header("Authorization", "Basic "+ auth)
        .header("X-ID-TENANT-NAME", tenant)
        .url(baseUrl + "/api/dms/objects") //baseUrl: "http://<host>:<port>"
        .post(requestBody)
        .build();
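As an alternative to OkHttp, the same POST request might be assembled with the JDK's java.net.http API. This is a sketch under assumptions: the body bytes and the boundary are expected to come from some multipart builder, and the method name buildImportRequest is hypothetical. Unlike OkHttp, which derives the Content-Type header from the MultipartBody, the JDK client needs the multipart Content-Type including the boundary to be set explicitly.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class ImportRequestSketch {

    // Builds the POST request for /api/dms/objects with the authentication and tenant headers
    static HttpRequest buildImportRequest(String baseUrl, String username, String password,
                                          String tenant, String boundary, byte[] multipartBody) {
        String auth = Base64.getEncoder()
                .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/dms/objects"))
                .header("Authorization", "Basic " + auth)
                .header("X-ID-TENANT-NAME", tenant)
                // the JDK client does not derive this header from the body
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildImportRequest("http://127.0.0.1", "clouduser", "secret",
                "default", "example-boundary", new byte[0]);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```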

4.2. Response

To print the API response to the console, we execute the request, which returns a Response object. Note that the OkHttpClient can throw an IOException while executing the call, so the exception must be handled.

Handling any IOException

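import java.io.IOException;
import okhttp3.Response;

// try-with-resources closes the response (and its body) after use
try (Response response = client.newCall(request).execute()) {
    System.out.println(response.body().string());   // print to console
} catch (IOException e) {
    e.printStackTrace();
}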
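The snippet above prints the body regardless of the HTTP status. A slightly more defensive variant, sketched here with the JDK HTTP client instead of OkHttp, checks the status code first and only returns the body for 2xx responses. The class and method names are hypothetical; with OkHttp, the equivalent check is response.isSuccessful().

```java
import java.io.IOException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ImportResponseSketch {

    // Returns the body for 2xx status codes, otherwise fails with the status and the error body
    static String requireSuccess(int statusCode, String body) throws IOException {
        if (statusCode >= 200 && statusCode < 300) {
            return body;
        }
        throw new IOException("Import failed with HTTP " + statusCode + ": " + body);
    }

    // Sends the import request and returns the body of a successful response
    static String execute(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return requireSuccess(response.statusCode(), response.body());
    }

    public static void main(String[] args) throws IOException {
        // placeholder body, for illustration only
        System.out.println(requireSuccess(200, "{\"objects\":[]}"));
    }
}
```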

5. Importing Multiple Documents in Batch Mode

Multiple documents can be imported at once via the same Core API endpoint. Instead of a single object, the objects list then contains several metadata records. The content file of each object requires a unique cid, which is used as the name of its form-data part in the multipart request. Each metadata record references its cid in the contentStreams list, so that metadata and content can be assigned to each other unambiguously.

metaDataBatch.json

{
    "objects": [{
        "properties": {
            "objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import object 1"
            }
        },
        "contentStreams": [{
            "cid": "cid_63apple"
        }]
    },
    {
        "properties": {
            "objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import object 2"
            }
        },
        "contentStreams": [{
            "cid": "cid_64apple"
        }]
    }]
}
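When the number of files is only known at runtime, the batch metadata and the cid assignment can be generated together. The sketch below rests on assumptions: the cid naming scheme cid_0, cid_1, ... and the class name are hypothetical (any unique part names should work), the object type document with the Name property is taken from the example schema, the file name is reused as the Name value, and the JSON is built with plain strings and only minimal escaping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchMetadataSketch {

    // Minimal escaping for backslashes and quotes; control characters are not handled here
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Assigns a unique cid to each file name and returns cid -> file name in insertion order
    static Map<String, String> assignCids(List<String> fileNames) {
        Map<String, String> cids = new LinkedHashMap<>();
        for (int i = 0; i < fileNames.size(); i++) {
            cids.put("cid_" + i, fileNames.get(i));
        }
        return cids;
    }

    // Builds one metadata record per cid; the file name is used as the Name property
    static String batchMetadata(Map<String, String> cids) {
        StringBuilder json = new StringBuilder("{\"objects\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : cids.entrySet()) {
            if (!first) json.append(',');
            first = false;
            json.append("{\"properties\":{")
                .append("\"objectTypeId\":{\"value\":\"document\"},")
                .append("\"Name\":{\"value\":\"").append(esc(e.getValue())).append("\"}},")
                .append("\"contentStreams\":[{\"cid\":\"").append(esc(e.getKey())).append("\"}]}");
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> cids = assignCids(List.of("test1.txt", "test2.txt"));
        System.out.println(batchMetadata(cids));
    }
}
```

The same map can then drive the multipart body: one addFormDataPart call per entry, with the cid as the part name.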

5.1. Batch Request

In the multipart body, we add a separate form-data part for the content of each object. The first parameter of addFormDataPart is the part name, which must be the content ID (cid) referenced in the metadata.

Building the Multipart Body for a Batch Import

RequestBody batchImportRequestBody = new MultipartBody
        .Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("data",
            "metaDataBatch.json",
            RequestBody.create(MediaType.parse("application/json; charset=utf-8"),
                new File("./src/main/resources/metaDataBatch.json")))
        .addFormDataPart("cid_63apple",
            "test1.txt",
            RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                new File("./src/main/resources/test1.txt")))
        .addFormDataPart("cid_64apple",
            "test2.txt",
            RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                new File("./src/main/resources/test2.txt")))
        .build();
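Because each cid must match exactly one form-data part, a mismatch between the metadata and the multipart body is an easy mistake in batch imports. The following sketch is a hypothetical client-side pre-flight check, not part of the Core API: given the cids referenced in the contentStreams lists and the names of the content parts (excluding the data part), it reports duplicate part names, cids without content, and parts that no metadata record references.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CidCheck {

    // Returns human-readable problems; an empty list means metadata and content parts match
    static List<String> findMismatches(List<String> referencedCids, List<String> partNames) {
        List<String> problems = new ArrayList<>();
        Set<String> parts = new LinkedHashSet<>();
        for (String part : partNames) {
            if (!parts.add(part)) problems.add("duplicate content part: " + part);
        }
        Set<String> referenced = new LinkedHashSet<>(referencedCids);
        for (String cid : referenced) {
            if (!parts.contains(cid)) problems.add("no content part for cid: " + cid);
        }
        for (String part : parts) {
            if (!referenced.contains(part)) problems.add("content part not referenced in metadata: " + part);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(findMismatches(
                List.of("cid_63apple", "cid_64apple"),
                List.of("cid_63apple", "cid_65apple")));
    }
}
```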

The request object is then assembled exactly as for a single import, with batchImportRequestBody as the POST body.

5.2. Response of Batch Request

If the import succeeds, the response contains an objects list with the metadata records of all objects imported in this batch.

6. Referencing an Existing Binary Content File

To save storage in your repository, multiple documents can reference the same binary content file. Specify contentStreamId and repositoryId of the existing binary content file in the import request as shown in the following example.

The archivePath is only required if its value cannot be determined from the current request. For example, a pathTemplate containing dynamic path elements such as DATE can be configured in the application-storage.yml configuration file. archivePath values resulting from such dynamic path templates cannot be reconstructed for objects created later that reference the same binary content, so archivePath must be specified in that case.

metaData.json

{
    "objects": [{
        "properties": {
            "objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import with reference on existing content"
            }
        },
        "contentStreams": [{
            "contentStreamId": "DB886B37-FEEF-11E9-BCEC-3FAD551A2B8A",
            "archivePath": "default/DOCUMENT/2024-03-21/86/B/",
            "repositoryId": "repo252"
        }]
    }]
}
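A minimal sketch of building such a contentStreams entry in code, assuming the field names from the example above. The class and method names are hypothetical, and the values are assumed not to contain characters that need JSON escaping. archivePath is written only when the caller passes a value, reflecting that it is needed only if it cannot be determined from the current request.

```java
class ContentReferenceSketch {

    // Builds a contentStreams entry that points to existing binary content instead of a cid
    static String contentReference(String contentStreamId, String repositoryId, String archivePath) {
        StringBuilder entry = new StringBuilder("{\"contentStreamId\":\"")
                .append(contentStreamId).append('"');
        // archivePath is optional: included only when the caller passes a value
        if (archivePath != null) {
            entry.append(",\"archivePath\":\"").append(archivePath).append('"');
        }
        entry.append(",\"repositoryId\":\"").append(repositoryId).append("\"}");
        return entry.toString();
    }

    public static void main(String[] args) {
        // placeholder IDs taken from the example metaData.json
        System.out.println(contentReference("DB886B37-FEEF-11E9-BCEC-3FAD551A2B8A",
                "repo252", "default/DOCUMENT/2024-03-21/86/B/"));
        System.out.println(contentReference("DB886B37-FEEF-11E9-BCEC-3FAD551A2B8A", "repo252", null));
    }
}
```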

A special case is the creation of compound documents and corresponding sub-documents.

7. Summary

In this tutorial, an OkHttpClient (optionally with cookie handling, see Login Tutorial) was used to import documents via the Core API, both individually and in batch mode.

A complete code example can be found in this git repository.