Difference between revisions of "REST Interface"

From Transkribus Wiki

Revision as of 14:33, 2 June 2017

The Transkribus application communicates with the server via a set of RESTful service methods.

Visualization with SWADL

  • The complete definition can be found in the file: application.wadl
  • Visualized with Swadl
  • The parameters for all the methods described below can be found in that service description file.


Most of the methods require the user to be logged in to the services, which is achieved by POSTing the user credentials to the login method:

The method returns an XML document with the user profile and the collections that this user is allowed to access.
Subsequent requests to the service must then include the Session-ID from that XML in either:

  • The HTTP header as cookie named "JSESSIONID"
  • The request parameter "JSESSIONID"
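
The login flow above can be sketched in Python as follows. This is only an illustration under stated assumptions: the base URL, the login path, the form field names "user" and "pw", and the XML element name "sessionId" are hypothetical placeholders, and the real values have to be taken from application.wadl. Only the name "JSESSIONID" comes from the description above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical placeholders: the real base URL, login path and field names
# are defined in application.wadl and are not reproduced here.
BASE_URL = "https://example.org/rest"
LOGIN_PATH = "/auth/login"


def build_login_request(user, password):
    """Build (but do not send) a form-encoded POST carrying the credentials."""
    body = urllib.parse.urlencode({"user": user, "pw": password}).encode("utf-8")
    return urllib.request.Request(BASE_URL + LOGIN_PATH, data=body, method="POST")


def extract_session_id(login_xml, tag="sessionId"):
    """Read the session ID from the login response (element name is assumed)."""
    root = ET.fromstring(login_xml)
    node = root.find(f".//{tag}")
    if node is None or not node.text:
        raise ValueError(f"no <{tag}> element in login response")
    return node.text.strip()


def with_session_cookie(request, session_id):
    """Option 1: pass the session ID as a cookie named JSESSIONID."""
    request.add_header("Cookie", f"JSESSIONID={session_id}")
    return request


def with_session_param(url, session_id):
    """Option 2: pass the session ID as a request parameter named JSESSIONID."""
    separator = "&" if "?" in url else "?"
    return url + separator + urllib.parse.urlencode({"JSESSIONID": session_id})
```

A request built this way would be sent with urllib.request.urlopen, and either of the two helpers can then be applied to every subsequent call.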


Once a user is authenticated, the collections for which the user has access rights can be listed via the following calls:

The call returns a list of collections, where each object contains the collection-ID, the collection name, and the role of the current user in that collection.
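
As a rough sketch of how a client might consume such a list, the following Python snippet maps each entry to a small typed record. It assumes the list is delivered as JSON, and the keys "colId", "colName" and "role" as well as the role values in the usage example are hypothetical; the actual representation is given by the service description file.

```python
import json
from dataclasses import dataclass


@dataclass
class CollectionEntry:
    collection_id: int  # the collection-ID
    name: str           # the collection name
    role: str           # role of the current user in this collection


def parse_collections(payload):
    """Turn a JSON list of collections into typed entries.

    The keys "colId", "colName" and "role" are assumed for illustration.
    """
    return [
        CollectionEntry(item["colId"], item["colName"], item["role"])
        for item in json.loads(payload)
    ]


def collections_with_role(entries, role):
    """Keep only the collections in which the user holds the given role."""
    return [entry for entry in entries if entry.role == role]
```

For example, collections_with_role(parse_collections(body), "Owner") would return the collections in which the user holds a role named "Owner", if the service uses that role name.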

Documents in a collection can be listed analogously via a GET request:

This call returns a list of document metadata objects.

In order to retrieve a complete document, one has to GET the following:

The returned object consists of the document metadata and the complete page list, where each page contains, among other attributes:

  • A link to the page image
  • A list of transcript files where each transcript contains:
    • A link to the page XML file
    • The responsible user's ID and name
    • A timestamp
    • The edit status of the transcript, e.g. NEW, IN_PROGRESS, DONE, FINAL
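
The structure described above could be modelled on the client side roughly as shown below. The class and field names are hypothetical and do not reflect the actual response schema, and the timestamp is assumed to be a number that grows over time (for instance milliseconds since the epoch).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    xml_url: str    # link to the page XML file
    user_id: int    # ID of the responsible user
    user_name: str  # name of the responsible user
    timestamp: int  # assumed to be a monotonically increasing number
    status: str     # edit status, e.g. NEW, IN_PROGRESS, DONE, FINAL


@dataclass
class Page:
    image_url: str  # link to the page image
    transcripts: List[Transcript] = field(default_factory=list)


def latest_transcript(page):
    """Return the most recent transcript of a page, or None if it has none."""
    return max(page.transcripts, key=lambda t: t.timestamp, default=None)
```

Selecting the transcript with the highest timestamp is one simple way for a client to find the current version of a page.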

To update the transcription of a page, a new PAGE XML can be POSTed to the following path:

The method accepts the following query parameters:

  • status: The edit status of the new transcript (see above for values)
  • overwrite: true or false. Specifies whether the most recent version should be overwritten. Overwriting only works if the most recent version was saved by the same user and its edit status is not "NEW".
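
A minimal sketch of how such a request might be assembled in Python is given below. The query parameter names "status" and "overwrite" come from the list above; the page URL passed in by the caller and the "application/xml" content type are assumptions for illustration.

```python
import urllib.parse
import urllib.request


def build_transcript_post(page_url, page_xml, status, overwrite=False):
    """Build (but do not send) a POST that uploads a new PAGE XML.

    page_url is whatever path the service defines for the page; it is
    supplied by the caller because it is not reproduced here.
    """
    query = urllib.parse.urlencode({
        "status": status,                              # e.g. "DONE"
        "overwrite": "true" if overwrite else "false",
    })
    return urllib.request.Request(
        f"{page_url}?{query}",
        data=page_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},   # assumed content type
        method="POST",
    )
```

The session ID still has to be attached to the request, either as a cookie or as an additional request parameter.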


All processing tasks, such as document creation, layout analysis, HTR processing, OCR processing, etc., are run as threads (jobs) on the server. Each job has a status and can be monitored and cancelled. A job list can be retrieved at:

Jobs are either persistent, e.g. document creation, HTR, etc., or non-persistent, e.g. layout analysis jobs. The former are stored in the database while the latter are only kept in memory and are removed from the job list after 1 hour or if the server is restarted.
The details of a specific job can be retrieved at:

In order to cancel a job, a POST request has to be sent to the following path:


The methods in the layout analysis path can be used to start jobs for automatically detecting text blocks, lines, and baselines within lines. All methods return the job-ID of the created job, which can be used to query the job status. The result of the computation is stored as a new transcript version.
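
Querying the job status with a returned job-ID typically means polling until the job has ended. The following Python sketch shows one way to do that. The status lookup is injected as a function so that the sketch does not depend on any concrete path, and the state names in TERMINAL_STATES are assumptions, not values confirmed by the service description.

```python
import time

# Assumed names for states in which a job will not change any more.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def wait_for_job(fetch_status, job_id, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is a caller-supplied function that asks the job details
    method for the current state and returns it as a string.
    """
    for _ in range(max_polls):
        state = fetch_status(job_id)
        if state in TERMINAL_STATES:
            return state
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Bounding the number of polls keeps a client from waiting forever on a non-persistent job that has disappeared from the job list after a server restart.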

Detect blocks

Detect lines

The line detection algorithm optionally allows specifying the region-IDs in which the lines should be detected.

Detect baselines

The baseline detection algorithm optionally allows specifying the region-IDs in which the lines should be processed.


Under the recognition sub-path, there are methods for starting OCR and HTR processing.


A POST to this path starts an OCR job for a specific document page. The response contains the job-ID of the OCR job, which can be used to query the job status (see Jobs section).


The new HTR API is described here

Legacy HMM HTR Training and recognition

HTR Training

A POST to this path starts an HTR training job, which trains an HMM model named {model-name} based on the documents {doc-ID_1} to {doc-ID_n}. The response contains the respective job-ID, which can be used to query the job status.

HTR Processing

A GET request to this path returns a list of available HMM HTR models. The attribute modelName is used to identify the model when starting a recognition process with POST to: