Difference between revisions of "REST Interface"

From Transkribus Wiki
(Remove outdated information on Layout Analysis functions)
(OCR)
Line 57:

Under the recognition sub-path, there are methods for starting OCR and HTR processing.

===OCR===
- *<code>https://transkribus.eu/TrpServer/rest/recognition/ocr?collId={collection-ID}&id={doc-ID}&page={pageNr}</code>
+ *<code>https://transkribus.eu/TrpServer/rest/recognition/ocr?collId={collection-ID}&id={doc-ID}&pages={pageNr}</code>

POST to this path starts an OCR job for a specific document page. The response contains the job-ID of the OCR job, which can be used to query the job status (see Jobs section).

Revision as of 10:59, 30 October 2018

The Transkribus application communicates with the server via a set of RESTful service methods.

Visualization with SWADL

  • The complete definition can be found in the file: application.wadl
  • Visualized with Swadl
  • The parameters for all the methods described below can be found in that service description file.

Login

Most methods require the user to be logged in to the service, which is achieved by POSTing the user credentials to the login method:

The method returns an XML document with the user profile and the collections this user is allowed to access.
Subsequent requests to the service must then include the session ID from that XML in one of two places:

  • The HTTP header as cookie named "JSESSIONID"
  • The request parameter "JSESSIONID"
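The login flow described above can be sketched in Python roughly as follows. This is only an illustration under several assumptions: the login path is passed in as a parameter because it has to be looked up in application.wadl, the form field names "user" and "pw" and the XML element name "sessionId" are guesses that are not confirmed on this page, and the base URL is taken from the OCR endpoint shown for the recognition methods. The helpers only build requests and parse a response string; they do not contact the server.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL as it appears in the OCR endpoint on this page.
BASE_URL = "https://transkribus.eu/TrpServer/rest"


def build_login_request(login_path, user, password):
    """Build (but do not send) a form-encoded POST carrying the credentials.

    Assumptions: login_path comes from application.wadl, and the form
    field names "user" and "pw" are placeholders for the real ones.
    """
    body = urllib.parse.urlencode({"user": user, "pw": password}).encode("utf-8")
    url = f"{BASE_URL}/{login_path.lstrip('/')}"
    return urllib.request.Request(url, data=body, method="POST")


def extract_session_id(login_xml, tag="sessionId"):
    """Return the text of the first descendant element named `tag`.

    The element name is an assumption; check the actual login response.
    """
    root = ET.fromstring(login_xml)
    node = root.find(f".//{tag}")
    if node is None or not node.text:
        raise ValueError(f"no <{tag}> element in login response")
    return node.text.strip()


def with_session(request, session_id):
    """Attach the session ID as the JSESSIONID cookie header."""
    request.add_header("Cookie", f"JSESSIONID={session_id}")
    return request
```

A request built by build_login_request could be sent with urllib.request.urlopen, and the decoded response body handed to extract_session_id. Sending the ID as a cookie is one of the two options listed above; the other is to append it as the request parameter "JSESSIONID".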

Collections

Once a user is authenticated, the collections for which the user has access rights can be listed via the following calls:

The call returns a list of collections, where each entry contains the collection-ID, the collection name and the role of the current user in that collection.

Documents in a collection can be listed analogously via GET request:

This call returns a list of document metadata objects.

In order to retrieve a complete document, one has to GET the following:

The returned object consists of the document metadata and the complete page list, where each page contains, among other attributes:

  • A link to the page image
  • A list of transcript files where each transcript contains:
    • A link to the page XML file
    • The responsible user's ID and name
    • A timestamp
    • The edit status of the transcript, e.g. NEW, IN_PROGRESS, DONE, FINAL
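The structure of the returned document can be modelled roughly as below. This is a sketch of the shape described in the list above, not the server's actual schema: all class and field names are hypothetical, the page number attribute and the millisecond timestamp are assumptions, and the real attribute names should be taken from application.wadl or from an actual response.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    xml_url: str    # link to the PAGE XML file
    user_id: int    # responsible user's ID
    user_name: str  # responsible user's name
    timestamp: int  # assumed: milliseconds since the epoch
    status: str     # edit status, e.g. NEW, IN_PROGRESS, DONE, FINAL


@dataclass
class Page:
    page_nr: int    # assumed attribute, not listed on this page
    image_url: str  # link to the page image
    transcripts: List[Transcript] = field(default_factory=list)

    def latest_transcript(self) -> Optional[Transcript]:
        """Return the transcript with the highest timestamp, if any."""
        return max(self.transcripts, key=lambda t: t.timestamp, default=None)


@dataclass
class Document:
    metadata: dict
    pages: List[Page] = field(default_factory=list)

    def pages_with_status(self, status: str) -> List[int]:
        """Page numbers whose most recent transcript has the given status."""
        result = []
        for page in self.pages:
            latest = page.latest_transcript()
            if latest is not None and latest.status == status:
                result.append(page.page_nr)
        return result
```

With such a model, a client could for example collect all pages whose most recent transcript is still IN_PROGRESS before starting further processing.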

To update the transcription of a page, a new PAGE XML can be POSTed to the following path:

The method accepts the query params:

  • status: The edit status of the new transcript (see above for values)
  • overwrite: true or false. Specifies whether the most recent version should be overwritten. Overwriting only works if the most recent version was saved by the same user and its edit status is not "NEW".
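A request that uploads a new PAGE XML with these two query parameters might be assembled as in the following sketch. The target URL is passed in by the caller because the path has to be taken from application.wadl, and the "application/xml" content type is an assumption. Only "status" and "overwrite" come from the description above. The function builds the request without sending it.

```python
import urllib.parse
import urllib.request


def build_transcript_post(url, page_xml, status, overwrite=False):
    """Build a POST request that uploads a PAGE XML string.

    url:       full transcript path for the page (caller-supplied)
    status:    edit status of the new transcript, e.g. "DONE"
    overwrite: whether the most recent version should be replaced
    """
    query = urllib.parse.urlencode(
        {"status": status, "overwrite": "true" if overwrite else "false"}
    )
    return urllib.request.Request(
        f"{url}?{query}",
        data=page_xml.encode("utf-8"),
        # Assumed content type; not stated on this page.
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

The session ID still has to be attached to the request, either as the JSESSIONID cookie or as a request parameter, before it is sent.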

Jobs

All processing tasks, such as document creation, layout analysis, HTR processing, OCR processing, etc., are run as threads (jobs) on the server. Each job has a status and can be monitored and cancelled. A job list can be retrieved at:

Jobs are either persistent, e.g. document creation, HTR, etc., or non-persistent, e.g. layout analysis jobs. The former are stored in the database while the latter are only kept in memory and are removed from the job list after 1 hour or if the server is restarted.
The details of a specific job can be retrieved at:

In order to cancel a job, a POST request has to be sent to the following path:
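Since jobs run asynchronously, a client typically polls the job details until the job has ended. The loop below is a minimal sketch of that pattern. The status lookup is injected as a function, because the job path and the response format are not reproduced here, and the status names "FINISHED", "FAILED" and "CANCELED" are assumptions that should be checked against the service description.

```python
import time


def wait_for_job(fetch_status, job_id,
                 done_states=("FINISHED", "FAILED", "CANCELED"),
                 interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status(job_id) until it returns a terminal state.

    fetch_status: callable that queries the job details and returns the
                  job's status string (caller-supplied, hypothetical)
    done_states:  assumed names of the terminal job states
    Raises TimeoutError if no terminal state is seen within max_polls.
    """
    for _ in range(max_polls):
        state = fetch_status(job_id)
        if state in done_states:
            return state
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not end after {max_polls} polls")
```

Note that non-persistent jobs disappear from the job list after one hour or on a server restart, so a lookup for such a job may fail even though the job ended normally.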

Recognition

Under the recognition sub-path, there are methods for starting OCR and HTR processing.

OCR

POST to this path starts an OCR job for a specific document page. The response contains the job-ID of the OCR job, which can be used to query the job status (see Jobs section).
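As an illustration, the OCR request could be built as follows. The URL and the parameter names collId, id and pages are taken from the newer revision shown in the diff for this page. The empty request body and the use of the JSESSIONID cookie for authentication are assumptions of this sketch, and how the job-ID is encoded in the response is not covered.

```python
import urllib.parse
import urllib.request

# Endpoint as given in the revision diff for this page.
OCR_URL = "https://transkribus.eu/TrpServer/rest/recognition/ocr"


def build_ocr_request(coll_id, doc_id, pages, session_id):
    """Build (but do not send) the POST that starts an OCR job."""
    query = urllib.parse.urlencode(
        {"collId": coll_id, "id": doc_id, "pages": pages}
    )
    # Assumption: the parameters travel in the query string and the
    # request body is empty.
    request = urllib.request.Request(f"{OCR_URL}?{query}", data=b"", method="POST")
    request.add_header("Cookie", f"JSESSIONID={session_id}")
    return request
```

The request can be sent with urllib.request.urlopen; the job-ID from the response is then used with the methods in the Jobs section to monitor the job.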

HTR

The new HTR API is described here

Legacy HMM HTR Training and recognition

HTR Training

POST to this path starts an HTR Training job that trains an HMM model named {model-name} based on the documents {doc-ID_1} to {doc-ID_n}. The response contains the respective job-ID, which can be used to query the job status.

HTR Processing

A GET request to this path returns a list of available HMM HTR models. The attribute modelName is used to identify the model when starting a recognition process with POST to: