Difference between revisions of "Upload via REST API"

From Transkribus Wiki

Revision as of 16:43, 23 August 2017

The path /rest/uploads provides endpoints for importing a document into Transkribus.

https://transkribus.eu/TrpServerTesting/rest/uploads?collId={collectionID}

A POST request to this endpoint creates a new upload process on the server. The query parameter collId is mandatory and must contain the ID of a collection to which the user has write access.
If the Content-Type header is set to application/xml, the request body must include a valid Transkribus METS XML (TODO).
If the header specifies application/json then a JSON object of the following form is expected:

{
    "md": {
        "title": "Bentham Box 35",
        "author": "Jeremy Bentham",
        "genre": "Notes",
        "writer": "Secretary"
    },
    "pageList": {"pages": [
        {
            "fileName": "035_320_001.jpg",
            "pageXmlName": "035_320_001.xml",
            "pageNr": 1,
            "imgChecksum": "9d531932c8e24d5a5dc13c92063698c9",
            "pageXmlChecksum": "b644a9c34a65ee07c1c576194e720b4a"
        },
        {
            "fileName": "035_321_001.jpg",
            "pageXmlName": "035_321_001.xml",
            "pageNr": 2,
            "imgChecksum": "e3ae1a862b9cd53cc87c9325d2502547",
            "pageXmlChecksum": "8ba4758b8b8d5df562e25809692be340"
        }
    ]}
}

Besides some basic (optional) metadata, this object defines the structure of the document to upload, including the filenames to expect.
A page object requires only a fileName and a pageNr; all other fields are optional. Checksums, if used, must be computed with MD5.
The response to this request will return an enriched object of the same type. It will include a unique upload ID (field uploadId) that is to be used for the following requests.
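As a minimal sketch of the initial request, the following Python snippet builds the JSON structure (with MD5 image checksums) and prepares the POST request using only the standard library. The helper names md5_hex, build_structure and build_create_request are hypothetical, and authentication is not covered on this page, so it is left out here; the request is only constructed, not sent.

```python
import hashlib
import json
import urllib.parse
import urllib.request

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def build_structure(title, pages):
    """Build the upload structure object (hypothetical helper).

    pages is a list of (file_name, image_bytes) tuples in page order.
    Only fileName and pageNr are required; imgChecksum is optional.
    """
    return {
        "md": {"title": title},
        "pageList": {
            "pages": [
                {"fileName": name, "pageNr": nr, "imgChecksum": md5_hex(data)}
                for nr, (name, data) in enumerate(pages, start=1)
            ]
        },
    }


def build_create_request(coll_id, structure):
    """Prepare (but do not send) the POST that creates the upload."""
    query = urllib.parse.urlencode({"collId": coll_id})
    return urllib.request.Request(
        f"{BASE_URL}/uploads?{query}",
        data=json.dumps(structure).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the prepared request (for example with urllib.request.urlopen, plus whatever session or credentials the server requires) should yield the enriched object from which uploadId can be read.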

https://transkribus.eu/TrpServerTesting/rest/uploads/{uploadId}

This endpoint is used to PUT the files for each page to Transkribus. Note that the path now includes the uploadId from the response to the initial request.
The Content-Type of each request must be multipart/form-data and it has to include the complete data for one page, i.e. if a pageXmlName was set in the given structure object, then the image as well as the XML have to be delivered.
The body part names to be used are img and xml respectively and both should be sent as application/octet-stream.
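The per-page request described above could be assembled roughly as follows. This sketch encodes the multipart/form-data body by hand with the standard library; encode_multipart and build_page_request are hypothetical helper names, and including a filename in each part's Content-Disposition header is an assumption not confirmed by this page. Authentication is again omitted, and the request is only constructed, not sent.

```python
import urllib.request
import uuid

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def encode_multipart(parts, boundary=None):
    """Encode parts as multipart/form-data.

    parts maps a body part name to a (filename, bytes) tuple.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, (filename, data) in parts.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n"
            ).encode("utf-8")
        )
        chunks.append(data)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_page_request(upload_id, img, xml=None):
    """Prepare (but do not send) the PUT for one page.

    img and xml are (filename, bytes) tuples; xml is only needed
    if pageXmlName was set for this page in the structure object.
    """
    parts = {"img": img}
    if xml is not None:
        parts["xml"] = xml
    body, content_type = encode_multipart(parts)
    return urllib.request.Request(
        f"{BASE_URL}/uploads/{upload_id}",
        data=body,
        headers={"Content-Type": content_type},
        method="PUT",
    )
```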
If checksums have been defined, then the server will check the files upon each request and respond with 200 only if the transmission was flawless.
Once all files have been delivered successfully, the server will automatically start the ingest process. A GET request can be used to poll the status of the upload process. Eventually it will include a field jobId that can be used to monitor the ingest process via GET requests to https://transkribus.eu/TrpServerTesting/rest/jobs/{id}.
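Polling for the jobId might look like the sketch below. It assumes that the status is returned as JSON when requested with an Accept: application/json header and that jobId appears as a top-level field; neither detail is confirmed by this page. The names fetch_status and wait_for_job_id, the polling interval and the retry limit are all illustrative.

```python
import json
import time
import urllib.request

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def fetch_status(upload_id):
    """GET the upload status and parse it as JSON (assumed format)."""
    req = urllib.request.Request(
        f"{BASE_URL}/uploads/{upload_id}",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job_id(upload_id, fetch=fetch_status, interval=5.0, max_polls=60):
    """Poll the upload status until it contains a jobId.

    fetch is injectable so the loop can be exercised without a server.
    Raises TimeoutError if no jobId shows up within max_polls attempts.
    """
    for _ in range(max_polls):
        status = fetch(upload_id)
        job_id = status.get("jobId")
        if job_id is not None:
            return job_id
        time.sleep(interval)
    raise TimeoutError(f"upload {upload_id}: no jobId after {max_polls} polls")
```

The returned ID can then be substituted for {id} in the /rest/jobs/{id} URL to monitor the ingest job.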