Upload via REST API

From Transkribus Wiki

Latest revision as of 08:35, 14 December 2018

The path /rest/uploads provides endpoints for importing a document into Transkribus.

https://transkribus.eu/TrpServerTesting/rest/uploads?collId={collectionID}

A POST request to this endpoint creates a new upload process on the server. The query parameter collId is mandatory and must contain the ID of a collection to which the user has write access.
If the Content-Type header is set to application/xml, the request body must contain a valid Transkribus METS XML (TODO).
If the header specifies application/json, a JSON object of the following form is expected:

{
    "md": {
        "title": "Bentham Box 35",
        "author": "Jeremy Bentham",
        "genre": "Notes",
        "writer": "Secretary"
    },
    "pageList": {"pages": [
        {
            "fileName": "035_320_001.jpg",
            "pageXmlName": "035_320_001.xml",
            "pageNr": 1,
            "imgChecksum": "9d531932c8e24d5a5dc13c92063698c9",
            "pageXmlChecksum": "b644a9c34a65ee07c1c576194e720b4a"
        },
        {
            "fileName": "035_321_001.jpg",
            "pageXmlName": "035_321_001.xml",
            "pageNr": 2,
            "imgChecksum": "e3ae1a862b9cd53cc87c9325d2502547",
            "pageXmlChecksum": "8ba4758b8b8d5df562e25809692be340"
        }
    ]}
}

Besides some basic (optional) metadata, this object defines the structure of the document to be uploaded, including the filenames to expect.
A page object only requires a fileName and a pageNr; all other fields are optional. If checksums are given, they must be MD5 checksums.
The response to this request is an enriched object of the same type. It includes a unique upload ID (field uploadId) that must be used in the subsequent requests.
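
The initial request described above could be assembled roughly as follows. This is a minimal sketch using only the Python standard library, not an official client. The helper names (md5_of, build_upload_request, create_upload) are made up for illustration. Authentication is not covered on this page, so any headers it requires have to be passed in by the caller, and the Accept header assumes that the server can return the enriched object as JSON.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def md5_of(path):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_upload_request(title, pages):
    """Build the structure object.

    pages is a list of (image_path, xml_path) tuples in page order;
    xml_path may be None for pages without a PAGE XML file.
    """
    page_list = []
    for page_nr, (img, xml) in enumerate(pages, start=1):
        entry = {
            "fileName": Path(img).name,
            "pageNr": page_nr,
            "imgChecksum": md5_of(img),  # optional field
        }
        if xml is not None:
            entry["pageXmlName"] = Path(xml).name
            entry["pageXmlChecksum"] = md5_of(xml)  # optional field
        page_list.append(entry)
    return {"md": {"title": title}, "pageList": {"pages": page_list}}


def create_upload(coll_id, structure, extra_headers=None):
    """POST the structure object and return the parsed response.

    extra_headers is a placeholder for whatever authentication the
    server expects; the JSON response format is an assumption.
    """
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    headers.update(extra_headers or {})
    req = urllib.request.Request(
        f"{BASE_URL}/uploads?collId={coll_id}",
        data=json.dumps(structure).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The value of the uploadId field in the object returned by create_upload would then be used for the requests that follow.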

https://transkribus.eu/TrpServerTesting/rest/uploads/{uploadId}

This endpoint is used to PUT the files for each page to Transkribus. Note that the path now includes the uploadId from the response to the initial request.
The Content-Type of each request has to be multipart/form-data, and each request must include the complete data for one page, i.e. if a pageXmlName was set in the structure object, both the image and the XML have to be delivered. Whether the Content-Type has to be set explicitly depends on the library used; please refer to its documentation on multipart requests.
The body part names to be used are img and xml respectively, and both should be sent as application/octet-stream.
If checksums have been defined, the server will check the files on each request and respond with 200 only if the transmission was flawless.
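
A per-page PUT request of this kind might look like the sketch below. It builds the multipart body by hand with the Python standard library so that the part names and content types are visible; most HTTP libraries provide a helper for this. The function names encode_multipart and put_page are hypothetical, authentication headers are left to the caller, and parsing the response as JSON is an assumption.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def encode_multipart(parts):
    """Encode {part_name: (filename, payload_bytes)} as multipart/form-data.

    Returns (body, content_type). Every part is declared as
    application/octet-stream. Filenames are assumed to contain no
    double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, (filename, payload) in parts.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks.append(header.encode("utf-8"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def put_page(upload_id, img_path, xml_path=None, extra_headers=None):
    """PUT the image (and the XML, if any) of one page.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so a
    rejected transmission surfaces as an exception.
    """
    parts = {"img": (Path(img_path).name, Path(img_path).read_bytes())}
    if xml_path is not None:
        parts["xml"] = (Path(xml_path).name, Path(xml_path).read_bytes())
    body, content_type = encode_multipart(parts)
    headers = {"Content-Type": content_type, "Accept": "application/json"}
    headers.update(extra_headers or {})  # e.g. authentication (not covered here)
    req = urllib.request.Request(
        f"{BASE_URL}/uploads/{upload_id}", data=body, headers=headers, method="PUT"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling put_page once per entry in the page list, in any order the caller finds convenient, would deliver the whole document under these assumptions.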
A GET request to this path can be used to check the status of the upload process while the upload is in progress.
Once all files have been delivered successfully, the server automatically starts the ingest process. The object returned for the last accepted PUT request includes a field jobId that can be used to monitor the ingest process via GET requests to https://transkribus.eu/TrpServerTesting/rest/jobs/{id}.
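
Monitoring the upload or the ingest job could be done with a small polling loop such as the one sketched here. The helpers get_json and poll are hypothetical names. The structure of the status and job objects is not described on this page, so the completion check is left to a caller-supplied function, and the JSON response format is again an assumption.

```python
import json
import time
import urllib.request

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def get_json(url, extra_headers=None):
    """Send a GET request and parse the response as JSON (assumed format)."""
    headers = {"Accept": "application/json"}
    headers.update(extra_headers or {})  # e.g. authentication (not covered here)
    req = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll(fetch, is_done, interval=5.0, max_attempts=60):
    """Call fetch() until is_done(result) is true and return that result.

    Raises TimeoutError if the condition is not met within max_attempts.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError("condition not met after %d attempts" % max_attempts)


# Usage sketch: job_id is the jobId field from the response to the last PUT;
# job_is_finished is a placeholder for a check on the job object, whose
# fields are not specified here.
#
# job = poll(lambda: get_json(f"{BASE_URL}/jobs/{job_id}"), job_is_finished)
```

The same poll helper could be pointed at the uploads/{uploadId} path to watch the upload status instead.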