Layout Analysis API

Starting layout analysis processes via the API can be done with POST requests to

The following query parameters are available on this endpoint:

  • collId: the ID of the collection containing the documents you want to process
  • doBlockSeg
    • true -> existing layout will be deleted
    • false (default) -> keep existing text block regions
  • doLineSeg
    • true (default) -> detect lines in text blocks
    • false -> keep existing lines
  • doPolygonToBaseline
    • true -> inspect line polygons and add baselines
    • false (default) -> keep existing baselines
  • doBaselineToPolygon
    • true -> extrapolate new line polygons from baselines
    • false (default) -> skip
  • jobImpl: the tool to use. The default (used when this parameter is omitted) is "CITlabAdvancedLaJob", which is recommended for most documents
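
As an illustration of how the parameters above combine, the following Python sketch assembles a query string using the documented defaults. The function name build_la_query and its argument names are hypothetical and not part of the API. The sketch also assumes that boolean values are sent as lowercase "true"/"false", as they are written in the list above.

```python
from urllib.parse import urlencode


def build_la_query(coll_id, do_block_seg=False, do_line_seg=True,
                   do_polygon_to_baseline=False, do_baseline_to_polygon=False,
                   job_impl=None):
    """Return a query string for a layout analysis request (hypothetical helper)."""

    def flag(value):
        # Assumption: booleans are written as lowercase "true"/"false".
        return "true" if value else "false"

    params = {
        "collId": coll_id,
        "doBlockSeg": flag(do_block_seg),
        "doLineSeg": flag(do_line_seg),
        "doPolygonToBaseline": flag(do_polygon_to_baseline),
        "doBaselineToPolygon": flag(do_baseline_to_polygon),
    }
    # Omitting jobImpl leaves the choice of tool to the server default.
    if job_impl is not None:
        params["jobImpl"] = job_impl
    return urlencode(params)
```

Calling build_la_query(42, do_block_seg=True) would request a fresh block segmentation for collection 42 and leave the remaining switches at their defaults.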

The request body specifies the pages to be processed, in terms of document IDs and page IDs. Optionally, a transcript ID (tsId) can specify a transcription version, and PAGE XML region element IDs can be passed to process specific sections of a page. The endpoint accepts JSON or XML:

   {
      "docList" : {
         "docs" : [ {
            "docId" : 1543,
            "pageList" : {
               "pages" : [ {
                  "pageId" : 1234,
                  "regionIds" : [ "the_xml_id_of_a_text_region" ]
               }, {
                  "pageId" : 12345,
                  "tsId" : 1234567
               } ]
            }
         } ]
      }
   }
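
A JSON body of this shape could be generated programmatically, for example with the Python sketch below. The helper name build_la_body and its input format are hypothetical. The sketch assumes that the docList fragment shown above is wrapped in a single top-level JSON object.

```python
import json


def build_la_body(docs):
    """Serialize a request body (hypothetical helper).

    docs maps a document ID to a list of page dicts. Each page dict needs
    "pageId" and may carry the optional "tsId" and "regionIds" fields.
    """
    doc_entries = []
    for doc_id, pages in docs.items():
        page_entries = []
        for page in pages:
            entry = {"pageId": page["pageId"]}
            # Copy the optional fields only when the caller supplied them.
            for key in ("tsId", "regionIds"):
                if key in page:
                    entry[key] = page[key]
            page_entries.append(entry)
        doc_entries.append({"docId": doc_id, "pageList": {"pages": page_entries}})
    # Assumption: "docList" is the only member of the top-level object.
    return json.dumps({"docList": {"docs": doc_entries}})
```

Keeping the optional fields out of the output unless they are set avoids sending empty regionIds lists or null transcript IDs, whose handling by the server is not described here.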

Equivalent XML representation:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

If successful (HTTP status code 200), the response will contain a job status object with a jobId that can be used to monitor the progress (see Job API).
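
The following Python sketch shows one way a client might send the request and read the jobId from the response, using only the standard library. Several points are assumptions not confirmed by this page: the endpoint URL must be supplied by the caller, the job status object is assumed to be returned as JSON with a top-level jobId field, and authentication is left out entirely. All function names are hypothetical.

```python
import json
import urllib.request


def build_la_request(endpoint_url, query_string, json_body):
    """Prepare, but do not send, the POST request (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{endpoint_url}?{query_string}",
        data=json_body.encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def extract_job_id(response_text):
    """Read jobId from a job status object, assuming a JSON response."""
    status = json.loads(response_text)
    return status["jobId"]


def submit_la_job(request):
    """Send the request and return the jobId of the started job."""
    # urlopen raises urllib.error.HTTPError for error status codes,
    # so reaching the return statement implies a successful response.
    with urllib.request.urlopen(request) as response:
        return extract_job_id(response.read().decode("utf-8"))
```

The returned jobId is the value to pass on when polling the job status.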