HTR

From Transkribus Wiki
Revision as of 14:19, 19 December 2016 by Philip Kahle (Talk | contribs)

Jump to: navigation, search

Training

For training a new HTR model using the new API (for RNN HTR), at first a configuration XML has to be created. Besides parameters (the example below includes the default values) mandatory fields are:

  • a model name
  • a description
  • the language
  • the collection ID where the input documents can be found and where the resulting model will be linked

The input for training is described in the TrainList section of the XML and is made up of train elements where each includes:

  • the document ID
  • a list of pages where each page includes
    • the page-ID
    • the ID of the transcript version that should be used for training

Optionally a test set can be specified in the TestList element analogously.

The training descriptor then should look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<citLabHtrTrainConfig>
    <modelName>Test Model</modelName>
    <description>A description</description>
    <language>German</language>
    <colId>2</colId>
    <numEpochs>200</numEpochs>
    <learningRate>2e-3</learningRate>
    <noise>both</noise>
    <trainSizePerEpoch>1000</trainSizePerEpoch>
    <trainList>
        <train>
            <docId>1</docId>
            <pageList>
                <pages>
                    <pageId>1</pageId>
                    <tsId>1</tsId>
                </pages>
                <pages>
                    <pageId>2</pageId>
                    <tsId>2</tsId>
                </pages>
            </pageList>
        </train>
        <train>
            <docId>2</docId>
            <pageList>
                <pages>
                    <pageId>3</pageId>
                    <tsId>3</tsId>
                </pages>
                <pages>
                    <pageId>4</pageId>
                    <tsId>4</tsId>
                </pages>
            </pageList>
        </train>
    </trainList>
    <testList/>
</citLabHtrTrainConfig>

That XML is then send via POST to

https://transkribus.eu/TrpServer/rest/recognition/htrTrainingCITlab

and the call returns the job-ID of the training.

Note, that the models are now linked to the collection they were started in (cf. colId element in training descriptor XML).

List HTR models

GET request to: https://transkribus.eu/TrpServer/rest/recognition/{collection-ID}/list?prov={techProvider}

The call includes:

  • Path parameter: collection-ID
  • Query parameter: the tech provider. Here at the moment only "CITlab" is allowed as value.