Difference between revisions of "HTR"

From Transkribus Wiki

Revision as of 14:50, 19 December 2016

Training

To train a new HTR model with the new API (for RNN HTR), first create a configuration XML. Besides the training parameters (the example below shows their default values), the mandatory fields are:

  • a model name
  • a description
  • the language
  • the collection ID where the input documents can be found and where the resulting model will be linked

The input for training is described in the trainList section of the XML. It is made up of train elements, each of which includes:

  • the document ID
  • a list of pages where each page includes
    • the page-ID
    • the ID of the transcript version that should be used for training

Optionally, a test set can be specified analogously in the testList element.

The training descriptor should then look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<citLabHtrTrainConfig>
    <modelName>Test Model</modelName>
    <description>A description</description>
    <language>German</language>
    <colId>2</colId>
    <numEpochs>200</numEpochs>
    <learningRate>2e-3</learningRate>
    <noise>both</noise>
    <trainSizePerEpoch>1000</trainSizePerEpoch>
    <trainList>
        <train>
            <docId>1</docId>
            <pageList>
                <pages>
                    <pageId>1</pageId>
                    <tsId>1</tsId>
                </pages>
                <pages>
                    <pageId>2</pageId>
                    <tsId>2</tsId>
                </pages>
            </pageList>
        </train>
        <train>
            <docId>2</docId>
            <pageList>
                <pages>
                    <pageId>3</pageId>
                    <tsId>3</tsId>
                </pages>
                <pages>
                    <pageId>4</pageId>
                    <tsId>4</tsId>
                </pages>
            </pageList>
        </train>
    </trainList>
    <testList/>
</citLabHtrTrainConfig>
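
For larger training sets the descriptor is tedious to write by hand. The following is a minimal sketch that generates the same structure with Python's standard library. The function name build_train_config and its arguments are hypothetical; only the element names and default values are taken from the example above. Whether the server depends on the element order is not stated here, so the sketch simply keeps the order of the example.

```python
import xml.etree.ElementTree as ET


def build_train_config(model_name, description, language, col_id, train_docs,
                       num_epochs=200, learning_rate="2e-3", noise="both",
                       train_size_per_epoch=1000):
    """Build a citLabHtrTrainConfig document as a string.

    train_docs maps a document ID to a list of (page ID, transcript ID) pairs.
    The defaults mirror the values shown in the example descriptor.
    """
    root = ET.Element("citLabHtrTrainConfig")
    fields = [
        ("modelName", model_name),
        ("description", description),
        ("language", language),
        ("colId", col_id),
        ("numEpochs", num_epochs),
        ("learningRate", learning_rate),
        ("noise", noise),
        ("trainSizePerEpoch", train_size_per_epoch),
    ]
    for tag, value in fields:
        ET.SubElement(root, tag).text = str(value)

    train_list = ET.SubElement(root, "trainList")
    for doc_id, pages in train_docs.items():
        train = ET.SubElement(train_list, "train")
        ET.SubElement(train, "docId").text = str(doc_id)
        page_list = ET.SubElement(train, "pageList")
        for page_id, ts_id in pages:
            page = ET.SubElement(page_list, "pages")
            ET.SubElement(page, "pageId").text = str(page_id)
            ET.SubElement(page, "tsId").text = str(ts_id)

    # Empty testList element: no test set is specified.
    ET.SubElement(root, "testList")
    return ET.tostring(root, encoding="unicode")


# Reproduces the train list of the example descriptor.
config_xml = build_train_config(
    "Test Model", "A description", "German", 2,
    {1: [(1, 1), (2, 2)], 2: [(3, 3), (4, 4)]},
)
```

Note that ET.tostring with encoding="unicode" does not emit the XML declaration line; prepend it if it is needed.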

That XML is then sent via POST to

https://transkribus.eu/TrpServer/rest/recognition/htrTrainingCITlab

and the call returns the job-ID of the training.
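
As a rough sketch, the training call could be issued from Python as follows. The Content-Type header and the extra_headers argument are assumptions: this page does not describe how requests are authenticated or which content type the server expects. The helper names are hypothetical as well.

```python
import urllib.request

TRAIN_URL = "https://transkribus.eu/TrpServer/rest/recognition/htrTrainingCITlab"


def make_training_request(config_xml, extra_headers=None):
    """Prepare the POST request that carries the training descriptor."""
    # Assumption: the descriptor is sent as an XML request body.
    headers = {"Content-Type": "application/xml"}
    # Assumption: authentication data (if required) is supplied by the caller.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        TRAIN_URL,
        data=config_xml.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def start_training(config_xml, extra_headers=None):
    """Send the request and return the response body (the job-ID) as text."""
    request = make_training_request(config_xml, extra_headers)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

Calling start_training with the descriptor string returns the job-ID as text, provided the server accepts the request.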

Note that models are now linked to the collection in which their training was started (cf. the colId element in the training descriptor XML).

Manage HTR models

Listing available models

Listing models can be done with a GET request to: https://transkribus.eu/TrpServer/rest/recognition/{collection-ID}/list?prov={techProvider}

The call includes:

  • Path parameter: collection-ID
  • Query parameter: the tech provider. Currently, "CITlab" is the only allowed value.
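
A minimal sketch of how this call could be assembled in Python is given below. The function names are hypothetical, and authentication is not covered on this page, so the fetch helper takes optional headers from the caller as an assumption.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://transkribus.eu/TrpServer/rest/recognition"


def list_models_url(collection_id, provider="CITlab"):
    """Build the URL that lists the models linked to a collection."""
    path = quote(str(collection_id), safe="")
    query = urlencode({"prov": provider})
    return f"{BASE_URL}/{path}/list?{query}"


def fetch_model_list(collection_id, provider="CITlab", headers=None):
    """Send the GET request and return the raw response body as text."""
    # Assumption: any authentication headers are supplied by the caller.
    request = urllib.request.Request(
        list_models_url(collection_id, provider),
        headers=headers or {},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```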

A model is described in the result e.g. by the following XML:

...
<trpHtr>
        <htrId>22</htrId>
        <name>Test Model</name>
        <description>A description</description>
        <provider>CITlab</provider>
        <created>
            <nanos>338000000</nanos>
        </created>
        <gtDocId>1614</gtDocId>
        <testGtDocId>1615</testGtDocId>
        <language>German</language>
        <trainJobId>3160</trainJobId>
        <cerString>1,000000</cerString>
        <charList> =1
,=2
.=3
...
</charList>
</trpHtr>
...

Note the htrId element: its value is needed when applying the model.

The cerString contains the CER (character error rate) values determined during training against the train set or the (optional) test set.

The charList contains the character-to-channel mapping and shows which characters are known to the model.
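
To illustrate how such a model description could be read, here is a sketch that extracts the htrId and the character mapping. It assumes that every charList line has the form character=channel, as in the excerpt; the SAMPLE string is a shortened copy of that excerpt, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the excerpt, without the elided parts.
SAMPLE = """<trpHtr>
    <htrId>22</htrId>
    <name>Test Model</name>
    <provider>CITlab</provider>
    <language>German</language>
    <charList> =1
,=2
.=3
</charList>
</trpHtr>"""


def parse_char_list(text):
    """Turn 'character=channel' lines into a dict of character -> channel."""
    mapping = {}
    for line in text.split("\n"):
        if "=" not in line:
            continue  # skips empty lines and elisions such as "..."
        # Split on the last "=" so that "=" itself can be a mapped character.
        char, channel = line.rsplit("=", 1)
        if channel.strip().isdigit():
            mapping[char] = int(channel)
    return mapping


def summarize_model(htr_element):
    """Collect the fields that are needed to pick and apply a model."""
    return {
        "htrId": int(htr_element.findtext("htrId")),
        "name": htr_element.findtext("name"),
        "language": htr_element.findtext("language"),
        "chars": parse_char_list(htr_element.findtext("charList") or ""),
    }


model = summarize_model(ET.fromstring(SAMPLE))
```

Lines are deliberately not stripped before splitting, because the space character is itself an entry in the mapping.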

Adding models to other collections

To make a trained model available in another collection, send a POST request to:

https://transkribus.eu/TrpServer/rest/recognition/{collection-ID}/{htr-ID}/add?collId={destination collection-ID}

Removing models from collections

To remove a model from a collection, send a DELETE request to:

https://transkribus.eu/TrpServer/rest/recognition/{collection-ID}/{htr-ID}/remove

Note that a model cannot be removed from all collections.
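
The two management calls (adding a model to a collection and removing it from one) could be prepared in Python roughly as follows. This is only a sketch: the function names are hypothetical, and the requests are built without authentication data, which this page does not describe.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://transkribus.eu/TrpServer/rest/recognition"


def add_model_request(collection_id, htr_id, dest_collection_id):
    """POST request that links a model to a further collection."""
    # The query parameter is spelled collId, as in the URL given on this page.
    query = urlencode({"collId": dest_collection_id})
    url = f"{BASE_URL}/{collection_id}/{htr_id}/add?{query}"
    return urllib.request.Request(url, method="POST")


def remove_model_request(collection_id, htr_id):
    """DELETE request that unlinks a model from a collection."""
    url = f"{BASE_URL}/{collection_id}/{htr_id}/remove"
    return urllib.request.Request(url, method="DELETE")
```

Either request object can then be passed to urllib.request.urlopen to execute the call.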

Testing the model

To apply an HTR model, send a POST request to:

https://transkribus.eu/TrpServer/rest/recognition/{collection-ID}/{htr-ID}/htrCITlab?id={doc-ID}&pages={page-string}&dict={dictionary filename}
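
A sketch of how the recognition request could be built in Python is shown below. The function name is hypothetical. The format of the page string and the dictionary filename are not specified on this page, so the values "1-3" and "german.dict" in the usage line are placeholders, not known valid inputs.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://transkribus.eu/TrpServer/rest/recognition"


def apply_model_request(collection_id, htr_id, doc_id, pages, dictionary):
    """POST request that applies an HTR model to pages of a document."""
    # urlencode escapes characters that are not allowed in a query string.
    query = urlencode({"id": doc_id, "pages": pages, "dict": dictionary})
    url = f"{BASE_URL}/{collection_id}/{htr_id}/htrCITlab?{query}"
    return urllib.request.Request(url, method="POST")


# Placeholder values: the page string and dictionary name are assumptions.
request = apply_model_request(2, 22, 1, "1-3", "german.dict")
```

The htr_id argument is the htrId value reported when listing the models of a collection.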