The Text2Image (Text-to-Image) tool tries to match existing page-level transcriptions to a line segmentation.
Existing transcriptions can currently be supplied when the document is uploaded, as separate txt files
in a subfolder called 'txt'. Note that each txt file must have the same basename as the corresponding image file.
Those transcriptions are then stored in 'dummy lines', i.e. lines with the size of the image.
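Before uploading, it can help to check that every image has a transcription with the same basename. The following is only a minimal sketch in Python and is not part of the tool: the function name find_unmatched and the list of image extensions are assumptions made for illustration, and the document folder is assumed to hold the images directly next to the 'txt' subfolder.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to the formats you actually upload.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_unmatched(doc_dir):
    """Compare image basenames with the basenames of the files in 'txt'.

    Returns a pair of sorted lists:
    (images without a txt file, txt files without an image).
    """
    doc_dir = Path(doc_dir)
    images = {
        p.stem
        for p in doc_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
    txt_dir = doc_dir / "txt"
    # A missing 'txt' subfolder is treated as "no transcriptions at all".
    texts = {p.stem for p in txt_dir.glob("*.txt")} if txt_dir.is_dir() else set()
    return sorted(images - texts), sorted(texts - images)
```

Calling find_unmatched("my_document") returns two lists of basenames; if both are empty, every image has a matching txt file and every txt file has a matching image.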
The Text2Image tool first (optionally) starts a new layout analysis on the specified pages, then tries to match the given text to the line segmentation of each page.
The user can specify whether line breaks in the given text are respected during matching.
The threshold value determines which matching quality is good enough to perform a match. Values between 0.02 and 0.05 work well, but the default value of 0.0 is also a good choice, since erroneous matches can be corrected afterwards anyway.
The Text2Image tool was implemented by the CITlab group from Rostock, see also https://github.com/CITlabRostock/CITlabModule