Text2Image
Revision as of 07:38, 20 May 2019
The Text2Image (Text-to-Image) tool tries to match existing page-level transcriptions to a line segmentation.
Existing transcriptions can currently be specified during upload of the document as separate txt files
in a subfolder called 'txt'. Note that each txt file has to have the same basename as the corresponding image file.
Those transcriptions are then stored in 'dummy lines', i.e. lines that take up the whole image.
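Before uploading, it can help to verify that every image has a transcription with the same basename. The following is a minimal sketch, not part of the tool itself: the function name `check_transcriptions` and the list of image extensions are assumptions made for illustration, and the only facts taken from the description above are the 'txt' subfolder and the matching basenames.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats actually uploaded.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_transcriptions(doc_dir):
    """Compare image basenames with the txt files in the 'txt' subfolder.

    Returns a tuple (images_without_txt, txt_without_image), each a sorted
    list of basenames. Hypothetical helper, not part of Text2Image.
    """
    doc_dir = Path(doc_dir)
    # Basenames of all image files directly inside the document folder.
    images = {
        p.stem
        for p in doc_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
    # Basenames of all transcription files inside the 'txt' subfolder.
    txt_dir = doc_dir / "txt"
    texts = {p.stem for p in txt_dir.glob("*.txt")} if txt_dir.is_dir() else set()
    return sorted(images - texts), sorted(texts - images)
```

If both returned lists are empty, each image has exactly one transcription file with the same basename.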
The Text2Image tool first (optionally) starts a new layout analysis on the specified pages, then tries to match the given text to
this layout.
The user can specify whether line breaks in the given text are respected during matching.
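To illustrate what this option could mean for the input text, here is a small sketch. It is an assumption about how the text might be prepared, not a description of the tool's internals: the function `split_transcription` and its parameter `respect_linebreaks` are hypothetical names.

```python
def split_transcription(page_text, respect_linebreaks=True):
    """Prepare a page transcription for matching (illustrative only).

    With respect_linebreaks=True, every non-empty text line becomes one
    unit to be matched against one detected line. Otherwise the text is
    collapsed into a single stream that a matcher would be free to split.
    """
    if respect_linebreaks:
        # Keep the author's line structure, dropping blank lines.
        return [line.strip() for line in page_text.splitlines() if line.strip()]
    # Ignore line breaks: normalize all whitespace to single spaces.
    return [" ".join(page_text.split())]
```

Respecting line breaks is the natural choice when the transcription was typed line by line from the image; ignoring them suits transcriptions that were reflowed into running text.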
The threshold value determines how good a match must be to be accepted. Values between 0.02 and 0.05 work well, but the default value of 0.0 is also a reasonable choice, since erroneous matches can be corrected afterwards anyway.
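The effect of the threshold can be pictured as a simple filter over candidate matches. This is a sketch under stated assumptions: the `Candidate` type, the `accept_matches` function, and the convention that a higher score means a better match are all hypothetical, and the scores in the usage note are made up.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    line_id: str   # identifier of a detected line (hypothetical)
    text: str      # transcription text proposed for that line
    score: float   # assumed: higher means a more confident match


def accept_matches(candidates: List[Candidate], threshold: float = 0.0) -> List[Candidate]:
    """Keep only candidates whose score reaches the threshold.

    With the default of 0.0 every candidate with a non-negative score is
    kept, leaving wrong matches to be corrected manually afterwards.
    """
    return [c for c in candidates if c.score >= threshold]
```

For example, given candidates with scores 0.01, 0.03 and 0.2, a threshold of 0.0 keeps all three, while a threshold of 0.02 drops the first one.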