Difference between revisions of "Text2Image"

From Transkribus Wiki

Revision as of 07:38, 20 May 2019

The Text2Image (Text-to-Image) tool tries to match existing page-level transcriptions to a line segmentation.

Existing transcriptions can currently be supplied when the document is uploaded, as separate txt files in a subfolder called 'txt'. Note that each txt file must have the same basename as the corresponding image file.
Those transcriptions are then stored in "dummy lines", i.e. lines that take up the whole image.
The Text2Image tool first (optionally) starts a new layout analysis on the specified pages, then tries to match the given text to this layout.
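
The basename rule above can be checked locally before upload. The following is a minimal sketch, not part of Transkribus: the function name find_unmatched and the list of image extensions are assumptions made for illustration, and it assumes the images sit directly in the document folder next to the 'txt' subfolder.

```python
from pathlib import Path

# Assumption: image extensions used in the document folder; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_unmatched(doc_dir):
    """Compare image basenames with the basenames of the files in 'txt'.

    Returns two sorted lists of basenames:
    images that have no txt file, and txt files that have no image.
    """
    doc_dir = Path(doc_dir)
    # Basenames (file names without extension) of all images in the folder.
    images = {
        p.stem
        for p in doc_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
    # Basenames of all transcription files in the 'txt' subfolder, if present.
    txt_dir = doc_dir / "txt"
    texts = {p.stem for p in txt_dir.glob("*.txt")} if txt_dir.is_dir() else set()
    return sorted(images - texts), sorted(texts - images)
```

Calling find_unmatched on the document folder and getting two empty lists back means that every image has a transcription with the same basename and vice versa.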

The user can specify whether the line breaks of the given text are respected during matching.

The threshold value determines how good the matching quality must be for a match to be performed. A good value is between 0.02 and 0.05, but the default value of 0.0 is also a good choice, as erroneous matches can be corrected afterwards anyway.