Questions and Answers

From Transkribus Wiki
Revision as of 14:27, 13 August 2015 by Guenter (Talk | contribs)

I am new to Transkribus. Where shall I start?

Assuming that you have installed the Transkribus Expert client, we recommend going to the Transkribus Cloud collection, opening a document, and then looking at all the information provided in the canvas, the text editor, and the tabs connected to this document. Since Transkribus is a working tool and is not made for presenting documents, you may also want to look at the different export versions of the documents. To speed up this process, you can download a full zip file containing all currently supported file formats from here.

I have already transcribed documents, or there are old editions which I would like to use as the starting point for my transcription. Is there a mechanism to match them automatically with the image?

Not yet: currently no matching algorithm is part of the Transkribus service portfolio, so you need to segment the image and then copy and paste the text into the lines of the text editor. Methods and tools for matching existing text with an image do exist, however, and they work well, especially if a trained HTR model already exists for the specific handwriting. Colleagues are working on this issue, and we hope to be able to provide a dedicated tool for this task at some point. If you have larger amounts of text to be matched, please contact us directly.
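To illustrate the general idea of matching existing text with HTR output, here is a minimal sketch in Python. It is not the method being developed for Transkribus; it simply assumes that you already have a rough HTR result for each line and an existing edition split into lines, and it pairs them by string similarity. The function names, the sample lines, and the threshold of 0.6 are hypothetical and chosen for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_lines(htr_lines, edition_lines, threshold=0.6):
    """Pair each recognised line with the most similar edition line.

    Returns a list of (htr_index, edition_index_or_None, score).
    The edition index is None when no candidate reaches the threshold
    (an assumed cut-off, to be tuned on real data).
    """
    matches = []
    for i, htr in enumerate(htr_lines):
        best_j, best_score = None, 0.0
        for j, edition in enumerate(edition_lines):
            score = similarity(htr, edition)
            if score > best_score:
                best_j, best_score = j, score
        if best_score < threshold:
            best_j = None
        matches.append((i, best_j, best_score))
    return matches


# Hypothetical sample data: noisy HTR output and a normalised edition.
htr_lines = ["Jn dem namen gotes amen", "wir Heinrich von gotz gnaden"]
edition_lines = ["Wir Heinrich von Gottes Gnaden", "In dem Namen Gottes Amen"]

for htr_index, edition_index, score in match_lines(htr_lines, edition_lines):
    print(htr_index, edition_index, round(score, 2))
```

A real matching tool would also have to cope with differing line breaks between edition and image, which this line-by-line comparison does not attempt.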

I am a historian and not so much interested in a perfect diplomatic transcription which meets all the requirements of a critical edition, but I am interested in a reasonably good standard transcription. How can I generate such a transcription in the easiest way?

  • Do not worry too much about individual documents, but work with larger units such as batches of documents or boxes.
  • Name them according to the convention/signatures of the archive where your images come from. In this way you can later on easily link your documents/batches with the metadata of the archive.
  • Transcribe diplomatically, but do not use too many special and historic characters, and do not mark up conventional abbreviations, etc.
  • Tag persons, locations and dates if they are important for you. Normalize them according to your needs.
  • Do not transcribe the full text, but just those sections which are interesting for your work. One of the advantages of Transkribus is that your text portions still remain linked with the image.
  • Share your documents with members of your working group and distribute tasks.

The main advantage of this approach is that, regardless of where on a page your transcribed text is and how extensive it is, it can be used to train an HTR model specifically for your type of documents. Once such a model exists, it can be applied to the rest of the pages, and you may be able to search the remaining pages without ever having read them.

What is the difference between OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition)?

  • Both technologies are very similar, but OCR has already reached industrial maturity, whereas HTR is still in an early phase.
  • Nevertheless, the same technology which is used for HTR can also be exploited for recognising printed documents. This means that old books with rare typefaces can be recognised with HTR models.