Questions and Answers
- 1 Getting help
- 2 Registration
- 3 Download and Installation
- 4 Handwritten Text Recognition (HTR) Workflow
- 4.1 What is needed for the HTR to work?
- 4.2 How do I create training data in Transkribus?
- 4.3 How much training data do I need to create?
- 4.4 Once I have created a set of training data, what happens next?
- 4.5 Once I have a HTR model for my collection, how do I use it?
- 4.6 How accurate is the HTR?
- 4.7 What is the difference between OCR (Optical Character Recognition) and HTR?
- 4.8 Can HTR work with documents in any language/layout/style?
- 4.9 Are documents in Transkribus publicly accessible?
- 4.10 Is there a mechanism to use existing transcripts to train the HTR?
- 4.11 How does Transkribus benefit from usage?
I am new to Transkribus. Where shall I start?
How can I contact the Transkribus team?
You can contact us with any questions or comments at email@example.com.
How do I register for a Transkribus account?
You can sign up for a Transkribus account at the Transkribus website.
I have forgotten my Transkribus password, what should I do?
Visit the Transkribus website, click 'Login' and then 'Forgot password?'
Download and Installation
How to download and install
- Go to the Download and Installation page for basic instructions
Run Transkribus via command line
- Transkribus is contained in the main jar file Transkribus-<version>.jar
- To run the program from command line type: java -jar Transkribus-<version>.jar
- Note: Java 8 is needed to run the program. Either install Java 8 system-wide or copy a JRE into the program directory.
- Note: For any version before 0.6.8, you may have to make the scripts executable from the command line before you can run them on Mac (or Linux):
- Mac console basics
- change into the program folder using 'cd' commands
- chmod +x Transkribus.command (or chmod +x Transkribus.sh for Linux!)
- Furthermore you will find several files in the Transkribus package copied to your computer:
- config.properties can be modified to adjust simple appearance properties
- virtualKeyboards.xml can be used to specify a set of virtual keyboards
- logback.xml can be modified to adjust logging properties (for expert users only)
- The 'libs' subfolder contains the necessary libraries for all platforms. Currently supported are:
- Windows 32/64 bit
- Linux 32/64 bit
- OSX 64 bit
Using a proxy server
- When the program has started, click on the home menu button on the top left and select "Proxy settings...". In the following dialog you can set the proxy host, port, user name (optional) and password (optional). This is the recommended method for using a proxy server.
- Alternatively, you can edit the start script (e.g. Transkribus.bat on Windows, Transkribus.sh on Linux) to pass the proxy settings to Java as system properties (-D options):
java -Dhttps.proxyHost=<proxyserver> -Dhttps.proxyPort=<proxyPort> -Dhttps.proxyUser=<user name for proxy> -Dhttps.proxyPassword=<password for proxy> -jar Transkribus-0.7.0.jar
However, you will have to edit this file again after each update of Transkribus.
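As a rough illustration of the start-script approach, the following Python sketch assembles the same java command from its parts. The function name build_launch_command and the host and port values are hypothetical and only serve the example; the -D option names are the ones shown in the command above.

```python
from typing import List, Optional


def build_launch_command(
    jar_path: str,
    proxy_host: str,
    proxy_port: int,
    proxy_user: Optional[str] = None,
    proxy_password: Optional[str] = None,
) -> List[str]:
    """Return the argument list for starting Transkribus behind a proxy."""
    command = [
        "java",
        f"-Dhttps.proxyHost={proxy_host}",
        f"-Dhttps.proxyPort={proxy_port}",
    ]
    # User name and password are optional, as in the proxy settings dialog.
    if proxy_user is not None:
        command.append(f"-Dhttps.proxyUser={proxy_user}")
    if proxy_password is not None:
        command.append(f"-Dhttps.proxyPassword={proxy_password}")
    # The -D options must come before the jar file name; anything after it
    # is passed to the program as an ordinary argument.
    command += ["-jar", jar_path]
    return command


# Hypothetical values, for illustration only.
example = build_launch_command("Transkribus-0.7.0.jar", "proxy.example.org", 8080)
print(" ".join(example))
```

The printed line can be pasted into the start script in place of the original java call. Keeping the options in a list also makes it easy to leave out the user name and password when the proxy does not need them.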
Logging in to the server is not possible via Transkribus, but it works on the website.
- Since February 2017, old Transkribus versions are blocked from logging in for compatibility reasons. If you use a version older than 1.0.0 an update of the application may be necessary to solve this issue.
- There is a known issue with specific versions of Java 7 (e.g. Java 7u25). You can check your installed version by opening a terminal/command line and entering "java -version". If you encounter this problem, try updating Java on your machine.
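If you prefer to check the Java version from a script rather than by reading the terminal output, a minimal Python sketch could look like the following. The function names are hypothetical, and the parsing assumes the usual 'version "..."' line that "java -version" prints.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from the text printed by 'java -version'."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Up to Java 8 the version string starts with "1.", e.g. 1.7.0_25 is Java 7.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Run 'java -version' and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        # No java executable on the command line path.
        return None
    # 'java -version' writes its report to stderr rather than stdout.
    return parse_java_major(result.stderr + result.stdout)
```

Calling installed_java_major() returns the major version of the default Java on the command line, for example 7 for Java 7u25. A result of None means that no Java was found or that the output was not recognised.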
Logging in is prevented by the Firewall of your Internet Provider
- Some IT departments block the SSL port 443 and/or unknown applications via a firewall. Check with your IT department whether that might be the case.
Norton Antivirus detects a threat and is blocking the zip file from being unpacked.
- Solution: This is a false alarm (WS.Reputation.1) which Norton raises when it encounters software it is not familiar with. You should be able to restore the file from quarantine by following the instructions from the following resource.
Versions 0.6.5 and older cannot update (very long error message):
- Please click on the "Home" button (upper left corner), then "Install a specific version", select the newest version from "Releases" and tick the box beside "Download complete package".
- Afterwards click on "Update" or "Replace". This way, the complete package is downloaded and the update should work.
Wrong JAVA Version on Mac
- After opening the command file on the Mac, Transkribus reports that the wrong Java version is installed instead of 1.7, even though the most current version of the Java RE is installed.
- The problem is that an older Java version is still the default on the command line, which Transkribus.command uses. You can check the default version by opening the terminal and typing 'java -version'.
- To solve the problem you can either download the latest JDK as a .tar.gz package and unpack it into the Transkribus folder; the Transkribus.command file automatically checks for Java installations in its subdirectories.
- Or you can make your Java 8 installation the default one on the command line.
Transkribus does not start on (Fedora) Linux - 'MOZILLA_FIVE_HOME not set' error message
- The package "libwebkitgtk" may not be installed. On Fedora you can install the package using dnf on the command line (use "yum" instead of "dnf" in older versions of Fedora):
sudo dnf install webkitgtk
Handwritten Text Recognition (HTR) Workflow
What is needed for the HTR to work?
HTR engines cannot process text straight away - they need to be trained to recognise a certain style of handwriting. This can be achieved by creating at least 100 pages (20,000 words) of training data (images and transcripts) in Transkribus.
How do I create training data in Transkribus?
Firstly, you need to upload your documents to the platform. Secondly, you need to segment the pages of your collection into text regions and baselines. Thirdly, you need to transcribe each page as accurately as possible. For more information on these stages, have a look at our How to Guides.
How much training data do I need to create?
The more training data, the better! But you can start to train the HTR with as little as 100 pages (20,000 words) of documents written in a neat hand.
Once I have created a set of training data, what happens next?
You should then contact the Transkribus team by email (firstname.lastname@example.org). They can activate the training button in Transkribus for you. This way you can create a HTR model which is specific to the collection of documents that you have been working with in Transkribus. This process should take a few weeks.
Once I have a HTR model for my collection, how do I use it?
You can use your HTR model to automatically generate transcripts of your documents by clicking the "Run text recognition" button in the "Tools" tab in Transkribus. You can export your documents and search them in Transkribus by clicking the "Search" button in the Main menu. You can now also search your documents using our new Keyword Spotting tool.
How accurate is the HTR?
HTR is not perfectly accurate, but impressive Word and Character Error Rates are possible. The latest experiments have generated transcripts with a Character Error Rate of around 5%, which means that roughly 95% of the characters in an automatically generated transcript are correct. For some successful examples of HTR, have a look at our Example Documents or our Success Stories from the READ project blog. You can measure the accuracy of your HTR model in Transkribus using the 'Compare' function in the 'Tools' tab.
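To make the error rates more concrete, the following Python sketch shows how Character and Word Error Rates are commonly computed: the edit (Levenshtein) distance between a correct reference transcript and the automatic transcript, divided by the length of the reference. The function names and sample sentences are hypothetical, and this is not necessarily the exact calculation performed by the 'Compare' function.

```python
from typing import Sequence


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions needed
    to turn the hypothesis into the reference (Levenshtein distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance on characters, relative to the reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance on whitespace-separated words, relative to the reference."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


# Hypothetical example: one wrong character in a 20-character line.
correct = "the quick brown fox!"
recognised = "the quick brovn fox!"
print(character_error_rate(correct, recognised))  # 1 error / 20 characters
print(word_error_rate(correct, recognised))       # 1 wrong word / 4 words
```

Note that a single wrong character makes the whole word count as wrong, so the Word Error Rate is usually noticeably higher than the Character Error Rate for the same transcript.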
What is the difference between OCR (Optical Character Recognition) and HTR?
Both technologies are very similar, but OCR is already in an advanced state, whereas HTR is still in an early phase. Unlike OCR, HTR does not focus on individual letters. Instead, it scans and processes the image of entire lines and tries to decode this data. The main difference from the user's point of view is that the stage of Layout Analysis/Segmentation is integrated into the OCR engine, whereas it is a separate step in the workflow for HTR.
Can HTR work with documents in any language/layout/style?
In theory, yes! The software needs to be trained to understand each style of handwriting. Every piece of training data submitted to Transkribus is helping to strengthen the overall accuracy of the HTR.
Are documents in Transkribus publicly accessible?
No! Documents uploaded to Transkribus are private by default. You can use the "Manage collections..." button in the "Server" tab of Transkribus to allow specific users to view and/or edit your collection if you wish.
Is there a mechanism to use existing transcripts to train the HTR?
Yes, we now have a Text2Image matching tool that can match existing text with an image. If you have lots of existing transcripts and would like to use these to train a HTR model, please contact us.
How does Transkribus benefit from usage?
Our long-term goal is to train so many different writing styles that Transkribus will be able to deal with most handwritten documents without prior training. The more users work with Transkribus for their transcription, the faster we will reach this ambitious goal!