Integrated OCR Languages:
The following OCR languages can be downloaded and installed from within PDF Studio.
The five most common languages:
- English
- French
- German
- Italian
- Spanish
Other languages available:
- Danish
- Dutch
- Finnish
- Norwegian
- Polish
- Portuguese
- Swedish
- Non-Latin Languages (including CJK)
For PDF Studio 10 and below:
For advanced users only: in addition to the languages above, it is possible to install other languages, as long as they are Western languages supported by PDF WinAnsiEncoding. Not all languages will work.
To request an additional language, contact us at studiosupport@qoppa.com and we will send you a download link.
For instance, to add the Icelandic language file, follow the steps below:
- Download the language file(s) from the links provided via email.
- Locate the directory called tess/tessdata on your machine:
  - In PDF Studio 9 and above, it is located in your user folder, inside the “.pdfstudioX” folder (where X is the version number).
  - In PDF Studio 8, it is located under the installation directory, for instance C:\ProgramData\PDFStudio8 on Windows Vista, Windows 7, Windows 8, and Windows 10.
- Extract the files contained in the gz file and copy them into the tessdata directory:
  - For Icelandic: isl.traineddata
- Open the file called languages.xml, located in the same directory, in a text editor and add the corresponding entry for your new language file:
  - For Icelandic:
<language name="isl" desc="Icelandic" file="tesseract-ocr-3.02.isl.tar.gz" />
  - You can ignore any other tags present in other language entries (such as URLPrimary); they are not needed.
- Exit and restart PDF Studio.
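For users who prefer to script the steps above, the following is a minimal sketch in Python rather than an official Qoppa tool. The function name install_language is hypothetical, and the sketch assumes that the downloaded archive is a .tar.gz named like the "file" value in the entry, and that languages.xml holds its <language> entries directly under the root element. Check both against your own installation before relying on it.

```python
import shutil
import tarfile
import xml.etree.ElementTree as ET
from pathlib import Path


def install_language(archive: Path, tessdata: Path, code: str, desc: str) -> None:
    """Copy *.traineddata from the archive into tessdata and register the language.

    Hypothetical helper: it mirrors the manual steps, nothing more.
    """
    # Extract only the trained-data file(s), ignoring any folders inside the archive.
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".traineddata"):
                source = tar.extractfile(member)
                target = tessdata / Path(member.name).name
                target.write_bytes(source.read())

    # Add a <language> entry unless one with this name is already present.
    # Assumption: entries are children of the root element of languages.xml.
    xml_path = tessdata / "languages.xml"
    tree = ET.parse(xml_path)
    root = tree.getroot()
    if not any(el.get("name") == code for el in root.iter("language")):
        # ElementTree drops comments and original formatting, so keep a backup.
        shutil.copy2(xml_path, xml_path.with_suffix(".xml.bak"))
        ET.SubElement(root, "language", name=code, desc=desc, file=archive.name)
        tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```

For the Icelandic example, a call might look like install_language(Path("tesseract-ocr-3.02.isl.tar.gz"), tessdata_dir, "isl", "Icelandic"), where tessdata_dir is the tess/tessdata directory located in the second step. PDF Studio still needs to be restarted afterwards.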