Hi there! I’m Kevin Hulse, the newish Solutions Enablement Specialist at Atalasoft. You may have worked with me directly: I joined Atalasoft nearly six years ago as a Developer Support Engineer. Since then, I have worked in almost every department, from Support to Engineering and now Marketing (watch out, Sales). With this post I’m starting a small blog series on all things OCR. I plan to write technically minded posts about our products, our customers, and document imaging in general, along with posts on anything else I find interesting enough to talk about.
Speaking of products, with the release of DotImage 10.4.1, our OCR libraries now support version 3.02 of the Tesseract OCR Engine. The upgrade brings a few small improvements to processing speed and accuracy, as well as the ability to use new language data packages that support more extended character sets. Here is the list of languages we test against in our build unit tests:
- Chinese (Simplified)
- Chinese (Traditional)
- Norwegian (Bokmål)
*Comes with our software download
To add a downloaded package to your engine resources, unzip it into your project’s bin/OcrResources/Tesseract/v3.02/tessdata/ folder. Then add a CultureInfo object for the new language to the Tesseract3Engine.GetSupportedRecognitionCultures list, and set Tesseract3Engine.RecognitionCulture to that culture to recognize text in the newly added language.
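The unzip step is easy to script if you install packages often. The Python sketch below is a hypothetical helper (`install_language_package` and `TESSDATA_SUBDIR` are illustrative names, not part of DotImage). It assumes the downloaded package is a .zip archive whose files, such as Tesseract’s `<code>.traineddata` files, may sit inside subfolders; it copies every file into the tessdata folder named above and reports which language codes now have a `.traineddata` file there. Adding the CultureInfo and setting RecognitionCulture still happens in your .NET code.

```python
import shutil
import zipfile
from pathlib import Path

# Tessdata location relative to the project folder, as described in the post.
TESSDATA_SUBDIR = Path("bin", "OcrResources", "Tesseract", "v3.02", "tessdata")


def install_language_package(zip_path, project_root):
    """Hypothetical helper: unzip a language package into the project's tessdata folder.

    Returns the sorted language codes that have a '<code>.traineddata' file in
    that folder afterwards (e.g. 'nor' for nor.traineddata).
    """
    tessdata = Path(project_root) / TESSDATA_SUBDIR
    tessdata.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            # Keep only the final path component: this flattens any folders in
            # the archive and stops entries like "../x" escaping tessdata.
            name = Path(member.filename).name
            if name in ("", ".", ".."):
                continue
            with archive.open(member) as src, open(tessdata / name, "wb") as dst:
                shutil.copyfileobj(src, dst)

    # Tesseract language data files are named '<code>.traineddata'.
    return sorted(p.stem for p in tessdata.glob("*.traineddata"))
```

For example, `install_language_package("nor.zip", r"C:\Projects\MyOcrApp")` (both paths hypothetical) would place the Norwegian data under the project’s bin folder and return the codes found there, such as `nor`, `chi_sim`, or `chi_tra`, Tesseract’s codes for Norwegian and Simplified and Traditional Chinese.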
If you have any questions, want to discuss OCR, or want to tell me how you are using OCR to improve your document imaging solutions, drop me a message in the comments section below.