Some Introduction, Some Tesseract

February 27, 2014 Kevin Hulse

Hi there! I’m Kevin Hulse, the newish Solutions Enablement Specialist at Atalasoft. You may have worked with me directly since I joined Atalasoft as a Developer Support Engineer nearly six years ago. Since then, I have worked in almost every department, from Support to Engineering and now Marketing (watch out, Sales). I hope to begin a small series of blog posts on all things OCR, and I plan to provide interesting, technically minded posts on our products, our customers, and document imaging in general, as well as posts on things I simply find interesting enough to talk about.

Speaking of products, with the release of DotImage 10.4.1, our OCR libraries have been upgraded to support version 3.02 of the Tesseract OCR Engine. This upgrade includes a few small improvements to processing speed and accuracy, as well as a greater ability to use new language data packages that cover extended character sets. Here’s a list of all the languages we test against in our build unit tests:

  • Chinese (Simplified)
  • Chinese (Traditional)
  • Danish
  • Dutch*
  • English*
  • Finnish
  • French*
  • German*
  • Greek
  • Hebrew
  • Italian*
  • Japanese
  • Korean
  • Norwegian (Bokmål)
  • Portuguese*
  • Russian
  • Spanish*
  • Swedish
  • Turkish


*Comes with our software download

To add a downloaded package to your engine resources, unzip it into your project’s bin/OcrResources/Tesseract/v3.02/tessdata/ folder. The corresponding CultureInfo object should then appear in the list returned by Tesseract3Engine.GetSupportedRecognitionCultures, and you can set Tesseract3Engine.RecognitionCulture to it to recognize the newly added language.
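
If you would rather script the unzip step than do it by hand, here is a minimal sketch in Python. It is only an illustration, not part of DotImage: the function name install_language_package is made up for this example, and it assumes the downloaded package is a .zip archive with its .traineddata file(s) at the top level. The target folder is the one named above.

```python
import zipfile
from pathlib import Path

# Folder named in this post, relative to the project root.
TESSDATA_DIR = Path("bin/OcrResources/Tesseract/v3.02/tessdata")


def install_language_package(package_zip, tessdata_dir=TESSDATA_DIR):
    """Unzip a language package into the tessdata folder.

    Hypothetical helper: assumes the archive keeps its *.traineddata
    files at the top level. Returns the sorted language codes found in
    the folder afterwards (for example "deu" for deu.traineddata).
    """
    tessdata_dir = Path(tessdata_dir)
    # Create the folder if the project has not been built yet.
    tessdata_dir.mkdir(parents=True, exist_ok=True)

    # Extract everything in the archive into the tessdata folder.
    with zipfile.ZipFile(package_zip) as archive:
        archive.extractall(tessdata_dir)

    # List the language data files that are now available.
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))
```

Calling install_language_package("my-language-package.zip") from the project root (the file name is a placeholder) returns the language codes now present in the folder, which is a quick way to confirm the files landed where the engine looks for them before checking the supported cultures list in your .NET code.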

If you have any questions or want to discuss OCR, or even want to tell me how you are using OCR to improve your document imaging solutions, drop me a message in the comments section below.

About the Author

Kevin Hulse

Kevin is the Associate Solutions Enablement Specialist (a Technical Marketing position) at Atalasoft. He previously worked in both engineering and support at Atalasoft. He also runs the company-sponsored softball team and is an avid game player.
