Tesseract is an OCR (Optical Character Recognition) engine whose development has been funded by Google since 2006.
As of version 11.10, Ubuntu still ships Tesseract 2.04, which supports only 7 recognition languages. Tesseract 3.0 (released in September 2010), however, supports a total of 29. This guide will help you get Tesseract 3.01 working on Ubuntu 11.10.
- Download and extract Tesseract 3.01:
tar zxvf tesseract-3.01.tar.gz
- Install the Leptonica image processing library:
sudo apt-get install libleptonica-dev
- Build and install Tesseract from the extracted source directory:
Note: make check fails in java/ with: No rule to make target `check'. Stop.
sudo make install
- Install recognition languages:
gzip -d ell.traineddata.gz
sudo mv ell.traineddata /usr/local/share/tessdata
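
The steps above can be sketched as a small shell script. This is only a rough outline under stated assumptions: the helper names build_tesseract and has_language are hypothetical (they are not part of Tesseract), the source tree is assumed to ship a configure script (some source snapshots need ./autogen.sh first), and the language data is assumed to live in /usr/local/share/tessdata as in the last step.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the names are assumptions,
# not anything provided by Tesseract or Ubuntu.

# Build and install from an extracted source tree.
# Assumes the tree contains an executable ./configure script.
build_tesseract() {
    src_dir=$1
    if [ ! -x "$src_dir/configure" ]; then
        echo "no configure script in $src_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    # ldconfig refreshes the linker cache after installing under /usr/local.
    (cd "$src_dir" && ./configure && make && sudo make install && sudo ldconfig)
}

# Succeed only if the data file for a language is present.
# The second argument (tessdata directory) defaults to the path used above.
has_language() {
    lang=$1
    tessdata_dir=${2:-/usr/local/share/tessdata}
    [ -f "$tessdata_dir/$lang.traineddata" ]
}
```

A possible way to use it, assuming the tarball extracted to tesseract-3.01 and sample.png is a placeholder for an image of your own: run build_tesseract tesseract-3.01, then has_language ell && tesseract sample.png out -l ell, which should write the recognized text to out.txt.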