Tesseract is an OCR (Optical Character Recognition) engine whose development has been funded by Google since 2006.

As of version 11.10, Ubuntu still ships Tesseract 2.04, which supports only 7 recognition languages. Tesseract 3.0 (released in September 2010) supports a total of 29. This guide explains how to build Tesseract 3.01 from source and install it on Ubuntu 11.10.

== Installation instructions ==

 * Download and extract Tesseract 3.01:

`wget http://tesseract-ocr.googlecode.com/files/tesseract-3.01.tar.gz`

`tar zxvf tesseract-3.01.tar.gz`

 * Install the Leptonica image processing library:

`sudo apt-get install libleptonica-dev`

 * Compile (run these commands from inside the source directory extracted in the first step):

`./autogen.sh`

`./configure`

`make`

Note: make check fails in java/ with: ''No rule to make target `check'.  Stop.''
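If the compile step stops early, a missing build tool is a common cause. The sketch below is a small shell helper that prints which of the named commands cannot be found on the PATH. The function name `missing_tools` is hypothetical, and the list of tools to check is an assumption (it is not taken from the Tesseract documentation), so adjust it to whatever the error messages mention.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools autoconf automake libtoolize make g++` checks a typical set of autotools and compiler commands (an assumed list). Any name it prints points to a tool that still needs to be installed before running `./autogen.sh` again.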

 * Install:

`sudo make install`

`sudo ldconfig`

 * Install recognition languages (the example below installs Greek, language code `ell`):

`wget http://tesseract-ocr.googlecode.com/files/ell.traineddata.gz`

`gzip -d ell.traineddata.gz`

`sudo mv ell.traineddata /usr/local/share/tessdata`
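To confirm that a language file ended up in the right place, the tessdata directory can be listed. The following is a minimal sketch, assuming the default `/usr/local/share/tessdata` location used above. The function name `list_tess_langs` is hypothetical, and a different directory can be passed as the first argument if Tesseract was configured with another prefix.

```shell
#!/bin/sh
# Hypothetical helper: print the language code of every <code>.traineddata
# file found in a tessdata directory, one code per line.
list_tess_langs() {
    # Default to the directory used in the install steps above
    tessdata_dir="${1:-/usr/local/share/tessdata}"
    for f in "$tessdata_dir"/*.traineddata; do
        # When nothing matches, the glob pattern stays literal; skip it
        [ -e "$f" ] || continue
        # Strip the directory and the .traineddata suffix
        basename "$f" .traineddata
    done
}
```

Running `list_tess_langs` after the steps above should include `ell` in its output. A listed language can then be selected with the `-l` option, for example `tesseract image.tif output -l ell`, which writes the recognized text to `output.txt`.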

== See also ==
 * [[http://code.google.com/p/tesseract-ocr/|The Tesseract project page on Google Code]]
 * [[http://ubuntuforums.org/showthread.php?t=1647350|"Tesseract 3.0 + Ubuntu 10.04 Installation Guide" discussion thread on Ubuntu Forums]]
 * [[http://en.wikipedia.org/wiki/Tesseract_(software)|Tesseract on Wikipedia]]