Tesseract is an OCR (Optical Character Recognition) engine whose development has been funded by Google since 2006. As of version 11.10, Ubuntu still ships Tesseract 2.04, which supports only 7 recognition languages. Tesseract 3.0 (released in September 2010), by contrast, supports a total of 29 recognition languages. This guide will help you get Tesseract 3.01 working on Ubuntu 11.10.

== Installation instructions ==

 * Download and extract Tesseract 3.01:
 `wget http://tesseract-ocr.googlecode.com/files/tesseract-3.01.tar.gz`
 `tar zxvf tesseract-3.01.tar.gz`

 * Install the Leptonica image processing library:
 `sudo apt-get install libleptonica-dev`

 * Compile from inside the extracted source directory:
 `cd tesseract-3.01`
 `./autogen.sh`
 `./configure`
 `make`
 Note: `make check` fails in java/ with: ''No rule to make target `check'. Stop.''

 * Install:
 `sudo make install`
 `sudo ldconfig`

 * Install recognition languages. For example, for Greek (language code `ell`):
 `wget http://tesseract-ocr.googlecode.com/files/ell.traineddata.gz`
 `gzip -d ell.traineddata.gz`
 `sudo mv ell.traineddata /usr/local/share/tessdata`

== See also ==

 * [[http://code.google.com/p/tesseract-ocr/|The Tesseract project page on Google Code]]
 * [[http://ubuntuforums.org/showthread.php?t=1647350|"Tesseract 3.0 + Ubuntu 10.04 Installation Guide" discussion thread on Ubuntu Forums]]
 * [[http://en.wikipedia.org/wiki/Tesseract_(software)|Tesseract on Wikipedia]]
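
== Example: checking installed languages ==

Once the installation steps are done, it can be handy to confirm that a language file really landed in the tessdata directory before running recognition. The following is a minimal sketch, not part of Tesseract itself: the function names (`installed_languages`, `has_language`, `ocr`) and the `TESSDATA_DIR` variable are hypothetical helpers invented for this example. It assumes language data lives in `/usr/local/share/tessdata`, as in the installation steps, and that `tesseract` is on the `PATH`.

```shell
#!/bin/sh
# Sketch only: these helper names are illustrative, not part of Tesseract.

# Hypothetical variable used by these helpers only. It defaults to the
# directory the installation steps move language data into.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

# Print the code of every language that has a .traineddata file, one per line.
installed_languages() {
    for f in "$TESSDATA_DIR"/*.traineddata; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename "$f" .traineddata
    done
}

# Succeed if the data file for language code $1 is present.
has_language() {
    [ -f "$TESSDATA_DIR/$1.traineddata" ]
}

# Recognize image $1 using language $3; Tesseract writes the text to $2.txt.
# Refuses to run when the language data is missing.
ocr() {
    if ! has_language "$3"; then
        echo "missing $TESSDATA_DIR/$3.traineddata" >&2
        return 1
    fi
    tesseract "$1" "$2" -l "$3"
}
```

After sourcing the file, `installed_languages` lists the available codes, and `ocr scan.tif scan ell` would recognize `scan.tif` as Greek and write the result to `scan.txt` (the image file name here is only a placeholder).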