Diff for "Tesseract3"


Differences between revisions 1 and 5 (spanning 4 versions)
Revision 1 as of 2011-12-31 09:04:36
Size: 1178
Editor: 78
Comment:
Revision 5 as of 2011-12-31 09:26:23
Size: 1392
Editor: 78
Comment:

Tesseract is an OCR (Optical Character Recognition) engine whose development has been funded by Google since 2006.

As of version 11.10, Ubuntu still ships Tesseract 2.04, which supports only 7 recognition languages. Tesseract 3.0 (released in September 2010) supports a total of 29 recognition languages. This guide will help you get Tesseract 3.01 working on Ubuntu 11.10.
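To see whether a machine still has the packaged 2.x release, something like the following sketch may help. It assumes the Ubuntu package is named tesseract-ocr and that dpkg-query is available; the function names are made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) when the version string has a major number below 3.
# Exits with 2 when the string does not start with a plain number.
needs_tesseract3() {
    major=${1%%.*}                  # text before the first dot: "2" from "2.04"
    case $major in
        ''|*[!0-9]*) return 2 ;;    # empty or non-numeric: cannot decide
    esac
    [ "$major" -lt 3 ]
}

# Prints the version of the packaged tesseract-ocr, or nothing if absent.
# Assumption: the package name is tesseract-ocr.
packaged_version() {
    dpkg-query -W -f='${Version}' tesseract-ocr 2>/dev/null
}

# Example use:
#   if needs_tesseract3 "$(packaged_version)"; then
#       echo "packaged Tesseract is older than 3.x"
#   fi
```

The check looks only at the package database, so it says nothing about a copy already built from source under /usr/local.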

Installation instructions

  • Download and extract Tesseract 3.01:

wget http://tesseract-ocr.googlecode.com/files/tesseract-3.01.tar.gz

tar zxvf tesseract-3.01.tar.gz

cd tesseract-3.01

  • Install the Leptonica image processing library:

sudo apt-get install libleptonica-dev

  • Compile:

./autogen.sh

./configure

make

Note: make check fails in java/ with: No rule to make target `check'. Stop.
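If ./autogen.sh or ./configure stops early, a missing build tool is one possible cause. The sketch below reports which commands from a list are not on the PATH. The tool list in the example is an assumption about what the build needs, not something taken from the Tesseract documentation, and missing_tools is a made-up helper name.

```shell
#!/bin/sh
# Prints, one per line, each named command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example use (assumed tool list, adjust as needed):
#   missing_tools autoconf automake libtoolize g++ make
```

An empty result means every listed command was found; it does not prove that the build will succeed.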

  • Install:

sudo make install

sudo ldconfig

  • Install recognition languages (the example installs Greek, language code ell):

wget http://tesseract-ocr.googlecode.com/files/ell.traineddata.gz

gzip -d ell.traineddata.gz

sudo mv ell.traineddata /usr/local/share/tessdata
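Once the steps above are done, a quick way to confirm that a language file landed in the right place is sketched here. The helper names and the tessdata_dir variable are hypothetical; the directory is the one used in the mv command above.

```shell
#!/bin/sh
# Directory the traineddata files were moved into.
tessdata_dir=/usr/local/share/tessdata

# Succeeds when <lang>.traineddata exists in the given (or default) directory.
has_language() {
    [ -f "${2:-$tessdata_dir}/$1.traineddata" ]
}

# Prints the language codes that have a traineddata file, one per line.
list_languages() {
    dir=${1:-$tessdata_dir}
    for f in "$dir"/*.traineddata; do
        [ -f "$f" ] || continue     # the glob matched nothing
        basename "$f" .traineddata
    done
}

# Example use (scan.tif and result are placeholder names):
#   has_language ell && tesseract scan.tif result -l ell
```

In the example, tesseract reads scan.tif, uses the Greek data selected by -l ell, and writes the recognized text to result.txt.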

See also

Tesseract3 (last edited 2011-12-31 09:26:23 by 78)