Introduction

Namazu is a full-text search engine intended for easy use. It works not only as a small- or medium-scale web search engine but also as a personal search system for email and other files.

Namazu is well suited to building an intranet search engine: it is lightweight and has few dependencies compared to Lucene-based or Lucene-like solutions.

Namazu can index a large number of file types, including office documents, PDFs and source code files.

Installation

Namazu is available as a package and can be installed with whichever package management tool you prefer, for example with apt-get:

sudo apt-get install namazu2 namazu2-index-tools

Additional parsers become available if you install helper packages such as xpdf, wv, unzip or unrtf:

sudo apt-get install xpdf wv unzip unrtf

To tell Namazu which files it is allowed to index, edit /etc/namazu/mknmzrc and uncomment the lines after $ALLOW_FILE.
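Before editing, it can help to keep a backup and to see where $ALLOW_FILE appears in the file. The following is a minimal sketch: show_allow_file is a hypothetical helper name, not part of Namazu, and because the exact layout of these lines may differ between package versions, the uncommenting itself is left to a text editor.

```shell
#!/bin/sh
# show_allow_file is a hypothetical helper, not part of Namazu.
# It copies the given mknmzrc to <file>.bak and prints every line that
# mentions ALLOW_FILE together with its line number.
show_allow_file() {
    rc="$1"
    cp -p "$rc" "$rc.bak" || return 1
    grep -n 'ALLOW_FILE' "$rc"
}
```

With the function defined in a root shell, show_allow_file /etc/namazu/mknmzrc lists the candidate lines and leaves the original in /etc/namazu/mknmzrc.bak.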

Using Namazu

Namazu consists of several tools, including an indexer and a search tool. To create or update a Namazu index, use the mknmz tool. To synchronize the index periodically, place a line like the one below in a system crontab such as /etc/crontab (the sixth field names the user that runs the command):

0,15,30,45 * * * * www-data mknmz --output-dir=/storage/archive/index  /storage/archive/data/ > /tmp/namazu.log 2>&1

The command updates the index in /storage/archive/index with the parsed data from /storage/archive/data/ and writes its output to /tmp/namazu.log.
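Before relying on cron, the same indexing step can be tried by hand. The sketch below wraps the mknmz call from the cron line in a small function. The name update_index and the MKNMZ override variable are assumptions made for illustration; only the --output-dir option and the two paths come from the example above.

```shell
#!/bin/sh
# update_index is a hypothetical wrapper around the mknmz call shown in the
# cron line. MKNMZ may be overridden (for example with a full path);
# it defaults to "mknmz".
MKNMZ="${MKNMZ:-mknmz}"

update_index() {
    index_dir="$1"   # where the index files are written
    data_dir="$2"    # directory tree to be indexed

    # Refuse to run if the data directory is missing (e.g. an unmounted disk).
    [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }

    # Create the index directory if it does not exist yet.
    mkdir -p "$index_dir" || return 1

    "$MKNMZ" --output-dir="$index_dir" "$data_dir"
}
```

Running update_index /storage/archive/index /storage/archive/data/ as the www-data user performs the same update as the cron entry, with the output shown on the terminal instead of in /tmp/namazu.log.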

In this example we'll skip the command-line search tools and create a web page for searching the index instead.

Namazu can be accessed through a web page by configuring a web form to pass queries to namazu.cgi, a CGI program that ships with the official package and can be found at /usr/lib/cgi-bin/namazu.cgi.

Configure your favorite web server to point to (for example) /storage/archive, where you can create a cgi-bin directory. Copy the namazu.cgi file into it, and create a new file named .namazurc in the same directory.

mkdir /storage/archive/cgi-bin
cp /usr/lib/cgi-bin/namazu.cgi /storage/archive/cgi-bin/
cat > /storage/archive/cgi-bin/.namazurc <<'EOF'
Lang   en
Index   /storage/archive/index
Template        /storage/archive/index
EOF

namazu.cgi loads .namazurc and applies the settings in it. A more detailed example of the available settings can be found in /etc/namazu/namazurc.
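The web form that passes queries to namazu.cgi can be very small. The sketch below writes one to a file. write_search_form and the file name search.html are hypothetical, and the /cgi-bin/namazu.cgi URL assumes the web server maps /cgi-bin/ to the directory created above. The form sends the search terms in the query parameter, which is the parameter name namazu.cgi reads.

```shell
#!/bin/sh
# write_search_form is a hypothetical helper: it writes a minimal HTML page
# whose form submits the text field to namazu.cgi as the "query" parameter.
# The /cgi-bin/namazu.cgi URL is an assumption about the server layout.
write_search_form() {
    docroot="$1"
    cat > "$docroot/search.html" <<'EOF'
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Search the archive</title></head>
<body>
<form method="get" action="/cgi-bin/namazu.cgi">
  <input type="text" name="query" size="40">
  <input type="submit" value="Search">
</form>
</body>
</html>
EOF
}
```

For example, write_search_form /storage/archive creates /storage/archive/search.html, which the web server can then serve as the search page.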

Next, configure your vhost to allow .cgi scripts. On Apache, install the libapache2-mod-fastcgi package. On Nginx, an example is available on the FcgiWrap page.
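As one possible illustration for Apache, the sketch below writes a configuration snippet that maps /cgi-bin/ to the directory created earlier. It rests on several assumptions that are not part of this page: plain CGI execution through Apache's mod_cgi (rather than the FastCGI package mentioned above), Apache 2.4 access-control syntax, and the hypothetical helper name write_cgi_conf.

```shell
#!/bin/sh
# write_cgi_conf is a hypothetical helper that writes an Apache 2.4 snippet.
# Assumption: plain CGI via mod_cgi is enabled; this page does not set that up.
write_cgi_conf() {
    conf_file="$1"
    cat > "$conf_file" <<'EOF'
# Treat every file under /storage/archive/cgi-bin as a CGI script.
ScriptAlias /cgi-bin/ /storage/archive/cgi-bin/
<Directory "/storage/archive/cgi-bin">
    AllowOverride None
    Require all granted
</Directory>
EOF
}
```

The generated lines would go inside the vhost definition. Require all granted is Apache 2.4 syntax; older 2.2 installations use Order and Allow directives instead.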

Namazu (last edited 2009-07-17 20:32:43 by c7)