Introduction
Namazu is a full-text search engine intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files.
Namazu is a good fit for building an intranet search engine: it is lightweight and has few dependencies compared to Lucene-based or Lucene-like solutions.
Namazu can index a large number of file types, including office documents, PDFs and source code files.
Installation
Namazu is available as a package and can be installed with the package management tool of your choice. On Debian or Ubuntu, for example:
sudo apt-get install namazu2 namazu2-index-tools
Additional parsers become available once helper packages such as xpdf, wv, unzip and unrtf are installed.
sudo apt-get install xpdf wv unzip unrtf
To tell Namazu which files it is allowed to index, edit /etc/namazu/mknmzrc and uncomment the lines that follow $ALLOW_FILE.
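Before editing, it can help to see which of those lines are still commented out. The following is a minimal sketch, assuming GNU grep (for the -A option); show_allow_file is a hypothetical helper name, not something that ships with Namazu.

```shell
#!/bin/sh
# Hypothetical helper: print the $ALLOW_FILE setting and the lines after it,
# with line numbers, so it is easy to spot which ones still start with '#'.
# Usage: show_allow_file [path-to-mknmzrc] [number-of-following-lines]
show_allow_file() {
    conf="${1:-/etc/namazu/mknmzrc}"
    context="${2:-10}"
    # -n prints line numbers, -A prints that many lines after each match
    grep -n -A "$context" 'ALLOW_FILE' "$conf"
}
```

Calling show_allow_file with no arguments reads /etc/namazu/mknmzrc and shows the ten lines after each match; pass a larger second argument if the block is longer on your system.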
Using Namazu
Namazu consists of several tools, including an indexer and a search tool. To create or update a Namazu index, use the mknmz tool. To synchronize the index periodically, add a line like the one below to the system crontab (/etc/crontab, whose entries include a user field):
0,15,30,45 * * * * www-data mknmz --output-dir=/storage/archive/index /storage/archive/data/ > /tmp/namazu.log 2>&1
This command updates the index in /storage/archive/index with the parsed data from /storage/archive/data/. It runs every 15 minutes as the www-data user and writes its output to /tmp/namazu.log.
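With a 15-minute schedule, a long indexing run could still be going when the next one is due. One way to guard against that, as a sketch only, is to wrap the command with flock from util-linux. Here run_locked and the lock file path are hypothetical names chosen for illustration, not something Namazu provides.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if nobody else holds the lock file.
# Usage: run_locked <lock-file> <command> [args...]
run_locked() {
    lock="$1"
    shift
    # -n makes flock give up immediately (non-zero exit status) instead of
    # waiting when another process already holds the lock.
    flock -n "$lock" "$@"
}
```

For example, run_locked /tmp/namazu.lock mknmz --output-dir=/storage/archive/index /storage/archive/data/ would skip the run while a previous one is still holding /tmp/namazu.lock.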
In this example we'll skip the command-line search tools and create a web page for searching the index instead.
Namazu web search
Namazu can be accessed through a web page by configuring a web form to pass queries to namazu.cgi, a CGI program that ships with the official package and can be found at /usr/lib/cgi-bin/namazu.cgi.
Configure your favorite web server to point to (for example) /storage/archive, and create a cgi-bin directory there. Copy namazu.cgi into it, and create a new file named .namazurc in the same directory.
mkdir /storage/archive/cgi-bin
cp /usr/lib/cgi-bin/namazu.cgi /storage/archive/cgi-bin/
cat > /storage/archive/cgi-bin/.namazurc << 'EOF'
Lang en
Index /storage/archive/index
Template /storage/archive/index
EOF
namazu.cgi loads .namazurc and applies the settings in it. A more detailed sample of the available settings can be found in /etc/namazu/namazurc.
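Before touching the web server, the CGI program can be tried by hand from a shell, roughly the way a web server would invoke it for a GET request. This is a sketch under a few assumptions: run_cgi is a hypothetical helper, the /cgi-bin/ prefix in SCRIPT_NAME is only an example, and the helper changes into the CGI's directory on the assumption that this is where .namazurc is looked up.

```shell
#!/bin/sh
# Hypothetical helper: invoke a CGI program manually for a GET request.
# Usage: run_cgi <path-to-cgi> <query-string>
run_cgi() {
    cgi="$1"
    query_string="$2"
    (
        # Assumption: the CGI reads its config from its own directory,
        # so run it from there. The subshell keeps the cd local.
        cd "$(dirname "$cgi")" || exit 1
        # Standard CGI environment variables for a GET request.
        REQUEST_METHOD=GET \
        QUERY_STRING="$query_string" \
        SCRIPT_NAME="/cgi-bin/$(basename "$cgi")" \
        "./$(basename "$cgi")"
    )
}
```

For example, run_cgi /storage/archive/cgi-bin/namazu.cgi 'query=test' should print the CGI response (headers followed by the HTML of the results page) if the index and .namazurc are in place. The query parameter name is assumed here; check the search form in your templates if it differs.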
Next, configure your vhost to allow .cgi scripts. On Apache, install the libapache2-mod-fastcgi package. For nginx, an example is available on the FcgiWrap page.