The Zettair Search Engine -- Getting Started with HTML Documents

Here's an example of how easy it is to use Zettair. Follow these steps:

  1. Download Zettair as a zip file.

  2. Change into the directory where you've saved Zettair and unzip it:
    $ cd ~
    $ unzip zettair-0.9.3.zip
    

  3. Download this zipped collection, html.zip (40,872 bytes), of HTML documents taken from the HTML 4.01 specification at http://www.w3.org/TR/html4/.

  4. Change into the directory where you've saved the collection and unzip it:
    $ cd ~
    $ unzip html.zip
    Archive:  html.zip
      inflating: collection/about.html
      inflating: collection/charset.html
      inflating: collection/conform.html
    ...
    

  5. Configure, build and install the Zettair software:
    $ cd zettair-0.9.3
    $ ./configure --prefix=$HOME/local/zettair-0.9.3
    $ make
    $ make install
    

  6. Build an index of the files in the collection:
    $ mkdir ~/index
    $ cd ~/index
    $ find ~/collection/* | ~/local/zettair-0.9.3/bin/zet -i 
    zettair version 0.9.3
    created new index 'index'
    sources (type html): collection/about.html collection/charset.html 
    collection/conform.html collection/cover.html collection/references.html 
    collection/types.html 
    parsing collection/about.html...
    parsing collection/charset.html...
    parsing collection/conform.html...
    parsing collection/cover.html...
    parsing collection/references.html...
    parsing collection/types.html...
    merging...
    
    summary: 6 documents, 2049 distinct index terms, 10541 terms
    

    A Unix note: the command find ~/collection/* lists every file under ~/collection, one path per line, and that list is piped into zet -i, so Zettair indexes every file in the directory. This command does the same thing by naming the files on the command line instead:

    $ ~/local/zettair-0.9.3/bin/zet -i -c ../config/parser_settings.html -t HTML \
        collection/about.html collection/charset.html collection/conform.html \
        collection/cover.html collection/references.html collection/types.html
    

  7. Search the collection by running zet without -i and typing queries at the > prompt:
    $ ~/local/zettair-0.9.3/bin/zet
    > Tim Berners-Lee
    1. file:///collection/about.html (score 2.455709, docid 0)
    2. file:///collection/references.html (score 1.087303, docid 4)

    2 results of 2 shown (took 0.001164 seconds)
    > tags
    1. file:///collection/conform.html (score 0.952401, docid 2)
    2. file:///collection/references.html (score 0.664334, docid 4)

    2 results of 2 shown (took 0.000962 seconds)


  8. Enjoy!
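Steps 2 and 4 need unzip, and step 5 needs make plus a C compiler for ./configure. The following is a minimal sketch for checking these tools before you start, assuming a POSIX shell. The helper name check_tools is hypothetical, and "cc" is an assumed compiler name; substitute gcc or clang if that is what your system provides.

```shell
# check_tools: hypothetical helper that reports any named command
# missing from PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}

# unzip is used in steps 2 and 4; make and a C compiler in step 5.
# Assumption: the compiler is called "cc"; use gcc or clang if not.
check_tools unzip make cc || echo "install the missing tools first" >&2
```

Each missing tool is reported separately, so one run tells you everything that still needs installing.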
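Step 5 installs zet under the prefix passed to ./configure, and steps 6 and 7 run it as ~/local/zettair-0.9.3/bin/zet. A small sketch for confirming that make install put the program there, assuming a POSIX shell; zet_installed is a hypothetical helper name.

```shell
# zet_installed: hypothetical helper; succeeds if PREFIX/bin/zet is an
# executable regular file, i.e. "make install" put zet where
# ./configure --prefix said it would.
zet_installed() {
    zet=$1/bin/zet
    [ -f "$zet" ] && [ -x "$zet" ]
}

# Same prefix as in step 5.
if zet_installed "$HOME/local/zettair-0.9.3"; then
    echo "zet is installed"
else
    echo "zet not found in $HOME/local/zettair-0.9.3/bin" >&2
fi
```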
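In step 6, find ~/collection/* prints every path it meets, including the names of any subdirectories, and all of them are piped into zet -i. If a collection directory also holds other files or subdirectories, one possible refinement is to pass only regular .html files. This is a sketch, assuming (as step 6 does) that zet -i reads one file name per input line; html_sources is a hypothetical helper name.

```shell
# html_sources: hypothetical helper that prints the regular *.html files
# under a directory (recursively), one per line, in byte-wise sorted order.
# -type f drops directory entries that "find dir/*" would otherwise print.
html_sources() {
    find "$1" -type f -name '*.html' | LC_ALL=C sort
}

# Hypothetical usage, with the same stdin mechanism as step 6:
#   cd ~/index
#   html_sources ~/collection | ~/local/zettair-0.9.3/bin/zet -i
```

Sorting keeps the input order stable from run to run, much as the shell glob in step 6 expands its top-level file names in sorted order.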
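Step 7 types queries at zet's interactive > prompt. To repeat the same queries from a script, you could try piping them in one per line. This is only a sketch: it ASSUMES zet reads queries from standard input when stdin is not a terminal, which the interactive session suggests but this guide does not state. run_queries is a hypothetical helper name.

```shell
# run_queries: hypothetical helper that sends each remaining argument
# as one line on the standard input of the given program.
# ASSUMPTION: zet accepts queries on piped stdin as it does at "> ".
run_queries() {
    zet_bin=$1
    shift
    printf '%s\n' "$@" | "$zet_bin"
}

# Hypothetical usage, from the directory holding the index (~/index in step 6):
#   run_queries ~/local/zettair-0.9.3/bin/zet "Tim Berners-Lee" tags
```

Quoting each query keeps multi-word queries such as "Tim Berners-Lee" on a single line.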