The Zettair Search Engine -- Performing TREC experiments

To run a TREC experiment with Zettair, follow these steps:

  1. Download the Zettair distribution as a zip file (zettair-0.9.3.zip in this example).

  2. Change into the directory where you saved the zip file (the home directory in this example) and unzip it:
    $ cd ~
    $ unzip zettair-0.9.3.zip
    

  3. Configure, build and install Zettair:
    $ cd zettair-0.9.3
    $ ./configure --prefix=$HOME/local/zettair-0.9.3
    $ make
    $ make install
    

  4. Build an index of the TREC collection (this example uses WT10G and names the index wt10g):
    $ cd ~
    $ ls wt10g/
    wt10g-1.html  wt10g-3.html  wt10g-5.html
    wt10g-2.html  wt10g-4.html  wt10g-6.html
    $ ~/local/zettair-0.9.3/bin/zet -i -t TREC -f wt10g wt10g/wt10g-*.html
    version 0.9.3
    sources (type trec): wt10g/wt10g-1.html wt10g/wt10g-2.html 
    wt10g/wt10g-3.html wt10g/wt10g-4.html wt10g/wt10g-5.html 
    wt10g/wt10g-6.html 
    parsing wt10g/wt10g-1.html...
    parsing wt10g/wt10g-2.html...
    parsing wt10g/wt10g-3.html...
    parsing wt10g/wt10g-4.html...
    parsing wt10g/wt10g-5.html...
    parsing wt10g/wt10g-6.html...
    merging...
    
    summary: 1697027 documents, 9147236 distinct words
    

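  5. Run the zet_trec executable on the TREC topic file to query the index once per topic. In this example -f names the topic file, -r sets the run identifier that appears in the last output column, and -n 1000 requests up to 1000 results per topic: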
    $ ls topics.*
    topics.451-500
    $ ~/local/zettair-0.9.3/bin/zet_trec -f topics.451-500 -r zettair \
      -n 1000 wt10g > topics.451-500.out
    $ head topics.451-500.out 
    451     0.000000        WTX064-B48-194  0       25.974307       zettair
    451     0.000000        WTX008-B37-10   0       25.728757       zettair
    451     0.000000        WTX064-B48-193  0       25.691912       zettair
    451     0.000000        WTX095-B05-124  0       25.075859       zettair
    451     0.000000        WTX031-B22-288  0       24.558171       zettair
    451     0.000000        WTX064-B48-198  0       22.862540       zettair
    451     0.000000        WTX092-B49-42   0       22.187891       zettair
    451     0.000000        WTX064-B48-188  0       22.069917       zettair
    451     0.000000        WTX003-B26-249  0       21.889636       zettair
    451     0.000000        WTX011-B16-71   0       21.377611       zettair
    

  6. Use the trec_eval program to evaluate the run against the relevance judgments (qrels) file:
    $ trec_eval qrels.trec9.main_web topics.451-500.out
    
    Queryid (Num):       48
    Total number of documents over all queries
        Retrieved:    45107
        Relevant:      2590
        Rel_ret:       1280
    Interpolated Recall - Precision Averages:
        at 0.00       0.6097 
        at 0.10       0.3957 
        at 0.20       0.3101 
        at 0.30       0.2587 
        at 0.40       0.2244 
        at 0.50       0.1798 
        at 0.60       0.1224 
        at 0.70       0.0868 
        at 0.80       0.0559 
        at 0.90       0.0419 
        at 1.00       0.0355 
    Average precision (non-interpolated) for all rel docs(averaged over queries)
                      0.1901 
    Precision:
      At    5 docs:   0.3333
      At   10 docs:   0.2812
      At   15 docs:   0.2417
      At   20 docs:   0.2146
      At   30 docs:   0.1826
      At  100 docs:   0.1094
      At  200 docs:   0.0763
      At  500 docs:   0.0428
      At 1000 docs:   0.0267
    R-Precision (precision after R (= num_rel for a query) docs retrieved):
        Exact:        0.2177
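
As a sanity check on the run file written in step 5, the number of results per topic can be counted before the file is passed to trec_eval. The following is a minimal sketch, assuming the whitespace-separated layout shown in the step 5 output with the topic number in the first column. The function name count_per_topic is hypothetical and is not part of Zettair.

```shell
#!/bin/sh
# count_per_topic FILE
# Print "topic count" for every topic in a run file, sorted by topic number.
# Assumes the topic number is the first whitespace-separated column, as in
# the step 5 output. Hypothetical helper; not part of Zettair.
count_per_topic() {
    awk '{ n[$1]++ } END { for (t in n) print t, n[t] }' "$1" | sort -n
}

# Example (run file name taken from step 5):
#   count_per_topic topics.451-500.out
```

With -n 1000 as in step 5, no count would be expected to exceed 1000. A topic missing from the listing returned no results.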
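
The summary in step 6 reports the Relevant and Rel_ret totals but not overall recall, which is their ratio: 1280 / 2590, or about 0.4942 for this run. The following is a minimal sketch for extracting it, assuming the trec_eval text layout shown above (a "Relevant:" or "Rel_ret:" label followed by a count). Other trec_eval versions may print a different format, and the function name overall_recall is hypothetical.

```shell
#!/bin/sh
# overall_recall [FILE...]
# Read trec_eval summary text from FILE(s), or from standard input when no
# file is given, and print Rel_ret / Relevant to four decimal places.
# Assumes the "Relevant:" and "Rel_ret:" labels shown in step 6.
# Hypothetical helper; not part of trec_eval or Zettair.
overall_recall() {
    awk '$1 == "Relevant:" { rel = $2 }
         $1 == "Rel_ret:"  { ret = $2 }
         END {
             # Fail rather than divide by zero when no total was found.
             if (rel > 0) printf "%.4f\n", ret / rel
             else exit 1
         }' "$@"
}

# Example (file names taken from step 6):
#   trec_eval qrels.trec9.main_web topics.451-500.out | overall_recall
```

For the run above this would print 0.4942, meaning that slightly under half of the known relevant documents were retrieved.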