
The Zettair Search Engine -- Performing TREC experiments

Follow these steps:
  1. Download Zettair as a zip file.
  2. Change into the directory where you've saved Zettair and unzip it:
    [hugh@hugh hugh]$ cd ~
    [hugh@hugh hugh]$ unzip zettair-0.6.0.zip
    
  3. Make and install the Zettair software:
    [hugh@hugh hugh]$ cd zettair-0.6.0
    [hugh@hugh zettair-0.6.0]$ ./configure --prefix=$HOME/local/zettair-0.6.0
    [hugh@hugh zettair-0.6.0]$ make
    [hugh@hugh zettair-0.6.0]$ make install
    
  4. Build an index on the TREC collection (example shown uses the WT10G collection):
    [hugh@hugh zettair-0.6.0]$ cd ~
    [hugh@hugh hugh]$ ls wt10g/
    wt10g-1.html  wt10g-3.html  wt10g-5.html
    wt10g-2.html  wt10g-4.html  wt10g-6.html
    [hugh@hugh hugh]$ ~/local/zettair-0.6.0/bin/zet -i -t TREC -f wt10g wt10g/wt10g-*.html
    version 0.6.0
    sources (type trec): /home/hugh/wt10g/wt10g-1.html /home/hugh/wt10g/wt10g-2.html /home/hugh/wt10g/wt10g-3.html /home/hugh/wt10g/wt10g-4.html /home/hugh/wt10g/wt10g-5.html /home/hugh/wt10g/wt10g-6.html 
    parsing /home/hugh/wt10g/wt10g-1.html...
    parsing /home/hugh/wt10g/wt10g-1.html...
    parsing /home/hugh/wt10g/wt10g-2.html...
    parsing /home/hugh/wt10g/wt10g-3.html...
    parsing /home/hugh/wt10g/wt10g-4.html...
    parsing /home/hugh/wt10g/wt10g-5.html...
    parsing /home/hugh/wt10g/wt10g-6.html...
    merging...
    
    summary: 1697027 documents, 9147236 distinct words
    
  5. Run the zet_trec executable with the TREC topic file to query the index for each of the topics:
    [hugh@hugh hugh]$ ls topics.*
    topics.451-500
    [hugh@hugh hugh]$ ~/local/zettair-0.6.0/bin/zet_trec -f topics.451-500 -r zettair -n 1000 wt10g > topics.451-500.out
    [hugh@hugh hugh]$ head topics.451-500.out
    451     0.000000        WTX064-B48-194  0       25.974307       zettair
    451     0.000000        WTX008-B37-10   0       25.728757       zettair
    451     0.000000        WTX064-B48-193  0       25.691912       zettair
    451     0.000000        WTX095-B05-124  0       25.075859       zettair
    451     0.000000        WTX031-B22-288  0       24.558171       zettair
    451     0.000000        WTX064-B48-198  0       22.862540       zettair
    451     0.000000        WTX092-B49-42   0       22.187891       zettair
    451     0.000000        WTX064-B48-188  0       22.069917       zettair
    451     0.000000        WTX003-B26-249  0       21.889636       zettair
    451     0.000000        WTX011-B16-71   0       21.377611       zettair
    
  6. Use the trec_eval program to evaluate the run:
    [hugh@hugh hugh]$ trec_eval qrels.trec9.main_web topics.451-500.out
    
    
    Queryid (Num):       48
    Total number of documents over all queries
        Retrieved:    45107
        Relevant:      2590
        Rel_ret:       1280
    Interpolated Recall - Precision Averages:
        at 0.00       0.6097 
        at 0.10       0.3957 
        at 0.20       0.3101 
        at 0.30       0.2587 
        at 0.40       0.2244 
        at 0.50       0.1798 
        at 0.60       0.1224 
        at 0.70       0.0868 
        at 0.80       0.0559 
        at 0.90       0.0419 
        at 1.00       0.0355 
    Average precision (non-interpolated) for all rel docs(averaged over queries)
                      0.1901 
    Precision:
      At    5 docs:   0.3333
      At   10 docs:   0.2812
      At   15 docs:   0.2417
      At   20 docs:   0.2146
      At   30 docs:   0.1826
      At  100 docs:   0.1094
      At  200 docs:   0.0763
      At  500 docs:   0.0428
      At 1000 docs:   0.0267
    R-Precision (precision after R (= num_rel for a query) docs retrieved):
        Exact:        0.2177
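
The summary above reports the total number of relevant documents (Relevant) and how many of them the run retrieved (Rel_ret), so overall recall is Rel_ret / Relevant: here 1280 / 2590, or about 0.49. A minimal sketch of pulling that figure out of the trec_eval text follows. The function name overall_recall is hypothetical, and the sketch assumes the summary labels shown above; other trec_eval versions may label these lines differently.

```shell
#!/bin/sh
# Hypothetical helper: read a trec_eval summary on stdin and print
# overall recall (Rel_ret / Relevant) to four decimal places.
# Assumes the "Relevant:" and "Rel_ret:" labels shown in step 6.
overall_recall() {
    awk '
        $1 == "Relevant:" { relevant = $2 }   # relevant documents over all queries
        $1 == "Rel_ret:"  { rel_ret  = $2 }   # relevant documents that were retrieved
        END {
            if (relevant > 0)
                printf "%.4f\n", rel_ret / relevant
            else
                exit 1                        # no usable "Relevant:" line was seen
        }
    '
}
```

It could be used as: trec_eval qrels.trec9.main_web topics.451-500.out | overall_recall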
    

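As a quick sanity check on the run file from step 5, it can help to count how many results were written for each topic before handing the file to trec_eval. The sketch below is only an illustration: the function name results_per_topic is hypothetical, and it assumes that the first whitespace-separated column of the run file is the topic number, as it appears to be in the sample output (451, from topics.451-500).

```shell
#!/bin/sh
# Hypothetical helper: print "<topic> <number of result lines>" for each
# topic in a run file, sorted by topic number.
# Assumes column 1 of the run file is the topic number.
results_per_topic() {
    awk '{ count[$1]++ }
         END { for (topic in count) print topic, count[topic] }' "$1" |
        sort -n
}
```

It could be used as: results_per_topic topics.451-500.out | head
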
Page created by Nick Lester, 15 September 2003.
Updated by William Webber for Zettair version 0.6.0, 21 July 2004.