The Zettair Search Engine

Zettair is a compact and fast text search engine designed and written by the Search Engine Group at RMIT University. It was formerly known as Lucy.

Overview

Zettair allows you to index and search HTML (or TREC) collections. It has been designed for simplicity as well as speed and flexibility, and its primary feature is the ability to handle large amounts of text. It has a single executable, which performs both indexing and searching: when an index doesn't exist, Zettair will create one for you based on the parameters you provide, and when you do have an index, Zettair will use that index to search the indexed data. It has a simple command-line interface, and supports ranked, simple (non-nested) Boolean, and phrase queries.
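
To make the three query types concrete, here is a toy sketch in Python. It is not Zettair's implementation or interface: the function names, the sample documents, and the term-frequency scoring are all assumptions chosen for illustration, and a real engine uses compressed on-disk indexes and a proper ranking function.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a toy positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def boolean_and(index, terms):
    """Documents containing every term (a simple, non-nested Boolean AND)."""
    sets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*sets) if sets else set()

def phrase(index, terms):
    """Documents in which the terms occur consecutively and in order."""
    hits = set()
    for doc_id in boolean_and(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            hits.add(doc_id)
    return hits

def ranked(index, terms):
    """Documents ordered by total term frequency, highest first.

    Raw term frequency is a crude stand-in for a real ranking function.
    """
    scores = defaultdict(int)
    for t in terms:
        for doc_id, positions in index.get(t, {}).items():
            scores[doc_id] += len(positions)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical three-document collection.
docs = {
    1: "fast text search engine",
    2: "search engine for text",
    3: "fast engine",
}
index = build_index(docs)
```

With this sample collection, a Boolean query for "fast" and "engine" matches documents 1 and 3, while the phrase query "text search" matches only document 1, because document 2 contains both words but not side by side.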

Features of Zettair include:

As of 0.6.1, the Zettair search engine has been tested on the following platforms:

We believe that it should operate under most POSIX-like environments.

Zettair was written in C and is licensed under a BSD-style license.

Latest News

29th September, 2006

The Windows executables for Zettair 0.9.3 have been updated so that they identify themselves with the correct version number.

8th September, 2006

Zettair 0.9.3 is now available. This is a bug-fix release that addresses the following issues:

Due to the nature of the docmap bug, we recommend that everyone upgrade to 0.9.3 from any previous 0.9 release.

10th July, 2006

Zettair 0.9.2 is now available, which fixes a number of minor bugs found since the 0.9.1 release.

29th June, 2006

Zettair 0.9.1 is now available, which fixes a number of minor bugs found since the 0.9 release. In addition to source downloads, we are now offering pre-compiled Windows binaries for download.

6th June, 2006

Zettair 0.9 is now available. There has been a huge amount of change from 0.6.1, including:

old news

Downloading

The latest version of Zettair is 0.9.3 and is available for download here.

Documentation

Visit the Documentation page. Note: The documentation is also included in the download, in the directory zettair/doc.

Getting Started

Read our getting started with HTML documents guide. Getting Zettair up and running takes seven simple steps!

It's also easy to get Zettair going for your TREC experiments. Here's a simple getting started with TREC guide.

FAQ

Q: How do I index formats other than HTML and TREC?
A: The HTML indexer will work over text data as well (things that look like markup may be ignored). Simply index your text data as HTML and search as normal. You can use common filters such as Antiword and ps2ascii to convert Microsoft Word documents, PostScript and PDF files into text data suitable for searching with Zettair.
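
One way to script that conversion step is sketched below in Python. The extension-to-filter mapping and the function names are assumptions for illustration; the sketch relies only on antiword and ps2ascii writing the converted text to standard output when given an input file, and it requires both tools to be installed and on the PATH.

```python
import subprocess
from pathlib import Path

# Assumed mapping from file extension to a filter that writes text to stdout.
FILTERS = {
    ".doc": ["antiword"],
    ".ps": ["ps2ascii"],
    ".pdf": ["ps2ascii"],
}

def filter_command(path):
    """Return the command that converts `path` to text, or None if no filter applies."""
    tool = FILTERS.get(Path(path).suffix.lower())
    return tool + [str(path)] if tool else None

def to_text(path):
    """Return the text of `path`, running an external filter when one is configured."""
    command = filter_command(path)
    if command is None:
        # No filter configured: treat the file as plain text already.
        return Path(path).read_text(errors="replace")
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

The text returned by `to_text` can be written to a file and then indexed as HTML, as described above.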

Q: How much data is Zettair capable of handling?
A: We're not really sure. We've indexed over 100GB of data without problems. As of 0.6.1, we've indexed the 426GB TREC terabyte track collection.

Q: Are indexes portable between machines?
A: Indexes are currently portable between machines of the same architecture, but not between different architectures, because we don't yet store floating-point document weights portably. You may also experience problems if your source documents are stored in different locations on different machines.
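
As a general illustration of why raw floating-point values can fail to travel between architectures, the Python sketch below shows the effect of byte order on a single-precision number. Byte order is only one common cause and is an assumption here; the answer above does not say which aspect of the stored weights is non-portable, and the weight value is hypothetical.

```python
import struct

weight = 1.5  # hypothetical document weight, exactly representable in 32 bits

little = struct.pack("<f", weight)  # byte layout on a little-endian machine
big = struct.pack(">f", weight)     # byte layout on a big-endian machine

# Reading little-endian bytes with the wrong byte order yields a different number.
misread = struct.unpack(">f", little)[0]

# Fixing one byte order for the file format lets the value round-trip anywhere.
portable = struct.unpack("<f", little)[0]
```

The two layouts contain the same four bytes in opposite order, so a reader must know which order the writer used.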

Q: I want Zettair to ... and it currently doesn't. Help?
A: You can email us at zettair@cs.rmit.edu.au. We'll try our best to help, or provide some advice.

Q: I found a bug in Zettair. Where do I report it?
A: Please email us at zettair@cs.rmit.edu.au and we'll deal with it.

Credits

Zettair contains Search Engine Group research contributions primarily from Justin Zobel, Hugh Williams, Falk Scholer, John Yiannis, Steffen Heinz, Nicholas Lester, William Webber, Alistair Moffat and Anh Vo.
Zettair contains source code contributions from Nicholas Lester, Hugh Williams, Justin Zobel, Falk Scholer, Dirk Bahle, John Yiannis, Bodo von Billerbeck, Steven Garcia and William Webber.



Last modified 8 September 2006