
The Zettair Search Engine

Zettair is a compact and fast text search engine designed and written by the Search Engine Group at RMIT University. It was formerly known as Lucy.

Overview

Zettair allows you to index and search HTML (or TREC) collections. It has been designed for simplicity as well as speed and flexibility, and its primary feature is the ability to handle large amounts of text. It has a single executable, which performs both indexing and searching: when an index doesn't exist, Zettair will create one for you based on the parameters you provide, and when you do have an index, Zettair will use that index to search the indexed data. It has a simple command-line interface, and supports ranked, simple (non-nested) Boolean, and phrase queries.

Features of Zettair include:

As of 0.6.1, the Zettair search engine has been tested on the following platforms:

We believe that it should operate under most POSIX-like environments.

Zettair was written in C and is licensed under a BSD-style license.

Latest News

3rd March, 2009

The revitalised development of Zettair is moving along nicely. To keep you all updated with the latest changes, we're now providing in-development releases, which are available here. These releases are not stable, but contain the latest bug fixes and patches. We'll attempt to release an in-development version monthly until a new stable release is ready.

10th February, 2009

Zettair is back in development. After an extended break, we've decided to update Zettair. Initially, releases will focus on fixing the reported bugs that have accumulated over the past two years. Beyond that, we hope to extend Zettair with exciting new ideas along with some of the most popular feature requests. Stay tuned...

Older News

Older news postings can be found in the news archive.

Downloading

The latest version of Zettair is 0.9.3 and is available for download here.

Documentation

Visit the Documentation page. Note: The documentation is also included in the download, in the directory zettair/doc.

Getting Started

Read our getting started with HTML documents guide. Getting Zettair up and running takes seven simple steps!

It's also easy to get Zettair going for your TREC experiments. Here's a simple getting started with TREC guide.

FAQ

Q: How do I index formats other than HTML and TREC?
A: The HTML indexer will work over text data as well (things that look like markup may be ignored). Simply index your text data as HTML and search as normal. You can use common filters such as Antiword and ps2ascii to convert Microsoft Word documents, PostScript and PDF files into text data suitable for searching with Zettair.
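As a rough illustration of the conversion step described above, the following Python sketch picks a filter by file extension and writes the resulting text to a separate directory, ready to be indexed as HTML. The extension table, the function names and the output layout are assumptions made for this example and are not part of Zettair; the sketch also assumes antiword and ps2ascii are installed and print the converted text to standard output when given a single input file.

```python
import subprocess
from pathlib import Path

# Assumed mapping from file extension to filter program (illustrative only).
CONVERTERS = {
    ".doc": ["antiword"],
    ".ps": ["ps2ascii"],
    ".pdf": ["ps2ascii"],
}


def converter_command(path):
    """Return the filter command for `path`, or None if no filter is needed."""
    tool = CONVERTERS.get(Path(path).suffix.lower())
    if tool is None:
        return None
    return tool + [str(path)]


def convert_to_text(path, out_dir):
    """Run the filter for `path` and save its standard output as a .txt file.

    Returns the path of the text file, or None if `path` needs no conversion.
    """
    command = converter_command(path)
    if command is None:
        return None
    result = subprocess.run(command, capture_output=True, check=True)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (Path(path).name + ".txt")
    target.write_bytes(result.stdout)
    return target
```

Files that return None (for example, existing HTML or plain text) can be indexed directly without conversion.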

Q: How much data is Zettair capable of handling?
A: We're not really sure. We've indexed over 100GB of data without problems. As of 0.6.1, we've indexed the 426GB TREC terabyte track collection.

Q: Are indexes portable between machines?
A: Indexes are currently portable between machines of the same architecture, but not between different architectures, because we currently don't store floating-point document weights portably. You may also experience problems if your source documents are stored in different locations on different machines.
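To see why a floating-point value written on one architecture may not read back correctly on another, consider byte order, which is one common source of such problems. The Python sketch below is a general illustration only: it does not reproduce Zettair's index format, and the weight value and the helper name are made up for the example.

```python
import struct

weight = 0.5  # hypothetical document weight

# The same 32-bit float laid out in little-endian and big-endian byte order.
little = struct.pack("<f", weight)
big = struct.pack(">f", weight)


def read_weight(raw, byte_order):
    """Decode four raw bytes as a 32-bit float using the given byte order."""
    return struct.unpack(byte_order + "f", raw)[0]


# Reading with the byte order used for writing recovers the value;
# reading with the other byte order yields a different number.
same_order = read_weight(little, "<")
wrong_order = read_weight(little, ">")
```

A portable format would fix one byte order (and one float representation) for the file, regardless of the machine that writes it.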

Q: I want Zettair to ... and it currently doesn't. Help?
A: You can email us at zettair@cs.rmit.edu.au. We'll try our best to help, or provide some advice.

Q: I found a bug in Zettair. Where do I report it?
A: Please email us at zettair@cs.rmit.edu.au and we'll deal with it.

Credits

Zettair contains Search Engine Group research contributions primarily from Justin Zobel, Hugh Williams, Falk Scholer, John Yiannis, Steffen Heinz, Nicholas Lester, William Webber, Alistair Moffat and Anh Vo.
Zettair contains source code contributions from Nicholas Lester, Hugh Williams, Justin Zobel, Falk Scholer, Dirk Bahle, John Yiannis, Bodo von Billerbeck, Steven Garcia and William Webber.



Last modified 3 March 2009