The Zettair Search Engine
Zettair is a compact and fast text search engine designed and written by the Search Engine
Group at RMIT University. It was formerly known as Lucy.
Overview
Zettair allows you to index and search HTML (or TREC)
collections.
It has been designed for simplicity as well as speed and flexibility, and its primary feature
is the ability to handle large amounts of text.
It has a single executable, which performs both indexing and searching: when an index doesn't
exist, Zettair will create one for you based on the parameters you provide, and
when you do have an index, Zettair will use that index to search the indexed
data.
It has a simple command-line interface, and supports ranked,
simple (non-nested) Boolean, and phrase queries.
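As a rough illustration of what a simple Boolean AND query involves, the sketch below intersects two sorted lists of document numbers, one list per query term. This is a generic textbook technique and not Zettair's actual code; the function name postings_and and the plain-array lists are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Intersect two ascending lists of document numbers (hypothetical posting
 * lists for two query terms).  Common entries are written to out, which
 * must have room for at least min(alen, blen) entries.  Returns the number
 * of entries written. */
size_t postings_and(const unsigned long *a, size_t alen,
                    const unsigned long *b, size_t blen,
                    unsigned long *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < alen && j < blen) {
        if (a[i] < b[j]) {
            i++;                /* a[i] cannot occur in b, skip it */
        } else if (a[i] > b[j]) {
            j++;                /* b[j] cannot occur in a, skip it */
        } else {
            out[n++] = a[i];    /* document contains both terms */
            i++;
            j++;
        }
    }
    return n;
}
```

Because both lists are sorted, each is scanned only once, so the cost grows with the combined length of the lists rather than their product.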
Features of Zettair include:
- Speed and scalability
- Boolean, ranked and phrase querying
- Modular C API for inclusion in other projects
- Native support for TREC experiments
As of 0.6.1, the Zettair search engine has been tested on the following
platforms:
- Linux
- FreeBSD
- Mac OS X (Darwin)
- Solaris
- Win32 (Windows 95, 98, NT, etc.)
- Cygwin
We believe that it should operate under most POSIX-like environments.
Zettair was written in C and is licensed under a BSD-style license.
Latest News
29th September, 2006
The Windows executables for Zettair 0.9.3 have been updated
so that they identify themselves with the correct version number.
8th September, 2006
Zettair 0.9.3 is now available.
This is a bug-fix release that addresses the following issues:
- segfault if --stem=none is specified for indexing
- malformed output or assertion failure from zet_trec if docno-less documents are retrieved
- fix for the expanding docmap issue, where the docmap file would grow during each successive invocation of zet or zet_trec, until the index became corrupted
- summarisation bug that caused lower-quality snippets to be generated
- improved handling of 'Topic:' and friends in TREC topic files, as accepted by zet_trec
Due to the nature of the docmap bug, we recommend that everyone
upgrade to 0.9.3 from any previous 0.9 release.
10th July, 2006
Zettair 0.9.2 is now available,
which fixes a number of minor bugs found since the 0.9.1 release.
29th June, 2006
Zettair 0.9.1 is now available,
which simply fixes a number of minor bugs found since the 0.9 release.
In addition to source downloads, we are now offering pre-compiled Windows
binaries for download.
6th June, 2006
Zettair 0.9 is now available.
There has been a huge amount of change from 0.6.1, including:
- stemming
- transparent indexing of gzipped files (though summarisation will be slow for these files)
- an overhauled metric system, including a language for composing new metrics
- the addition of the Dirichlet-smoothed language modelling metric, now as default
- optional impact-ordered operation
- greatly decreased memory consumption
- increased querying and indexing speed
- integrated effectiveness evaluation
old news
Downloading
The latest version of Zettair is 0.9.3 and is available for download
here.
Documentation
Visit the Documentation page.
Note: The documentation is also included in the download, in the
directory zettair/doc.
Getting Started
Read our getting started with HTML documents guide. Getting Zettair up and
running takes just seven simple steps!
It's also easy to get Zettair going for your TREC experiments. Here's
a simple getting started with TREC guide.
FAQ
Q: How do I index formats other than HTML and TREC?
A: The HTML indexer will work over text data as well (things that look like
markup may be ignored). Simply index your text data as HTML and search as
normal. You can use common filters such as Antiword
and ps2ascii to convert Microsoft Word documents,
PostScript and PDF files into text data suitable for searching with Zettair.
Q: How much data is Zettair capable of handling?
A: We're not really sure. We've indexed over 100GB of data without
problems. As of 0.6.1, we've indexed the 426GB
TREC terabyte track collection.
Q: Are indexes portable between machines?
A: Indexes are currently portable between machines of the same
architecture, but not between different architectures, because we
don't yet store floating point document weights portably. You may also
experience problems if your source documents are stored in different
locations on different machines.
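To show what storing a floating point weight portably can look like, the sketch below writes a float as four bytes in a fixed order, so that the stored form does not depend on the host's byte order. It is not taken from Zettair, and it does not describe how Zettair stores weights; the names weight_store and weight_load are hypothetical, and the code assumes the host uses 32-bit IEEE 754 floats with the same byte order as its integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store w in buf as 4 bytes, most significant byte first.
 * Assumption: float is a 32-bit IEEE 754 value on this host. */
void weight_store(float w, unsigned char buf[4])
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &w, sizeof(bits));    /* reinterpret without aliasing issues */
    buf[0] = (unsigned char) (bits >> 24);
    buf[1] = (unsigned char) (bits >> 16);
    buf[2] = (unsigned char) (bits >> 8);
    buf[3] = (unsigned char) bits;
}

/* Rebuild a float from 4 bytes written by weight_store(). */
float weight_load(const unsigned char buf[4])
{
    uint32_t bits = ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16)
      | ((uint32_t) buf[2] << 8) | (uint32_t) buf[3];
    float w;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&w, &bits, sizeof(w));
    return w;
}
```

Writing the raw in-memory bytes of a float directly to disk is faster and simpler, but the result can only be read back reliably on a machine with the same representation, which is the kind of limitation described in the answer above.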
Q: I want Zettair to ... and it currently doesn't. Help?
A: You can email us at zettair@cs.rmit.edu.au. We'll try our best to help,
or provide some advice.
Q: I found a bug in Zettair. Where do I report it?
A: Please mail us at zettair@cs.rmit.edu.au and we'll deal with it.
Credits
Zettair contains Search Engine Group research contributions primarily from Justin Zobel, Hugh Williams, Falk Scholer, John Yiannis, Steffen Heinz, Nicholas Lester, William Webber, Alistair Moffat and Anh Vo.
Zettair contains source code contributions from Nicholas Lester, Hugh Williams, Justin Zobel, Falk Scholer, Dirk Bahle, John Yiannis, Bodo von Billerbeck, Steven Garcia and William Webber.