Methods for querying large collections of documents continue
to develop. Despite the success of systems such as Google,
some queries are not satisfactorily resolved by current information
retrieval techniques, which are largely based on word matching.
Alternatives include approaches such as query expansion, where
thesaural terms can be learnt from collections and used to
broaden search, and use of information that can be derived from
sources such as query logs.
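As a rough illustration of query expansion, the sketch below learns related terms from document-level co-occurrence counts and appends the strongest one to each query term. It is a minimal sketch only: the function names `build_cooccurrence` and `expand_query`, the toy documents, and the use of raw co-occurrence counts are all assumptions made for the example, not a description of any particular system.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often each pair of distinct terms appears in the same document."""
    cooc = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.lower().split())
        for a, b in combinations(sorted(terms), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_query(query, cooc, per_term=1):
    """Append the most frequently co-occurring terms for each query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        related_terms = cooc.get(term, Counter())
        for related, _count in related_terms.most_common(per_term):
            if related not in expanded:
                expanded.append(related)
    return expanded


# Hypothetical toy collection, for illustration only.
docs = ["car engine repair", "car engine oil", "truck engine repair"]
print(expand_query("car", build_cooccurrence(docs)))  # ['car', 'engine']
```

A practical system would normally weight the added terms lower than the original query terms and filter out very common words.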
Structures for Efficient Retrieval
Indexing is essential to support efficient retrieval of information from
large databases. Specialized index structures are required for
different database applications. Spatial and multimedia databases
require index structures that support queries involving high-dimensional
data, whereas text databases require index structures that support
queries involving the text structure as well as ranking of answers.
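For text databases, the usual index structure is an inverted index, which maps each term to the documents that contain it. The sketch below builds one and uses it to rank answers. The names `build_index` and `ranked_query` are hypothetical, and the TF-IDF style weighting is a simple choice made for illustration rather than a recommended ranking function.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index


def ranked_query(query, index, num_docs, top_k=3):
    """Score documents by summed term weights and return the top_k (doc_id, score) pairs."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue
        # Rarer terms (shorter postings lists) receive a higher weight.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy collection, for illustration only.
docs = ["compression of inverted indexes", "inverted indexes for text", "image retrieval"]
index = build_index(docs)
print(ranked_query("text indexes", index, len(docs)))
```

Only documents that share at least one term with the query are touched, which is what makes the structure efficient on large collections.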
Image and Video Retrieval
Image database management systems
should support efficient storage, retrieval and management of
large collections of images. Applications include managing images
for medical diagnosis, satellite data, and criminal records.
Some applications such as surveillance cameras need to handle
video as well as image data. Research issues being investigated
include software architectures, data models, query languages,
image matching algorithms, and indexing to support retrieval
based on image content as well as human-assigned attributes.
Genomic Data
International projects such as the Human Genome Initiative are
generating vast quantities of data, in particular strings representing
DNA. If good use is to be made of this data it must be possible
to search it both quickly and intelligently, using appropriate
fuzzy matching techniques. Topics being investigated include
indexing, compression, and advanced query evaluation techniques.
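One simple form of fuzzy matching on DNA strings is approximate substring search under edit distance. The sketch below uses the standard dynamic-programming formulation in which a match may start anywhere in the text. The function name `best_fuzzy_match` and the example sequences are assumptions for illustration. Real genomic search tools typically use biologically motivated scoring and an index to avoid scanning every sequence.

```python
def best_fuzzy_match(pattern, text):
    """Return (edits, end) for the best approximate occurrence of pattern in text.

    edits is the minimum number of insertions, deletions, and substitutions
    needed to turn pattern into some substring of text; end is the number of
    text characters consumed where the first such best match ends.
    """
    # prev[i] is the cost of matching pattern[:i] against a substring
    # ending at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = (prev[-1], 0)
    for j, c in enumerate(text, start=1):
        cur = [0]  # a match may begin at any text position at no cost
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (p != c),  # match or substitution
                prev[i] + 1,             # extra character in the text
                cur[i - 1] + 1,          # pattern character left unmatched
            ))
        if cur[-1] < best[0]:
            best = (cur[-1], j)
        prev = cur
    return best


print(best_fuzzy_match("ACGT", "TTACGTTT"))  # (0, 6): exact occurrence ending at position 6
print(best_fuzzy_match("ACGT", "TTAGGTTT")[0])  # 1: one substitution is needed
```

The running time is proportional to the pattern length times the text length, which is why indexing is needed for large databases.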
Text Query Evaluation
Queries on document collections are in principle extremely costly:
in a large collection, millions of documents have some level
of match to a typical query. Heuristics can drastically reduce
the costs. One approach is to exploit properties of text
matching functions, and provide accelerated searching for some
classes of queries. Another approach is to use phrases to
efficiently narrow search to smaller sets of documents.
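To illustrate how phrases can narrow a search, the sketch below stores word positions in the index and keeps only the documents in which the query terms occur consecutively. The names `build_positional_index` and `phrase_query` and the toy documents are hypothetical, and this is a minimal sketch rather than an efficient implementation.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> doc_id -> list of word positions at which the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_query(phrase, index):
    """Return the ids of documents that contain the phrase as consecutive words."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Step 1: only documents containing every term can contain the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    # Step 2: check that the terms appear at consecutive positions.
    hits = []
    for doc_id in sorted(candidates):
        starts = set(index[terms[0]][doc_id])
        for offset, t in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in index[t][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


# Hypothetical toy collection, for illustration only.
docs = [
    "fast query evaluation",
    "query the fast evaluation engine",
    "evaluation of fast query plans",
]
index = build_positional_index(docs)
print(phrase_query("fast query", index))  # [0, 2]
```

All three documents contain both words, but the positional check discards the one in which they are not adjacent.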
Music and Audio
There are many properties of music that can potentially be used to match
stored performances to queries, such as musical theme or texture.
These properties can in principle be used as the basis of a music
retrieval system. Another application is recommender systems,
where items of music can be linked by style.
Selected Source Code
- The zettair search engine download page
- 2002 integer compression code (zip file, 24k) from F. Scholer, H.E. Williams, J. Yiannis, and J. Zobel, "Compression of Inverted Indexes For Fast Query Evaluation", In K. Jarvelin, M. Beaulieu, R. Baeza-Yates, and S. H. Myaeng, Proc. ACM-SIGIR International Conference on Research and Development in Information Retrieval, Tampere, Finland, 222-229, 2002.
- Fast hashing code (web page with information and downloads) from J. Zobel, S. Heinz, and H.E. Williams, "In-memory Hash Tables for Accumulating Text Vocabularies", Information Processing Letters, 80(6), 271-277, 2001.
- 1999 integer compression code (tar gzip file, 12k) from H.E. Williams and J. Zobel, "Compressing Integers for Fast File Access", Computer Journal, 42(3), 193-201, 1999.
- Fast Nucleotide Compression code (tar gzip file, 21k) from H.E. Williams and J. Zobel, "Compression of nucleotide databases for fast searching", Computer Applications in the Biosciences, 13(5), 549-554, 1997.
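
The integer compression entries above concern compact storage of the integers held in index and data files. As a generic illustration only, and not the code in any of the downloads listed, the sketch below shows variable-byte coding applied to the gaps between sorted document ids. The function names and the convention that the high bit marks the final byte of each number are assumptions made for this example.

```python
from itertools import accumulate


def to_gaps(sorted_ids):
    """Replace each id after the first with its difference from the previous id."""
    return [b - a for a, b in zip([0] + sorted_ids[:-1], sorted_ids)]


def from_gaps(gaps):
    """Invert to_gaps by taking running sums."""
    return list(accumulate(gaps))


def vbyte_encode(numbers):
    """Encode non-negative integers using 7 data bits per byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers can be encoded")
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits; high bit clear means more bytes follow
            n >>= 7
        out.append(n | 0x80)      # high bit set marks the final byte of this number
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


doc_ids = [3, 7, 8, 20, 400]
encoded = vbyte_encode(to_gaps(doc_ids))
print(len(encoded), from_gaps(vbyte_decode(encoded)) == doc_ids)
```

Small gaps fit in a single byte, so frequently occurring terms, whose document ids are close together, compress well under this scheme.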