
Information Retrieval

Methods for querying large collections of documents continue to develop. Despite the success of systems such as Google, some queries are not satisfactorily resolved by current information retrieval techniques, which are largely based on word matching. Alternatives include approaches such as query expansion, where thesaural terms can be learnt from collections and used to broaden search, and the use of information derived from sources such as query logs.
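
As a rough illustration of learning expansion terms from a collection, the sketch below scores candidate terms by how many documents they share with the query terms and appends the top k. It is a minimal co-occurrence heuristic chosen for illustration, not the group's method; the function names, the whitespace tokenisation, and the parameter k are assumptions, and practical systems typically normalise for term frequency and remove stopwords.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count how many documents contain each unordered pair of distinct terms."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts

def expand_query(query, docs, k=2):
    """Append the k terms that co-occur most often with the query terms.
    Hypothetical helper: raw co-occurrence counts, no stopword removal."""
    counts = cooccurrence_counts(docs)
    q_terms = query.lower().split()
    scores = Counter()
    for (a, b), n in counts.items():
        if a in q_terms and b not in q_terms:
            scores[b] += n
        elif b in q_terms and a not in q_terms:
            scores[a] += n
    return q_terms + [term for term, _ in scores.most_common(k)]

docs = [
    "car automobile dealer",
    "car automobile repair",
    "car automobile engine",
    "car engine",
    "banana smoothie",
]
print(expand_query("car", docs))  # ['car', 'automobile', 'engine']
```

Because raw counts favour frequent terms, a real thesaurus-learning approach would usually weight candidates by some measure of association rather than plain counts.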

Indexing Structures for Efficient Retrieval

Indexing is essential to support efficient retrieval of information from large databases. Specialized index structures are required for different database applications. Spatial and multimedia databases require index structures that support queries involving high-dimensional data, whereas text databases require index structures that support queries involving the text structure as well as ranking of answers.
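
For text databases, the standard structure is an inverted index mapping each term to the documents that contain it. The sketch below is a minimal in-memory version, assuming whitespace tokenisation and integer document ids; it stores (document id, term frequency) postings, which a ranking function can use, and answers a conjunctive Boolean query by intersecting posting lists. The function names are illustrative, not taken from any released code.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}

def boolean_and(index, query):
    """Return the sorted ids of documents containing every query term."""
    doc_sets = [{doc_id for doc_id, _ in index.get(term, [])}
                for term in query.lower().split()]
    if not doc_sets:
        return []
    return sorted(set.intersection(*doc_sets))

docs = ["the cat sat", "the dog sat", "a cat and a dog"]
index = build_index(docs)
print(boolean_and(index, "cat sat"))  # [0]
print(boolean_and(index, "cat dog"))  # [2]
```

Production indexes keep postings compressed on disk and sorted by document id so that intersection can proceed by merging rather than by building sets.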

Image and Video Retrieval

Image database management systems should support efficient storage, retrieval and management of large collections of images. Applications include managing images for medical diagnosis, satellite data, and criminal records. Some applications, such as surveillance, need to handle video as well as image data. Research issues being investigated include software architectures, data models, query languages, image matching algorithms, and indexing to support retrieval based on image content as well as on human-assigned attributes.
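
One simple and well-known way to retrieve by image content is to compare colour histograms. The sketch below quantises RGB pixels into coarse bins and ranks stored images by histogram intersection against a query image. It is offered as a generic example of content-based matching, not as the group's algorithm; representing an image as a list of (r, g, b) tuples, the bin count, and all function names are assumptions.

```python
from collections import Counter

def colour_histogram(pixels, bins_per_channel=4):
    """Quantise (r, g, b) pixels (0-255 per channel) into coarse bins and
    return a normalised histogram {bin: fraction of pixels}."""
    if not pixels:
        return {}
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum over bins of the smaller of the two fractions."""
    return sum(min(f, h2.get(key, 0.0)) for key, f in h1.items())

def rank_images(query_pixels, collection):
    """Return image names from collection {name: pixels}, most similar first."""
    q = colour_histogram(query_pixels)
    scores = {name: histogram_intersection(q, colour_histogram(px))
              for name, px in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)

red = [(255, 0, 0)] * 10
reddish = [(250, 10, 10)] * 8 + [(0, 0, 255)] * 2
blue = [(0, 0, 255)] * 10
print(rank_images(red, {"blue": blue, "reddish": reddish}))  # ['reddish', 'blue']
```

Global colour histograms ignore spatial layout and shape, which is one reason image retrieval research also looks at richer features and at indexing them efficiently.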

Genomic Databases

International projects such as the Human Genome Initiative are generating vast quantities of data, in particular strings representing DNA. If good use is to be made of this data, it must be possible to search it both quickly and intelligently, using appropriate fuzzy matching techniques. Topics being investigated include indexing, compression, and advanced query evaluation techniques.
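
To make "fuzzy matching" concrete, the sketch below uses Sellers' dynamic-programming algorithm, a textbook method that reports every position in a sequence where some substring ending there matches the pattern within k insertions, deletions, or substitutions. It is shown only as a standard example and is not claimed to be the group's technique; the function name is hypothetical. Its cost grows with the product of pattern and text lengths, which is exactly why indexing and compression matter at genomic scale.

```python
def approximate_matches(pattern, text, k):
    """Return end positions j in text where a substring ending at j matches
    pattern with at most k edits (Sellers' algorithm, one DP column at a time)."""
    m = len(pattern)
    # col[i] = fewest edits between pattern[:i] and some substring ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # previous column's value for row i - 1
        col[0] = 0          # an empty pattern matches anywhere at no cost
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(col[i] + 1,        # extra character in the text
                      col[i - 1] + 1,    # pattern character missing from the text
                      prev_diag + cost)  # match or substitution
            prev_diag, col[i] = col[i], new
        if col[m] <= k:
            hits.append(j)
    return hits

print(approximate_matches("ACGT", "TTACGTTT", 0))  # [5]
```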

Text Query Evaluation

Queries on document collections are, in principle, extremely costly: in a large collection, millions of documents have some level of match to a typical query. Heuristics can drastically reduce the costs. One approach is to exploit properties of text matching functions and provide accelerated searching for some classes of queries. Another approach is to use phrases to efficiently narrow search to smaller sets of documents.
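
As one example of such a heuristic, the sketch below evaluates a ranked query term by term, processing the rarest (highest-weight) terms first and capping the number of accumulators: once the cap is reached, documents that do not already have an accumulator are ignored, so only the most promising candidates are scored. The simple tf times idf weight, the cap value, and the function name are assumptions for illustration; this is not presented as the group's specific method.

```python
import math

def ranked_query(index, num_docs, query, max_accumulators=1000, top_k=10):
    """index: term -> list of (doc_id, tf). Returns up to top_k (doc_id, score) pairs."""
    # Rarest terms first: they carry the highest idf and touch the fewest documents.
    terms = sorted({t for t in query.lower().split() if t in index},
                   key=lambda t: (len(index[t]), t))
    accumulators = {}
    for term in terms:
        idf = math.log(1 + num_docs / len(index[term]))
        for doc_id, tf in index[term]:
            if doc_id in accumulators:
                accumulators[doc_id] += tf * idf
            elif len(accumulators) < max_accumulators:
                accumulators[doc_id] = tf * idf
            # Otherwise the document is skipped: no new accumulators once the cap is hit.
    ranked = sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]

index = {
    "rare": [(2, 1)],
    "common": [(0, 1), (1, 1), (2, 1), (3, 1)],
}
print([d for d, _ in ranked_query(index, 4, "common rare", max_accumulators=1)])   # [2]
print([d for d, _ in ranked_query(index, 4, "common rare", max_accumulators=10)])  # [2, 0, 1, 3]
```

With a small cap, the common term only adds to existing accumulators, so far fewer documents are scored while the top answer is unchanged in this example.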

Music and Audio

There are many properties of music that potentially allow matching of stored performances to queries, such as musical theme or texture. These properties can in principle be used as the basis of a music retrieval system. Another application is recommender systems, where items of music can be linked by style.
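
A simple way to match on musical theme is melodic contour, the pattern of up, down, and repeated steps between successive notes (often called Parsons code). Because contour ignores absolute pitch, a query sung in a different key can still match. The sketch below assumes melodies are lists of MIDI pitch numbers and uses exact substring matching of contours; it is a generic illustration, not the group's approach, and the names and the tiny library are hypothetical.

```python
def contour(pitches):
    """Parsons-style contour: U (up), D (down), R (repeat) between successive notes."""
    return "".join("U" if b > a else "D" if b < a else "R"
                   for a, b in zip(pitches, pitches[1:]))

def find_tunes(query_pitches, library):
    """Return sorted names of tunes whose contour contains the query's contour."""
    q = contour(query_pitches)
    return sorted(name for name, pitches in library.items() if q in contour(pitches))

library = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],  # E E F G G F E D
    "scale": [60, 62, 64, 65, 67],
}
print(find_tunes([57, 57, 58, 60], library))  # ['ode_to_joy']
```

Exact contour matching is brittle when a query contains a wrong note, so a practical system would more likely use approximate string matching over contours or richer pitch-interval features.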

Selected Source Code

  • The Zettair search engine download page
  • 2002 integer compression code (zip file, 24k) from F. Scholer, H.E. Williams, J. Yiannis, and J. Zobel, "Compression of Inverted Indexes For Fast Query Evaluation", in K. Järvelin, M. Beaulieu, R. Baeza-Yates and S. H. Myaeng (eds.), Proc. ACM-SIGIR International Conference on Research and Development in Information Retrieval, Tampere, Finland, 222-229, 2002.
  • Fast hashing code (web page with information and downloads) from J. Zobel, S. Heinz, and H.E. Williams, "In-memory Hash Tables for Accumulating Text Vocabularies", Information Processing Letters, 80(6), 271-277, 2001.
  • 1999 integer compression code (tar gzip file, 12k) from H.E. Williams and J. Zobel, "Compressing Integers for Fast File Access", Computer Journal, 42(3), 193-201, 1999.
  • Fast Nucleotide Compression code (tar gzip file, 21k) from H.E. Williams and J. Zobel, "Compression of nucleotide databases for fast searching", Computer Applications in the Biosciences, 13(5), 549-554, 1997.
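
The integer compression downloads above concern compact, fast-to-decode representations of integers such as those in inverted lists. As a general illustration only, the sketch below shows variable-byte coding of d-gaps (differences between successive sorted document ids), a widely used byte-aligned scheme. It is not taken from the downloadable code, whose byte layout and conventions may differ; here the high bit marks the final byte of each number, which is one common convention, and the function names are hypothetical.

```python
from itertools import accumulate

def vbyte_encode(numbers):
    """Encode non-negative integers with 7 data bits per byte, low-order group first.
    The high bit is set only on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 0x80:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)

def vbyte_decode(data):
    """Invert vbyte_encode, returning the list of integers."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:  # terminator bit: this number is complete
            numbers.append(value)
            value, shift = 0, 0
        else:
            shift += 7
    return numbers

def to_gaps(doc_ids):
    """Sorted document ids -> differences between successive ids (d-gaps)."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]

postings = [3, 7, 8, 300]
encoded = vbyte_encode(to_gaps(postings))        # gaps [3, 4, 1, 292] fit in 5 bytes
decoded = list(accumulate(vbyte_decode(encoded)))
print(decoded)  # [3, 7, 8, 300]
```

Storing gaps rather than raw ids keeps most values small, so most fit in a single byte, and byte alignment keeps decoding cheap compared with bit-level codes.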