Parallel Multimedia Index Server

Compression-Domain Text Indexing and Retrieval

Faculty: Tzi-cker Chiueh

Group Members:

Project Description

A major problem with existing text retrieval systems is the size of their index files. We have developed a scheme that integrates text compression and text indexing in a unified framework. In particular, the scheme allows compressed text to be searched directly, without decompressing it first. The framework also allows a flexible tradeoff between index storage requirements and query processing speed, so that a text retrieval system can dynamically reconfigure itself to optimize access performance based on observed access patterns. The working prototype of this integrated text compression and retrieval system, Codir, supports exact as well as approximate search. Currently, we are working to extend Codir in the following directions:

In the very near future, we plan to branch out in the following directions:
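
To make the idea of searching compressed text directly more concrete, the following is a minimal sketch in Python. It is not Codir's actual algorithm, which this page does not describe. It assumes a toy word-based scheme in which every distinct word is replaced by an integer code, so that a query phrase can be encoded with the same codebook and matched against the code stream without ever decoding it. The names tokenize, build_codebook, compress, and search_compressed are hypothetical and chosen only for illustration.

```python
import re


def tokenize(text):
    # Lowercased alphanumeric words; punctuation is dropped in this toy model.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_codebook(tokens):
    # Assumption: each distinct word gets an integer code in order of first
    # appearance. A real compressor would assign shorter codes to frequent words.
    codebook = {}
    for tok in tokens:
        codebook.setdefault(tok, len(codebook))
    return codebook


def compress(text, codebook):
    # The "compressed" text is the sequence of word codes.
    return [codebook[tok] for tok in tokenize(text)]


def search_compressed(codes, phrase, codebook):
    """Return token positions where the phrase occurs, without decoding codes."""
    query = []
    for tok in tokenize(phrase):
        if tok not in codebook:
            return []  # a word absent from the codebook cannot occur in the text
        query.append(codebook[tok])
    if not query:
        return []
    n = len(query)
    # Compare code sequences directly; the original words are never rebuilt.
    return [i for i in range(len(codes) - n + 1) if codes[i:i + n] == query]


text = "the quick brown fox jumps over the lazy dog and the quick cat"
codebook = build_codebook(tokenize(text))
codes = compress(text, codebook)
hits = search_compressed(codes, "the quick", codebook)
```

In this sketch the query is translated into the compressed domain once, and the scan then touches only codes. The same codebook could also serve as the vocabulary of an index, which is one way to picture compression and indexing sharing a single structure.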

Publications

Software Download [tgz]

Acknowledgement

This research is supported by NSF Career Award MIP-9502067, NSF grants MIP-9710622 and IRI-9711635, contract 95F138600000 from the Community Management Staff's Massive Digital Data System Program, and funding from Sandia National Laboratory, Reuters Information Technology Inc., and Computer Associates/Cheyenne Inc.
This page was last updated on 1-2-2002.