Parallel Multimedia Index Server

Compression-Domain Text Indexing and Retrieval With Real-Time Updates

Faculty: Tzi-cker Chiueh

Group Members:

Project Description

One of the major problems with existing text retrieval systems is the size of the index files. We have developed an innovative scheme that integrates text compression and text indexing in a unified framework. In particular, the proposed scheme allows direct searching through compressed text without decompressing the text first. Moreover, the framework allows a flexible tradeoff between index storage requirements and query processing speed, so that a text retrieval system can dynamically reconfigure itself to optimize access performance based on observed access patterns. The working prototype of this integrated text compression and retrieval system, Codir-1, supports both exact and approximate search.
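To make the idea of searching compressed text without decompressing it concrete, here is a minimal sketch in Python. It is not Codir-1's actual scheme: it assumes a simple word-level dictionary code, and the names build_codebook, compress, and search_compressed are hypothetical. The query is translated into the same codes as the text, and matching is done directly on the code sequence.

```python
from typing import Dict, List


def build_codebook(words: List[str]) -> Dict[str, int]:
    """Assign each distinct word an integer code in first-seen order."""
    codebook: Dict[str, int] = {}
    for word in words:
        if word not in codebook:
            codebook[word] = len(codebook)
    return codebook


def compress(words: List[str], codebook: Dict[str, int]) -> List[int]:
    """Replace every word by its code (illustrative word-level coding)."""
    return [codebook[word] for word in words]


def search_compressed(codes: List[int], codebook: Dict[str, int],
                      phrase: str) -> List[int]:
    """Return word positions where the phrase occurs.

    The phrase is encoded once; the compressed text is never decoded.
    """
    try:
        pattern = [codebook[word] for word in phrase.split()]
    except KeyError:
        # A query word that has no code cannot occur in the text.
        return []
    n = len(pattern)
    if n == 0:
        return []
    return [i for i in range(len(codes) - n + 1) if codes[i:i + n] == pattern]


# Hypothetical sample text, used only for illustration.
text = "the index is small and the index is fast".split()
codebook = build_codebook(text)
codes = compress(text, codebook)
hits = search_compressed(codes, codebook, "the index")
print(hits)  # word offsets of each exact match
```

A real system would use variable-length codes and an index over the compressed stream rather than a linear scan; the sketch only shows that exact matching can operate on codes alone.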

Frequent updates to a text retrieval system are usually handled by scheduling system downtime, which is undesirable in most cases. Supporting updates and queries simultaneously without compromising performance is a research challenge. Building on the Codir-1 prototype, we extended Codir-1 to Codir-2, i.e., compression-domain text indexing and retrieval with real-time updates. Our work has been done in the following directions:
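As a generic illustration of the problem of serving queries while updates arrive, and not a description of Codir-2's design, the sketch below keeps a large main index plus a small delta index. Updates touch only the delta, queries consult both, and a periodic merge folds the delta into the main index. The class LiveIndex and its methods are hypothetical names chosen for this example.

```python
import threading
from collections import defaultdict
from typing import Dict, Set


class LiveIndex:
    """Toy inverted index that accepts updates without downtime."""

    def __init__(self) -> None:
        self._main: Dict[str, Set[int]] = {}   # large, rebuilt only on merge
        self._delta = defaultdict(set)          # small, absorbs new documents
        self._lock = threading.Lock()

    def add_document(self, doc_id: int, text: str) -> None:
        """Index a new document; only the delta index is modified."""
        with self._lock:
            for word in text.lower().split():
                self._delta[word].add(doc_id)

    def query(self, word: str) -> Set[int]:
        """Return ids of documents containing the word, from both indexes."""
        word = word.lower()
        with self._lock:
            return set(self._main.get(word, ())) | set(self._delta.get(word, ()))

    def merge(self) -> None:
        """Fold the delta into a fresh main index and swap it in."""
        with self._lock:
            merged = {w: set(ids) for w, ids in self._main.items()}
            for w, ids in self._delta.items():
                merged.setdefault(w, set()).update(ids)
            self._main = merged
            self._delta = defaultdict(set)


index = LiveIndex()
index.add_document(1, "compressed text search")
index.add_document(2, "text index update")
before_merge = index.query("text")
index.merge()
after_merge = index.query("text")
```

Queries return the same results before and after the merge, so the system never has to be taken offline to absorb new documents. A production design would build the merged index outside the lock and swap it in atomically to keep query latency low.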

Publications

Software Download [tgz]

Acknowledgement

This research is supported by an NSF Career Award MIP-9502067, NSF MIP-9710622, NSF IRI-9711635, a contract 95F138600000 from Community Management Staff's Massive Digital Data System Program, as well as funding from Sandia National Laboratory, Reuters Information Technology Inc., and Computer Associates/Cheyenne Inc.
This page was last updated on 1-2-2002.