Parallel Multimedia Index Server
Compression-Domain Text Indexing and Retrieval With Real-Time Updates
Group Members:
Project Description
One of the major problems with existing text retrieval systems is
the size of the index files. We have developed an innovative scheme
that integrates text compression and text indexing in a unified
framework. In particular, the proposed scheme allows direct searching
through compressed text without decompressing the text first.
Moreover, the framework allows a flexible tradeoff between
index storage requirement and query processing speed so that a text
retrieval system
can dynamically re-configure itself to optimize the access
performance based on observed access patterns.
The working prototype of this integrated text compression and
retrieval system, Codir-1, supports
exact as well as approximate search.
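The idea of searching compressed text directly can be illustrated with a small sketch. It assumes a simple word-level dictionary coding in which each distinct word maps to an integer code, so a query can be encoded once and matched against the stored codes without decoding the document. The `WordCodec` class and `find_phrase` function are hypothetical names introduced for illustration only; this is not the encoding used by Codir-1, which is described in the publications listed on this page.

```python
from typing import Dict, List


class WordCodec:
    """Hypothetical word-level dictionary coder (an assumption, not Codir's scheme)."""

    def __init__(self) -> None:
        self.code_of: Dict[str, int] = {}   # word -> integer code
        self.word_of: List[str] = []        # integer code -> word

    def encode(self, text: str) -> List[int]:
        """Replace each word by its code, growing the dictionary as needed."""
        codes = []
        for word in text.lower().split():
            if word not in self.code_of:
                self.code_of[word] = len(self.word_of)
                self.word_of.append(word)
            codes.append(self.code_of[word])
        return codes

    def decode(self, codes: List[int]) -> str:
        """Recover the (lower-cased, whitespace-normalized) text."""
        return " ".join(self.word_of[c] for c in codes)


def find_phrase(codec: WordCodec, compressed: List[int], phrase: str) -> List[int]:
    """Return word positions where `phrase` occurs, matching codes only."""
    words = phrase.lower().split()
    # A word missing from the dictionary cannot occur in any encoded document.
    if not words or any(w not in codec.code_of for w in words):
        return []
    pattern = [codec.code_of[w] for w in words]
    n = len(pattern)
    return [i for i in range(len(compressed) - n + 1)
            if compressed[i:i + n] == pattern]


codec = WordCodec()
doc = codec.encode("the quick brown fox jumps over the lazy dog")
hits = find_phrase(codec, doc, "brown fox")
```

The point of the sketch is only that the query, not the document, is transformed: the stored code sequence is scanned as-is, and `decode` is never called during the search.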
Frequent updates to a text retrieval system are usually handled by
scheduling system downtime, which is undesirable in most cases.
Supporting updates and queries simultaneously
without compromising performance is a research challenge.
Building on the Codir-1 prototype, we extended Codir-1 to Codir-2,
i.e., compression-domain text indexing and retrieval with real-time updates.
Our work has been done in the following directions:
- Design and implementation of the lazy index update technique to
support real-time updates to document databases (Codir-2)
- Design and implementation of the query result caching mechanism
to speed up IR query processing (Codir-2)
- Exploration of an I/O-driven execution strategy for transaction
processing systems (TSTE)
- Re-implementation of the Codir prototype to use multi-version concurrency
control to reduce the performance overhead due to lock contention (Codir-2)
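As a rough illustration of how lazy index updates and query result caching can work together, the sketch below keeps new postings in a small in-memory delta, answers queries from the union of the main index and the delta, and folds the delta into the main index only once a threshold is reached. Cached results are dropped for any term touched by an update. The `LazyIndex` class, its `merge_threshold` parameter, and the per-term invalidation policy are all assumptions made for this example; they are not taken from the Codir-2 implementation.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


class LazyIndex:
    """Hypothetical inverted index with deferred merges and a result cache."""

    def __init__(self, merge_threshold: int = 4) -> None:
        self.main: Dict[str, Set[int]] = defaultdict(set)    # merged postings
        self.delta: Dict[str, Set[int]] = defaultdict(set)   # unmerged updates
        self.pending = 0                  # documents waiting in the delta
        self.merge_threshold = merge_threshold
        self.cache: Dict[str, FrozenSet[int]] = {}            # term -> result
        self.merges = 0                   # number of merges performed so far

    def add_document(self, doc_id: int, text: str) -> None:
        """Record postings in the delta; queries see them immediately."""
        for term in set(text.lower().split()):
            self.delta[term].add(doc_id)
            self.cache.pop(term, None)    # cached result for this term is stale
        self.pending += 1
        if self.pending >= self.merge_threshold:
            self.merge()

    def merge(self) -> None:
        """Fold the delta into the main index (the deferred, costly step)."""
        for term, ids in self.delta.items():
            self.main[term] |= ids
        self.delta.clear()
        self.pending = 0
        self.merges += 1

    def search(self, term: str) -> FrozenSet[int]:
        """Answer from the cache, else from main and delta combined."""
        term = term.lower()
        if term not in self.cache:
            merged = self.main.get(term, set()) | self.delta.get(term, set())
            self.cache[term] = frozenset(merged)
        return self.cache[term]


idx = LazyIndex(merge_threshold=2)
idx.add_document(1, "compressed text search")
first = idx.search("text")       # served from the delta, no merge yet
idx.add_document(2, "text index update")
second = idx.search("text")      # cache entry was invalidated by the update
```

In this sketch a merge leaves the cache untouched because it does not change any query result; only updates invalidate entries. A real system would also need concurrency control between readers and the merge step, which this single-threaded example omits.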
Publications
- Srinidhi Varadarajan, Tzi-cker Chiueh,
"Sase: Implementation of A Compressed Text Search Engine,"
in Proceedings of USENIX Internet Technologies and Systems,
Monterey, CA, December 1997.
- Tzi-cker Chiueh, Srinidhi Varadarajan,
"Compression-Domain Text Indexing and Retrieval,"
ECSL technical report ECSL-25, January, 1999.
- Lan Huang, Tzi-cker Chiueh,
"Efficient Real-Time Index Updates in Text Retrieval Systems,"
March, 1999.
Software Download
[tgz]
Acknowledgement
This research is supported by NSF Career Award MIP-9502067,
NSF MIP-9710622, NSF IRI-9711635,
contract 95F138600000
from Community Management Staff's Massive Digital Data
System Program, as well as funding from Sandia National Laboratory,
Reuters Information Technology Inc., and Computer Associates/Cheyenne
Inc.
This page was last updated on 1-2-2002.