Parallel Multimedia Index Server
Compression-Domain Text Indexing and Retrieval
Group Members:
Project Description
One of the major problems with existing text retrieval systems is
the size of the index files. We have developed an innovative scheme
that integrates text compression and text indexing in a unified
framework. In particular, the proposed scheme allows direct searching
through compressed text without decompressing the text first.
Moreover, the framework allows a flexible tradeoff between index
storage requirements and query processing speed, so that a text
retrieval system can dynamically reconfigure itself to optimize
access performance based on observed access patterns.
The working prototype of this integrated text compression and
retrieval system, Codir, supports exact as well as approximate search.
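The idea of searching compressed text directly can be illustrated with a minimal sketch. It assumes a simple word-level dictionary code, which is not necessarily the encoding Codir uses; the names `compress`, `search_compressed`, and `codebook` are hypothetical and chosen only for this example. The query word is translated to its code once, and the comparison then runs over the stored codes without reconstructing the original text.

```python
from typing import Dict, List


def compress(text: str, codebook: Dict[str, int]) -> List[int]:
    """Replace each word with a small integer code, growing the codebook as needed."""
    codes = []
    for word in text.lower().split():
        if word not in codebook:
            codebook[word] = len(codebook)
        codes.append(codebook[word])
    return codes


def search_compressed(codes: List[int], codebook: Dict[str, int], query: str) -> List[int]:
    """Return the word offsets where `query` occurs, comparing codes only."""
    code = codebook.get(query.lower())
    if code is None:
        return []  # the word was never encoded, so it cannot occur in the text
    return [i for i, c in enumerate(codes) if c == code]


# Illustrative data only (assumption, not taken from the Codir prototype).
codebook: Dict[str, int] = {}
doc = compress("the index is smaller than the text", codebook)
hits = search_compressed(doc, codebook, "the")
```

In this sketch exact search reduces to an integer comparison per word. Approximate search would need to match the query against codebook entries first and then look for any of the matching codes.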
Currently, we are working to extend Codir in the following
directions:
- Design and implementation of the lazy index update technique to
support real-time updates to document databases (codir-2)
- Design and implementation of the query result caching mechanism
to speed IR query processing (codir-2)
- Explore an I/O-driven execution strategy for transaction
processing systems (TSTE)
- Re-implement the Codir prototype to use multi-version concurrency
control to reduce the performance overhead due to lock contention (codir-2)
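Two of the directions above, lazy index updates and query result caching, can be sketched together. This is a rough illustration under stated assumptions, not the codir-2 design: new postings go into a small delta buffer that is merged into the main index only after a threshold is reached, queries consult both structures, and cached results are dropped when an update touches the same word. The class `LazyIndex` and the parameter `merge_threshold` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Set


class LazyIndex:
    """Inverted index with a deferred-merge delta buffer and a result cache."""

    def __init__(self, merge_threshold: int = 1000) -> None:
        self.main: Dict[str, Set[int]] = defaultdict(set)   # merged postings
        self.delta: Dict[str, Set[int]] = defaultdict(set)  # recent, unmerged postings
        self.pending = 0                                     # postings waiting in delta
        self.merge_threshold = merge_threshold
        self.cache: Dict[str, frozenset] = {}                # word -> cached result

    def add_document(self, doc_id: int, text: str) -> None:
        """Record a document in the delta buffer; merge only when it grows large."""
        for word in set(text.lower().split()):
            self.delta[word].add(doc_id)
            self.cache.pop(word, None)  # the cached result for this word is now stale
            self.pending += 1
        if self.pending >= self.merge_threshold:
            self.merge()

    def merge(self) -> None:
        """Fold the delta buffer into the main index in one batch."""
        for word, ids in self.delta.items():
            self.main[word] |= ids
        self.delta.clear()
        self.pending = 0

    def query(self, word: str) -> frozenset:
        """Return matching document ids, using the cache when possible."""
        word = word.lower()
        if word not in self.cache:
            merged = self.main.get(word, set()) | self.delta.get(word, set())
            self.cache[word] = frozenset(merged)
        return self.cache[word]


# Illustrative usage with made-up documents.
index = LazyIndex(merge_threshold=3)
index.add_document(1, "compressed text search")  # reaches the threshold, so it merges
index.add_document(2, "text retrieval")          # stays in the delta buffer
result = index.query("text")
```

Because a merge moves postings without changing what any query returns, the cache only has to be invalidated on updates, not on merges.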
In the very near future, we plan to branch out in the
following directions:
- Integrate Codir with a real-time TV video acquisition,
re-distribution, and storage server to provide keyword-based access
to the video repository.
- Explore automatic directory construction techniques that apply
learning algorithms to public-domain human-constructed directories
such as those from Yahoo! and Lycos.
- Investigate the design issues of information retrieval session
management systems that help individual users to manage their
personal web exploration sessions.
Publications
- Srinidhi Varadarajan, Tzi-cker Chiueh,
"Sase: Implementation of A Compressed Text Search Engine,"
in Proceedings of USENIX Internet Technologies and Systems,
Monterey, CA, December 1997.
- Tzi-cker Chiueh, Srinidhi Varadarajan,
"Compression-Domain Text Indexing and Retrieval,"
Submitted to Software -- Practice and Experience, January, 1999.
- Lan Huang, Tzi-cker Chiueh,
"Efficient Real-Time Index Updates in Text Retrieval Systems,"
March, 1999.
Software Download
[tgz]
Acknowledgement
This research is supported by an NSF Career Award MIP-9502067,
NSF MIP-9710622, NSF IRI-9711635, contract 95F138600000
from Community Management Staff's Massive Digital Data
System Program, as well as funding from Sandia National Laboratories,
Reuters Information Technology Inc., and Computer Associates/Cheyenne
Inc.
This page was last updated on 1-2-2002.