Parallel Multimedia Index Server
The PAMIS Project: Parallel Multimedia Index Server
Group Members:
Research Projects
- Content-Based Image Indexing:
With the advent of multimedia databases,
it has become crucial to develop efficient mechanisms for
accessing image data sets by means other than textual keywords.
An important access method in such systems is
"like-this" (query-by-example) image retrieval.
We have developed a universal indexing system based on
a search data structure called the Vantage-Point Tree, which
identifies the closest match in a database to a given query while
accessing only a small portion of the database elements.
The current PAMIS prototype allows indexed retrieval of multiple
nearest neighbors of a given query from a 50K-element
polygonal shape image database, and runs on 10 Pentium machines
connected by 100-Mbps Fast Ethernet.
Each shape is represented as a turning function.
To demonstrate this technology,
we have also built a document recognition system,
which first extracts the contour of
the first page of every document as its signature, and
represents it as a polygonal shape. Then the document signatures are
stored in PAMIS, which is searched using the Vantage-Point Tree
index at run time to match
the signature of a submitted document image query.
We demonstrated this system at AIPA '97.
We are currently developing a multi-resolution indexing scheme based
on wavelet representations, and refining the scanned-document
recognition system to tolerate noise from contamination,
differing scanning resolutions, misalignment, tilt, and changes in
scale (shrinking or enlargement).
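The following is a minimal sketch of a vantage-point tree with k-nearest-neighbor search, written in Python for illustration; it is not the PAMIS implementation. It assumes each shape has already been reduced to a fixed-length numeric vector (for example, a turning function sampled at evenly spaced arc-length positions) and uses Euclidean distance; any metric that obeys the triangle inequality could be substituted. The names build_vp_tree, knn_search, and VPNode are hypothetical. Each node splits the remaining points at the median distance from a randomly chosen vantage point, and the triangle inequality lets the search skip a subtree whenever it cannot contain a point closer than the current k-th best match, which is how a query examines only part of the database.

```python
import heapq
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """L2 distance; any metric obeying the triangle inequality also works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class VPNode:
    vantage: Point               # point stored at this node
    radius: float                # median distance from vantage to the other points
    inside: Optional["VPNode"]   # points with distance < radius
    outside: Optional["VPNode"]  # points with distance >= radius


def build_vp_tree(points: List[Point], dist: Metric = euclidean,
                  seed: int = 0) -> Optional[VPNode]:
    rng = random.Random(seed)

    def build(pts: List[Point]) -> Optional[VPNode]:
        if not pts:
            return None
        i = rng.randrange(len(pts))  # pick a random vantage point
        vantage, rest = pts[i], pts[:i] + pts[i + 1:]
        if not rest:
            return VPNode(vantage, 0.0, None, None)
        dists = [dist(vantage, p) for p in rest]
        radius = sorted(dists)[len(dists) // 2]  # median split
        inside = [p for p, d in zip(rest, dists) if d < radius]
        outside = [p for p, d in zip(rest, dists) if d >= radius]
        return VPNode(vantage, radius, build(inside), build(outside))

    return build(list(points))


def knn_search(root: Optional[VPNode], query: Point, k: int,
               dist: Metric = euclidean) -> Tuple[List[Tuple[float, Point]], int]:
    """Return the k nearest (distance, point) pairs and the number of
    distance computations performed."""
    if k <= 0:
        return [], 0
    best: List[Tuple[float, int, Point]] = []  # max-heap via negated distance
    evaluated = 0
    tie = 0  # tie-breaker so the heap never compares points

    def tau() -> float:
        # Distance of the current k-th best match (infinite until k found).
        return -best[0][0] if len(best) == k else math.inf

    def visit(node: Optional[VPNode]) -> None:
        nonlocal evaluated, tie
        if node is None:
            return
        d = dist(query, node.vantage)
        evaluated += 1
        if d < tau():
            tie += 1
            heapq.heappush(best, (-d, tie, node.vantage))
            if len(best) > k:
                heapq.heappop(best)  # drop the farthest candidate
        # Triangle inequality: inside points lie within radius of the vantage,
        # outside points at radius or beyond; skip a side that cannot beat tau.
        if d < node.radius:
            visit(node.inside)
            if d + tau() >= node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-nd, p) for nd, _, p in best), evaluated


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in data: random 4-D vectors in place of sampled turning functions.
    data = [tuple(rng.random() for _ in range(4)) for _ in range(5000)]
    tree = build_vp_tree(data)
    query = tuple(rng.random() for _ in range(4))
    matches, evaluated = knn_search(tree, query, k=3)
    print(matches)
    print(f"{evaluated} distance computations for {len(data)} points")
```

Comparing the returned count of distance computations with the database size gives a rough sense of how much of the data a query touches; in general the pruning becomes less effective as the dimensionality of the vectors grows.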
- Indexed Retrieval of Compressed Text:
One of the major problems with existing text retrieval systems is
the size of the index files. We have developed an innovative scheme
that integrates text compression and text indexing in a unified
framework. In particular, the proposed scheme allows searching directly
in compressed text without first decompressing it.
Moreover, the framework allows a flexible tradeoff between
index storage requirements and retrieval speed, so that a text
retrieval system can dynamically reconfigure itself to optimize access
performance based on observed access patterns.
We are currently extending this technology with more
sophisticated term-weighting schemes and with an
automatic document classification scheme for Internet documents.
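As a rough illustration of searching compressed text directly, the sketch below uses word-based dictionary coding: every distinct word is replaced by an integer code, and a phrase query is translated into codes and matched against the code stream without reconstructing the original text. It also uses a block-addressing inverted index, in which the hypothetical block_size parameter trades index size against scanning work: larger blocks mean fewer index entries but more codes to scan per candidate block. This is an assumed, simplified scheme chosen for clarity, not the algorithm used by SASE, and for readability it keeps the codes in a Python list rather than packing them into a compact bit-level encoding. The names CompressedCorpus, search_phrase, and index_entries are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric words; punctuation is discarded in this sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CompressedCorpus:
    """Word-coded text plus a block-addressing inverted index (illustrative only)."""

    def __init__(self, text: str, block_size: int) -> None:
        if block_size < 1:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.lexicon: Dict[str, int] = {}  # word -> integer code
        self.codes: List[int] = []         # the text as a sequence of codes
        for word in tokenize(text):
            self.codes.append(self.lexicon.setdefault(word, len(self.lexicon)))
        # For each code, record only which blocks contain it, not exact positions.
        self.index: Dict[int, Set[int]] = defaultdict(set)
        for pos, code in enumerate(self.codes):
            self.index[code].add(pos // block_size)

    def index_entries(self) -> int:
        """Total posting entries: a proxy for index storage."""
        return sum(len(blocks) for blocks in self.index.values())

    def search_phrase(self, phrase: str) -> List[int]:
        """Return word offsets where the phrase occurs, matching codes directly."""
        words = tokenize(phrase)
        if not words or any(w not in self.lexicon for w in words):
            return []
        pattern = [self.lexicon[w] for w in words]
        m = len(pattern)
        hits: List[int] = []
        # Only blocks containing the first query word can hold a match start.
        for block in sorted(self.index[pattern[0]]):
            start = block * self.block_size
            end = min(start + self.block_size, len(self.codes))
            for pos in range(start, end):
                # Slicing the global code list lets a phrase cross a block boundary.
                if self.codes[pos:pos + m] == pattern:
                    hits.append(pos)
        return hits
```

Raising block_size shrinks the index (fewer entries returned by index_entries()) while increasing the number of codes each query must scan, which is one way to realize the kind of storage-versus-speed knob described in the paragraph above.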
- Stony Brook Video Server:
As part of the second phase of the PAMIS project, we have developed
a LAN-based distributed video server called the Stony Brook Video
Server (SBVS). This is the first video server that provides
end-to-end performance guarantees from the server's disk subsystem,
through the LAN, to the client machines' displays.
The current prototype, SBVS-II, runs over a
multi-segment Ethernet LAN.
That is, the video server and its clients can sit on different
Ethernet segments while video playback is still guaranteed to be
smooth. SBVS-II is built on an off-the-shelf 100-MHz Pentium machine
and can deliver up to 45 MPEG-1 streams (1.5 Mbit/s each)
over 100-Mbps Fast Ethernet. It currently supports automatic acquisition
and indexing of TV programs, and provides keyword-based access to
video archives. We are now building the next generation of
SBVS, which features scalable PC clusters and
automatic fault tolerance across disk and node failures.
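The quoted throughput can be checked with simple arithmetic: 45 streams at 1.5 Mbit/s each amount to 67.5 Mbit/s, or about 67.5% of the nominal 100 Mbit/s of Fast Ethernet. One common way to keep playback smooth when clients sit on different segments is reservation-based admission control: a new stream is admitted only if every resource on its path (for example the disk subsystem, the server's segment, and the client's segment) still has enough unreserved bandwidth. The sketch below shows that idea with integer kbit/s rates; the class name, resource names, and capacities are hypothetical, and SBVS's actual admission policy is not described here.

```python
from typing import Dict, Sequence


class AdmissionController:
    """Reservation-based admission control over a set of named resources.

    Rates are integer kbit/s to avoid floating-point rounding in the checks.
    """

    def __init__(self, capacities_kbps: Dict[str, int]) -> None:
        self.capacity = dict(capacities_kbps)
        self.reserved = {name: 0 for name in capacities_kbps}

    def admit(self, path: Sequence[str], rate_kbps: int) -> bool:
        # All-or-nothing: reject if any resource on the path would be overcommitted.
        if any(self.reserved[r] + rate_kbps > self.capacity[r] for r in path):
            return False
        for r in path:
            self.reserved[r] += rate_kbps
        return True

    def release(self, path: Sequence[str], rate_kbps: int) -> None:
        # Return a finished stream's bandwidth to every resource on its path.
        for r in path:
            self.reserved[r] -= rate_kbps


if __name__ == "__main__":
    # Hypothetical capacities: server and client segments at 100 Mbit/s each,
    # plus a disk subsystem assumed (not measured) to be faster than the network.
    ac = AdmissionController(
        {"disk": 200_000, "seg_server": 100_000, "seg_client": 100_000})
    path = ["disk", "seg_server", "seg_client"]
    admitted = 0
    while ac.admit(path, 1_500):  # one MPEG-1 stream at 1.5 Mbit/s
        admitted += 1
    print(admitted)  # 66
```

Because the sketch checks only raw bandwidth, it admits 66 streams when the tightest link is a 100 Mbit/s segment; it ignores protocol overhead, disk seek behavior, and CPU limits, so it should not be expected to reproduce the 45-stream figure.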
- Terabyte Storage System:
With the goal of storing high-resolution digital video, such as
recordings of surgical procedures, as well as 2D and 3D biomedical image data,
we are working on a terabyte storage system called Stonehenge,
which features a hierarchical storage architecture consisting of
a large-scale software-driven disk array and a parallel array of
automated tape libraries. Our focus is on building innovative storage
management software, on top of off-the-shelf commercial storage
hardware, that minimizes data retrieval response time while
cost-effectively scaling total storage capacity into the terabyte
and petabyte range. The techniques we are investigating include
multi-resolution data compression schemes and dynamic load balancing
of parallel disk arrays.
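Dynamic load balancing across a parallel disk array can take many forms. One simple form, sketched below under the assumption that each block is replicated on two adjacent disks, sends every read to whichever replica disk currently has the fewest outstanding requests. The replica placement rule, the names replicas_for and LeastLoadedScheduler, and the use of queue length as the load metric are all illustrative assumptions, not a description of Stonehenge.

```python
from typing import List, Sequence


def replicas_for(block: int, num_disks: int, copies: int = 2) -> List[int]:
    """Hypothetical placement: block b lives on disks b, b+1, ... (mod num_disks)."""
    return [(block + i) % num_disks for i in range(copies)]


class LeastLoadedScheduler:
    """Route each read to the replica disk with the shortest request queue."""

    def __init__(self, num_disks: int) -> None:
        self.outstanding = [0] * num_disks  # requests issued but not yet completed

    def dispatch(self, replicas: Sequence[int]) -> int:
        # Break ties by disk number so the choice is deterministic.
        disk = min(replicas, key=lambda d: (self.outstanding[d], d))
        self.outstanding[disk] += 1
        return disk

    def complete(self, disk: int) -> None:
        # Called when a disk finishes a request, shrinking its queue.
        self.outstanding[disk] -= 1


if __name__ == "__main__":
    sched = LeastLoadedScheduler(num_disks=4)
    # A burst of requests for one hot block is spread over both of its replicas.
    picks = [sched.dispatch(replicas_for(7, 4)) for _ in range(10)]
    print(picks)              # [0, 3, 0, 3, 0, 3, 0, 3, 0, 3]
    print(sched.outstanding)  # [5, 0, 0, 5]
```

With a fixed primary copy, all ten requests for the hot block would queue on a single disk; here they split evenly between disks 0 and 3, and because completions decrement the counters, the choice keeps adapting as queues drain.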
Publications
- Tzi-cker Chiueh, Dimitris Margaritis, Srinidhi Varadarajan,
"Design, Implementation, and Evaluation of a Parallel Image Indexer,"
in Proceedings of Visual '97, San Diego, CA, December 1997.
- Tzi-cker Chiueh, Srinidhi Varadarajan,
"Sase: Implementation of a Compressed Text Search Engine,"
in Proceedings of the USENIX Symposium on Internet Technologies and Systems,
Monterey, CA, December 1997.
- Tzi-cker Chiueh, Chitra Venkatramani, Michael Vernick,
"Design and Implementation of the Stony Brook Video Server,"
in Software: Practice and Experience, February 1997.
- Michael Vernick, Chitra Venkatramani, Tzi-cker Chiueh,
"Performance Evaluation of Stony Brook Video Server,"
ECSL-TR-24, February 1997.
- Michael Vernick, Chitra Venkatramani, Tzi-cker Chiueh,
"Adventures in Building The Stony Brook Video Server,"
in Proceedings of ACM Multimedia '96, Boston, MA, October 1996.
- Tzi-cker Chiueh, Allen Ballman, Kevin Kreeger,
"Multi-Resolution Indexing of Shape Images,"
ECSL-TR-41, October 1997.
- Tzi-cker Chiueh, Srinidhi Varadarajan,
"Compression-Domain Text Indexing and Retrieval,"
ECSL-TR-25 (revised), October 1997.
Acknowledgement
This project is supported by the Massive Digital Data
Systems (MDDS) Program, sponsored by the Advanced
Research and Development Committee of the Community Management Staff,
and by NSF grants MIP-9710622 and IRI-9711635.