Professor Tzi-cker Chiueh
Maxim Lifantsev
The Web is becoming an ever more extensive source of information resources and services used by a substantial part of the world's population. Consequently, tools and services that help people quickly and accurately locate the web resources they are interested in are becoming more crucial, both for the usability of the Web and for making resources published somewhere on the Web actually accessible to the users interested in them. Such tools, presently represented by web search engines and directories, try to bridge the gap between a user's information need (usually expressed as a keyword-based query with additional restrictions based on, for instance, a category of resources) and the set of all web pages, which have some textual content and are loosely connected by hyperlinks but have no built-in mechanism for mapping user queries to a set of relevant web resources.
Yuntis is a project to develop a mechanism (and a software system supporting it) that allows web page authors and general web surfers to collaboratively construct a mapping from keyword-based queries to (ordered) sets of web resources, such that this mapping best satisfies, in a democratic sense, the collective desires expressed by all the web authors and surfers.
The main idea of the project is to support a democratic open voting model with power delegation, in which interested actors (that is, web page authors and web surfers) can express their opinions (desires) on how a web search and navigation service should answer different types of queries, and can delegate the power to do so to any other actor, by publishing on the Web metadata that is to be crawled and processed by the search engine. This metadata can state, for example, that a given fraction of an actor's voting power is used to associate a given short textual description with a given web resource, to associate a universal "goodness" rank on some common scale with a web resource, or to transfer that fraction of power to another actor. More realistically, such associations and power transfers are to be stated with respect to a category in a classification directory. In addition, metadata specifying which subcategories a given category should have and what the textual descriptions of (sub)categories in the directory should be can be used to construct the classification directory itself collaboratively. All this crawled metadata, combined with a given distribution of initial amounts of power among the actors, is to be used to construct a classification directory structure and to associate text and rank values with web resources as collectively desired by the actors that provided the metadata. This information, along with various statistics on it, can then be used to answer user queries and to serve the directory structure and information about web resources and actors to users of our web search engine service.
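To make the power-delegation idea concrete, here is a minimal sketch, not the Yuntis implementation. It assumes a hypothetical `Actor` record holding two kinds of statements, each a fraction of the power currently reaching that actor: delegations to other actors, and votes for (resource, description) associations. The function name `apply_votes`, the round-by-round propagation, and the rule that unspent power is simply dropped are all illustrative assumptions; categories and ranks are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Actor:
    # Hypothetical metadata published by one actor, as fractions of its power.
    delegations: dict = field(default_factory=dict)  # target actor -> fraction
    votes: dict = field(default_factory=dict)  # (resource, description) -> fraction


def apply_votes(actors, initial, max_rounds=1000, eps=1e-12):
    """Return the total power applied to each (resource, description) pair.

    Assumption: power an actor neither delegates nor spends on votes is dropped.
    """
    for name, actor in actors.items():
        if sum(actor.delegations.values()) + sum(actor.votes.values()) > 1 + 1e-9:
            raise ValueError(f"{name} spends more than 100% of its power")

    in_flight = dict(initial)  # power arriving at each actor in this round
    applied = defaultdict(float)
    for _ in range(max_rounds):
        next_round = defaultdict(float)
        for name, power in in_flight.items():
            actor = actors.get(name, Actor())
            for assoc, frac in actor.votes.items():
                applied[assoc] += power * frac  # spend power on a text association
            for target, frac in actor.delegations.items():
                next_round[target] += power * frac  # pass power on to another actor
        in_flight = next_round
        if sum(in_flight.values()) < eps:
            break  # all delegated power has been spent or dropped
    return dict(applied)


actors = {
    "alice": Actor(delegations={"bob": 0.5},
                   votes={("http://example.org/a", "web search"): 0.5}),
    "bob": Actor(votes={("http://example.org/b", "web crawler"): 1.0}),
}
print(apply_votes(actors, {"alice": 1.0, "bob": 1.0}))
# {('http://example.org/a', 'web search'): 0.5, ('http://example.org/b', 'web crawler'): 1.5}
```

Here bob's association receives his own full power plus the half that alice delegated to him, so a small amount of metadata from alice extends bob's influence. The `max_rounds` bound keeps the loop finite even if actors delegate all of their power around a cycle.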
In the absence of the proposed kind of explicit metadata, one can (inaccurately) convert various available data sources, such as the textual composition of web pages, the linkage among web pages, and the structure of an existing classification directory such as the Open Directory Project (ODP), into a common (meta)data format that the system can use alongside explicit metadata while the latter becomes more widespread.
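As an illustration of one such (inaccurate) conversion, the sketch below handles hyperlinks only: each link is treated as an implicit delegation of power from the linking page to the linked page. The function name `links_to_delegations`, the even split among distinct outgoing links, and the default fraction of 0.85 (an arbitrary choice echoing the damping factor commonly used with PageRank) are all assumptions made for illustration, not the project's actual conversion rules.

```python
def links_to_delegations(links, fraction=0.85):
    """Convert hyperlinks into hypothetical implicit delegation statements.

    links: page -> list of pages it links to. Each page delegates `fraction`
    of its power, split evenly among its distinct outgoing links (self-links
    are ignored); pages with no usable links delegate nothing.
    """
    delegations = {}
    for page, targets in links.items():
        distinct = [t for t in dict.fromkeys(targets) if t != page]
        if distinct:
            share = fraction / len(distinct)
            delegations[page] = {t: share for t in distinct}
    return delegations


links = {"a": ["b", "c", "c"], "b": ["c", "b"], "c": []}
print(links_to_delegations(links, fraction=0.8))
# {'a': {'b': 0.4, 'c': 0.4}, 'b': {'c': 0.8}}
```

Keeping `fraction` below 1 means every page retains part of its power, which keeps delegation-based power computations well behaved even when links form cycles.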
More abstractly, the project develops a mechanism for the collaborative construction of various relations, and applies it to the creation of a web search service. In this mechanism, the power to influence the relations can be delegated among the collaborating actors (people), and the initial amount of influencing power given to each actor is an independent parameter of the system. This allows one to obtain results that range, for example, from completely democratic (when all actors are given an equal amount of initial power), to "elitist" (when a selected set of actors is initially given more power than the others), to dictatorial or personalized (when one actor is initially given most or all of the power). All these cases can use the same network of power delegation and application among the actors. For instance, this network can allow a "dictator" to build a hierarchy of trusted subordinates, and allow a population of democratically cooperating actors to "elect" representatives by delegating to them the power to influence the constructed relations. The power delegation mechanism should thus allow each actor to express accurately how the constructed relations should be built while providing only a relatively small amount of metadata.
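The point that the initial power distribution is an independent parameter can be sketched as follows, under stated assumptions. The hypothetical function `effective_power` computes how much power each actor ends up holding (its initial power plus everything delegated to it) by fixed-point iteration of held = initial + D^T held, where `delegations[src][dst]` is the fraction of src's held power passed to dst. Requiring each actor to delegate less than 100% of its power is our assumption; it guarantees convergence even with delegation cycles.

```python
def effective_power(delegations, initial, max_iter=10_000, tol=1e-12):
    """Total power each actor holds: its initial power plus power delegated to it.

    Solves held = initial + D^T held by fixed-point iteration. Each actor must
    delegate less than 100% of its power so that the iteration converges.
    """
    actors = set(initial) | set(delegations)
    for src, targets in delegations.items():
        actors |= set(targets)
        if sum(targets.values()) >= 1:
            raise ValueError(f"{src} must delegate less than 100% of its power")

    held = {a: initial.get(a, 0.0) for a in actors}
    for _ in range(max_iter):
        nxt = {a: initial.get(a, 0.0) for a in actors}
        for src, targets in delegations.items():
            for dst, frac in targets.items():
                nxt[dst] += frac * held[src]  # dst receives part of src's power
        delta = max((abs(nxt[a] - held[a]) for a in actors), default=0.0)
        held = nxt
        if delta < tol:
            break
    return held


# One delegation network: three voters each delegate 90% of their power to "rep".
network = {v: {"rep": 0.9} for v in ("v1", "v2", "v3")}
distributions = {
    "democratic": {"v1": 1.0, "v2": 1.0, "v3": 1.0, "rep": 1.0},
    "elitist": {"v1": 10.0, "v2": 1.0, "v3": 1.0, "rep": 1.0},
    "dictatorial": {"rep": 1.0},
}
for label, initial in distributions.items():
    print(label, round(effective_power(network, initial)["rep"], 6))
```

The delegation network is identical in all three runs; only the initial-power parameter changes. The "elected" representative holds 3.7 units under the democratic distribution, 11.8 under the elitist one, and only its own 1.0 under the dictatorial one.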
The current prototype implementation of the Yuntis engine can be accessed at http://yuntis.ecsl.cs.sunysb.edu.
The prototype consists of three different data sets powered by three different engines: http://yuntis-edu.ecsl.cs.sunysb.edu, http://yuntis-usb.ecsl.cs.sunysb.edu, and http://yuntis-wrl.ecsl.cs.sunysb.edu.
The yuntis-edu server is based on a crawl of over nine million pages hosted mainly at English-speaking universities and some research labs.
The yuntis-wrl server is based on a crawl of over four million pages listed in a recent snapshot of ODP.
The yuntis-usb server is based on a recent exhaustive crawl of the .sunysb.edu domain. You can compare it with Google's SUNY SB search. (This is not a very fair comparison, because Google's search uses linkage information from outside the .sunysb.edu domain but returns results only inside it, whereas Yuntis has data only from the .sunysb.edu domain but can return results from other domains too.) Another point of comparison is the Inktomi-powered official SUNY SB search.
For the most recent description of the main features of the prototype see http://yuntis-usb.ecsl.cs.sunysb.edu/about/.
I/O-Conscious Data Preparation for Large-Scale Web Search Engines. Maxim Lifantsev and Tzi-cker Chiueh. Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 20-23, 2002. Morgan Kaufmann.
A System for Collaborative Web Resource Categorization and Ranking. Maxim Lifantsev. Ph.D. Dissertation Proposal, Department of Computer Science, SUNY at Stony Brook, Stony Brook, NY, October 2001.
Voting Model for Ranking Web Pages. Maxim Lifantsev. In Peter Graham and Muthucumaru Maheswaran, editors, Proceedings of the International Conference on Internet Computing, Las Vegas, Nevada, U.S.A., pages 143-148. CSREA Press, June 2000.
Open Peer-Review as Web's Self-Organization Force. Maxim Lifantsev. Technical Report TR-78, ECSL, Department of Computer Science, SUNY at Stony Brook, Stony Brook, NY, February 2000.
Rank Computation Methods for Web Documents. Maxim Lifantsev. Technical Report TR-76, ECSL, Department of Computer Science, SUNY at Stony Brook, Stony Brook, NY, November 1999.
Items are listed in reverse chronological order.
The OpenGRiD project code is an early version of our prototype implementation codebase. It contains C++ libraries for building event-driven, non-blocking, single-threaded applications. Sample applications, such as a simple web server and a minimal web proxy, are included.
The OGProxy code implements a web proxy that performs some filtering of HTTP and HTML data.
Group's home page: WebBase Project
Implementation: Google's prototype was developed there as part of the WebBase Project.
Current (Known) Researchers:
Main Publications:
Publication Collections:
Projects' Home Pages:
Main Researchers:
Main Publications:
Publication Collections:
Group's home page: Web Archaeology Project
Implementation: The Mercator web crawler and other search-engine-related prototypes were created here. Various parts of the AltaVista search engine were also developed here.
Main Researchers:
Main Publications:
Publication Collections:
Systems:
Main Researchers:
Main Publications:
Publication Collections:
Implementation: Teoma prototype search engine
Project's home page: DiskoWeb Project
Current (Known) Researchers:
Main Publications:
Publication Collections:
News Coverage:
Web Information Retrieval Resources Site, maintained by Einat Amitay, including
This research project has been supported in part by the National Science Foundation under grants IRI-9711635 and MIP-9710622.
Last updated on Feb. 01, 2003 by Maxim Lifantsev