SBFilter: A Fast URL Filter Engine for Internet Access Management
A critical software technology for using the Web effectively as a productivity-enhancing tool is an access filter that steers users toward the part of the Web that is both high quality and relevant to the context in question. Such a filter mechanism is useful in classroom settings, either to actively direct students' web accesses to a set of pages custom-built for a given subject area or to passively prevent students from accessing improper web sites. In corporate environments, such a filter tool is essential for striking a balance between exploiting the wealth of information on the Web to facilitate business processes and losing productivity to aimless or personal browsing [4].
Accesses to the global Internet usually go through a proxy web server which sits between the client machines within a subnet and the websites outside. The proxy server forwards the URL requests from client machines to external websites, receives their responses and hands them back to its client machines. Thus the proxy server provides an ideal location where web requests can be examined to determine if a particular request should be allowed access to the target web sites [5]. Earlier solutions assumed that the proxy server institutes a single access control policy that all requests have to observe. While this approach may be acceptable for a corporate proxy server, it is too restrictive for ISPs, because they are serving a large number of distinct subscribers, each potentially with a different access control requirement. It is conceivable that ultimately each household that accesses the Internet through an ISP may require a customized access control policy for each member in the family.
Given the URL of a web access, the access filter checks whether the client machine is on the Allow list and/or the Disallow list of that URL. If the client is on the Disallow list, the filter denies the access. Typically each entry of the Allow and Disallow lists is a client IP address. The semantics are that all accesses from clients on the Allow (Disallow) list of the target URL are allowed (disallowed). Because each entry is specified in terms of a URL, this filter model can apply to email and FTP as well. A simple generalization of this basic filter model is that each entity in the external network can be associated with different Allow and Disallow lists, and the associations, as well as the lists themselves, can be modified at run time by authorized personnel.
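The basic filter model described above can be illustrated with a minimal Python sketch. The names `UrlPolicy` and `is_allowed` are hypothetical and do not come from SBFilter itself. The sketch also assumes that a client appearing on neither list falls back to a configurable default, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class UrlPolicy:
    """Allow and Disallow lists attached to one URL (entries are client IPs)."""
    allow: set = field(default_factory=set)
    disallow: set = field(default_factory=set)


def is_allowed(policies, url, client_ip, default=True):
    """Decide whether client_ip may access url.

    policies maps a URL string to its UrlPolicy. The `default` outcome for
    clients on neither list is an assumption of this sketch.
    """
    policy = policies.get(url)
    if policy is None:
        return default
    # A Disallow entry always denies the access, as stated in the text.
    if client_ip in policy.disallow:
        return False
    if client_ip in policy.allow:
        return True
    return default


# Lists can be modified at run time simply by mutating the sets.
policies = {"http://example.com/games/": UrlPolicy(disallow={"10.0.0.7"})}
policies["http://example.com/games/"].allow.add("10.0.0.8")
```

Because the lists are ordinary sets keyed by URL, both the lists and the URL-to-list associations can be changed while the filter is running, which mirrors the generalization mentioned above.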
To be effective, the access filter has to be able to intercept all web requests from the machines whose access to the Internet is to be managed. A natural choice is the Internet service provider (ISP) to which an educational institution or a corporate organization is connected. When a web access arrives at the ISP, the access filter consults the Allow and Disallow lists associated with the accessed URL, using the client's IP address, and determines whether the access should be denied. Given the industry trend that individual ISPs strive to differentiate market segments by offering special value-added services, such an access management feature appears to be a crucial ingredient of the service portfolio of future ISPs.
An important design goal of Internet access management systems is transparency, i.e., the existence of access filters should be completely invisible to end users. This calls for an efficient filter engine architecture that can make the admission/rejection decision for each incoming web access without increasing user-perceived delays. In this paper, we present the design and implementation of a high-performance and flexible URL request filter engine called SBFilter, which acts in conjunction with a proxy server to provide Internet access management. Using this filter, one can set up filtering rules based on source addresses on a per-host or a per-subnet basis, as well as on target pages down to individual file system subdirectory granularity. SBFilter improves request filtering performance through filter caching and multiple filter processes running on one or multiple processors.
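To make the rule granularity concrete, the following Python sketch matches a request against rules keyed by source host or subnet and by target directory. It is an illustration only: the `Rule` fields, the first-match-wins ordering, and the default-allow outcome are assumptions of this sketch, not SBFilter's actual rule format or semantics.

```python
import ipaddress
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Rule:
    source: str       # single host ("10.0.0.5") or subnet ("10.0.0.0/24")
    host: str         # target web server name
    path_prefix: str  # target directory, e.g. "/games"
    allow: bool       # True = admit, False = reject


def rule_matches(rule, client_ip, url):
    """Return True if the request falls under this rule."""
    # A bare host address is parsed as a one-address network (/32 for IPv4).
    network = ipaddress.ip_network(rule.source, strict=False)
    if ipaddress.ip_address(client_ip) not in network:
        return False
    parts = urlsplit(url)
    if (parts.hostname or "") != rule.host:
        return False
    path = parts.path or "/"
    prefix = rule.path_prefix.rstrip("/")
    # Match the directory itself or anything beneath it, but not siblings
    # that merely share a name prefix (e.g. "/gamesroom").
    return path == prefix or path.startswith(prefix + "/")


def decide(rules, client_ip, url, default=True):
    """First matching rule wins (assumed ordering); otherwise use default."""
    for rule in rules:
        if rule_matches(rule, client_ip, url):
            return rule.allow
    return default


rules = [Rule("10.0.0.0/24", "example.com", "/games", allow=False)]
```

With these rules, any client in 10.0.0.0/24 is rejected for pages under `/games` on `example.com`, while clients outside the subnet and requests to other directories fall through to the default.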
SBFilter strikes a balance between convenience and freedom of speech. Arbitrarily imposing a global filtering rule set upon all the customers of an ISP may be unconstitutional, since it may infringe upon certain customers' right to information. SBFilter provides a convenient tool for regulating Internet access in a way that is better tailored to each individual's right to information.
SBFilter uses an intelligent two-level caching scheme for fast URL request filtering. The access control rules can be specified in a simple text format and used to update the filter rule database while the proxy server is running. Because the filter engine is logically separate from the proxy server itself, its presence is completely transparent to the end user. It can also run in a clustered or non-clustered mode to provide different tradeoffs between flexibility and performance.