Gage: Quality of Service Guarantee for Cluster-Based Internet Service


Faculty: Tzi-cker Chiueh

Group Members: Chang Li, Gang Peng, Kartik Gopalan


Motivation

As Internet services evolve from a novelty into an essential building block for commercial enterprises' internal operations and external trading and commerce activities, corporate users are demanding guaranteed quality of service (QoS) from Internet service providers (ISPs) through service level agreements (SLAs), to hedge their Web-based systems against the risk of unpredictable disruptions or breakdowns. Meanwhile, as PC clusters become the de facto computing platform for scalable Internet services, ISPs need a software system that can provide guaranteed QoS on a per-user basis on large-scale PC clusters.

Project Description

Several projects address the problem of providing QoS for cluster-based Internet server systems. However, they either provide only differentiated rather than guaranteed QoS, or require significant changes to the kernel of the operating system the cluster runs on. The Gage project aims to develop a scalable, QoS-aware resource scheduler for cluster-based Internet services that provides guaranteed QoS on a per-user basis and is largely independent of both the type of Internet service and the hardware/software platform of the underlying cluster.

Gage consists of three components: request classification, request scheduling, and resource usage accounting. Each Internet service subscriber signs up for a pre-specified level of QoS and is allocated its own request queue. When a request arrives, the request classification module determines which per-subscriber queue it belongs in. Requests within a queue are serviced in FIFO order, while the request scheduling module decides which per-subscriber queue to serve next so that every subscriber's QoS requirement is met. Because different requests, even for the same Internet service, may consume very different amounts of system resources, the resource usage accounting module captures detailed resource usage information for each subscriber's requests and feeds it back to the request scheduler. The scheduler can then allocate system resources dynamically, according to both the subscribed QoS requirements and the run-time resource consumption. Together, these three components effectively partition a given cluster into multiple sub-clusters, each of which meets one subscriber's QoS requirement without interfering with the others, regardless of the nature of the Internet service.
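As a rough illustration of classification followed by per-subscriber FIFO queuing, the sketch below assigns each incoming request to its subscriber's queue. The text does not say how Gage classifies requests; the Host-header rule, the `Request` and `Classifier` names, and the subscriber ids are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    host: str  # hypothetical: HTTP Host header naming the subscriber's site
    path: str


@dataclass
class Classifier:
    # Hypothetical classification rule: Host header -> subscriber id.
    host_to_subscriber: dict
    # One FIFO queue per subscriber, created from the rule table.
    queues: dict = field(default_factory=dict)

    def __post_init__(self):
        for sub in set(self.host_to_subscriber.values()):
            self.queues.setdefault(sub, deque())

    def enqueue(self, req: Request) -> str:
        """Classify req and append it to the tail of its subscriber's queue."""
        sub = self.host_to_subscriber.get(req.host)
        if sub is None:
            raise KeyError(f"no subscriber registered for host {req.host!r}")
        self.queues[sub].append(req)
        return sub

    def next_for(self, sub: str):
        """Pop the oldest pending request for sub (FIFO), or None if empty."""
        q = self.queues[sub]
        return q.popleft() if q else None


c = Classifier({"a.example.com": "A", "b.example.com": "B"})
c.enqueue(Request("a.example.com", "/index.html"))
c.enqueue(Request("a.example.com", "/report.cgi"))
c.enqueue(Request("b.example.com", "/"))
print(c.next_for("A").path)  # /index.html
```

Classification only picks the queue; which queue is served next is left entirely to the scheduler.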

The figure below shows the system architecture of Gage:

[Figure: System architecture of Gage]

This implementation of Gage targets a Web server cluster running on Linux. The front-end node, the RDN, hosts the request classification and request scheduling modules, while each back-end node, an RPN, hosts the resource usage accounting module. The design involves several key issues.

To address the first issue, we use a variation of the TCP splicing technique. In TCP splicing, a logical TCP connection is split into two real TCP connections: one between the client and an intermediate node (the RDN), and another between the intermediate node and the host providing the service (an RPN). Normally, the intermediate node forwards IP packets from each end to the other. However, this simple form of TCP splicing imposes substantial overhead on the RDN, because it must relay the entire response to every request from the RPN to the client. In our approach, the RPN sends the response directly back to the client rather than through the RDN, as shown in the figure above.
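Because the two real connections of a splice each have their own initial sequence numbers (ISNs), segments crossing between them would need their sequence and acknowledgment numbers translated. The sketch below is a minimal illustration, assuming the RDN forwards the client's request bytes to the RPN unchanged; `Splice` and its field names are hypothetical. For the RPN to reply directly, its response segments would also have to carry the RDN's address and port as their source, which this sketch does not model.

```python
from dataclasses import dataclass

MOD = 1 << 32  # TCP sequence numbers wrap around modulo 2**32


@dataclass(frozen=True)
class Splice:
    """Hypothetical state for one spliced connection (ISNs observed at setup)."""
    client_isn: int  # client's ISN on the client<->RDN connection
    rdn_isn: int     # RDN's ISN on the client<->RDN connection
    rdn2_isn: int    # RDN's ISN on the RDN<->RPN connection
    rpn_isn: int     # RPN's ISN on the RDN<->RPN connection

    def to_rpn(self, seq: int, ack: int) -> tuple:
        """Rewrite a client segment's (seq, ack) for the RDN<->RPN connection."""
        return ((seq - self.client_isn + self.rdn2_isn) % MOD,
                (ack - self.rdn_isn + self.rpn_isn) % MOD)

    def to_client(self, seq: int, ack: int) -> tuple:
        """Rewrite an RPN response segment's (seq, ack) so the client sees it
        as part of its own connection with the RDN; this is what would let
        the RPN reply directly instead of relaying through the RDN."""
        return ((seq - self.rpn_isn + self.rdn_isn) % MOD,
                (ack - self.rdn2_isn + self.client_isn) % MOD)


s = Splice(client_isn=1000, rdn_isn=5000, rdn2_isn=70000, rpn_isn=MOD - 10)
# First response byte from the RPN, acknowledging a 100-byte request.
print(s.to_client(MOD - 9, 70101))  # (5001, 1101)
```

The offsets are fixed once both handshakes complete, so translating each segment costs a couple of additions rather than any per-byte copying.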

Request scheduling in Gage is essentially a weighted round-robin mechanism. In each time cycle, each request queue on the RDN is assigned credit according to its subscriber's resource reservation. Periodically, the request scheduler visits the queues in round-robin order and, within each queue's remaining budget, dispatches its requests to the RPN with the lightest load. In addition, when the RDN receives feedback from an RPN indicating that a request has completed, the scheduler selects a request from the same queue as the completed request and dispatches it to that RPN.
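The scheduling loop described above can be sketched as a credit-based weighted round-robin. This is an illustration under stated assumptions, not Gage's code: `CreditScheduler` is a hypothetical name, credit is in abstract units, each dispatch provisionally charges a fixed estimated cost, unused credit is not carried into the next cycle (deficits are), and completion-triggered dispatch is still gated by the queue's remaining credit.

```python
from collections import deque


class CreditScheduler:
    """Hypothetical credit-based weighted round-robin dispatcher for the RDN."""

    def __init__(self, reservations: dict, rpns: list, est_cost: float = 1.0):
        self.reservations = dict(reservations)  # subscriber -> credit per cycle
        self.credit = {s: 0.0 for s in self.reservations}
        self.queues = {s: deque() for s in self.reservations}
        self.load = {r: 0 for r in rpns}  # outstanding requests per RPN
        self.est_cost = est_cost  # assumed provisional charge per dispatch

    def submit(self, sub, req):
        self.queues[sub].append(req)

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def _dispatch(self, sub, rpn):
        req = self.queues[sub].popleft()
        self.credit[sub] -= self.est_cost
        self.load[rpn] += 1
        return (sub, req, rpn)

    def _eligible(self, sub):
        return bool(self.queues[sub]) and self.credit[sub] >= self.est_cost

    def run_cycle(self):
        """Grant this cycle's credit, then visit queues round-robin, one request
        per queue per pass, until no queue has both work and credit left."""
        for s, r in self.reservations.items():
            # Deficits carry over; unused surplus from the last cycle does not.
            self.credit[s] = min(self.credit[s], 0.0) + r
        dispatched = []
        progress = True
        while progress:
            progress = False
            for s in self.reservations:
                if self._eligible(s):
                    dispatched.append(self._dispatch(s, self._least_loaded()))
                    progress = True
        return dispatched

    def on_complete(self, sub, rpn):
        """Completion feedback from rpn: free its slot and, if the same queue
        still has work and credit, dispatch that queue's next request to it."""
        self.load[rpn] -= 1
        return self._dispatch(sub, rpn) if self._eligible(sub) else None


sched = CreditScheduler({"A": 3, "B": 1}, ["rpn1", "rpn2"])
for i in range(4):
    sched.submit("A", f"a{i}")
    sched.submit("B", f"b{i}")
print([sub for sub, _, _ in sched.run_cycle()])  # ['A', 'B', 'A', 'A']
```

With reservations of 3 and 1, subscriber A receives three dispatches per cycle to B's one, interleaved rather than in a burst, and each dispatch goes to whichever RPN currently has the fewest outstanding requests.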

The RPN part of Gage is implemented as a module. It detects the beginning and end of each request by intercepting the packets passing through, and captures the resource usage of each intercepted request by accounting for the resources consumed by the threads servicing it. Once a request has been serviced, the RPN module sends a feedback message containing the request's accurate resource usage back to the RDN, which uses this information to adjust the credit of the corresponding request queue. The implementation is mostly confined to the system call interface and a thin layer between the network layer and the link layer. Because of this simplicity, Gage can be readily ported to other Internet services and platforms.
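The accounting feedback loop can be sketched as follows, assuming the measured resources are the serving threads' CPU time and the bytes they send (the text does not list which resources Gage tracks). `RpnAccounting`, `Feedback`, and `adjust_credit` are hypothetical names, and credit is expressed here in CPU milliseconds. On the RDN side, the sketch refunds the estimated cost charged at dispatch and charges the measured cost instead, which is one plausible reading of "adjust the credit".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical feedback message from an RPN to the RDN."""
    subscriber: str
    request_id: int
    cpu_ms: float    # assumed resource: CPU time of the serving threads
    bytes_sent: int  # assumed resource: response bytes transmitted


class RpnAccounting:
    """Hypothetical per-request accounting on one RPN."""

    def __init__(self):
        self.owner = {}                # thread id -> request it is serving
        self.sub_of = {}               # request id -> subscriber
        self.cpu = defaultdict(float)  # request id -> CPU ms consumed
        self.sent = defaultdict(int)   # request id -> bytes sent

    def request_begin(self, req_id, subscriber):
        # In Gage the start of a request is detected by intercepting packets.
        self.sub_of[req_id] = subscriber

    def assign(self, thread_id, req_id):
        self.owner[thread_id] = req_id

    def charge_cpu(self, thread_id, ms):
        req = self.owner.get(thread_id)
        if req is not None:  # ignore threads not serving any request
            self.cpu[req] += ms

    def charge_send(self, thread_id, nbytes):
        req = self.owner.get(thread_id)
        if req is not None:
            self.sent[req] += nbytes

    def request_end(self, req_id):
        """Release the serving threads and build the feedback for the RDN."""
        for t in [t for t, r in self.owner.items() if r == req_id]:
            del self.owner[t]
        return Feedback(self.sub_of.pop(req_id), req_id,
                        self.cpu.pop(req_id, 0.0), self.sent.pop(req_id, 0))


def adjust_credit(credit: dict, fb: Feedback, charged_estimate: float) -> float:
    """RDN side: refund the estimate charged at dispatch, charge measured CPU."""
    credit[fb.subscriber] += charged_estimate - fb.cpu_ms
    return credit[fb.subscriber]


acct = RpnAccounting()
acct.request_begin(7, "A")
acct.assign(1, 7)
acct.assign(2, 7)
acct.charge_cpu(1, 4.0)
acct.charge_cpu(2, 1.5)
acct.charge_send(1, 1460)
fb = acct.request_end(7)
print(fb)  # Feedback(subscriber='A', request_id=7, cpu_ms=5.5, bytes_sent=1460)
```

Charging per thread rather than per process lets a multi-threaded server attribute work to individual requests even when many requests share one server process.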




Last Modified: 6/2/2003 by Gang Peng