Network Centric Buffer Cache Organization
 

Faculty: Tzi-cker Chiueh
Group Members:


Motivation

Network storage architecture separates bit movement from control processing. In this architecture, network file servers mainly perform name translation, access control, and the relaying of bits between network storage servers and clients. Because most of the bits exchanged between clients and network storage servers pass through the file server without further interpretation, in theory the file server should be able to relay them without incurring additional data copying overhead, much as a normal IP router does. In practice, however, this is rarely the case. For legacy reasons, most modern network file servers are implemented in a layered fashion, with each layer using its own internal data format. For example, NFS daemons are typically built on top of local file systems, which in turn may rest on an iSCSI module, which in turn sits on top of the TCP/IP stack. Copying and transforming data across layers is the simplest way to maintain the modularity of this layered architecture. The goal of this research is therefore to develop techniques that minimize data copying overhead for "pass-through" servers, such as NFS servers backed by network storage, while preserving their modular software architecture.

Unlike IP routers, a network file server backed by network storage not only relays bits between network storage servers and clients but also satisfies network file access requests from its local file cache, which is a very common case in practice. This reliance on a local file system cache makes it difficult to reduce the number of data copying operations in pass-through servers, because the format of file blocks in the file system cache is very different from the format of the same data in the network stack. For example, in a Linux-based NFS server backed by an iSCSI storage server, data is stored as 1500-byte sk_buff structures in the network stack and as contiguous 4-KByte or 8-KByte buffer chunks in the page/buffer cache. As a result, data must be explicitly copied whenever it moves between the file system cache and the network protocol stack.
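The format mismatch described above can be illustrated with a small user-space sketch. It is not kernel code: the 1448-byte per-packet payload size and the function names are assumptions chosen for illustration. It only shows that moving a block from packet-sized pieces into a contiguous page touches every byte once.

```python
# Illustrative sketch only; sizes and names are assumptions, not NCache code.
PAGE_SIZE = 4096            # contiguous page-cache block
PAYLOAD_PER_PACKET = 1448   # assumed payload carried by one 1500-byte frame


def split_into_packets(block, payload_size=PAYLOAD_PER_PACKET):
    """Network-stack view: one block arrives as several packet-sized payloads."""
    return [block[i:i + payload_size] for i in range(0, len(block), payload_size)]


def copy_into_page(payloads):
    """Conventional path: copy every payload into one contiguous page.

    Returns the page and the number of bytes physically copied.
    """
    page = bytearray(PAGE_SIZE)
    offset = 0
    for payload in payloads:
        page[offset:offset + len(payload)] = payload
        offset += len(payload)
    return page, offset


block = bytes(range(256)) * 16          # one 4096-byte file block
payloads = split_into_packets(block)    # what the network stack holds
page, copied = copy_into_page(payloads) # what the page cache wants
```

Sending the same block back out reverses the process, so under these assumptions a cached read costs a second full copy of the block.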

This project aims to develop an effective way to eliminate the unnecessary data copying that happens inside pass-through servers when data moves among different modules. At the same time, the modifications required to apply our technique should be kept minimal, so that it is portable to different platforms and friendly to legacy systems.

Basic Ideas

Recognizing that all data cached on a pass-through network file server will eventually be sent out on the network, we propose that the buffer/page cache in a pass-through NFS server be organized in a network-ready format, and that data be passed between the file system cache and the network stack through pointer manipulation. Each data item in the network-centric buffer cache (NCache) is called a network-centric buffer. It consists of a payload part that stores the file system data and a metadata part that stores the headers of the various network protocol layers, such as NFS, RPC, TCP/UDP, IP, and Ethernet. When a network packet containing normal file system data (i.e., non-metadata), for example an iSCSI read response or an NFS write request, arrives at the server, it is read into the network stack and cached in the network-centric cache without any modification. From this point on, both the in-kernel NFS server and the network stack access the cached data through pointers. When a cached data item is to be sent out over the network, it is moved directly from the network-centric buffer cache to the network interface card. Network packets that contain file system metadata are sent through the protocol stack in the usual way, because the pass-through network file server needs to interpret and possibly modify them. Since these packets are typically small, the overhead of physically copying them is not significant.
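The organization described above can be sketched in user space as follows. This is a minimal illustration, not the NCache implementation: the class names `NetBuffer` and `NCache`, the `(file_id, block_no)` key, the fixed header length, and the list standing in for the network interface are all assumptions. Python's `memoryview` plays the role of a pointer into the received packet.

```python
# Illustrative sketch only; all names here are hypothetical.
class NetBuffer:
    """One network-centric buffer: a metadata (header) part and a payload part."""

    def __init__(self, packet, header_len):
        self.packet = packet                # received packet, kept unmodified
        view = memoryview(packet)           # views share memory with the packet
        self.headers = view[:header_len]    # metadata part, rewritten per send
        self.payload = view[header_len:]    # payload part, never copied


class NCache:
    """Cache of network-ready buffers, indexed by (file_id, block_no)."""

    def __init__(self):
        self._blocks = {}

    def insert(self, key, buffers):
        self._blocks[key] = buffers         # stores references only

    def lookup(self, key):
        return self._blocks.get(key)

    def send(self, key, new_header, nic_queue):
        """Rewrite headers in place and hand the same buffers to the 'NIC'."""
        buffers = self._blocks[key]
        for buf in buffers:
            buf.headers[:] = new_header     # length must equal the header length
            nic_queue.append(buf.packet)    # a reference, not a copy
        return len(buffers)


packet = bytearray(b"OLDHDR__" + b"file data")   # 8-byte header + payload
cache = NCache()
cache.insert(("fh1", 0), [NetBuffer(packet, 8)])

nic_queue = []
sent = cache.send(("fh1", 0), b"NEWHDR__", nic_queue)
```

In this sketch only the header bytes are touched on the send path; the payload that reaches the queue is the same object that arrived from the network.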

Two important performance benefits accrue from the proposed NCache architecture. First and foremost, it eliminates unnecessary data copying within pass-through servers. Second, because cached data is stored in a network-ready format, the amount of work required to send out a cached data item is reduced substantially. For example, the protocol headers do not need to be repeatedly allocated and deallocated, as they are pre-allocated and stored in the cache. Also, the checksum of a cached block can be either pre-computed or inherited from the payload's originator, and does not need to be recalculated every time the block is sent out.
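The checksum reuse mentioned above relies on a property of the 16-bit ones' complement sum used by the Internet checksum: partial sums over even-length pieces can be combined. The sketch below is an illustration under that assumption, with hypothetical names (`CachedBlock`, `checksum_with_header`); it is not how NCache itself stores or inherits checksums.

```python
# Illustrative sketch only; class and function names are hypothetical.
def fold(total):
    """Fold carries above 16 bits back into the low 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_complement_sum(data):
    """16-bit ones' complement sum over big-endian words (Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"                     # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def finish(total):
    """Final checksum is the complement of the folded sum."""
    return ~total & 0xFFFF


class CachedBlock:
    """Payload whose partial sum is computed once, when it enters the cache."""

    def __init__(self, payload):
        self.payload = payload
        self.payload_sum = ones_complement_sum(payload)

    def checksum_with_header(self, header):
        # Assumes an even-length header so payload words stay aligned.
        header_sum = ones_complement_sum(header)
        return finish(fold(header_sum + self.payload_sum))


block = CachedBlock(b"hello world!")
header = b"\x45\x00\x00\x1c"
fast = block.checksum_with_header(header)               # sums only the header
slow = finish(ones_complement_sum(header + block.payload))  # sums everything
```

Each send then costs a sum over the header bytes only, while the sum over the payload is reused from the cache.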

Current Status

A fully operational prototype has been built on Linux kernel version 2.4.19. Excluding the standalone NCache module, it requires modifying fewer than 150 lines of code in the Linux kernel. NCache has been successfully integrated with the in-kernel NFS server and with an in-kernel static Web server, kHTTPd. Performance measurements show that NCache improves the throughput of the in-kernel NFS server backed by iSCSI storage by up to 90% over the original server, and yields a gain of up to 47% for the Web server.


Publications

Related Work



Last Modified: 1/25/2005 by Gang Peng