1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to scalable high speed distributive data storage and retrieval in a computer network environment.
2. Background of the Related Art
With the proliferation of computer networks such as the Internet, voluminous amounts of data are generated continuously by a new generation of applications such as messaging services, blogs, logs, and network monitors. Some of these applications can produce output data at a rate of several million bytes per second. Traditionally, computer disk storage has offered a reasonable speed for applications to store their data. Such storage is no longer adequate to meet the demands of these new applications. To compound the problem, network speed is increasing at a much faster rate than that of disk systems, while the need to perform parallel distributive processing on these huge datasets is becoming the norm rather than the exception. It has become essential for enterprises to look for alternative storage solutions. Many modern operating systems and data management systems provide high speed data storage and retrieval solutions built on top of an underlying file system. As more features, such as distributive access, indexed access, indexed sequential access, or hash access, are added to the basic file system, the time necessary to access data rises quickly.
Output generated by computer programs is generally produced in the form of a sequence of data records. Such a sequence of data records is destined for devices such as a printer, disk storage, a network adapter, or a display monitor. When data is destined for disk, it is normally cached in a buffer before being written onto the disk. When the cache is full, the cached data must be flushed onto the physical storage before the program can continue. In the meantime, execution of the program is blocked until the storage device is ready for more data. This method can only handle output data at a rate that is less than the maximum rate of the storage device. The assumption is that the program can wait for the output device to become available before writing to it again. For a program executing in a network data capture device, where each captured byte must be stored at the network data rate, the program can no longer wait for the disk device to flush its cache without risking data loss. In other words, the operation of such a program requires data to be written as a continuous stream without waiting for the disk storage, and conventional means of writing data to disk are no longer adequate.
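The conventional cache-then-flush behavior described above can be illustrated with a minimal sketch. It is not taken from any particular operating system or library: the class name `BlockingBufferedWriter`, the `capacity_bytes` parameter, and the use of a Python list as a stand-in for the disk are all assumptions made for illustration only.

```python
class BlockingBufferedWriter:
    """Illustrative only: caches records and flushes when the cache is full."""

    def __init__(self, sink, capacity_bytes):
        self.sink = sink                # hypothetical stand-in for the disk (a list)
        self.capacity = capacity_bytes  # assumed cache size in bytes
        self.cache = bytearray()
        self.flush_count = 0

    def write(self, record: bytes) -> None:
        # If the record does not fit, the caller is held up in flush()
        # until the cached data has been handed to the storage device.
        if len(self.cache) + len(record) > self.capacity:
            self.flush()
        # Records are assumed to be no larger than the cache capacity.
        self.cache += record

    def flush(self) -> None:
        if self.cache:
            # On a real device this call would block for the duration
            # of the physical write; here it is a plain list append.
            self.sink.append(bytes(self.cache))
            self.cache.clear()
            self.flush_count += 1


disk = []
writer = BlockingBufferedWriter(disk, capacity_bytes=8)
for rec in (b"abcd", b"efgh", b"ijkl"):
    writer.write(rec)   # the third write triggers a flush before it returns
writer.flush()          # push out whatever is still cached
```

In this sketch the program cannot accept the third record until the first two have been flushed, which is the blocking step that a network data capture program cannot afford.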
FIG. 1 shows a conventional data flow diagram where an application program (2) reads data from a synchronous input device (1) and writes the data to the disk storage device (3). The application program (2) has control over reading of the incoming data from the synchronous input device (1), and therefore is able to wait (i.e., halt an incoming data stream) when the disk storage device (3) falls behind. Such read and write operations are referred to as synchronized (or synchronous) read and write, respectively.
FIG. 2 shows a conventional data flow diagram where real time data is received from an asynchronous high speed data adapter (4) and is written to disk storage device (6) after it is processed by application program (5). When the incoming or output data rate reaches the maximum throughput rate (e.g., as limited by the disk storage device (6)), data loss or a process anomaly will occur if the asynchronous high speed data adapter (4) is receiving data on a continuous basis. Such read and write operations are referred to as asynchronous read and write, respectively.
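The difference between the synchronous arrangement of FIG. 1 and the asynchronous arrangement of FIG. 2 can be made concrete with a small discrete-time sketch. The function `simulate_capture`, its parameters, and the rates used below are hypothetical; they model only the general behavior described above, in which an input source that cannot be halted loses data once an intermediate buffer fills faster than the disk drains it.

```python
def simulate_capture(in_rate, disk_rate, buffer_capacity, ticks, source_can_wait):
    """Return (bytes_written, bytes_lost) after `ticks` time steps.

    All quantities are illustrative: bytes per tick for the rates,
    bytes for the buffer capacity.
    """
    buffered = written = lost = 0
    for _ in range(ticks):
        # The disk drains at most disk_rate bytes from the buffer per tick.
        drained = min(buffered, disk_rate)
        buffered -= drained
        written += drained

        # Only as much input as fits in the free buffer space is accepted.
        accepted = min(in_rate, buffer_capacity - buffered)
        if not source_can_wait:
            # Asynchronous source: bytes that do not fit are gone.
            lost += in_rate - accepted
        # Synchronous source: the remainder simply waits at the source.
        buffered += accepted
    return written, lost


# Input faster than the disk, source cannot be halted (as in FIG. 2).
async_result = simulate_capture(10, 6, 20, 10, source_can_wait=False)
# Same rates, but the source can be halted (as in FIG. 1).
sync_result = simulate_capture(10, 6, 20, 10, source_can_wait=True)
```

With these assumed figures, both runs write the same number of bytes to disk, but only the asynchronous run reports lost bytes. When `in_rate` does not exceed `disk_rate`, neither run loses data.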