Database management systems (DBMS) have traditionally dealt with structured data that is stored in rows and columns. Such data, in which each row is a tuple of column values, is also called relational data. A row of relational data is typically hundreds of bytes long, which is much smaller than the unstructured (or file) data that has traditionally been managed in file systems. A single file (or LOB datatype) object can range from tens of kilobytes to hundreds or thousands of megabytes. As a result, moving such large amounts of bulk data between the network and the disk differs from how a row is transferred between the network and the disk.
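To make the size gap concrete, the sketch below counts how many network packets an object of a given size occupies. The 8 KB packet payload (`PACKET_PAYLOAD_BYTES`) and the helper `packets_needed` are hypothetical values chosen for illustration; real payload sizes depend on the network stack in use.

```python
import math

# Hypothetical packet payload size; real values depend on the network stack.
PACKET_PAYLOAD_BYTES = 8 * 1024


def packets_needed(object_bytes: int, payload: int = PACKET_PAYLOAD_BYTES) -> int:
    """Number of network packets needed to carry object_bytes of data."""
    return max(1, math.ceil(object_bytes / payload))


row = 300                    # a relational row: hundreds of bytes
small_lob = 50 * 1024        # a LOB of tens of kilobytes
large_lob = 2000 * 1024**2   # a LOB of thousands of megabytes

for label, size in [("row", row), ("small LOB", small_lob), ("large LOB", large_lob)]:
    print(label, size, packets_needed(size))
```

Under these assumptions the row fits in a single packet, the 50 KB object needs 7 packets, and the 2000 MB object needs 256,000 packets.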
In a DBMS, relational data may be passed from the network to the underlying storage subsystem of the DBMS without any loss of performance. For example, storing relational data in a database may involve reading the data values from the network, writing them to a cache, and then storing them to disk. A cache is a collection of data that duplicates original values stored elsewhere or computed earlier; it is used when the original data is expensive to fetch or compute relative to retrieving it from the cache.
In a database management system, a large object may be “streamed” into the database management system as a large number of small network packets. If each network packet of a large object is passed from the network to storage one at a time, the performance of the database management system may suffer, because each packet requires its own space allocation, its own storage layer update, and multiple Input/Output (I/O) calls for a small amount of data. The piecemeal space allocation may also leave the large object fragmented on disk, and subsequent reads of the data may suffer from that fragmentation. The small, frequent storage layer updates and I/O calls make writing a large object inefficient. Furthermore, the small disk I/Os waste disk bandwidth, because each one incurs disk-head seek and rotational delay to write only a small amount of large object data.
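The toy model below illustrates why per-packet allocation fragments a large object and multiplies I/O calls. It assumes one packet fills one disk block, a simple allocator that always hands out the next free block, and two large objects streamed concurrently so that their allocations interleave; `BlockAllocator`, `count_extents`, and `stream_two_objects` are hypothetical names. The buffered case (`packets_per_write > 1`) is shown only as a contrast and is not presented as the method of this document.

```python
from dataclasses import dataclass


@dataclass
class BlockAllocator:
    """Toy disk: each request gets the next free run of block numbers."""
    next_free: int = 0
    io_calls: int = 0

    def allocate_and_write(self, n_blocks: int) -> range:
        start = self.next_free
        self.next_free += n_blocks
        self.io_calls += 1  # one allocation plus write per request
        return range(start, start + n_blocks)


def count_extents(blocks: list) -> int:
    """Count runs of consecutive block numbers (1 means fully contiguous)."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


def stream_two_objects(packets_per_object: int, packets_per_write: int):
    """Interleave packets of LOBs A and B; write every packets_per_write packets.

    Returns (extents used by LOB A, total I/O calls for both LOBs).
    """
    disk = BlockAllocator()
    placed = {"A": [], "B": []}
    pending = {"A": 0, "B": 0}
    for _ in range(packets_per_object):
        for lob in ("A", "B"):
            pending[lob] += 1
            if pending[lob] == packets_per_write:
                placed[lob].extend(disk.allocate_and_write(pending[lob]))
                pending[lob] = 0
    for lob in ("A", "B"):  # flush any partially filled batch
        if pending[lob]:
            placed[lob].extend(disk.allocate_and_write(pending[lob]))
    return count_extents(placed["A"]), disk.io_calls


print(stream_two_objects(64, 1))   # per-packet writes -> (64, 128)
print(stream_two_objects(64, 64))  # whole object buffered -> (1, 2)
```

With per-packet writes, every block of object A is separated from the next by a block of object B, so a 64-packet object ends up in 64 extents and the two objects together issue 128 I/O calls.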
FIG. 1 is a block diagram of a system that illustrates one approach for storing large objects. In FIG. 1, a Client 100 sends a Network Packet 102 containing data for a large object over a Network 104, and the Network Packet 102 is stored temporarily in a Network Component Buffer 106. The Network Component Buffer 106 is a proprietary data structure of the network package used for the Network 104. The contents of the Network Component Buffer 106 are then copied into the Database Buffer Cache 110. Next, the Database Server 108 allocates space on Disk 114 according to the network packet size and writes the contents of the Database Buffer Cache 110 to Disk 114, as shown with Packet1 of Large Object 116, Packet2 of Large Object 118, and Packet3 of Large Object 120.
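The per-packet flow of FIG. 1 can be sketched as a loop that, for every packet, copies the data into a buffer cache, allocates disk space sized to that packet, and writes the cache contents to disk. This is a simplified model under stated assumptions: `store_lob_per_packet` and `Fig1Counters` are hypothetical names, a `bytearray` stands in for Disk 114, and the counters only tally the operations named in the text.

```python
from dataclasses import dataclass


@dataclass
class Fig1Counters:
    memory_copies: int = 0
    space_allocations: int = 0
    disk_writes: int = 0


def store_lob_per_packet(packets: list) -> tuple:
    """Model the FIG. 1 flow: every packet is handled individually."""
    counters = Fig1Counters()
    disk = bytearray()  # stands in for Disk 114
    for packet in packets:  # each packet arrives in the Network Component Buffer 106
        # Copy 1: Network Component Buffer 106 -> Database Buffer Cache 110.
        buffer_cache = bytearray(packet)
        counters.memory_copies += 1
        # Space on disk is allocated for just this one packet.
        counters.space_allocations += 1
        # Copy 2: Database Buffer Cache 110 -> Disk 114.
        disk.extend(buffer_cache)
        counters.memory_copies += 1
        counters.disk_writes += 1
    return disk, counters


disk, counters = store_lob_per_packet([b"aa", b"bb", b"cc"])  # Packet1..Packet3
print(bytes(disk))  # b'aabbcc'
print(counters)     # Fig1Counters(memory_copies=6, space_allocations=3, disk_writes=3)
```

Every counter grows linearly with the number of packets, so a large object streamed as hundreds of thousands of packets incurs a correspondingly large number of copies, allocations, and small writes.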
As shown in FIG. 1, implementations of this approach strain the resources of the Database Server 108 when data for a large object is streamed from a Network 104. For example, for each network packet received, the approach in FIG. 1 requires expensive memory copies, first from the Network Component Buffer 106 to the Database Buffer Cache 110 and then from the Database Buffer Cache 110 to the Disk 114, which may place a strain on the Processor 112. As discussed above, the small space allocations, small storage layer updates, and small I/Os to disk degrade DBMS performance. Streaming the data for the large object from a Network 104 also results in fragmentation on Disk 114: because disk space is allocated upon receipt of each network packet, the space allocation does not result in contiguous blocks on disk. Thus, there is a need to reduce both the fragmentation and the overhead on the DBMS that result from storing a large object on disk.
Although embodiments are described with reference to a database server, it should be noted that the state maintenance in the access of a large object can also be used with other types of servers that store large objects.