1. Technical Field
This application relates to managing data storage for databases based on application awareness.
2. Description of Related Art
Storage devices are employed to store data that is accessed by computer systems. Examples of basic storage devices include volatile and non-volatile memory, floppy drives, hard disk drives, tape drives, optical drives, etc. A storage device may be locally attached to an input/output (I/O) channel of a computer. For example, a hard disk drive may be connected to a computer's disk controller.
As is known in the art, a disk drive contains at least one magnetic disk which rotates relative to a read/write head and which stores data in a non-volatile manner. Data to be stored on a magnetic disk is generally divided into a plurality of equal-length data sectors. A typical data sector, for example, may contain 512 bytes of data. A disk drive is capable of performing a write operation and a read operation. During a write operation, the disk drive receives data from a host computer along with instructions to store the data to a specific location, or set of locations, on the magnetic disk. The disk drive then moves the read/write head to that location, or set of locations, and writes the received data. During a read operation, the disk drive receives instructions from a host computer to access data stored at a specific location, or set of locations, and to transfer that data to the host computer. The disk drive then moves the read/write head to that location, or set of locations, senses the data stored there, and transfers that data to the host.
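The sector-based read and write operations described above can be illustrated with a minimal sketch. It assumes an in-memory byte array in place of a magnetic platter and 512-byte sectors; the class `SimulatedDisk` and its methods are hypothetical and model only the addressing, not head movement or timing.

```python
SECTOR_SIZE = 512  # bytes per sector, matching the typical figure in the text


class SimulatedDisk:
    """Hypothetical in-memory model of a disk as a run of fixed-size sectors."""

    def __init__(self, num_sectors: int) -> None:
        self.num_sectors = num_sectors
        self._media = bytearray(num_sectors * SECTOR_SIZE)

    def _check_range(self, sector: int, count: int) -> None:
        if sector < 0 or count < 0 or sector + count > self.num_sectors:
            raise ValueError("sector range out of bounds")

    def write(self, sector: int, data: bytes) -> None:
        """Store whole sectors of host data starting at `sector`."""
        if len(data) % SECTOR_SIZE:
            raise ValueError("data must be a whole number of sectors")
        self._check_range(sector, len(data) // SECTOR_SIZE)
        offset = sector * SECTOR_SIZE  # stands in for moving the head to the location
        self._media[offset:offset + len(data)] = data

    def read(self, sector: int, count: int) -> bytes:
        """Return `count` sectors starting at `sector` to the host."""
        self._check_range(sector, count)
        offset = sector * SECTOR_SIZE
        return bytes(self._media[offset:offset + count * SECTOR_SIZE])
```

A host request for sector 3 therefore touches bytes 1536 through 2047 of the media, which is why data is transferred in whole-sector units.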
A storage device may also be accessible over a network. Examples of such a storage device include network attached storage (NAS) and storage area network (SAN) devices. A storage device may be a single stand-alone component or may consist of a system of storage devices, such as in the case of Redundant Array of Inexpensive Disks (RAID) groups.
Virtually all computer application programs rely on such storage devices which may be used to store computer code and data manipulated by the computer code. A typical computer system includes one or more host computers that execute such application programs and one or more storage systems that provide storage.
The host computers may access data by sending access requests to the one or more storage systems. Some storage systems require that the access requests identify units of data to be accessed using logical volume (“LUN”) and block addresses that define where the units of data are stored on the storage system. Such storage systems are known as “block I/O” storage systems. In some block I/O storage systems, the logical volumes presented by the storage system to the host correspond directly to physical storage devices (e.g., disk drives) on the storage system, so that the specification of a logical volume and block address specifies where the data is physically stored within the storage system. In other block I/O storage systems (referred to as intelligent storage systems), internal mapping technology may be employed so that the logical volumes presented by the storage system do not necessarily map in a one-to-one manner to physical storage devices within the storage system. Nevertheless, the specification of a logical volume and a block address used with an intelligent storage system specifies where associated content is logically stored within the storage system, and from the perspective of devices outside of the storage system (e.g., a host) is perceived as specifying where the data is physically stored.
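The internal mapping used by an intelligent block I/O storage system can be sketched as a lookup table from (logical volume, block address) to a physical location. This is an illustration under stated assumptions only: the round-robin placement policy, the drive names, and the class `IntelligentArray` are hypothetical, and real systems use far more elaborate mapping.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    device: str  # hypothetical physical drive name
    block: int   # block number on that drive


class IntelligentArray:
    """Sketch of an intelligent block I/O system with an internal mapping table.

    Hosts address data only by (LUN, logical block); the array decides where it
    physically lives, here by spreading newly written blocks round-robin across drives.
    """

    def __init__(self, devices: list[str]) -> None:
        self._devices = devices
        self._next_free = {d: 0 for d in devices}  # next unused block per drive
        self._turn = 0
        self._map: dict[tuple[int, int], PhysicalLocation] = {}
        self._media: dict[PhysicalLocation, bytes] = {}

    def _allocate(self) -> PhysicalLocation:
        device = self._devices[self._turn % len(self._devices)]
        self._turn += 1
        location = PhysicalLocation(device, self._next_free[device])
        self._next_free[device] += 1
        return location

    def write(self, lun: int, block: int, data: bytes) -> None:
        key = (lun, block)
        if key not in self._map:
            self._map[key] = self._allocate()  # first write picks a physical home
        self._media[self._map[key]] = data

    def read(self, lun: int, block: int) -> bytes:
        return self._media[self._map[(lun, block)]]

    def physical_location(self, lun: int, block: int) -> PhysicalLocation:
        """Internal detail, exposed here only to show the mapping."""
        return self._map[(lun, block)]
```

Consecutive logical blocks 100 and 101 of LUN 0 end up on different drives, yet the host still perceives the (LUN, block) pair as the data's location.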
In contrast to block I/O storage systems, some storage systems receive and process access requests that identify a data unit or other content unit (also referred to as an object) using an object identifier, rather than an address that specifies where the data unit is physically or logically stored in the storage system. Such storage systems are referred to as object addressable storage (OAS) systems. In object addressable storage, a content unit may be identified (e.g., by host computers requesting access to the content unit) using its object identifier, and the object identifier may be independent of both the physical and logical location(s) at which the content unit is stored (although it need not be, because in some embodiments the storage system may use the object identifier to determine where a content unit is stored in a storage system). From the perspective of the host computer (or user) accessing a content unit on an OAS system, the object identifier does not control where the content unit is logically (or physically) stored. Thus, in an OAS system, if the physical or logical location at which the unit of content is stored changes, the identifier by which host computer(s) access the unit of content may remain the same. In contrast, in a block I/O storage system, if the location at which the unit of content is stored changes in a manner that impacts the logical volume and block address used to access it, any host computer accessing the unit of content must be made aware of the location change and then use the new location of the unit of content for future accesses.
One example of an OAS system is a content addressable storage (CAS) system. In a CAS system, the object identifiers that identify content units are content addresses. A content address is an identifier that is computed, at least in part, from at least a portion of the content (which can be data and/or metadata) of its corresponding unit of content. For example, a content address for a unit of content may be computed by hashing the unit of content and using the resulting hash value as the content address.
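A content address computed by hashing can be sketched in a few lines. This minimal example assumes SHA-256 as the hash function (the text says only "hashing") and uses an in-memory dictionary as the store; the class name `ContentAddressableStore` is hypothetical.

```python
import hashlib


class ContentAddressableStore:
    """Toy CAS: the object identifier is a hash of the content itself.

    The identifier encodes no physical or logical location, so the store could
    move the bytes internally without the identifier changing.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def content_address(content: bytes) -> str:
        # SHA-256 is an illustrative choice of hash function.
        return hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        address = self.content_address(content)
        self._objects[address] = content  # identical content maps to one entry
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]
```

Storing the same bytes twice returns the same address, and a host needs nothing but that address to retrieve the content later.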
Some storage systems receive and process access requests that identify data organized by file system. A file system is a logical construct that translates physical blocks of storage on a storage device into logical files and directories. In this way, the file system aids in organizing content stored on a disk. For example, an application program having ten logically related blocks of content to store on disk may store the content in a single file in the file system. Thus, the application program may simply track the name and/or location of the file, rather than tracking the block addresses of each of the ten blocks on disk that store the content.
File systems maintain metadata for each file that, inter alia, indicates the physical disk locations of the content logically stored in the file. For example, in UNIX file systems an inode is associated with each file and stores metadata about the file. The metadata includes information such as access permissions, time of last access of the file, time of last modification of the file, and which blocks on the physical storage devices store its content. The file system may also maintain a map, referred to as a free map in UNIX file systems, of all the blocks on the physical storage system at which the file system may store content. The file system tracks which blocks in the map are currently in use to store file content and which are available to store file content.
When an application program requests that the file system store content in a file, the file system may use the map to select available blocks and send a request to the physical storage devices to store the file content at the selected blocks. The file system may then store metadata (e.g., in an inode) that associates the filename for the file with the physical location of the content on the storage device(s). When the file system receives a subsequent request to access the file, the file system may access the metadata, use it to determine the blocks on the physical storage device at which the file's content is physically stored, request the content from the physical storage device(s), and return the content in response to the request.
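The store and retrieve path described above (free map lookup, block writes, inode update, then inode-driven reads) can be sketched as follows. This is a deliberately minimal, hypothetical model: a list of byte strings stands in for the device, the inode records only size and block numbers, and overwriting an existing file is not supported.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 512


@dataclass
class Inode:
    size: int = 0
    blocks: list[int] = field(default_factory=list)  # physical block numbers


class TinyFileSystem:
    """Hypothetical minimal file system: a free map plus one inode per file."""

    def __init__(self, num_blocks: int) -> None:
        self.device = [bytes(BLOCK_SIZE)] * num_blocks  # stands in for the disk
        self.free_map = [True] * num_blocks            # True means the block is available
        self.inodes: dict[str, Inode] = {}             # filename -> metadata

    def write_file(self, name: str, content: bytes) -> None:
        if name in self.inodes:
            raise FileExistsError(name)
        needed = -(-len(content) // BLOCK_SIZE)  # ceiling division
        free = [n for n, available in enumerate(self.free_map) if available]
        if len(free) < needed:
            raise OSError("not enough free blocks")
        inode = Inode(size=len(content))
        for i, number in enumerate(free[:needed]):
            chunk = content[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.device[number] = chunk.ljust(BLOCK_SIZE, b"\0")  # write to the device
            self.free_map[number] = False                        # mark block in use
            inode.blocks.append(number)
        self.inodes[name] = inode  # metadata links the filename to its blocks

    def read_file(self, name: str) -> bytes:
        inode = self.inodes[name]
        data = b"".join(self.device[n] for n in inode.blocks)
        return data[:inode.size]  # drop padding in the last block
```

A 600-byte file needs two 512-byte blocks, so its inode records two block numbers and the free map loses two entries.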
In general, because file systems provide computer application programs with access to data stored on storage devices in a logical, coherent way, file systems hide from application programs the details of how data is stored on storage devices. For instance, storage devices are generally block addressable, in that the smallest addressable unit of data is one block; multiple contiguous blocks form an extent. The block size depends on the devices involved; 512 bytes is typical. Application programs, by contrast, generally request data from file systems at byte granularity. Consequently, file systems are responsible for seamlessly mapping between the application program address space and the storage device address space.
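One step of that mapping, determining which blocks a byte-granular request touches, can be sketched as a small function. It assumes a fixed 512-byte block size and returns file-relative block indices; translating those to physical blocks (e.g., via an inode) is a separate step not shown here. The function name is hypothetical.

```python
BLOCK_SIZE = 512


def blocks_for_byte_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the block indices touched by a request for `length` bytes at `offset`."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size  # block holding the final requested byte
    return range(first, last + 1)
```

For example, a 4-byte read at offset 510 straddles a block boundary, so the file system must fetch blocks 0 and 1 and return only bytes 510 through 513.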
File systems store data on storage devices as volumes, i.e., collections of data blocks that each hold one complete file system instance. These storage devices may be partitions of single physical devices or logical collections of several physical devices. Computers may have access to multiple file system volumes stored on one or more storage devices.
File systems maintain several different types of files, including regular files and directory files. Application programs store and retrieve data from regular files as contiguous, randomly accessible segments of bytes. With a byte-addressable address-space, applications may read and write data at any byte offset within a file. Applications can grow files by writing data to the end of a file; the size of the file increases by the amount of data written. Conversely, applications can truncate files by reducing the file size to any particular length. Applications are solely responsible for organizing data stored within regular files, since file systems are not aware of the content of each regular file.
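The byte-addressable behavior of regular files (writing at any offset, growing by appending, truncating to an arbitrary length) can be demonstrated with standard POSIX-style file operations. This sketch uses Python's `tempfile` and `os.fstat` on a scratch file; the helper names are hypothetical.

```python
from __future__ import annotations

import os
import tempfile


def file_size(f) -> int:
    f.flush()  # push buffered writes to the OS before asking for the size
    return os.fstat(f.fileno()).st_size


def demo_regular_file() -> tuple[list[int], bytes]:
    sizes = []
    with tempfile.TemporaryFile() as f:
        f.write(b"hello world")   # a new file grows to 11 bytes
        sizes.append(file_size(f))
        f.seek(6)
        f.write(b"WORLD")         # overwrite at byte offset 6; size unchanged
        sizes.append(file_size(f))
        f.seek(0, os.SEEK_END)
        f.write(b"!!")            # writing at the end grows the file by 2 bytes
        sizes.append(file_size(f))
        f.truncate(5)             # truncate to an arbitrary length
        sizes.append(file_size(f))
        f.seek(0)
        content = f.read()
    return sizes, content
```

The file system records only the byte sequence and its length; what those bytes mean is left entirely to the application.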
Files are presented to application programs through directory files that form a tree-like hierarchy of files and subdirectories containing more files. A filename is unique within its directory but not necessarily within a file system volume. Application programs identify files by pathnames comprising the filename and the names of all encompassing directories. The complete directory structure is called the file system namespace. For each file, file systems maintain attributes such as ownership information, access privileges, access times, and modification times.
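Pathname resolution through the namespace tree can be sketched by walking one directory at a time from the root. This minimal model assumes directories are nested dictionaries and files are represented by an inode label; `NAMESPACE` and `resolve` are hypothetical names for illustration.

```python
# A directory maps names to either a subdirectory (dict) or a file (here, an inode label).
NAMESPACE = {
    "home": {"alice": {"notes.txt": "inode 12"}},
    "tmp": {"notes.txt": "inode 40"},  # same filename, different directory
}


def resolve(root: dict, pathname: str):
    """Walk a slash-separated pathname from the root, one component at a time."""
    node = root
    for component in (c for c in pathname.split("/") if c):
        if not isinstance(node, dict):
            raise NotADirectoryError(component)
        if component not in node:
            raise FileNotFoundError(pathname)
        node = node[component]
    return node
```

Both `/home/alice/notes.txt` and `/tmp/notes.txt` resolve to different files, which is why the full pathname rather than the bare filename identifies a file.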
I/O interfaces transport data among the computers and the storage devices. Traditionally, interfaces fall into two categories: channels and networks. Computers generally communicate with storage devices via channel interfaces. Channels transfer data predictably, with low latency and high bandwidth; however, channels typically span short distances and provide low connectivity. Performance requirements often dictate that hardware mechanisms control channel operations. The Small Computer System Interface (SCSI) is a common channel interface. Storage devices that are connected directly to computers are known as direct-attached storage (DAS) devices.
Computers communicate with other computers through networks. Networks are interfaces with more flexibility than channels. Software mechanisms control much of a network's operation, which gives networks flexibility but also results in higher latency and lower bandwidth. Local area networks (LANs) connect computers over medium distances, such as within buildings, whereas wide area networks (WANs) span long distances, such as across campuses or even across the world. LANs normally consist of shared media networks, like Ethernet, while WANs are often point-to-point connections, like Asynchronous Transfer Mode (ATM). Transmission Control Protocol/Internet Protocol (TCP/IP) is a popular network protocol for both LANs and WANs. Because LANs and WANs utilize very similar protocols, for the purpose of this application, the term LAN is used to include both LAN and WAN interfaces.
Recent interface trends combine channel and network technologies into single interfaces capable of supporting multiple protocols. For instance, Fibre Channel (FC) is a serial interface that supports network protocols like TCP/IP as well as channel protocols such as SCSI-3. Other technologies, such as iSCSI, map the SCSI storage protocol onto TCP/IP network protocols, thus utilizing LAN infrastructures for storage transfers.
In at least some cases, SAN refers to network interfaces that support storage protocols. Storage devices connected to SANs are referred to as SAN-attached storage devices. These storage devices are block and object-addressable and may be dedicated devices or general purpose computers serving block and object-level data.
Distributed file systems provide users and application programs with transparent access to files from multiple computers networked together. Distributed file systems may lack the high performance of local file systems because of resource sharing and a lack of data locality. However, the sharing capabilities of distributed file systems may compensate for their lower performance.
Architectures for distributed file systems fall into two main categories: NAS-based and SAN-based. NAS-based file sharing places server computers between storage devices and client computers connected via LANs. In contrast, SAN-based file sharing, traditionally known as "shared disk" or "shared storage", uses SANs to directly transfer data between storage devices and networked computers.
NAS-based distributed file systems transfer data between server computers and client computers across LAN connections. The server computers store volumes in units of blocks on DAS devices and present this data to client computers in a file-level format. These NAS servers communicate with NAS clients via NAS protocols. Both read and write data-paths traverse from the clients, across the LAN, to the NAS servers. In turn, the servers read from and write to the DAS devices. NAS servers may be dedicated appliances or general-purpose computers.
NFS is a common NAS protocol that uses central servers and DAS devices to store real-data and metadata for the file system volume. These central servers locally maintain metadata and transport only real-data to clients. The central server design is simple yet efficient, since all metadata remains local to the server. Like local file systems, central servers only need to manage metadata consistency between main memory and DAS devices. In fact, central server distributed file systems often use local file systems to manage and store data for the file system. In this regard, the only job of the central server file system is to transport real-data between clients and servers.
SAN appliances are prior art systems that consist of a variety of components including storage devices, file servers, and network connections. SAN appliances provide block-level, and possibly file-level, access to data stored and managed by the appliance. Despite the ability to serve both block-level and file-level data, SAN appliances may not possess the needed management mechanisms to actually share data between the SAN and NAS connections. The storage devices are usually partitioned so that a portion of the available storage is available to the SAN and a different portion is available for NAS file sharing. Therefore, for the purpose of this application, SAN appliances are treated as the subsystems they represent.
Another adaptation of a SAN appliance is simply a general purpose computer with DAS devices. This computer converts the DAS protocols into SAN protocols in order to serve block-level data to the SAN. The computer may also act as a NAS server and serve file-level data to the LAN.
File system designers can construct complete file systems by layering, or stacking, partial designs on top of existing file systems. The new designs reuse existing services by inheriting functionality of the lower level file system software. For instance, NFS is a central-server architecture that utilizes existing local file systems to store and retrieve data from storage devices attached directly to servers. By layering NFS on top of local file systems, NFS software is free from the complexities of namespace, file attribute, and storage management. NFS software consists of simple caching and transport functions. As a result, NFS benefits from performance and recovery improvements made to local file systems.
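Layering can be sketched as an upper component that adds one service (here, caching) and delegates everything else to the layer below. This is a hypothetical illustration of the stacking idea only; it does not model the NFS protocol, network transport, or cache consistency across clients, and both class names are invented.

```python
class LocalFileSystem:
    """Lower layer: owns namespace and storage (a stand-in for a local file system)."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


class CachingLayer:
    """Upper layer stacked on the lower one; it adds only a read cache."""

    def __init__(self, lower: LocalFileSystem) -> None:
        self._lower = lower
        self._cache: dict[str, bytes] = {}
        self.lower_reads = 0  # counts how often the lower layer was consulted

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            self.lower_reads += 1
            self._cache[path] = self._lower.read(path)
        return self._cache[path]

    def write(self, path: str, data: bytes) -> None:
        self._lower.write(path, data)  # write-through: the lower layer stays authoritative
        self._cache[path] = data
```

The upper layer never implements storage or naming itself, so any improvement to the lower layer benefits it automatically, which mirrors the advantage the text attributes to NFS.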
All database management systems (DBMSs) store and manipulate information. The relational approach to database management represents all information as “tables”. A “database” is a collection of tables, each table having rows and columns. In a relational database, the rows of a table represent records (collections of information about separate items) and the columns represent fields (particular attributes of a record). In conducting searches, a relational database matches information from a field (column) in one table with information from a corresponding field (column) of another table to produce a third table that combines requested data from both tables.
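Matching a field in one table against a field in another to produce a combined result is a relational join. The sketch below uses Python's built-in `sqlite3` module as an example DBMS; the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3


def join_example() -> list[tuple[str, str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 'disk'), (11, 2, 'tape'), (12, 1, 'cable');
        """
    )
    # Match orders.customer_id (a column of one table) against customers.id
    # (a column of another) to build a third, combined table.
    rows = conn.execute(
        """
        SELECT customers.name, orders.item
        FROM customers JOIN orders ON orders.customer_id = customers.id
        ORDER BY orders.id
        """
    ).fetchall()
    conn.close()
    return rows
```

Each result row combines a customer record with one of that customer's orders.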
All database management systems have some mechanism for getting at the information stored in a database. Such a mechanism involves specifying data retrieval operations, often called “queries” as described below, to search the database and then retrieve and display the requested information.
All databases require a consistent structure, termed a schema, to organize and manage the information. In a relational database, the schema is a collection of tables. Similarly, for each table, there is generally one schema to which it belongs. Once the schema is designed, the DBMS is used to build the database and to operate on data within the database.
Conventional client/server distributed systems provide a centralized data storage and access facility that can serve as a DBMS, for managing information in response to data queries and update transactions. As used herein, the terms “data query” or “query” mean read-only requests for data and the terms “update transaction” or “transaction” mean any read-write operations involving changes to the data stored in the database. Client systems are connected to a network, which is connected to an application server. The client systems have client software for interfacing with server software on the application server. The client software could be any software application or module providing a user interface for issuing data queries or update transactions, such as for example, DBMS-specific client applications or more generally a Web browser application. Similarly, the server software could be a software application provided specifically for processing users' database requests or could be an application capable of providing more generalized services, such as a web server.
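The distinction between a read-only data query and a read-write update transaction can be sketched with `sqlite3`, whose connection object used as a context manager commits on success and rolls back if an exception escapes. The `accounts` table and the transfer rule are hypothetical examples.

```python
from __future__ import annotations

import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict[int, int]:
    """A data query: read-only, changes nothing."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """An update transaction: both UPDATEs take effect together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (remaining,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    except ValueError:
        return False
    return True
```

A failed transfer leaves both balances exactly as they were before it started.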
The application server is connected to a DBMS server, which has a data store. The DBMS server has DBMS software for managing data in the data store. DBMS software is available from many vendors, for example, Oracle Corp. of Redwood Shores, Calif., Sybase Inc. of Dublin, Calif., and International Business Machines Corp. of Armonk, N.Y., among others. As known in the art, the application server and the DBMS server could be the same computer system or different computer systems. Moreover, the application server and the DBMS server could be in the same facility, or they could be located in physically separated facilities.
A challenge with such centralized DBMSs is the limited capacity for handling a very large number of data queries or transactions. By increasing the computing power of the computer host serving the DBMS one can improve the DBMS's capacity. However, even with capital investments in advanced hardware, a company will see limited returns in terms of increased DBMS capacity.
In an attempt to provide increased capacity, some conventional client/server applications have implemented replicated DBMS systems. In such systems, multiple DBMS servers and data stores are used to process user data queries and update transactions. With database replication, a single DBMS can be split into two or more participating systems. Each system handles a portion of the stored data as the "primary" resource, while others also store the data as a "secondary" resource. This provides both fault-tolerance (because of the duplicated data storage) and load balancing (because of the multiple resources for queries and updates).
In an example, when client systems are connected to a network, the client systems send data queries and update transactions to an application server, also connected to the network. The application server is connected to first and second DBMS servers via a load balancer and a switch. The first DBMS server has a primary database in one data store and a secondary database in another data store. Similarly, the second DBMS server has a primary database in one data store and a secondary database in another data store. In many replicated DBMS systems, the primary database served by one DBMS server is a secondary database served by a different server. For example, the primary database of the first DBMS server may be a replica of the secondary database of the second DBMS server, and the secondary database of the first DBMS server may be a replica of the primary database of the second DBMS server. In this manner, both DBMS servers can accommodate user requests thereby providing increased capacity. When the application server receives a user request, it passes the request on to the load balancer. The load balancer tracks the performance and loading of the DBMS servers to determine which server should be assigned the request. The switch provides increased communications bandwidth by separating the traffic according to the server designated to receive the request from load balancer.
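The load balancer's role of tracking server loading and assigning each request can be sketched as follows. The text does not specify the assignment policy, so this sketch assumes a simple "fewest outstanding requests" rule; the class names are hypothetical, and replication between primary and secondary databases is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DbmsServer:
    name: str
    outstanding: int = 0  # requests currently assigned: the "loading" being tracked


class LoadBalancer:
    """Assigns each request to the server with the fewest outstanding requests."""

    def __init__(self, servers: list) -> None:
        self.servers = list(servers)

    def assign(self) -> DbmsServer:
        server = min(self.servers, key=lambda s: s.outstanding)  # ties go to the first server
        server.outstanding += 1
        return server

    def complete(self, server: DbmsServer) -> None:
        server.outstanding -= 1  # the server finished a request
```

With two idle servers, consecutive requests alternate between them; once one server finishes its work, it becomes the preferred target again.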
Database replication has been an attractive technology for businesses that need increased reliability of database access (redundancy) or increased capacity beyond that available in one machine or locality (scalability). Although the concept of splitting the DBMS across multiple systems is simple, implementation has proved to be very complex. This complexity is realized in the form of additional systems management and programming effort. Even with this increased investment and complexity, it is recognized that many DBMS systems cannot adequately be scaled beyond two coupled systems.
In conventional DBMS systems, the data flow during a simple database query by a client system generally proceeds through the following steps. As would be apparent to those skilled in the art, additional steps may be necessary for more complex queries or for database update transactions. In any event, the basic communication flow across a boundary between the client system and the application server and across another boundary between the application server and the DBMS server is representative of at least many conventional DBMS systems.
The client system issues an application-specific request to the application server. The application server receives the request from the client system and forwards the request to the DBMS server via a conventional client application programming interface (API). In the present example, the client API is a Java database connectivity (JDBC) client driver. As known in the art, APIs are language and message formats or protocols used by one application program to communicate with another program that provides services for it. APIs allow application programs to be written according to a defined standard thereby simplifying the communications between applications. Another API commonly used for database systems is the open database connectivity driver (ODBC).
The DBMS server receives the request from the application server via a server API, which may be for example, a JDBC server driver. The DBMS server executes the database query to retrieve results requested by the client. The DBMS server sends the results back to the application server via the server API (e.g., a JDBC server driver). The application server receives the results via the client API (e.g., a JDBC client driver). The application server formats the results and sends them to the client system, which receives the results requested.
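The request flow above (client request, application server, DBMS, formatted results) can be sketched end to end. Python's DB-API `sqlite3` module stands in here for a JDBC driver and DBMS, which is an assumption for illustration; the `products` table and the request format are hypothetical.

```python
from __future__ import annotations

import sqlite3


class DbmsBackend:
    """Stands in for the DBMS server; sqlite3 plays the role of the DBMS."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
        self._conn.executemany(
            "INSERT INTO products VALUES (?, ?)",
            [("disk", 15), ("cable", 5), ("server", 900)],
        )

    def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        # The DBMS executes the query and returns the result rows.
        return self._conn.execute(sql, params).fetchall()


class ApplicationServer:
    def __init__(self, dbms: DbmsBackend) -> None:
        self._dbms = dbms  # in the text, this hop goes through a client API such as JDBC

    def handle(self, request: dict) -> str:
        # Translate the application-specific request into a database query.
        rows = self._dbms.execute(
            "SELECT name FROM products WHERE price <= ? ORDER BY name",
            (request["max_price"],),
        )
        # Format the results before returning them to the client system.
        return ", ".join(name for (name,) in rows)
```

The client only ever sees the application-specific request and the formatted reply; the SQL and the driver stay behind the application server.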
Recently developed technology (e.g., Greenplum Database) provides a system and method to transparently distribute DBMS resources across multiple platforms and multiple data servers, making them broadly accessible by dispersed users and developers over networks such as the Internet. This technology extends a centralized DBMS system by adding a Resource Abstraction Layer (RAL) to a conventional database driver normally used to access a DBMS. The RAL implements DBMS resources that mirror the functionality of a centralized DBMS, but may be physically located at different networked locations. The RAL allows a plurality of remote server units (RSUs), implemented throughout the network, to receive and respond to data requests in place of the DBMS server. Each RSU maintains a database cache of recently accessed data from which incoming requests may be satisfied and can process database requests on behalf of the DBMS server. The DBMS server is contacted only if the RSU cannot respond to the request with cached data. In this case, the DBMS server processes the request as if it had been received directly from the application server. Once the DBMS server has retrieved the results of the request, it sends them back to the RSU. The RSU provides the results to the application server and stores the data in the database cache for use with future requests.
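The RSU behavior described above, answering from a local cache and contacting the DBMS server only on a miss, can be sketched as follows. This is a hypothetical model: requests are reduced to hashable keys, and cache invalidation after updates, which a real system must handle, is not addressed in the text and is not modeled here.

```python
class CentralDbms:
    """Stands in for the centralized DBMS server."""

    def __init__(self, data: dict) -> None:
        self._data = dict(data)
        self.requests_served = 0  # how often the central server was contacted

    def query(self, key):
        self.requests_served += 1
        return self._data[key]


class RemoteServerUnit:
    """Answers requests from its cache; forwards misses to the DBMS server."""

    def __init__(self, central: CentralDbms) -> None:
        self._central = central
        self._cache: dict = {}

    def query(self, key):
        if key in self._cache:
            return self._cache[key]          # satisfied locally
        result = self._central.query(key)    # miss: forward the request to the DBMS server
        self._cache[key] = result            # keep the result for future requests
        return result
```

Repeating the same request at the RSU leaves the central server's request count unchanged after the first miss.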
Using this technology, distributed DBMS resources can be allocated using policies implemented within the RAL. For example, an RAL may distribute data requests according to geographic location, priority, time-of-day, and server load. The RAL maps distribution policies to physically distributed DBMS resources (RSUs) by managing data structures that represent the state of available RSU resources. Accordingly, this technology replaces what would normally be a singular resource with one that conforms to the policy. Policies may be entered or changed while the systems are running.
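Policy-based distribution over a table of RSU state can be sketched as a function that filters and ranks RSUs according to a mutable policy object. The specific rules here (a load ceiling plus an optional same-region preference) are assumptions chosen for illustration; priority and time-of-day policies are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RsuState:
    name: str
    region: str
    load: float  # fraction of capacity in use, 0.0 to 1.0


@dataclass
class Policy:
    max_load: float = 0.8      # skip RSUs busier than this when possible
    prefer_local: bool = True  # geographic affinity


def choose_rsu(client_region: str, rsus: list, policy: Policy) -> RsuState:
    """Pick an RSU according to the current policy (hypothetical rules)."""
    eligible = [r for r in rsus if r.load <= policy.max_load] or list(rsus)
    if policy.prefer_local:
        local = [r for r in eligible if r.region == client_region]
        eligible = local or eligible
    return min(eligible, key=lambda r: r.load)
```

Because the policy is plain data consulted on every request, changing a field such as `prefer_local` takes effect on the next request without restarting anything.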
This technology provides application developers with the important feature of transparency of the underlying database architecture. That is, an application program can take advantage of the benefits of load balancing and fault tolerance without the necessity for architecture-specific software coding.