As large computing enterprises continue their migration from centralized mainframes and volumes of data on direct access storage devices (DASD) and serial tape drives, the reliance upon open systems database technology has increased. The ability to quickly adapt mass storage systems to new platforms while retaining high performance and reliability will remain a key requirement for system integrators.
In earlier times, rooms full of DASD and tape drives were maintained by legions of storage operators who physically mounted and unmounted tapes and disk packs and moved them to and from libraries according to daily schedules and batch job instructions. Technology improvements allowed the use of self-contained “mass storage” units, using robotic arms to move archived storage media to and from the drive mechanisms in a matter of seconds. Further developments of storage media have enabled a cache model in which large masses of data are held in offline resources and smaller portions can be uploaded to the high-speed cache as necessary. Data availability has also been increased through the use of arrays of mirrored databases, either single or multi-threaded, for multiple simultaneous access capabilities.
Even with mirrored systems, operational concerns often require that the magnetic or other storage media be archived or “backed up.” There are also occasional system re-organizations or restructuring during which a database may be converted by copying it out in one format (“export”) and copying it back in (“import”) with a different format, or into a different structure. Back-up issues will also arise when converting from one database management system (DBMS) to another, or sharing databases. Application programs themselves may also request the operating system to make a large “save” of data files, usually after a modification is made to the file. Backing up large volumes of data can be a time-consuming and resource-intensive operation. Disadvantageously, mass storage systems may be unavailable while large back-up operations are performed.
FIG. 1 illustrates a typical system in which a host computer 10 is connected to a backup system 12 and a storage system 14. Creating a physical backup of the entire database 24 of the storage system 14 often requires a large investment of time and resources. In an open, networked array of storage devices, a physical backup of a database may be handled by arranging for a DBMS 22, such as provided by Oracle Corporation, to communicate with a dedicated back-up system 12, such as the EMC Data Manager, from EMC Corporation of Hopkinton, Mass. The DBMS vendor often supplies an Application Programming Interface (API) 20 that can be installed in the host computer to handle the scheduling of the regular backups. The DBMS typically reads the data from the database via the local DASD interface (such as a SCSI bus), and delivers a buffer of data through the API. The application running the backup may be customized or optimized for the particular mass storage system selected, such as the EMC Data Manager (EDM), which is optimized to run with EMC's Symmetrix storage system(s). The EMC backup application, or something similar, takes the necessary steps to send the data over the network using a connection-oriented protocol such as Transmission Control Protocol (TCP). The receiving backup system then sends the data to a mass storage unit 18, for example to write the data to an archive tape 18A. The major drawbacks of physical backup include that logical structures, such as tables of data, cannot be backed up. Further, data cannot be transferred between machines of differing operating systems. Additionally, data blocks cannot be reorganized into a more efficient layout.
Many of the major DBMS companies also provide a more generalized facility in which the data is exported as a standardized file, such as in ASCII format, as part of a so-called “logical backup.” The ASCII format permits the file to be imported into most other systems, without insurmountable compatibility problems. However, DBMS companies presently do not generally provide the API necessary for a customer to properly handle the data stream generated by the logical backup. The result is that many of the DBMSs generate very large backup files that have to be stored locally until they can be written to an archive device.
To overcome this disadvantage, some customers create their own primitive solution by attaching a physical tape drive to the machine. The logical backup data stream is then directed into a mechanism that Unix calls a “pipe,” buffered, and then directed (“piped”) to another Unix command, such as one that writes the data to the local tape drive 16, a DASD, or other demountable, writable media. A Unix pipe can be thought of as a FIFO (first-in first-out) data file having one process writing data into it serially and another process serially reading data out. When the pipe is “empty,” the reading process simply waits for more data to be written by the other process. Other non-Unix operating systems such as DOS and Windows NT emulate the Unix pipe in various ways with similar results. Logical data streams are thus directed from a database export into another process that disposes of the data to the physical storage media, thus freeing up storage resources.
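The FIFO behavior of a pipe described above can be illustrated with a minimal Python sketch. It is not taken from any DBMS or backup product: the function names `export_rows`, `archive`, and `run_pipeline` are hypothetical, and two threads within one program stand in for the separate export and archiving processes.

```python
import os
import threading

def export_rows(write_fd, rows):
    """Hypothetical stand-in for a database export: writes rows serially into the pipe."""
    with os.fdopen(write_fd, "wb") as sink:
        for row in rows:
            sink.write(row.encode("ascii") + b"\n")
    # Leaving the 'with' block closes the write end, which signals end-of-data.

def archive(read_fd, out):
    """Hypothetical stand-in for the archiving command: drains the pipe in arrival order."""
    with os.fdopen(read_fd, "rb") as source:
        # When the pipe is empty, iteration simply waits for the writer.
        for line in source:
            out.append(line.rstrip(b"\n").decode("ascii"))

def run_pipeline(rows):
    read_fd, write_fd = os.pipe()  # anonymous pipe: one read end, one write end
    archived = []
    reader = threading.Thread(target=archive, args=(read_fd, archived))
    reader.start()
    export_rows(write_fd, rows)
    reader.join()
    return archived
```

In this sketch the reader receives the rows in exactly the order in which the writer produced them, and neither side needs the complete export to exist as a local file.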
This primitive solution has several disadvantages. For one thing, it requires a physical tape drive 16 to be attached to the computer host 10 generating the backup. Alternatively, the logical backup could be piped to a command that writes the data onto disk or equivalent. However, this solution would require each such machine to have huge amounts of excess storage capacity. In either case, additional operations personnel must be assigned to handling the tapes and disks, and maintaining the drives. Extra storage devices, media libraries, and personnel also take up extra space in the facility. Another alternative would be to pipe the logical backup data stream into the network interface and send it to a different machine having a DASD or tape. When dealing with very large databases, these solutions could break down entirely, due to the operational difficulties of maintaining the necessary physical media, or open network connections.
The named pipe provides a standard mechanism that can be used by processes that do not have to share a common process origin for process-to-process-to-device transfers of large amounts of data. The data sent to the named pipe can be read by any authorized process that knows the name of the named pipe. In particular, named pipes are used in conjunction with the Oracle DBMS import/export utility to perform the logical backup/restores necessary to restructure or reorganize very large databases (VLDBs). Typically, the user creates a named pipe and runs an export utility specifying the named pipe as the output device. The DBMS sees the pipe as a regular file. Another process, for example a Unix command such as dd (convert and copy a file), cpio (copy files in and out), or rsh (execute a command on a remote system), then reads from the other end of the pipe and writes the data to actual media or the network. This technique is used to write export data to local disk/tape or over the network to available disk/tape on another machine.
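The named-pipe technique described above can be sketched as follows, assuming a Unix-like host on which `os.mkfifo` is available. The names `export_to`, `copy_from_pipe`, and `demo`, the file names, and the sample payload are hypothetical; the writer merely stands in for an export utility, and the reader plays roughly the role that a command such as dd would play.

```python
import os
import tempfile
import threading

def export_to(path, payload):
    # The exporter opens the pipe by name, just as it would open a regular file.
    with open(path, "wb") as out:
        out.write(payload)

def copy_from_pipe(fifo_path, dest_path, block_size=65536):
    # Reader side: copy blocks from the pipe until the writer closes its end.
    copied = 0
    with open(fifo_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            copied += len(block)
    return copied

def demo(payload=b"table EMP\n1,SMITH\n2,JONES\n"):
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "export.pipe")   # hypothetical pipe name
    dest = os.path.join(workdir, "archive.dat")   # stands in for tape or disk
    os.mkfifo(fifo)                               # create the named pipe
    writer = threading.Thread(target=export_to, args=(fifo, payload))
    writer.start()
    copied = copy_from_pipe(fifo, dest)
    writer.join()
    with open(dest, "rb") as f:
        return copied, f.read()
```

The two sides share nothing except the name of the pipe, which is why the reading command does not need to be started by, or otherwise related to, the exporting process.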
As mentioned, a disadvantage of the existing methods is the large amount of time it takes to perform backups, during which the database may be partially or completely offline due to read/write interlocks. Some of this delay can be reduced by segmenting export/backup files, and running several processes in parallel. Even though the logical backup process can be segmented into parallel streams by some DBMSs, the implementations may be proprietary and not necessarily adaptable for import to another DBMS. Also, disadvantageously, a dedicated disk or dedicated tape is required. Further, in known implementations, there is an inability to catalog multiple versions.
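The idea of segmenting an export and running the segments in parallel can be illustrated with a short Python sketch. It does not reproduce any vendor's implementation: the round-robin split, the function names, and the use of threads in place of separate export processes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import io

def split_into_segments(rows, n_segments):
    # Hypothetical round-robin split: row i goes to segment i % n_segments.
    return [rows[i::n_segments] for i in range(n_segments)]

def export_segment(segment):
    # Stand-in for one export stream: serialize the segment's rows as text lines.
    buf = io.StringIO()
    for row in segment:
        buf.write(row + "\n")
    return buf.getvalue()

def parallel_export(rows, n_segments=3):
    segments = split_into_segments(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        # map() returns the results in segment order, even if streams finish out of order.
        return list(pool.map(export_segment, segments))
```

Each returned string corresponds to one export stream. A format chosen this way is private to the exporter, which mirrors the portability concern noted above: another DBMS would need to know how the segments were formed in order to import them.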
VLDB reorganization and restructuring are major operations for which there is no known highly efficient solution. Current solutions do not use the data management services of an Enterprise Data Management Tool (such as EDM). A VLDB administrator has the ability to divide large tables into separate physical partitions. The partitions can be backed up, stored, exported and imported separately from the rest of the table. Customers consider this a critical feature that should be heavily used. Existing backup systems and APIs do not have the ability to export or import partitions in parallel using all available tape drives. Any API solution would be necessarily proprietary to the DBMS vendor, and not generalizable to other systems.