1. Field of the Invention
This invention relates to providing access to a raw data storage unit in a computer system, and, in particular, to providing such access along with some persistency.
2. Description of the Related Art
A “raw” data storage device is a data storage device that an operating system (OS) or other system-level software has allowed an application or other user-level software to use, without first creating a file system on the data storage device. The system software has no knowledge of the formatting or contents of the raw device. Thus, the raw device is a “black box” from the perspective of the system software. For example, a computer, having an OS and an application, may contain, or be connected in some manner to, a disk drive, a magnetic tape or some other data storage device, including, for example, a data storage device consisting of a plurality of LUNs (Logical Unit Numbers) within a SAN (Storage Area Network). If the OS has not placed a file system on the device, but the application has been allowed access to the device nonetheless, then the device is referred to as a raw device. In this case, the application may generally use the entire data storage device, storing and retrieving data in whatever format it chooses, without the constraints of a file system imposed by the OS. If an application or other user-level software places a file system on a data storage device, but the device does not contain a file system from the system software, then the device is nonetheless a raw device. A data storage unit (DSU), as described in greater detail below, is a uniquely identifiable data storage device or a uniquely identifiable portion of a data storage device. For example, in a SAN having multiple LUNs, with each LUN having a unique LUN ID (identification), each LUN is considered a DSU. Thus, a raw DSU is a DSU on which system-level software has not placed a file system or other system data.
For the purpose of this patent, if a DSU contains a file system, even one created by user-level software, then the DSU is referred to as a “formatted” DSU. Thus, for the purpose of this patent, a DSU is not considered a “formatted” DSU merely because it has some lower level formatting. If system-level software has placed a file system on a DSU, then the DSU is referred to as a “system-formatted” DSU.
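The terminology defined above can be summarized in a short sketch. The following Python fragment is illustrative only: the `DSU` and `Creator` names are hypothetical, and the model assumes that a DSU carries at most one file system and ignores the "other system data" case mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Creator(Enum):
    """Who placed the file system on the DSU (hypothetical model)."""
    SYSTEM = "system-level software"
    USER = "user-level software"


@dataclass
class DSU:
    dsu_id: str
    # None means that no file system has been placed on the DSU.
    file_system_creator: Optional[Creator] = None


def is_formatted(dsu: DSU) -> bool:
    # "Formatted": the DSU contains a file system, whoever created it.
    return dsu.file_system_creator is not None


def is_system_formatted(dsu: DSU) -> bool:
    # "System-formatted": system-level software placed the file system.
    return dsu.file_system_creator is Creator.SYSTEM


def is_raw(dsu: DSU) -> bool:
    # "Raw": system-level software has not placed a file system on the DSU.
    return not is_system_formatted(dsu)


unformatted = DSU("LUN-1")
user_formatted = DSU("LUN-2", Creator.USER)
system_formatted = DSU("LUN-3", Creator.SYSTEM)
```

Under this model, a DSU formatted only by user-level software is both "formatted" and raw, which matches the definitions given above.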
Raw DSUs may be advantageously used in a variety of situations, in a variety of computer systems. For example, raw disks can utilize advanced block-level SAN hardware features. Also, a server computer running a database application may be connected to a raw data storage device, so that the database application may use the entire raw device for data storage in a format that is custom to the database application and that is optimized for the database application. The database application may use the storage space of a raw device more efficiently than a system-formatted device because of a reduction in data overhead, and the setup and use of the raw device may be faster without any intervention by the system software. Also, use of a raw device may lead to greater flexibility in some situations. For example, a raw device may be transferred from one computer system to another, without the constraints of a system-formatted file system.
System-formatted data storage devices, however, provide numerous advantages of their own over raw devices. One simple but important advantage is the ability to use persistent names that are managed in a coherent local or global namespace system-wide. For example, if a computer is connected to a plurality of data storage devices and the computer is rebooted, then, after the reboot, the system software on the computer can read the file system on a system-formatted device to determine its contents, including the name of the device itself and the names of all directories and files on the device. The system software can then use these names and other information to find and access desired data, and to enable user-level software to find and access desired data. The file system on a system-formatted device also allows for the use of permissions/access control information. In contrast, for a raw device, the system software has no such structure from which to determine the device's identity and contents. In many situations, this lack of name persistency and various other limitations of raw devices may cause a variety of problems in existing computer systems, some of which are described below. FIGS. 1A and 2 show two different computer systems that include raw LUNs. Each of these computer systems may encounter problems using the raw LUNs under existing system software, in some situations.
FIG. 1A shows a computer system comprising a plurality of computers, including a first computer 10A, a second computer 10B and a third computer 10C, connected to a SAN 30. The SAN 30 comprises a first LUN 34A, a second LUN 34B and a third LUN 34C. In the example of FIG. 1A, the first LUN 34A is system-formatted, with a file system, while the second LUN 34B and the third LUN 34C are raw devices. The LUNs 34A, 34B and 34C may be from any of various types of data storage devices, such as disks or tapes, or some combination of different types of devices. FIG. 1A shows a conventional Fibre Channel network 32 providing an interface between the computers 10A, 10B and 10C and the LUNs 34A, 34B and 34C, although other data interfaces, either simpler or more complex, may also be used.
As shown in FIG. 1A, the system-formatted LUN 34A includes a conventional partition table 33A and a single partition 35A. The partition 35A includes a file system 36, including a directory 38 and a plurality of files, including a first file 40, a second file 42 and a third file 44. The file system 36 may be virtually any type of file system, such as a conventional file system. Various other structures or organizations for the system-formatted LUN 34A are also possible. For this description, the primary relevant characteristics of the LUN 34A are that the LUN is system-formatted, so that it contains a file system, with one or more files. The LUN 34B also includes a conventional partition table 33B, along with a first partition 35B and a second partition 37B. The LUN 34C is shown without any partition table or any distinct partitions. Together, the LUNs 34B and 34C illustrate the fact that raw data storage devices may either be divided into multiple partitions or they may be left as a single data storage area.
The computers 10A, 10B and 10C may be substantially the same as each other, or they may be quite different. The computer 10A, for example, may comprise conventional computer hardware 20, including one or more processors, system memory, etc. The computer hardware 20 may include, in particular, a first host bus adapter (HBA) 22A and a second HBA 22B for interfacing with the SAN 30. Alternatively, the computer hardware 20 may include other interface cards or devices for interfacing with other types of data storage devices or networks.
The computer 10A also includes system software 14 running on the computer hardware 20 and a set of applications 12, including a particular application 12A, running on the system software 14. The system software 14 may include any of a wide variety of OSs, such as a version of a Windows OS from Microsoft Corporation or a distribution of Linux. The system software 14 may also include other system software, such as an advanced storage multipath manager or other software units that provide other basic or advanced capabilities. In this patent, system software may be referred to as an OS for simplicity, although any reference to an OS is not intended to preclude software that provides other functionality that is not generally included in a basic OS.
The system software 14 provides functionality for managing interactions with attached or otherwise accessible data storage devices. This functionality may be conceptually grouped together into a generalized functional unit, which will be referred to as a data storage manager 50A. Thus, the data storage manager 50A shown in FIG. 1A manages interactions between the application 12A and the LUNs 34A, 34B and 34C, for example. As one particular example, the data storage manager 50A may enable the application 12A to read from and/or write to the first file 40 on the first LUN 34A. The data storage manager 50A may also enable the application 12A to read from and/or write to the second LUN 34B, as another example.
The functions provided by the data storage manager 50A may be conceptually divided into a plurality of more specific functional groups, each of which may be represented by a different functional unit. Thus, as shown in FIG. 1B, the data storage manager 50A may include some or all of the following functional units: an advanced file manager 52A, a file system manager 54A, a storage device manager 55A, a storage path manager 56A and a HBA driver 58A. The data storage manager 50A may also include additional functionality that might not be included in any of these functional groups. Actual implementations of data storage managers are not necessarily divided into these functional units.
The HBA driver 58A may be a conventional HBA driver that provides an interface with the HBAs 22A and 22B, sending data to the respective HBAs and retrieving data from the HBAs. The storage path manager 56A may be a conventional multipath manager, selecting a path through the Fibre Channel network 32, possibly providing various advanced functions such as automatic failovers and failbacks, as well as load distribution functions. The storage device manager 55A manages the interface to particular storage devices. For example, if the LUN 34A is a different type of storage device than the LUN 34B, then the storage device manager 55A generally interfaces with the two devices in a different manner.
The file system manager 54A may be a conventional file system manager, such as one found in conventional OSs. As is well known, file system managers provide a wide variety of functions related to interfacing with a file system, including providing a namespace for addressing files and providing access control functions. The file system manager 54A may enable the application 12A to open and then read from and/or write to the first file 40, for example. The advanced file manager 52A provides advanced functions for interfacing with a file system. For example, the advanced file manager 52A may provide a copy-on-write (COW) function for interfacing with files. Thus, for example, a COW function may be used to create a redo log for the first file 40. Redo logs (or delta logs) are known in the art and are described in greater detail below.
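As a rough illustration of the copy-on-write idea, the following Python sketch keeps a base file unchanged and records every write in a separate redo log. It is a generic, minimal sketch, not the mechanism of the advanced file manager 52A; the `RedoLog` class, its methods and the block-list representation of a file are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


class RedoLog:
    """Copy-on-write view of a base file held as a sequence of blocks."""

    def __init__(self, base: Sequence[bytes]) -> None:
        self._base = base                   # never modified by this class
        self._delta: Dict[int, bytes] = {}  # block index -> new contents

    def _check(self, index: int) -> None:
        if not 0 <= index < len(self._base):
            raise IndexError("block index out of range")

    def write(self, index: int, data: bytes) -> None:
        # Writes go to the log only; the base block is left untouched.
        self._check(index)
        self._delta[index] = data

    def read(self, index: int) -> bytes:
        # Prefer the logged block and fall back to the base block.
        self._check(index)
        return self._delta.get(index, self._base[index])

    def discard(self) -> None:
        # Dropping the log reverts the view to the original base contents.
        self._delta.clear()

    def merged(self) -> List[bytes]:
        # Contents that would result from applying the log to the base.
        return [self.read(i) for i in range(len(self._base))]


base_file = (b"alpha", b"beta", b"gamma")
log = RedoLog(base_file)
log.write(1, b"BETA")
```

After the write, `log.read(1)` returns the logged block while `base_file[1]` still holds the original contents, so the change can later be either merged or discarded.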
The functionality of the data storage manager 50A has been implemented in a variety of existing computer systems, any implementation of which may be used for the data storage manager 50A of FIGS. 1A and 1B. The data storage manager 50A may be implemented as a single software unit or as a combination of multiple software units and/or of portions of one or more software units. As another alternative, portions of the data storage manager 50A may be implemented in hardware. The data storage manager 50A may be quite complex, providing a wide variety of both simple and complex functions, or it may be simpler and provide some subset of such simple and complex functions. The data storage manager 50A may be comprised of standard software routines, including those found in a conventional OS, such as a Windows OS or a Linux distribution, or it may be a custom software unit designed specifically for a particular implementation.
Referring again to FIG. 1A, suppose that the system software 14 comprises a Linux distribution and the application 12A is a database application. Suppose further that the raw LUN 34C contains a database that is managed by the application 12A, and that the LUN 34C has been given the name /dev2 based on a prior scan of data storage devices accessible through the Fibre Channel network 32. Thus, suppose, as is common practice, that the application 12A accesses the database on the LUN 34C using the name /dev2.
Now suppose that the computer 10A is rebooted for some reason, such as a power failure. As is well known, the LUN 34C may not be given the same name, /dev2, the next time the system software 14 scans accessible storage devices. For example, if an additional data storage device has been attached to the computer 10A or to the SAN 30, then, depending on the order in which the storage devices are discovered by the system software 14, the LUN 34C may be given a different name, such as /dev3. Thus, the name given to a raw LUN by the system software is a nonpersistent name, meaning that it may change the next time the system software is rebooted or otherwise scans for accessible storage devices. Suppose, then, that the LUN 34B is given the name /dev2 this time around. Now, if the application 12A attempts to access the database using the name /dev2, the application will actually be accessing the LUN 34B, instead of the LUN 34C. Thus, the application 12A will not find the data that it is looking for and/or the application may corrupt the data that is stored on the LUN 34B.
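The renaming problem can be reproduced with a toy model in which each device is named purely by its position in the scan order. The following Python sketch is illustrative only: the `assign_names` function, the sequential naming scheme and both discovery orders are assumptions chosen to match the example above, not the behavior of any particular OS.

```python
from typing import Dict, List


def assign_names(discovery_order: List[str], prefix: str = "/dev") -> Dict[str, str]:
    """Name each device by its 1-based position in the scan order."""
    return {f"{prefix}{i}": dev for i, dev in enumerate(discovery_order, start=1)}


# Hypothetical discovery order before the reboot.
before = assign_names(["LUN 34A", "LUN 34C", "LUN 34B"])

# Hypothetical order after the reboot, with one newly attached device.
after = assign_names(["LUN 34A", "LUN 34B", "LUN 34C", "new device"])

database_name = "/dev2"  # name recorded by the application before the reboot
print(before[database_name])  # LUN 34C
print(after[database_name])   # LUN 34B
```

Because the name depends only on the scan position, nothing ties /dev2 to the device that actually holds the database.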
As another example, suppose the computer 10A again has system software 14 that comprises a Linux distribution and that the computer 10A is set up to boot off the raw LUN 34C. If something happens that causes the LUN 34C to be discovered at a different position in the sequence of discovered storage devices, the computer 10A may not even be able to boot up successfully. Many experienced and knowledgeable computer users who have used raw disks have encountered these problems, or a wide variety of other problems based on the limitations of raw devices.
FIG. 2 illustrates another computer system that includes raw devices and that can have similar problems with the use of the raw devices. The computer system of FIG. 2 also includes a plurality of computers, including a first computer 10G and a second computer 10H. The computers 10G and 10H are connected to the same Fibre Channel network 32 as is illustrated in FIG. 1A, which provides access to the same system-formatted LUN 34A and the same raw LUNs 34B and 34C, which are also illustrated in FIG. 1A. The computers 10G and 10H may be substantially the same as each other and as the computer 10A, or they may be quite different. For example, the computer 10G may comprise the same computer hardware 20 as the computer 10A, including the same HBAs 22A and 22B.
Some of the software that is loaded onto the computer 10G is different, however, from the software that is loaded onto the computer 10A. In this example, the computer 10G is used to host a virtual computer system. Thus, a kernel 68 for the virtual computer system is loaded onto the computer hardware 20. The kernel 68 supports one or more virtual machine monitors (VMMs), such as a first VMM 64A and a second VMM 64B. Each of the VMMs in this system supports a single virtual machine (VM), although other configurations are also possible. Thus, the first VMM 64A supports a first VM 63A and the second VMM 64B supports a second VM 63B. Any of a wide variety of known or new virtual computer systems may be implemented in the computer 10G. The computer 10H, along with other possible computers in the computer system, may also implement virtual computer systems, although this is not necessary.
Going into more detail, the kernel 68 includes a number of software modules for supporting both the VMMs 64A and 64B and the VMs 63A and 63B, including a virtual storage manager 69 and a data storage manager 50B. The virtual storage manager 69 allocates available data storage resources between the multiple VMs in the virtual computer system, including the VMs 63A and 63B. The virtual storage manager 69 may present the data storage resources allocated to a VM as one or more virtual LUNs, or in some other form. The data storage manager 50B may include substantially the same functionality as the data storage manager 50A. Thus, the data storage manager 50B may include an advanced file manager, a file system manager, a storage device manager, a storage path manager and a HBA driver, which may be substantially the same as the corresponding functional units illustrated in FIG. 1B and described above.
The VMM 64A may also include a number of software modules for supporting the VM 63A. For example, the VMM 64A may include emulation software that exports virtual hardware 60 for the VM 63A. The virtual hardware 60 may implement any of a wide variety of different computer architectures. For example, the virtual hardware 60 may implement the same hardware platform as the underlying physical computer hardware 20. In particular, the VMM 64A may include a HBA emulator 66 for exporting a virtual HBA 62 within the virtual hardware 60. The VMM 64B may be substantially the same as the VMM 64A, or it may be substantially different.
The VM 63A may be loaded with guest system software and user software, just like a physical computer system. Thus, for example, the same system software 14 and the same applications 12 that were described above in connection with FIG. 1A may be loaded onto the virtual hardware 60. In this case, however, the “guest system software” 14 within the VM 63A is not considered system software with respect to the physical computer hardware 20 or the LUNs 34A, 34B and 34C, because the guest system software 14 has no control over system-level functions within the actual hardware. Instead, in relation to the physical hardware of the computer system, the guest system software 14 is considered user-level software. Thus, if the guest system software 14 creates a file system on the raw LUN 34C, for example, the raw LUN 34C remains a raw LUN and does not become a system-formatted LUN. In this case the kernel 68 is the system-level software with respect to the physical hardware, and only the kernel 68 can add a file system to a raw DSU to convert it into a system-formatted DSU.
The virtual computer system implemented within the computer 10G may use any of the LUNs 34A, 34B and 34C in a wide variety of manners. For example, the virtual storage manager 69 may allocate the entire raw LUN 34C for use by the VM 63A. The virtual storage manager 69 may present the LUN 34C as a virtual raw LUN within the VM 63A, and it may present it as the only LUN accessible to the VM 63A. The system software 14 in the VM 63A might boot up off the raw LUN 34C, and/or software within the VM 63A, such as one of the applications 12, might access data on the LUN 34C, accessing the LUN through the virtual HBA 62. As another alternative, the virtual storage manager 69 may allocate a partition of the raw LUN 34B for use by the VM 63A and present it as an entire virtual raw LUN.
The virtual computer system implemented within the computer 10G may run into problems that are similar to the ones described above with respect to the computer system illustrated in FIG. 1A. Thus, for example, if the computer 10G is rebooted for some reason, the virtual storage manager 69 may identify the wrong raw LUN as the LUN that has been allocated to the VM 63A, based on the order in which the data storage devices are discovered during a scan of accessible data storage devices by the data storage manager 50B. Thus, the VM 63A may attempt to reboot from the wrong LUN and/or the applications 12 may attempt to access data on the wrong LUN. Accordingly, a variety of things can go wrong, such as the VM 63A not booting correctly or one of the applications 12 corrupting data on one of the LUNs.
What is needed, therefore, is a better way to provide access to a raw DSU, one that overcomes some of the current limitations on the use of raw DSUs. For example, it would be advantageous to provide access to raw DSUs in a manner that provides a persistent naming capability.