FIG. 1 is a block diagram illustrating a conventional distributed file system and logical volume management architecture of a computer system. As shown in FIG. 1, logical volume manager 62 (at a system tier above network storage systems 16) is implemented as a software layer beneath local file system layer 64. Network storage systems 16 (e.g., disks) store data that are arranged in files 50. By executing logical volume manager 62, local file system layer 64 is presented with a data storage view represented by one or more discrete data storage volumes 66, each of which is capable of containing a complete file system data structure. The specific form and format of the file system data structure is determined by the particular file system layer 64 employed. For example, physical file systems, including the New Technology Filesystem (NTFS), the Unix Filesystem (UFS), the VMware Virtual Machine Filesystem (VMFS), and the Linux Third Extended Filesystem (ext3FS), may be used as file system layer 64.
As is conventional for logical volume managers, each of data storage volumes 66 is functionally constructed by logical volume manager 62 from an administratively defined set of one or more data storage units representing LUNs (Logical Units). Where LUN storage, at least relative to logical volume manager 62, is provided by network storage systems 16, data storage volumes 66 are assembled from an identified set of data storage units externally presented by network storage systems 16. That is, logical volume manager 62 is responsible for functionally managing and distributing data transfer operations to various data storage units of particular target data storage volumes 66. The operation of logical volume manager 62 is transparent to application 68 executed directly by a computer system or by clients of the computer system.
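The volume assembly just described can be sketched as a simple concatenation of LUN extents. The sketch below is illustrative only; the class and field names are hypothetical and do not correspond to any actual logical volume manager interface.

```python
# Illustrative sketch: a data storage volume assembled by concatenating
# an administratively defined, ordered set of LUN extents, roughly as
# logical volume manager 62 is described as doing.

class LogicalVolume:
    def __init__(self, extents):
        # extents: ordered list of (lun_id, size_in_bytes) tuples
        self.extents = extents

    def resolve(self, volume_offset):
        """Map a byte offset within the volume to (lun_id, offset
        within that LUN) by walking the concatenated extents in order."""
        for lun_id, size in self.extents:
            if volume_offset < size:
                return lun_id, volume_offset
            volume_offset -= size
        raise ValueError("offset beyond end of volume")
```

For example, a volume built from a 100-byte LUN followed by a 200-byte LUN resolves offset 150 to offset 50 within the second LUN; this per-offset resolution is what lets the logical volume manager distribute data transfer operations across the data storage units of a target volume.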
FIG. 2A is an architectural block diagram showing a file system and logical volume manager in a virtual machine based or virtualized computer system 72. Computer system 72 is constructed on a conventional, typically server-class, hardware platform 74 that includes host bus adapters 76 (HBA 76) in addition to conventional platform processor, memory, and other standard peripheral components (not separately shown). Hardware platform 74 is used to execute virtual machine operating system 78 (VMKernel 78) that supports virtual machine execution space 80 within which virtual machines 821-82N (VMs 821-82N) are executed. Virtual machine operating system 78 provides services and support to enable concurrent execution of VMs 821-82N. In turn, each of VMs 821-82N implements a virtual hardware platform (for example, virtual HW 84) that supports execution of a guest operating system (for example, guest operating system 86) and one or more client application programs (for example, application(s) 88). The guest operating systems may be instances of Microsoft Windows, Linux, or NetWare-based operating systems; other guest operating systems can be equivalently used. In each instance, guest operating system 86 includes a native filesystem layer, typically either an NTFS or ext3FS type filesystem layer. These filesystem layers interface with virtual hardware platforms 84 to access, from the perspective of the guest operating systems, a data storage host bus adapter (HBA). The virtual hardware platforms (for example, virtual HW 84) implement virtual host bus adapters (for example, virtual HBA 90) that provide the appearance of the necessary system hardware support to enable execution of the guest operating systems transparently to the virtualization of the system hardware.
Filesystem calls initiated by the guest operating system to implement filesystem-related data transfer and control operations are processed and passed through the virtual HBAs (for example, virtual HBA 90) to adjunct virtual machine monitor layers (for example, VMM 921-92N) that implement virtual system support necessary to coordinate operation with VMKernel 78. In particular, a host bus adapter emulator (for example, HBA emulator 94) functionally enables data transfer and control operations to be ultimately passed to HBAs 76. System calls implementing data transfer and control operations are passed to virtual machine filesystem 64 (VMFS 64) for coordinated implementation with respect to ongoing operation of all of VMs 821-82N. That is, the native filesystems of the guest operating systems perform command and data transfer operations against virtual SCSI (Small Computer System Interface) devices presenting LUNs visible to the guest operating systems. These virtual SCSI devices are based on emulated LUNs actually maintained as files resident within storage space managed by VMFS 64. In this respect, VMFS 64 is to VMs 821-82N what storage system 16 (shown in FIG. 1) is to hardware platform 74. Permitted guest operating system command and data transfer operations against the emulated LUNs are mapped between LUNs visible to the guest operating systems and data storage volumes visible to VMFS 64. A further mapping is, in turn, performed by a VMKernel-based logical volume manager 62 to LUNs visible to logical volume manager 62 through data access layer 98, including device drivers (not specifically shown in FIG. 2A), and HBAs 76.
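The two-level mapping just described, from guest-visible LUNs to VMFS-resident files and from VMFS data storage volumes to physical LUNs, can be sketched as follows. All file names, volume names, and LUN identifiers here are hypothetical, chosen only to illustrate the structure of the mapping.

```python
# Hypothetical two-level mapping, for illustration only.

# Level 1: each LUN visible to a guest OS is an emulated LUN actually
# maintained as a file resident within storage managed by VMFS.
guest_lun_to_vmfs_file = {
    ("VM1", 0): "vol1/vm1_disk0",   # hypothetical VMFS file names
    ("VM2", 0): "vol1/vm2_disk0",
}

# Level 2: each VMFS data storage volume is in turn mapped by the
# VMKernel-based logical volume manager to LUNs visible through the
# data access layer and the HBAs.
vmfs_volume_to_physical_luns = {
    "vol1": ["physical_lun_A", "physical_lun_B"],
}

def resolve_guest_lun(vm, guest_lun):
    """Resolve a guest-visible LUN to its backing VMFS file and to the
    physical LUNs that back that file's volume."""
    vmfs_file = guest_lun_to_vmfs_file[(vm, guest_lun)]
    volume = vmfs_file.split("/")[0]
    return vmfs_file, vmfs_volume_to_physical_luns[volume]
```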
As explained above, in a virtualized computer system or any other type of computer system, a file system is typically required to provide pre-allocated (i.e., pre-grown) files to support sophisticated applications such as databases, virtual machines, etc. FIG. 2B shows in more detail how a file system manages access to files stored on a disk in a virtualized computer system. In FIG. 2B, it is assumed that disk 16 is a SCSI disk accessed through a SCSI interface, although other interfaces may be used to access disk 16. VMKernel 78 of virtualized computer system 72 includes SCSI virtualization layer 620, file system 64, logical volume manager 62, device access layer 98, and device driver 628 to manage access to files 50 on disk 16.
As indicated by FIG. 2B, an application running on VM 82 that accesses virtual disk 240 issues SCSI commands 282 to SCSI virtualization layer 620. In response, SCSI virtualization layer 620 issues file operations 284 to file system 64 based on SCSI commands 282. File system 64, which manages the creation, use, and deletion of files 50 stored on disk 16, converts file operations 284 to block operations 286 and provides block operations 286 to logical volume manager 62. In response, logical volume manager 62 issues raw SCSI operations 288 to device access layer 98 based on block operations 286. Device access layer 98 discovers physical storage devices such as disk 16 on a SAN (Storage Area Network) or inside a local server, and applies command queuing and scheduling policies to raw SCSI operations 288. Device driver 628 understands the input/output interface of HBA 76 (FIG. 2A), which interfaces with disk 16, and sends raw SCSI operations 288 received from device access layer 98 to HBA 76 to be forwarded to disk 16. Device driver 628 also manages HBA 76 directly and disk 16 indirectly, and is used by VMKernel 78 for interacting with HBA 76. Finally, file 50 residing on disk 16 is accessed.
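The first two translation steps above can be sketched as follows. The function names, and the assumption of a 1 MByte file system block and a 512-byte SCSI sector, are illustrative only, not actual VMKernel interfaces.

```python
# Illustrative sketch of the upper layers of the IO path in FIG. 2B.
# All names are hypothetical; sizes are example values.

BLOCK_SIZE = 1 << 20   # assumed 1 MByte file system block
SECTOR_SIZE = 512      # assumed 512-byte SCSI sector

def scsi_to_file_op(lba, num_sectors):
    """SCSI virtualization layer: map a guest SCSI command (logical
    block address, sector count) to a byte-addressed file operation
    on the virtual disk file."""
    return {"offset": lba * SECTOR_SIZE,
            "length": num_sectors * SECTOR_SIZE}

def file_op_to_block_ops(file_op):
    """File system layer: split a byte-range file operation into
    per-block operations for the logical volume manager."""
    ops = []
    offset, remaining = file_op["offset"], file_op["length"]
    while remaining > 0:
        block = offset // BLOCK_SIZE
        within = offset % BLOCK_SIZE
        chunk = min(remaining, BLOCK_SIZE - within)  # stop at block edge
        ops.append({"block": block, "offset": within, "length": chunk})
        offset += chunk
        remaining -= chunk
    return ops
```

Note how a file operation that straddles a block boundary is split into two block operations, one per file system block, before being handed to the logical volume manager.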
FIG. 3 shows a structure of a pre-allocated file stored on a disk. A pre-allocated file is a file that is grown to a given size at the time of creation by reserving a requested number of file blocks (disk space) at creation time. For example, file 50 in FIG. 3 is pre-allocated with a file size of 8 MByte, comprising 8 blocks 145-1, 145-2, . . . , 145-8, each with a size of 1 MByte. Each of blocks 145-1, 145-2, . . . , 145-8 comprises sectors 147 that have a size of, for example, 512 bytes. Although FIG. 3 illustrates blocks 145-1, 145-2, . . . , 145-8 as contiguous, the blocks of file 50 are not necessarily contiguous.
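Using the example sizes above (1 MByte blocks, 512-byte sectors, an 8 MByte file), a byte offset into file 50 can be located within a block and sector as follows; the function name is illustrative.

```python
BLOCK_SIZE = 1 << 20        # 1 MByte blocks, as in the FIG. 3 example
SECTOR_SIZE = 512           # 512-byte sectors
FILE_SIZE = 8 * BLOCK_SIZE  # 8 MByte pre-allocated file

def locate(byte_offset):
    """Map a byte offset within the pre-allocated file to
    (block index, sector index within that block)."""
    if not 0 <= byte_offset < FILE_SIZE:
        raise ValueError("offset outside the pre-allocated file")
    block = byte_offset // BLOCK_SIZE
    sector = (byte_offset % BLOCK_SIZE) // SECTOR_SIZE
    return block, sector
```

With these sizes, each block contains 1 MByte / 512 bytes = 2048 sectors, so, for instance, the first byte of the second block is block index 1, sector index 0.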
Pre-allocated files are useful for a variety of reasons. First, disk space is guaranteed for a pre-allocated file, so there is little or no risk of an application running out of disk space at runtime, because all required space is reserved at the time the pre-allocated file is created. Second, performance of an application using a pre-allocated file is enhanced because file system 64 (refer to FIG. 1 or to VMFS 64 of FIG. 2A) does not need to perform block allocation and the corresponding metadata IO (Input/Output) to change the file length as the application accesses newer regions of the file. Third, pre-allocated files typically have reduced fragmentation because all file blocks are allocated at the same time, and file system 64 can place the allocated blocks belonging to the same file as close to each other as possible. As such, a pre-allocated file has a high chance of using contiguous blocks on disk.
A disadvantage of pre-allocated files is that file system 64 needs to initialize (for example, zero out) all blocks of the file. If the blocks are not zeroed out, application 68 using file 50 (refer to FIG. 1) will be able to access stale data remaining on disk 16, which is undesirable for security and application isolation. If blocks are not zeroed out before they are accessed by a new application, a malicious application may read and interpret stale data that was written in the context of another application, long after that other application was terminated and its file was removed from the file system. This security vulnerability is similar to the case where an application or computer system uses an unwiped (unscrubbed) hard disk 16 that belonged to another application or computer system. The vulnerability is exacerbated in the case of pre-allocated files because file system block allocation is relatively dynamic compared to the reuse of hard disks.
Conventional file systems typically zero out all the blocks of a pre-allocated file when the pre-allocated file is created. However, zeroing out an entire file by performing disk writes at the time of creation of a pre-allocated file is expensive, time-consuming, and detrimental to the performance of the computer system because of the disk IO operations that are required to zero out the blocks. This is impractical for creating large pre-allocated files in the GByte range, which are very common among databases and VM types of applications. In addition, it can be wasteful because parts of the pre-allocated file that the application may never access are also zeroed out.
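The cost of this conventional eager-zeroing approach can be illustrated with a short sketch: creating the file issues one zero-filling write per block, regardless of whether the application ever touches that block. The toy disk model and function names below are hypothetical.

```python
BLOCK_SIZE = 1 << 20  # assumed 1 MByte file system block

class CountingDisk:
    """Toy disk model that only counts the write IO it receives."""
    def __init__(self):
        self.writes = 0
        self.bytes_written = 0

    def write(self, block_number, data):
        self.writes += 1
        self.bytes_written += len(data)

def create_preallocated_eager(disk, num_blocks):
    """Conventional eager zeroing: write zeroes to every block of the
    pre-allocated file at creation time."""
    zero_block = b"\x00" * BLOCK_SIZE
    for block_number in range(num_blocks):
        disk.write(block_number, zero_block)  # one write IO per block
    return num_blocks
```

Under these assumptions, the 8 MByte file of FIG. 3 costs 8 block-sized writes at creation, while a 1 GByte pre-allocated file costs 1024 such writes (a full GByte of IO) before the application has stored a single byte, including writes to regions the application may never access.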