The present invention is related to computer file access and in particular to improving file access in a proprietary storage system.
Network-based storage technology has become a common storage paradigm for enterprise storage needs. As an example, FIG. 1 shows a conventional network attached storage (NAS) system comprising a NAS gateway 0103 providing access to storage. In the example shown in FIG. 1, a storage area network (SAN) 0105 provides the high-capacity storage that is typically required by an enterprise.
The NAS gateway 0103 acts as a file server that is attached to one or more disk arrays (data stores) 0106, 0107 via the SAN 0105. Typically, clients access files (e.g., read, write) via the NAS gateway using a common protocol such as the network file system (NFS) protocol or the common internet file system (CIFS) protocol. Files are stored in a volume 0106a of a primary disk array 0106. The data structure, or format, of the file system in the volume of the primary disk array is generally proprietary in nature, and varies from one NAS gateway vendor to another. Thus, only the NAS gateway software of a particular vendor understands that vendor's file system format. Vendor-specific, proprietary formats are usually the result of the vendor's effort to increase data I/O performance, provide enhanced functionality of the file system beyond conventional functions, and so on.
A common operation of a NAS gateway is data backup. Typically, data is dynamically backed up using any of a number of known replication techniques, such as mirroring. Thus, the file system contained in the first volume 0106a is replicated on a volume 0107a of a secondary disk array 0107. An example of a secondary disk array is an ATA disk drive-based disk array. This type of storage is typically used for data backup, data archiving, and so on.
FIG. 1 also shows one or more clients 0101a, 0101b that access the NAS gateway 0103 via a suitable communication network 0102. A client communicates with the NAS gateway using an appropriate protocol, e.g., NFS or CIFS, to access files. The data access that occurs at the client level is conventionally referred to as file level access.
The communication network 0102 can be a local area network (LAN) or a wide area network (WAN). Techniques for connecting to such a network are well known. For example, in a LAN, the underlying physical layer is typically Ethernet, and the protocol is TCP/IP.
The SAN 0105 interconnects the NAS gateway 0103, the primary disk array 0106, and the secondary disk array 0107. The physical layer is typically Fibre Channel, and the communication protocol is typically Fibre Channel Protocol (FCP). An alternative communication protocol is iSCSI over Ethernet.
The figure also shows that one or more application servers 0104 might require access to replication data contained in the secondary data store 0107. Many types of applications might access the replication data; e.g., data backup applications, data analysis applications, and so on. Since the file system contained in data store 0106 is a proprietary system, so too is the replication data contained in data store 0107. Consequently, the application server is not able to access the data in the data store 0107 directly over the SAN 0105. Instead, the application server must access the data via the NAS gateway 0103.
Storing replication data as archive data is a requirement in many business operations. For example, government regulations require certain data, such as data collected by financial institutions and other financial market enterprises, to be archived for many years. Email archives, medical patient records, and the like also typically require archival for long periods of time. However, since the replicated data is contained in a proprietary format file system, that archived data becomes inaccessible if the provider of the proprietary file server system is no longer in business, or otherwise no longer supports the proprietary file system.
Continuous file activity in the file system contained in the data store 0106 invariably results in fragmentation. Files comprise one or more blocks allocated from the storage medium. As files are created and deleted, blocks are allocated from and returned to a pool of free blocks. Therefore, over time, a file is very likely to consist of non-contiguous blocks. The result is reduced file access performance, due to the need to move the read/write heads of the data storage system randomly about the storage media to access the blocks which comprise a file. File access performance in a primary data store such as data store 0106 may not be greatly affected because of the higher performance design of a primary data store. However, the data store 0107 typically used to contain the replication data is lower cost hardware. Lower cost disk drives typically are associated with lower performance access.
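The fragmentation effect described above can be illustrated with a minimal sketch. The allocation policy used here (always hand out the lowest-numbered free blocks) and the names allocate, count_extents, and demo are assumptions made purely for illustration; they do not represent the allocation scheme of any particular proprietary file system.

```python
def allocate(free_blocks, count):
    """Remove and return the `count` lowest-numbered blocks from the free pool.

    Assumed policy for illustration only: lowest-numbered free block first.
    """
    free_blocks.sort()
    taken = free_blocks[:count]
    del free_blocks[:count]
    return taken


def count_extents(blocks):
    """Return the number of contiguous runs in a file's block list.

    A value of 1 means the file is unfragmented; each additional run
    implies one more head repositioning when the file is read in order.
    """
    if not blocks:
        return 0
    breaks = sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)
    return 1 + breaks


def demo():
    """Create four files, delete two, then create a file that spans the holes."""
    free = list(range(16))  # a hypothetical 16-block volume
    files = {}
    for name in ("a", "b", "c", "d"):
        files[name] = allocate(free, 4)  # each file gets 4 contiguous blocks

    # Deleting files "a" and "c" returns their blocks to the free pool,
    # leaving two separate holes in the volume.
    for name in ("a", "c"):
        free.extend(files.pop(name))

    # A new 6-block file no longer fits in a single hole.
    files["e"] = allocate(free, 6)
    return files


if __name__ == "__main__":
    for name, blocks in demo().items():
        print(name, blocks, "extents:", count_extents(blocks))
```

In this sketch, the file created after the deletions occupies blocks drawn from two separate holes, so reading it sequentially requires a head movement that the originally created files do not incur.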
It is desirable therefore to improve access performance in a proprietary file system. It is desirable to provide improved performance while still providing for the replication/backup capability of conventional data storage systems.