The present invention generally relates to data storage and data access systems, and more particularly, to systems and techniques which provide access to shared data for applications that perform on computer systems.
The ability of modern computer and data processing systems to share data has largely contributed to the popularity and rapid expansion of computer networking industries such as the Internet. In response to this insatiable demand, computer system and software developers have created various prior art data sharing mechanisms to allow one or more computer systems to obtain access to data created, stored, or maintained by another computer system. Generally, computer systems that share data do so over a network using a standardized client/server protocol for data exchange. Many such client/server protocols exist, examples of which include database access protocols, file sharing protocols, and world wide web (WWW) based protocols. Other data sharing systems have been developed to allow two computer systems to share data from a commonly shared storage device having a direct connection to each computer.
FIG. 1 illustrates an example of a commonly used prior art client/server data sharing mechanism called the "Network File System (NFS)." Sun Microsystems, Inc. of Mountain View, California developed NFS and owns NFS as a trademark. Many commercial operating systems incorporate NFS and its widespread use has allowed NFS to become an industry standard for sharing data between networked computer systems. In the illustrated example, a mainframe computer 102 operates using the MVS operating system 105 to allow software applications (e.g., a database program, not specifically shown) that execute on the mainframe 102 to create and store data in records and MVS data sets (not shown) within the storage device 110 according to an MVS-specific format. An NFS Server 109, provided as part of the MVS operating system 105, "understands" how to properly access the MVS data stored in the MVS data sets within the storage device 110. In other words, the NFS server 109 is customized for MVS and can read MVS data sets. The NFS Server 109 can "export" MVS data maintained within the storage device 110 onto the network 113 (e.g., a TCP/IP network) for access by other remote computer systems such as the Unix workstation 101. The NFS Server 109 only allows local file systems to be exported in this manner. In other words, the same computing system (i.e., mainframe 102 in this example) that maintains and manages file systems and data in the storage device 110 must execute the NFS server 109 which can export those file systems.
A systems manager (a person, not shown) responsible for managing workstation 101 can configure the NFS client 108, provided as part of the Unix operating system 104, to "mount" the MVS file system that is "exported" by the NFS server 109. Once the NFS client 108 has "mounted" the "exported" file system over the network 113, the application 106 that executes on workstation 101 can have access to (e.g., can read and write) data on the storage device 110 via the NFS client 108. Generally, the NFS client 108 provides such data access to the application 106 over the network 113 in real time, just as if the storage device 110 containing the data were coupled locally (e.g., via a direct disk drive connection such as a SCSI cable) to the workstation 101. By way of example, when the application 106 makes operating system calls to access data on the storage device 110 (e.g., uses a function such as fopen() to open data), the operating system passes such calls to the NFS client 108, which relays the calls to the NFS server 109 using a standard set of NFS protocol messages. The NFS server 109 receives the NFS protocol messages, and, using its knowledge of the MVS data and storage formats, carries out the requested commands (e.g., read and/or write) on data within MVS data sets maintained in the storage device 110.
Developers typically customize an NFS server (e.g., 109) to the operating system in which it resides (MVS in this example). This allows the NFS server to "serve" data created or maintained by that operating system to one or more NFS clients (e.g., 108) over the network 113. Likewise, developers custom design NFS clients (e.g., 108) for the operating system (e.g., 104) in which they will execute to allow applications (e.g., 106) that execute on that operating system to access data over the network 113 without regard for the type of host platform (e.g., MVS mainframe 102) that is serving the data.
The most recent commercially available version of NFS (NFS Version 3) has been widely adopted for remote data access and incorporates about fifteen standardized NFS protocol messages or commands, which collectively comprise the NFS protocol. The NFS clients and NFS server can exchange these messages. Examples of NFS messages exchanged between the NFS client and NFS server are READ, WRITE, MKDIR, RMDIR, RENAME, LINK, MKNOD, and so forth. Those skilled in the art will recognize that these NFS messages closely parallel file system commands used to manipulate directories (e.g., mkdir(), rmdir()), files (e.g., read(), write()), and data structures (e.g., link()) associated with file systems.
NFS protocol messages and commands generally allow an NFS client operating on most types of host computer platforms or operating systems (e.g., Unix, Windows, and so forth) to access an NFS server that serves data from most any other type of host platform or operating system. Incompatibilities of operating system calls and data and/or file storage formats between the client (e.g., Unix workstation 101) and server (e.g., Mainframe 102) are largely hidden from the application 106. For example, if application 106 uses a Unix command to list files (e.g., an "ls" command) contained within a file system provided by the NFS client 108 (i.e., served from NFS server 109), the NFS client 108 may send a standard NFS protocol message called "READDIR" to the NFS server 109. The NFS server 109 receives the READDIR NFS protocol message and can use a corresponding MVS command to obtain, for instance, MVS catalog information containing the names of data sets stored on the storage device 110. The NFS server (e.g., 109) can also use the NFS protocol to return data from the storage device 110 (e.g., data from an MVS data set) over the network 113 back to the NFS client (e.g., 108) to satisfy the access requests.
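The translation described above can be illustrated with a toy model: a local directory-listing request becomes a standard READDIR protocol message, which a platform-specific server answers from its own catalog. Only the message name READDIR comes from the NFS protocol itself; the class names and the simulated MVS catalog below are hypothetical, introduced purely for illustration.

```python
# Illustrative sketch only: how an NFS-style client might translate a local
# directory listing into a READDIR protocol message, answered by a server
# with platform-specific (here, simulated MVS) catalog knowledge.

class ToyNfsServer:
    """Stands in for the MVS-resident NFS server 109 (hypothetical model)."""
    def __init__(self, mvs_catalog):
        self.catalog = mvs_catalog  # simulated MVS catalog of data set names

    def handle(self, message, argument):
        if message == "READDIR":
            # A real server would issue MVS catalog calls here.
            return sorted(self.catalog.get(argument, []))
        raise NotImplementedError(message)

class ToyNfsClient:
    """Stands in for the workstation-resident NFS client 108."""
    def __init__(self, server):
        self.server = server

    def list_directory(self, path):
        # A local 'ls' on a mounted file system becomes a READDIR message.
        return self.server.handle("READDIR", path)

server = ToyNfsServer({"/exports/payroll": ["EMP.MASTER", "EMP.HISTORY"]})
client = ToyNfsClient(server)
print(client.list_directory("/exports/payroll"))
```

The point of the sketch is the division of knowledge: the client knows nothing about MVS catalogs, and the server knows nothing about the Unix "ls" command; the standardized message is the only shared vocabulary.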
FIG. 2 illustrates another prior art technique for obtaining access to data stored on a remote computer system. The technique illustrated in FIG. 2 uses a standardized protocol called the file transfer protocol (FTP) to provide a connection 113 between an FTP server 121 and an FTP client 120 to transfer an entire file, for example, from the mainframe 102 to the workstation 101. Generally, whereas NFS (FIG. 1) requires a systems manager to mount and export an NFS file system to the workstation 101, in FIG. 2, a user application 106 can invoke the FTP client 120 directly using an FTP command to cause the FTP client 120 to request the entire contents of one or more files from the FTP server 121. In response to such an FTP command, the FTP client 120 provides standard FTP protocol messages over network 113 to the FTP server 121. In response to such messages, the FTP server 121 finds and then transfers the entire contents of the requested file(s) obtained from the storage device 110 back to the FTP client 120 on the workstation 101 via the network 113. The FTP client 120 receives the data during the transfer and stores the data into a file created within the local storage device 125 (e.g., local hard disk) on the workstation 101. Once the transfer is complete, the FTP session (i.e., the FTP protocol communications between the FTP client and FTP server) is over and the application 106 can access the copy of the requested file as needed directly on the local storage device 125.
As with NFS (FIG. 1), FTP clients and FTP servers communicate using a standard set of messages that collectively define the FTP protocol. Also as with NFS, the protocol communications and the actual data transfer typically occur over the same network connection 113. Since both protocols are standardized, computers that use different operating systems and file systems (e.g., Unix and MVS) can still exchange data. FTP is generally more limited in its capabilities than NFS since FTP merely provides a complete local copy of an entire data file. FTP is also not considered a true real-time data access protocol in that data access by an application (e.g., 106) generally takes place only after the entire file has been transferred to the destination local storage device (e.g., 125). Since FTP provides only a copy of the data file for use by the application 106, changes to the original file that occur after the FTP file transfer is complete may not be reflected in the copied version of the file stored within the local storage device 125. Most versions of the NFS protocol, however, operate in real time, accessing data immediately when the server receives a data access request.
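The consistency concern described above can be seen in a minimal model: an FTP-style transfer copies the whole file once, so later changes at the origin never reach the copy, whereas an NFS-style read is satisfied from the origin at request time. All names and values below are hypothetical.

```python
# Illustrative sketch only: whole-file copying (FTP-style) versus
# real-time access (NFS-style), using plain dictionaries as storage.

remote_storage = {"report.dat": b"version-1"}   # stands in for storage device 110
local_storage = {}                              # stands in for local disk 125

def ftp_get(filename):
    # FTP transfers the entire file once; the session then ends.
    local_storage[filename] = remote_storage[filename]

def nfs_read(filename):
    # An NFS-style read is satisfied from the origin at request time.
    return remote_storage[filename]

ftp_get("report.dat")
remote_storage["report.dat"] = b"version-2"     # original changes after the transfer

# The local copy is now stale -- the consistency concern noted above.
print(local_storage["report.dat"])              # b'version-1'
print(nfs_read("report.dat"))                   # b'version-2'
```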
FIG. 3 illustrates another prior art data sharing technique which is described in more detail in U.S. Pat. No. 5,950,203, entitled "Method and Apparatus for High-Speed Access to and Sharing of Storage Devices on a Networked Digital Data Processing System" (Stakuis et al.). This reference discloses a system that purports to provide the ability for two computer systems (nodes 16 and 18) to each directly access a storage device 36 that is directly coupled via paths 44 and 46 to each node 16, 18. As explained, a "fused drive" approach is taken in which node 18 acts as a server to store physical file mappings and other administrative information concerning data in the storage device 36. Generally, node 18 uses a network server process 56 to act as a file server serving data via a network connection 26 to the node 16. However, for some data access operations such as bulk reads and writes, the system allows node 16 to use the direct connection 46 to the shared storage 36 to perform data access. This system provides this capability since each of the nodes 16 and 18 is assumed to have the same file system storage formats. That is, each node 16 and 18 is able to natively access the data via a file system format imposed on the shared storage device 36 that is common to both nodes 16 and 18.
Generally, the system performs data access commands locally on node 16, without going over the network 26 to the server (e.g., for the bulk reads and writes), by intercepting such calls in the filter driver 66 in node 16 and issuing them locally via the direct connection 46 to the shared storage 36. In other words, the illustrated example routes some data access requests through the regular networking technology 26, while those that can be handled locally bypass the network and go directly over the attached storage interface 46. In order to implement the system disclosed, all participating nodes must be directly coupled (e.g., node 16 coupled via interface 46) to the shared storage 36. All participating nodes (i.e., 16, 18) must also be in communication with each other via networking (e.g., network 26) and regular network protocols must be present that allow for mounting remote file systems. In other words, a distributed file system protocol such as NFS or CIFS (functionally similar to NFS but used for computers using the Microsoft Windows family of operating systems) must be present on both nodes 16 and 18.
The general operation of the system in FIG. 3 is as follows: A configuration program on node 16 provides a "make fused" command which essentially allows a client in the upper file system 50 on node 16 (e.g., a client of network server 56) to issue a "mount" command to mount a remote file system, residing on the storage device 36, that is "served" by the network server 56 in node 18 over the network 26. During the processing of the "make fused" command, the filter driver 66 in node 16 detects that a direct connection 46 exists to the storage device 36 containing the file system to be remotely mounted and can locally (e.g., within node 16) create a mapped device for this file system. This essentially allows the filter driver 66 to directly mount the file system via interface 46 for certain data access commands.
The filter driver 66 can detect and intercept all attempted accesses to files within the locally "mounted" file system in shared storage 36. Upon such an initial attempted access to any file (e.g., an application 48 making a call through the upper file system 50 to the createfile() operating system function to create a file for reading or writing), the client filter driver 66 in node 16 intercepts the call to the createfile() function. The filter driver 66 then uses the distributed file system protocol (e.g., NFS) to issue a write() request over the network 26 to the network server 56 on node 18. The network server 56 is customized to obtain the write() request and to create the ghost file in the storage device 36 in response. The ghost file created by network server 56 in response to the write() command includes a file name and a file layout. Once the file layout exists for the ghost file on node 18 (created in response to the write() command), the filter driver 66 on node 16 then issues a read() distributed file system command (e.g., using NFS) over network 26 to the network server 56 on node 18 to "pretend" to read the ghost file just created. The read() command causes the network server 56 on node 18 to return the file layout for the ghost file to the filter driver 66 in node 16. The filter driver 66 stores the file layout in a map which indicates how the ghost file, which was not actually created in device 36, would have been laid out in the shared storage device 36.
When the filter driver 66 on node 16 has cached the map containing the file layout information, subsequent read() and write() requests to the file by applications 48 can be intercepted by the filter driver 66. In response to such access requests, the filter driver 66 interrogates the map for file layout information and, using this information, blocks of data for the file can be read via the direct connection path 46 to the shared storage. In the case of writes, the process is similar but direct access writes via interface 46 are restricted to storage areas within the space defined in the file layout map. If the applications 48 attempt to write additional information to the file that exceeds the size of the file as determined in the file layout map (e.g., exceeds the disk storage space currently allocated for the file by the network server 56), the distributed file system protocol (e.g., NFS) is used to perform the additional writes over the network 26 between the upper file system client 50 on node 16 and the network server 56 on node 18.
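The interception scheme described in the preceding paragraphs can be sketched as a toy model: the filter driver obtains and caches a file-layout map over the network, satisfies in-extent reads and writes directly against shared storage, and falls back to the network server for writes past the allocated space. All class and method names below are hypothetical; only the overall flow follows the description above.

```python
# Illustrative sketch only: a toy model of the "fused drive" interception
# scheme -- layout obtained over the network, in-extent I/O done directly.

class ToyNetworkServer:
    """Stands in for network server 56 on node 18 (hypothetical model)."""
    def __init__(self, shared_storage):
        self.shared_storage = shared_storage
        self.layouts = {}

    def create_ghost_file(self, name, allocated_blocks):
        # Respond to the initial write() by recording a layout: which
        # blocks of shared storage the file occupies.
        start = len(self.shared_storage)
        self.layouts[name] = list(range(start, start + allocated_blocks))
        self.shared_storage.extend([None] * allocated_blocks)

    def read_layout(self, name):
        # Respond to the "pretend" read() by returning the layout.
        return self.layouts[name]

class ToyFilterDriver:
    """Stands in for filter driver 66 on node 16."""
    def __init__(self, server, shared_storage):
        self.server = server
        self.shared_storage = shared_storage  # direct connection 46
        self.layout_map = {}                  # cached file layouts
        self.network_writes = 0               # count of fall-back writes

    def open(self, name, allocated_blocks=2):
        self.server.create_ghost_file(name, allocated_blocks)
        self.layout_map[name] = self.server.read_layout(name)

    def write(self, name, block_index, value):
        layout = self.layout_map[name]
        if block_index < len(layout):
            self.shared_storage[layout[block_index]] = value  # direct path 46
        else:
            self.network_writes += 1  # beyond allocation: network path 26

    def read(self, name, block_index):
        return self.shared_storage[self.layout_map[name][block_index]]

storage = []
driver = ToyFilterDriver(ToyNetworkServer(storage), storage)
driver.open("data.bin")
driver.write("data.bin", 0, b"AA")   # in-extent: goes directly to shared storage
driver.write("data.bin", 5, b"BB")   # past allocation: routed over the network
print(driver.read("data.bin", 0), driver.network_writes)
```

The sketch makes the trade-off visible: direct-path I/O is possible only within the space the server has already allocated, which is exactly the restriction noted above for writes.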
FIG. 4 illustrates yet another technique for sharing data between computer systems. The technique shown in this figure is provided by a software product called "SymmAPI-Access" (formerly called "InstaShare"), which is produced by, and is a trademark of, EMC Corporation of Hopkinton, Mass., the assignee of the present invention. As illustrated, SymmAPI-Access provides a suite of SymmAPI-Access routines 130 which may be contained, for example, in a C function library on the workstation 101. During a design and development phase of the application 106, a programmer can incorporate calls to the SymmAPI-Access routines 130 within the code of the application 106. When the application 106 is subsequently executed on the workstation 101, the SymmAPI-Access routines 130 on the workstation 101 can interact over the network 135 with a SymmAPI-Access agent 131 on the mainframe 102. A combination of the routines 130 allows, for example, the application 106 to open and read MVS data sets (not shown) stored within the shared storage device 111.
More specifically, an application 106 can make a sequence of calls to the routines 130 which send SymmAPI-Access messages (not shown) to the SymmAPI-Access agent 131 on the mainframe 102. The SymmAPI-Access messages are used to request mainframe catalog information called metadata which contains data format, disk extent, and data location information for data stored in data sets maintained by the mainframe 102 in the shared storage 111. In response to requests for metadata, the SymmAPI-Access agent 131 returns the metadata to the routines 130 in the application 106 over the network 135. The SymmAPI-Access agent 131 also handles other issues related to mainframe data access such as security, user authorization, file locking and so forth. Once the application 106 receives the metadata, the application 106 can invoke calls to other SymmAPI-Access routines 130 which use the metadata obtained from the network 135 to directly access data in the data sets over a direct connection 138 to the shared storage 111. The direct connection 138 may be a high speed SCSI or fiber optic connection, for example.
In this manner, the SymmAPI-Access product allows an application 106 on the workstation 101 to obtain direct access to data maintained by the mainframe 102 in the shared storage 111 without having to transfer the actual data through the mainframe 102 and onto the network 135. As such, network bandwidth and mainframe processor cycles are conserved. An example of a shared storage device 111 that allows multiple data connections (e.g., connection 138 to the workstation 101 and connection 137 to mainframe 102) is the Symmetrix line of data storage systems produced by EMC Corporation.
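The division of labor described above can be expressed as a small sketch: metadata travels over the network, while the data itself moves over the direct connection to shared storage. The class names, the catalog record, and the storage contents below are hypothetical stand-ins, not the actual SymmAPI-Access interfaces.

```python
# Illustrative sketch only: metadata over the network, data over the
# direct connection -- a toy model of the SymmAPI-Access arrangement.

shared_storage = {0: b"MVS", 1: b" DATA", 2: b" SET"}  # stands in for device 111

class ToyAgent:
    """Stands in for the SymmAPI-Access agent 131 on the mainframe."""
    catalog = {"PROD.SALES": {"extents": [0, 1, 2], "format": "fixed-block"}}

    def get_metadata(self, dataset):
        # Only this small metadata record crosses the network 135.
        return self.catalog[dataset]

def read_dataset(agent, dataset):
    meta = agent.get_metadata(dataset)         # network round trip for metadata
    # Data blocks are read over the direct connection 138, not the network.
    return b"".join(shared_storage[e] for e in meta["extents"])

print(read_dataset(ToyAgent(), "PROD.SALES"))  # b'MVS DATA SET'
```

Because only the small metadata record crosses the network, the bulk of the transfer bypasses both the mainframe processor and the network, which is the bandwidth-conservation point made above.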
The present invention significantly overcomes many deficiencies and problems that can arise in prior art data sharing mechanisms. For example, one such deficiency is that the prior art data sharing arrangements in FIGS. 1, 2 and 3 rely heavily on the use of a processor within the computer system (e.g., mainframe 102 or node 18) that is responsible for maintaining the data to also serve the data. With respect to the arrangements in FIGS. 1 and 2 (the NFS and FTP examples), the NFS server 109 (FIG. 1) or the FTP server 121 (FIG. 2) are responsible for transferring all data from the mainframe 102 back to either the NFS or FTP clients 108, 120. In FIG. 3, while the "fused drive" system can handle some data access transactions locally on node 16, the system requires that others use the network server 56 to serve the data over the network 26 back to the client filter driver 66. In each of these cases, large data transfers can place a heavy burden on the processor in the server computer, and can significantly reduce the amount of bandwidth available on the network (113 in FIGS. 1 and 2, 26 in FIG. 3). Moreover, in the case of the FTP protocol (FIG. 2), the system consumes local storage space 125 with a copy of the data file, which also gives rise to consistency concerns in the data as a result of the existence of two copies of the same data.
While the data sharing arrangement in FIG. 3 does alleviate some of the network and server processing burdens by allowing some data access to be provided over the direct connection 46 to the shared storage device 36, all nodes (16 in this example) that require access to the shared data using the technique each require a direct connection to the shared storage device 36. The technique also requires the distributed network server 56 to operate on the same node that is responsible for maintaining the data in the device 36 (node 18 in this example), giving rise to the processor burden concerns noted above. Also, since the network server 56 resides on the node 18 that is primarily responsible for maintaining the data in the storage device 36, clients in other nodes (e.g., filter driver 66 in node 16) require their respective nodes to have a direct connection to the shared storage 36 in order to intercept and re-direct data access calls over the directly connected interface 46. Without such a direct connection, the system would not function.
Other disadvantages of the system described above with respect to FIG. 3 are that applications 48 that require access to the shared data must perform (i.e., execute) on the node 16 that has the direct connection 46 to the shared storage device 36. If other nodes execute applications which require access to the data, those nodes must each have a direct connection of their own to the shared storage device 36.
Further still, since the filter driver 66 relies heavily on its intimate knowledge of data storage formats (e.g., maps) used to store data within the storage device 36 and provides the same data storage format to the upper and lower file systems 50, 52 in node 16, it seems apparent that such a system would incur significant problems if the data storage format used to store data in the shared storage device 36 managed by node 18 were significantly different than a storage format natively used or required by node 16. As an example, in FIG. 3, if node 18 were an MVS mainframe storing data in a flat file system of MVS data sets on the storage device 36, and node 16 were an open systems platform that used a typical Unix hierarchical file system to store data, the filter driver 66 would certainly experience difficulty when attempting to correlate the MVS flat file system storage format with the more hierarchical storage format commonly found in Unix file systems. As such, while the reference U.S. Pat. No. 5,950,203 describing this system notes that the operating systems may be different, it seems implied that each node must use the same file system format to store data. This system may be problematic in real world situations where, for instance, a Unix workstation may actually require access to mainframe or even PC data in which case the two data formats may not precisely match.
A disadvantage of the data sharing arrangement in FIG. 4 is that each application 106 must incorporate system calls to the SymmAPI-Access routines directly into the source code of the application 106. This can limit the applicability of this system to custom uses. In other words, applications developed from scratch can benefit from such a system, but third party applications must be ported to use calls to the SymmAPI-Access routines 130 (FIG. 4). Porting software to the SymmAPI-Access platform may be a labor and time intensive process requiring intimate knowledge of the application code. Moreover, many software developers are reluctant to release their source code for porting purposes.
Finally, many of the prior art data sharing arrangements are implemented primarily in conjunction with the operating system of each computing platform. For example, on the client side, the NFS client 108 (FIG. 1) and the filter driver client 66 (FIG. 3) are bound tightly to the operating system, which generally invokes such components when calls to the operating system are made.
The FTP system (FIG. 2) and the SymmAPI-Access system (FIG. 4) each somewhat remove the tight bond with the operating system and let applications 106 that operate in the user space of the workstation 101 access the data. However, each of these systems suffers from the issues noted above related to requiring calls to be integrated into the source code of the application 106. In other words, for applications to use such systems, developers must modify application code accordingly.
Conversely, the present invention significantly overcomes many of the problems associated with prior art systems. The present invention provides data sharing configurations and techniques that provide a user space distributed file system server for accessing shared data via standard clients that operate using standard protocols. Generally, the invention operates in a networked environment where a first computing system and a second computing system, which may be a mainframe, for example, each have a dedicated connection to a shared storage device. The first and second computing systems also have a network connection between each other. The first computing system operates a data access server and can serve mainframe data, which the first computing system does not primarily maintain, to local applications on the first computing system or to applications on other computing systems that do not necessarily have a direct connection to the shared storage device containing the data to be served.
In all instances, the system of the invention uses a client/server paradigm with data access clients using standard protocols as the preferred mechanism to access the data access server. Since the server of the data is not the same computing system as the system that primarily maintains the data, processor and network bandwidth with respect to the computer system that maintains the data are significantly conserved. This allows, for instance, mainframe data to be served by an open systems computing system while allowing the mainframe to focus on other tasks besides serving the data. In situations where many clients desire access to the data, the distributed design of the invention prevents the clients from burdening a single machine to gain data access. This allows the system of the invention to be quite scalable.
Using this networking configuration, the system of the invention includes a method for providing access by a first computing system to data stored in a shared storage device managed by a second computing system. The access can be provided even in situations where the data storage format provided in the shared storage by the second computing system is incompatible with a data storage format required by the first computing system, though the two formats may also be compatible.
One such method provided by the invention receives, at a data access server performed on a first computing system, a client message to access data on the shared storage device. In response to receiving the client message, the data access server retrieves data storage information provided from the second computing system coupled to the first computing system. The data storage information allows the first computing system to access the data in the shared storage device in a manner that is compatible with the first computing system. The data access server then provides access to the data on the shared storage device, directly from the data access server, based on the retrieved data storage information.
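The three steps recited above, receiving a client message, retrieving data storage information from the second computing system, and then accessing the shared storage directly, can be sketched as a toy data access server. Every name below is hypothetical; the sketch models only the order of operations, not any particular protocol or platform.

```python
# Illustrative sketch only: the recited method as a toy data access server
# running on the first computing system.

shared_storage = {("VOL1", 7): b"payroll records"}  # (volume, extent) -> data

class ToySecondComputingSystem:
    """E.g., the mainframe that primarily maintains the data."""
    def data_storage_info(self, name):
        # Returns location/format information, not the data itself.
        return {"volume": "VOL1", "extent": 7}

class ToyDataAccessServer:
    """Runs on the first computing system."""
    def __init__(self, second_system):
        self.second_system = second_system

    def handle_client_message(self, message):
        # Step 1: receive the client message (e.g., an NFS-style read request).
        name = message["name"]
        # Step 2: retrieve data storage information from the second system.
        info = self.second_system.data_storage_info(name)
        # Step 3: access the shared storage directly and serve the data.
        return shared_storage[(info["volume"], info["extent"])]

server = ToyDataAccessServer(ToySecondComputingSystem())
print(server.handle_client_message({"op": "read", "name": "PAY.MASTER"}))
```

Note that only the small storage-information record passes between the two computing systems; the data itself is read by the first computing system directly, which is what conserves the second system's processor and network bandwidth.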
In another embodiment, the data access server is a distributed data server and the operation of receiving the client message includes providing, from the data access server to at least one data access client requiring access to data in the shared storage device, a distributed data interface that operates according to a distributed data protocol. NFS, for example, may serve as such a distributed data protocol. This allows the data access server to communicate in an off-the-shelf manner with data access clients via client messages formatted according to the protocol. In operation, the data access server accepts the client message from the data access client using the distributed data protocol over the distributed data interface provided by the data access server. The client message includes a data access command formatted in accordance with the distributed data protocol. The data access command indicates a type of access to be provided to the data in the shared storage device on behalf of the client.
In another configuration, the data access server is a distributed file system data access server, the distributed data interface is a distributed file system interface provided by the data access server, and the distributed data protocol is a distributed file system protocol such as NFS or CIFS. The operation of accepting the client message includes receiving the client message from the at least one data access client in accordance with the distributed file system protocol. The distributed file system protocol may be, for example, at least one of a network file system (NFS) protocol, a web based network file system protocol (e.g., WebNFS) and/or a CIFS protocol. The distributed data protocol in the case of non-file system protocols may be a CORBA data exchange protocol, a Java Beans based messaging protocol, or a hypertext transfer protocol, for instance. Other protocols which are too numerous to mention here can also be supported between the client and data access server. Such protocols allow, for instance, the data access server to serve MVS data to clients in a standard manner, without modification to the clients or the client applications.
In another arrangement, a data access client requiring access to data in the shared storage device is performed on a computing system that is different than the first computing system and the operations of providing and accepting are performed by the data access server using the distributed data access protocol over a network coupling the first computing system with the computing system performing the at least one data access client. This allows applications that execute or otherwise perform on hosts that do not have a direct connection to the shared storage to nonetheless obtain access to the data via the data access server. Prior art data sharing mechanisms generally all require the host that executes the application to also have a dedicated (i.e., not a general network) connection to the shared storage device.
In another arrangement, the operation of retrieving data storage information retrieves the data storage information from a virtual file system maintained in the first computing system by the data access server. The virtual file system generally can obtain the data storage information from the second computing system, prior to receipt of a client message, in the course of processing formerly received client messages. That is, the virtual file system can maintain data storage information about data that has, for instance, already been accessed by client request messages. Future requests can be handled with the data storage information cached in the virtual file system, without the need to go back to the second computing system via a data access routine.
In another arrangement of the invention including the virtual file system, the operation of retrieving the data storage information from the virtual file system includes searching a number of unodes in the virtual file system to obtain a unode corresponding to the data to which access is requested in the client request message and obtaining the data storage information from the virtual file system based on the unode. Unodes, which make up the virtual file system in such an embodiment, are assigned individual portions of data and a unode stores the data storage information for that respective portion.
In another arrangement, the operation of retrieving the data storage information first determines if suitable data storage information is available locally on the first computing system to allow the data access server to provide access to the data on the shared storage device in accordance with the client message in a manner that is compatible with the first computing system. Such data storage information may be available locally, for instance, in a virtual file system. If the required data storage information is available locally, the system of the invention uses the suitable data storage information that is available locally on the first computing system as the retrieved data storage information. If not, the system retrieves, from the second computing system, the data storage information that is required for accessing the data in the shared storage device in a manner that is compatible with the first computing system.
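The unode lookup and the local-availability check described in the preceding paragraphs can be combined in one sketch: the server first searches its virtual file system for a unode holding the needed data storage information, and only on a miss does it go back to the second computing system and cache the result. The term "unode" comes from the text above; every other name below is hypothetical.

```python
# Illustrative sketch only: a toy virtual file system of unodes acting
# as a cache of data storage information, with fall-back retrieval.

class ToyVirtualFileSystem:
    def __init__(self):
        self.unodes = {}  # one unode per portion of data

    def find_unode(self, name):
        return self.unodes.get(name)

    def add_unode(self, name, storage_info):
        self.unodes[name] = storage_info

def retrieve_storage_info(vfs, second_system, name, stats):
    unode = vfs.find_unode(name)
    if unode is not None:
        stats["local_hits"] += 1      # served from the cached virtual file system
        return unode
    stats["remote_fetches"] += 1      # fall back to the second computing system
    info = second_system(name)
    vfs.add_unode(name, info)         # cache for future client messages
    return info

def toy_mainframe_lookup(name):
    # Stands in for a data access request to the second computing system.
    return {"extent": 42, "format": "MVS"}

vfs, stats = ToyVirtualFileSystem(), {"local_hits": 0, "remote_fetches": 0}
retrieve_storage_info(vfs, toy_mainframe_lookup, "A", stats)  # miss: fetch + cache
retrieve_storage_info(vfs, toy_mainframe_lookup, "A", stats)  # hit: local only
print(stats)  # {'local_hits': 1, 'remote_fetches': 1}
```

The second lookup never touches the second computing system, illustrating how formerly received client messages populate the virtual file system for future requests.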
In another arrangement, the operation of providing access to the data on the shared storage device based on the retrieved data storage information includes performing, by the data access server, at least one data access routine to access the data in the shared storage device in a manner specified in the client message. The data access routine uses the data storage information to properly locate and access the data in a format that is compatible with the first computing system.
In another arrangement used to read data, the client message requests read access to the data in the shared storage on behalf of an application and the operation of performing the data access routine(s) to access the data in the shared storage device includes the operation of reading the data in a manner specified in the client message from the shared storage device at a location specified by the retrieved data storage information and returning the data read by the operation of reading from the data access server to a data access client that originated the client message. Such an arrangement allows, for example, clients to use NFS or CIFS to read MVS data sets from a mainframe that are served via the data access server. The clients may be local to the host performing the server, or may be across a network on other remote hosts.
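A minimal sketch of the read path just described, under the assumption that the retrieved data storage information supplies a starting location on the shared storage device (the `FakeStorageDevice` stand-in and the extent layout are hypothetical):

```python
def read_data(storage_device, storage_info, offset, length):
    """Hypothetical read routine: locate the data using retrieved storage
    information, then read directly from the shared storage device."""
    start, _extent_length = storage_info["extent"]
    return storage_device.read(start + offset, length)

class FakeStorageDevice:
    """Stand-in for the shared storage device, backed by a byte buffer."""
    def __init__(self, contents):
        self._contents = contents

    def read(self, position, length):
        return self._contents[position:position + length]

device = FakeStorageDevice(b"....MVS DATA SET CONTENTS....")
info = {"extent": (4, 21)}                  # data begins at byte 4 on this device
payload = read_data(device, info, 0, 12)    # bytes returned to the data access client
```

The data read in this manner would then be returned from the data access server to the data access client that originated the client message.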
In other arrangements, the first computing system is an open systems computing system and the second computing system is a mainframe computing system and the operation of receiving a client message includes allowing data access client(s) to access the data access server using a distributed file system protocol to request access via the data access server to mainframe data maintained by the mainframe computing system in the shared storage device. The operation of providing access to the data on the shared storage device from the data access server includes using the data storage information retrieved from the mainframe computing system to directly and compatibly access, by the data access server, the data stored on the shared storage device as specified by a command in the client message and then serving the data to the data access client(s) from the data access server using one or more distributed file system protocols.
In a variation of the above arrangements, the data access client(s) is performed on the first computing system and acts on behalf of an application also performed on the first computing system and the operations of retrieving the client message and providing access to the data are performed between the data access client and the data access server using the distributed file system protocol within the first computing system.
In another variation, there are a plurality of data access clients and the operation of serving the data includes the process of serving data maintained by the mainframe in the shared storage device from the data access server on the first computing system to the plurality of data access clients using a distributed file system protocol.
In yet another variation, at least one of the data access clients is performed on a computing system that is different from the first and second computing systems and the operations of retrieving the client message and providing access to the data are performed over a network coupling the first computing system and the computing system that is performing the data access client.
The general methods of the invention also include operations of maintaining, on the first computing system, a virtual file system containing a plurality of nodes, with at least one node for each portion of data for which access is requested via client messages. In these embodiments, the operation of retrieving the data storage information includes determining if the data for which access is requested via the client message has a corresponding node in the virtual file system, and if so, retrieving the data storage information from the corresponding node in the virtual file system, and if not, (i) retrieving the data storage information from the second computing system, (ii) creating at least one node in the virtual file system based on the retrieved data storage information, and (iii) putting at least a portion of the data storage information retrieved from the second computing system into the node created for that data in the virtual file system.
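The cache-or-fetch logic above can be sketched as follows (the function names and the dictionary-based node layout are hypothetical illustrations):

```python
def retrieve_storage_info(vfs, handle, query_second_system):
    """Hypothetical cache-or-fetch: use the virtual file system node if one
    exists; otherwise query the second computing system, create a node,
    and put the retrieved data storage information into it."""
    node = vfs.get(handle)
    if node is not None:
        return node["storage_info"]             # node exists: no remote round trip
    storage_info = query_second_system(handle)  # e.g., consult the second system's catalog
    vfs[handle] = {"handle": handle, "storage_info": storage_info}
    return storage_info

calls = []
def fake_query(handle):
    """Stand-in for the remote query to the second computing system."""
    calls.append(handle)
    return {"volume": "VOL001"}

vfs = {}
first = retrieve_storage_info(vfs, "A.B.C", fake_query)   # miss: queries the second system
second = retrieve_storage_info(vfs, "A.B.C", fake_query)  # hit: served from the virtual file system
```

Only the first request for a given portion of data incurs a query to the second computing system; subsequent requests are satisfied from the node created in the virtual file system.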
In variations of the above embodiments, the operation of maintaining includes maintaining each of the plurality of nodes in the virtual file system on the first computing device in a hierarchical format, with different levels of the hierarchical format representing different elements of a storage system managed by the second computing system. The hierarchical format, in other embodiments, maps a mainframe storage arrangement of the data stored in the shared storage device to an open systems file system arrangement.
According to other variations, the operation of maintaining maintains, for each node in the virtual file system, information concerning the relation of that node to other nodes in the virtual file system and a unique handle for the node. The operation of maintaining can also maintain, for each node in the virtual file system, data access information including at least one access position for the data within the shared storage device.
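One way to picture the hierarchical node maintenance described above (the `Node` class, the qualified-name example, and the counter-based handles are all illustrative assumptions): each node records its relation to other nodes, carries a unique handle, and may hold an access position for its data, so that a mainframe-style qualified data set name maps onto an open-systems-style hierarchy:

```python
from itertools import count

_handles = count(1)

class Node:
    """Hypothetical hierarchical node: tracks its relation to other nodes,
    a unique handle, and an access position for its data."""
    def __init__(self, name, parent=None, access_position=None):
        self.name = name
        self.handle = next(_handles)            # unique handle per node
        self.parent = parent                    # relation to other nodes
        self.children = {}
        self.access_position = access_position  # position of the data in shared storage
        if parent is not None:
            parent.children[name] = self

# Map a mainframe-style qualified name (e.g., PAYROLL.Y2024.JAN) onto a hierarchy,
# with different levels representing different elements of the mainframe storage system.
root = Node("/")
hlq = Node("PAYROLL", parent=root)
data_set = Node("Y2024.JAN", parent=hlq, access_position=8192)
```

Different levels of such a hierarchy can represent different elements of the storage system managed by the second computing system, which is what lets file sharing protocols that expect a directory tree navigate mainframe-maintained data.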
In other arrangements, the operation of retrieving data storage information includes determining if appropriate data storage information is available in a virtual file system maintained by the data access server on the first computing system based on client request parameters in the client message. If not, the operation of the system of the invention includes selecting one or more first data access routines based on a protocol command specified by the client message. Then, the operation includes performing the first data access routine(s) to allow the data access server on the first computing system to communicate with the second computing system to request the data storage information from the second computing system. The operation continues by receiving a response to the data access routine(s) from the second computing system, parsing the response to the data access routine(s) to determine the data storage information, and placing the data storage information into the virtual file system maintained by the data access server on the first computing system. The data storage information may be placed, for example, into a unode data structure. However, if appropriate data storage information is available in the virtual file system maintained by the data access server on the first computing system (e.g., if a unode already exists and contains the required data storage information) based on client request parameters in the client message, the operation translates client request parameters contained in the client message into data access parameters useable for the selected data access routine(s). The operation of translating uses data storage information contained in the virtual file system (e.g., an appropriate unode or other data structure) to provide a location in the shared storage device of data for which access is specified in the client request message (i.e., data that matches the unode).
In another arrangement, the operation of translating client request parameters contained in the client message includes obtaining at least one client request parameter from the client message and mapping the client request parameter(s) to at least one data access routine parameter required for performance of the data access routine(s). The data access routine parameter(s) specify data storage information to allow the data access routine to obtain access to a location of data within the shared storage device.
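The translation and mapping operation just described can be sketched as follows, assuming (hypothetically) that a client request carries a handle, an offset, and a count, and that the matching unode records the device and starting location of the data:

```python
def translate_request(client_params, vfs):
    """Hypothetical translation step: map client request parameters (e.g., a
    file-sharing-protocol handle and byte range) to data access routine
    parameters locating the data on the shared storage device."""
    unode = vfs[client_params["handle"]]    # query the unode matching the data access handle
    return {
        "device": unode["device"],
        "position": unode["start"] + client_params["offset"],
        "length": client_params["count"],
    }

vfs = {"h42": {"device": "dev1", "start": 5000}}
request = {"handle": "h42", "offset": 100, "count": 512}
params = translate_request(request, vfs)
```

The resulting parameters are exactly what a data access routine needs to obtain access to the location of the data within the shared storage device.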
According to yet another arrangement, the operation of mapping includes using data access translator functions to query a virtual file system of unodes for a specific unode corresponding to a data access handle provided in the client message and obtaining from the unode the data storage information.
In another arrangement, the operation of performing the data access routine(s) includes communicating between the data access server on the first computing system and a data access agent on the second computing system to obtain the data storage information required to perform a protocol command specified by the client message. In a related arrangement, the first computing device is an open system computing system and the second computing device is a mainframe and the data storage information is contained within metadata maintained within the mainframe. In such an arrangement, the operation of communicating sends a data access request to the data access agent to return metadata obtained from a mainframe catalog for the shared storage device. The metadata includes data storage information for the data maintained by the mainframe in the shared storage device. An example of metadata would be MVS data set catalog data.
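A rough sketch of this server-to-agent exchange (the JSON message format, the `CATALOG_LOOKUP` operation name, and the field names are hypothetical; the patent does not specify a wire format):

```python
import json

def request_metadata(agent, data_set_name):
    """Hypothetical exchange: the data access server asks the data access
    agent on the second computing system for catalog metadata, then parses
    the response into data storage information."""
    raw = agent.handle_request(
        json.dumps({"op": "CATALOG_LOOKUP", "name": data_set_name}))
    response = json.loads(raw)
    return {"volume": response["volser"], "extents": response["extents"]}

class FakeAgent:
    """Stand-in for a data access agent returning catalog-style metadata."""
    def handle_request(self, raw):
        request = json.loads(raw)
        assert request["op"] == "CATALOG_LOOKUP"
        return json.dumps({"volser": "VOL001", "extents": [[0, 4096]]})

info = request_metadata(FakeAgent(), "PAYROLL.Y2024.JAN")
```

In the mainframe case, the returned fields would correspond to catalog entries such as the volume serial and extents of an MVS data set.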
According to the general arrangement, the operation of providing access to the data on the shared storage device includes mapping the data storage information into at least one data access routine parameter of at least one data access routine. Such data access routines may be tailored, for example, to access the shared storage device, rather than the second computing system (e.g., the mainframe). Then, using this data access routine, the system directly accesses the shared storage device by performing the data access routine(s) to send data access requests to the shared storage device. This operation also includes retrieving, in response to the data access requests, a storage device response including data for which access is requested in the client message and providing the data to a data access client that originated the client message.
Another technique provided by the system of the invention is a method for providing access to data in a shared storage device from a data access server performing on a first computing system. The data is maintained by a second computing system, such as a mainframe. By maintained, what is generally meant is that the data set is initially created by the mainframe in a mainframe data storage format, or that the data in a data set or other storage file or format is routinely manipulated by the mainframe and thus that data's catalog or data storage information is maintained or stored on the mainframe in a mainframe compatible storage format. Though the first computing system providing the data access server can access (e.g., read and write) the data according to this invention, the first computing system is generally not the primary computer system responsible for maintaining the data.
The operation using this configuration includes accepting, by the data access server from a data access client via a distributed data protocol, a request for access to the data and then obtaining storage characteristics of the data in the shared storage device by querying the second computing system. The operation continues by creating a virtual file system maintained by the data access server based on the storage characteristics of the data obtained from the second computing system. Finally, the operation concludes by determining if the virtual file system contains sufficient information to service the request by the data access server on the first computing system, and if so, servicing the request for access to the data, and if not, obtaining data storage information from the second computing system to properly service the request, entering the obtained data storage information into the virtual file system in order to maintain the virtual file system, and using the obtained data storage information to service the request. This arrangement then allows the data access server on the first computing system to create another file system for the data that is separate from a file system or other catalog information provided by the second computing system (e.g., a mainframe) to primarily maintain the data. The virtual file system thus provides a compatibility bridge that can be quickly accessed by the data access server to serve the data to clients. Such a virtual file system satisfies needs of file sharing protocols such as NFS or CIFS, which generally expect a hierarchical file system format.
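The end-to-end flow above can be sketched as a single servicing function (all names here are hypothetical; the second-computing-system query and the storage read are passed in as stand-in callables):

```python
def service_request(request, vfs, fetch_storage_info, read_storage):
    """Hypothetical end-to-end flow: check whether the virtual file system
    already contains sufficient information to service the request; if not,
    obtain data storage information from the second computing system and
    enter it into the virtual file system before servicing the request."""
    name = request["name"]
    if name not in vfs:                       # insufficient local information
        vfs[name] = fetch_storage_info(name)  # query the second computing system
    info = vfs[name]
    return read_storage(info, request["offset"], request["count"])

def fake_fetch(name):
    """Stand-in for querying the second computing system's catalog."""
    return {"start": 100}

def fake_read(info, offset, count):
    """Stand-in for a direct read of the shared storage device."""
    return ("bytes", info["start"] + offset, count)

vfs = {}
result = service_request(
    {"name": "A.B", "offset": 10, "count": 4}, vfs, fake_fetch, fake_read)
```

After the first request, the virtual file system retains the obtained data storage information, so later requests for the same data are serviced without another query to the second computing system.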
Other arrangements of the invention that are disclosed herein include software programs to perform the data access and server operations summarized above. More particularly, a computer program product is disclosed which has a computer-readable medium including computer program logic encoded thereon in the form of code implementing a data access server. The computer program logic, when executed on at least one processing unit within a computing system, causes the processing unit to perform the operations of serving data as indicated herein and as summarized by the methods and operations above. Such arrangements of the invention are typically provided as software on a computer readable medium such as an optical, floppy, or hard disk, or in another such medium such as firmware in a ROM or RAM chip. The software can be installed onto a computer to perform the techniques explained herein. Accordingly, just a disk or other computer readable medium that is encoded with software or other code (e.g., object code) to perform the above mentioned methods, operations and/or their equivalents is considered to be an embodiment of the invention, even without a computer system or other hardware to actually load and execute or otherwise perform the software.
The system of the invention can be embodied strictly as a software program, as software and hardware, or as hardware alone.
Other arrangements of the invention include a first computing system providing access to shared data. The first computer system includes a processor, a memory system and a shared storage interface coupling the first computing system to a shared storage device in which the shared data is maintained by a second computing system in a manner that is not natively compatible to the first computing system. The computing system further includes an interconnection mechanism coupling the processor, the memory system and the shared storage interface and a network interface coupling the first computing system to a network and the second computing system. The memory system in such an embodiment is encoded with a data access server which executes on the processor in the first computing system. When executing, the data access server receives, via the network interface, a client message to access data on the shared storage device and in response to receiving the client message, retrieves, via the network interface, data storage information provided from the second computing system coupled to the first computing system. The data storage information is stored in the memory system and allows the data access server on the first computing system to access the data in the shared storage device in a manner that is compatible with the first computing system. The data access server provides access, via the network interface, to the data on the shared storage device in conformance with the client message based on the retrieved data storage information.
In another arrangement, a virtual file system is encoded within the memory system. Furthermore, the data access server is further encoded with logic, that when executed on the processor, determines if appropriate data storage information is available in the virtual file system based on client request parameters in the client message received via the network interface. If not, the data access server when further executed causes the processor to select at least one first data access routine based on a protocol command specified by the client message and causes the processor to perform the first data access routine to allow the data access server on the first computing system to communicate over the network interface with the second computing system to request the data storage information from the second computing system. The system also receives a response to the data access routine from the second computing system via the network interface and parses the response to the at least one data access routine to determine the data storage information. The system then places the data storage information into the virtual file system maintained by the data access server in the memory system.
If appropriate data storage information is available in the virtual file system based on client request parameters in the client message received via the network interface, then the data access server when further executed causes the processor to translate client request parameters contained in the client message into data access parameters useable for the selected data access routine(s). The translator uses data storage information contained in a virtual file system to provide a location in the shared storage device of data for which access is specified in the client request message.
Another arrangement provides for a computer system including a data access server. The data access server is encoded as a process and includes a distributed data interface, a plurality of data access routines, a data access translator, and a means for maintaining a virtual file system. The data access server executes in the computer system to accept, via the distributed data interface, a request for access to the data from a data access client via a distributed data protocol. The system then obtains, via the data access translator and data access routines, storage characteristics of the data in the shared storage device by querying the second computing system. The system also maintains, via the data access translator, the virtual file system based on the storage characteristics of the data obtained from the second computing system. The system also determines, via the data access translator, if the virtual file system contains sufficient information to service the request by the data access server on the first computing system, and if so, services the request for access to the data via data access routines and the distributed data interface. If not, the system obtains data storage information from the second computing system via the data access routines to properly service the request and enters the obtained data storage information into the virtual file system via the data access translator in order to maintain the virtual file system. The system also uses the obtained data storage information to service the request via the distributed data interface.
Yet another arrangement of the invention provides a system that includes a first computer system providing access to data stored in a shared storage device managed by a second computing system. In this configuration, it may be the case that a data storage format provided in the shared storage by the second computing system is incompatible with a data storage format required by the first computing system. This is not a requirement, however, and the invention can work as described between first and second computing systems that are the same or different architectures and that use the same or different file systems, data storage formats, and so forth. In any event, the first computing system includes a distributed data interface means for receiving, at a data access server performed on a first computing system, a client message to access data on the shared storage device. In response to receiving the client message, a data access routine retrieving means is included that retrieves data storage information provided from the second computing system coupled to the first computing system. The data storage information allows the first computing system to access the data in the shared storage device in a manner that is compatible with the first computing system. The distributed data interface means generally provides access to the data on the shared storage device, directly from the data access server, based on the retrieved data storage information.
An example implementation of the invention that incorporates many of the aforementioned embodiments is the InstaShare File Server which is incorporated as part of the InstaShare software library (also called SymmAPI-Access) that is manufactured by EMC Corporation of Hopkinton, Mass. While some aspects of InstaShare are explained above with respect to FIG. 4, the system of the invention explained herein, which can be incorporated into InstaShare, is not considered prior art, nor are such operations, aspects, apparatus or techniques disclosed here a part of the functionality of the system explained with respect to FIG. 4. In other words, the system of FIG. 4 represents prior versions of InstaShare, whereas certain embodiments of the invention presented herein represent advancements which can be incorporated into InstaShare, if so desired. For a complete description of the use and operation of this product, the reader is directed to the InstaShare user and programmer manuals and particularly to the sections concerning the InstaShare File and Data Sharing system. These manuals will be available from EMC Corporation. Such manuals are hereby incorporated by reference in their entirety.