This invention relates to data storage on disk drives and, more particularly, to a method and apparatus for retrieving data records stored on a storage medium utilizing a data record locator index stored in memory.
Large disk storage systems like the 3380 and 3390 direct access storage device (DASD) systems employed with many IBM mainframe computer systems are implemented utilizing many disk drives. These disk drives are specially made to implement a count, key, and data (CKD) record format on the disk drives. Disk drives utilizing the CKD format have a special "address mark" on each track signifying the beginning of a record on the track. After the address mark comes the three-part record beginning with the "COUNT", which serves as the record ID and also indicates the lengths of both the optional key and the data portions of the record, followed by the optional "KEY" portion, which in turn is followed by the "DATA" portion of the record.
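As an illustration of the three-part layout described above, the following Python sketch splits a raw record into its count, key, and data portions. The 8-byte count layout used here (2-byte cylinder, 2-byte head, 1-byte record number, 1-byte key length, 2-byte data length, big-endian) and the names `CountField`, `parse_count`, and `split_record` are assumptions made for the example; the description above states only that the count carries the record ID and the lengths of the key and data portions.

```python
import struct
from typing import NamedTuple, Tuple

# Assumed layout: cylinder (2 bytes), head (2), record number (1),
# key length (1), data length (2), big-endian, 8 bytes in total.
COUNT_FORMAT = ">HHBBH"

class CountField(NamedTuple):
    cylinder: int
    head: int
    record: int
    key_length: int
    data_length: int

def parse_count(raw: bytes) -> CountField:
    """Decode a count field using the hypothetical layout above."""
    return CountField(*struct.unpack(COUNT_FORMAT, raw))

def split_record(raw: bytes) -> Tuple[CountField, bytes, bytes]:
    """Split count + optional key + data into its three parts."""
    size = struct.calcsize(COUNT_FORMAT)
    count = parse_count(raw[:size])
    key_end = size + count.key_length
    key = raw[size:key_end]                      # empty when key_length is 0
    data = raw[key_end:key_end + count.data_length]
    return count, key, data

# Example record: ID (cylinder 10, head 3, record 1), 4-byte key, 8-byte data.
raw = struct.pack(COUNT_FORMAT, 10, 3, 1, 4, 8) + b"KEY1" + b"DATA0001"
count, key, data = split_record(raw)
```

Because the key and data lengths are only known after the count has been decoded, a reader of this format has to process the count of each record before it can step to the next one.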
Although this format gives the system and user some flexibility and freedom in the usage of the disk drive, this flexibility forces the user to use more complicated computer programs for handling and searching data on the disk. Since the disk drive track has no physical position indicator, the disk drive controller cannot know which data is positioned under the read/write head at any given instant in time. Thus, before data can be read from or written to the disk drive, a search for the record must be performed by sequentially reading the record IDs contained in the count fields of the records on a track until a match is found. Even if cache memory is used, all the records to be searched must first be read into the cache before being searched. Since searching for the record takes much longer than the actual data transfer, the disk storage system spends a tremendous amount of time searching for data, which drastically reduces system performance.
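The cost of the sequential search described above can be sketched in a few lines of Python. This is a simplified model rather than controller code: the `CkdRecord` type, the one-byte record IDs, and the 94-record track are assumptions chosen for the example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class CkdRecord(NamedTuple):
    record_id: bytes   # ID carried in the count field
    key: bytes         # optional key portion (may be empty)
    data: bytes        # data portion

def sequential_search(
    track: Sequence[CkdRecord], wanted_id: bytes
) -> Tuple[Optional[CkdRecord], int]:
    """Scan the track in order; return the match and the number of IDs read."""
    examined = 0
    for record in track:
        examined += 1
        if record.record_id == wanted_id:
            return record, examined
    return None, examined

# A hypothetical track holding 94 short records with IDs 1..94.
track = [CkdRecord(bytes([i]), b"", b"payload-%d" % i) for i in range(1, 95)]
found, examined = sequential_search(track, bytes([94]))
```

In this model, locating the last record on the track requires reading all 94 record IDs, so the search cost grows with the number of records on the track.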
Disk drives employing what is known as a Fixed Block Architecture (FBA) are widely available in small, high capacity packages. These drives, by virtue of their architecture, tend to be of higher performance than drives employing a CKD format. Such FBA drives are available, for example, from Fujitsu as 5.25-inch drives with 1 gigabyte or greater capacity.
The distinct advantage of utilizing many small disk drives is the ability to form a disk array. Thus, a large storage capacity can be provided in a reduced amount of space, and storage redundancy can be provided in a cost effective manner. A serious problem arises, however, when trying to do a "simple" conversion of data from CKD formatted disks to FBA disks. Two schemes for such a conversion have been considered, but neither provides an acceptable solution to the conversion problem. The first of such schemes involves placing every field, i.e., Count, Key, and Data, of the CKD formatted record into a separate block on the FBA disk drive. Although this scheme does not waste valuable disk space when CKD formatted records contain large amounts of data, the "Count" field, which is very short (8 bytes), occupies an entire block, which is typically at least 512 bytes. For example, a CKD formatted record containing 47K bytes of data could be converted to 95 blocks of FBA disk, each 512 bytes in length. In such a conversion, one block would be used to store the count of the record while 94 blocks (the 47K byte length of the data divided by the 512 byte length of an FBA disk block) would be used to store data, for a total of 95 blocks. However, search time for finding the desired record is still a problem since all the records must be sequentially searched.
For records having very short data lengths such as eight bytes, however, one full track, or 94 CKD formatted data records, would need 188 blocks on the FBA disk: 94 blocks for the count portion of the records and 94 blocks for the data portion of the records, even though each data record may only occupy 8 bytes of a 512 byte FBA block. Such a scheme may thus waste nearly 50% of the disk space on an FBA disk drive.
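The block counts quoted for this first scheme can be checked with a small Python calculation. The helper names are hypothetical, and the sketch assumes 512-byte blocks, that "47K bytes" means 47 x 1024 bytes, and that the records carry no key.

```python
BLOCK_SIZE = 512  # assumed FBA block size in bytes

def blocks_for(length: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks needed to hold `length` bytes (ceiling division, 0 for empty)."""
    return -(-length // block_size)

def scheme_one_blocks(key_len: int, data_len: int) -> int:
    """First scheme: count, key, and data each start in their own block."""
    count_blocks = 1  # the 8-byte count still occupies a full block
    return count_blocks + blocks_for(key_len) + blocks_for(data_len)

# One large record: 47K bytes of data, no key.
large_record = scheme_one_blocks(key_len=0, data_len=47 * 1024)

# One full track of 94 short records, each with 8 bytes of data and no key.
short_track = 94 * scheme_one_blocks(key_len=0, data_len=8)
```

Under these assumptions `large_record` evaluates to 95 and `short_track` to 188, matching the figures given above.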
The second scheme for converting data from CKD to FBA drives involves starting each CKD record in a separate block and then writing the complete record in sequential blocks. Utilizing such a scheme, the first FBA block will contain the "count" portion of the record as well as the optional key portion and the start of the data portion of the record. This scheme, however, produces serious system performance degradation when data must be written to the disk, since before writing data to the disk, the entire record must first be read into memory, modified, and subsequently written back to the disk drive. Such a loss in system performance is generally unacceptable.
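The write penalty of this second scheme can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: 512-byte blocks, an opaque 8-byte count field, zero padding of the last block, and the hypothetical helpers `pack_record` and `update_data`.

```python
from typing import List

BLOCK_SIZE = 512   # assumed FBA block size
COUNT_LEN = 8      # assumed count field length

def pack_record(count: bytes, key: bytes, data: bytes) -> List[bytes]:
    """Lay one record out contiguously, starting on a block boundary."""
    raw = count + key + data
    n_blocks = -(-len(raw) // BLOCK_SIZE)             # ceiling division
    raw = raw.ljust(n_blocks * BLOCK_SIZE, b"\x00")   # pad the last block
    return [raw[i:i + BLOCK_SIZE] for i in range(0, len(raw), BLOCK_SIZE)]

def update_data(blocks: List[bytes], key_len: int, new_data: bytes) -> List[bytes]:
    """Read-modify-write: every block of the record is read and rewritten."""
    raw = b"".join(blocks)                            # read the entire record
    count = raw[:COUNT_LEN]
    key = raw[COUNT_LEN:COUNT_LEN + key_len]
    return pack_record(count, key, new_data)          # write it all back

# A record with 600 bytes of data and no key spans two blocks.
blocks = pack_record(b"\x00" * COUNT_LEN, b"", b"A" * 600)
updated = update_data(blocks, 0, b"B" * 600)
```

Because the count, key, and start of the data share the first block, the sketch cannot replace the data without first reading that block back, which mirrors the read, modify, and write-back sequence described above.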
This invention features an apparatus and method for retrieving one or more requested data records stored on a storage medium by searching for a data record identifier and associated data record locator index stored in high speed semiconductor memory. The apparatus receives one or more data records, each of the data records including at least a record identification portion and a data portion. The apparatus transfers and stores the data records to one or more data storage mediums. As the records are transferred to the data storage medium, the apparatus of the present invention generates a plurality of record locator indices, each of the record locator indices corresponding to one of the plurality of data records, for uniquely identifying the location of each of the data records stored on the storage medium.
The apparatus further includes high speed semiconductor memory for storing at least the plurality of record locator indices and the associated plurality of record identification portions. Upon receiving a request for one or more data records stored on the storage mediums, the apparatus of the present invention searches the high speed semiconductor memory utilizing the data record identification portion and locates the corresponding record locator index associated with the requested data record. The apparatus then directly retrieves the data record from the storage medium using the record locator index located during the search of semiconductor memory.
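A minimal Python sketch of the retrieval path described above follows. It is an illustration under stated assumptions, not the apparatus itself: a Python list stands in for the block-addressable storage medium, a dictionary stands in for the high speed semiconductor memory, and the names `Locator` and `RecordStore` as well as the contents of the locator (first block, block count, data length) are hypothetical.

```python
from typing import Dict, List, NamedTuple

BLOCK_SIZE = 512  # assumed block size of the storage medium

class Locator(NamedTuple):
    first_block: int   # block address where the record's data begins
    block_count: int   # number of consecutive blocks it occupies
    data_length: int   # length of the data without padding

class RecordStore:
    def __init__(self) -> None:
        self.blocks: List[bytes] = []          # stands in for the storage medium
        self.index: Dict[bytes, Locator] = {}  # record ID -> locator, in memory

    def store(self, record_id: bytes, data: bytes) -> Locator:
        """Write the data to the medium and generate its locator index."""
        n = max(1, -(-len(data) // BLOCK_SIZE))
        first = len(self.blocks)
        padded = data.ljust(n * BLOCK_SIZE, b"\x00")
        for i in range(n):
            self.blocks.append(padded[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
        locator = Locator(first, n, len(data))
        self.index[record_id] = locator
        return locator

    def retrieve(self, record_id: bytes) -> bytes:
        """Search memory for the ID, then read only the blocks it points to."""
        loc = self.index[record_id]
        raw = b"".join(self.blocks[loc.first_block:loc.first_block + loc.block_count])
        return raw[:loc.data_length]

store = RecordStore()
first = store.store(b"R1", b"x" * 600)
second = store.store(b"R2", b"hello")
result = store.retrieve(b"R2")
```

In this sketch the request for record `R2` touches only the in-memory index and the single block the locator names; no other record on the medium is read. The dictionary lookup is one possible search strategy, since the description above does not specify how the memory is searched.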
In the preferred embodiment, the data records are received in CKD format and stored on an FBA formatted disk drive. The record identification portions and associated record locator indices are combined to form one record locator table stored in one or more blocks of the FBA formatted disk drive and also copied in the high speed semiconductor memory.
A method for retrieving one or more requested data records stored on a storage medium utilizing a data record locator index stored in memory is also disclosed. The method includes the steps of receiving a plurality of data records, each record including at least a record identification portion and a data portion, and transferring and storing the data records to one or more storage mediums. The method also includes generating a plurality of record locator indices, each of which is associated with one of the plurality of data records and uniquely identifies the location of that data record on the storage medium. Also included is the step of storing at least the plurality of record locator indices and the associated plurality of record identification portions in memory. In response to a request for access to one or more of the plurality of data records, the method includes searching the memory, locating one or more data record identification portions and associated record locator indices corresponding to the one or more requested data records, and directly retrieving from the storage medium the requested data records as directed by the record locator indices.
In one embodiment, the method of the present invention includes transforming and encoding CKD formatted data records onto one or more FBA disk drives. Also in the preferred embodiment, the step of storing the data record to one or more storage mediums includes storing the data to one or more directly addressable storage mediums, the step of storing further including the steps of transforming and encoding at least the record identification portion of each of the data records, generating a plurality of record locator indices, and combining the transformed and encoded record locator indices and record identification portions, for forming a record locator table stored in a high speed semiconductor memory.