The present invention relates to data transfer in a hard disk drive system, and more specifically to the dynamic adjustment of buffer utilization ratios within the hard disk drive and monitoring thereof.
Conventional hard disk drives generally include two individual data transfer engines configured to cooperatively move data into and out of a hard disk drive storage medium, as shown in FIG. 1. The first of the two engines, typically referred to as the drive side engine 101, is generally responsible for transferring data between a memory buffer 102, which may be a bank of dynamic random access memory (DRAM), and the magnetic media 100 of the hard disk drive. The second of the two engines, typically referred to as the host side engine 103, is responsible for transferring data between the memory buffer 102 and a host interface 104. The host interface 104 may be, for example, an advanced technology attachment (ATA) interface, a small computer systems interface (SCSI), a fiber channel arbitrated loop (FC-AL), and/or another known interface configuration. The first and second engines generally operate independently of each other, but often operate to transfer data into and out of the memory buffer 102 simultaneously. Additionally, the first and second engines often operate at different data transfer speeds, as host-type interfaces often operate in the 1 to 2 gigabit per second (Gbps) range, while the interface between a hard disk drive and a memory is traditionally much slower, generally in the range of 20 to 60 megabytes per second (MB/s).
In an operation to read data from the hard disk drive, for example, when a device requests information residing on the hard disk drive, the drive side engine 101 generally operates to transfer the requested data from the storage medium 100 of the hard disk drive to the memory buffer 102. After a predetermined period of time has passed, the host side engine 103 will generally begin moving data transferred to the memory buffer 102 by the drive side engine 101 to the host interface 104 for distribution to the device requesting the data from the hard disk drive. It is important that the host side wait before initiating data transfer, as the host side is generally capable of transferring data at a substantially faster rate. The host side is therefore capable of rapidly catching up to the drive side, which results in loop performance delays due to re-arbitration, as the host side engine must then be temporarily disabled in order to allow the drive side to transfer more data for the host side to process/transfer. After the drive side initiates data transfer, it will eventually complete the transfer of the requested information from the medium of the hard disk drive to the memory buffer. At some time after the drive side engine initiates data transfer, the host side engine starts its transfer and eventually completes the transfer of the requested data from the memory buffer to the host interface. Once the host side engine completes the transfer of data from the memory buffer to the host interface, the data transfer process for that particular read operation is generally complete. However, in a typical hard disk drive configuration, there are generally multiple individual chunks of data transferred in order to complete a single transfer command, and therefore, the host side may regularly catch up to the drive side at the end of each data segment transfer.
These end of segment-type catch-up conditions may generally be referred to as desired catch-up conditions, and are expected to continue until the segments are collectively transferred, thus completing the individual transfer command.
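The reason the host side must wait can be illustrated with a short calculation. The following is a minimal sketch, assuming constant transfer rates and a segment that fits entirely in the buffer; the function name and parameters are hypothetical and are not taken from any drive firmware. It estimates how many blocks the drive side should have buffered before a faster host side starts, so that the host side finishes just as the drive side delivers the last block.

```python
def host_start_threshold(total_blocks, drive_rate, host_rate):
    """Blocks that should be buffered before the host side starts a read.

    Assumptions (illustrative only): both rates are constant, use the same
    units (blocks per unit time), and the whole segment fits in the buffer.
    """
    if host_rate <= drive_rate:
        # A host side that is not faster can never catch up to the drive side.
        return 0
    # The host side needs total_blocks / host_rate time units to drain the
    # segment. During that time the drive side adds
    # total_blocks * drive_rate / host_rate blocks, so the remainder must
    # already be in the buffer when the host side starts.
    return total_blocks * (1 - drive_rate / host_rate)


# Example: a host side four times faster than the drive side.
print(host_start_threshold(1000, 50, 200))
```

Under these assumptions, a host side that starts earlier than this threshold would be expected to catch up to the drive side before the segment is complete.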
A similar operation is conducted for writing data to the hard disk drive; however, the data flow and respective engine handling are essentially reversed. Therefore, when a device is to write data to the hard disk drive, the host side engine generally begins to transfer the portion of data to the memory buffer from the host interface, for example, a segment of data. The memory buffer will begin to fill up with the data to be written, and therefore, at some predetermined time thereafter, which is generally as quickly as possible, the drive side engine begins to transfer data into the drive storage medium for storage thereon. Both engines may simultaneously transfer data to and from the memory buffer until the data is completely transferred to the hard disk drive. This simultaneous transfer operation generally occurs in segments or blocks, in similar fashion to the above noted read operation. However, drive side catch-up conditions are generally much less frequent than host side catch-up conditions, as the performance penalty associated with a drive side catch-up is substantially greater than that of a host side catch-up, and is therefore to be avoided. In this configuration the host side engine generally completes data transfer operations prior to the drive side engine.
However, since the drive and host side engines generally operate at different data transfer rates, one engine may "catch-up" to the other engine during a data transfer operation, irrespective of the direction of the data transfer. In this situation, the transfer operations of the engine that has caught up must be halted, and the engine must wait until the other engine has transferred additional data before the halted engine can reinitiate and continue its own data transfer operations. If the host side engine catches up to the drive side engine, then the catch-up condition is generally referred to as a host catch-up. Alternatively, if the drive side engine catches up to the host side engine, then the catch-up condition is generally referred to as a drive catch-up. Both of these conditions are detrimental to the efficiency and performance of the hard disk drive and the surrounding components/devices, as each time a catch-up event occurs, an efficiency/performance penalty is incurred, as the respective engine is halted while the software intervenes to calculate when the engine may be subsequently restarted.
On hard disk drives in particular, drive catch-up conditions have a substantial performance penalty, as each requires one complete revolution of the hard disk storage medium before access to the storage medium may be reinitiated at the same location at which the previous data read/write was stopped. For example, on a 10,000 revolution per minute disk drive, the timing penalty for waiting for the drive medium to complete a single revolution to return to the point on the drive at which the drive medium was halted would be at least 6 milliseconds. Although host catch-up penalties are typically smaller than drive catch-up penalties and depend primarily upon the specific type of interface used, host catch-up penalties nevertheless also contribute to decreased system performance. In a fiber channel arbitrated loop (FC-AL) configuration, for example, the halt/wait time penalty generally amounts to the time required to re-arbitrate for the loop. However, on large loops or public loops, the wait time penalty can be significantly increased and become a substantial factor in decreased system performance. Both types of catch-up conditions generally require software intervention to halt and/or reinitiate the respective transfer engine. As a result thereof, both catch-up conditions require allocation of valuable processor cycles, which reduces the number of processor cycles available for other devices and tasks, such as, for example, command reordering.
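The 6 millisecond figure follows directly from the spindle speed, since one revolution takes 60 seconds divided by the number of revolutions per minute. A minimal sketch of that arithmetic is given below; the function name is hypothetical.

```python
def revolution_time_ms(rpm):
    """Time for one full revolution of the medium, in milliseconds."""
    # 60,000 milliseconds per minute divided by revolutions per minute.
    return 60_000.0 / rpm


# A drive catch-up costs at least one full revolution.
for rpm in (7_200, 10_000, 15_000):
    print(rpm, "rpm:", round(revolution_time_ms(rpm), 2), "ms per revolution")
```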
In view of the performance degradation resulting from catch-up conditions, it is desirable to have a logical structure and/or controlling-type software for hard disk drives that is configured to avoid catch-up conditions and to optimize the host side engine usage so as to reduce the number of times it must be re-started. Some conventional SCSI devices attempt to accomplish this task by allowing users selective control over when the host side engine initiates data transfer. This selective control is generally based upon timing of the host engine's initialization of data transfer with respect to the drive side engine. This timing is generally based upon the size of the intermediately positioned memory buffer and the transfer speeds of the respective engines. In particular, conventional devices may allow users to set the Read Full Ratio in Mode Page 2 for read commands. This ratio generally represents a fraction that indicates how full the drive buffer should be before host data starts getting transferred out of the buffer, e.g., 40% or 80%. There is also a corresponding Write Empty Ratio parameter, which can be specified for write commands and represents how empty the buffer should be before the drive engine requests more data to be written thereto. These are fixed ratios that a sophisticated customer may be able to use in order to maximize loop performance for case specific tasks under very specific conditions. However, the manipulation of these parameters requires that the user have substantial understanding of the respective system and that the respective system has a predictable and relatively constant loop response. Moreover, if system conditions change, as they often do, then the fixed ratios are no longer appropriate and must be recalculated by the user, which may be a substantial task.
As an alternative to manually manipulating these parameters, the user may allow the hard disk drive to determine when to start the host side engine in reference to the drive side engine by setting one or both of the Read Full Ratio and Write Empty Ratio to zero. This is generally referred to in the art as using an "adaptive ratio," which indicates that a consistent value is used to adjust the engine start times. This value remains constant during operation and is not adjusted for system changes.
For example, an SCSI interface utilizes an inter-locked bus design that allows for a relatively high degree of predictability on data transfers. In particular, once a device on an SCSI interface arbitrates and gains control of the bus, data may be instantaneously transferred from one device to the other device. Therefore, generally the only variable that needs to be considered when calculating the optimal time to start the host engine on a transfer, e.g., the adaptive ratios, aside from the respective engine speeds, is the amount of time it takes to gain control of the bus. Therefore, using a worst case bus workload scenario, the amount of time required to gain control of the bus can be calculated and used to represent all other workload cases. This amount of time is relatively constant and with minimal padding can be set so as to generally avoid a drive catch-up condition, while also minimizing the number of host catch-up conditions. Since the calculated worst-case time to gain control of the bus generally remains constant for writes or reads and generally does not vary from system to system, this approach is generally effective for SCSI based devices.
In contrast, FC-AL interfaces have a number of variables that contribute to the calculation of the adaptive ratio. As such, FC-AL interfaces are substantially less predictable than SCSI interfaces. For example, on an FC-AL loop, the ability to arbitrate for control of the loop generally depends upon factors such as the loop traffic and the number of ports present on the loop. Therefore, on a busy loop with a large number of ports, the delay required to arbitrate for control of the loop could easily be several milliseconds. Additionally, in an FC-AL configuration data is not instantaneously transferred between devices on a loop, as there is some finite delay between the time when one device sends data and another device actually receives the data. This delay generally increases as the loop size grows, and therefore, increases substantially if there is an interstitially positioned fabric. Furthermore, FC-AL includes unique handling procedures for write data, as the drive sends a Transfer Ready frame when it is ready to begin receiving write data frames. The drive, however, has no control over when the receiver of the Transfer Ready frame will turn around and begin sending these data frames. This turnaround time varies from adaptor to adaptor and from system to system, and therefore further contributes to the difficulty of calculating the adaptive ratios for an FC-AL type system.
Another major problem in calculating the adaptive ratios is the fact that the data transferred by both the drive engine and the host engine is not a perfectly linear function. If the drive and host transfers were linear functions, the system would be quite predictable and calculating an optimal buffer ratio would be simplified. However, both transfers consist of a combination of linear and step functions, which complicates the problem.
The drive engine transfers data into (or out of) the buffer in a nearly linear fashion until it reaches the end of a track. At that point, a track switch occurs which injects a step function delay into the drive data transfer. During the track switch, no data is actively being transferred into (or out of) the buffer from the drive engine. This delay is quite significant and can require up to one-third of a revolution to complete. However, the track switch delay is known and fixed. Assuming no servo or drive errors, the drive data transfer function consistently behaves according to a function similar to the one shown in FIG. 2.
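The combination of linear transfer and step delay described above can be sketched as a simple piecewise function. The following is an illustrative model only, assuming the drive side moves one full track per revolution at a constant rate and then stalls for a fixed track switch time; the function and parameter names are hypothetical, and the model is not a reproduction of FIG. 2.

```python
def drive_blocks_transferred(t_ms, blocks_per_track, rev_ms, switch_ms):
    """Blocks moved by the drive side after t_ms of elapsed time.

    Assumptions (illustrative only): one track is transferred per
    revolution at a constant rate, every track is followed by a track
    switch of fixed length, and no servo or drive errors occur.
    """
    cycle_ms = rev_ms + switch_ms  # one full track plus one track switch
    full_cycles, rem_ms = divmod(t_ms, cycle_ms)
    # Within a cycle the transfer is linear for rev_ms, then flat while the
    # track switch is in progress.
    partial = blocks_per_track * min(rem_ms, rev_ms) / rev_ms
    return full_cycles * blocks_per_track + partial


# Example: 6 ms revolution with a 2 ms track switch (one-third of a revolution).
for t in (3, 6, 7, 8, 11):
    print(t, "ms:", drive_blocks_transferred(t, 600, 6, 2), "blocks")
```

In this example the block count stops growing between 6 ms and 8 ms, which corresponds to the step that the track switch injects into an otherwise linear transfer.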
Similarly, the host transfer consists of a combination of a step function and a linear function. The host engine encounters a step function as it is attempting to arbitrate the loop. Once it has control of the loop, the data transfer typically behaves nearly linearly. However, for the host transfer, the problem exists in that the step function delay varies depending on a number of factors outlined below.
The first factor is the loop traffic. As loop traffic increases, the delay to win arbitration also increases. A second factor is physical loop size and topology. FC-AL allows up to 128 ports on a loop. Each port inherently injects some propagation delay into the loop. As the number of ports increases, so does the total loop propagation delay and hence the amount of time to win arbitration. A third factor is the workload type. In SCSI-FCP, reads and writes behave quite differently. A drive only needs to win arbitration once on reads to begin a host data transfer. On writes, a drive must first win arbitration and send a XFER_RDY frame to the initiator indicating how much data the drive is ready to receive. Typically the drive closes the connection at this point. Then, the initiator must win arbitration before the host data transfer can begin. As a result of this extra step, writes incur an additional delay over read commands before the data transfer actually begins. A fourth factor is the type of host system. Some host systems are faster and have larger buffers than others, which results in a variation in system behavior. The amount of time required to process an XFER_RDY frame and begin sending write data varies from system to system. Some systems can keep up with sequential read data streaming from the drive. Others will CLS the connection or temporarily withhold credit when they have exhausted their resources. Other system variations that can affect delays include command turnaround time, queue depth, and transfer length. These system variations translate to delays that the drive firmware must account for to efficiently complete host transfers. The fifth and final factor is host transfer speed. For example, arbitration delays on a 1 Gbps loop will inherently be twice as long as on a 2 Gbps loop. FIG. 3 illustrates how these variations in the arbitration delay can affect the host transfer.
This variation in host delays affects the frequency of host catch-ups and drive catch-ups. How well this variation is accounted for in calculating buffer ratios directly determines loop efficiency and overall drive performance. Previous designs used a simple table of constants (indexed by zone) to determine how much pad was needed to account for the arbitration delays. Such a static design has no ability to account for any variations in arbitration delays. It can be manually tuned to be efficient for one system and one workload. Moving the same drive to a different system or changing the workload can result in poor performance.
Co-pending application Ser. No. 09/818,161, filed Mar. 23, 2001 entitled "Method for Dynamically Adjusting Buffer Utilization Ratios in a Hard Disk Drive System", hereinafter incorporated by reference, provides on-the-fly tuning to account for these variations. This uses a pad to account for the delays encountered in the host transfer. The pad is adjusted to attempt to account for changes in host transfer delays and to maximize loop utilization efficiency. The larger the pad, the sooner the host side is started in reference to the drive side transfer, resulting in a smaller probability that a drive catch-up will occur. However, the larger the pad, the greater the number of host catch-ups that are needed to complete a given transfer. More host catch-ups result in more loop tenancies, which leads to more loop overhead and reduced system performance. The smaller the pad becomes, the fewer the number of host transfers that are required to complete a given transfer. However, as the pad size is decreased, the risk of incurring a drive catch-up increases.
One of the pad adjustment mechanisms outlined in application Ser. No. 09/818,161 compares the actual amount of host data transferred since the previous host catch-up to a fixed threshold (i.e., goal) based on the drive's segment size. If the actual amount of data transferred exceeds the goal, no pad adjustments are made. If the amount of actual data transferred does not exceed the goal, the pad size is decreased by a predetermined, fixed amount to improve the probability that the host side exceeds the goal during the next transfer.
It would be desirable to provide a dynamic, more realistic goal to provide increased accuracy and adaptability in reaching an optimal pad setting. Rather than relying solely on a single fixed parameter (i.e., a drive's segment size), the goal would dynamically adjust, based on a number of factors including: drive transfer speed, host transfer speed, and track switch locations.
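The fixed-goal comparison described above can be sketched in a few lines. This is an illustration of the comparison only, not the implementation of the referenced application; the function and parameter names are hypothetical, and clamping the pad at zero is an added assumption.

```python
def adjust_pad(pad_blocks, transferred_blocks, goal_blocks, step_blocks):
    """Return the pad to use for the next transfer.

    transferred_blocks is the host data moved since the previous host
    catch-up; goal_blocks is the fixed threshold derived from segment size.
    """
    if transferred_blocks > goal_blocks:
        # The goal was exceeded, so the pad is left unchanged.
        return pad_blocks
    # The goal was not exceeded: shrink the pad by the fixed step.
    # Clamping at zero is an assumption made for this sketch.
    return max(0, pad_blocks - step_blocks)


print(adjust_pad(100, 500, 400, 10))  # goal exceeded
print(adjust_pad(100, 400, 400, 10))  # goal not exceeded
```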
The present invention provides a method for dynamically adjusting buffer utilization ratios for a hard disk drive system. The method establishes and dynamically adjusts a host transfer goal, which targets the amount of data transferred between host catch-up conditions for a current command. The actual amount of data transferred between host catch-up conditions is compared against the host transfer goal, and the buffer utilization ratios are adjusted when the actual amount of transferred data does not exceed the transfer goal.
The host transfer goal is determined by a number of operational characteristics, including drive transfer speed, host transfer speed, and track switch locations. As these characteristics change during operation, the host transfer goal is adjusted accordingly, and the data transfer rate is optimized.
In one embodiment, the data transfer goal is established by first initializing a base block count for the current transfer operation, by analyzing both the host and drive transfer speeds. Next, the method iteratively calculates how many additional drive blocks are transferred while the host side is transferring. A data transfer goal is then established from the base block count and the additional drive blocks that are transferred while the host side is transferring. Finally, if a track switch occurs during the host side transfer, the data transfer goal is adjusted accordingly.
If the host side transfer cannot be completed in a single operation, the base block count is initialized to the size of the buffer minus the size of the pad. The pad is defined as an optimal point in time where the host engine should start/restart transferring read data so that the drive side engine does not stall and/or enter into a catch-up condition during the transfer of data into and out of the buffer. Otherwise, if the host side transfer can be completed in a single operation, the base block count is initialized to the remaining blocks in the transfer operation minus the number of drive blocks that will be transferred in the amount of time it takes the host side to complete the transfer for the current command.
The number of blocks the drive side transfers while the base block count is being transferred by the host is computed by multiplying the base block count by the drive data rate inverse, then dividing the product by the host data rate inverse.
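The iterative part of the goal calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: it expresses the speeds as plain rates (blocks per unit time) rather than as the rate inverses mentioned above, it assumes the host side is the faster engine, it uses integer block counts, and it omits the track switch adjustment. The function and parameter names are hypothetical.

```python
def host_transfer_goal(base_blocks, drive_rate, host_rate):
    """Base block count plus the blocks the drive adds while the host drains.

    Assumptions (illustrative only): both rates are positive integers in
    the same units, the host side is faster than the drive side, and no
    track switch occurs during the host side transfer.
    """
    if drive_rate >= host_rate:
        raise ValueError("this sketch assumes the host side is the faster engine")
    goal = 0
    chunk = base_blocks
    while chunk > 0:
        goal += chunk
        # Blocks the drive side delivers during the time the host side
        # needs to move `chunk` blocks; these must also be transferred.
        chunk = chunk * drive_rate // host_rate
    return goal


# Example: base count of 1000 blocks, host side four times faster.
print(host_transfer_goal(1000, 50, 200))
```

Each pass adds the blocks that arrive from the drive side while the previous batch is being sent to the host, so the successive additions shrink until none remain.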