Generally, when data is written into a storage device, a corresponding address is found according to a logical block address (LBA), and the data is then written into that address. If the addresses of multiple input/output (IO) commands are consecutive, the commands are sequential IOs; otherwise, they are random IOs.
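The distinction between sequential and random IOs can be illustrated with a minimal sketch. The representation of an IO as a (start LBA, length in blocks) pair and the function name `is_sequential` are assumptions made for illustration only; they are not taken from any particular storage device.

```python
from typing import List, Tuple

# Hypothetical representation: each IO is (start_lba, length_in_blocks).
IO = Tuple[int, int]


def is_sequential(ios: List[IO]) -> bool:
    """Return True if every IO starts exactly where the previous one ended."""
    for (prev_lba, prev_len), (next_lba, _) in zip(ios, ios[1:]):
        if next_lba != prev_lba + prev_len:
            return False
    return True


# Consecutive addresses: sequential IOs.
sequential_ios = [(0, 8), (8, 8), (16, 8)]
# Scattered addresses: random IOs.
random_ios = [(0, 8), (4096, 8), (128, 8)]

print(is_sequential(sequential_ios))
print(is_sequential(random_ios))
```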
When a hard disk drive (HDD) in the storage device needs to read a piece of data, the magnetic head must first move in a radial direction to a position above the track where the target sector is located. This period of time is referred to as the seek time and is about 10 milliseconds on average. After the target track is found, the disk platter rotates until the target sector is under the magnetic head. This period of time is referred to as the rotational latency. For a hard disk of 7200 revolutions per minute, the time required for one revolution is about 8.33 milliseconds, and the average rotational latency, which is half a revolution, is about 4.17 milliseconds. The time for reading data from or writing data into the target sector is about a few milliseconds. For sequential IOs, no seek or rotational wait is required between the IOs, and therefore performance of sequential IOs is relatively high. For random IOs, a seek and a rotational wait are required for each IO, and this time is much longer than the time for reading or writing the data; therefore, performance of random IOs is very poor.
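The timing figures above follow from simple arithmetic, sketched below. The function names are hypothetical, and the 2 millisecond transfer time is an assumed placeholder for "a few milliseconds", not a measured value.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2.0


def random_io_time_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """A random IO pays for a seek, a rotational wait, and the transfer."""
    return seek_ms + average_rotational_latency_ms(rpm) + transfer_ms


RPM = 7200
SEEK_MS = 10.0      # average seek time stated in the text
TRANSFER_MS = 2.0   # assumption standing in for "a few milliseconds"

print(f"one revolution: {rotation_time_ms(RPM):.2f} ms")
print(f"average rotational latency: {average_rotational_latency_ms(RPM):.2f} ms")
print(f"random IO: {random_io_time_ms(SEEK_MS, RPM, TRANSFER_MS):.2f} ms")
print(f"sequential IO (transfer only): {TRANSFER_MS:.2f} ms")
```

Under these assumed numbers, the seek and rotational wait dominate the cost of each random IO, which is the reason given above for the poor performance of random IOs.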
Generally, redundant array of independent disks (RAID) protection is used for the storage device. For a random write IO, RAID5 and RAID6 both incur a write penalty, which seriously affects performance; for a sequential write IO, the write penalty is very small. An 8-disk RAID5 is used as an example to describe the impact of the write penalty on write performance. The 8-disk RAID5 consists of seven data disks (D) and one parity disk (P). If a small IO is randomly written, in the best case, the original parity data must be read into memory, new parity data is computed from the new data and the original parity data, and then the new data and the new parity data are written to disk. Each host IO therefore triggers at least three disk IO operations; the write is amplified by a factor of 3, and performance of the entire system deteriorates by ⅔. During a sequential write, after seven host IOs are received, parity is computed over the seven host IOs together to generate new parity data, and then the seven IOs and the new parity data are written to disk together. Every seven host IOs trigger eight disk operations; the write is amplified only by a factor of 8/7, and performance deteriorates slightly.
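The write-penalty figures in the 8-disk RAID5 example can be checked with a short calculation. This is a sketch under the best-case assumption stated above (three disk IOs per small random write); the function names are hypothetical.

```python
from fractions import Fraction


def random_write_amplification() -> Fraction:
    """Best case from the text: read old parity, write new data, write new parity."""
    return Fraction(3, 1)


def full_stripe_write_amplification(data_disks: int, parity_disks: int = 1) -> Fraction:
    """A full-stripe write issues one disk IO per data disk plus one per parity disk."""
    return Fraction(data_disks + parity_disks, data_disks)


def performance_loss(amplification: Fraction) -> Fraction:
    """Fraction of host throughput lost when each host IO costs `amplification` disk IOs."""
    return 1 - 1 / amplification


random_amp = random_write_amplification()
stripe_amp = full_stripe_write_amplification(data_disks=7)

print("random write:", random_amp, "x, loss", performance_loss(random_amp))
print("full stripe: ", stripe_amp, "x, loss", performance_loss(stripe_amp))
```

With seven data disks and one parity disk, the random case loses two thirds of the host throughput, while the full-stripe case loses one eighth.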
A redirect-on-write turns writes into full sequential writes, which solves not only the problem of poor HDD random write performance but also the problem of the RAID write penalty; therefore, a redirect-on-write function is introduced for processing write IOs. The physical space of a storage device is divided into a valid data space and a redundant space. The valid data space stores data that has been written, and a mapping is established between the logical address of a write IO and a physical address in the valid data space. When a random write IO is delivered to a logical address, the original physical location in the valid data space is not overwritten. Instead, a segment of space is allocated from the redundant space, so that multiple random write IOs are written sequentially into consecutive redundant space. The mapping between the logical address and the new physical address is then recorded in a mapping table, and the original physical space that held the previously written data becomes garbage. In this way, random writes are transformed into full-stripe sequential writes, which improves write performance of the storage device.
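The redirect-on-write behavior described above can be sketched as a small in-memory model. The class `RedirectOnWriteStore`, its fields, and the simple append pointer are all hypothetical simplifications chosen for illustration; a real storage device would persist the mapping table and allocate space in stripes or segments.

```python
from typing import Dict, List, Optional, Set


class RedirectOnWriteStore:
    """Toy model: every write lands at the next free physical block."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical: List[Optional[bytes]] = [None] * physical_blocks
        self.mapping: Dict[int, int] = {}   # logical address -> physical address
        self.garbage: Set[int] = set()      # physical blocks holding stale data
        self.next_free = 0                  # append pointer into the redundant space

    def write(self, lba: int, data: bytes) -> int:
        if self.next_free >= len(self.physical):
            raise RuntimeError("redundant space used up; garbage collection needed")
        new_pba = self.next_free            # allocate sequentially, never overwrite
        self.next_free += 1
        self.physical[new_pba] = data
        old_pba = self.mapping.get(lba)
        if old_pba is not None:
            self.garbage.add(old_pba)       # the old location becomes garbage
        self.mapping[lba] = new_pba         # record the new mapping
        return new_pba

    def read(self, lba: int) -> Optional[bytes]:
        return self.physical[self.mapping[lba]]


store = RedirectOnWriteStore(physical_blocks=8)
# Random logical addresses still land on consecutive physical blocks.
placements = [store.write(lba, b"v1") for lba in (500, 7, 93)]
store.write(500, b"v2")                     # rewrite: redirected, not overwritten
print(placements, store.mapping, store.garbage)
```

In this model the physical write pattern is always sequential regardless of the logical addresses, at the cost of leaving garbage behind on every rewrite.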
The redirect-on-write creates garbage in the original physical space. When the redundant space falls below a specific threshold, garbage collection must be enabled; otherwise, once the redundant space is used up, a new write IO cannot be processed because no redundant space can be allocated. To facilitate garbage collection, the physical space of the storage device is generally divided into multiple segments of a specific size. In a RAID scenario, a segment refers to an entire stripe; in a solid state disk (SSD) scenario, a segment refers to an erasable block; and in other scenarios, a segment refers to a range of consecutive space. After a system runs for a period of time, garbage may exist in each segment, and the ratio of the quantity of garbage blocks to the total quantity of blocks in a segment is referred to as the garbage ratio of the segment. During garbage collection, generally, a segment with a higher garbage ratio is selected first, and the remaining valid data in the segment is migrated to the redundant space; after the valid data is migrated, the segment is reclaimed and becomes redundant space available for re-allocation.
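The garbage ratio and the conventional victim selection described above can be sketched as follows. The `Segment` dataclass, the function names, and the 10% low watermark are assumptions for illustration; the text specifies only "a specific threshold".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    segment_id: int
    total_blocks: int
    garbage_blocks: int

    @property
    def garbage_ratio(self) -> float:
        """Garbage blocks divided by all blocks in the segment."""
        return self.garbage_blocks / self.total_blocks


def needs_collection(free_segments: int, total_segments: int,
                     low_watermark: float = 0.10) -> bool:
    """Trigger collection when free (redundant) space drops below the watermark.

    The 10% default is an assumed value, not one given in the text.
    """
    return free_segments / total_segments < low_watermark


def pick_victim(segments: List[Segment]) -> Optional[Segment]:
    """Conventional policy: collect the segment with the highest garbage ratio."""
    candidates = [s for s in segments if s.garbage_blocks > 0]
    return max(candidates, key=lambda s: s.garbage_ratio, default=None)


segments = [
    Segment(0, total_blocks=64, garbage_blocks=16),
    Segment(1, total_blocks=64, garbage_blocks=48),
    Segment(2, total_blocks=64, garbage_blocks=32),
]
if needs_collection(free_segments=1, total_segments=20):
    victim = pick_victim(segments)
    print(victim.segment_id, victim.garbage_ratio)
```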
In an SSD, a block consists of multiple pages, and a page is the minimum write unit. In the SSD, a block is the segment on which garbage collection is performed. Before data is written into the SSD, erasing must be performed first, and the unit of erasing is a block; therefore, a redirect-on-write is also used inside the SSD, and garbage collection also exists. The operation process is the same as the redirect-on-write described above, except that an erasing operation is added to the collection of a block.
Redirect-on-write-based garbage collection of a storage device generally uses the garbage ratio as the collection condition. When the redundant space is insufficient, collection is performed, according to the value of the garbage ratio, on a segment with a higher garbage ratio. Valid data must be migrated in the garbage collection process: the valid data is read out and then written to a new address. The read IOs and write IOs generated in this process are part of the redirect-on-write function; migration therefore generates new IOs, and these IOs occupy IO resources and bandwidth of the storage device. In a common storage device, IO resources and bandwidth are limited. A lower garbage ratio in a segment means a greater volume of valid data that needs to be migrated from the segment, and therefore more IO resources and bandwidth of the storage device are occupied, which greatly affects storage device performance. In the prior art, basically, each segment is reclaimed immediately after its garbage ratio reaches a preset value, which causes low efficiency of garbage data collection.
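The relationship between garbage ratio and migration cost can be made concrete with a small calculation. This sketch assumes one read IO and one write IO per migrated valid block; the function name `migration_cost` and the 100-block segment size are hypothetical.

```python
from typing import Dict


def migration_cost(total_blocks: int, garbage_blocks: int) -> Dict[str, float]:
    """Estimate the IO cost of collecting one segment.

    Assumes each valid block costs one read IO plus one write IO to migrate.
    """
    if not 0 < garbage_blocks <= total_blocks:
        raise ValueError("garbage_blocks must be in 1..total_blocks")
    valid_blocks = total_blocks - garbage_blocks
    migration_ios = 2 * valid_blocks
    return {
        "garbage_ratio": garbage_blocks / total_blocks,
        "migration_ios": migration_ios,
        # Net space gained equals the garbage blocks, so this is the IO cost
        # paid for each block of redundant space actually recovered.
        "ios_per_freed_block": migration_ios / garbage_blocks,
    }


for garbage in (20, 50, 80):
    print(migration_cost(total_blocks=100, garbage_blocks=garbage))
```

Under these assumptions, collecting a segment at a 20% garbage ratio costs 8 IOs per recovered block, while collecting at 80% costs 0.5, which illustrates why collecting segments with low garbage ratios consumes disproportionate IO resources and bandwidth.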