The invention relates to an apparatus and a method for recording and reproducing data including video data and audio data, and to an AV (audio/video) server.
In recent years, as the spread of CATV (cable television) and similar services has multiplied the number of channels over which information is provided, demand has grown for reproducing a plurality of video/audio data simultaneously from a single data recording and reproducing apparatus, which VTRs (video tape recorders) cannot do. To satisfy this demand, a data recording and reproducing apparatus called a video server, which records and reproduces video/audio data using a randomly accessible recording medium such as a hard disk, is coming into wide use.
In general, a video server used in, for example, a broadcasting station is required to have a high data transfer rate in order to deliver high-quality video and audio, and a large capacity in order to record data for a long period of time. Attempts have therefore been made to obtain a higher transfer rate and a larger capacity by using a data recording and reproducing apparatus comprising a plurality of hard disk (hereinafter, HD) drives that store video and audio data and operate in parallel. Attempts have also been made to record parity data so as to ensure reliability even if any of the HD drives breaks down. As a result, even where different numbers of channels are requested depending on the contents or broadcasting systems of the programs provided by a broadcasting station, it is possible to implement a multi-channel video server applicable to a variety of usage patterns, for example, establishing an NVOD (near video on demand) system by separately recording a plurality of material data and transmitting them simultaneously over multiple channels, or by reproducing identical material data over multiple channels with a time lag.
In a data recording and reproducing apparatus used in such a multi-channel video server, the RAID (Redundant Arrays of Inexpensive Disks) technique proposed in the article presented by Patterson et al. in 1988 is used. In the article, the RAID is classified into five levels, RAID-1 to RAID-5. The typical ones among them are the RAID-1, the RAID-3 and the RAID-5. The RAID-1 is a method of writing the same contents on two HDDs.
The RAID-3 is a method of recording input data on a plurality of HDDs by dividing the data at a specific length, while generating parity data and writing them on another HDD.
FIG. 11 is a block diagram showing an example configuration of a data recording and reproducing apparatus using the RAID-3. This data recording and reproducing apparatus 101 comprises: a plurality of hard disk drives (hereinafter referred to as HDDs) 1021 to 102N (N is an integer of 2 or more); an HDD 109 for recording parity data P as redundancy-code data; a data distributor 106 for generating a plurality of divided data by dividing input data DI at a specific length and for distributing the divided data among the HDDs 1021 to 102N; a parity generator 107 for generating parity data P from the divided data outputted from the data distributor 106; input memories 1041 to 104N for temporarily holding the divided data outputted from the data distributor 106; an input memory 108 for temporarily holding the parity data P outputted from the parity generator 107; controllers 1031 to 103N and 110, which are respectively connected to the HDDs 1021 to 102N and 109, for controlling the recording of the data held in the input memories 1041 to 104N and 108 on the HDDs 1021 to 102N and 109, and for controlling the reproduction of data from the HDDs 1021 to 102N and 109; output memories 1051 to 105N and 111 for temporarily holding data read out from the HDDs 1021 to 102N and 109; an error corrector 112 for restoring the divided data by detecting and correcting errors based on the data held in the output memories 1051 to 105N and 111 and on error information, which is described later; a data multiplier 113 for outputting the output data DO obtained by multiplexing the output data from the error corrector 112; and a CPU 114 for controlling the whole apparatus.
Next, the data-writing operation of the data recording and reproducing apparatus 101 will be described. The input data DI is inputted to the data distributor 106, and a plurality of divided data are generated. The divided data are distributed to and held in the input memories 1041 to 104N. The divided data are also inputted to the parity generator 107. At this time, data may be distributed in order in the following manner: provided that, for example, the data are arranged in a data line D1, D2, D3, D4, D5, . . . , with a unit of bit or byte, the data D1 is distributed to the first HDD 1021, the data D2 is distributed to the second HDD 1022, and so forth, and once the data DN has been distributed to the last HDD 102N, distribution continues in order beginning at the first HDD 1021 again.
The parity generator 107 generates the parity data P based on the divided data outputted from the data distributor 106 and outputs it. The input memory 108 temporarily holds the parity data P. Then, under the control of the CPU 114, the controllers 1031 to 103N and 110 of the HDDs 1021 to 102N and 109 respectively read out the divided data and the parity data P from the input memories 1041 to 104N and 108, and write the data on the HDDs 1021 to 102N and 109 respectively.
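The writing operation described above can be illustrated with a minimal sketch. It assumes byte-wise round-robin distribution and bytewise exclusive-OR parity; the function names `distribute` and `parity` are hypothetical and are not taken from the apparatus 101.

```python
from functools import reduce


def distribute(data: bytes, n_drives: int) -> list[bytes]:
    """Round-robin distribution: byte i of the input goes to drive i % n_drives."""
    # Assumption: pad with zero bytes so every drive receives the same amount.
    padded = data + bytes(-len(data) % n_drives)
    return [padded[i::n_drives] for i in range(n_drives)]


def parity(stripes: list[bytes]) -> bytes:
    """Bytewise exclusive OR across the divided data of all drives."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*stripes))


stripes = distribute(b"ABCDEFGH", 4)  # one list entry per data HDD
p = parity(stripes)                   # what the parity HDD would hold
```

With four data drives, the bytes A and E go to the first drive, B and F to the second, and so on; `p` holds one parity byte per row of the stripe.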
Next, the data-reading-out operation of the data recording and reproducing apparatus 101 will be described. Each of the controllers 1031 to 103N and 110 reads out the divided data and the parity data P from the HDDs 1021 to 102N and 109 respectively, and writes them on the output memories 1051 to 105N and 111 respectively. At this time, if an error of data-reading-out operation (hereinafter referred to as a reading-out error) occurs in any of the HDDs 1021 to 102N and 109, error information indicating that an error has occurred is sent to the controllers 1031 to 103N and 110 as status data from the control section in the HDDs 1021 to 102N and 109. Then, the error information is sent to the CPU 114 as error information Er1 to ErN and ErP from the controllers 1031 to 103N and 110.
The data held in the output memories 1051 to 105N and 111 are synchronized and outputted to the error corrector 112. At this time, if a reading-out error has occurred, error information showing that an error has occurred is sent to the error corrector 112 from the CPU 114. The error information includes information for identifying the HDD in which the reading-out error has occurred. The error corrector 112 restores the divided data based on the error information and the parity data P and outputs the divided data to the data multiplier 113. Data can be restored by the error corrector 112 only when reading-out errors have occurred in a single HDD. When reading-out errors have occurred in a plurality of HDDs, the error corrector 112 can detect the errors but cannot restore the data. The data multiplier 113 rearranges the divided data outputted from the error corrector 112 in the original data line and outputs it outside as the output data DO.
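The restoration performed when the failed HDD is known can be sketched as follows. This is a simplified illustration assuming bytewise exclusive-OR parity; the names `correct` and `multiplex` are hypothetical stand-ins for the roles of the error corrector 112 and the data multiplier 113.

```python
def xor_blocks(blocks):
    """Bytewise exclusive OR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def correct(stripes, parity, failed=None):
    """Restore the divided data of the one HDD named by the error information."""
    if failed is None:
        return list(stripes)  # no reading-out error: pass the data through
    survivors = [s for i, s in enumerate(stripes) if i != failed]
    restored = xor_blocks(survivors + [parity])
    return [restored if i == failed else s for i, s in enumerate(stripes)]


def multiplex(stripes):
    """Interleave the divided data back into the original byte order."""
    return bytes(b for column in zip(*stripes) for b in column)


good = [b"AE", b"BF", b"CG", b"DH"]
p = xor_blocks(good)
damaged = [b"AE", b"\x00\x00", b"CG", b"DH"]  # the second HDD reported an error
output = multiplex(correct(damaged, p, failed=1))
```

Because a single parity block yields only one equation per byte position, the sketch can solve for one unknown block, which mirrors the single-HDD limit stated above.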
In contrast, in the RAID-5, the unit (block) into which data is divided is made larger, and each divided data is recorded as a data block on one HDD, while the exclusive OR (parity data) of the data blocks that correspond to one another across the HDDs is calculated and the result is recorded as a parity block on another HDD. The parity blocks are distributed across all the HDDs.
FIG. 12 is a block diagram showing a configuration example of a data recording and reproducing apparatus using the RAID-5. This data recording and reproducing apparatus 201 comprises: a plurality of HDDs 2021 to 202N (N is an integer of 2 or more) for recording input data; a parity generator-cum-error corrector 212 for generating parity data based on the input data DI and the data recorded on the HDDs 2021 to 202N, and for restoring the data by correcting errors based on the data read out from each of the HDDs 2021 to 202N and the error information; input memories 2041 to 204N for temporarily holding the output data of the parity generator-cum-error corrector 212; controllers 2031 to 203N, which are connected to the HDDs 2021 to 202N respectively, for controlling the operation of recording the data held in the input memories 2041 to 204N on the HDDs 2021 to 202N and the operation of reproducing the data from the HDDs 2021 to 202N; output memories 2051 to 205N for temporarily holding the data read out from the HDDs 2021 to 202N respectively; and a CPU 214 for controlling the whole apparatus. If an error of data-reading-out operation occurs in any of the HDDs 2021 to 202N, error information indicating that an error has occurred is sent to the controllers 2031 to 203N as status data from the control section in the HDDs 2021 to 202N. The error information is then sent to the CPU 214 as error information Er1 to ErN from the controllers 2031 to 203N.
Next, the data-writing operation of the data recording and reproducing apparatus 201 will be described. For example, when writing the data D onto an address A in the HDD 2021, the CPU 214 controls the controllers 2031 and 2032 to read out the recorded data D1 from the HDD 2021 and to read out the parity data P from the HDD 2022, provided that the parity data P corresponding to the data D is recorded on the HDD 2022. At this time, the parity generator-cum-error corrector 212 calculates the exclusive OR of the data D1 and the parity data P, and thereby obtains parity data P1 from which the contribution of the data D1 has been removed. Then, the parity generator-cum-error corrector 212 calculates the exclusive OR of the data D and the parity data P1 to obtain new parity data P2. The CPU 214 controls the controllers 2031 and 2032 to write the data D onto the HDD 2021 and to write the parity data P2 onto the HDD 2022.
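The parity update in this writing operation can be written out as a short sketch. The function name `small_write` and the sample block values are illustrative assumptions; only the two exclusive-OR steps come from the description above.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Return the data block and parity block to be written back."""
    p1 = xor(old_data, old_parity)  # parity with the old data D1 removed
    p2 = xor(new_data, p1)          # parity with the new data D folded in
    return new_data, p2


d1 = b"\x0f\xf0"                 # block currently stored at address A
other = b"\x33\x55"              # block at the same address on another HDD
p = xor(d1, other)               # parity currently stored
d, p2 = small_write(d1, p, b"\xaa\xaa")
```

The resulting `p2` equals the exclusive OR of the new data and the untouched block, so the other data HDDs never need to be read. The cost is two reads and two writes for each written block.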
Next, the data-reading-out operation of the data recording and reproducing apparatus 201 will be described. For example, when reading out the data D from the address A in the HDD 2021, the CPU 214 controls the controller 2031 to read out the data D from the HDD 2021. At this time, if a reading-out error does not occur, the CPU 214 controls the parity generator-cum-error corrector 212 to output the data D read out from the HDD 2021 as the output data DO through the output memory 2051 and the parity generator-cum-error corrector 212. In this case, no particular processing is performed in the parity generator-cum-error corrector 212.
On the other hand, if data in the data recording and reproducing apparatus 201 is not read out normally, that is, if the data D cannot be read out from the address A in the HDD 2021 because of a defective sector or the like, the CPU 214 receives the error information Er1 from the controller 2031. In such a case, the CPU 214 reads out data from the corresponding addresses in the other HDDs 2022 to 202N, sends them to the parity generator-cum-error corrector 212, and controls the parity generator-cum-error corrector 212 to reproduce the data D based on the above-mentioned data and to output it as the output data DO.
As described, when input data is written in the data recording and reproducing apparatus 201 using the RAID-5, the number of accesses increases, since both the data block and the parity block must be read out and written. Further, if an error occurs when data is read out, the data is restored by reading out data from the other HDDs, so that the number of accesses also increases. Accordingly, the data recording and reproducing apparatus 201 using the RAID-5 is fit for use in random access processing of logical blocks of a specific size, but not in processing which requires a real-time operation.
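The growth in the number of accesses on a reading-out error can be made concrete with a small model. The class `Raid5Stripe`, its access counter and the sample blocks are hypothetical; the model covers one stripe only and assumes exclusive-OR parity.

```python
class ReadError(Exception):
    """Raised when a drive reports a reading-out error."""


def xor_all(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


class Raid5Stripe:
    """One stripe: blocks[i] is the block (data or parity) held by drive i."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()      # drives that currently fail to read
        self.accesses = 0     # number of drive accesses performed

    def _read_drive(self, i):
        self.accesses += 1
        if i in self.bad:
            raise ReadError(i)
        return self.blocks[i]

    def read(self, i):
        try:
            return self._read_drive(i)
        except ReadError:
            # Degraded read: fetch every other block and XOR them together.
            others = [self._read_drive(j)
                      for j in range(len(self.blocks)) if j != i]
            return xor_all(others)


data = [b"AE", b"BF", b"CG"]
stripe = Raid5Stripe(data + [xor_all(data)])
stripe.read(1)
normal_cost = stripe.accesses          # 1 access
stripe.bad.add(1)
stripe.accesses = 0
restored = stripe.read(1)
degraded_cost = stripe.accesses        # failed attempt plus 3 other drives
```

In this four-drive model a normal read costs one access, while a read that hits an error costs four, which is the kind of variable latency that makes real-time operation difficult.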
In contrast, in the data recording and reproducing apparatus 101 using the RAID-3, input data can be written by one access, and error correction after reading out data can be performed immediately. Accordingly, the data recording and reproducing apparatus 101 using the RAID-3 is fit for use in processing for recording and reproducing data at high speed. Therefore, a data recording and reproducing apparatus using the RAID-3 is suitable for a device such as a multi-channel video server which requires a real-time operation.
In the above-mentioned data recording and reproducing apparatus using the RAID-3, however, data can be restored only when reading-out errors have occurred in a single HDD. This causes a problem that, once any one of the HDDs has broken down, the apparatus becomes incapable of detecting and correcting further errors of data.
Moreover, in the data recording and reproducing apparatus using the RAID-3, if one of the HDDs reads out invalid data without a reading-out error being generated, it is impossible to obtain information indicating which HDD has read out the invalid data, although it is possible to detect the error of data. This results in a problem that data cannot be restored.
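Why a single parity block can detect but not locate invalid data may be seen from the following sketch. It assumes exclusive-OR parity; `parity_ok` and `repair_assuming` are hypothetical helper names, and the sample data are illustrative.

```python
def xor_all(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def parity_ok(stripes, parity):
    """True when the data blocks and the parity block are mutually consistent."""
    return xor_all(list(stripes) + [parity]) == bytes(len(parity))


def repair_assuming(stripes, parity, suspect):
    """Rebuild the block of `suspect` as if that drive were the faulty one."""
    others = [s for i, s in enumerate(stripes) if i != suspect]
    fixed = list(stripes)
    fixed[suspect] = xor_all(others + [parity])
    return fixed


original = [b"AE", b"BF", b"CG", b"DH"]
p = xor_all(original)
read_back = [b"AE", b"BX", b"CG", b"DH"]  # drive 1 returned a wrong byte silently

detected = not parity_ok(read_back, p)
candidates = [repair_assuming(read_back, p, i) for i in range(4)]
```

Every entry of `candidates` passes the parity check, yet the four entries differ from one another and only one of them matches the original data. Without error information naming the faulty HDD, nothing distinguishes the correct candidate.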
The data recording and reproducing apparatus using the RAID-3 has another problem that data cannot be restored if two or more HDDs break down, although it is possible to detect the error of data.
Furthermore, when an error (hereinafter referred to as a writing error) occurs during the writing operation in any of the HDDs in the data recording and reproducing apparatus, it is necessary to restore the data in which the writing error occurred. This data-restoring processing performed on part of the recording region of the recording medium (hard disk) is called a portion-rebuild processing (Portion Rebuild) in this application. On the other hand, if any of the HDDs in the data recording and reproducing apparatus is replaced, it is necessary to rebuild the original data on the new HDD. This data-restoring processing performed on the whole recording region of the recording medium (hard disk) is called a whole-rebuild processing (Whole Rebuild) in this application. The data-restoring operation in the data recording and reproducing apparatus 101 using the RAID-3 shown in FIG. 11 will be described below.
First, the operation of the Portion Rebuild will be described. The CPU 114 stores the HDD and the address (sector) in which the writing errors have occurred, and the Portion Rebuild is performed on the HDD and the address. In the Portion Rebuild, first, the CPU 114 controls each of the controllers 1031 to 103N and 110 to perform the reading-out operation, appointing the address on which the Portion Rebuild is to be performed. In response, the controllers 1031 to 103N and 110 read out the data in the appointed address from the HDDs 1021 to 102N and 109 respectively. The read-out data is inputted to the error corrector 112 through the output memories 1051 to 105N and 111. At this time, the CPU 114 gives a command for the error corrector 112 not to use the data read out from the HDD on which the Portion Rebuild is to be performed. The error corrector 112 restores the divided data using the data outputted from the output memories 1051 to 105N and 111 except the data read out from the HDD on which the Portion Rebuild is to be performed, and outputs the divided data which has been restored to the data multiplier 113. The data multiplier 113 rearranges the divided data outputted from the error corrector 112 in the original data line, and outputs it as the output data DO. Next, under the control of the CPU 114, the output data DO is inputted from the data multiplier 113 to the data distributor 106. The restored divided data is written onto the HDD on which the Portion Rebuild is to be performed by performing the same writing operation as the writing operation of input data DI, and the Portion Rebuild is ended.
Next, the operation of the Whole Rebuild will be described. If a predetermined HDD is replaced and the superior device issues a command for the data recording and reproducing apparatus 101 to perform the Whole Rebuild, the CPU 114 makes the controllers 1031 to 103N and 110 perform the reading-out operation upon receiving the command. In response, the controllers 1031 to 103N and 110 read out data from the HDDs 1021 to 102N and 109 respectively. The read-out data is inputted to the error corrector 112 through the output memories 1051 to 105N and 111. At this time, the CPU 114 gives a command for the error corrector 112 not to use the data read out from the HDD on which the Whole Rebuild is to be performed. The error corrector 112 restores the divided data using the data outputted from the output memories 1051 to 105N and 111 except the data read out from the HDD on which the Whole Rebuild is to be performed, and outputs the divided data which has been restored to the data multiplier 113. The data multiplier 113 rearranges the restored divided data outputted from the error corrector 112 in the original data line, and outputs it as the output data DO. Next, under the control of the CPU 114, the output data DO is inputted from the data multiplier 113 to the data distributor 106. The restored divided data is written onto the HDD on which the Whole Rebuild is to be performed by performing the same writing operation as the writing operation of input data DI. The processing described above is performed on the whole recording region of the hard disk.
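The two rebuild operations differ only in the range of addresses they cover, as the following sketch suggests. It is a simplification: each HDD is modelled as a list of sectors with the parity HDD last, exclusive-OR parity is assumed, and the restored block is written directly to the target rather than being routed back through the data multiplier 113 and the data distributor 106. The function names are hypothetical.

```python
def xor_all(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def portion_rebuild(drives, target, address):
    """Restore one address on the target HDD from all the other HDDs."""
    others = [d[address] for i, d in enumerate(drives) if i != target]
    drives[target][address] = xor_all(others)


def whole_rebuild(drives, target):
    """Restore the whole recording region of a replaced HDD."""
    for address in range(len(drives[target])):
        portion_rebuild(drives, target, address)


data_drives = [[b"AE", b"IM"], [b"BF", b"JN"], [b"CG", b"KO"]]
parity_drive = [xor_all(row) for row in zip(*data_drives)]
array = data_drives + [parity_drive]

array[1] = [bytes(2), bytes(2)]   # HDD 1 replaced by a blank drive
whole_rebuild(array, 1)
```

Both functions read every HDD other than the target and trust what they read, so a fault on any of those HDDs during the rebuild leaves the restored block wrong.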
In both the Portion Rebuild and the Whole Rebuild described above, however, data can be restored during the rebuild processing only if the correct data has been read out from all the HDDs other than the HDD on which the rebuild processing is to be performed. If a writing error occurs in even one of the HDDs other than the HDD on which the rebuild processing is to be performed, the error can be detected in the error corrector 112 but the data cannot be restored. That is, the data recording and reproducing apparatus 101 using the RAID-3 has a problem that data cannot be restored if another fault occurs, since the apparatus is incapable of detecting and correcting errors of data during the rebuild processing.
In the meantime, a plurality of tracks are provided on a hard disk in a concentric circular pattern. A plurality of sectors, which are the recording units of data, are provided by dividing the tracks in a radial pattern. These sectors may include sectors in which errors always occur at the time of writing or reading out data. Such sectors are called defective sectors. Defective sectors are considered to be in a condition in which reading out or writing data cannot be correctly performed because of physical damage or the like. In preparation for defective sectors, spare sectors may be provided on the hard disk so that data can be recorded on the spare sectors instead of the defective sectors if necessary. Such a spare sector is called a substitute sector. In an HDD having substitute sectors, the control section in the HDD includes a correspondence table showing the correspondence between logical sector numbers (LBA) and physical sector numbers, so that a substitute sector used instead of a defective sector can be referred to by the superior devices under the same sector number as the defective sector. Accordingly, if there is a defective sector, a re-allotting processing (Reassign), in which the correspondence between the LBA and the physical sector number in the recording region on the hard disk is changed, is to be performed.
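The correspondence table and the effect of the Reassign on it can be pictured with a minimal sketch. The class `SectorMap`, the placement of substitute sectors after the regular sectors, and the sector counts are all assumptions made for illustration; an actual HDD keeps this table inside its control section.

```python
class SectorMap:
    """Correspondence between logical sector numbers (LBA) and physical sectors."""

    def __init__(self, n_sectors: int, n_spares: int):
        # Initially every LBA maps to the physical sector of the same number.
        self.table = {lba: lba for lba in range(n_sectors)}
        # Assumption: substitute sectors are numbered after the regular ones.
        self.spares = list(range(n_sectors, n_sectors + n_spares))

    def physical(self, lba: int) -> int:
        return self.table[lba]

    def reassign(self, lba: int) -> int:
        """Point a defective LBA at the next free substitute sector."""
        if not self.spares:
            raise RuntimeError("no substitute sectors left")
        self.table[lba] = self.spares.pop(0)
        return self.table[lba]


sector_map = SectorMap(n_sectors=100, n_spares=4)
sector_map.reassign(42)   # LBA 42 now refers to physical sector 100
```

The superior device keeps addressing LBA 42 as before. The substitute sector starts out without the original contents, so the data at that address must be restored after the table is changed.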
The Reassign requires relatively long time (several seconds). Therefore, in general, it is not performed during the operation of a data recording and reproducing apparatus of the related art. In an apparatus which requires a real-time operation such as a multi-channel video server, however, it is desirable that the Reassign should be performed even during the operation of the apparatus, since stopping the operation of the apparatus because of the Reassign can be very inconvenient.
As described below, the Reassign may also be performed during the operation of the data recording and reproducing apparatus. The operation of the Reassign when performed during the operation of the data recording and reproducing apparatus 101 using the RAID-3 shown in FIG. 11 will now be described.
If the superior device gives a command for the data recording and reproducing apparatus 101 to perform the Reassign, the CPU 114, upon receiving the command, suspends the writing operation and the reading-out operation on the HDD on which the Reassign is to be performed. Next, the CPU 114 causes that HDD to perform the Reassign, specifying the sector on which the Reassign is to be performed. The Reassign, as described, changes the correspondence between the LBA and the physical sector number. If the CPU 114 receives a command to perform writing during the Reassign, it has the writing operation performed on the HDDs other than the HDD on which the Reassign is being performed, while storing the identity of that HDD and the LBA onto which writing was to be performed according to the command. If the CPU 114 receives a command to perform reading-out during the Reassign, it has the reading-out operation performed on the HDDs other than the HDD on which the Reassign is being performed, and commands the error corrector 112 to perform the error correction while disregarding the data from that HDD. When the Reassign is completed, the CPU 114 cancels the suspension of the writing operation and the reading-out operation on the HDD on which the Reassign has been performed.
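The bookkeeping performed by the CPU 114 during the Reassign can be modelled roughly as below. The class `Raid3Array`, its method names and the sector size are hypothetical; each HDD is a list of sectors with the parity HDD last, and exclusive-OR parity is assumed.

```python
def xor_all(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


class Raid3Array:
    def __init__(self, n_data, n_sectors, sector_size=4):
        zero = bytes(sector_size)
        # drives[0..n_data-1] hold data; drives[n_data] holds parity.
        self.drives = [[zero] * n_sectors for _ in range(n_data + 1)]
        self.suspended = None   # index of the HDD under the Reassign
        self.pending = []       # (HDD, LBA) pairs to restore afterwards

    def begin_reassign(self, drive, lba):
        self.suspended = drive
        self.pending.append((drive, lba))

    def end_reassign(self):
        self.suspended = None

    def write(self, lba, blocks):
        row = list(blocks) + [xor_all(blocks)]
        for i, block in enumerate(row):
            if i == self.suspended:
                self.pending.append((i, lba))  # remember the skipped write
            else:
                self.drives[i][lba] = block

    def read(self, lba):
        row = [d[lba] for d in self.drives]
        if self.suspended is not None:
            # Disregard the suspended HDD and rebuild its block from the rest.
            others = [b for i, b in enumerate(row) if i != self.suspended]
            row[self.suspended] = xor_all(others)
        return row[:-1]


array = Raid3Array(n_data=3, n_sectors=8)
array.begin_reassign(drive=1, lba=5)
array.write(2, [b"aaaa", b"bbbb", b"cccc"])
during = array.read(2)      # correct, but parity is spent on the suspended HDD
array.end_reassign()
```

After `end_reassign`, the entries in `array.pending` still hold stale sectors on the reassigned HDD, which is why the restoring processing has to follow. While the HDD is suspended, the parity is consumed in standing in for it, so no further error can be corrected.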
After the Reassign is performed as described, it is necessary to restore the data recorded on the HDD and the address on which the Reassign has been performed, and the data recorded on the HDD and the address which have been stored during the suspension of the writing operation in the process of the Reassign. This restoring processing of data is described above.
Accordingly, the data recording and reproducing apparatus 101 using the RAID-3 becomes incapable of detecting and correcting errors of data during the Reassign. It is, therefore, difficult to perform the Reassign during the operation of the apparatus without a considerable decrease in the reliability of the apparatus.
Incidentally, a method in which the RAID-5 is expanded is proposed as introduced in the document “The latest secondary storage :Disk array :by KIRENGAWA” (Information Processing, Vol.34, No.5, pp.642–651, Published in May, 1993). This is a method in which two parity blocks based on Reed-Solomon coding are provided so as to be able to cope with troubles in at most two HDDs within a parity group.
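How two parity blocks allow two lost blocks to be restored can be illustrated as follows. The construction in the cited document is not reproduced here; this sketch assumes the commonly used arrangement of an exclusive-OR block P and a weighted block Q over GF(2^8) with the polynomial 0x11d and generator 2, and every name in it is hypothetical.

```python
# Log/antilog tables for GF(2^8), polynomial x^8+x^4+x^3+x^2+1, generator 2.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gdiv(a, b):
    # b must be nonzero.
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]


def encode(data):
    """Return (P, Q): P is the XOR of the blocks, Q weights block i by 2**i."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gmul(EXP[i], byte)
    return bytes(p), bytes(q)


def recover_two(data, p, q, x, y):
    """Restore data blocks x and y (x != y); their stored contents are ignored."""
    blank = bytes(len(p))
    survivors = [blank if i in (x, y) else b for i, b in enumerate(data)]
    pxy, qxy = encode(survivors)
    denom = EXP[x] ^ EXP[y]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        a = p[k] ^ pxy[k]              # equals Dx ^ Dy
        b = q[k] ^ qxy[k]              # equals g^x*Dx ^ g^y*Dy
        dx[k] = gdiv(b ^ gmul(EXP[y], a), denom)
        dy[k] = a ^ dx[k]
    return bytes(dx), bytes(dy)


blocks = [b"AE", b"BF", b"CG", b"DH"]
p, q = encode(blocks)
lost = [blocks[0], None, blocks[2], None]
restored = recover_two(lost, p, q, 1, 3)
```

Two independent equations per byte position give two recoverable unknowns, at the price of finite-field multiplications and an additional parity block to read and write on every update.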
In the method in which the RAID-5 is expanded, the number of the access increases compared to that of the RAID-5 when errors occur at the time of writing input data or reading out data. The method is, therefore, not suitable for a processing which requires the real-time operation any more than the RAID-5 is. Accordingly, it is difficult to use the above-mentioned method in which the RAID-5 is expanded in the apparatus which requires a real-time operation such as a multi-channel video server.
In addition, in an ordinary data recording and reproducing apparatus using a plurality of HDDs, the data-restoring processing such as the Whole Rebuild or the Portion Rebuild described above is performed whenever necessary, and it is impossible to access the data recording and reproducing apparatus from outside during the restoring processing. This is a disadvantage for an apparatus which requires a real-time operation such as a multi-channel video server, since stopping the operation of the apparatus because of the data-restoring processing can be very inconvenient.