This invention provides an improved data reader suitable for a data storage device, which may be a tape drive arranged to receive data from a computer, or the like. The invention also provides related methods and devices incorporating such a reader.
An example of a data storage device is the tape drive, which receives user data from computers, particularly, but not exclusively, to back up the user data held on a computer onto a data-holding medium. In such back-up applications it is of prime importance that the user data is retrievable, since this copy is generally the back-up copy, which will be required only if the original copy has been lost or damaged. There is therefore an ongoing need to ensure that back-up data storage devices are as robust and secure as possible.
Once user data has been stored on the data-holding medium it can be held there for long periods. To recover the user data from the data-holding medium, the data storage device must read the data-holding medium and regenerate the user data originally stored there. In some devices the user data backed up on the data-holding medium accounts for only roughly 80% of the overall information held on it. The remaining roughly 20% is header and error-correction information intended to make the user data as secure as possible.
Therefore, in order to read the user data, the storage device must accurately detect the user data within all of the information held on the data-holding medium. Given the amount of information other than user data that is held on the data-holding medium, this can be problematic.
It is known to provide markers, sometimes referred to as data set separator fields (DSS fields), that identify when significant events are about to occur within the information held on the data-holding medium. For example, it is known to provide a marker before a set of user data occurs on the data-holding medium. It is also known to provide non-user data, including header information that specifies the contents of portions of the user data.
It is an object of the present invention to provide a data reader suitable for a data storage device that addresses the problems discussed above.
According to a first aspect of the invention there is provided a data reader arranged to read a data-holding medium containing data comprising both user and non-user data, said user data being held in at least one set, and said sets being arranged into datasets, said non-user data holding information relating to said user data and being interspersed therewith, said data reader comprising at least one read head arranged to read said data-holding medium and generate a data signal comprising user data and non-user data, said non-user data being arranged to identify said user data within said sets, and processing circuitry arranged to receive and process said data signal and obtain said user data from said data signal, using said non-user data to identify said user data within said data signal.
An advantage of such an apparatus is that it does not rely on a marker stored on the data-holding medium to identify the start of user data. Prior art data storage devices have relied on detecting this marker to identify that user data is about to occur. A problem with detecting the marker in this manner is that if the marker is not detected, or data is interpreted as the marker, then data can be lost.
Relying on the information held in the non-user data is advantageous because it provides a more robust approach, and it is therefore less likely that data will be lost.
The sets of data may themselves be arranged into larger groupings, or datasets, and the sets may be arranged on the data-holding medium such that the datasets overlap one another. The processing circuitry may be arranged to occupy a state reflecting whether the sets of data being read by the reader must be from the same dataset, or whether they may be from a plurality of datasets (i.e. whether the data is in an overlap zone, in which datasets can overlap one another). Causing the processing circuitry to occupy such a state provides a convenient way of noting the nature of the data being read from the data-holding medium. The processing circuitry may occupy a state in any one or more of the following ways: having a flag set, having a state machine whose occupied state varies, setting a register, altering a memory location, etc.
In the preferred embodiment only two datasets can overlap one another, and therefore two state machines are provided, one for each of the datasets.
Further, when data is written to the data-holding medium it may overwrite data already present on the data-holding medium. Data that overwrites existing data in this manner is generally written at the end of a dataset that exists on the data-holding medium. However, there may be a latency between a dataset finishing on the data-holding medium and the start of the overwriting data, which can leave behind a portion of a dataset that should have been overwritten. Therefore, at the end of each dataset on the data-holding medium there may be an overwrite zone in which a first dataset can finish and data from a second dataset can start before the first dataset has finished (due to the second dataset having overwritten the first). The processing circuitry may be arranged to occupy a state reflecting whether or not data being read from the data-holding medium is in an overwrite zone.
Further, the processing circuitry may be arranged to occupy a state arranged to reflect when data being read from a data-holding medium is beyond an overwrite zone.
A zone detector may be provided to interpret the non-user data and determine whether the user data must be from the same dataset, or could be from a plurality of datasets. Preferably, the zone detector is arranged to control the state of the processing circuitry.
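By way of illustration only, the zone state and zone detector might be modelled in software as below. This is a minimal Python sketch under stated assumptions: the names `Zone`, `DatasetTracker` and `ZoneDetector`, and the premise that each set's non-user data yields a dataset number and a set index, are hypothetical and are not taken from any particular tape format. One tracker is kept per live dataset, mirroring the two state machines of the preferred embodiment, and the zone is reported as overlapping whenever two trackers are active.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    EXCLUSIVE = "exclusive"  # sets read must all belong to one dataset
    OVERLAP = "overlap"      # sets read may belong to either of two datasets


@dataclass
class DatasetTracker:
    """Hypothetical per-dataset state: which set indices have been seen."""
    dataset_id: int
    seen: set = field(default_factory=set)


class ZoneDetector:
    """Sketch of a zone detector driven only by header (non-user) fields.

    At most two datasets may overlap, so at most two trackers are kept.
    """

    MAX_ACTIVE = 2

    def __init__(self):
        self.trackers = {}  # dataset_id -> DatasetTracker

    @property
    def zone(self):
        # Two live trackers means the reader is inside an overlap zone.
        return Zone.OVERLAP if len(self.trackers) > 1 else Zone.EXCLUSIVE

    def observe(self, dataset_id, set_index):
        """Record a set; return False if it would be a third live dataset."""
        tracker = self.trackers.get(dataset_id)
        if tracker is None:
            if len(self.trackers) >= self.MAX_ACTIVE:
                return False  # not tracked; the caller decides what to reject
            tracker = DatasetTracker(dataset_id)
            self.trackers[dataset_id] = tracker
        tracker.seen.add(set_index)
        return True

    def retire(self, dataset_id):
        """Drop a dataset once it is known to have ended."""
        self.trackers.pop(dataset_id, None)
```

How a caller decides that a dataset has ended, and so when to call `retire`, is deliberately left outside this sketch.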
The reader may comprise a plurality of read-heads, each of which is arranged to read a separate channel of data, preferably in parallel with one another. In the preferred embodiment the reader comprises 8 read heads, although the reader could comprise any number of read heads. For example the reader may comprise 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, or more read heads. An advantage of providing more than one read head in this manner is that the rate at which data can be read from the data-holding medium is increased.
Conveniently, the reader comprises a controller that is arranged to determine whether user data read from the data-holding medium should be accepted and stored in a memory of the reader. Preferably, the controller includes the at least one state machine.
According to a second aspect of the invention there is provided a data storage device incorporating a data reader according to the first aspect of the invention.
In the preferred embodiment the data storage device is a tape drive. Such a tape drive may be arranged to read data held in any of the following formats: LTO (Linear Tape Open), DAT (Digital Audio Tape), DLT (Digital Linear Tape), DDS (Digital Data Storage), or any other format. In the preferred embodiment, however, the tape is in LTO format.
Alternatively, the data storage device may be any one of the following: CDROM drive, DVD ROM/RAM drive, magneto-optical storage device, hard drive, floppy drive, or any other form of storage device suitable for storing digital data.
According to a third aspect of the invention there is provided a method of reading data from a data-holding medium containing user data held in a plurality of sets and interspersed with non-user data, said non-user data holding information relating to said user data, the method comprising reading said non-user data to identify said user data within said sets and obtain said user data from said data-holding medium, said method further comprising arranging said sets of user data into datasets, the identities of which are provided by the non-user data, and monitoring the non-user data to ascertain the identity of the dataset being read from the data-holding medium.
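A minimal Python sketch of this method follows, assuming each set's non-user data can be decoded into a hypothetical `SetHeader` carrying a dataset identifier and a set index. User data is grouped purely by these header fields, so no separator marker needs to be detected. Letting a later read of the same set replace an earlier one is an assumption made here for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class SetHeader(NamedTuple):
    """Hypothetical header fields carried in the non-user data of each set."""
    dataset_id: int
    set_index: int


def assemble_datasets(records):
    """Group user data into datasets using only the headers, not markers.

    `records` is an iterable of (SetHeader, payload) pairs in read order.
    Returns {dataset_id: [payload, ...]} with payloads ordered by set_index.
    """
    datasets = defaultdict(dict)  # dataset_id -> {set_index: payload}
    for header, payload in records:
        # Assumption: a later read of the same set replaces an earlier one.
        datasets[header.dataset_id][header.set_index] = payload
    return {
        ds: [sets[i] for i in sorted(sets)]
        for ds, sets in datasets.items()
    }
```

For example, sets read out of order as `(SetHeader(1, 1), b"B")`, `(SetHeader(1, 0), b"A")`, `(SetHeader(2, 0), b"C")` are reassembled into `{1: [b"A", b"B"], 2: [b"C"]}`.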
An advantage of such a method is that it does not rely on detecting markers as has previously been performed. In such marker detecting methods, user data can be lost if the marker is missed by the reader, the marker becomes damaged and is therefore not intelligible, or data is mis-interpreted as a marker.
Further, it is convenient to ascertain the identity of the dataset, in addition to the identity of the sets of data within a dataset, so that it can be confirmed that sets of data being read from the data-holding medium belong to the same dataset.
It is possible that the datasets can overlap one another on the data-holding medium, so that at least one set of user data from a first dataset can occur in a region corresponding to a second dataset. The method may comprise determining whether data being read from the data-holding medium may be in a zone corresponding to where data may be from a plurality of datasets (an overlap zone).
The method may comprise monitoring the identity of the datasets being read from the data-holding medium and determining whether more than two datasets have occurred within the overlap zone. Data from two datasets can legitimately occur within the overlap zone, which happens if at least one set of data is re-written after writing of a second dataset has started. However, if a third dataset occurs in this region it is likely that an error has occurred, and it is therefore advantageous to monitor the data-holding medium for such an occurrence.
If a third dataset is detected, the method may comprise rejecting sets of data read from the data-holding medium that belong to the third, or any later, dataset occurring within the overlap zone.
Alternatively, or additionally, the method may comprise rejecting sets of data from earlier datasets read from the data-holding medium within the overlap zone if more than two datasets occur within the overlap zone.
The method may be user configurable to allow either rejection of earlier datasets (and consequently acceptance of later datasets), or rejection of later datasets (and consequently acceptance of earlier datasets) if a third dataset is detected.
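The rejection rules above might be expressed as follows. This Python sketch is illustrative only: the function name, the `Policy` setting, and the reading of "rejecting earlier datasets" as keeping the two most recent datasets are assumptions. The input is the sequence of dataset identifiers, taken from the non-user data, of the sets read within a single overlap zone.

```python
from enum import Enum


class Policy(Enum):
    KEEP_EARLIER = "keep_earlier"  # reject the third and any later dataset
    KEEP_LATER = "keep_later"      # reject earlier datasets instead


def filter_overlap_zone(dataset_ids, policy=Policy.KEEP_EARLIER):
    """Return per-set accept flags for sets read inside one overlap zone.

    Up to two distinct datasets are accepted; if more appear, sets are
    rejected according to `policy` (a hypothetical user setting).
    """
    order = []  # distinct dataset ids in order of first appearance
    for ds in dataset_ids:
        if ds not in order:
            order.append(ds)
    if len(order) <= 2:
        allowed = set(order)          # normal case: nothing rejected
    elif policy is Policy.KEEP_EARLIER:
        allowed = set(order[:2])      # keep the first two datasets seen
    else:
        allowed = set(order[-2:])     # assumption: keep the two most recent
    return [ds in allowed for ds in dataset_ids]
```

Computing the flags over the whole zone lets either policy be applied retrospectively, which matters for `KEEP_LATER`, since the earlier sets to be rejected have already been read by the time the third dataset appears.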
The method may comprise using the non-user data to determine when the end of a dataset has occurred. Determining this may itself comprise timing from the end of the last set of user data within a dataset, to ensure that no re-writes of the last, or any other, set of user data from that dataset remain to be read from the data-holding medium. When data is written to the data-holding medium, any data that is written in error will be re-written. Due to latencies within the apparatus used to write data to the data-holding medium, such re-writes may well occur after the last set of data within a dataset.
Therefore, the end of the dataset does not occur until any re-writes have been read. It is advantageous to time from the last set of data within a dataset, since such re-writes should occur within a predetermined period; once this period has expired no further re-writes should occur. Prior methods have relied upon detecting markers on the data-holding medium, which can be problematic if a marker is corrupted in any manner, or missed.
Conveniently, the method comprises using the non-user data to determine whether any of the sets of data from a dataset have been re-written, and restarting the timing if any re-writes are detected once the last set of user data within a dataset has been read. This is advantageous because re-written sets may themselves be re-written, which extends the period in which re-writes can occur.
Preferably the method comprises asserting that data being read from the data-holding medium is in an exclusive zone, such that data should only occur from a single dataset, once the timing has reached a predetermined value.
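A possible software model of this timing is sketched below in Python. It assumes, hypothetically, that time is measured by counting the sets read after the last set of the dataset, and that `limit` stands in for the predetermined value; a drive might equally count tape distance or clock cycles. Any set from the finished dataset seen after its last set is treated as a re-write and restarts the count. The class and attribute names are illustrative only.

```python
class EndOfDatasetTimer:
    """Sketch of end-of-dataset timing driven by header (non-user) fields.

    Assumption: time is counted in sets read; `limit` is a hypothetical
    stand-in for the predetermined period.
    """

    def __init__(self, dataset_id, last_set_index, limit):
        self.dataset_id = dataset_id
        self.last_set_index = last_set_index
        self.limit = limit
        self.count = None        # None until the last set has been read
        self.exclusive = False   # True once only one dataset can occur

    def on_set(self, dataset_id, set_index):
        if self.exclusive:
            return
        if dataset_id == self.dataset_id:
            if self.count is not None:
                self.count = 0   # a re-write after the last set restarts timing
            elif set_index == self.last_set_index:
                self.count = 0   # last set read: start timing
            return
        if self.count is not None:
            self.count += 1      # a set from another dataset advances the timer
            if self.count >= self.limit:
                self.exclusive = True  # assert the exclusive zone
```

Counting sets rather than elapsed time keeps the sketch independent of tape speed; either measure fits the method as described, provided the predetermined period covers the writer's worst-case re-write latency.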
Preferably, the method comprises using the non-user data to determine whether any of the sets of data from a dataset were written a plurality of times to the data-holding medium during writing. Once it is determined that a set of data has been written a plurality of times, the method may reject earlier copies of that set read from the data-holding medium in favour of a later received, substantially identical, copy. It will be appreciated that, if a set of data is re-written during writing, the original write and subsequent re-writes should be identical but, when read back from the medium, are likely to differ slightly due to the errors that caused the data to be re-written.
Alternatively, or additionally, the method may comprise combining an earlier set of data from a dataset, read from the data-holding medium, with at least one later received substantially identical set of data from a dataset. This is advantageous because it may allow a complete uncorrupted set of data to be reconstructed from a plurality of corrupted sets of data. The method may provide for selection of whether earlier sets of data are discarded, or combined with later ones.
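Both options can be sketched in Python as below, on the hypothetical assumption that each read of a set is available as an equal-length list of codewords in which `None` marks a codeword that error correction could not recover. With `combine=False` the latest copy simply replaces earlier ones; with `combine=True`, gaps in the latest copy are filled from earlier copies, most recent first. The per-codeword merge granularity and the function name are assumptions made for illustration.

```python
from typing import List, Optional

Codeword = Optional[bytes]  # None marks a codeword the ECC could not recover


def resolve_copies(copies: List[List[Codeword]], combine: bool = True) -> List[Codeword]:
    """Choose or build one set from several reads of the same set.

    `copies` lists the reads in the order received; all reads are assumed
    to contain the same number of codewords.
    """
    latest = copies[-1]
    if not combine:
        return list(latest)  # discard earlier copies in favour of the latest
    merged = list(latest)
    # Walk earlier copies from most to least recent, patching gaps only.
    for copy in reversed(copies[:-1]):
        for i, cw in enumerate(copy):
            if merged[i] is None and cw is not None:
                merged[i] = cw
    return merged
```

For instance, reads `[b"x", None, b"z"]` followed by `[None, b"y", None]` yield `[None, b"y", None]` when discarding, but `[b"x", b"y", b"z"]` when combining.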
Further, when data is written to the data-holding medium it may overwrite data already present on the data-holding medium. Data that overwrites existing data in this manner is generally written at the end of a dataset that exists on the data-holding medium. However, there may be a latency between a dataset finishing on the data-holding medium and the start of the overwriting data, which can leave behind a portion of a dataset that should have been overwritten. Therefore, at the end of each dataset on the data-holding medium there may be an overwrite zone in which a first dataset can finish and data from a second dataset can start before the first dataset has finished (due to the second dataset having overwritten the first). The method may comprise detecting whether or not data being read from the data-holding medium exists in an overwrite zone.
Conveniently, the method comprises monitoring the non-user data to determine whether sets of data being read from the data-holding medium were written in the same pass. This is advantageous because it can be used to detect drop-in data: when the data is written, an entire dataset is conveniently written on a single pass, so if sets of data within a dataset appear to come from more than one write pass, an error is likely to have occurred.
The method may monitor a portion of the non-user data that provides a numerical value representing the pass on which the set of data being read was written, further comprising detecting whether the numerical value is altered for neighboring sets of data. Such a method provides a convenient manner in which to check for sets of data being written on more than one pass.
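A minimal Python sketch of this check is given below. It assumes the write-pass value has already been extracted from each set's non-user data (the function name is hypothetical) and simply reports the positions at which the value changes between neighbouring sets within a dataset, each of which would indicate possible drop-in data.

```python
def pass_changes(pass_numbers):
    """Return indices where the write-pass number differs from the previous set.

    `pass_numbers` lists the write-pass value of neighbouring sets within
    one dataset, in read order. A non-empty result suggests that the
    dataset contains sets from more than one write pass.
    """
    return [
        i for i in range(1, len(pass_numbers))
        if pass_numbers[i] != pass_numbers[i - 1]
    ]
```

A single stray set produces two changes, one on entry and one on exit: for pass values `[3, 3, 3, 2, 3]` the result is `[3, 4]`, pointing at the set written on pass 2.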
The method may comprise using a state machine to monitor the zone into which data being read from the data-holding medium falls. An advantage of using a state machine in this manner is that it provides a convenient structure with which to track the zone.
According to a fourth aspect of the invention there is provided a computer readable medium having stored therein instructions for causing a processing unit to execute the method of the third aspect of the invention.
The computer readable medium may be, although is not limited to, any one of the following: a floppy disk, a CDROM, a DVD ROM/RAM, a ZIP (trademark) disk, a magneto-optical disc, a hard drive, or a transmitted signal (including an internet download, file transfer, etc.).
According to a fifth aspect of the invention there is provided a data reader arranged to read a data-holding medium containing first and second markers in addition to user data, said data reader comprising at least one read head arranged to read the data-holding medium and generate a data signal corresponding to said first and second markers and said user data, the data reader further comprising processing circuitry arranged to receive said data signal and obtain said user data from said data-holding medium, wherein the processing circuitry is arranged to identify said user data without reference to said first marker.
An advantage of such a data reader is that not detecting the first marker can make the data reader simpler to implement. As such, it may be easier to configure, and may be more reliable. Reliability is an important consideration in certain applications of a data reader (for example data back-up applications, in which data recovery and reliability are important issues).