1. Field of the Invention
This invention relates to video data formatting and storage.
2. Description of the Prior Art
A video data storage system has been proposed in which an input video signal having a so-called “long GOP” format (i.e. a group of pictures format related to the MPEG format, involving P and/or B frames) is stored as an I-frame only signal. The stored signal is then returned to the long-GOP form for output.
As well as storing the I frame data (which allows frame-accurate editing of the video signal) some extra data or metadata associated with the video signal could be stored as well. The metadata could provide information about how the signal was coded in long GOP form and so could be used to assist the output encoder to give better results when the long GOP output signal is generated.
Two example possibilities for the metadata are: (a) the coding decisions used in generating the long GOP data, for example, vectors, Q (quantization parameter) values etc., or (b) the actual long GOP bitstream itself. In case (a), the coding decisions would guide the output encoder to make similar or identical decisions to generate the output long GOP bitstream, and in case (b), when no editing has taken place the original long GOP data could be output directly so that an I-frame to long GOP recoding process is needed only at or near to edit points.
Of course, in either case (a) or case (b), the amount of metadata concerned can be highly variable.
In case (a), the amount of data representing the coding decisions can vary from frame to frame. Apart from any other reason, an I frame does not usually have any associated vectors, a P frame has one set of vectors and a B frame may have two sets.
In case (b), the number of bits per frame or per GOP of a long GOP bitstream is not generally regulated to a fixed amount, but instead is controlled in the MPEG system by the fullness of a so-called “virtual buffer”, an average desired data rate over time and by the difficulty of encoding the picture content itself. So, the quantity of data can vary dramatically from frame to frame. Also, it is very difficult to synchronise to the long-GOP data on a frame-by-frame basis.
In contrast, the I frame data recorded by the video data storage system is generally set up to aim towards a fixed data amount per frame. This is particularly important in a tape-based system where the capacity of the storage medium is strictly limited. If any metadata is recorded, the amount of metadata allocated to each I-frame must be subtracted from the available data capacity per I-frame to give a remaining target data amount for the I-frame.
The problem therefore is to store the variable-size metadata within a fixed data allocation per I-frame, and to do so in a manner which allows synchronism between the metadata and the frames of the stored I-frame signal.
Two possibilities for handling this problem are therefore:
(i) to record an average amount of either the coding decision data (a) or the long GOP data (b) with each frame of I-frame data. This gives a predictable data quantity remaining for each I frame. This scenario is shown schematically in FIG. 1 of the accompanying drawings, in which each fixed-size data allocation 10 contains a predetermined quantity of I-frame data 20 and a predetermined quantity of metadata 30. However, this means that the data recorded with each I-frame recorded by the video data storage system may bear no relation to the particular image represented by that I-frame.
(ii) to record data actually associated with each I frame alongside that I frame. This is only really possible for the decision data (a), as there is not an easily derivable relationship between long-GOP data (b) and individual frames of the video signal. This scenario is illustrated schematically in FIG. 2 of the accompanying drawings, in which each fixed size data allocation 10 contains a variable quantity of I-frame data 22 and a variable, complementary, quantity of metadata 32. This proposal means that there is a good correlation between metadata and the I-frame data if the signal is cut or otherwise edited. However, it requires the target bit rate for each I-frame to be altered from I-frame to I-frame. Also, because the amount of data varies dramatically, some I-frames may be left with insufficient data capacity for adequate coding.
In summary, there is a need for a formatting and/or storage arrangement which accommodates variable-length metadata which may, for example, comprise or be derived from a long GOP encoding of a video signal.
This invention provides video data formatting apparatus for formatting video data representing successive pictures for a data handling channel having a predetermined data capacity per picture, the apparatus comprising:
means for receiving an input video signal representing successive pictures, the input video signal having associated with it at least data defining at least some of the coding decisions made during an encoding of pictures represented by the input video signal into a compressed form having group-of-pictures (GOP) format including at least one inter-picture encoded picture;
means for converting the input video signal into an intermediate compressed video signal, the intermediate compressed video signal having a GOP format in which each GOP contains fewer pictures than a GOP associated with the input video signal;
means for deriving a metadata signal from the input video signal, the metadata signal indicating at least the data defining at least some of the coding decisions;
means for generating a data quantity allocation to control the transcoding into the intermediate video signal, whereby each picture of the intermediate video signal is transcoded so as not to exceed a respective data quantity allocation, in which the generating means calculates the data quantity allocation for each picture to be substantially equal to:
the predetermined data capacity per picture
less
the quantity of metadata for the input video signal GOP containing that picture divided by the number of pictures (n) in that input video signal GOP.
Further respective aspects and features of the invention are defined in the appended claims.
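The data quantity allocation defined above can be illustrated with a short Python sketch. The function name, the use of bytes as the unit, and the choice to round the per-picture metadata share upwards are assumptions made for illustration only; the text specifies only that the allocation is substantially equal to the capacity per picture less the GOP metadata quantity divided by n.

```python
def picture_allocation(capacity_per_picture: int,
                       gop_metadata_bytes: int,
                       n: int) -> int:
    """Target size for each intermediate picture derived from one input GOP.

    capacity_per_picture: predetermined data capacity per picture (assumed bytes)
    gop_metadata_bytes:   quantity of metadata for the input GOP (assumed bytes)
    n:                    number of pictures in that input GOP
    """
    if n <= 0:
        raise ValueError("n must be a positive number of pictures")
    # Assumption: round the metadata share up so that picture data plus
    # metadata never exceeds the fixed capacity.
    metadata_share = -(-gop_metadata_bytes // n)
    allocation = capacity_per_picture - metadata_share
    if allocation <= 0:
        raise ValueError("metadata leaves no capacity for picture data")
    return allocation


# Example: a 12-picture GOP carrying 36,000 bytes of metadata,
# with 100,000 bytes available per picture.
target = picture_allocation(100_000, 36_000, 12)
```

Because the metadata quantity is fixed for a whole input GOP, the same target applies to every picture derived from that GOP.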
The invention is particularly applicable to a system in which the intermediate compressed video signal has a GOP format comprising only intra-picture encoded pictures.
Preferably the apparatus comprises means for generating data packets (e.g. for recording on a storage medium such as a tape medium) each comprising: an encoded picture of the intermediate compressed video signal; and a substantially 1/n portion of the metadata signal associated with the input video signal GOP from which that picture was derived.
In embodiments of the invention, the solution provided to the problem described above is to determine the quantity of metadata, for example of type (a) or (b), associated with a particular GOP of the long GOP signal, and then to record that metadata in substantially equal segments with each of the I-frames corresponding to that GOP.
This allows the metadata and I-frame data to be associated with one another on a GOP by GOP basis. The start of each GOP of metadata can be established by standard synchronising codes within a GOP.
This solution recognises that long-GOP metadata of either type is close to useless if an edit has been made during the GOP, so there is no point having a correlation frame by frame. It also allows the target bit rate to be set once for all of the frames corresponding to a GOP.
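The division of one GOP's metadata into substantially equal segments, one per I-frame, can be sketched as follows. This is a minimal illustration under stated assumptions: the names `split_metadata` and `build_packets` are hypothetical, metadata and I-frames are modelled as byte strings, and any remainder is spread one byte at a time over the first segments.

```python
def split_metadata(metadata: bytes, n: int) -> list[bytes]:
    """Split one GOP's metadata into n segments whose sizes differ by at most 1."""
    if n <= 0:
        raise ValueError("n must be a positive number of pictures")
    base, extra = divmod(len(metadata), n)
    segments = []
    pos = 0
    for i in range(n):
        # The first `extra` segments carry one additional byte.
        size = base + (1 if i < extra else 0)
        segments.append(metadata[pos:pos + size])
        pos += size
    return segments


def build_packets(i_frames: list[bytes], metadata: bytes) -> list[tuple[bytes, bytes]]:
    """Pair each I-frame of a GOP with its share of that GOP's metadata."""
    segments = split_metadata(metadata, len(i_frames))
    return list(zip(i_frames, segments))


# Example: three I-frames sharing ten bytes of GOP metadata.
packets = build_packets([b"I0", b"I1", b"I2"], b"abcdefghij")
```

Concatenating the segments in picture order recovers the metadata for the whole GOP, which is why the association only needs to hold GOP by GOP rather than frame by frame.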
The intermediate video signal preferably comprises only I-frames, for convenience of editing. This would generally imply a GOP length of 1, though that need not always be the case.
Preferably the metadata is formatted to the intermediate video signal in such a manner that it is resynchronised at GOP boundaries. In the preferred embodiments these would be boundaries of the long GOPs, but in a system where the two GOP lengths share a common multiple, resynchronisation would take place at instances of that common multiple of pictures.
In some embodiments, the input video signal need not be a compressed signal but may instead have been compressed at some other processing stage. The metadata signal could indicate at least a quantization parameter used in encoding each picture of the input video signal. Alternatively, or indeed in addition, the metadata signal indicates at least a set of motion vectors used in encoding each picture of the input video signal. Of course, notwithstanding that these conditions would in fact be fulfilled by a substantially entire compressed video signal, the metadata can itself be the compressed input video signal.
The input video signal could be in an uncompressed “baseband” form and have associated with it (for example) a set of coding decisions relevant to its encoding into a compressed form, in which case the metadata signal is derived from the video signal by extracting it from the associated data. However, in a preferred embodiment the input video signal is a compressed video signal in accordance with the associated GOP format.
For convenience of operation of the apparatus and the division of metadata between pictures of the intermediate video signal, it is preferred that the number of pictures in each GOP of the intermediate video signal and the number of pictures in each GOP associated with the input video signal have a common multiple under 61. More specifically, it is particularly convenient if the number of pictures in each GOP of the intermediate video signal is a factor of the number of pictures in each GOP associated with the input video signal, as this allows an easier allocation of the metadata.
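The relationship between the two GOP lengths can be checked with a short sketch. It assumes that the "common multiple" is read as the least common multiple, which is an interpretation rather than something the text states; the function names are hypothetical.

```python
from math import gcd

MAX_COMMON_MULTIPLE = 61  # the text prefers a common multiple under 61


def resync_interval(intermediate_gop_len: int, input_gop_len: int) -> int:
    """Smallest number of pictures after which both GOP structures realign.

    Assumption: computed as the least common multiple of the two GOP lengths.
    """
    if intermediate_gop_len <= 0 or input_gop_len <= 0:
        raise ValueError("GOP lengths must be positive")
    return (intermediate_gop_len * input_gop_len
            // gcd(intermediate_gop_len, input_gop_len))


def is_convenient(intermediate_gop_len: int, input_gop_len: int) -> bool:
    """True if the realignment interval is under the preferred limit."""
    return resync_interval(intermediate_gop_len, input_gop_len) < MAX_COMMON_MULTIPLE


def is_factor(intermediate_gop_len: int, input_gop_len: int) -> bool:
    """True if the intermediate GOP length divides the input GOP length."""
    return input_gop_len % intermediate_gop_len == 0


# Example: I-frame-only intermediate signal (GOP length 1), 12-picture input GOP.
interval = resync_interval(1, 12)
```

When the intermediate GOP length is a factor of the input GOP length, the interval is simply the input GOP length, so metadata resynchronises at every long-GOP boundary.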
The invention is particularly applicable to use in video data storage apparatus comprising formatting apparatus as defined above; and a storage medium for storing the intermediate video signal and the associated metadata signal. At the output of such apparatus a video signal in the same GOP format as that associated with the input video signal is preferably generated, using the metadata if possible or appropriate.
In the case where the metadata is effectively a compressed video signal itself, this can in some circumstances be used as the output signal, so avoiding any quality loss or artefacts from the coding and decoding of the intermediate video signal. However, this cannot be done if an edit has taken place or if the metadata has been corrupted through, for example, data loss (detectable using a conventional error correcting code). In order to handle the possibility of an edit interrupting a GOP of the metadata, it is preferred that the storage apparatus comprises means for detecting whether an edit operation has taken place within a predetermined number of pictures of a current picture; and, if not, for using the metadata signal as the output compressed video signal.
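The output decision described above can be sketched as a simple predicate. This is an illustrative sketch only: the function name, the representation of edits as a list of picture indices, and the `corrupted` flag (standing in for the result of an error-correcting-code check) are all assumptions.

```python
def can_reuse_metadata(current_picture: int,
                       edit_points: list[int],
                       window: int,
                       corrupted: bool = False) -> bool:
    """Decide whether the stored long-GOP metadata may be output directly.

    current_picture: index of the picture being output
    edit_points:     picture indices at which edits occurred (assumed known)
    window:          the predetermined number of pictures
    corrupted:       True if error detection flagged the metadata (assumed input)
    """
    if corrupted:
        return False
    # An edit within `window` pictures of the current picture rules out reuse.
    return all(abs(current_picture - e) > window for e in edit_points)


# Example: with a window of 12 pictures, an edit at picture 95 blocks
# direct reuse at picture 100, while an edit at picture 50 does not.
near_edit = can_reuse_metadata(100, [95], 12)
far_edit = can_reuse_metadata(100, [50], 12)
```

Where the predicate is false, the output would instead be produced by re-encoding the intermediate video signal into the long-GOP format.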