Current and future technologies enable increasingly efficient storage of audio or video signals, and also enhance the listening or viewing experience through extensions such as multi-channel technology. Such extensions may be stored in new file formats and made available to the user together with the audio base data, which may for example be a mono or stereo audio signal. The extension data may, for example, be additional multi-channel information. In this case, both the audio base data and the extension data are contained in a common data stream or file.
At the same time, however, it is also of interest that a user who already possesses a stereo version of an audio signal obtains only an extension, namely the multi-channel sound, and afterwards adds it to the existing audio signal or the corresponding file. This variant, in particular, has various advantages. Data that the user already has does not have to be transmitted unnecessarily. Particularly in a scenario in which a service provider bills their service according to the amount of data transmitted via their network, a user may achieve significant cost savings by receiving as little data as possible via the network.
For example, a user owns a stereo CD and therefore has the left and the right channel of a certain piece of music. With the advent of multi-channel technology, such as 5.1 technology, the user may wish not only to play the stereo CD on a new surround system, but also to obtain and play a 5-channel version of it. In this case, it would suffice to transmit only the left surround channel, the right surround channel and the center channel to the user, who already has the left and right channels. In the scenario described, in which the amount of data transmitted is billed, the user would save 40% of the amount of data by having only 3 channels sent instead of 5, assuming that each channel requires the same amount of data.
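The 40% figure can be checked with a short calculation. The following is a minimal sketch that assumes every channel needs the same amount of data; the function name `saving_percent` is hypothetical and chosen only for illustration.

```python
def saving_percent(channels_sent: int, channels_total: int) -> float:
    """Percentage of data saved when only some channels are transmitted.

    Assumption: every channel requires the same amount of data.
    """
    if channels_total <= 0 or not 0 <= channels_sent <= channels_total:
        raise ValueError("need 0 <= channels_sent <= channels_total and channels_total > 0")
    return 100.0 * (channels_total - channels_sent) / channels_total


# Left surround, right surround and center are sent; left and right are already present.
print(saving_percent(3, 5))  # 40.0
```

If the channels were coded at different data rates, the saving would have to be computed from the actual sizes rather than from the channel count.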
Moreover, purchasing only the extension data may be economically more attractive for the user, since they do not have to pay again for audio base data they already have. Thus, a record company that has already sold a stereo CD could offer its customers, as an additional service, the “surround” extension at a lower price than the complete 5-channel version of a piece of music.
Additional data for already existing data may, however, also be of great interest for various other applications. In particular, in the field of scalable audio/video data, additional data may exist in a higher scaling layer. In the scalability concept known in the art, there is, for example, a base scaling layer, which contains the audio signal of an audio piece up to a certain bandwidth, such as 8 kHz. A playing device capable of reproducing only this maximum bandwidth of 8 kHz, for example a player without especially broad-band speakers, is fully served by such data. Likewise, this signal could also be band-limited at the lower end, so that tones below e.g. 500 Hz are not reproduced either. The next higher scaling layer could then contain the bandwidth missing at the lower end and/or at the upper end, such as the band from 20 Hz to 500 Hz and the band from 8 kHz to 16 kHz. This first scaling layer above the base layer would then have to be combined with the original audio signal, the bandwidth of which lies between 500 Hz and 8 kHz, in order to obtain a broad-band audio signal, which may then be reproduced by a broad-band reproduction device. A provider could well offer this scaling layer variant such that the first scaling layer costs the user less than the complete broad-band audio signal, because the user has already bought the “narrow-band” audio signal.
Another example of extension data is video data, in which the base layer provides a video sequence with a certain resolution, while the next scaling layer provides video data that either has a higher resolution itself or, when combined with the original video data, results in a video sequence with higher resolution. Such a scenario arises if a user at first only has a video reproduction device with lower resolution, later obtains a video reproduction device with higher resolution, and would like to view their “old” videos at the higher resolution made possible by the new device.
A further example of extension data is so-called SBR (spectral band replication) data. In the known SBR technology, because only a low output data rate is available, an encoder generates only a band-limited signal, which extends up to a maximum cutoff frequency of e.g. 4 or 6 kHz. The data for the missing high band is no longer coded as audio samples or audio spectral values, but as parametric data. In the SBR technology, this parametric data is information on the spectral envelope. An SBR decoder will then copy spectral values from the available band into a higher band and thus establish the fine spectral structure of the higher band, while the coarse spectral structure, that is, the spectral envelope, is determined by the parametric additional data. Depending on the implementation, a user could therefore supplement their already existing band-limited audio data, coded or uncoded, to form a broad-band audio signal, either by means of transmitted SBR parameters or by means of time-domain audio samples that contain only the high band.
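The copy-and-shape principle described above can be illustrated with a toy example. This is a minimal sketch under strong simplifying assumptions, not the standardized SBR algorithm: the spectrum is a plain list of magnitudes, the envelope is given as one target RMS value per group of spectral values, and the names `replicate_high_band`, `envelope` and `band_size` are hypothetical.

```python
import math


def replicate_high_band(low_band, envelope, band_size):
    """Toy band replication: copy low-band values upward and rescale them.

    low_band  : spectral magnitudes of the transmitted (band-limited) signal
    envelope  : target RMS per high-band group (the parametric side information)
    band_size : number of spectral values per envelope group
    """
    high_band = []
    for i, target_rms in enumerate(envelope):
        # Fine structure: copy a patch of the available band (wrapping if needed).
        start = (i * band_size) % len(low_band)
        patch = [low_band[(start + k) % len(low_band)] for k in range(band_size)]
        # Coarse structure: scale the patch so its RMS matches the envelope value.
        rms = math.sqrt(sum(v * v for v in patch) / band_size)
        gain = target_rms / rms if rms > 0 else 0.0
        high_band.extend(v * gain for v in patch)
    return high_band


low = [1.0, 2.0, 3.0, 4.0]
high = replicate_high_band(low, envelope=[0.5, 0.25], band_size=2)
full_spectrum = low + high  # band-limited signal extended by the replicated band
print(len(full_spectrum))  # 8
```

The relative shape inside each copied patch is preserved, while the overall level of each group follows the transmitted envelope.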
In multi-channel audio reproduction technology, which has at least three reproduction channels, such as left, right, and center, parametric techniques are increasingly being employed, known under the keyword BCC (binaural cue coding) technique. In the BCC technique, one or two base channels are used to generate, in principle, an arbitrary number of reproduction channels, such as 5 channels in surround reproduction technology, using parametric additional data. Here, the parametric data is inter-channel level difference (ICLD), inter-channel time difference (ICTD) or inter-channel coherence (ICC) information.
This parametric data is applied to the transmitted stereo base channels, in order to generate the reproduction channels by various weightings/combinations of the two base channels.
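The weighting of the two base channels can be illustrated as follows. This is a strongly simplified sketch and not an actual BCC decoder: it applies one fixed pair of level weights (in dB) per output channel to whole time-domain signals, whereas a real decoder works per frequency band and time frame and also uses time difference and coherence information. The function names and the weight table are hypothetical.

```python
def db_to_gain(level_db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def upmix(left, right, weights_db):
    """Build output channels as weighted combinations of two base channels.

    weights_db maps a channel name to (level of left in dB, level of right in dB).
    """
    channels = {}
    for name, (left_db, right_db) in weights_db.items():
        g_left, g_right = db_to_gain(left_db), db_to_gain(right_db)
        channels[name] = [g_left * l + g_right * r for l, r in zip(left, right)]
    return channels


left = [1.0, 0.0]
right = [0.0, 1.0]
out = upmix(left, right, {
    "center": (0.0, 0.0),            # equal contribution from both base channels
    "left_surround": (0.0, -20.0),   # mostly the left base channel
    "right_surround": (-20.0, 0.0),  # mostly the right base channel
})
print(sorted(out))  # ['center', 'left_surround', 'right_surround']
```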
In this scenario, too, a user already in possession of the two stereo channels of a piece of music could be interested in “additionally buying” the parameter data, which of course requires only very low data rates. In this case, however, the receiver would need a BCC decoder in order to process the parametric data. Alternatively, a service provider could also generate the three channels left surround, right surround, and center from such parametric data and from (ideal) versions of the two stereo base channels available at the provider, and send them to the receiver in “decoded” form, so to speak, i.e. as audio data rather than parametric data.
Similar multi-channel coding techniques using parametric data are also known under the term “intensity stereo coding”.
Adding time-continuous extension data to time-continuous audio base data in a time-synchronous manner, in particular where the time-continuous extension data has already been generated from parametric data, for example, leads to a series of practical problems that have to be solved for successful application.
For all extension data, it should be ensured that it is matched exactly with that partner, among the vast amount of different audio base data, for which it has been designed, generated or calculated. This is made difficult, in particular, by the fact that the base data per se has no unique tag by which it could be identified or even associated with a unique partner. By way of example, multi-channel additional data Dx of a piece of music X should only be added to this piece of music X and not to another piece of music Y or to a so-called remix “XR” of the same piece of music X. It should be pointed out that in the field of pop and rock music, in particular, there are almost always several versions of a piece; these may be long versions for a CD, short versions for a single, live versions, or so-called re-issues or remix versions. In the field of classical music, too, a multiplicity of interpretations exists for one and the same piece, simply because the piece has been recorded by various orchestras. Multi-channel additional data of a recording of a classical piece by orchestra X will of course not match the recording of the same classical piece by orchestra Y.
Another problem is that it must be ensured that the audio base data and the extension data are aligned precisely in time. If this is not the case, the extension data will be useless for the user in most cases. If the multi-channel additional information of a piece of music has even a minimal offset relative to the stereo base data, clearly audible artifacts occur in the sound impression, and the user thus only has a faulty multi-channel version of the piece of music, which in the extreme case is no longer usable.
Audio base data may also be present in shortened form. For example, if a service provider is to provide a multi-channel extension of existing stereo signals, that is, is to generate the multi-channel additional data, the provider should have access to a multi-channel version of the piece of music. The user of the service, who desires the multi-channel additional information, also possesses a version of the piece of music, namely a stereo version. If the end user has intentionally or unintentionally removed parts of the beginning or the end of the audio data during creation or processing, for example when reading in a CD, the multi-channel version of the service provider and the stereo version of the end user no longer cover the same audio range. Such situations, where they occur, also have to be taken into account when adding the multi-channel additional data.
If the audio base data has additionally been temporally stretched or shrunk, that is, if it has been recorded or played more quickly or more slowly, this also leads to problems when adding the extension data. Here, the correct shrinking/stretching factor would have to be determined, which should then be applied to the extension data in a similar manner. If the end user has recorded their stereo version from the radio, for example, it may have been played up to 3% more quickly or more slowly. Correspondingly, the end user now possesses a stretched (longer) or shrunk (shorter) version of the piece of music, which is also relevant for the multi-channel additional data.
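The effect of such a speed deviation on the duration can be made concrete with a small calculation. The sketch below only shows the arithmetic; it does not show how the factor would be estimated from the signal itself, and the 240-second duration as well as the function names are assumptions chosen for illustration.

```python
def duration_after_speed_change(duration_s: float, speed_ratio: float) -> float:
    """Duration of a piece played at speed_ratio times the original speed."""
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return duration_s / speed_ratio


def time_scale_factor(user_duration_s: float, reference_duration_s: float) -> float:
    """Factor by which the user's version is stretched (>1) or shrunk (<1)."""
    return user_duration_s / reference_duration_s


reference = 240.0  # hypothetical 4-minute piece at the provider
faster = duration_after_speed_change(reference, 1.03)  # played 3% more quickly
slower = duration_after_speed_change(reference, 0.97)  # played 3% more slowly

print(round(faster, 2), round(slower, 2))
print(round(time_scale_factor(faster, reference), 4))
```

For a piece of this length, a 3% deviation already changes the duration by roughly 7 seconds, so extension data that is not rescaled by the same factor drifts further out of alignment the longer the piece plays.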
Furthermore, it should be possible to determine all of the data mentioned even if the audio base signal is no longer present in its original form, but has been changed within certain limits by transmission, for example by audio coding. If the stereo version of the end user has been dubbed from an analog cassette recorder, the quality of the piece of music has deteriorated as a result. Even under these more difficult conditions, adding the multi-channel additional data should work in principle.
It should be pointed out, in particular, that “shortening” is understood in the art to mean the removal of data, e.g. at the beginning or at the end of a piece. The English technical term for this is “cropping”. “Shrinking”, on the other hand, is understood to mean a linear distortion of the time axis, for example by quicker reproduction, which corresponds to “resampling” in digital technology, i.e. conversion to an altered sampling frequency. By analogy, “lengthening” means an addition of data, whereas “stretching” means a linear distortion of the time axis in the opposite direction, that is, slower reproduction.
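The difference between shortening (cropping) and shrinking or stretching (resampling) can be illustrated on a short list of samples. This is a minimal sketch; linear interpolation is used only as a simple stand-in for a proper resampling filter, and the function names are hypothetical.

```python
def crop(samples, start, end):
    """Shortening: remove data before index start and from index end onward."""
    return samples[start:end]


def resample_linear(samples, factor):
    """Shrinking (factor < 1) or stretching (factor > 1) of the time axis.

    The whole piece is kept, but mapped onto fewer or more samples
    by linear interpolation.
    """
    if len(samples) < 2 or factor <= 0:
        raise ValueError("need at least two samples and a positive factor")
    new_len = max(2, round(len(samples) * factor))
    out = []
    for i in range(new_len):
        pos = i * (len(samples) - 1) / (new_len - 1)  # position on the old time axis
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
print(crop(ramp, 1, 4))            # [1.0, 2.0, 3.0]
print(resample_linear(ramp, 0.6))  # [0.0, 2.0, 4.0]
```

Both results have three samples, but the cropped version has lost the beginning and the end of the piece, while the shrunk version still spans the whole piece on a compressed time axis.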
Time synchronization methods are known in the art, in particular from cinema and video technology, in which standardized time codes, also referred to as time stamps, are typically used. Correctly matched time codes in both the video material and the audio material ensure that the matching sound is played with a sequence of images. Such time codes allow for the synchronization of audio and video data as well as multimedia data. They are usually not present in consumer audio formats, however. A CD containing a stereo version of a piece does not contain any uniquely standardized or generally accepted time codes. There are also no generally accepted time synchronization techniques for the “enhancement” of ordinary video sequences with additional information in order to obtain a higher-resolution video sequence.
Therefore, adding additional information to base information, both in the audio and in the video field, is only successful if the base data and the additional data have been created “in one casting”, i.e. together in a single process. This is the case, for example, if a BCC encoder generates BCC parameters from a multi-channel version; BCC decoding can then only take place on the basis of the base channels derived from this multi-channel version, but not using arbitrary base channels. The situation is similar with scalable encoders or with SBR systems. These also work “from one casting”, because SBR additional data or higher scaling layers match exactly one base scaling layer or one low-band signal, which must already have been available when the data was generated. For arbitrary base data, as it may be present with a user and may already have been manipulated by the user intentionally or unintentionally (quality deterioration), concepts working according to the “one casting” principle cannot work, by definition.