This invention relates to a method and apparatus for embedding digital audio data in a serial digital video data stream.
SMPTE 259M-1993 (hereinafter referred to simply as SMPTE 259) defines the serial digital interface (SDI) signal format for video. In accordance with SMPTE 259, video is transmitted as 8-bit or 10-bit serial data at 143, 177, 270 or 360 Mb/s.
The SDI signal format allows a user of equipment that supports this standard to interconnect discrete items of equipment with the assurance that the different items are compatible with respect to the form in which video data is supplied or received by the respective items.
The SDI signal format specifies locations at which ancillary data can be accommodated in the field of a composite digital signal or a component digital signal. For convenience and brevity, the following description will focus on component digital signals. Much of the description is also applicable directly to composite digital signals. Those skilled in the art will recognize where the description is not applicable to composite digital signals and will understand how the description should be modified to render it applicable to composite digital signals.
ANSI S4.40 prescribes a data stream for digital audio data. The data stream, which is known as the AES-3 data stream, or simply the AES data stream, is composed of a succession of frames, each frame containing two subframes and each subframe containing 32 bit cells. Subframe 1 contains an audio data sample for audio channel 1 and subframe 2 contains an audio data sample for audio channel 2. The two channels of the AES data stream may be, but need not be, related, for example as left and right stereo channels. Referring to FIG. 1, each subframe contains a preamble of 4 bit cells, 4 bits of auxiliary data, 20 bits of sample data and 4 additional bits, which are referred to as V (validity), U (user), C (channel status) and P (parity). The four bits of auxiliary data may be used as added sample data space, allowing 24-bit samples, although the usual sample is 20 bits.
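By way of illustration only, the subframe layout just described can be sketched in Python. The sketch assumes that the subframe has already been decoded into a 32-bit integer in which bit i holds bit cell i (bit cell 0 first). The function names and dictionary keys are hypothetical, and the preamble is treated as an opaque 4-bit field. Combining the auxiliary bits with the sample data as the four low-order bits of a 24-bit sample is likewise an assumption of the sketch.

```python
def unpack_aes_subframe(subframe: int) -> dict:
    """Split a 32-bit AES subframe into the fields named in the text.

    Assumption: bit i of `subframe` holds bit cell i (cell 0 first).
    """
    if not 0 <= subframe < (1 << 32):
        raise ValueError("subframe must fit in 32 bit cells")
    return {
        "preamble": subframe & 0xF,           # bit cells 0-3
        "aux": (subframe >> 4) & 0xF,         # bit cells 4-7 (auxiliary data)
        "sample": (subframe >> 8) & 0xFFFFF,  # bit cells 8-27 (20-bit sample)
        "V": (subframe >> 28) & 1,            # validity
        "U": (subframe >> 29) & 1,            # user
        "C": (subframe >> 30) & 1,            # channel status
        "P": (subframe >> 31) & 1,            # parity
    }


def sample_24bit(fields: dict) -> int:
    """Form a 24-bit sample, assuming the auxiliary bits are the low-order bits."""
    return (fields["sample"] << 4) | fields["aux"]
```

In this sketch a 20-bit sample is simply the `sample` field, while `sample_24bit` shows how the auxiliary bits could extend it to 24 bits.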
SMPTE 272M-1994 (hereinafter referred to simply as SMPTE 272) defines the mapping of AES digital audio data into the horizontal ancillary data, or HANC, space of the SDI data stream, resulting in a serial data stream including both video data and audio data. The horizontal ancillary data space has a preset range of word locations for ancillary data: for example, in the case of the component digital SDI signal format based on 525 lines, 29.97 frames per sec, the word locations are 1444-1711.
At the preferred 48 kHz audio sample rate, there are 1920 samples during one frame interval of a video signal having a 25 Hz frame rate and 1601.6 samples during one frame interval of a video signal having a 29.97 Hz frame rate. Thus, there are 3.072 audio samples per line interval for a 625 line, 25 Hz video signal and 3.051 audio samples per line interval for a 525 line, 29.97 Hz video signal. In order to provide a uniform distribution of audio samples throughout the frame of the composite audio-video data stream, three samples are placed in the HANC space of most lines of the video signal and four samples are placed in other lines. SMPTE standards specify that there should be no samples in the HANC space immediately following the switch line.
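The figures above can be checked with a short calculation. The following Python sketch assumes that the nominal 29.97 Hz frame rate is exactly 30000/1001 Hz. The function `spread_samples` is a hypothetical illustration of one way to spread samples evenly over the lines of a frame; it is not taken from the SMPTE standards and it ignores the lines on which no samples are to be placed.

```python
from fractions import Fraction
from math import floor

AUDIO_RATE_HZ = 48_000  # preferred audio sample rate


def samples_per_frame(frame_rate: Fraction) -> Fraction:
    """Audio samples occurring during one video frame interval."""
    return Fraction(AUDIO_RATE_HZ) / frame_rate


def samples_per_line(frame_rate: Fraction, lines_per_frame: int) -> Fraction:
    """Average number of audio samples per line interval."""
    return samples_per_frame(frame_rate) / lines_per_frame


def spread_samples(frame_rate: Fraction, lines_per_frame: int) -> list:
    """Hypothetical even spread: each line takes the whole samples accrued so far."""
    r = samples_per_line(frame_rate, lines_per_frame)
    return [floor(r * (k + 1)) - floor(r * k) for k in range(lines_per_frame)]


if __name__ == "__main__":
    for rate, lines in ((Fraction(25), 625), (Fraction(30000, 1001), 525)):
        print(lines, float(samples_per_frame(rate)),
              round(float(samples_per_line(rate, lines)), 3))
```

Under these assumptions, a 625 line, 25 Hz frame carries three samples on most lines and four samples on 45 lines.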
In accordance with SMPTE 272, sample data for one audio group, consisting of one or two AES digital audio data streams (each having two channels), is used to construct an audio data packet (or base packet) which is inserted in the HANC space on a given line. Referring to FIG. 2, in the case of digital composite video the first four words of the base packet are a data header, data ID, data block number and data count. There are then two or four channels, each containing an unspecified number of sets of three consecutive sample data words X, X+1 and X+2 (subject to a maximum of 255 user words). The final word of the base packet is a check sum. In the case of digital component video, the data header is three words long, but the structure of the base packet is otherwise the same.
The channels are organized in a sequence (e.g. 1, 2, 3, 4 in the case of four channels) and the sequence of channels repeats in the SMPTE 272 packet a number of times equal to the number of audio samples (typically three or four) to be accommodated by the SMPTE 272 packet. Each set of three consecutive sample data words X, X+1, X+2 represents one audio data sample.
A single data sample for one AES channel is derived from the 20 bits of sample data and the V, U and C bits of one AES subframe, and these twenty-three bits are mapped into the three consecutive sample data words X, X+1, X+2 of one channel of the SMPTE 272 packet. In the case of a group containing four channels, there are 36 sample data words (three samples*three words per sample*four channels) in the audio data packet if the packet contains three samples and there are 48 sample data words in the packet if the packet contains four samples.
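The word counts given above follow directly from the packet structure. As a minimal sketch, the following Python computes the number of sample data words and the total length of a base packet, assuming a one-word data header for composite video and a three-word data header for component video, as described above. The function names are hypothetical.

```python
WORDS_PER_SAMPLE = 3   # sample data words X, X+1, X+2
MAX_USER_WORDS = 255   # maximum number of user words in a packet


def sample_data_words(samples: int, channels: int) -> int:
    """Sample data words in a base packet: samples * 3 words * channels."""
    words = samples * WORDS_PER_SAMPLE * channels
    if words > MAX_USER_WORDS:
        raise ValueError("packet would exceed 255 user words")
    return words


def base_packet_words(samples: int, channels: int, component: bool = True) -> int:
    """Total base packet length in words."""
    header = 3 if component else 1  # data header length
    # data ID + data block number + data count = 3 words, then a 1-word check sum
    return header + 3 + sample_data_words(samples, channels) + 1


if __name__ == "__main__":
    print(sample_data_words(3, 4), sample_data_words(4, 4))
    print(base_packet_words(3, 4), base_packet_words(4, 4))
```

For a four-channel group this gives 36 or 48 sample data words, and a component base packet of 43 or 55 words in total.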
The auxiliary data of two AES data streams of one audio group may be used to construct an extended data packet to be inserted in the HANC space on the same line as the base packet. The extended data packet is composed of a data header (one word or three words depending on whether the digital video is composite or component), data ID (one word), data block number (one word), data count (one word), an unspecified number of auxiliary data words, and a check sum (one word). The auxiliary data bits of the two AES subframes of one AES frame are mapped into one auxiliary data word of the extended data packet. The extended data packet for a given group immediately follows the base packet for that group and the number of auxiliary data words must match the number of samples in the base packet.
The ancillary data space of the SDI signal derived from a digital composite video signal is able to accommodate only one group (four digital audio channels, corresponding to two AES streams), whereas the ancillary data space of the SDI signal derived from a digital component video signal is able to accommodate four groups (sixteen digital audio channels, corresponding to eight AES streams). On each line that contains ancillary data, there is a base packet for each group and there is also an extended data packet for any group containing an AES data stream that includes auxiliary data.
The data ID of the base packet and extended data packet reflects the number of the group to which the two AES streams used to form the packets have been assigned.
Under SMPTE RP 165, an EDH (error detection and handling) packet may be included in the HANC space of one line per field. The EDH packet is inserted at the end of the HANC space. The embedder must take care to ensure that the EDH packet is not overwritten by audio packets, particularly in the case of the 270 Mb/s format where the HANC space is not large enough to support four groups of data with four samples per group as well as the EDH data.
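The space constraint can be illustrated with a rough word budget for the 525 line component format, whose HANC word locations are 1444-1711. The following Python sketch rests on several assumptions that are not stated above: the EDH packet is taken to occupy 23 words, and each extended data packet is taken to carry one auxiliary data word per AES frame, i.e. two auxiliary data words per sample for a group of two AES streams. All names are hypothetical.

```python
HANC_FIRST_WORD = 1444
HANC_LAST_WORD = 1711
HANC_WORDS = HANC_LAST_WORD - HANC_FIRST_WORD + 1  # 268 word locations

EDH_PACKET_WORDS = 23  # assumption: not stated in the text
HEADER_WORDS = 3       # component video data header


def base_packet_words(samples: int, channels: int = 4) -> int:
    # header + data ID + data block number + data count + samples + check sum
    return HEADER_WORDS + 3 + samples * 3 * channels + 1


def extended_packet_words(samples: int, aes_streams: int = 2) -> int:
    # assumption: one auxiliary data word per AES frame of each stream
    return HEADER_WORDS + 3 + samples * aes_streams + 1


def hanc_words_needed(groups: int, samples: int, extended: bool, edh: bool) -> int:
    per_group = base_packet_words(samples)
    if extended:
        per_group += extended_packet_words(samples)
    return groups * per_group + (EDH_PACKET_WORDS if edh else 0)


def fits(groups: int, samples: int, extended: bool, edh: bool) -> bool:
    return hanc_words_needed(groups, samples, extended, edh) <= HANC_WORDS


if __name__ == "__main__":
    for samples in (3, 4):
        need = hanc_words_needed(4, samples, extended=True, edh=True)
        print(samples, need, HANC_WORDS, fits(4, samples, True, True))
```

Under these assumptions, four groups with three samples each, together with extended data packets and the EDH packet, need 247 of the 268 available words, whereas four samples per group would need 303 words and would not fit.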
In accordance with SMPTE 272, if a signal includes horizontal ancillary data, the ancillary data must start immediately after the EAV (end of active video) timing reference signal and all packets of ancillary data must be contiguous. Accordingly, there should be no ancillary data in the HANC space after the start of blanking.
A conventional device for embedding an audio group in an SDI video data stream operates by constructing the base packets (20 bits of audio data) and multiplexing the base packets into the digital video data stream immediately after the EAV timing reference signal. At the receiving end, the receiver detects the header of the audio data packet and controls a demultiplexer which extracts the ancillary data from the data stream, allowing the AES subframes to be reconstructed.
If the video data stream is able to accommodate more than one group, conventional embedders operate in cascade fashion. Thus, in the event that sixteen channels are to be embedded, a first embedder receives both the SDI data stream containing no ancillary data and audio data channels 1-4 and embeds the audio data channels into the serial digital video data stream to create a 1:4 (1 channel video, 4 channels audio) SMPTE 272 data stream and supplies the 1:4 data stream to the second embedder. The second embedder, which also receives audio channels 5-8, constructs an audio data packet containing audio channels 5-8 and multiplexes the audio data packets into the 1:4 data stream to provide a 1:8 SMPTE 272 data stream. The third and fourth embedders operate in similar manner to the second embedder, each adding four audio channels to provide, respectively, a 1:12 SMPTE 272 video data stream and the desired 1:16 SMPTE 272 video data stream.
This approach to embedding up to sixteen channels of audio data in the SDI video data stream is disadvantageous because it requires a full-functioned embedder for each group and does not allow the function of one embedder to be shared over several groups.
Further, there may be as much as 2 ms delay in each embedder. Therefore, if audio channels 1-4 are synchronized with audio channels 13-16 when the channels are provided to the first and fourth embedders respectively, channels 1-4 will be delayed by as much as 6 ms relative to channels 13-16 in the final data stream because of the delay suffered by audio channels 1-4 in the first through third embedders. It is generally considered that a delay greater than 1 ms will produce objectionable loss of lip sync and therefore use of cascaded embedders requires careful synchronization of the audio channels.
If the time evolution of a sound field is converted to a data stream by employing microphones to generate a multi-channel electrical signal and digitizing the channels using sample clocks that are aligned in time, the several digital audio channels are said to be phase coherent. Loss of phase coherency by as little as one sample period in processing or propagating the multi-channel audio data stream can lead to a perceptible loss in audio image quality when the data stream is converted to analog form and used to drive loudspeakers for recreating the sound field.
The phase coherency of a multi-channel audio data stream is preserved when the data stream is embedded and disembedded provided the data stream is processed as one audio group. Conventional stereophonic audio requires only two channels, both of which can be included in one group, and accordingly phase coherency can be preserved over embedding and disembedding. However, some applications require use of more than four audio channels to create an audio image and this requires use of more than one audio group for embedding the multi-channel audio data stream. Conventional embedders cannot assure that phase coherency is preserved across groups.
In accordance with a first aspect of the invention there is provided a method of embedding audio data of at least two audio data groups in an ancillary data space of a serial digital video data stream, comprising multiplexing the audio data groups to provide a serial multi-group audio data stream, and inserting the serial multi-group audio data stream into the ancillary data space of the serial digital video data stream.
In accordance with a second aspect of the invention there is provided apparatus for embedding at least two audio data groups in an ancillary data space of a serial digital video data stream, comprising an embedder for formatting data of a first audio data group, generating data packets from the formatted data and inserting the data packets into the digital video data stream, and an expansion device for formatting data of a second audio data group and supplying formatted data to the embedder, and wherein the embedder generates data packets from the formatted data of the second audio data group and inserts the data packets into the digital video data stream.
In accordance with a third aspect of the invention there is provided a method of embedding ancillary data in an ancillary data space of a serial digital interface video stream, wherein each line of the video stream is composed of a horizontal ancillary data space followed by an active interval, said method comprising: during the horizontal ancillary data space of line n of the video stream, reading all data from a video FIFO, whereby at the start of the active interval of line n+1 of the video stream the video FIFO contains no data; during the active interval of line n, preparing an ancillary data packet and loading the data packet into the video FIFO; and during the horizontal ancillary data space of line n+1 of the video stream, reading all data from the video FIFO and inserting the ancillary data packet into the horizontal ancillary data space of line n+1, whereby at the start of the active interval of line n+2 of the video stream the video FIFO contains no data.
In accordance with a fourth aspect of the invention there is provided a method of processing multiple audio data streams, comprising writing first and second audio data streams into respective FIFOs, reading the audio data streams from the respective FIFOs, combining the data streams read from the FIFOs, periodically testing depth of data in each FIFO, and forcing the depth of data in each FIFO to a selected value.
In accordance with a fifth aspect of the invention there is provided a method of embedding ancillary data in the horizontal ancillary data space of a serial digital video stream, wherein each line of the video stream is composed of a horizontal ancillary data space followed by an active interval, said method comprising: receiving an input serial digital video stream; detecting whether ancillary data is embedded in the horizontal ancillary data space of the input serial digital video stream; if no ancillary data is embedded in the input serial digital video stream, embedding ancillary data in the serial digital video stream; and, if ancillary data is embedded in the input serial digital video stream, operating either in a cascade mode or in an originate mode, wherein operating in the originate mode includes the step of embedding ancillary data in the horizontal ancillary data space of the serial digital video stream by overwriting data in the input serial digital video stream, and operating in the cascade mode includes the step of embedding ancillary data in the serial digital video stream without overwriting data in the input serial digital video stream.
In accordance with a sixth aspect of the invention there is provided apparatus for disembedding at least two audio data groups from an ancillary data space of a serial digital video data stream, comprising a disembedder for reading data packets of at least two groups from the digital video data stream, formatting packet-wise data of a first audio data group as sample-wise data and outputting the sample-wise data of the first audio data group, and an expansion device for receiving packet-wise data of a second audio data group from the disembedder, formatting packet-wise data of the second audio data group as sample-wise data and outputting the sample-wise data of the second audio data group.