The MPEG-2 standard provides for the transmission and multiplexing of video and audio data in a Transport Stream TS (see ISO/IEC standard 13818-1 or ITU-T Rec. H.222.0, both incorporated by reference herein). An Elementary Stream ES encodes video data in a Video Coding Layer VCL, as well as other data in a non-video coding layer. The ES is packetized into Packetized Elementary Stream PES packets by grouping the NAL units (described below) of each video frame in a PES packet to form a PES stream, which in turn is split into TS packets forming the TS. Typically, the TS packets, which have a fixed length of 188 bytes, are much smaller than the PES packets.
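By way of illustration only, the splitting of a PES packet into TS packets may be sketched as follows. The sketch assumes the 4-byte TS packet header layout of ISO/IEC 13818-1 and pads the final packet with an adaptation field containing stuffing bytes. The function name packetize_pes and its parameters are hypothetical and chosen for this example; timing fields such as the PCR, and the multiplexing of several streams, are omitted.

```python
from typing import List

TS_PACKET_SIZE = 188   # fixed TS packet length in bytes
TS_HEADER_SIZE = 4     # sync byte, flags + PID, control bits + continuity counter
SYNC_BYTE = 0x47


def packetize_pes(pes: bytes, pid: int, cc_start: int = 0) -> List[bytes]:
    """Split one PES packet into 188-byte TS packets (illustrative sketch)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    packets = []
    cc = cc_start & 0x0F            # 4-bit continuity counter
    pos = 0
    first = True
    while pos < len(pes):
        payload_len = min(len(pes) - pos, TS_PACKET_SIZE - TS_HEADER_SIZE)
        stuffing = TS_PACKET_SIZE - TS_HEADER_SIZE - payload_len
        # payload_unit_start_indicator marks the packet carrying the PES header
        pusi = 0x40 if first else 0x00
        # adaptation_field_control: 0b01 payload only, 0b11 adaptation + payload
        afc = 0x30 if stuffing else 0x10
        header = bytes([SYNC_BYTE,
                        pusi | ((pid >> 8) & 0x1F),
                        pid & 0xFF,
                        afc | cc])
        if stuffing == 0:
            adaptation = b""
        elif stuffing == 1:
            # adaptation_field_length of 0: the length byte is the only stuffing
            adaptation = bytes([0])
        else:
            # length byte, one flags byte (all zero), then 0xFF stuffing bytes
            af_len = stuffing - 1
            adaptation = bytes([af_len, 0x00]) + b"\xff" * (af_len - 1)
        packets.append(header + adaptation + pes[pos:pos + payload_len])
        pos += payload_len
        cc = (cc + 1) & 0x0F
        first = False
    return packets
```

In this sketch, a PES packet of 512 bytes yields three TS packets, the last of which carries the remaining payload preceded by stuffing so that every TS packet has the same length.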
The elementary stream is organised in Network Abstraction Layer NAL units containing video coding data, as well as NAL units containing non-video coding data. An example of the latter is the Supplemental Enhancement Information SEI NAL unit. SEI NAL units can carry private data in addition to that prescribed by the applicable standard and may carry watermarking data in this way. The organisation of the ES, and in particular of the video coding NAL units, depends on the codec used to encode the video data, an example of which is the H.264 codec (see ISO/IEC 14496-10 and ITU-T Recommendation H.264, incorporated herein by reference). The present disclosure is not limited to any particular codec but, rather, is applicable independently of how the video data is encoded.
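As a non-limiting illustration, locating SEI NAL units in an ES may be sketched as follows for the particular case of an H.264 byte stream in which NAL units are delimited by start codes (0x000001, optionally preceded by a zero byte). In H.264 the NAL unit type is carried in the five least significant bits of the first byte of the NAL unit, with type 6 denoting SEI. The function name iter_nal_units is hypothetical, and the sketch does not remove emulation prevention bytes or parse the SEI payload.

```python
SEI_NAL_TYPE = 6          # H.264 nal_unit_type for SEI
START_CODE = b"\x00\x00\x01"


def iter_nal_units(es: bytes):
    """Yield (offset, nal_unit_type, nal_bytes) for each NAL unit found.

    'offset' is the position in 'es' of the first byte of the NAL unit,
    that is, the byte immediately following the start code.
    """
    starts = []
    i = es.find(START_CODE)
    while i != -1:
        starts.append(i + len(START_CODE))
        i = es.find(START_CODE, i + len(START_CODE))
    for n, start in enumerate(starts):
        if n + 1 < len(starts):
            end = starts[n + 1] - len(START_CODE)
        else:
            end = len(es)
        # Trailing zero bytes belong to the next (4-byte) start code or to
        # stream padding, not to the NAL unit itself.
        nal = es[start:end].rstrip(b"\x00")
        if nal:
            yield start, nal[0] & 0x1F, nal


def find_sei_units(es: bytes):
    """Return the (offset, nal_bytes) pairs of all SEI NAL units in 'es'."""
    return [(off, nal) for off, t, nal in iter_nal_units(es) if t == SEI_NAL_TYPE]
```

Other codecs use a different NAL unit header layout, so the mask and the type value above would differ accordingly.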
Watermarking data is data inserted in an ES, typically in an SEI NAL unit, by a watermarking provider. It is used by a consumer device receiving and decoding a TS containing the ES to insert, in the decoded ES, a watermark characteristic of the consumer device (for example, based on a device ID) and possibly a timestamp indicative of the time of decoding. The watermarking data typically comprises a list of watermarking locations at which video (or audio) data is to be modified in dependence upon device ID and/or timestamp data. For example, the list comprises triplets of a location identifier (such as a byte offset relative to the SEI NAL unit comprising the watermarking data, or a video slice), a (byte or slice) value to write at the location if a corresponding bit of the ID and/or timestamp data is 0, and a value to write at the location if the corresponding bit of the ID and/or timestamp data is 1. The watermarking modifications are arranged so that they are not perceptible but can be detected with a corresponding tool to identify the device and/or time of decoding. Since the watermarking data is embedded in the ES and defines the locations to modify inside the ES, the consumer device can only watermark the ES once the TS has been decoded to form the ES. Forming the ES from the TS is a computationally intensive operation.
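Purely by way of example, the application of such a list of triplets to an ES may be sketched as follows. The sketch assumes byte-level locations expressed as offsets relative to the start of the SEI NAL unit, and assumes that the i-th triplet corresponds to the i-th bit of the ID and/or timestamp data; both are assumptions made for this illustration. The names WatermarkEntry, apply_watermark and id_to_bits are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class WatermarkEntry(NamedTuple):
    offset: int       # byte offset relative to the start of the SEI NAL unit
    value_if_0: int   # byte to write when the corresponding bit is 0
    value_if_1: int   # byte to write when the corresponding bit is 1


def id_to_bits(value: int, width: int) -> List[int]:
    """Expand an integer (e.g. a device ID) into 'width' bits, MSB first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def apply_watermark(es: bytes,
                    sei_offset: int,
                    entries: Sequence[WatermarkEntry],
                    bits: Sequence[int]) -> bytes:
    """Return a copy of 'es' with one byte rewritten per (entry, bit) pair.

    'sei_offset' is the position in 'es' of the SEI NAL unit that carried
    the watermarking data; entry offsets are taken relative to it.
    """
    out = bytearray(es)
    for entry, bit in zip(entries, bits):
        location = sei_offset + entry.offset
        out[location] = entry.value_if_1 if bit else entry.value_if_0
    return bytes(out)
```

For instance, with three entries and a 3-bit ID, apply_watermark(es, sei_offset, entries, id_to_bits(device_id, 3)) rewrites three bytes of the ES and leaves its length unchanged. Note that the offsets address the ES itself, which is why, in this sketch, the ES must first be recovered from the TS.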
MPEG-2 TS video streams are often sent in scrambled form to allow Conditional Access Systems CAS to restrict consumption of the streams to authorised subscribers, for example in digital broadcasting in accordance with the DVB standard. The consumer device descrambles the scrambled TS to produce a clear text TS, allowing an MPEG decoder to form the ES from the TS (that is, to parse the TS and PES) and to decode the video signal from the ES. In many CAS the decoding is done in a dedicated hardware component, typically a Secure Element SE, often provided on a smartcard, cartridge or dongle (for example, a USB dongle), or as a dedicated CAS chip or chipset.
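As an illustrative sketch only, a receiver may determine whether the payload of a given TS packet is scrambled by inspecting the two-bit transport_scrambling_control field in the fourth byte of the TS packet header defined in ISO/IEC 13818-1, where the value 0 indicates an unscrambled payload. The function names below are hypothetical, and the descrambling operation itself, which depends on the CAS and scrambling algorithm in use, is not shown.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def scrambling_control(ts_packet: bytes) -> int:
    """Return the 2-bit transport_scrambling_control field of a TS packet."""
    if len(ts_packet) != TS_PACKET_SIZE or ts_packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    # The field occupies the two most significant bits of the fourth byte.
    return (ts_packet[3] >> 6) & 0x03


def is_scrambled(ts_packet: bytes) -> bool:
    """True if the packet payload is marked as scrambled (field non-zero)."""
    return scrambling_control(ts_packet) != 0
```

Because the TS packet header remains in the clear, this check can be made before descrambling, whereas the PES and ES content of a scrambled packet only becomes accessible afterwards.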