Watermarking (forensic marking) is employed in digital cinemas to prevent piracy and allow forensic tracking of illicit captures or copies of cinematic content, and is also employed in other contexts. Watermarks, which can be embedded in both audio and video signals, should be robust against legitimate and illegitimate modifications to the marked content and against captures of the marked content (e.g., captures made by mobile phones or high-quality audio and video recording devices). Watermarks typically comprise information about when and where playback of the content has occurred. Thus, watermarking for theatrical use typically occurs during actual playback, and the watermarks applied to content played in theaters are typically indicative of theater identification data (a theater “ID”) and playback time.
The complexity, and therefore the financial and computational cost, of watermarking audio programs increases linearly with the number of channels to be watermarked. During rendering and playback (e.g., in movie theaters) of object based audio programs, the audio content has a number of channels (e.g., object channels and speaker channels) which is typically much larger (e.g., by an order of magnitude) than the number occurring during rendering and playback of conventional speaker-channel based programs. Typically also, the speaker system used for playback includes a much larger number of speakers than the number employed for playback of conventional speaker-channel based programs.
It is conventional to watermark some but not all speaker channels of a multichannel audio program of the conventional type comprising speaker channels but not object channels. However, conventional watermarking of this type does not measure content of individual channels of the program to select which channels should be watermarked, and does not select which channels to watermark based on the configuration of the playback speakers (e.g., the arrangement of speakers in a room) or the audio content to be played by any of the speakers. Rather, conventional watermarking of this type typically attempts to watermark the first N channels of the program (where N is a small number consistent with the processing limitations of the watermarking system, e.g., N=8), or all the channels if the program comprises no more than a small number of channels. During watermarking (e.g., rendering which includes watermarking), however, it randomly skips the watermarking of some channels depending on the processing speed actually achieved (i.e., watermarking of some channels is skipped if the overall processing rate would otherwise fall below a threshold).
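The conventional, content-blind selection described above can be sketched as follows. This is only an illustrative model: the function name `conventional_select`, the per-channel cost, and the processing budget are hypothetical stand-ins for "processing limitations" and "achieved processing speed", and are not taken from any actual watermarking system.

```python
import random


def conventional_select(num_channels, n_max=8, cost_per_channel=1.0,
                        budget=8.0, rng=None):
    """Pick channels to watermark without looking at audio content.

    Assumptions (hypothetical, for illustration only):
      - n_max models the small number N of channels the system attempts.
      - budget / cost_per_channel models how many channels can be marked
        before the overall processing rate would fall below its threshold.
    """
    rng = rng or random.Random(0)
    # Attempt the first N channels, or all channels if there are fewer.
    candidates = list(range(min(num_channels, n_max)))
    affordable = max(0, int(budget // cost_per_channel))
    if affordable >= len(candidates):
        return candidates
    # Too slow to mark every candidate: skip some of them at random.
    return sorted(rng.sample(candidates, affordable))


if __name__ == "__main__":
    print(conventional_select(6))               # few channels: all attempted
    print(conventional_select(64))              # many channels: only first 8
    print(conventional_select(64, budget=5.0))  # some of the 8 are skipped
```

Note that nothing in this sketch inspects the samples of any channel or the speaker layout; the choice depends only on channel order and available processing time.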
The inventors have recognized that watermarking (e.g., during playback in a theater) of each individual channel (or a randomly determined subset of the channels) of a multichannel audio program (or each speaker feed signal, or a randomly determined subset of the speaker feed signals, generated in response to such program) can be wasteful and inefficient. For example, watermarking of signals indicative of silent (or nearly silent) audio content will generally not contribute to improved watermark recovery. Furthermore, watermarking of channels that are relatively quiet compared to other channels will generally not contribute to improved watermark recovery.
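One hypothetical way to act on this observation is to measure the level of each channel over a block of samples and skip channels that are silent or much quieter than the loudest channel. The sketch below is an assumption-laden illustration, not the method of any particular embodiment: the function names, the RMS level measure, and both thresholds (`silence_floor_db`, `relative_window_db`) are chosen here for illustration only.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (samples assumed in [-1, 1])."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def select_channels(block, silence_floor_db=-60.0, relative_window_db=20.0):
    """Return names of channels worth watermarking for this block.

    block maps channel name -> list of samples. A channel is kept only if
    it is above an absolute silence floor AND within relative_window_db of
    the loudest channel. Both thresholds are illustrative assumptions.
    """
    levels = {name: rms_dbfs(samples) for name, samples in block.items()}
    loudest = max(levels.values(), default=float("-inf"))
    if loudest < silence_floor_db:
        return []  # the whole block is effectively silent
    return [name for name, level in levels.items()
            if level >= silence_floor_db
            and level >= loudest - relative_window_db]


if __name__ == "__main__":
    block = {
        "L": [0.5, -0.5, 0.5, -0.5],      # about -6 dBFS
        "C": [0.25, -0.25, 0.25, -0.25],  # about -12 dBFS
        "R": [0.0, 0.0, 0.0, 0.0],        # silent
        "Ls": [0.01, -0.01, 0.01, -0.01], # about -40 dBFS, relatively quiet
    }
    print(select_channels(block))
```

In this example the silent channel and the relatively quiet channel are skipped, so watermarking effort is spent only on the two channels most likely to survive in a capture.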
Although embodiments of the invention are useful for selectively watermarking channels of any multichannel audio program, many embodiments of the invention are especially useful for selectively watermarking channels of object-based audio programs having a large number of channels.
It is known to employ playback systems (e.g., in movie theaters) to render object based audio programs. Object based audio programs which are movie soundtracks may be indicative of many different audio objects corresponding to images on a screen, dialog, noises, and sound effects that emanate from different places on (or relative to) the screen, as well as background music and ambient effects (which may be indicated by speaker channels of the program) to create the intended overall auditory experience. Accurate playback of such programs requires that sounds be reproduced in a way that corresponds as closely as possible to what is intended by the content creator with respect to audio object size, position, intensity, movement, and depth.
During generation of object based audio programs, it is typically assumed that the loudspeakers to be employed for rendering are located in arbitrary locations in the playback environment, not necessarily in a (nominally) horizontal plane or in any other predetermined arrangement known at the time of program generation. Typically, metadata included in the program indicates rendering parameters for rendering at least one object of the program at an apparent spatial location or along a trajectory (in a three dimensional volume), e.g., using a three-dimensional array of speakers. For example, an object channel of the program may have corresponding metadata indicating a three-dimensional trajectory of apparent spatial positions at which the object (indicated by the object channel) is to be rendered. The trajectory may include a sequence of “floor” locations (in the plane of a subset of speakers which are assumed to be located on the floor, or in another horizontal plane, of the playback environment), and a sequence of “above-floor” locations (each determined by driving a subset of the speakers which are assumed to be located in at least one other horizontal plane of the playback environment).
Object based audio programs represent a significant improvement in many contexts over traditional speaker channel-based audio programs, since speaker-channel based audio is more limited with respect to spatial playback of specific audio objects than is object channel based audio. Speaker channel-based audio programs consist of speaker channels only (not object channels), and each speaker channel typically determines a speaker feed for a specific, individual speaker in a listening environment.
Various methods and systems for generating and rendering object based audio programs have been proposed. During generation of an object based audio program, it is typically assumed that an arbitrary number of loudspeakers will be employed for playback of the program, and that the loudspeakers to be employed (typically, in a movie theater) for playback will be located in arbitrary locations in the playback environment, not necessarily in a (nominally) horizontal plane or in any other predetermined arrangement known at the time of program generation. Typically, object-related metadata included in the program indicates rendering parameters for rendering at least one object of the program at an apparent spatial location or along a trajectory (in a three dimensional volume), e.g., using a three-dimensional array of speakers. For example, an object channel of the program may have corresponding metadata indicating a three-dimensional trajectory of apparent spatial positions at which the object (indicated by the object channel) is to be rendered. The trajectory may include a sequence of “floor” locations (in the plane of a subset of speakers which are assumed to be located on the floor, or in another horizontal plane, of the playback environment), and a sequence of “above-floor” locations (each determined by driving a subset of the speakers which are assumed to be located in at least one other horizontal plane of the playback environment). Examples of rendering of object based audio programs are described in PCT International Application No. PCT/US2011/028783, published under International Publication No. WO 2011/119401 A2 on Sep. 29, 2011, and assigned to the assignee of the present application.
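To make the notion of trajectory metadata concrete, the following sketch models an object channel whose metadata is a time-ordered list of apparent positions, some on the floor plane and some above it. The class names, the coordinate convention (z = 0 for the floor plane), and the use of linear interpolation between waypoints are all assumptions made for illustration; no actual program format or renderer is implied.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Waypoint:
    time_s: float
    x: float
    y: float
    z: float  # assumed convention: 0.0 = "floor" plane, > 0.0 = "above-floor"


@dataclass
class ObjectChannel:
    name: str
    trajectory: list  # Waypoint instances sorted by time_s

    def position_at(self, t):
        """Apparent (x, y, z) position at time t, linearly interpolated."""
        pts = self.trajectory
        if not pts:
            raise ValueError("object channel has no trajectory metadata")
        if t <= pts[0].time_s:
            return (pts[0].x, pts[0].y, pts[0].z)
        if t >= pts[-1].time_s:
            return (pts[-1].x, pts[-1].y, pts[-1].z)
        # Find the pair of waypoints that bracket t.
        i = bisect_right([p.time_s for p in pts], t)
        a, b = pts[i - 1], pts[i]
        f = (t - a.time_s) / (b.time_s - a.time_s)
        return (a.x + f * (b.x - a.x),
                a.y + f * (b.y - a.y),
                a.z + f * (b.z - a.z))


if __name__ == "__main__":
    flyover = ObjectChannel("flyover", [
        Waypoint(0.0, 0.0, 0.0, 0.0),  # starts on the floor plane
        Waypoint(2.0, 1.0, 0.0, 0.0),  # moves along the floor
        Waypoint(4.0, 1.0, 1.0, 1.0),  # rises to an above-floor location
    ])
    print(flyover.position_at(1.0))
    print(flyover.position_at(3.0))
```

A renderer would map each such position onto speaker feeds for whatever speaker arrangement is actually present, which is why the program itself need not assume any particular layout.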