Dolby, Dolby Digital, and Dolby Digital Plus are trademarks of Dolby Laboratories Licensing Corporation. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively.
Audio is a critical component of the experience of playing video games, and the art of producing game sound has become increasingly sophisticated as the industry has grown. Game sound in current generation games is produced using audio objects, which are processed in a game console to generate a speaker channel-based program (sometimes referred to as a speaker channel-based “mix”). The mix, which comprises a number of speaker channels, is typically encoded (e.g., as an AC-3 or E-AC-3 bitstream) and the encoded audio is delivered to a rendering system. To implement playback, the rendering system generates speaker feeds in response to the speaker channels indicated by the encoded audio. FIG. 1 is a block diagram of audio processing elements of a typical conventional game console (one of the current generation of game consoles).
Typically, many of the sounds heard in a conventional game are stored as individual mono files (except some ambience and music tracks, which are typically stored as 2-channel or 5-channel files), and accesses to these files are triggered by events which occur during game play. The audio data labeled “Audio Assets” in FIG. 1 are examples of such stored audio files. Typical game consoles include an audio engine (e.g., game audio engine 1 of the FIG. 1 system) which is configured to manage a library of stored audio files, to monitor game state/user input, to play appropriate ones of the audio files at appropriate times, to position the accessed sounds accordingly (so that they will be perceived as emanating from appropriate locations during playback), and finally to generate a speaker channel-based mix (e.g., the 5.1 speaker channel PCM audio output from engine 1 of FIG. 1). Game consoles typically also include an audio mixer (e.g., game console audio mixer 3 of the FIG. 1 system) which is coupled and configured to supplement the speaker channel-based mix with system sounds, alerts, and additional music (and optionally other audio content). Game consoles typically also include an encoder (e.g., encoder 5 of the FIG. 1 system) which is coupled and configured to encode (in real time) the modified (mixed) speaker channel-based mix (e.g., the 5.1 speaker channel PCM output from mixer 3 of FIG. 1) to generate an encoded audio bitstream (e.g., the encoded bitstream having AC-3 format which is output from encoder 5 of FIG. 1) for delivery (typically, transmission by an S/PDIF link) to a rendering system for rendering. Encoder 5 of the FIG. 1 system may be implemented as a conventional “Dolby Digital Live” encoder which outputs an encoded AC-3 bitstream in response to 5.1 speaker channel PCM audio from mixer 3.
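By way of illustration only, the positioning and mix-down performed by an audio engine of the kind described above may be sketched as follows. The nominal speaker azimuths, the pairwise constant-power panning law, and the names `pan_gains` and `mix_to_5_1` are assumptions made for this sketch; they are not taken from the FIG. 1 system or from any particular console.

```python
import math

# Channel order of the illustrative 5.1 mix (an assumption for this sketch).
CHANNELS = ["L", "R", "C", "LFE", "Ls", "Rs"]

# Assumed nominal azimuths in degrees, measured clockwise from straight ahead.
SPEAKER_AZIMUTHS = {"C": 0.0, "R": 30.0, "Rs": 110.0, "Ls": 250.0, "L": 330.0}


def pan_gains(azimuth_deg):
    """Return per-channel gains for a mono source at the given azimuth.

    Uses constant-power panning between the two adjacent speakers that
    enclose the source direction. The LFE channel receives no signal.
    """
    az = azimuth_deg % 360.0
    ring = sorted(SPEAKER_AZIMUTHS.items(), key=lambda kv: kv[1])
    gains = {name: 0.0 for name in CHANNELS}
    for i, (name_a, az_a) in enumerate(ring):
        name_b, az_b = ring[(i + 1) % len(ring)]
        span = (az_b - az_a) % 360.0    # angular width of this speaker pair
        offset = (az - az_a) % 360.0    # source angle measured from speaker a
        if offset < span:
            frac = offset / span
            gains[name_a] = math.cos(frac * math.pi / 2)
            gains[name_b] = math.sin(frac * math.pi / 2)
            break
    return gains


def mix_to_5_1(sources, num_samples):
    """Mix (samples, azimuth_deg) pairs into a speaker channel-based mix.

    The returned mix holds only speaker channels; the per-source positions
    are not recoverable from it.
    """
    mix = {name: [0.0] * num_samples for name in CHANNELS}
    for samples, azimuth_deg in sources:
        for name, gain in pan_gains(azimuth_deg).items():
            if gain == 0.0:
                continue
            channel = mix[name]
            for n, sample in enumerate(samples[:num_samples]):
                channel[n] += gain * sample
    return mix
```

In this sketch the output of `mix_to_5_1` contains six sample lists and nothing else, so the azimuth supplied for each source is no longer available to any downstream mixer, encoder, or rendering system.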
Often during conventional game audio generation, much of the spatial information of the original object-based audio content (e.g., the Audio Assets of the FIG. 1 system) is lost when creating a speaker channel-based mix (e.g., the speaker channel-based mix output from engine 1 or mixer 3 of FIG. 1, which comprises speaker channels but not any object channel, or the encoded version of the speaker channel-based mix which is output from encoder 5 of FIG. 1, which is also indicative of speaker channels but not any object channel). The final listener experience is also compromised when the final playback system does not precisely render the speaker channel-based mix. The inventors have recognized that it would be desirable to include in the encoded audio which is generated by a game console (and output from the console for rendering) not only speaker channels, but also at least one object channel indicative of at least one audio object (e.g., indicative of stored audio content which is read from a file or otherwise accessed in response to an event occurring during game play) and descriptive information (metadata) regarding at least one such audio object (e.g., the positional trajectory and perceived size of each audio object as a function of time during playback). Thus, typical embodiments of the inventive game console are configured to generate an object based audio program (indicative of game audio content), and typically also to output the program for delivery to an external spatial rendering system (e.g., device) having knowledge of the playback system speaker configuration. Typically, the spatial rendering system employed to render the object based audio program is operable to generate speaker feeds indicative of an appropriate spatial mix of the program's speaker channel and object channel content.
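As a non-limiting illustration, an object based audio program of the kind described above (speaker channels together with at least one object channel and its metadata) might be represented by a data structure such as the following. The class and field names, the coordinate convention, and the use of timed metadata points are assumptions made for this sketch and do not represent any actual bitstream format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectMetadataPoint:
    """Metadata for one audio object at one instant (hypothetical layout)."""
    time_s: float                          # playback time in seconds
    position: Tuple[float, float, float]   # apparent (x, y, z) position
    size: float                            # perceived size, arbitrary units


@dataclass
class ObjectChannel:
    """Audio samples of one object plus its time-varying metadata."""
    name: str
    samples: List[float]
    metadata: List[ObjectMetadataPoint] = field(default_factory=list)


@dataclass
class ObjectBasedProgram:
    """Speaker channels (a channel-based bed) and zero or more object channels."""
    sample_rate_hz: int
    speaker_channels: Dict[str, List[float]]
    object_channels: List[ObjectChannel] = field(default_factory=list)


# Example: a two-channel bed plus one object moving left to right.
program = ObjectBasedProgram(
    sample_rate_hz=48000,
    speaker_channels={"L": [0.0, 0.0], "R": [0.0, 0.0]},
    object_channels=[
        ObjectChannel(
            name="footstep",
            samples=[0.2, -0.1],
            metadata=[
                ObjectMetadataPoint(0.0, (-1.0, 0.0, 0.0), 0.1),
                ObjectMetadataPoint(1.0, (1.0, 0.0, 0.0), 0.1),
            ],
        )
    ],
)
```

Unlike a speaker channel-based mix, such a structure keeps each object's samples separate from the bed and carries the object's position and size forward to the rendering system.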
It is known to employ high-end playback systems (e.g., in movie theaters) to render object based audio programs. For example, object based audio programs which are movie soundtracks may be indicative of many different sound elements (audio objects) corresponding to images on a screen, dialog, noises, and sound effects that emanate from different places on (or relative to) the screen, as well as background music and ambient effects (which may be indicated by speaker channels of the program) to create the intended overall auditory experience. Accurate playback of such programs requires that sounds be reproduced in a way that corresponds as closely as possible to what is intended by the content creator with respect to audio object size, position, intensity, movement, and depth.
Object based audio programs represent a significant improvement over traditional speaker channel-based audio programs, since speaker channel-based audio is more limited with respect to spatial playback of specific audio objects than is object channel-based audio. Speaker channel-based audio programs consist of speaker channels only (not object channels), and each speaker channel typically determines a speaker feed for a specific, individual speaker in a listening environment.
Various methods and systems for generating and rendering object based audio programs have been proposed. During generation of an object based audio program, it is typically assumed that an arbitrary number of loudspeakers will be employed for playback of the program, and that the loudspeakers to be employed (typically, in a movie theater) for playback will be located in arbitrary locations in the playback environment, not necessarily in a (nominally) horizontal plane or in any other predetermined arrangement known at the time of program generation. Typically, object-related metadata included in the program indicates rendering parameters for rendering at least one object of the program at an apparent spatial location or along a trajectory (in a three-dimensional volume), e.g., using a three-dimensional array of speakers. For example, an object channel of the program may have corresponding metadata indicating a three-dimensional trajectory of apparent spatial positions at which the object (indicated by the object channel) is to be rendered. The trajectory may include a sequence of “floor” locations (in the plane of a subset of speakers which are assumed to be located on the floor, or in another horizontal plane, of the playback environment), and a sequence of “above-floor” locations (each determined by driving a subset of the speakers which are assumed to be located in at least one other horizontal plane of the playback environment). Examples of rendering of object based audio programs are described in PCT International Application No. PCT/US2011/028783, published under International Publication No. WO 2011/119401 A2 on Sep. 29, 2011, and assigned to the assignee of the present application.
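For illustration, trajectory metadata of the kind described above may be thought of as a time-ordered list of apparent positions from which a renderer obtains the position at any playback time. The following sketch assumes linear interpolation between metadata points and a convention in which z = 0 denotes a “floor” location; both are assumptions of the sketch, as are the function names, and the cited application is not asserted to use them.

```python
from bisect import bisect_right


def position_at(trajectory, t):
    """Return the (x, y, z) position at time t.

    trajectory is a list of (time_s, (x, y, z)) tuples sorted by strictly
    increasing time. Positions are held constant before the first point and
    after the last point, and linearly interpolated in between.
    """
    times = [point[0] for point in trajectory]
    i = bisect_right(times, t)
    if i == 0:
        return trajectory[0][1]
    if i == len(trajectory):
        return trajectory[-1][1]
    (t0, p0), (t1, p1) = trajectory[i - 1], trajectory[i]
    frac = (t - t0) / (t1 - t0)
    return tuple(a + frac * (b - a) for a, b in zip(p0, p1))


def is_floor_location(position, tolerance=1e-9):
    """True if the position lies in the assumed floor plane (z == 0)."""
    return abs(position[2]) <= tolerance


# An object that starts on the floor and rises while moving to the right.
trajectory = [(0.0, (0.0, 0.0, 0.0)), (2.0, (1.0, 0.0, 1.0))]
midpoint = position_at(trajectory, 1.0)
```

A renderer with knowledge of the actual speaker locations could evaluate such a function once per processing block and derive speaker gains from the result.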
The advent of object based audio program rendering has significantly increased the amount of audio data processed and the complexity of rendering that must be performed by rendering systems, in part because an object based audio program may be indicative of many objects (each with corresponding metadata) and may be rendered for playback by a system including many loudspeakers. It has been proposed to limit the number of object channels included in an object based audio program so that an intended rendering system has the capability to render the program. For example, U.S. Provisional Patent Application No. 61/745,401, entitled “Scene Simplification and Object Clustering for Rendering Object-based Audio Content,” filed on Dec. 21, 2012, naming Brett Crockett, Alan Seefeldt, Nicolas Tsingos, Rhonda Wilson, and Jeroen Breebaart as inventors, and assigned to the assignee of the present invention, describes methods and apparatus for so limiting the number of object channels of an object based audio program by clustering input object channels to generate clustered object channels which are included in the program and/or by mixing audio content of input object channels with speaker channels to generate mixed speaker channels which are included in the program.
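The general idea of limiting the number of object channels by clustering may be illustrated by the following sketch, which repeatedly merges the two spatially closest objects until no more than a specified number remain. The greedy nearest-pair strategy, the energy-weighted position of a merged object, and the name `cluster_objects` are assumptions chosen for simplicity; the sketch is not the method of the above-cited provisional application.

```python
import math


def cluster_objects(objects, max_objects):
    """Reduce a list of objects to at most max_objects clustered objects.

    Each object is a dict with 'position' (x, y, z) and 'samples' (a list of
    floats, the same length for every object). The two closest objects are
    merged repeatedly: their samples are summed and the merged position is
    the energy-weighted average of the two positions.
    """
    if max_objects < 1:
        raise ValueError("max_objects must be at least 1")
    clusters = [
        {"position": tuple(o["position"]), "samples": list(o["samples"])}
        for o in objects
    ]
    while len(clusters) > max_objects:
        # Find the pair of clusters with the smallest Euclidean distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i]["position"], clusters[j]["position"])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        energy_a = sum(s * s for s in a["samples"])
        energy_b = sum(s * s for s in b["samples"])
        total = energy_a + energy_b
        weight_a = 0.5 if total == 0 else energy_a / total
        merged = {
            "position": tuple(
                weight_a * pa + (1.0 - weight_a) * pb
                for pa, pb in zip(a["position"], b["position"])
            ),
            "samples": [x + y for x, y in zip(a["samples"], b["samples"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, given three objects of which two are close together, calling `cluster_objects` with `max_objects=2` leaves the distant object untouched and replaces the two nearby objects with a single clustered object whose samples are their sum.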