1. Field of the Invention
The present invention relates to cinema film and video film applications, and in particular to the synchronization of an audio signal with a cinema or video film comprising a sequence of frames, wherein samples of the audio signal are associated with each frame of the film.
2. Description of the Related Art
There is a rising demand for new technologies and innovative products in the area of entertainment electronics. Here, an important prerequisite for the success of new multimedia systems is to provide optimum functionality and capabilities. This is achieved by the use of digital technologies and in particular of computer technology. Examples of this are applications which offer an improved, more realistic audiovisual impression. A main disadvantage of present audio systems is the quality of the spatial sound reproduction of natural as well as virtual environments.
Methods for a multi-channel loudspeaker reproduction of audio signals have been known and standardized for years. All conventional technologies have the disadvantage that both the setup location of the loudspeakers and also the position of the listener are already integrated in the transmission format. With a wrong arrangement of the loudspeakers with regard to the listener, the audio quality suffers substantially. An optimum sound is only possible in a small area of the reproduction space, the so-called sweet spot.
A better natural room impression and a stronger envelopment in audio reproduction may be achieved with the help of a new technology. The basics of this technology, the so-called wave-field synthesis (WFS), were researched at TU Delft and first presented in the late 1980s (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by Wave-field Synthesis. JASA 93, 1993).
As a consequence of the enormous requirements of this method with regard to computer power and transmission rates, the wave-field synthesis has hitherto been only rarely used in practice. Only the advances in the areas of microprocessor technology and audio encoding today allow the use of this technology in concrete applications. First products in the professional area are expected next year. In a few years also first wave-field synthesis applications for the consumer area are to hit the market.
The basic idea of WFS is the application of Huygens' principle of wave theory:
Every point on a propagating wave-front serves as the source of a wavelet propagating in a spherical or circular form, respectively.
If applied to acoustics, any form of an incoming wave-front may be reproduced by a large number of loudspeakers arranged next to each other (a so-called loudspeaker array). In the simplest case of a single point source to be reproduced and a linear arrangement of the loudspeakers, the audio signal of every loudspeaker has to be supplied with a time delay and an amplitude scaling so that the radiated sound fields of the individual loudspeakers are correctly superimposed. With several sound sources, the contribution of each source to each loudspeaker is calculated separately and the resulting signals are added. If the sources to be reproduced are in a room with reflecting walls, then the reflections also have to be reproduced as additional sources via the loudspeaker array. The computational effort thus strongly depends on the number of sound sources, on the reflection characteristics of the recording room and on the number of loudspeakers.
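The time delay and amplitude scaling per loudspeaker may be illustrated by the following minimal sketch. It is not the full wave-field synthesis driving function. It assumes a virtual point source, a speed of sound of 343 m/s, a delay equal to the propagation time from the source to the loudspeaker, and a simple 1/r amplitude decay. The names `SPEED_OF_SOUND` and `driving_parameters` are hypothetical and chosen for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def driving_parameters(source_xy, speaker_positions):
    """Return one (delay_seconds, gain) pair per loudspeaker.

    Simplified model (an assumption, not the complete WFS operator):
    delay = distance / speed of sound, gain = 1 / distance.
    """
    params = []
    for sx, sy in speaker_positions:
        distance = math.hypot(sx - source_xy[0], sy - source_xy[1])
        delay = distance / SPEED_OF_SOUND
        gain = 1.0 / max(distance, 1e-6)  # guard against division by zero
        params.append((delay, gain))
    return params


# Linear array of three loudspeakers on the x axis, source 2 m behind it.
speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
params = driving_parameters((0.0, -2.0), speakers)
for (x, _), (delay, gain) in zip(speakers, params):
    print(f"speaker x={x:+.1f} m: delay={delay * 1000:.3f} ms, gain={gain:.3f}")
```

With several sources, the same calculation would be repeated per source and the delayed, scaled signals added per loudspeaker, which is why the effort grows with the number of sources and loudspeakers.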
The advantage of this technology is in particular that a natural spatial sound impression is possible over a large area of the reproduction room. In contrast to known technologies, the direction and distance of sound sources are reproduced very accurately. To a limited extent, virtual sound sources may even be positioned between the real loudspeaker array and the listener.
Although wave-field synthesis works well for environments whose conditions are known, irregularities occur when the conditions change or when the wave-field synthesis is performed on the basis of environmental conditions that do not correspond to the actual conditions of the environment.
The technology of wave-field synthesis may, however, also be used advantageously to supplement a visual perception by a corresponding spatial audio perception. Hitherto, in productions in virtual studios, providing an authentic visual impression of the virtual scene had top priority. The acoustic impression matching the image is usually impressed on the audio signal by manual operation steps in the so-called postproduction, or is classified as too expensive and time-consuming to realize and is thus omitted. As a result, there is usually a contradiction between the individual sensations, which causes the designed room, i.e. the styled scene, to be perceived as less authentic.
The requirement to synchronize the film and the audio material, either in a home environment or, in particular, in a cinema environment, exists not only in the wave-field synthesis briefly illustrated above, in which a number of loudspeakers, which may exceed 100, have to be supplied with individual sound signals. It also exists in conventional cinema applications, in which, for example, Dolby 5.1 or 7.1 is used, in normal stereo applications and even in mono applications.
Further, tape-based video material has to be synchronized with audio material in the studio area. For this, a standard time code for cinema or studio operation is conventionally used. The standard time code is also referred to as LTC (LTC=longitudinal time code) or in general as time code. The longitudinal time code, as an example of any possible time code indicating the position of a frame in the sequence of frames of the film, is a time code which is typically imprinted on the film material such that each frame receives its own time code.
A possible configuration of the time code is illustrated in FIG. 2. FIG. 2 shows a sequence of frames 200, 201, 202, 203, wherein the frame 200 is referred to as a frame EBi, while the frame 201 is referred to as a frame EBi+1. FIG. 2 shows, so to speak, an “unrolled” section of a film which, in the example shown, has 24 frames per second. A field 204, which is associated with each frame in the schematic illustration of FIG. 2, illustrates the way the longitudinal time code counts. With regard to its encoded information, the longitudinal time code consists of “time information” and “frame information”. The time information is schematically illustrated in FIG. 2 such that the frame i (200) is an image whose time information includes, e.g., 10 hours, 0 minutes and 1 second. The frame information designates the image 200 as the first frame in this second. Analogously, the frame information for the frame 202 designates the 24th frame at the “point in time” of 10 hours, 0 minutes and 1 second.
As, in the indicated embodiment in FIG. 2, it is assumed that the film has a playing frequency of 24 frames per second (also playing frequencies of 25 frames per second exist), the time information of the frame k+1 (203) is 10 hours, 0 minutes and 2 seconds, while the frame information of this frame is again equal to 1, as this is the first frame in the “new” second.
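The counting scheme described with regard to FIG. 2 may be sketched as follows. The sketch assumes that frames are counted from 0 starting at the first frame of the film, that the frame information within a second runs from 1 to the frame rate as in the example above, and that the starting point of the time information is given in whole seconds. The function name `frame_to_time_code` and its parameters are hypothetical.

```python
def frame_to_time_code(frame_index, fps=24, start_seconds=0):
    """Map a 0-based frame index to (hours, minutes, seconds, frame).

    The returned frame information is 1-based within its second,
    following the counting shown in the example of FIG. 2.
    """
    whole_seconds, frame_in_second = divmod(frame_index, fps)
    total_seconds = start_seconds + whole_seconds
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds, frame_in_second + 1


# Starting point chosen as 10 hours, 0 minutes, 1 second, as in FIG. 2.
start = 10 * 3600 + 1
print(frame_to_time_code(0, start_seconds=start))   # first frame of the second
print(frame_to_time_code(23, start_seconds=start))  # 24th frame of the second
print(frame_to_time_code(24, start_seconds=start))  # first frame of the next second
```

At 24 frames per second, index 23 is the last frame of the second and index 24 rolls over to frame 1 of the following second, matching the transition from frame 202 to frame 203 described above.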
It is to be noted that the starting point of the time information may be selected arbitrarily. If the starting point of the time information is, for example, set to 0, and if a film takes 90 minutes, then the maximum time information will be 1 hour, 30 minutes, 0 seconds. What is important with regard to the time information is that each frame obtains its own time code information which enables reconstructing the position of each frame in the sequence of frames, i.e. in the film.
The time information and the frame information are encoded together by means of the time code, which may be selected arbitrarily and which is, for example, an 8-bit code of binary zeros and ones. Depending on the implementation, a dark spot may be imprinted on the film for a binary zero and a light spot for a binary one, or vice versa. Alternatively, however, it is also possible and practicable to encode a “zero” e.g. as two short light/dark changes, and a “one” as a long light/dark change.
Audio samples are associated with each frame. If the film has a playing frequency of 24 frames per second and the audio samples are present with a sample frequency of e.g. 48 kHz, then 2000 discrete audio samples are associated with each frame. These samples are typically stored externally in files and, during film reproduction, are digital/analog-converted in synchronism with the frames, amplified and provided to the correspondingly positioned loudspeakers, for example, in the cinema.
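The association of samples with frames follows from dividing the sample frequency by the playing frequency. The following sketch assumes that the sample frequency is an integer multiple of the frame rate and that frames and samples are both counted from 0. The function names are hypothetical.

```python
def samples_per_frame(sample_rate_hz=48000, fps=24):
    """Number of discrete audio samples associated with one frame."""
    count, remainder = divmod(sample_rate_hz, fps)
    if remainder:
        # Assumption of this sketch: only integer ratios are handled.
        raise ValueError("sample rate is not an integer multiple of the frame rate")
    return count


def sample_range_for_frame(frame_index, sample_rate_hz=48000, fps=24):
    """Half-open range [start, end) of sample indices for a 0-based frame."""
    count = samples_per_frame(sample_rate_hz, fps)
    start = frame_index * count
    return start, start + count


print(samples_per_frame())            # 48 kHz at 24 frames per second
print(samples_per_frame(48000, 25))   # 48 kHz at 25 frames per second
print(sample_range_for_frame(2))      # samples belonging to the third frame
```

For 48 kHz and 24 frames per second this yields the 2000 samples per frame stated above. For the 25 frames per second also mentioned, the same sample frequency gives 1920 samples per frame.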
In the cinema/film area, a wide variety of methods are used to synchronize the image material (video and film) with digital audio material (WAV files, MPEG-4 files . . . ). It is to be noted here that the audio and video material is often present in analog form, with image and sound separate from each other, and has to be put together accurately with regard to frames and samples after a separate digitization. For this synchronization, the time code described with regard to FIG. 2 is used.
Additionally, such known systems are provided with a centrally generated and usually stable clock, also referred to as word clock. Depending on the embodiment, the frequency of this word clock is, for example, equal to the frequency at which the stored discrete samples were sampled.
As tape-based video players as well as film devices are mechanical systems whose rotational speed may vary over time, both the time code information and the word clock information imprinted on the film can only be read out unreliably. In particular, this information is subject to jitter after the typically optical read-out, which brings about the danger of erroneous processing of this information, which in turn may lead to a breakdown of a system that has to operate within relatively critical time constraints. This is particularly the case with wave-field synthesis systems, in which the synchronous cooperation of the audio signals output by all loudspeakers is especially important in order to reconstruct the corresponding wave fronts from the single waves generated by the loudspeakers.
In the prior art, different synchronization solutions are known and licensable. For example, in the SDDS or DTS systems, the time information is digitally encoded and imprinted on the film perforation, so that the time code is encoded on the film. The time code is decoded in a processor and used to achieve a time-synchronous reproduction of image and sound. In particular, a special time code track is located on the film strip of such films. This time code is read out from the film by a special reader. A special decoder, which is also required, ensures that the audio material present on CD-ROM/DVD is played synchronously to the film. The image and the analog sound, which is also referred to as optical sound, film sound or Lichtton, as it is imprinted on the film material, are arranged at a defined offset on the film strip in order to account for the delay when rendering the sound information. Here, a synchronization is performed manually in the processor by setting the delay time (Dolby A, SR).
All such systems require special hardware, i.e. the special reader and the special decoder. Further, in the film copying factories, special exposure devices have to be used for the respective method in order to imprint the corresponding information onto the film. In addition, the different synchronization/exposure concepts are not mutually compatible, so that different sound formats may exist on different films either in isolation or next to each other, the latter so that a film, once copied, is suitable for as many cinema systems as possible. It has been found, however, that the optical sound format, i.e. an optical sound track in which sound information is imprinted on the film, may be found on all film copies, as this optical sound guarantees an emergency variant. This means that, if the worst comes to the worst, i.e. when the synchronization fails due to a defect of the device while the cinema is, for example, full of people, the film can nevertheless be finished, no longer on the basis of the digital sound material, however, but on the basis of the sound material imprinted on the optical sound track.
An important feature of the optical sound track as it has been implemented is thus that it is typically present on all film copies, that typically all film copying devices comprise means for imprinting an optical sound track, and that typically all film players have a device for optically reading out the optical sound track.
One disadvantage of the described systems is that they are typically closed systems whose functionality cannot easily be determined. This is problematic in particular in that the known systems are not designed for an arbitrary number of audio channels, but only, for example, for Dolby 5.1 or 7.1. For wave-field synthesis applications, however, those six or eight channels, respectively, are by far not sufficient, so that at the moment no suitable image/sound synchronization concepts exist for such systems. It is a further disadvantage that different concepts exist which are typically not mutually compatible, so that further processing, in particular of wave-field synthesis film/sound material, is problematic.