Live video productions, such as TV productions, are today realized using vision mixers. Vision mixers are commercially available, e.g., from Grass Valley, Sony, Snell & Wilcox, and Ross.
A vision mixer (also called video switcher, video mixer, production switcher or simply mixer) is a device used to select between different video input signals to generate a video output signal. Besides switching directly between two input signals, the vision mixer can also generate different kinds of transitions. Direct switching means that frame N is taken from a first input signal and frame N+1 from a second input signal. Transitions between two input signals include simple dissolves and various kinds of effect transitions. Most mixers are equipped with keyers and matte generators to perform keying operations and to generate background signals, which are also called mattes.
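The difference between a direct switch and a simple dissolve can be illustrated as per-frame selection versus per-frame mixing. The following is a minimal sketch, not a description of any real mixer: it assumes frames are flat lists of sample values and that a simple dissolve is a linear crossfade over a fixed number of frames. The names `cut`, `dissolve` and `Frame` are hypothetical; actual vision mixers perform these operations on full video frames in dedicated hardware.

```python
from typing import List, Sequence

# Hypothetical frame model: one frame is a flat list of sample values.
Frame = List[float]


def cut(src_a: Sequence[Frame], src_b: Sequence[Frame], cut_at: int) -> List[Frame]:
    """Direct switching: frames before index cut_at come from src_a, the rest from src_b."""
    return [list(a) if n < cut_at else list(b)
            for n, (a, b) in enumerate(zip(src_a, src_b))]


def dissolve(src_a: Sequence[Frame], src_b: Sequence[Frame],
             start: int, length: int) -> List[Frame]:
    """Simple dissolve, assumed here to be a linear crossfade over `length` frames."""
    if length <= 0:
        raise ValueError("length must be positive")
    out: List[Frame] = []
    for n, (a, b) in enumerate(zip(src_a, src_b)):
        # Mix factor: 0 before the transition, rising linearly to 1 at start + length.
        alpha = min(max((n - start) / length, 0.0), 1.0)
        out.append([(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(a, b)])
    return out


if __name__ == "__main__":
    black = [[0.0, 0.0]] * 6  # six frames of a "black" input
    white = [[1.0, 1.0]] * 6  # six frames of a "white" input
    print([f[0] for f in cut(black, white, 3)])         # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print([f[0] for f in dissolve(black, white, 1, 4)])  # [0.0, 0.0, 0.25, 0.5, 0.75, 1.0]
```

In the cut, frame 2 still comes from the first input and frame 3 already from the second, matching the frame N / frame N+1 definition of direct switching; the dissolve instead blends both inputs over several frames.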
The vision mixer also performs the routing and switching of audio signals accompanying the video signals. However, since the processing of video signals is more complex than the processing of audio signals, the present patent application focuses on the video signal. It is to be understood that, in the context of the present patent application, the processing of the video signal also implies a corresponding processing of an accompanying audio signal. Solely for the sake of readability, audio signals are not always mentioned alongside the video signals in the description of the present invention.
To provide their many functions, vision mixers comprise a large number of hardware components that process the video signals. These processing components are located in one housing and are connected via local bus solutions so that all video processing hardware can be controlled in real time, meeting the fast control requirements of live productions. In today's vision mixers there is a latency of approximately 40 ms between the moment a user pushes a button and the execution of the associated function. A latency of 40 ms is still regarded as “real-time” processing.
The vision mixer comprises central mixing electronics, several input channels, at least one output channel, a control unit and a user interface. A vision mixer of this kind is described, for example, in DE 103 36 214 A1.
The mixing electronics can receive up to 100 or even more video input signals at the same time. The input signals include live video signals from cameras, recorded video clips such as archived material from a server, slow-motion clips from dedicated slow-motion servers, synthetic images, animations, and alphanumeric symbols from graphic generators.
The user also controls devices external to the vision mixer from the vision mixer. However, external devices can be integrated into the live control environment in the same manner as the mixer's internal hardware only with certain restrictions. These restrictions are caused by more or less random signal latencies in the video, audio and control interconnections: the overall processing including external devices does not behave as it would if all hardware components were connected to the same local control, video and audio bus. Specifically, the overall control latency, as well as the overall signal latency and signal change latency, is predetermined only within a certain time window. These time windows range from several frames up to seconds and do not meet the requirements for real-time control behavior. Since the individual delays can additionally be random, there is a risk that a set of changes involving the vision mixer and external devices is not executed synchronously and produces temporarily inconsistent video and/or audio frames. Today, this general problem of synchronizing several processing devices is solved by mechanisms that work in two steps:
First, external devices that are to provide a certain signal are prepared at least several seconds before the signal is actually used. Second, the vision mixer waits for a ready status signal from the external device, or alternatively for a predetermined safe time period, before the signal of the external device is added to the live stream, i.e., to the production stream of the vision mixer. The predetermined safe time period is long enough to ensure that the external device is ready to execute a command.
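The two-step mechanism described above can be sketched as a preparation step followed by a bounded wait on a readiness flag, with a fixed safe period as the fallback. This is a minimal illustration under an assumed in-process threading model; `ExternalDevice`, `prepare`, `wait_until_usable` and all timing values are hypothetical, whereas a real installation would exchange the ready status over its control interconnections.

```python
import threading
import time
from typing import Optional


class ExternalDevice:
    """Hypothetical stand-in for an external device, e.g. a clip or slow-motion server."""

    def __init__(self, prep_time_s: float, reports_ready: bool = True) -> None:
        self.prep_time_s = prep_time_s
        self.reports_ready = reports_ready  # False: device cannot send a ready status signal
        self.ready = threading.Event()

    def prepare(self) -> None:
        """Step 1: start preparation well ahead of use; readiness arrives after prep_time_s."""
        timer = threading.Timer(self.prep_time_s, self.ready.set)
        timer.daemon = True  # do not keep the interpreter alive for a pending preparation
        timer.start()


def wait_until_usable(device: ExternalDevice, safe_period_s: float,
                      ready_timeout_s: Optional[float] = None) -> bool:
    """Step 2: wait for the ready status signal, or for a fixed safe period if none is sent.

    Returns True when the device's signal may be added to the live stream.
    """
    if device.reports_ready:
        # Event.wait returns False if the timeout expires before the ready signal arrives.
        return device.ready.wait(timeout=ready_timeout_s)
    # No ready status available: assume readiness after the predetermined safe period.
    time.sleep(safe_period_s)
    return True


if __name__ == "__main__":
    server = ExternalDevice(prep_time_s=0.05)
    server.prepare()
    print(wait_until_usable(server, safe_period_s=0.2, ready_timeout_s=1.0))
```

While `wait_until_usable` blocks, the prepared device is tied up for this one task, which mirrors the locking disadvantage of the approach.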
This approach of state-of-the-art vision mixers requires the operator of the vision mixer to keep in mind that some hardware devices need to be prepared, with the additional complication that each device requiring preparation has to be prepared in its own dedicated manner. An inherent disadvantage of this approach is that the prepared hardware devices are locked during the waiting time and are not available for other processing tasks. Consequently, today's vision mixers for live productions typically contain much more hardware than a specific live production needs in terms of video inputs, video outputs and processing stages, because the director of a live video production usually wants to execute as many functions of the production as possible, preferably within one processing frame, to achieve all intended signal changes simultaneously and in real time.
The article by Boutaba R. et al., “Distributed Video Production: Tasks, Architecture and QoS Provisioning”, published in Multimedia Tools and Applications, Kluwer Academic Publishers, Boston, US, Volume 16, Number 1-2, 1 Jan. 2002, pages 99 to 136, addresses the issue of delay, delay variations and inter-media skew requirements. Boutaba et al. explicitly state that delay performance is measured based on delay variation, or “jitter”. Jitter is a measure of the difference in delay experienced by different packets in the network due to variation in buffer occupancy in intermediate switching nodes. Another form of jitter is inter-stream jitter, or “skew”, which measures the difference in delay as seen by separate streams pertaining to the same application (such as audio and video). To ensure proper intra-stream synchronization, low delay variation is often required. Boutaba et al. suggest compensating jitter by buffering the data streams. This requires enough memory to store video and audio data over intervals long enough to compensate the jitter. For high-definition video data, this requires a large storage capacity.
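The storage cost of jitter buffering can be estimated with simple arithmetic: the buffer must hold as much video as arrives during the worst-case delay variation. The sketch below is an illustration under stated assumptions, not the method of Boutaba et al.: jitter is taken as the peak-to-peak spread of measured packet delays (one of several possible definitions), and the defaults assume uncompressed 1920x1080 video with 10-bit 4:2:2 sampling (20 bits per pixel on average, active picture only) at 25 frames per second. The function names and sample delays are hypothetical.

```python
import math
from typing import Sequence


def jitter_ms(delays_ms: Sequence[float]) -> float:
    """Delay variation across packets, taken here as peak-to-peak (max minus min delay)."""
    return max(delays_ms) - min(delays_ms)


def buffer_bytes(jitter: float, width: int = 1920, height: int = 1080,
                 bits_per_pixel: int = 20, fps: int = 25) -> int:
    """Memory needed to buffer `jitter` milliseconds of uncompressed video, in whole frames.

    Defaults assume 1920x1080 active picture, 10-bit 4:2:2 sampling
    (20 bits per pixel on average) and 25 frames per second.
    """
    frame_bytes = width * height * bits_per_pixel // 8
    frames = math.ceil(jitter * fps / 1000)  # round up to whole frames
    return frame_bytes * frames


if __name__ == "__main__":
    j = jitter_ms([40.0, 55.0, 48.0, 120.0])
    print(j)                   # 80.0
    print(buffer_bytes(j))     # 10368000 (two frames)
    print(buffer_bytes(1000))  # 129600000 (one second)
```

Under these assumptions, one second of jitter costs about 130 MB per HD input; for a mixer handling 100 inputs, that is roughly 13 GB, which illustrates why buffering-based jitter compensation calls for a large storage capacity.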
Taking this as a starting point, it is an object of the present invention to propose an alternative approach for making live video productions.