For a live TV production in today's video production environment, very expensive specialized equipment is used for the acquisition, processing, and play-out of video and/or audio feeds or streams coming from acquisition devices, such as cameras and microphones. Sometimes the video and/or audio feeds are combined with associated meta-data, for instance video/audio meta-data or meta-data associated with the event that is captured, such as the speed of athletes or situational data supplied by experts, e.g. commentaries on sports games like goals, fouls, etc. All the incoming signals or data need to be processed to produce one or more final program output signal(s). A processing unit that is part of the production equipment performs the processing of the signals. The processing includes
        ingesting signals from acquisition devices into processing equipment;
        encoding raw signals from the acquisition devices into lower bit rate signals;
        deriving lower resolution copies of the original signal, e.g. for monitoring or other purposes;
        decoding the lower bit rate signals into the original raw signal;
        transcoding signals;
        recording a signal for later use;
        applying video effects to signals;
        mixing different signals into a single signal;
        playing out of signals;
        displaying signals.
A large portion of the necessary processing is performed by vision mixers and production servers. Vision mixers are available, e.g., from companies like EVS, Grass Valley, Sony, SAM, and Ross. In a typical broadcast production, vision mixers and production servers are connected with other devices such as cameras, video stores, video effect generators, and others. All these devices are commercially available. Today all these devices are built from specialized hardware components. Of course, audio signals accompanying the video signals likewise need to be processed during a broadcast production.
Dedicated appliances run on these specialized hardware devices to perform the specific functions mentioned above. The probability of using all the dedicated devices and their full power at the same time is very low. This leads to an overprovisioning of processing power. At the same time, it leaves the processing infrastructure of a conventional broadcast studio with very little flexibility, because different types of productions, e.g. live sports versus live news, cannot be made on the same infrastructure unless all necessary devices for both types of productions are available.
There is yet another reason contributing to the inefficiency of conventional broadcast studios. In a conventional broadcast environment, most of the communication between the broadcast equipment devices is achieved through baseband transmission of video/audio signals using the proprietary SDI format. The devices operate synchronously, according to a shared timing signal (called Genlock or black-burst) that is routed together with the SDI signals. This timing signal is typically a square-wave signal whose rising edges correspond to the arrival of new video frames in the system. As a result, video/audio frames must be processed internally within a fixed time interval, leading to a worst-case dimensioning of the infrastructure, because all processing steps must be finalized before the next frame arrives. Typical hardware systems require an even finer level of synchronization, down to the line level and even the sub-line level.
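The fixed time interval imposed by frame-synchronous operation can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the processing steps are assumed to run one after another on each frame, and the example figures (50 frames per second, 1125 total lines per frame as in common 1080-line formats) are chosen for illustration and are not taken from the text above.

```python
from fractions import Fraction


def frame_budget_ms(frame_rate):
    """Time between two frame arrivals in milliseconds."""
    return float(1000 / Fraction(frame_rate))


def line_budget_us(frame_rate, total_lines):
    """Time available per video line in microseconds (line-level sync)."""
    return float(Fraction(1_000_000) / (Fraction(frame_rate) * total_lines))


def worst_case_fits(worst_case_step_times_ms, frame_rate):
    """True if all steps, each at its worst-case duration, finish in one frame.

    Assumption: the steps run sequentially on the same frame, so their
    worst-case durations add up.
    """
    return sum(worst_case_step_times_ms) <= frame_budget_ms(frame_rate)


if __name__ == "__main__":
    print(frame_budget_ms(50))                 # budget per frame at 50 fps
    print(line_budget_us(50, 1125))            # budget per line at 50 fps
    print(worst_case_fits([8, 7, 6], 50))      # 21 ms of work, 20 ms budget
```

Because the check uses the worst-case duration of every step, the infrastructure has to be sized for a load that occurs only rarely, which is the worst-case dimensioning described above.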
US 2009/0187826 A1 describes a data control and display system which controls all production devices in a video production system. The production devices are conventional but the control system consolidates the functionalities of the connected production devices. The control system simplifies the operation of the video production system.
A more recent concept for broadcast productions is distributed video production. It attempts to overcome at least some of the limitations of conventional broadcast equipment. The concept of distributed video production is described, for example, in the article by Boutaba et al.: “Distributed Video Production: Tasks, Architecture and QoS Provisioning”, published in Multimedia Tools and Applications, Kluwer Academic Publishers, Boston, US, Volume 16, Number 1-2, 1 Jan. 2002, pages 99 to 136. Boutaba et al. address the issue of delay, delay variations and inter-media skew requirements. Boutaba et al. explicitly state that delay performance is measured based on delay variation or “jitter”. Jitter is a measure of the difference in delay experienced by different packets in a network due to variation in buffer occupancy in intermediate switching nodes. Another form of jitter is inter-stream jitter or “skew”, which measures the difference in delay as seen by separate streams pertaining to the same application (such as audio and video). In order to ensure proper intra-stream synchronization, low delay variation is often required. Boutaba et al. suggest compensating for jitter by buffering the data streams. This requires the provision of sufficient memory capable of storing sufficiently long intervals of the video and audio data to compensate for the jitter. In the case of high definition video data, this requires a large storage capacity.
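The relationship between jitter, skew and buffer memory can be made concrete with a small calculation. The following Python sketch is not taken from Boutaba et al.; it assumes that jitter is measured as the peak-to-peak spread of packet delays, that skew is the difference between the mean delays of two streams, and that the buffer must hold as much data as arrives during the jitter interval. The function names and the example bit rate of 1.485 Gbit/s (the nominal rate of an uncompressed HD-SDI signal) are illustrative assumptions.

```python
import math


def peak_to_peak_jitter_ms(delays_ms):
    """Delay variation within one stream: slowest minus fastest packet."""
    return max(delays_ms) - min(delays_ms)


def mean_skew_ms(stream_a_delays_ms, stream_b_delays_ms):
    """Inter-stream skew: difference between the mean delays of two streams."""
    mean_a = sum(stream_a_delays_ms) / len(stream_a_delays_ms)
    mean_b = sum(stream_b_delays_ms) / len(stream_b_delays_ms)
    return mean_a - mean_b


def jitter_buffer_bytes(bit_rate_bps, jitter_ms):
    """Bytes needed to store the data arriving during the jitter interval."""
    # bits per second * milliseconds / 1000 gives bits; / 8 gives bytes
    return math.ceil(bit_rate_bps * jitter_ms / 8000)


if __name__ == "__main__":
    video_delays = [12, 15, 11, 19]   # hypothetical packet delays in ms
    audio_delays = [5, 6, 4, 5]       # hypothetical packet delays in ms
    print(peak_to_peak_jitter_ms(video_delays))
    print(mean_skew_ms(video_delays, audio_delays))
    print(jitter_buffer_bytes(1_485_000_000, 40))
```

Under these assumptions, absorbing 40 ms of jitter on a single uncompressed high definition stream already takes several megabytes of buffer memory, and the requirement grows linearly with the number of streams and with the jitter to be compensated.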
Boutaba et al. rely on different kinds of servers to realize different functionalities that are needed for a broadcast production. For each functionality a specific type of server is used. Each specific server is either built exclusively with proprietary hardware, or with standard hardware and proprietary acceleration boards.
Taking this as a starting point, the present disclosure suggests a software-based video production server providing enhanced flexibility for broadcast productions.