Video data can be regarded as a three-dimensional array of color or luminance data, depending on whether the video is color or grayscale. Two dimensions of this array, horizontal and vertical, represent the spatial data (the pixels) of a video image, whereas the third dimension represents the time domain of consecutive images. Hereafter each video image is called a frame. A frame of pixel data generated by an imaging sensor is typically transferred to a processing or visualisation unit by serialising the data and sending it over one communication line or a limited set of communication lines. The two-dimensional spatial data of a single frame are thus transferred over a single communication line as a consecutive series of data in time. This communication line can carry analog data or digital codewords representing the original pixel data. By using multiple communication lines, more data can be transferred in parallel (e.g. some systems transfer red, green, blue and synchronization data in parallel). The above describes how a camera system typically transports its consecutive frame data to a display via a single cable. A digital display collects all consecutive data of a single frame in a buffer and, once the frame is complete, presents it to the display matrix for visualisation. In the remainder of this text, this is referred to as a ‘direct video link’.
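The array view and the serialisation step described above can be illustrated with a minimal sketch. It assumes 8-bit grayscale pixels and row-by-row (raster) ordering; the function names and the toy dimensions are hypothetical and chosen only for illustration, not taken from any particular camera or display interface.

```python
def make_video(num_frames, height, width):
    """Build a toy grayscale video as a nested list indexed video[t][y][x].

    The pixel values are an arbitrary synthetic pattern in the range 0..255.
    """
    return [
        [[(t * 16 + y * width + x) % 256 for x in range(width)]
         for y in range(height)]
        for t in range(num_frames)
    ]


def serialize_frame(frame):
    """Flatten one 2D frame row by row into a 1D byte stream.

    This mimics sending the frame over a single communication line
    as a consecutive series of data in time.
    """
    return bytes(pixel for row in frame for pixel in row)


def deserialize_frame(stream, height, width):
    """Rebuild the 2D frame from the serial stream, as a display buffer would."""
    if len(stream) != height * width:
        raise ValueError("stream length does not match frame dimensions")
    return [list(stream[y * width:(y + 1) * width]) for y in range(height)]


if __name__ == "__main__":
    video = make_video(num_frames=2, height=3, width=4)
    line_data = serialize_frame(video[1])
    restored = deserialize_frame(line_data, height=3, width=4)
    print(restored == video[1])
```

In this sketch the receiver can only rebuild the frame once all of its pixels have arrived, which corresponds to the buffering behaviour of a digital display.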
Video or image compression refers to bandwidth reduction either in the spatial domain (image compression) or in the spatial and temporal domains simultaneously (video compression). The principal goal of compression is to reduce the amount of data (bandwidth). This can be done without losing any information (lossless compression), in which case the original frame data can be reconstructed from the compressed frame data as a bit-by-bit perfect match to the original. Alternatively, compression can be done such that a human observer is unable to perceive the differences between the original and the compressed frame data (visually lossless compression). In this case the original frame cannot be reconstructed identically, but a human observer typically will not see the differences between the original and the reconstructed frame. Lastly, compression can be ‘lossy’, lowering the amount of visual information in order to achieve a strongly improved compression efficiency. Video compression exploits the fact that pixel data is typically strongly redundant, both temporally and spatially. Compression can be achieved by storing the differences between a pixel and one or more spatial references (intra-frame, e.g. as used in the JPEG compression scheme) and by storing the differences between consecutive frames in the time domain (inter-frame, e.g. as used in the MPEG compression scheme). Additionally, given that the human eye is not very sensitive to subtle variations in intensity and/or color, further compression can be obtained by reducing the number of distinct variations that are retained after compression. Combinations of these techniques form the basis of modern compression schemes such as those used in the MPEG-1, MPEG-2 and MPEG-4 families and related standards.
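The three ideas in this paragraph (spatial differences, temporal differences, and a reduced number of retained levels) can be sketched as follows. This is a deliberately simplified illustration operating on single rows of 8-bit pixels; it is not the JPEG or MPEG algorithm, and all function names are hypothetical.

```python
def intra_encode(row):
    """Spatial (intra-frame) differencing: keep the first pixel,
    then store each pixel as the difference to its left neighbour."""
    if not row:
        return []
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]


def intra_decode(residuals):
    """Invert intra_encode by accumulating the differences (lossless)."""
    out = []
    for i, d in enumerate(residuals):
        out.append(d if i == 0 else out[-1] + d)
    return out


def inter_encode(previous, current):
    """Temporal (inter-frame) differencing: store the change of each
    pixel relative to the same position in the previous frame."""
    return [c - p for p, c in zip(previous, current)]


def inter_decode(previous, residuals):
    """Invert inter_encode given the previous frame (lossless)."""
    return [p + d for p, d in zip(previous, residuals)]


def quantize(values, step):
    """Lossy step: snap each value to a multiple of `step`, which
    reduces the number of distinct levels that must be stored."""
    return [step * round(v / step) for v in values]


if __name__ == "__main__":
    frame_a = [100, 101, 101, 103]
    frame_b = [100, 102, 101, 103]
    print(intra_encode(frame_a))           # small residuals for smooth rows
    print(inter_encode(frame_a, frame_b))  # mostly zeros for static content
    print(quantize(frame_a, 4))            # fewer distinct values, not reversible
```

The residuals produced by the two differencing steps are mostly small or zero for typical content, which is what a subsequent entropy coder (not shown here) would exploit to reduce the bit count.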
A communication protocol is an agreement between computing or telecommunication systems for the exchange of information. Communication protocols used on the internet/intranet are designed to function in a complex and uncontrolled setting. Their design typically uses a layering scheme as a basis, which decouples a larger and more complex protocol into distinct, easier-to-manage sub-protocols. The Internet protocol suite consists of the following layers: application, transport, internet and network interface. The Internet thereby offers universal interconnection, which means that any pair of computers connected to the internet is allowed to communicate. All the interconnected physical networks appear to the user as a single large network. This interconnection scheme is hence called the internet.
Communication protocols may include signaling, authentication, encryption and error detection and correction capabilities.
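The layering and error-detection concepts described above can be sketched with a toy encapsulation scheme. The 12-byte header used here (a 4-byte tag, a payload length, and a CRC-32 checksum) is purely an assumption made for illustration and does not correspond to the real header formats of the Internet protocol suite; the layer tags and function names are likewise hypothetical.

```python
import struct
import zlib

HEADER = "!4sII"                      # toy header: tag, payload length, CRC-32
HEADER_SIZE = struct.calcsize(HEADER)

# Hypothetical tags for the layers below the application, innermost first.
LAYERS = [b"TRAN", b"INET", b"LINK"]


def wrap(tag, payload):
    """Prefix the payload with this layer's toy header."""
    return struct.pack(HEADER, tag, len(payload), zlib.crc32(payload)) + payload


def unwrap(tag, packet):
    """Strip this layer's header and verify length and checksum."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet too short")
    got_tag, length, crc = struct.unpack(HEADER, packet[:HEADER_SIZE])
    payload = packet[HEADER_SIZE:]
    if got_tag != tag:
        raise ValueError("unexpected layer tag")
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("payload corrupted in transit")
    return payload


def send(app_data):
    """Pass application data down the stack; each layer adds a header."""
    packet = app_data
    for tag in LAYERS:
        packet = wrap(tag, packet)
    return packet


def receive(packet):
    """Pass a received packet up the stack; each layer removes its header."""
    for tag in reversed(LAYERS):
        packet = unwrap(tag, packet)
    return packet


if __name__ == "__main__":
    wire = send(b"pixel data")
    print(len(wire) - len(b"pixel data"))   # header overhead in bytes
    print(receive(wire))
```

Each layer handles only its own header and treats everything inside as opaque payload, which is the decoupling that the layering scheme provides. The added headers and checks also illustrate why protocol complexity costs bandwidth and processing time.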
Video communication can be achieved through an electrical or optical ‘direct cable’ carrying raw video data, minimally compressed or not compressed at all, and typically using no higher-level communication protocols. The classic cable-based system typically yields fast, low-latency communication, but consumes high bandwidth and normally cannot be tunnelled through a complex communication network like the internet or an intranet. Additionally, traditional video cabling typically imposes limited maximum cable lengths, or it has to be extended with expensive and/or signal-specific technology such as UTP extenders, fiber-optic extenders, and satellite connections. Even then, these technologies incur high costs for relatively limited flexibility to put multiple channels on the same “wire” and/or receive the same channel on multiple receivers.
Internet-capable video communication systems (e.g. those used for telepresence) typically offer strong compression and work seamlessly over the internet/intranet, but always introduce a delay of one or more frames. In other words, complex communication protocols and compression imply delay.
Despite the advanced stage of current systems for video communication, there remains a need for a system that combines low latency with strongly compressed, internet/intranet-capable video communication, possibly also offering high visual quality. There is a lack of a method or apparatus that can use the internet/intranet, or a communication channel of similar complexity, to send and receive video data with a delay of less than half the time between two consecutive frames in the video feed presented to the sending unit. In other words, the surplus delay of any prior system compared to a ‘direct video link’ (see above) typically appears to be at least half of the inter-frame time interval.
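To make the delay target concrete, the following sketch converts a frame rate into the inter-frame interval and the corresponding half-interval budget. The frame rates and delay values in the example are illustrative assumptions only, and the function names are hypothetical.

```python
def frame_interval_ms(fps):
    """Time between two consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def surplus_delay_budget_ms(fps):
    """Upper bound on the surplus delay relative to a direct video link:
    half of the inter-frame interval."""
    return frame_interval_ms(fps) / 2.0


def meets_target(surplus_delay_ms, fps):
    """True if the surplus delay is below half the inter-frame interval."""
    return surplus_delay_ms < surplus_delay_budget_ms(fps)


if __name__ == "__main__":
    for fps in (25, 50):
        print(fps, frame_interval_ms(fps), surplus_delay_budget_ms(fps))
    # A system that buffers one full frame at 50 frames per second adds
    # at least 20 ms, which exceeds the 10 ms budget.
    print(meets_target(20.0, 50))
    print(meets_target(8.0, 50))
```

Under these assumptions, a system that adds one or more whole frames of delay cannot meet the target at any frame rate, since one frame interval is always twice the budget.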