Field of the Invention
This invention relates to a method and a system for video conferences with at least three different video conference participant stations which communicate with one another, multimedia data being transmitted via a telecommunications network, which data comprise at least participant image data and/or participant audio data, and each participant receiving the participant image data of the other participants shown at the same time visually arranged on a reproduction device, e.g. a display or a VRD (Virtual Retinal Display). The invention relates in particular to methods and systems which communicate over telecommunications networks consisting at least partially of a mobile radio network.
The rapid transmission, receipt and display of video images by video and television apparatus is known. The pictures usually have a resolution of at least 10×6 ppi (pixels per inch) in sufficiently good color and gray scale quality. A prerequisite for transmitting the entire picture information is a transmission channel with a minimum bandwidth of several megahertz. The costs of such systems, however, are much too high for certain applications such as video conference systems for business and private use. Using media with smaller bandwidth, such as public telecommunications networks, for the transmission of video pictures is also known. The transmission rate of these media is correspondingly low, however. For special applications, such as "slow scan" video systems, such a limited bandwidth can actually be sufficient. Examples of this are security and surveillance systems, in which a high picture repetition rate or high resolution is not necessary. Such systems typically use a resolution of 128×128 pixels for the whole picture, with only 16 color or gray scale values. Video pictures of higher quality, e.g. with 640×480 pixels (European standard: 620×576 pixels, 8 bit depth of color) and a depth of color of 64 levels, as are common for video conferences, cannot be transmitted with these systems, however. A normal video picture requires about 2 million bits of information, i.e. about 250 kbytes, for gray scale pictures. With color pictures the quantity of data rises to as much as 750 kbytes. The data transmission rate over public switched telephone networks (PSTN) today is typically about 57 000 bps (bits per second; for digital data this corresponds to bauds) per line in the analog area and 64 000 bps for ISDN, so that about 30 seconds for a gray scale picture or about 90 seconds for a color picture are needed to transmit a complete video picture of sufficiently good quality. This is far too slow for most video conference applications.
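The transmission times quoted above follow from dividing the frame size in bits by the line rate. The short Python sketch below reproduces that arithmetic using the frame sizes and line rates given in the text; taking 1 kbyte as 1000 bytes is an assumption. The results (roughly 31 to 35 seconds for a gray scale picture and 94 to 105 seconds for a color picture) agree with the rounded figures of about 30 and 90 seconds.

```python
# Frame sizes as given in the text; 1 kbyte is taken as 1000 bytes (assumption).
GRAY_FRAME_BYTES = 250_000   # about 2 million bits
COLOR_FRAME_BYTES = 750_000  # three times the gray scale figure

# Typical line rates quoted in the text, in bits per second.
LINE_RATES_BPS = {
    "analog PSTN": 57_000,
    "ISDN": 64_000,
}


def transmission_time_s(frame_bytes: int, rate_bps: int) -> float:
    """Seconds needed to send one uncompressed frame over a line of rate_bps."""
    return frame_bytes * 8 / rate_bps


if __name__ == "__main__":
    for name, rate in LINE_RATES_BPS.items():
        gray = transmission_time_s(GRAY_FRAME_BYTES, rate)
        color = transmission_time_s(COLOR_FRAME_BYTES, rate)
        print(f"{name}: gray scale {gray:.1f} s, color {color:.1f} s")
```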
For this reason the unprocessed digital video data are compressed with a wide variety of data compression algorithms in order to shorten the transmission time. However, even very good compression and decompression algorithms with a compression rate of 1/20 to 1/50 are insufficient for many video conference applications. Moreover, compression and decompression are normally time-consuming and require corresponding energy and calculating capacity. In the field of mobile radio, for instance, it is precisely this last factor that can be decisive. It must also be taken into consideration that in the field of mobile radio, in contrast to PSTN networks, a connection quality that allows the maximum transmission rate is not always available. With transmission rates lower than the maximum possible, the transmission time increases correspondingly. To obtain further data compression, several documents in the state of the art propose transmitting only a certain detail of a captured picture with high resolution while all other picture sections are transmitted with low resolution. The patent publications U.S. Pat. No. 5,703,637 and U.S. Pat. No. 4,513,317 are examples of systems that register the movement of the eyeball or of the retina with an eye tracking system and use this information to show only a small region of the picture with high resolution. These systems make use of a feature of the human eye, namely that only a small part of the retina (called the fovea) has high resolution, while the large remaining part has low resolution. The state of the art has several drawbacks, however, among them that all participants have to use the same video standard to be able to show these pictures. It is desirable, however, for video conference systems to be independent of the video standard. Moreover, the fovea has a high resolution visual angle of only 2°.
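To see why even a compression rate of 1/20 to 1/50 falls short, the sketch below computes the frame rate a 64 000 bps ISDN line could sustain for a compressed 750 kbyte color picture. Under these assumptions (a fixed compression ratio, no protocol overhead), the line carries only about 0.21 to 0.53 frames per second, far below the 25 frames per second of European television.

```python
COLOR_FRAME_BYTES = 750_000  # uncompressed color picture, figure from the text
ISDN_RATE_BPS = 64_000       # ISDN line rate from the text


def frames_per_second(frame_bytes: int, ratio: float, rate_bps: int) -> float:
    """Frame rate a line of rate_bps can sustain when every frame is compressed
    to `ratio` of its original size (e.g. 1/50). Protocol overhead is ignored."""
    compressed_bits = frame_bytes * 8 * ratio
    return rate_bps / compressed_bits


if __name__ == "__main__":
    for denominator in (20, 50):
        fps = frames_per_second(COLOR_FRAME_BYTES, 1 / denominator, ISDN_RATE_BPS)
        print(f"compression 1/{denominator}: {fps:.2f} frames per second")
```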
The brain compensates for this through a natural but unavoidable and continuous scanning movement of the eye. As a result, regardless of how well the visual angle and the high resolution picture section coincide, the picture appears blurry to the user, with only a small sharp picture detail at the center of vision. With the present state of the art this drawback can be corrected only with great effort, if at all.
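How small the sharply seen region is can be estimated from the 2° visual angle of the fovea. For a viewer at distance d from the display, the sharp region has a width of about w = 2d tan(1°). Assuming a viewing distance of 50 cm (an illustrative value, not taken from the text), this is under 2 cm on the screen:

```latex
w = 2\,d\,\tan\!\left(\tfrac{2^\circ}{2}\right)
  = 2 \cdot 50\,\text{cm} \cdot \tan(1^\circ)
  \approx 100\,\text{cm} \cdot 0.01746
  \approx 1.75\,\text{cm}
```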
It is an object of this invention to propose a new method and system for video conferences that do not have the drawbacks described above. In particular, it should be possible to transmit the participant image data of video conferences with high compression.
This object is achieved according to the present invention by the elements of the independent claims. Further preferred embodiments follow from the dependent claims and from the description.
In particular, these objects are achieved through the invention in that at least three participants communicate with one another via video conference participant stations of a video conference system, multimedia data being transmitted over a telecommunications network, which data comprise at least participant image data and/or participant audio data, and each of the participants receiving the participant image data of the other participants shown at the same time visually arranged on a reproduction device, e.g. a display, of the respective video conference participant station; in that the direction of view of each participant is registered by an eye tracking system and eye tracking data comprising at least data about the direction of view are transmitted to a communications unit of the respective video conference participant station; and in that the participant image data of the participant whose participant image data are shown on the reproduction device of a video conference participant station in the momentary direction of view of that station's participant are transmitted with full resolution and at full video transmission rate over the telecommunications network to the communications unit of that station, while the participant image data of the other participants are transmitted in reduced resolution and/or at reduced video transmission rate. The invention has the advantage that the compression, i.e. the reduction, of the participant image data is independent of the video standard used, since the participant image data of a participant are transmitted either reduced or in full resolution, without the complicated subdivision into sub-frames required in the state of the art. The individual video images of a video conference interface can thereby be maintained, for example.
The simplicity of the method also entails minimal use of calculating capacity, which can be especially important for mobile radio devices with limited energy reserves. The drawback of the state of the art that the scanning movement of the fovea has to be corrected (as e.g. in U.S. Pat. No. 4,513,317) is eliminated with this invention, since the scanning movement normally relates to the object to be discerned. The impression of a sharp focus with blurry surroundings is eliminated. The entire logical object, e.g. the video conference participant, is discerned sharply. If the gaze wanders to the next logical unit, i.e. the participant image data of another participant, these are perceived sharply as a whole.
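The selection step of the method can be sketched as a hit test of the registered direction of view against the screen areas in which the other participants are shown. The Python sketch below assumes that the eye tracking data have already been converted into a gaze point in screen pixels and that each remote participant occupies one rectangular tile; the names `Tile`, `stream_quality`, `FULL` and `REDUCED` are hypothetical. Treating a gaze that falls on no tile as "reduced for everyone" is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FULL = "full"        # full resolution and full video transmission rate
REDUCED = "reduced"  # reduced resolution and/or reduced video transmission rate


@dataclass(frozen=True)
class Tile:
    """Screen rectangle (in pixels) in which one remote participant is shown."""
    participant: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, gx: float, gy: float) -> bool:
        return self.x <= gx < self.x + self.width and self.y <= gy < self.y + self.height


def stream_quality(tiles: List[Tile], gaze: Tuple[float, float]) -> Dict[str, str]:
    """Requested quality per remote participant: FULL for the one whose tile lies
    in the viewer's momentary direction of view, REDUCED for all others."""
    gx, gy = gaze
    return {t.participant: FULL if t.contains(gx, gy) else REDUCED for t in tiles}


if __name__ == "__main__":
    layout = [Tile("B", 0, 0, 320, 240), Tile("C", 320, 0, 320, 240)]
    print(stream_quality(layout, (400, 100)))
```

Because the decision is made for each participant image as a whole, no sub-frame subdivision of the picture is needed, which matches the independence from the video standard described above.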
In one embodiment variant, the video transmission rate for those participant image data that are not shown in the momentary direction of view of the participant is set to zero. This embodiment variant has in particular the advantage that the network load is kept to a minimum. At the same time, the calculating capacity needed, e.g., to compress the participant image data is minimized.
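The effect on network load can be illustrated by adding up the bit rate arriving at one station. In the sketch below, n is the total number of participants (including the receiver), one remote participant is sent at the full rate and the remaining n - 2 at a reduced rate; the per-stream rates used in the tests are hypothetical. Setting the reduced rate to zero, as in this variant, leaves a single full-rate stream regardless of n.

```python
def downstream_bps(n_participants: int, full_bps: int, reduced_bps: int) -> int:
    """Bit rate arriving at one station: the viewed remote participant at
    full_bps, the other n - 2 remote participants at reduced_bps."""
    if n_participants < 3:
        raise ValueError("the method assumes at least three participants")
    remote = n_participants - 1
    return full_bps + (remote - 1) * reduced_bps


def all_full_bps(n_participants: int, full_bps: int) -> int:
    """Comparison case: every remote participant sent at full rate."""
    return (n_participants - 1) * full_bps


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, all_full_bps(n, 64_000), downstream_bps(n, 64_000, 0))
```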
In a further embodiment variant, the eye tracking data as well as the participant image data are transmitted to a central unit, the central unit determining for each participant, according to the eye tracking data of the respective participant, the resolution and/or the video transmission rate of the participant image data of the other participants and transmitting the participant image data to the communications unit of the respective participant in this resolution and/or at this video transmission rate. This embodiment variant has the advantage, inter alia, that the network load remains small even with a larger number of participants. Because the participant image data are reduced centrally, the calculating capacity required at the individual video conference participant stations likewise remains small compared with other solutions.
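A minimal sketch of the central unit's routing step, assuming it already knows from each participant's eye tracking data whom that participant is looking at. Frames are plain byte strings and `reduce` stands in for whatever resolution or rate reduction is used; both, and the function name `central_route`, are hypothetical. Each reduced frame is computed only once at the central unit and reused for every receiver, so the stations themselves never perform the reduction.

```python
from typing import Callable, Dict, Optional


def central_route(
    frames: Dict[str, bytes],
    viewed: Dict[str, Optional[str]],
    reduce: Callable[[bytes], bytes],
) -> Dict[str, Dict[str, bytes]]:
    """For each receiver, forward the frame of the participant they are looking
    at unchanged and the reduced frame of every other participant.

    frames: latest frame uploaded by each participant.
    viewed: whom each participant is looking at (None if nobody).
    """
    # Reduce every participant's frame once; reuse it for all receivers.
    reduced = {sender: reduce(frame) for sender, frame in frames.items()}
    routed: Dict[str, Dict[str, bytes]] = {}
    for receiver, target in viewed.items():
        routed[receiver] = {
            sender: (frame if sender == target else reduced[sender])
            for sender, frame in frames.items()
            if sender != receiver  # nobody is sent their own image
        }
    return routed
```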
In a further embodiment variant, the participant image data are transmitted to a central unit and stored in a data store of the central unit (the storing or buffering of the data can be achieved e.g. via a data buffer, a data stream, a database or in another way). The communications unit of a participant determines, according to the eye tracking data of the respective participant, the resolution and/or video transmission rate of the participant image data of the other participants, and these participant image data are transmitted by the central unit to the communications unit of the respective participant in this resolution and/or at this video transmission rate. This embodiment variant has the same advantages as the previous one, but does not require any calculating capacity of the central unit for computing the video images of the individual participants to be shown, since the communications units directly access the participant image data in the resolution and at the video transmission rate they have determined. Another advantage is that the eye tracking data do not have to be transmitted over the network.
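In this variant the central unit only stores and serves data, while the decision is made at the receiving station. The sketch below models a reduced video transmission rate as forwarding only every n-th stored frame, which the store can do without any image processing; reducing the resolution without computation at the central unit would additionally require the data to be stored at several resolutions, which is not shown. `CentralStore`, `fetch` and `choose_rate` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CentralStore:
    """Hypothetical central data store: keeps the frames uploaded by each
    participant and serves them on request without processing the images."""

    def __init__(self) -> None:
        self._frames: DefaultDict[str, List[bytes]] = defaultdict(list)

    def upload(self, participant: str, frame: bytes) -> None:
        self._frames[participant].append(frame)

    def fetch(self, participant: str, every_nth: int) -> List[bytes]:
        """Every n-th stored frame of `participant`; 1 means full video
        transmission rate, larger values a reduced rate."""
        if every_nth < 1:
            raise ValueError("every_nth must be >= 1")
        # Slicing only selects stored frames; no pixel is recomputed.
        return self._frames[participant][::every_nth]


def choose_rate(sender: str, viewed: str, reduced_every_nth: int = 5) -> int:
    """Decision taken by the receiving communications unit from its own eye
    tracking data, which therefore never leave the station."""
    return 1 if sender == viewed else reduced_every_nth
```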
In one embodiment variant, the participant image data of each participant are stored in a data store of the communications unit of that participant, and the communications unit determines, according to the eye tracking data of its participant, the resolution and/or the video transmission rate of the participant image data of the other participants; these participant image data are then transmitted by the communications units of the other participants in this resolution and/or at this video transmission rate to the communications unit of the respective participant. This embodiment variant has the advantage, inter alia, that it manages without a central unit. Subscribers of the telecommunications network can connect with one another directly over the telecommunications network without having to use further units outside their video conference participant stations.
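Without a central unit, each station keeps its own image data and serves every peer at the rate that peer has requested. In the sketch below a request of 1 means full video transmission rate, k > 1 means every k-th frame and 0 means rate zero; the `Station` class, its methods and the convention that a peer which has not yet sent a request receives full-rate data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Station:
    """Hypothetical participant station in the variant without a central unit."""
    name: str
    frames: List[bytes] = field(default_factory=list)       # own participant image data
    requested: Dict[str, int] = field(default_factory=dict)  # peer name -> every n-th frame

    def request(self, peer: "Station", every_nth: int) -> None:
        """Tell `peer` at which rate this station wants its image data."""
        if every_nth < 0:
            raise ValueError("every_nth must be >= 0")
        peer.requested[self.name] = every_nth

    def frames_for(self, receiver: str) -> List[bytes]:
        """Frames this station sends directly to `receiver`."""
        n = self.requested.get(receiver, 1)  # assumption: full rate until asked otherwise
        if n == 0:
            return []  # video transmission rate set to zero
        return self.frames[::n]
```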
In a further embodiment variant, at least one video conference participant station is connected to the telecommunications network via a mobile radio network. The telecommunications network can comprise, for example, a fixed network such as a LAN (Local Area Network) or WAN (Wide Area Network), the public switched telephone network (PSTN) and/or ISDN (Integrated Services Digital Network), the Internet or another communications network, in particular a mobile radio network.
In another embodiment variant, the communications unit uses image analysis and form reconstruction algorithms to render the participant image data that have been transmitted with reduced resolution. One of the advantages of this embodiment variant is that, despite the reduced resolution of the transmitted pictures, the pictures can be reconstructed through the pictorial synthesis of the form reconstruction algorithms and shown in higher resolution than that transmitted.
It should be stressed here that, in addition to the inventive method, the present invention also relates to a system for carrying out this method.