The present invention relates to a method for transmitting video images between multimedia terminals in a data transmission system, in which video images are transmitted using first video frames, in which information encoded from one video image is transmitted, and second video frames, in which information encoded on the basis of two or more video images is transmitted, wherein a video image can be formed in the receiving multimedia terminal by using at least one first and at least one second video frame. The present invention also relates to a data transmission system, which comprises means for transmitting video images between multimedia terminals and means for forming first and second video frames from the video images, wherein information encoded from one video image is arranged to be transmitted in the first video frames, and information encoded on the basis of two or more video images is arranged to be transmitted in the second video frames. The present invention furthermore relates to a multimedia terminal, which comprises means for receiving commands and means for generating first and second video frames from video images, wherein information encoded from one video image is arranged to be transmitted in the first video frames, and information encoded on the basis of two or more video images is arranged to be transmitted in the second video frames.
Multimedia applications are used for transmitting e.g. video image information, audio information and data information between a transmitting and receiving multimedia terminal. For data transmission the Internet data network or another communication system, such as a general switched telephone network (GSTN), is used. The transmitting multimedia terminal is, for example, a computer, generally also called a server, of a company providing multimedia services. The data transmission connection between the transmitting and the receiving multimedia terminal is established in the Internet data network via a router. Information transmission can also be duplex, wherein the same multimedia terminal is used both as a transmitting and as a receiving terminal. One such system representing the transmission of multimedia applications is illustrated in the appended FIG. 1. Definitions for such a multimedia terminal are presented in the International Telecommunication Union ITU-T Recommendation H.324 “Terminal for Low Bit-Rate Multimedia Communication” (Feb. 6, 1998).
The source of information can advantageously be a video application, an audio application, a data application or a combination of these, for which the collective term “multimedia application” is used in this description. In the multimedia application, the user of the multimedia terminal selects the location of the desired source of information, whereupon a data transmission connection is established in the system between the selected access location of the information and the multimedia terminal of the user. Data frames, in which the information is transmitted in digital format, are typically used for transmitting information. A separate data frame is advantageously produced for each different source type, or, in some situations, data from two or more sources of information can be combined into one data frame. In the data transmission system, the data frames are transmitted to the multimedia terminal of the user. In practical applications, these data frames are temporally interlaced, so that the actual data transmission stream is composed of temporally separated data frames of different applications. There are also systems under development in which a separate logical data transmission channel is allocated to each type of application using, for example, different frequencies or, in CDMA-based systems, different spreading codes. In practice, the data transmission capacity of such data transmission systems is restricted because, for instance, the data transmission channel is physically band-limited and there can be several simultaneous data transmission connections, so that the entire capacity of the data transmission system cannot be given to any single data transmission connection. In purely audio applications, this does not usually impose a significant drawback, because the amount of information to be transmitted is relatively small. 
However, in the transmission of video information this restricted bandwidth sets high demands on the data transmission system.
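Temporal interlacing of data frames from different applications can be pictured as merging the per-application frame sequences in time order. The following is a minimal sketch, assuming each data frame is a hypothetical `(timestamp_ms, source, payload)` tuple and that each source already delivers its frames in time order; the text does not define a frame layout.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical data-frame layout: (timestamp in ms, source name, payload bytes).
DataFrame = Tuple[int, str, bytes]

def interlace(*sources: Iterable[DataFrame]) -> List[DataFrame]:
    """Merge per-source frame sequences (each already in time order) into one
    temporally interlaced transmission stream, ordered by timestamp."""
    return list(heapq.merge(*sources, key=lambda frame: frame[0]))

video = [(0, "video", b"v0"), (40, "video", b"v1")]
audio = [(10, "audio", b"a0"), (30, "audio", b"a1")]
data = [(20, "data", b"d0")]

stream = interlace(video, audio, data)
for timestamp, source, _ in stream:
    print(timestamp, source)
```

`heapq.merge` is used because each source is already sorted, so the merged stream is produced without re-sorting everything.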
The use of multimedia applications has also been developed in low bit rate data transmission systems, in which the data transmission rates are on the order of 64 kbit/s or lower.
The video application can be a TV image, an image generated by a video recorder, a computer animation, etc. One video image consists of pixels arranged in horizontal and vertical lines, and one image typically contains tens of thousands of pixels. The information generated for each pixel contains, for instance, luminance information about the pixel, typically with a resolution of eight bits, and in colour applications also chrominance information. The chrominance signal further consists of two components, Cb and Cr, each of which is transmitted with a resolution of eight bits. On the basis of these luminance and chrominance values, information corresponding to the original pixel can be formed at the receiving end on the display device of the multimedia terminal. In this example, the quantity of data to be transmitted for each pixel is 24 bits uncompressed. Thus, the total amount of information for one image amounts to several megabits. In the transmission of a moving image, several images are transmitted per second; in a TV image, for instance, 25 images are transmitted per second. Without compression, the quantity of information to be transmitted would amount to tens of megabits per second. However, in the Internet data network, for example, the data transmission rate can be on the order of 64 kbit/s, which makes real-time image transmission via this network impossible without the use of compression techniques.
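The bit-rate arithmetic above can be checked with a short calculation. The picture size is an assumption (352 × 288 pixels, the CIF format of H.263); the text itself only says "tens of thousands" of pixels. The 24 bits per pixel and 25 images per second are taken from the text.

```python
# Uncompressed video bit-rate estimate using the figures in the text:
# 8-bit luminance (Y) plus two 8-bit chrominance components (Cb, Cr) per pixel.
BITS_PER_PIXEL = 8 + 8 + 8  # Y + Cb + Cr, no chroma subsampling assumed

def raw_bit_rate(width: int, height: int, frames_per_second: int) -> int:
    """Uncompressed bit rate in bits per second."""
    return width * height * BITS_PER_PIXEL * frames_per_second

def required_compression_ratio(width: int, height: int,
                               frames_per_second: int, channel_bps: int) -> float:
    """Factor by which the raw stream must shrink to fit the channel."""
    return raw_bit_rate(width, height, frames_per_second) / channel_bps

# Assumed picture size: 352 x 288 (CIF).
bits_per_image = 352 * 288 * BITS_PER_PIXEL
rate = raw_bit_rate(352, 288, 25)
ratio = required_compression_ratio(352, 288, 25, 64_000)

print(f"{bits_per_image / 1e6:.2f} Mbit per image")
print(f"{rate / 1e6:.1f} Mbit/s uncompressed")
print(f"compression ratio needed for 64 kbit/s: {ratio:.0f}")
```

Under these assumptions one image is about 2.4 Mbit and the raw stream about 60.8 Mbit/s, so roughly a 950-fold reduction is needed to fit a 64 kbit/s channel.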
To reduce the amount of information to be transmitted, different compression methods have been developed, such as those presented in ITU-T Recommendation H.263 “Video Coding for Low Bit-Rate Communication”, Geneva 1998. In the transmission of video, image compression can be performed as interframe compression, intraframe compression, or a combination of these. In intraframe compression, the aim is to reduce redundant information within a single image. In interframe compression, the aim is to eliminate redundant information in successive image frames. Typically, images contain a large amount of non-varying information, for example a motionless background, or slowly changing information, for example when the subject moves slowly. In interframe compression, it is also possible to utilize motion compensation, in which larger moving elements in the image are detected and the motion vector of such an entity is transmitted instead of the pixels representing the whole entity. To establish this motion vector, the direction and speed of motion of the subject in question are determined. Compression requires the transmitting and receiving multimedia terminals to have a processing speed high enough to perform compression and decompression in real time.
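Motion vectors are commonly found by block matching. The following is a minimal full-search sketch, assuming luminance-only frames stored as nested lists, an 8×8 block, a ±4-pixel search window and the sum of absolute differences (SAD) as the matching criterion. None of these choices comes from the text, and H.263 leaves the motion estimation method to the encoder.

```python
from typing import List, Tuple

Frame = List[List[int]]  # luminance values indexed [y][x]

def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def estimate_motion(cur: Frame, ref: Frame, bx: int, by: int,
                    n: int = 8, search: int = 4) -> Tuple[Tuple[int, int], int]:
    """Full search: return the (dx, dy) within +/-search that minimises SAD.
    The vector points from the block in cur to its best match in ref;
    displacements that leave the reference frame are skipped."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost

def make_frame(w: int, h: int, x0: int, y0: int, size: int, value: int) -> Frame:
    """Black frame with one bright square, used as a toy moving object."""
    frame = [[0] * w for _ in range(h)]
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            frame[y][x] = value
    return frame

ref = make_frame(32, 32, 10, 10, 8, 200)
cur = make_frame(32, 32, 12, 11, 8, 200)  # square moved 2 right, 1 down
vector, cost = estimate_motion(cur, ref, 12, 11)
print(vector, cost)
```

Only the vector `(dx, dy)` and the (here zero) residual need to be sent for this block instead of its 64 pixel values.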
In several image compression techniques, an image signal converted into digital format is subjected to a discrete cosine transform (DCT) and is subsequently quantised and coded before it is transmitted over a transmission path or stored in a storage means. In this context, the word discrete means that the DCT is calculated using sampled values of cosinusoidal functions rather than continuous functions.
A DCT can be used to calculate the frequency spectrum of a signal, that is, to transform the signal from the time domain to the frequency domain. When the discrete cosine transform is applied to a single image, a two-dimensional transform is required. Instead of time, the independent variables are the location coordinates X, Y of the pixels, and the transformed quantities are the luminance and/or chrominance values of the pixels. The frequency is thus not the conventional quantity relating to periods per second, but indicates, for example, the rate of change of luminance in the direction of the location coordinates X, Y. This is called spatial frequency.
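For reference, the two-dimensional DCT of an 8×8 block of pixel values f(x, y), as used in block-based coders such as H.263, is commonly written as below; u and v are the horizontal and vertical spatial-frequency indices, and the cosines are the sampled cosinusoidal functions referred to above.

```latex
F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,
         \cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0\\ 1, & k > 0 \end{cases}
```

F(0, 0) is the DC term, equal to 8 times the block mean; larger u and v correspond to faster variation along X and Y.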
In an image signal, neighbouring pixels typically have substantial spatial correlation. One feature of the DCT is that the coefficients it produces are practically uncorrelated; hence the DCT transforms the image signal efficiently from the pixel value (i.e. luminance/chrominance) domain to the spatial frequency domain.
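The effect of correlated neighbouring pixels can be illustrated with a direct (slow but readable) 8×8 DCT. This sketch uses the orthonormal DCT-II formula given in the code comment; the smooth ramp block is an illustrative assumption. Almost all of the block's energy ends up in two of the 64 coefficients.

```python
import math

N = 8  # block size used in H.263

def c(k: int) -> float:
    """Normalisation: 1/sqrt(2) for the zero-frequency (DC) index, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2_8x8(block):
    """Forward 8x8 DCT-II with orthonormal scaling:
    F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16).
    block is indexed [y][x]; the result is indexed [v][u]."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = (2 / N) * c(u) * c(v) * s
    return out

def energy(m) -> float:
    """Sum of squared values; preserved by an orthonormal transform."""
    return sum(val * val for row in m for val in row)

# A smooth horizontal ramp: neighbouring pixels are strongly correlated.
ramp = [[16 * x for x in range(N)] for _ in range(N)]
coeffs = dct2_8x8(ramp)

total = energy(coeffs)
low = coeffs[0][0] ** 2 + coeffs[0][1] ** 2  # DC plus lowest horizontal frequency
print(f"{low / total:.3f} of the energy lies in 2 of 64 coefficients")
```

A production coder would use a fast factorised DCT rather than this quadruple loop; the direct form is chosen here only because it mirrors the formula.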
An image containing a large number of fine details contains high spatial frequencies. For example, the more closely spaced a set of parallel lines in the image is, the higher the spatial frequency it corresponds to. In general, DCT components corresponding to diagonally oriented features in an image can be quantized more coarsely without the quality of the image noticeably deteriorating.
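Coarser quantisation of high-frequency and diagonal components can be sketched with a frequency-weighted uniform quantiser. The step-size rule `base * (1 + u + v)` is a hypothetical illustration, not taken from H.263; it simply gives the largest step to the diagonal high-frequency corner (u = v = 7). The coefficient values are likewise made up.

```python
N = 8

def step_size(u: int, v: int, base: int = 4) -> int:
    """Hypothetical frequency-weighted step: grows with u + v, so fine detail,
    and diagonal detail where both u and v are large, is quantised coarsely."""
    return base * (1 + u + v)

def quantise(coeffs, base: int = 4):
    """Map each DCT coefficient (indexed [v][u]) to an integer level."""
    return [[round(coeffs[v][u] / step_size(u, v, base)) for u in range(N)]
            for v in range(N)]

def dequantise(levels, base: int = 4):
    """Receiver side: reconstruct approximate coefficients from the levels."""
    return [[levels[v][u] * step_size(u, v, base) for u in range(N)]
            for v in range(N)]

coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][0] = 800.0  # DC term
coeffs[0][1] = 10.0   # low horizontal frequency
coeffs[7][7] = 3.0    # fine diagonal detail

levels = quantise(coeffs)
restored = dequantise(levels)
nonzero = sum(1 for row in levels for q in row if q != 0)
print(levels[0][0], levels[0][1], levels[7][7], nonzero)
```

The small diagonal coefficient rounds to zero and costs nothing to code, while the DC term is kept exactly; the low-frequency term is reconstructed approximately (8 instead of 10).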
ITU-T Recommendation H.263, Section 4.2.1 “GOBs, Slices, Macroblocks and Blocks”, describes a compression method in which the DCT is performed in blocks of 8×8 pixels. The luminance information in the image is transformed with full spatial resolution. Both chrominance signals are spatially subsampled; for example, a field of 16×16 pixels is subsampled into a field of 8×8 pixels. The difference in block sizes is primarily due to the fact that the eye does not discern changes in chrominance as well as changes in luminance, so a field of 2×2 pixels is encoded with the same chrominance value.
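The 2×2 chrominance subsampling can be sketched as follows. The text does not say how the subsampled values are derived; simple rounded averaging of each 2×2 field is assumed here, and the Cb values are made up.

```python
from typing import List

Plane = List[List[int]]  # sample values indexed [y][x]

def subsample_2x2(plane: Plane) -> Plane:
    """Halve a chrominance plane in both directions by averaging each 2x2
    field (rounded to nearest), so one Cb or Cr value covers 2x2 pixels."""
    h, w = len(plane), len(plane[0])
    if h % 2 or w % 2:
        raise ValueError("plane dimensions must be even")
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# Hypothetical Cb values for one 16x16 macroblock area.
cb = [[(x + y) % 256 for x in range(16)] for y in range(16)]
cb_sub = subsample_2x2(cb)  # 8 x 8

luma_samples = 16 * 16                             # full-resolution luminance
chroma_samples = 2 * len(cb_sub) * len(cb_sub[0])  # Cb and Cr, 8 x 8 each
bits_per_pixel = (luma_samples + chroma_samples) * 8 / luma_samples
print(len(cb_sub), len(cb_sub[0]), bits_per_pixel)
```

With 8-bit samples, this subsampling halves the 24 bits per pixel of the earlier example to an average of 12 before any DCT coding is applied.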
ITU-T Recommendation H.263, Section 4.2.2 “Prediction”, defines seven frame types, three of which are mentioned in this application: an I-frame (Intra), a P-frame (Predicted), and a B-frame (Bidirectional). The I-frame is generated solely on the basis of information contained in the image itself, so that at the receiving end this I-frame can be used to form the entire image. The P-frame is formed on the basis of the closest preceding I-frame or P-frame, and at the receiving stage the preceding I-frame or P-frame is correspondingly used together with the received P-frame. In the composition of P-frames, motion compensation, for instance, is used to compress the quantity of information. B-frames are formed on the basis of the preceding I-frame and the following P- or I-frame. Correspondingly, at the receiving stage it is not possible to compose the B-frame until the corresponding I-frame and P-frame have been received. Furthermore, at the transmission stage the order of these P- and B-frames is changed, so that the P-frame following the B-frame is received first, which accelerates the reconstruction of the image in the receiver.
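The reordering of P- and B-frames for transmission can be sketched as follows. Frame types are given as letters in display order; the sketch assumes each B-frame depends on the nearest anchor (I- or P-frame) on either side, so every anchor is moved ahead of the B-frames that precede it in display order.

```python
from typing import List, Tuple

def transmission_order(display_types: str) -> List[Tuple[int, str]]:
    """Convert a display-order pattern such as "IBBPBBP" into transmission
    order, as (display index, type) pairs. Each anchor (I or P) is sent before
    the B-frames preceding it in display order, because those B-frames cannot
    be reconstructed until the anchor has been received."""
    out: List[Tuple[int, str]] = []
    pending_b: List[Tuple[int, str]] = []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append((index, kind))  # hold until the next anchor is sent
        else:
            out.append((index, kind))
            out.extend(pending_b)
            pending_b.clear()
    out.extend(pending_b)  # trailing B-frames with no later anchor in this pattern
    return out

order = transmission_order("IBBPBBP")
print(order)
print("".join(kind for _, kind in order))
```

The receiver then holds each anchor in memory and displays frames by their display index, so the reordering is invisible to the viewer.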
Of these three image types, B-frames achieve the highest compression efficiency. The appended FIG. 2 presents a data transmission stream in which these three types of image frames are transmitted. It should be noted that the number of I-frames, P-frames and B-frames can be varied according to the application used at a given time. However, at least one I-frame must be received at the receiving end before a proper image can be reconstructed on the display device of the receiver.
In multimedia applications, data transmission in data frame format is also used for transmitting an audio signal. Both audio data frames and video data frames are therefore preferably provided with identifications, on the basis of which these data transmission streams are associated with each other at the receiving end. In addition, it has to be possible to synchronize these data transmission streams to ensure that the image and the sound are reproduced substantially synchronously.
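The text does not specify the form of the identifications. A minimal sketch, assuming each data frame carries a hypothetical stream identifier and a presentation timestamp in milliseconds, separates a received stream into video and audio and pairs each video frame with the nearest audio frame within a tolerance.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class MediaFrame:
    stream_id: str     # hypothetical identification: "video" or "audio"
    timestamp_ms: int  # hypothetical presentation time

def nearest_audio(video_ts: int, audio_ts: List[int], tolerance_ms: int) -> Optional[int]:
    """Index of the audio frame closest in time to video_ts, or None if none
    lies within tolerance_ms. audio_ts must be sorted ascending."""
    i = bisect_left(audio_ts, video_ts)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(audio_ts)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(audio_ts[j] - video_ts))
    return best if abs(audio_ts[best] - video_ts) <= tolerance_ms else None

def synchronise(stream: List[MediaFrame],
                tolerance_ms: int = 20) -> List[Tuple[MediaFrame, Optional[MediaFrame]]]:
    """Demultiplex by stream_id and pair every video frame with its audio frame."""
    video = [f for f in stream if f.stream_id == "video"]
    audio = sorted((f for f in stream if f.stream_id == "audio"),
                   key=lambda f: f.timestamp_ms)
    audio_ts = [f.timestamp_ms for f in audio]
    pairs = []
    for v in video:
        j = nearest_audio(v.timestamp_ms, audio_ts, tolerance_ms)
        pairs.append((v, audio[j] if j is not None else None))
    return pairs

received = [MediaFrame("video", 0), MediaFrame("audio", 0), MediaFrame("audio", 30),
            MediaFrame("video", 40), MediaFrame("audio", 60), MediaFrame("video", 200)]
for v, a in synchronise(received):
    print(v.timestamp_ms, a.timestamp_ms if a else None)
```

A video frame with no audio frame nearby is paired with `None`, which a player could treat as silence.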
In an interactive application, the user of the multimedia terminal can control information transmission from the terminal. For example, when the user wishes to browse the image information forward or backward at a faster rate to search for a desired location, the user enters a fast forward or fast rewind command, respectively, which is transmitted to the server transmitting the multimedia information. The server then transmits frames at a faster rate, and these are received by the multimedia terminal. However, this fast forward or fast rewind function requires the server to have a high processing speed and a large memory capacity. In addition, the data transmission rate of the data transmission channel has to be high enough to carry the necessary quantity of information. Consequently, the fast forward or fast rewind function cannot be implemented in all systems using prior-art equipment and data transmission channels. In such systems, the user has to follow the multimedia application at normal speed and wait for the desired location to be reached. This may take a great deal of time, and it also loads the data transmission system unnecessarily and increases operating costs.
One purpose of the present invention is to provide a method and a system in which the fast forward and fast rewind functions are possible even when data transmission channels with a low bit rate are used. The method according to the present invention is primarily characterized in that the fast forward or fast rewind function of the video images is performed primarily by transmitting only first video frames. A data transmission system according to the present invention is primarily characterized in that the system further comprises means for performing the fast forward or fast rewind function of video images, wherein during the fast forward/rewind function, primarily only first video frames are arranged to be transmitted. A multimedia terminal according to the present invention is primarily characterized in that the multimedia terminal further comprises means for performing the fast forward or fast rewind function of the video images, wherein during the fast forward/rewind function, primarily only first video frames are arranged to be transmitted. The invention is based on the idea that during fast forward/rewind, only intra frames are transmitted. The number and time interval of these intra frames can be adjusted as needed. Furthermore, the information content of these intra images can be decreased if necessary, for example by compressing them, reducing their resolution, or transmitting them in black and white. The transmission of audio information can also be interrupted during fast forward/rewind, which further reduces the amount of information to be transmitted.
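The frame selection during fast forward/rewind can be sketched as follows. The names are hypothetical: frames carry a display index and a type label, `every_nth_intra` stands in for the adjustable time interval between the intra frames sent, and audio frames are simply not selected. The 48-frame stream layout is also an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Frame:
    index: int  # display-order position (hypothetical field)
    kind: str   # "I", "P" or "B" for video, "A" for audio (hypothetical labels)

def trick_mode_frames(frames: List[Frame], start: int, direction: int = +1,
                      every_nth_intra: int = 1) -> List[Frame]:
    """Select frames to send for fast forward (direction=+1) or fast rewind
    (direction=-1) from display position `start`.

    Only I-frames are chosen, since each can be decoded on its own; P-, B- and
    audio frames are skipped. every_nth_intra thins the I-frames further."""
    if every_nth_intra < 1:
        raise ValueError("every_nth_intra must be >= 1")
    intra = sorted((f for f in frames if f.kind == "I"), key=lambda f: f.index)
    if direction > 0:
        selected = [f for f in intra if f.index >= start]
    else:
        selected = [f for f in reversed(intra) if f.index <= start]
    return selected[::every_nth_intra]

# Hypothetical stream: an I-frame every 12 frames, P-frames at every third
# position, B-frames in between, plus one audio frame per video frame.
stream = []
for i in range(48):
    kind = "I" if i % 12 == 0 else ("P" if i % 3 == 0 else "B")
    stream.append(Frame(i, kind))
    stream.append(Frame(i, "A"))

ff = trick_mode_frames(stream, start=5)
rw = trick_mode_frames(stream, start=30, direction=-1, every_nth_intra=2)
print([f.index for f in ff], [f.index for f in rw])
```

In this example fast forward from position 5 sends 3 of the 96 frames in the stream, and every frame sent is independently decodable by the receiver.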
Considerable advantages are achieved with the present invention when compared with solutions of prior art. With a method according to the invention, it is also possible to implement the fast forward and fast rewind function in systems with a low bit rate without imposing an additional load on the data transmission system. The fast forward and fast rewind function implemented according to the invention does not require the multimedia server to have more processing or memory capacity. In the data transmission system according to the invention, it is also possible to reduce the loading of the system, because the quantity of information transmitted during fast forward/rewind is smaller, and it is possible to reach the correct location in a sequence of video images faster than in systems of prior art. Thereby data transmission and operating costs are also reduced.