1. Field of the Invention
The present invention relates to a digital video signal encoding and decoding system in which the digital video signal, delivered from a camera at a given high rate, consists of words having a predetermined number of bits and representing picture elements, and is encoded into a pulse code modulation (PCM) encoded signal transmitted along a low rate digital transmission medium conveying words representing certain picture elements. More particularly, the invention relates to a digital video system for transmitting videotelephone signals.
2. Description of the Prior Art
In the transmitter of a videotelephone system for a speaker, the videotelephone signal is delivered from the monitoring means associated with the camera through analog-to-digital converting means as a PCM digital waveform having a high rate, such as approximately 16 to 18 Mbits/s, depending on the adopted video standard. Each word represents a picture line element and comprises 8 bits, which provide 256 quantization levels between white and black. After differential PCM (DPCM) encoding in the encoding device of the transmitter, the digital video signal is transmitted on the digital transmission medium at a low rate equal to approximately 2 Mbits/s. Reduction in the binary rate by a factor of approximately 8 corresponds, on average, to the transmission of one bit per element.
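The rate arithmetic above can be checked with a short calculation. This is only an illustrative sketch: the function name is hypothetical, and the 16 Mbits/s figure is simply the lower end of the range quoted in the text.

```python
def average_bits_per_pel(incoming_rate, outgoing_rate, bits_per_word=8):
    """Average number of transmitted bits available per picture element."""
    reduction_factor = incoming_rate / outgoing_rate
    return bits_per_word / reduction_factor


# 8-bit PCM words give 2**8 quantization levels between white and black.
quantization_levels = 2 ** 8

# Assumed figures: 16 Mbits/s from the camera side, 2 Mbits/s on the medium.
bits_per_pel = average_bits_per_pel(16e6, 2e6)
print(quantization_levels, bits_per_pel)
```

With an 18 Mbits/s source the same calculation gives slightly less than one bit per element, which is why the text speaks of a factor of approximately 8.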
However, to obtain satisfactory picture reconstitution in the decoding device of the receiver of the distant speaker, a minimum of 3 bits per picture element (pel) must be transmitted. Thus, the encoding device comprises means for selecting certain pels which are transmitted after having been encoded into differential pulse code modulation (DPCM) code. The decoding device comprises, at least, means for interpolating the untransmitted elements in terms of the transmitted and decoded elements to reconstitute the entire picture.
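As a rough illustration of the interpolation performed in the decoding device, the sketch below rebuilds one line from a subset of transmitted pels by linear interpolation. The function name, the dictionary representation of the transmitted pels and the choice of holding the nearest value outside the transmitted range are assumptions made for the example, not details taken from the systems described here.

```python
def interpolate_line(length, transmitted):
    """Rebuild a line of `length` pels from {position: level} of transmitted pels.

    Untransmitted pels lying between two transmitted pels are linearly
    interpolated; pels outside the transmitted range hold the nearest
    transmitted level (an assumption of this sketch).
    """
    positions = sorted(transmitted)
    first, last = positions[0], positions[-1]
    line = [0.0] * length
    for x in range(0, first):
        line[x] = float(transmitted[first])
    for x in range(last, length):
        line[x] = float(transmitted[last])
    for left, right in zip(positions, positions[1:]):
        span = right - left
        delta = transmitted[right] - transmitted[left]
        for x in range(left, right):
            line[x] = transmitted[left] + delta * (x - left) / span
    return line


# Only pels 0, 3 and 6 of a 7-pel line are transmitted.
print(interpolate_line(7, {0: 0, 3: 30, 6: 0}))
```

In this example the four untransmitted pels are recovered as intermediate levels between their transmitted neighbours.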
In this respect, the scope of this invention does not extend to digital video encoding and decoding systems in which all the pels are retransmitted in the form of differential PCM encoded words, each of which has a predetermined number of bits (equal to three, for example) and for which the outgoing binary rate lies approximately in the ratio between the incoming binary rate and this predetermined bit number (see, for instance, UK Patent Application No. 2,003,001).
In order to solve the above problem, so-called systematic replenishment encoding systems were previously described in paragraph II of the article by M. Devimeux, M. Jolivet and J. P. Temime published in the "Journees d'Etudes" of Nov. 30th and Dec. 1st 1977 of the French Society of Electricians, Electronicians and Radio-electricians in Rennes. Systematic replenishment encoding consists of transmitting a constant number of data bits allocated to a limited number of elements on successive pictures, generally with every third field of the picture encoded at 3 bits/pel. In this case, the encoding device transmits all the 3-bit/pel words in DPCM code for every third field on the digital transmission medium. In the receiver, the decoding system reconstitutes the pairs of missing fields by interpolation between the adjacent transmitted fields. To this end, the decoding system comprises a field buffer which is read at the pel frequency. Such an encoding and decoding system introduces, in particular, temporal and spatial resolution losses and a very marked "jerk" effect, especially in the case of significant displacements of the moving area of the picture (generally, the speaker's face) against the fixed background. These losses stem from the fact that the frequency of the transmitted fields is one third of that of the real fields. It follows that, because even and odd fields alternate, the missing fields must be interpolated both temporally and spatially.
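The reconstitution of the two missing fields between a pair of transmitted fields can be sketched as a temporal linear interpolation. The sketch is deliberately simplified: the function name is hypothetical, fields are plain lists of rows, and the vertical offset between even and odd fields (the spatial part of the interpolation) is ignored.

```python
def rebuild_missing_fields(field_a, field_b, missing=2):
    """Linearly interpolate `missing` fields between two transmitted fields.

    Each field is a list of rows of pel levels. Only the temporal
    interpolation is modelled here.
    """
    steps = missing + 1
    rebuilt = []
    for k in range(1, steps):
        rebuilt.append([
            [a + (b - a) * k / steps for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(field_a, field_b)
        ])
    return rebuilt


# Two transmitted one-line fields, three field periods apart.
fields = rebuild_missing_fields([[0, 30]], [[30, 0]])
print(fields)
```

A pel that changes abruptly between the two transmitted fields is rendered as a gradual fade in the rebuilt fields, which gives some intuition for the temporal resolution loss noted above.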
Other higher performance encoding and decoding systems are known but are far more complex. They are called conditional replenishment systems. These systems comprise a detector for detecting the movement in each video picture to control a pel word selection means so that only certain pel words are encoded and transmitted, depending on internal criteria. Such conditional replenishment systems are disclosed, inter alia, in the following documents:
article by M. Devimeux et al., already mentioned, paragraph III;
article by J. C. Candy, M. A. Franke, B. G. Haskell and F. W. Mounts in "The Bell System Technical Journal", vol. 50, July-August 1971, New York, pages 1889 to 1918;
article by R. C. Nicol, "Conference on Digital Processing of Signals in Communication", Loughborough University of Technology, September 1977;
article by B. G. Haskell and R. L. Schmidt in "The Bell System Technical Journal", vol. 54, No. 8, October 1975, pages 1475 to 1495;
article by Gert Bostelmann in "Frequenz", vol. 33, No. 1, January 1979, Berlin, pages 2 to 8; and
U.S. Pat. No. 4,027,331.

the optimization and the production of the encoding device are complex as a result of the various operating modes depending on the fullness of the "elastic" buffer store and consequently on the movement;
multiplexing in the digital transmission medium of a variable number of element words and element cluster address data words, which results in a variable duration for the transmitted pel words for each picture;
as a result of the variable number and duration of the transmitted pel words for each picture, the receiver must include resynchronizing means employing variable lock on the transmitted synchronization words and an "elastic" buffer store analogous to that at the transmitter;
in the event of significant movements, the temporal or spatial resolution is reduced by spatial or temporal subsampling.
picture storing means for storing the transmitted picture element words after the words have been interpolated;
means for detecting a moving area of the present picture with respect to the previous stored picture in response to a comparison of the word difference between two words representative of two corresponding elements of the present picture and the previous stored picture with a predetermined threshold;
the number of data bits NB allocated to each line of a picture in said digital transmission medium is constant and the average number of bits B allocated to each encoded signal word of a line is greater than or equal to a first predetermined integer;
said moving area detecting means producing, for each line of a picture, the coordinates of two ledge picture elements defining the moving area of said present picture with respect to the corresponding line of said previous stored picture, for deducing the number NP of picture elements in said moving area of the line likely to be encoded from the produced coordinates of said two ledge picture elements and for deducing said average number of bits B for said line from the ratio NB/NP of said numbers NB and NP;
a linear predicting means for delivering DPCM predicted picture element words from the stored picture element words and present picture element words;
first and second down-counting means, controlled by said moving area detecting means, respectively have counts C_1 and C_2 set to NB and NP at the start of the moving area of each present picture and down-count the number of bits remaining to be allocated to said line and the number of picture element words remaining likely to be encoded, at the line element frequency;
the counts of said first and second down-counting means are compared to select the DPCM words representing the picture elements to be transmitted each time C_1/B ≤ C_2;
said selected DPCM words are encoded, according to a predetermined quantization law, to multiplex said selected DPCM words in said digital transmission medium; and
in response to the encoded DPCM words, the PCM unselected picture element words are linearly interpolated.
The internal discrimination criteria in the movement detector are based on the detection of the elements of the visual portion of each frame whose amplitudes or levels have varied, with respect to those of the corresponding elements of the previous picture, in excess of a certain threshold. This threshold is generally variable as a function of the pel words which may be transmitted. The "stationary" areas of the present picture, which correspond to amplitude differences below the variable threshold, remain unchanged in the frame store, except in the event of interpolation. On the other hand, the moving areas are replenished, i.e. they are written into the frame store in place of the corresponding areas of the previous frame. It turns out that such criteria are quite suitable for video pictures having relatively restricted displacements in the picture moving areas.
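The threshold test and the replenishment of the frame store can be illustrated as follows. This is a minimal sketch with a fixed threshold; the function name and the list-of-rows frame representation are assumptions, and the variable threshold and the interpolation case mentioned above are not modelled.

```python
def replenish(frame_store, present, threshold):
    """Update `frame_store` in place and return the (row, column) of moving pels.

    A pel is treated as moving when its level differs from the stored level
    by more than `threshold`; stationary pels are left unchanged.
    """
    moving = []
    for y, (stored_row, present_row) in enumerate(zip(frame_store, present)):
        for x, (old, new) in enumerate(zip(stored_row, present_row)):
            if abs(new - old) > threshold:
                stored_row[x] = new  # replenish the frame store
                moving.append((y, x))
    return moving


store = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 12, 40], [90, 10, 11]]
print(replenish(store, frame, threshold=4))
print(store)
```

Small level changes, such as those caused by noise, stay below the threshold and leave the frame store untouched, so only the two pels with large changes would need to be encoded and transmitted.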
The movement detection threshold drops when the number of modified elements to be encoded and transmitted increases, thus allowing suitable reconstitution in the decoding device of the receiver. Because of the reduced threshold, the encoding may need to be regulated in order to adapt the variable number of modified element words to be encoded to the constant rate of the digital transmission medium. This regulation is achieved using an "elastic" buffer store in which the selected encoded element words are written asynchronously and read synchronously at the transmission medium rate. Provided the buffer store is neither full nor empty, its contents make it possible to determine regulation parameters of the encoder such that the average incoming bit rate in the buffer store equals that of the transmission medium. Indeed, the number of bits or, more exactly, of pel words, generally of 4 bits after DPCM encoding, is variable from frame to frame. For each frame, DPCM words representing the amplitudes of the variable picture elements, as well as the start and end address words of the clusters of these variable elements, are thus transmitted. When the buffer store overflows because large clusters of variable elements are imminent, temporal subsampling of every second field and/or spatial subsampling of one pel out of two for each line systematically occurs (modes 1 and 2 according to the afore-mentioned article by Haskell and Schmidt). In the first mode, the missing field, assumed to be an even numbered field, is not transmitted by the transmitter and is replaced at the receiver by a field resulting from an interpolation of the previous and upcoming adjacent odd numbered fields, thus reducing the vertical resolution of the picture by a factor of two. In the second mode, the untransmitted pels are obtained from their transmitted adjacent elements by interpolation.
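The behaviour of the "elastic" buffer store can be pictured with a simple occupancy model: a variable number of bits is written per frame and a constant number is drained by the transmission medium. The sketch below is illustrative only. The function name, the per-frame granularity, the numeric values and the clamping of the occupancy at capacity are all assumptions, and the regulation of the encoder itself is not modelled.

```python
def track_elastic_buffer(produced_bits, drained_bits, capacity):
    """Return (fullness, overflow) after each frame.

    `produced_bits` lists the bits written for each frame, `drained_bits`
    is the constant number of bits read out per frame by the medium.
    """
    fullness = 0
    states = []
    for produced in produced_bits:
        # The buffer cannot hold a negative number of bits (underflow).
        fullness = max(0, fullness + produced - drained_bits)
        overflow = fullness > capacity
        if overflow:
            fullness = capacity  # simplification: excess bits are discarded
        states.append((fullness, overflow))
    return states


# Hypothetical figures: the third frame contains a large moving area.
print(track_elastic_buffer([50, 120, 200, 40], drained_bits=80, capacity=100))
```

The frame for which the overflow flag is raised corresponds to the situation in which the systems described above fall back on temporal or spatial subsampling.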
Conditional replenishment encoding and decoding systems, such as these, present the following drawbacks:
To avoid the interpolation problems of conditional replenishment, Amano et al., U.S. Pat. No. 3,940,555, discloses the encoding of a picture signal in which all the variable DPCM encoded level words representative of pel levels exceeding a predetermined threshold are transmitted in a low rate digital signal, wherein the number of bits allocated to each "line" of the low rate digital signal is constant. The encoding is intra-image. Each DPCM word indicates the difference between the level of a present picture element of a horizontal, vertical or diagonal line and the level of the corresponding present picture element of an adjacent line of the same type.
The bits allocated to each line are composed of a line synchronization word, an addressing bit for each pel of the line, the state of which indicates the presence or absence of a change in the pel level, and DPCM level words corresponding to the changed pels. The number of bits of the DPCM words is always less than or equal to a predetermined integer, such as four, so as to fit the number of DPCM words to be transmitted to the predetermined bit number allocated to each line.
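The fixed line budget described above can be expressed as a small calculation of the word length left for each changed pel. This is a sketch under stated assumptions: the function name, the integer division used to share the remaining bits and all numeric values are hypothetical and are not taken from the cited patent.

```python
def bits_per_level_word(line_bits, sync_bits, pels_per_line, changed_pels, max_bits=4):
    """Word length, in bits, available for each changed pel of a line.

    The line budget is spent on the synchronization word, one addressing
    bit per pel, and one level word per changed pel.
    """
    if changed_pels == 0:
        return max_bits
    available = line_bits - sync_bits - pels_per_line
    return min(max_bits, available // changed_pels)


# Hypothetical line: 400 bits in total, 8-bit sync word, 256 pels.
for changed in (20, 60, 136):
    print(changed, bits_per_level_word(400, 8, 256, changed))
```

With these figures, 136 bits remain for level words, so the word length falls from four bits to a single bit as the number of changed pels rises from 20 to 136.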
However, this encoding and decoding system has the drawback that the number of bits of each DPCM word may be reduced to unity when a large number of pels in a line have changed levels and must be transmitted. In fact, since neither interpolation nor, consequently, subsampling is provided, the picture spatial resolution is reduced considerably. This encoding and decoding system is used when the ratio between the outgoing and incoming binary rates is relatively low, such as 1/4.