Hitherto, various systems for efficiently compression-encoding video signals or speech signals and transmitting the encoded signals have been proposed. In these conventional compression encoding systems, compression encoding processing is carried out on video signals and on speech signals separately, on the basis of their respective masking effects, so that signal degradation is not conspicuous.
The compression encoding systems mentioned above conform to the respective sensitivity characteristics (masking characteristics) of human hearing and human vision. In all of these systems, compression processing is carried out independently, with the reproduced picture and the reproduced sound being evaluated individually.
For example, when only the picture is evaluated with the sound turned off, it is important to keep conspicuous degradation of the picture, such as jerkiness or block distortion, to a minimum, and compression encoding processing is therefore carried out with emphasis on that point.
However, in entertainment such as video or video software, pictures and sounds are reproduced simultaneously, and the reproduced pictures and sounds together stimulate the viewer. For this reason, where sound related to the picture is present at the same time, there are many instances in which even severe picture degradation, such as the jerkiness mentioned above, is in practice masked and is not perceived. In particular, in portions where the viewer's attention is directed mainly to the sound (portions where the stimulation level (activity) of the sound is high), attention to the picture is diverted, so it is highly likely that such severe picture degradation will go unnoticed.
The same phenomenon applies to sound. Where the level of stimulation (activity) from the picture is high, e.g., where the picture is moving, there are many instances in which the sound is masked and its degradation is not perceived.
As described above, the sensitivity (masking characteristic) of the human visual and auditory senses differs greatly between the case where pictures and sounds exist independently of each other and the case where pictures and sounds exist in relation to each other, as in cinema. Accordingly, where pictures and sounds exist in relation to each other, there is a possibility that optimum compression encoding cannot be carried out if the pictures and sounds undergo compression processing independently of each other, as in the prior art.
This invention has been made in view of the circumstances described above, and its object is to provide an encoding apparatus and an encoding method which can carry out compression encoding that is better optimized for pictures (video signals) and sounds (speech signals) taken together.
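By way of illustration only, the mutual masking between picture and sound described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the method of the invention: the function name `cross_modal_bits`, the normalization of each activity to the range 0 to 1, the linear reduction rule, and the `strength` parameter are all hypothetical and do not appear in the text. The sketch only shows the direction of the effect, namely that the bits given to the picture may be reduced where the activity of the sound is high, and vice versa.

```python
def cross_modal_bits(video_bits, audio_bits, video_activity, audio_activity,
                     strength=0.5):
    """Reduce each signal's bit budget according to the OTHER signal's activity.

    Hypothetical model (an assumption, not taken from the text):
      - activities are normalized to [0.0, 1.0]
      - the reduction is linear, scaled by `strength` in [0.0, 1.0]
    """
    for name, value in (("video_activity", video_activity),
                        ("audio_activity", audio_activity),
                        ("strength", strength)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0.0, 1.0]")

    # High sound activity masks picture degradation, so the picture
    # can tolerate fewer bits.
    video_out = video_bits * (1.0 - strength * audio_activity)
    # High picture activity (e.g. a moving picture) masks sound degradation,
    # so the sound can tolerate fewer bits.
    audio_out = audio_bits * (1.0 - strength * video_activity)
    return round(video_out), round(audio_out)


# Loud passage over a still picture: only the picture budget is reduced.
print(cross_modal_bits(1000, 200, video_activity=0.0, audio_activity=1.0))
# → (500, 200)
```

With both activities at zero, the budgets are returned unchanged, which corresponds to the conventional case in which the picture and the sound are compressed independently of each other.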