The present invention relates to a communication apparatus and method and, more particularly, to a communication apparatus and method which selectively transmit/receive a sensed camera image and an image containing a specific auxiliary image.
Generally, a speaker who is using a TV phone often looks at only the video image of the other party during the speech communication. In some cases, however, he/she may want to have a conversation while viewing auxiliary materials and the like. For example, in a video conference, if a presentation is given while showing presentation materials such as concrete charts, the participants can more easily understand the contents. Even with TV phones for home use or cellular phones, people can enjoy a conversation much more if they can view photographs or maps together with the other parties. A conventional video conference system having a function of sending auxiliary materials has a means for transmitting still images in addition to a video image and voice data. Materials are stored in advance as still images in a format such as JPEG and transmitted by a predetermined key operation.
A video conference system using voice recognition spares users such a predetermined key operation. FIG. 13 shows an example of such a video conference system.
Referring to FIG. 13, the conventional video conference system includes a transmission apparatus 50 and a reception apparatus 51. The transmission apparatus 50 comprises a voice reception unit 3, an image sensing unit 4 such as a camera, a voice encoder 5 which encodes received voice, and a moving image encoder 6 which encodes a received moving image.
The transmission apparatus 50 also comprises a still image database 52 and a still image encoder 53 which encodes still image data received from the still image database 52. The still image database 52 stores still image data, which serve as auxiliary materials to be used in a conference, together with voice data serving as a keyword.
Voice compressed data obtained by the voice encoder 5, moving image compressed data obtained by the moving image encoder 6, and still image compressed data obtained by the still image encoder 53 are multiplexed by a multiplexing unit 54 and transmitted to the reception apparatus 51 through a transmission unit 7. The auxiliary images stored in the still image database 52 are transmitted to the reception apparatus 51 in advance.
The reception apparatus 51 comprises a demultiplexing unit 55 which demultiplexes the received multiplexed data into individual compressed data, and a voice decoder 9, moving image decoder 10, and still image decoder 56, which decode the voice, moving image, and still image compressed data. The reception apparatus 51 also comprises a voice recognition unit 58 and a still image database 57.
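The multiplexing and demultiplexing described above can be illustrated with a minimal sketch. The source does not specify the multiplexed data format, so the framing used here (a 1-byte stream identifier followed by a 4-byte payload length) is purely an assumption for illustration, as are the names `multiplex`, `demultiplex`, `VOICE`, `MOVING_IMAGE`, and `STILL_IMAGE`.

```python
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical stream identifiers; the source does not define any.
VOICE, MOVING_IMAGE, STILL_IMAGE = 0, 1, 2

# Assumed framing: 1-byte stream ID + 4-byte big-endian payload length.
_HEADER = struct.Struct(">BI")


def multiplex(packets: Iterable[Tuple[int, bytes]]) -> bytes:
    """Interleave compressed-data packets into one byte stream
    (a toy counterpart of the multiplexing unit 54)."""
    out = bytearray()
    for stream_id, payload in packets:
        out += _HEADER.pack(stream_id, len(payload))
        out += payload
    return bytes(out)


def demultiplex(data: bytes) -> Dict[int, List[bytes]]:
    """Split the byte stream back into per-stream packet lists
    (a toy counterpart of the demultiplexing unit 55)."""
    streams: Dict[int, List[bytes]] = {}
    offset = 0
    while offset < len(data):
        stream_id, length = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        streams.setdefault(stream_id, []).append(data[offset:offset + length])
        offset += length
    return streams


if __name__ == "__main__":
    stream = multiplex([(VOICE, b"v1"), (MOVING_IMAGE, b"frame"), (VOICE, b"v2")])
    print(demultiplex(stream))
```

In this sketch each demultiplexed list would be handed to the corresponding decoder; the decoders themselves are outside its scope.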
In the reception apparatus 51, the auxiliary image data received in advance are held in the still image database 57. The keyword is registered in the voice recognition unit 58 and associated with specific still image data in the still image database 57.
During a video conference, multiplexed data received through a reception unit 8 is demultiplexed into moving image and voice compressed data by the demultiplexing unit 55. The moving image and voice data are decoded by the moving image decoder 10 and voice decoder 9 and output to a display unit 12 and a voice output unit 11, respectively. Simultaneously, the output data from the voice decoder 9 is input to the voice recognition unit 58. When the recognized voice data coincides with the registered keyword, the result is sent to a data determination unit 59. The data determination unit 59 selects still image data corresponding to the recognized keyword from the still image database 57 so that the selected still image data is displayed on the display unit 12 as an auxiliary image.
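The keyword-driven selection performed on the reception side can be sketched as follows. This is only an illustrative model under stated assumptions: the voice recognition unit is assumed to deliver recognized words as text, and the names `StillImageDatabase`, `register`, `lookup`, and `select_auxiliary_image` are hypothetical and do not appear in the source.

```python
from typing import Dict, Iterable, Optional


class StillImageDatabase:
    """Hypothetical stand-in for the still image database 57:
    maps a registered keyword to still image data."""

    def __init__(self) -> None:
        self._images: Dict[str, bytes] = {}

    def register(self, keyword: str, image_data: bytes) -> None:
        # Keywords are stored case-insensitively (an assumption).
        self._images[keyword.lower()] = image_data

    def lookup(self, keyword: str) -> Optional[bytes]:
        return self._images.get(keyword.lower())


def select_auxiliary_image(
    recognized_words: Iterable[str], database: StillImageDatabase
) -> Optional[bytes]:
    """Toy counterpart of the data determination unit 59: return the
    still image for the first recognized word that matches a registered
    keyword, or None when no keyword is recognized."""
    for word in recognized_words:
        image = database.lookup(word)
        if image is not None:
            return image
    return None


if __name__ == "__main__":
    db = StillImageDatabase()
    db.register("chart", b"<still image data for the chart>")
    image = select_auxiliary_image(["please", "see", "the", "chart"], db)
    print("auxiliary image selected" if image else "no keyword recognized")
```

When no keyword matches, the sketch returns `None`, which corresponds to displaying only the decoded moving image.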
With the above arrangement, the device on the other party side can be caused to display the auxiliary image without any specific key operation (e.g., Japanese Patent Laid-Open No. 11-355747).
However, for communication using a communication apparatus which displays an image in real time, there is a demand for a communication apparatus and method with higher operability, which allow an auxiliary image other than a main image to be displayed without the user having to pay attention to the operation.