1. Field of the Invention
The present invention relates to an information processing apparatus and information transmission system and, more particularly, to an information processing apparatus and information transmission system suitable for transmitting/receiving voice information and image information.
2. Description of the Related Art
Image/voice transmission systems such as a video conference system and video phone for transmitting/receiving voice information and image information have conventionally been known.
A conventional image/voice transmission system performs the following control. An object is photographed by a video camera to obtain image information to be transmitted. At the same time, a speaker's voice is received via a microphone to obtain voice information to be transmitted. The image information and voice information to be transmitted are respectively encoded (compressed) by a video encoder and audio encoder. The compressed image information and voice information are multiplexed by a multiplexer, and the multiplexed image and voice data is transmitted.
An information processing apparatus that has received the transmitted multiplexed image and voice data demultiplexes the data into image data and voice data using a demultiplexer, and decodes them to reconstruct the image and voice. During reconstruction, the image and voice are synchronized.
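The multiplexing and demultiplexing steps described above can be sketched as follows. This is only an illustrative sketch: the `Packet` type, its `timestamp_ms` field, and the functions `multiplex` and `demultiplex` are hypothetical names, and the use of a per-packet timestamp as the basis for synchronization is an assumption, since no concrete packet format is given here.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    timestamp_ms: int  # assumed presentation time used for synchronization
    kind: str          # "video" or "audio"
    payload: bytes     # encoded (compressed) data


def multiplex(video: Iterable[Packet], audio: Iterable[Packet]) -> List[Packet]:
    """Interleave two timestamp-ordered streams into one ordered stream."""
    return list(heapq.merge(video, audio, key=lambda p: p.timestamp_ms))


def demultiplex(stream: Iterable[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Split a multiplexed stream back into its video and audio packets."""
    video: List[Packet] = []
    audio: List[Packet] = []
    for packet in stream:
        (video if packet.kind == "video" else audio).append(packet)
    return video, audio


# Sending side: encoded packets from each encoder, already in time order.
video_in = [Packet(0, "video", b"v0"), Packet(40, "video", b"v1")]
audio_in = [Packet(0, "audio", b"a0"), Packet(20, "audio", b"a1"),
            Packet(40, "audio", b"a2")]
muxed = multiplex(video_in, audio_in)

# Receiving side: recover the two streams for decoding.
video_out, audio_out = demultiplex(muxed)
```

In this sketch the receiving side would decode each recovered stream and present packets that share a timestamp together, which is one simple way to keep the image and voice synchronized.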
Various encoding methods are available. In the above example, (compressed) natural moving picture data and voice data are transmitted. This approach is widely used in video conference systems and video phones.
Transmission of animation data and text data instead of natural moving picture data and voice data has also been proposed. For example, animation information of a face and body is extracted from an image obtained by a video camera to create an abstracted avatar as animation data. Text data is obtained by recognizing a speaker's voice input via a microphone and converting the voice into text. The avatar animation data and text data are multiplexed by a multiplexer and transmitted.
On the receiving side, the animation of the face and body is displayed based on the animation data, whereas the text data is converted into voice signals and read off. In this case, the animation and voice must be synchronized, as a matter of course.
The above example is effective for a transmission path having a narrow band (low bit rate).
Instead of obtaining animation data and text data using a video camera and microphone, animation data including the movements and expressions of a body and face may be created by an animator, and the text to be subjected to voice synthesis may be edited using a text editor.
However, the above information transmission system suffers the following problems.
The transmission method of compressing a natural moving picture and voice has difficulty coping with a transmission path having a narrow band (low bit rate).
The method of transmitting information as animation data and text data is suitable for a transmission path having a low bit rate. However, the animation data and text data are completely independent of each other. For this reason, although conversion of the text data into voice data must be synchronized with the animation data, it is difficult to accurately synchronize displaying the animation with reading off the text.
The present invention has been made to overcome the conventional drawbacks, and has as its object to easily and accurately synchronize displaying animation data with reading off text data.
It is another object of the present invention to hold animation data on the receiving side so that the animation data need not be transmitted each time a text to be read off is transmitted.
To achieve the above objects, an information processing apparatus according to one aspect of the present invention comprises the following arrangement.
That is, an information processing apparatus comprises:
reception means for receiving data containing animation control information in a text block for voice synthesis,
storage means for storing animation data,
voice output means for extracting the text block for voice synthesis from the data received by the reception means, and synthesizing a voice based on the extracted text block to output the voice, and
display control means for controlling display of the animation data stored in the storage means, on the basis of a position of the text block output as a voice by the voice output means, and a position of the animation control information.
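The arrangement above can be illustrated with a minimal sketch in which animation control information is embedded as tags inside the text block, and the display is driven by comparing the position reached by the voice output with the position of each tag. The tag syntax `<anim:NAME>` and the functions `split_text_and_controls` and `animations_due` are hypothetical and chosen only for illustration; no concrete format is specified here.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax for animation control information (an assumption).
TAG = re.compile(r"<anim:(\w+)>")


def split_text_and_controls(data: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Separate the text for voice synthesis from the animation control tags.

    Returns the tag-free text and a list of (position, name) pairs, where
    position is the character offset in the tag-free text at which the tag sat.
    """
    parts: List[str] = []
    controls: List[Tuple[int, str]] = []
    cursor = 0   # read position in the received data
    out_len = 0  # length of tag-free text collected so far
    for match in TAG.finditer(data):
        chunk = data[cursor:match.start()]
        parts.append(chunk)
        out_len += len(chunk)
        controls.append((out_len, match.group(1)))
        cursor = match.end()
    parts.append(data[cursor:])
    return "".join(parts), controls


def animations_due(controls: List[Tuple[int, str]], spoken_upto: int,
                   already_fired: int) -> Tuple[List[str], int]:
    """Return names of controls reached by the voice output, and the new count."""
    due: List[str] = []
    index = already_fired
    while index < len(controls) and controls[index][0] <= spoken_upto:
        due.append(controls[index][1])
        index += 1
    return due, index


received = "<anim:smile>Hello. <anim:nod>How are you?"
text, controls = split_text_and_controls(received)
first, fired = animations_due(controls, spoken_upto=0, already_fired=0)
none_yet, fired_mid = animations_due(controls, spoken_upto=6, already_fired=fired)
second, fired_end = animations_due(controls, spoken_upto=7, already_fired=fired)
```

Here the names returned by `animations_due` would select animation data already held in the storage means, so only the tagged text needs to be received. Because the control positions are expressed as offsets into the same text that is synthesized, the display follows the voice output without a separate timing channel.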
The present invention provides an information processing method implemented by the information processing apparatus. The present invention provides a storage medium which stores a control program for causing a computer to implement the information processing method. Further, the present invention provides an information transmission system using the information processing apparatus.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.