1. Field of the Invention
The present invention relates generally to a mobile terminal and, more particularly, to a video telephony apparatus and signal transmitting/receiving method for a mobile terminal capable of providing text information corresponding to a voice signal along with the video images for clearer phone conversations.
2. Description of the Related Art
A video telephony service enables users of camera-equipped mobile terminals to see each other during phone conversations. The service can be provided through a circuit-switched network, such as a public switched telephone network (PSTN), or through a packet-switched network, such as an Internet Protocol (IP)-based network.
A video telephony service is typically implemented according to the H.323 or H.324M recommendation from the International Telecommunication Union (ITU). H.323 is an umbrella recommendation for providing a video telephony service on a packet-switched network. H.324, by contrast, is an ITU umbrella recommendation for voice, video and data transmission over a traditional circuit-switched network.
H.324 and several mobile-specific annexes are generally referred to as H.324M (M for mobility). H.324M is an umbrella protocol referring to H.261, H.263 and MPEG-4 for video coding, and to G.723.1 for audio coding. H.261 describes video coding and decoding for video telephony and video conferencing. H.263 and MPEG-4 aim to provide higher-quality video than H.261. G.723.1 describes speech coding and decoding for a data rate lower than or equal to 8 kbps. The 3rd Generation Partnership Project (3GPP) has adapted H.324M to form 3G-324M for circuit-switched 3G networks. In 3G-324M, adaptive multi-rate (AMR) speech coding is mandatory, and G.723.1 speech coding is optional.
Further, H.324M refers to H.223, which describes multiplexing and demultiplexing of video, audio, and data. It also refers to H.245, which describes messages and control procedures for opening and closing logical channels for audio, video, and data.
In the H.324M video telephony service, a calling mobile terminal places a call to a called mobile terminal, and in turn, the called mobile terminal accepts the call, thereby establishing a call connection between the calling mobile terminal and the called mobile terminal for a video conversation.
During a video call, both the images photographed of target objects near one mobile terminal and the audio signals collected by that terminal are transmitted to the other mobile terminal. In a crowded and noisy environment in particular, not only the speech signal of the user but also various noise signals are collected and transmitted to the other mobile terminal, and these noise signals may hinder clear conversation between the users.