With the development of computer technology and third-generation (3G) mobile communication, the processing capability of user-end devices has grown steadily, and the technology for transmitting video data in real time over the mobile communication network is now relatively mature. The best use of a 3G network is synchronous communication, and both the market and the technology for wireless videophones based on circuit-domain transmission over the mobile network are currently well developed. The videophone is an important application in 3G communications, and the 3G phone is at present an important terminal for implementing it. During a videophone call, recording the audio and video streams, including the images and voice of the opposite end, into a 3GP file is also a very important application. The 3GP format is a video file format generally supported by mobile terminals. The technical requirements for IP multimedia subsystem (IMS) terminals of China telecommunication specify that mobile terminals should support encoding and decoding of audio and video in the 3GP format.
The 3rd Generation Partnership Project (3GPP) provides a solution for mobile videophones in which audio and video are transmitted over the circuit domain: the 3G-324M protocol set. The 3G-324M protocol set includes the H.324M protocol, the H.223 multiplexing protocol, the H.245 control protocol, and audio and video encoding protocols, among others.
The 3GP standard is the 3GPP 26244-720 standard produced by the 3GPP organization, and it is based on ISO/IEC 14496-12:2005(E). At present, most videophone recordings are produced by recording the audio and video streams of the opposite end into 3GP files according to the 3GP standard format. The files are then played back in a player, so that users can listen to the voice and watch the images of the opposite end.
During a videophone call at present, both parties can watch in real time the video images of the opposite end captured by a camera and, at the same time, hear the voice of the opposite end captured by a microphone. When the mobile terminal records the video being watched and the audio being heard into a 3GP file, it needs to start a separate audio write-in thread and video write-in thread so that recording does not degrade the ongoing video call. When the audio write-in thread processes the audio frames and the video write-in thread processes the video frames, each thread needs to acquire the system time of the phone. However, because of thread priority and scheduling, the times acquired by the two threads may be inconsistent. As a result, the recorded audio and video may be out of synchronization, and the time difference between them can sometimes reach 1 second, 2 to 3 seconds, or even more.
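The effect described above can be illustrated with a minimal sketch that models, rather than reproduces, the two write-in threads. It assumes that each thread stamps a frame with the system time at the moment it is scheduled, not at the moment the frame was captured. The names `Frame`, `stamp_at_write`, and `av_skew_ms`, and all delay values, are hypothetical and chosen only for illustration; they are not measurements from any terminal.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str        # "audio" or "video"
    capture_ms: int  # when the frame was actually captured
    stamp_ms: int    # timestamp the write-in thread records in the file


def stamp_at_write(kind, capture_times_ms, scheduling_delays_ms):
    """Model a write-in thread that reads the system clock only when it runs.

    scheduling_delays_ms holds hypothetical per-frame delays between capture
    and the moment the thread is actually scheduled.
    """
    return [
        Frame(kind, t, t + d)
        for t, d in zip(capture_times_ms, scheduling_delays_ms)
    ]


def av_skew_ms(audio_frames, video_frames):
    """Timestamp difference for audio/video frames captured at the same instant."""
    return [v.stamp_ms - a.stamp_ms for a, v in zip(audio_frames, video_frames)]


# Audio and video frames captured together, once per second (illustrative).
capture = [0, 1000, 2000, 3000]

# Assumption: the audio thread is scheduled promptly, while the video thread
# is delayed more and more by lower priority and heavier per-frame work.
audio = stamp_at_write("audio", capture, [5, 5, 5, 5])
video = stamp_at_write("video", capture, [20, 400, 1100, 2500])

skew = av_skew_ms(audio, video)
print(skew)  # [15, 395, 1095, 2495]
```

In this model the frames were captured at identical instants, yet the recorded timestamps drift apart by more than two seconds, which is the kind of audio and video mismatch described above.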