As communication technologies develop and smartphones become more common, the general public uses various Internet communication tools, such as Microsoft Network (MSN), Tencent QQ, WeChat, and Laiwang, to communicate with one another. Among the functions these tools offer, voice messages are popular because they are easy to transmit and allow for quick and convenient communication. Typically, phones, personal computers (PCs), tablets/pads, personal digital assistants (PDAs), mobile internet devices (MIDs), and other such mobile terminals or network terminals (Internet equipment) provide speech input and output functions via network communication applications (apps).
Conventionally, inputting and outputting voice messages with network communication tools, such as instant messaging tools, includes the following: a sending end records a voice message to be issued by a sender-user and, after encoding the recorded voice message, sends the encoded voice message to an instant messaging server (IM-Server, IMS). The IMS pushes the encoded voice message to a corresponding receiving end. Then, when the receiver-user is to listen to the voice message, the receiving end decodes and plays the received voice message. With existing instant chat tools, a voice message can only be played back to the user. When no earphones are connected, playing the voice message can cause various problems, such as: 1) Privacy cannot be guaranteed. For example, playing a voice message involving a private matter in a public place may not be suitable. 2) People nearby are affected. For example, playing a voice message in a meeting room or a reading room is not courteous, yet the receiver-user may still want to know the contents of the message immediately. 3) Clarity of the voice message is affected in noisy environments. For example, excessive noise makes it difficult to clearly understand the voice message.
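For illustration only, the conventional flow described above (record, encode, send to the IMS, push to the receiving end, decode, play) can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular instant messaging tool: the names `VoiceMessage`, `IMServer`, `encode`, `decode`, `play`, and `conventional_flow` are hypothetical, the server is a simple in-memory queue, and `zlib` compression stands in for a real speech codec solely so that the example runs.

```python
import zlib
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class VoiceMessage:
    sender: str
    receiver: str
    payload: bytes  # encoded audio data


def encode(raw_audio: bytes) -> bytes:
    # Assumption: zlib stands in for a speech codec so the sketch is runnable.
    return zlib.compress(raw_audio)


def decode(payload: bytes) -> bytes:
    # Inverse of the stand-in encoder above.
    return zlib.decompress(payload)


class IMServer:
    """Hypothetical in-memory IMS that queues messages per receiving end."""

    def __init__(self) -> None:
        self._inboxes = defaultdict(deque)

    def send(self, message: VoiceMessage) -> None:
        # The sending end uploads the encoded voice message.
        self._inboxes[message.receiver].append(message)

    def push(self, receiver: str) -> list:
        # The IMS pushes all pending messages to the receiving end.
        inbox = self._inboxes[receiver]
        delivered = list(inbox)
        inbox.clear()
        return delivered


def play(raw_audio: bytes) -> bytes:
    # Placeholder for audio output; returns the audio that would be played.
    return raw_audio


def conventional_flow(sender: str, receiver: str,
                      recorded: bytes, server: IMServer) -> list:
    # Sending end: encode the recorded audio and send it to the IMS.
    server.send(VoiceMessage(sender, receiver, encode(recorded)))
    # Receiving end: decode and play each pushed message.
    return [play(decode(m.payload)) for m in server.push(receiver)]
```

In this sketch, playback is the only output path available to the receiving end, which is the limitation underlying the three problems listed above.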