With the development of society, interaction between people, between people and mobile devices, and between people and computers is increasing. Such interaction is generally performed in the form of a session, that is, an uninterrupted sequence of requests and responses. A session may include multiple types of information, for example, voice information, text information, and image information. A single-information transmission manner, in which only one type of information is transmitted, is the most commonly used transmission manner; it is easy to operate and places a relatively low requirement on system performance. However, information transmitted in the single-information transmission manner is relatively monotonous and cannot comprehensively and accurately convey what the user is thinking. For example, when a user chats by using a chat tool or a social tool, the user generally chats by using text, but text alone cannot comprehensively and accurately convey the emotion of the user.
A multi-information transmission manner overcomes the shortcoming of the single-information transmission manner to some extent, but the user needs to manually insert the other information while using one type of information, which is cumbersome to operate. For example, to add an emoticon during a text chat according to the prior art, the user needs to first manually search an emoticon library for a proper emoticon image and then add the emoticon image to the chat session. Because the emoticon library has a finite quantity of emoticons, the user probably cannot find a proper emoticon image to convey the mood of the user. Likewise, if the user talks about scenic spots, food, weather, the environment in which the user is located, and the like, the user cannot show these to the other party in real time. It can be seen that the existing multi-information interaction manner is cumbersome to operate and has relatively low information transfer efficiency.