“Instant messaging” applications attempt to deliver notifications and content as soon as possible. However, the experience is often far from near real-time, and is instead reminiscent of explicitly asynchronous predecessors. Network connections fail or introduce latency. Servers under heavy load queue and hold messages before redistribution. A party on one side of a conversation sends a message, but the recipient is not simultaneously available or interested, or is reviewing messages from other senders with the same application. A group chat sends messages to every participant, increasing the number received and the difficulty of keeping current. Multicasting and repeated forwarding of messages to large, overlapping social networks also unpredictably increase the number received, and weaken the association between message and recipient so much that a reasonable prima facie assumption is that no incoming message is personal, significant or worthy of immediate review.
Yet the most common lower-level user interface in instant messaging is a stream view that anticipates near real-time interaction. Incoming and outgoing messages are positioned sequentially in forward or reverse chronological order. A recipient who is inattentive to actively streaming messages must scroll backward to pick up the conversation. If the stream has moved on to other topics, responses require context and repetition. The most common higher-level user interface is a list of users or chat groups, sorted alphabetically or in forward or reverse chronological order, with message counters that indicate little more than streams charging ahead unheeded.
Attempts to adapt message organizing methods from explicitly asynchronous predecessors have had mixed results. The most common user interface for Internet email is a threaded view which groups messages by common subject and identifies which messages reply to other messages. Such subject grouping and threading, applied to instant messaging, so changes the user experience that most participants view the application as a different category of service with alternative nomenclature. For example, on Facebook, a message multicast in near real-time to a social network and rendered in a stream is a “posting” on a “Timeline”, threaded responses are “comments” or “likes”, subject headings refer to embedded content, and threads reappear with each new activity. Most users view this near real-time service as a social bulletin board, not instant messaging. On Twitter, a message multicast in near real-time to a social network and rendered in a stream is a “tweet” (or, if forwarded, a “retweet”), words that proxy for subject are “hashtags”, and replies, although threaded in a lower-level interface, are filtered so that only messages from senders connected to the receiver's social network appear in the receiver's stream. Most users view this near real-time service as “microblogging” or broadcasting, not instant messaging.
Despite the proliferation of multimedia content on the Internet, instant message applications remain predominantly textual, with photographs the most typical multimedia content. Text requires more time to type than to read, particularly on a mobile device, which limits the number of messages sent and, from the standpoint of recipients, limits the number of incoming messages. Text and photographs are skimmable; without reading every word or studying every image, recipients recognize interesting content in a swiftly flowing stream. Practical limits on content production, and the possibility of selective and rapid content consumption, have been advantageous to instant messaging of text and photographs.
Audio and video recordings can require less time to create than text messages, and offer potentially rich and personalized content, but they take longer for a recipient to experience and interpret than text or photographs. There is no practical way to skim audio or video recordings for interesting content. Instant messaging applications that support audio or video messages typically treat the content as supplementary; the most common lower-level user interface embeds media play buttons in a stream of skimmable text that provides context and a rationale for pressing individual play buttons.
Attempts in the prior art to put audio or video recordings at the forefront of instant messaging have been problematic. Audio or video messages delivered in any significant volume, without accompanying skimmable content, quickly accumulate into unmanageable reservoirs akin to the “full voicemail box” in telephony. Users are reluctant to send audio or video recordings, not knowing whether a message will be played or left to accumulate, or whether more important or more current information will soon be available to send. The more messages are sent, the harder it is to hold a near real-time conversation when the opportunity arises. If a recipient responds to messages played in chronological order, any backlog renders the communication asynchronous, even when both parties are simultaneously attentive and engaged. If a recipient responds to messages played in reverse chronological order, the natural flow of conversation is disrupted. Explicit threading of audio or video messages, without accompanying skimmable content, is difficult and of lesser value than in text messaging; if a newly received audio or video message relates to an older such message, the recipient often must replay the older message to appreciate the context.