The Internet is filled with many types of content, including text, images, videos, audio, e-books, and so forth. These types of content may combine to make conversations and data exchanged between various users of the Internet. Content may comprise news articles, discussion boards, product sales information, e-commerce data, personal information, educational information, software as a service (SaaS) applications, and so forth. For example, a news website may include articles that use text, images, and videos to convey current information, while a discussion board may include text and graphical data associated with conversations between users.
Content may come from a variety of sources, including user-generated content, professionally generated content (e.g., from a news site editor), academic or other sources, corporate messaging, and so forth. User-generated content may include text-based conversations between users on sites such as Facebook and Twitter, on message boards, in e-mail, and so on, as well as graphical and other content created by users, such as photos posted to a social networking website, illustrations of problems posted to a question and answer site, and so forth.
Content often has some form of organization that is proprietary and native to a particular site. For example, a news site may publish articles in its own format that differs from that of other news sites. Such sites may include some common, standardized way of extracting information, such as Really Simple Syndication (RSS), XML/JSON feeds, and application programming interfaces (APIs), but such facilities often do not present content in the same form as the original and may miss information that is not deliberately exposed in this way. Other sites, such as Facebook, organize photos, biographical data, and conversations in their own unique format, such as via a news feed, photo album viewing area, and so forth. Discussion boards may offer yet another format for organizing and sharing information, such as hierarchical folders or topics, posts within each topic, and so forth. Each discussion board or site may use its own proprietary format, except where sites elect to use the same middleware discussion board software, and even then there are many such software packages, each with its own format.
The Internet has grown exponentially, creating an unmanageable and disconnected chaos of data and content. One important piece of this chaos is the discussion that goes on about various topics. These discussions are disconnected from each other by site location, type of discussion, and format. For example, an owner of a particular make and model of car with a problem may ask a question about the problem on any of hundreds of sites. Car enthusiast sites, general knowledge sites (e.g., Wikipedia, E-How, About.com, and so on), and others may all contain the information the owner is looking for or may contain the same question that the owner wants to ask. The owner's success in finding a satisfactory answer may depend on the amount of exposure the question receives, which in turn may depend on which site the owner happens to choose. The owner may opt to “broadcast” the question to many sites, hoping someone on one of them will provide a helpful answer. This fragmentation makes it very difficult for users to engage in meaningful discussions, because they must keep up with each of these sources separately.
An interesting development is that users often self-classify their content by using tags or keywords as a taxonomy to organize information. The introduction and adoption of the hashtag (a word following a “#” symbol that indicates a classification given to content by the content author) on social media sites often provides a source of classification information embedded within content. Because the content author applies hashtags as the content is being created, hashtags form a democratic, non-editorial method of content classification. Other users may also repost the content of others and add hashtags of their own to flag an important aspect of the content that the original author did not flag.
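As an illustration only, the following minimal Python sketch shows one way that author-applied hashtags, and hashtags added later by a reposting user, might be pulled out of content text. The function names `extract_hashtags` and `merge_classifications` are hypothetical, and the sketch assumes that a hashtag is a “#” symbol followed by letters, digits, or underscores; individual sites may define hashtags differently.

```python
import re

# Assumption: a hashtag is "#" followed by word characters, and the "#"
# is not itself preceded by a word character (so "abc#def" is ignored).
HASHTAG_PATTERN = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text):
    """Return lowercased hashtags in order of first appearance, without duplicates."""
    tags = []
    for match in HASHTAG_PATTERN.finditer(text):
        tag = match.group(1).lower()
        if tag not in tags:
            tags.append(tag)
    return tags


def merge_classifications(original, repost):
    """Separate the author's hashtags from those added by a reposting user."""
    author_tags = extract_hashtags(original)
    added_tags = [t for t in extract_hashtags(repost) if t not in author_tags]
    return {"author": author_tags, "added_by_reposter": added_tags}
```

For example, calling `merge_classifications` with an original post tagged “#Civic” and a repost that carries both “#Civic” and “#recall” would attribute “civic” to the author and “recall” to the reposting user.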