Conventional electronic communication systems generate a vast amount of short, unstructured text. This text may be present within electronic mail messages or Short Message Service (SMS) messages sent from a sender to one or more specific recipients, within microblog messages (e.g., Twitter “tweets”) posted to the World Wide Web, or within any other type of communication. Because this text lacks structure, and because each individual message contains little content, this text is not amenable to advanced storage, access and analysis techniques.
Hashtags are community-driven identifiers for specifying topics associated with microblog messages. Because hashtags are both defined and selectively applied to a message by the message's author, they are of limited use in effectively storing, accessing and analyzing microblog messages according to their semantic meanings.