Technology users generate a significant volume of text-based data via the Internet. Users of all kinds can create text through blogs, websites, articles, and social media applications such as Facebook, Twitter, and instant messages (IMs). Users can also communicate over mobile networks by texting from mobile devices using the Short Message Service (SMS). The ability to harness the great volume of text created by users and to glean useful information from that text can provide many benefits. For example, classifying users by user information can be used to target marketing to a certain demographic or to analyze market penetration within a particular demographic. In general, user information is any information that can be used to classify the user, such as a user's age or preferences.
Data mining is a branch of computer science that generally seeks to extract patterns from large data sets by combining methods from statistics and artificial intelligence with database management. However, there has been little work on analyzing short messages. Short messages are often only a few sentences long or shorter, and are generally used where writing space is at a premium. Popular social networking services provide many ways to generate short messages, such as Twitter messages or Facebook statuses, which are typically much shorter than full web pages. This brevity makes it difficult to apply current data mining techniques, which are designed to analyze full pages of text, to the increasingly large body of short messages being created via social networking services. Additionally, in recent years there has been an increase in the number of very short documents, usually in the form of user-generated messages or comments.