Data mining refers to the analysis of large data sets to discover interesting patterns and gain information. The information obtained from data mining can provide insight into dimensional relationships between fields such as transactions, sales, date/time, health, environment, biology, and products. Applications of data mining include, but are not limited to, discovering buying patterns/sales trends, discovering biomarkers and performing gene mapping, detecting fraud, performing forensics, as well as predicting behaviors.
One emerging area of interest for data mining is social media. Patterns of behavior and content available from the pervasive use of social media have widespread applications, such as improving business, providing humanitarian relief, and assisting users, to name a few examples.
Social media refers to Internet-based applications that propagate user-generated content. Social media include social networking applications, blogs, wikis, and other content (e.g., image, video, text) sharing applications. A massive amount of content (and associated data) is generated and posted to social media sites. Unlike traditional “structured” attribute-value data, social media data is often noisy (i.e., contains issues with trustworthiness) and unstructured (e.g., does not necessarily contain cohesive or consistent attributes).
One attribute that can be useful for uncovering patterns in social media content is source location. Currently, however, only a minority of social media content includes this attribute.