Most computer operations involve two types of data: structured data and unstructured data. Structured data has a high degree of organization, which makes it straightforward to store in a relational database and easy to search with simple search engine algorithms. For example, a spreadsheet contains structured data because the data resides in fixed fields within the spreadsheet file, providing quick and easy access to the information in those fields. Unstructured data, on the other hand, includes text and multimedia content that has internal structure but does not fit neatly into a relational database. Examples of unstructured data include emails, word processing documents, videos, photos, audio files, presentations, webpages, microblogs, and x-rays.
Current data-mining techniques require substantial investments of resources to analyze and extract meaningful data elements from unstructured data. For example, present techniques for mining semantics from unstructured multimedia data depend on the availability of labeled training data. Labeled training data includes user-generated tags, classes, and/or metadata that provide information relevant to the unstructured multimedia data. Some semantic mining approaches identify semantics based on the tags, classes, and/or metadata associated with the unstructured multimedia data. Other approaches use notes associated with the unstructured multimedia data to glean its meaning, or identify semantics by using cluster information and/or the context of the unstructured multimedia data. Recently, click logs from search engines have been used as an efficient way to generate training data for identifying semantics associated with unstructured multimedia data.