Networked entities for sharing content via the Internet, such as web sites or application service providers (ASPs) and their client application counterparts, have become commonplace. These entities may be accessed to upload content, and to find other content for downloading or viewing as authorized. Such entities often employ “structured” metadata associated with the content items provided thereby to facilitate efficient storing, searching, and retrieval of the content items. Structured metadata refers to metadata that describes a data object, such as a content item, according to fixed, predefined patterns and term descriptors.
However, when users upload or provide content to such an entity for sharing with other users or for various other purposes, the entity generally has little or no control over the metadata that is included with the content file. Often, the content items received by such entities (e.g., an uploaded video or music file) have little or no metadata associated therewith, or the metadata associated therewith is of low quality. Further, even when users do include some metadata with uploaded content items, the metadata is often “unstructured.” In contrast to structured metadata, unstructured metadata refers to metadata composed of free-form descriptive terms that do not follow a fixed, predefined pattern. For instance, there may be no particular restrictions placed on the descriptive terms a user can use to identify or describe a content item, and so the descriptive terms can be as varied as the users who think them up. For example, one user might describe an uploaded music or video file in terms of a full title and artist name, while another user might describe an uploaded file in terms of a partial title and an artist nickname or the venue where a live performance took place, and so on.
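By way of non-limiting illustration only, the distinction between structured and unstructured metadata may be pictured with the short sketch below. The schema fields (`title`, `artist`, `media_type`), the helper `is_structured`, and all sample values are hypothetical assumptions introduced for this example; they are not drawn from any particular content sharing system.

```python
from dataclasses import dataclass, fields


# Hypothetical fixed schema: a structured record carries exactly these
# predefined term descriptors. The field names are illustrative assumptions.
@dataclass(frozen=True)
class StructuredMetadata:
    title: str
    artist: str
    media_type: str  # e.g. "video" or "music"


REQUIRED_FIELDS = {f.name for f in fields(StructuredMetadata)}


def is_structured(metadata: object) -> bool:
    """Return True only if metadata is a dict whose keys match the fixed
    schema exactly and whose values are all non-empty strings."""
    if not isinstance(metadata, dict):
        return False
    if set(metadata) != REQUIRED_FIELDS:
        return False
    return all(isinstance(v, str) and bool(v.strip()) for v in metadata.values())


# Follows the fixed, predefined pattern.
structured = {
    "title": "Example Song",
    "artist": "Example Artist",
    "media_type": "music",
}

# Free-form descriptions of the kind different users might supply.
unstructured_a = "example song live at the arena"
unstructured_b = {"desc": "song by EA (nickname)", "tags": "live, arena"}

for item in (structured, unstructured_a, unstructured_b):
    print(is_structured(item))
```

In this sketch only the first record satisfies the fixed schema; the two free-form descriptions do not, which reflects the difficulty an entity faces when it cannot control what users supply.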
Accordingly, content providers cannot rely on users to provide sufficient and structured metadata with uploaded content items. For content providers that receive vast amounts of content items over time (e.g., a media sharing system that receives over 300 hours of new video content per minute), management, organization, and provision of these content items are drastically hindered by the lack of sufficient and structured metadata associated therewith.