Generally, the data in files can be categorized as structured or unstructured. Current systems can process structured data efficiently because such data is organized as rows and columns in a database, which makes it easy to retrieve and process via database queries and other programmable code. Unstructured information, on the other hand, lacks a predefined structure and/or schema, so parsing it may require significantly more processing time. Further, unstructured data often contains irregularities and ambiguities, which makes it significantly more difficult for traditional programs to interpret than structured data stored in fielded form in databases or annotated (semantically tagged) in documents.
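The contrast can be illustrated with a minimal Python sketch. The table name `invoices`, its columns, the sample note, and the regular expression are all hypothetical and chosen only for illustration. The structured case is answered with a single query, while the free-text case needs a hand-written pattern that still misses an amount written out in words.

```python
import re
import sqlite3

# Structured data: rows and columns with a known schema (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("Acme", 120.50), ("Globex", 75.00), ("Acme", 30.25)],
)
# One declarative query retrieves exactly the values of interest.
structured_total = conn.execute(
    "SELECT SUM(amount) FROM invoices WHERE customer = ?", ("Acme",)
).fetchone()[0]
conn.close()

# Unstructured data: similar facts expressed as free text with no schema.
note = (
    "Acme paid $120.50 last week. Globex still owes seventy-five dollars. "
    "A second Acme payment of 30.25 USD arrived today."
)
# A hand-written pattern covers only the notations it anticipates:
# "$<number>" or "<number> USD". The spelled-out amount is not matched.
pattern = re.compile(r"\$(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)\s*USD")
amounts = [float(a or b) for a, b in pattern.findall(note)]

print(structured_total)
print(amounts)
```

In this sketch the pattern recovers the two numeric amounts but not "seventy-five dollars", which is the kind of irregularity that makes unstructured data harder to process with traditional programs.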
Metadata can be utilized to analyze unstructured data, for example by parsing the metadata to identify provenance, descriptive categories, and other information that describes the unstructured data. In other words, metadata provides additional information about a file's content. For example, an image file is unstructured data that includes a group of pixels forming the image, but it may also include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. In several cases, the metadata may be captured as the file is created, or embedded by a user through various software tools such as RightField.
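As an illustration of image metadata, the following Python sketch reads the dimensions, bit depth, and textual fields of a PNG file without decoding its pixels. It uses only the standard library and the publicly documented PNG chunk layout. The helper names `make_chunk` and `read_png_metadata`, and the sample author and date values, are hypothetical. The sample image is a 1x1 grayscale file built in memory so that the example is self-contained.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC (helper for the sample)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_png_metadata(blob: bytes) -> dict:
    """Walk the chunk list and collect header and text metadata only."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    meta = {}
    offset = 8
    while offset + 8 <= len(blob):
        length, chunk_type = struct.unpack(">I4s", blob[offset:offset + 8])
        data = blob[offset + 8:offset + 8 + length]
        if chunk_type == b"IHDR":
            # IHDR starts with width, height, bit depth, and color type.
            width, height, bit_depth, color_type = struct.unpack(">IIBB", data[:10])
            meta.update(width=width, height=height,
                        bit_depth=bit_depth, color_type=color_type)
        elif chunk_type == b"tEXt":
            # tEXt holds "keyword NUL text", both Latin-1 encoded.
            keyword, _, text = data.partition(b"\x00")
            meta[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        # Skip length (4) + type (4) + data + CRC (4); pixel data is never decoded.
        offset += 12 + length
    return meta


# Sample file: 1x1, 8-bit grayscale, with illustrative author and date fields.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"tEXt", b"Author\x00Jane Doe")
    + make_chunk(b"tEXt", b"Creation Time\x002020-01-01")
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)

metadata = read_png_metadata(sample)
print(metadata)
```

The parser skips the `IDAT` chunk entirely, which reflects the point made above: the metadata describes the image and can be inspected separately from the unstructured pixel content.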