Protecting data has been a challenging task because data may be embedded in any kind of file, such as a word processing document, memorandum, electronic mail message, or spreadsheet. Furthermore, data may be stored as structured data, such as in databases (where the data may be logically organized into columns and/or rows), and/or as unstructured data, such as in a word processing document. It is difficult to extract knowledge from unstructured data in a form to which intelligent queries can be applied. This is because most queries and operations are currently limited to applications in which knowledge or information is organized as structured data, or follows patterns that can be expressed as regular expressions, such as credit card numbers, telephone numbers, and/or social security numbers. Most conventional queries, by contrast, cannot be applied to unstructured data.
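To illustrate this limitation, the following Python sketch detects the kinds of pattern-based identifiers mentioned above (social security, telephone, and credit card numbers) with regular expressions. The table `PATTERNS` and the function `find_structured` are hypothetical names introduced for illustration. The expressions themselves are simplified assumptions (US-style formats, no checksum validation), not production validators.

```python
import re

# Hypothetical, simplified patterns for fixed-format identifiers.
# Real detectors would also validate checksums (e.g., Luhn for card numbers).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}


def find_structured(text):
    """Return {label: [matches]} for every pattern that matches in text."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits


if __name__ == "__main__":
    print(find_structured(
        "Customer SSN 123-45-6789, call 555-867-5309, card 4111 1111 1111 1111."
    ))
    # A sentence whose sensitivity lies in its meaning matches nothing.
    print(find_structured(
        "The draft memo describes our confidential pricing strategy."
    ))
```

Patterns like these find fixed-format identifiers reliably, but they return nothing for the second sentence, whose sensitivity lies in what it says rather than in any fixed character pattern.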
Because there is no concise and compact representation of unstructured data, it is difficult to run queries (such as searches) or perform other operations on such data, let alone complex analysis. Since many applications, especially in the field of data security (e.g., data intrusion prevention and data extrusion prevention), rely on complex analysis of data, a concise and compact representation of unstructured data is important for successfully implementing data security policies.