The present invention generally relates to data storage systems, and more particularly relates to optimizing schema-less data within data storage systems.
Storing schema-less (unstructured) data in relational databases is a difficult task, as this type of data tends to be sparse and generally requires a large number of tables/columns for storage. For example, consider storing extractions from an on-line encyclopedia using RDF (Resource Description Framework), which is one type of data that is sparse and schema-less. Such an extraction can result in a very large number (e.g., 39,000) of predicates, such as the age of a person, the location of a company, etc. A correspondingly large number of tables/columns would be required to store these predicates. However, relational databases impose significant constraints on the size of various relational objects, such as the size of a table, the number of columns in a table, etc. Therefore, a single table may not be able to store all of the data, and multiple tables generally cannot be used because schema-less data can have many thousands of types/entities.
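The sparsity problem described above can be illustrated with a minimal sketch. The triples, the helper names `to_wide_table` and `null_fraction`, and the one-column-per-predicate layout are assumptions made for illustration only; they are not taken from any particular database system or from the invention itself.

```python
# Hypothetical RDF-style triples (subject, predicate, object); illustrative data only.
triples = [
    ("Alice", "age", "34"),
    ("Alice", "birthplace", "Paris"),
    ("AcmeCorp", "location", "Berlin"),
    ("AcmeCorp", "founded", "1999"),
    ("Everest", "elevation", "8849 m"),
]


def to_wide_table(triples):
    """Build a one-column-per-predicate table; cells with no value are None."""
    predicates = sorted({p for _, p, _ in triples})
    rows = {}
    for subject, predicate, obj in triples:
        # Each new subject gets a row with every predicate column set to None.
        row = rows.setdefault(subject, dict.fromkeys(predicates))
        row[predicate] = obj
    return predicates, rows


def null_fraction(predicates, rows):
    """Return the fraction of cells in the wide table that hold no value."""
    total = len(predicates) * len(rows)
    filled = sum(v is not None for row in rows.values() for v in row.values())
    return (total - filled) / total


predicates, rows = to_wide_table(triples)
sparsity = null_fraction(predicates, rows)
print(f"{len(predicates)} columns, {len(rows)} rows, {sparsity:.0%} of cells empty")
```

Even with three subjects and five predicates, most cells in this layout are empty, because each subject uses only the few predicates that apply to it. With tens of thousands of predicates, the column count grows in the same way while each row fills only a handful of cells.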