Present invention embodiments relate to document storage, and more specifically, to determining which portions of a document to store in an unstructured format and which portions to store in a structured format.
Non-relational database solutions (e.g., NoSQL) increasingly utilize interchange or unstructured data formats, e.g., JavaScript Object Notation (JSON), Binary JSON (BSON), etc., for ease of data management and ease of data exchange with applications. Interchange formats such as JSON or BSON provide schema flexibility, allowing both the types of key-value pairs and the number of key-value pairs to be arbitrary. These formats allow developers to enter data in any desired format, because the corresponding schema rules are enforced during subsequent read operations rather than during write operations.
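By way of illustration only, the following minimal Python sketch shows how schema rules may be applied during read operations rather than write operations. The document contents, the `store` list, and the `write` and `read_ages` helpers are hypothetical and are not drawn from any particular database product.

```python
import json

# Two hypothetical documents in one collection; their keys and value types differ.
raw_docs = [
    '{"id": 1, "name": "Ada", "age": 36}',
    '{"id": 2, "name": "Lin", "age": "unknown", "tags": ["a", "b"]}',
]

def write(store, raw):
    """Write path: accept any well-formed JSON text; no schema is checked."""
    json.loads(raw)  # only verifies that the text parses as JSON
    store.append(raw)

def read_ages(store):
    """Read path: the rule that 'age' must be an integer is applied here."""
    ages = []
    for raw in store:
        doc = json.loads(raw)
        age = doc.get("age")
        if isinstance(age, int):
            ages.append(age)
    return ages

store = []
for raw in raw_docs:
    write(store, raw)

ages = read_ages(store)
```

In this sketch both documents are accepted on write, and the second document is excluded from `ages` only when the read path applies the integer rule.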
While providing flexibility, storing documents in an unstructured format incurs a significant performance penalty during runtime evaluation of queries (especially for large datasets), as compared to traditional SQL databases having uniform rows, columns, and data types of predetermined sizes.
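The source of this runtime cost may be sketched as follows, under the assumption that unstructured rows are held as serialized JSON text and structured rows as fixed-type tuples. The row layouts and the `query_unstructured` and `query_structured` functions are hypothetical and serve only to contrast the work performed per row.

```python
import json

# Unstructured storage: each row is serialized JSON text.
unstructured_rows = [json.dumps({"id": i, "price": i * 1.5}) for i in range(5)]

# Structured storage: fixed columns (id: int, price: float).
structured_rows = [(i, i * 1.5) for i in range(5)]

def query_unstructured(rows, min_price):
    """Each evaluation parses the text, looks up the key, and checks the type."""
    matches = []
    for raw in rows:
        doc = json.loads(raw)        # parsed again on every query
        price = doc.get("price")     # key may be absent in an arbitrary document
        if isinstance(price, (int, float)) and price >= min_price:
            matches.append(doc["id"])
    return matches

def query_structured(rows, min_price):
    """Column positions and types are known in advance; only the comparison remains."""
    return [row_id for row_id, price in rows if price >= min_price]

ids_unstructured = query_unstructured(unstructured_rows, 3.0)
ids_structured = query_structured(structured_rows, 3.0)
```

Both functions return the same identifiers, but the unstructured path repeats parsing, key lookup, and type checking for every row on every query, which is the overhead that grows with dataset size.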