Significant information in enterprises may currently reside in semi-structured data such as JSON or XML. To interpret this data and join it with structured relational data, the semi-structured data may need to be flattened. Once the data is flattened, its metadata and schema information may need to be redefined. For example, an XSD written for an XML document would no longer be valid for the flattened XML; similarly, JSON metadata may not be valid for the flattened JSON data.
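As an illustration only, the flattening described above can be sketched in Python. The function name `flatten`, the dotted-path column naming, and the sample `order` record are assumptions made for this sketch; they are not part of any particular tool or of the method described here.

```python
from typing import Any, Dict


def flatten(record: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Keys of the result are paths such as "customer.city" or "items.0.sku"
    (a hypothetical naming convention chosen for this sketch).
    """
    flat: Dict[str, Any] = {}
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(value, path, sep))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(value, path, sep))
    else:
        # Scalar leaf: store it under the accumulated path.
        flat[prefix] = record
    return flat


# Hypothetical semi-structured record, as it might arrive in JSON form.
order = {
    "id": 7,
    "customer": {"name": "Ada", "city": "Pune"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}],
}

row = flatten(order)
for column, value in row.items():
    print(column, "=", value)
```

The flat `row` has columns such as `customer.city` and `items.0.sku` that can be matched against relational columns, but a schema written for the nested `order` (an object containing a `customer` object and an `items` array) no longer describes it, which is why the metadata may need to be redefined after flattening.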
Existing methods of joining data were developed on the assumption that a single program executes on a single processor. These methods were not designed with massive parallelization as an objective, and software tools based on them cannot fully exploit the performance of newer parallel distributed processing techniques.
Current mechanisms do not support parallel processing of data, nor do they accommodate massive user participation, which may produce data in various forms that need to be integrated before the data can be understood.