Data scientists spend time organizing and cleaning data, which time could be better-spent on procedures such as modelling or data mining. Standardization bodies, such as the World Wide Web Consortium (W3C), have worked for many years on proposing formats and best practices to facilitate data publication and sharing. However, not all data providers publish their data according to standards. Moreover, most standards focus on the syntax of the data model and forget about the data semantics. This often results in semantic interoperability problems when data from different sources are exchanged and merged, e.g. when two datasets refer to the same data property using different names.
The identification of data properties in a dataset it is complex and time-consuming when proper metadata is not available.