An analytic system, such as a web analytic system, may have three basic components: a data collection component, a data processing component, and a data visualization component. Instrumentation data may be generated from various data sources including, but not limited to, an end-user client application, an application/web server, and other data sources, which may provide varied instrumentation depending on what each application records and measures. As a result, each component of the analytic system may receive heterogeneous data with varied schemas and semantics. The heterogeneity eventually propagates from the data collection component to lower layers of the analytic system, such as data processing and data visualization.
Some existing analytic systems process heterogeneous data from different data sources by transforming the heterogeneous data, in early stages of the analytic systems, into transformed data that conforms to a generic schema. The analytic systems then operate on the transformed data. However, these analytic systems are inefficient because of the processing overhead of transforming the heterogeneous data, and because of the storage and processing overhead of mapping an application-specific schema to the generic schema for much of the processing pipeline. In addition, the analytic systems may use processing and storage capabilities inefficiently if many attributes of the generic schema are unused. Further, because data processing in the analytic systems is tightly coupled with the generic schema, any change or upgrade to the systems is propagated through various components or layers.
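
By way of illustration only, the generic-schema approach described above may be sketched as follows. The sketch is not taken from any particular analytic system. The attribute names in `GENERIC_FIELDS`, the sample records, the mapping tables, and the helper functions `to_generic` and `unused_fraction` are all hypothetical and are assumed solely to show how two differently instrumented sources might be forced into one schema, and how attributes that a source does not supply remain stored but empty.

```python
# Hypothetical generic schema; attribute names are illustrative assumptions.
GENERIC_FIELDS = (
    "source", "timestamp", "user_id", "url",
    "event", "duration_ms", "status_code",
)


def to_generic(source, record, mapping):
    """Map one application-specific record onto the generic schema.

    Generic attributes with no counterpart in the record stay as None,
    so they still occupy a slot in every transformed row.
    """
    row = dict.fromkeys(GENERIC_FIELDS)  # every attribute starts as None
    row["source"] = source
    for src_key, generic_key in mapping.items():
        if src_key in record:
            row[generic_key] = record[src_key]
    return row


def unused_fraction(rows):
    """Return the share of attribute slots left empty across all rows."""
    total = sum(len(r) for r in rows)
    empty = sum(1 for r in rows for v in r.values() if v is None)
    return empty / total


# Two hypothetical sources with different schemas and semantics.
client_event = {"ts": 1700000000, "uid": "u1", "page": "/home", "action": "click"}
server_event = {"time": 1700000002, "path": "/home", "status": 200, "latency": 35}

# Per-source mappings from application-specific keys to generic attributes.
CLIENT_MAP = {"ts": "timestamp", "uid": "user_id", "page": "url", "action": "event"}
SERVER_MAP = {"time": "timestamp", "path": "url",
              "status": "status_code", "latency": "duration_ms"}

rows = [
    to_generic("client", client_event, CLIENT_MAP),
    to_generic("server", server_event, SERVER_MAP),
]
```

In this sketch each transformed row carries two attributes that its source never supplied, which is the kind of unused storage noted above. Adding an attribute to `GENERIC_FIELDS` would also change every row produced for every source, which mirrors the tight coupling between data processing and the generic schema.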