Service providers and device manufacturers (e.g., wireless, cellular, etc.) are continually challenged to deliver value and convenience to consumers by, for example, providing compelling network services. These services generate vast amounts of data (structured and binary) which need to be managed, stored, searched, analyzed, etc. Over the last decade, internet services have accumulated data in the range of exabytes (10^18 bytes). Although most of this data is unstructured in nature, it must be stored, searched, and analyzed appropriately before any real-time information can be drawn from it for providing services to users.
In order to perform analytics on data and gain insight into the data, the data has to be fed into an analytics engine through various ingestion schemes. The data is typically received in an unstructured format at the time it is ingested. It then needs to be cleansed, structured, and validated into a format conducive to analysis. In order to cleanse the data and make it available for analytics, the data is required to go through a pipeline of disparate systems. Almost everyone in the industry spends a fair amount of time performing custom work to create a pipeline through disparate systems for each data source that is brought in. Getting the data ready for analysis is very time-consuming and labor-intensive work. Typically, developers write various custom map-reduce programs to cleanse the data. However, if the data could be expressed in terms of standard data models and cleansing processes, it would be possible to create a standard pipeline and greatly streamline the ETL (Extraction-Transformation-Load) process, which is often the biggest obstacle and most time-consuming area of analytics. Standard data models are very hard to establish because the schema of the data changes continuously, as the data and the usage of that data change continuously on the device.
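The ingest-cleanse-validate pipeline described above can be illustrated with a minimal sketch. The stage names, the record fields (`user_id`, `event`), and the validation rule below are illustrative assumptions, not part of any specific system described in this document:

```python
import json

def parse(raw_line):
    """Ingest stage: parse one raw JSON line into a record (dict)."""
    return json.loads(raw_line)

def cleanse(record):
    """Cleanse stage: trim string fields and drop empty/missing values."""
    return {k: v.strip() if isinstance(v, str) else v
            for k, v in record.items()
            if v not in ("", None)}

def validate(record, required=("user_id", "event")):
    """Validate stage: keep only records carrying the required fields.

    The required-field list is a hypothetical schema for illustration.
    """
    return record if all(f in record for f in required) else None

def run_pipeline(raw_lines):
    """Chain the stages into one standard pipeline; invalid records
    are filtered out rather than passed downstream."""
    cleaned = []
    for line in raw_lines:
        record = validate(cleanse(parse(line)))
        if record is not None:
            cleaned.append(record)
    return cleaned

raw = [
    '{"user_id": " u1 ", "event": "login", "note": ""}',
    '{"event": "click"}',  # missing user_id, so it is dropped
]
print(run_pipeline(raw))  # → [{'user_id': 'u1', 'event': 'login'}]
```

In practice each stage would be far more elaborate (and often a separate system), but standardizing the stage interfaces in this way is what allows one pipeline to serve many data sources instead of rebuilding custom map-reduce jobs per source.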