As business and individual reliance on electronic systems grows, the volume of electronic data grows with it, since paperwork is increasingly being transferred to electronic documents. The volume is already large: most companies in the US have at least 100 terabytes of stored data, and as much as 2.5 quintillion bytes of data are created each day. With the frequent flow of data from sources such as sensors, machines, and social media sites, the velocity of electronic data is also massive and continuous. Effective analysis of large volumes of organizational and individual data from a variety of internal and external sources, such as transactions, social media content, enterprise content, sensors, and mobile devices, is required to derive useful insights for business as well as individual needs.
One of the biggest problems in handling large-volume data (big data) is the veracity of the data, that is, whether the data being stored has been mined for meaning and whether it is stored in such a way that it can be retrieved efficiently.
When faced with a large volume of data arriving at varied velocity from a variety of internal and external sources, it is a natural human tendency to divide and conquer. However, existing databases do not pre-process such large and varied data and store the pre-processed data based on its content type, such that analysis and retrieval of the data are done efficiently and seamlessly. Existing databases generally also do not allow grouping of data based on user-defined domains.
Although some existing databases provide grouping of data based on user-defined domains, they do not support effective retrieval of data for complex queries when faced with a large volume of data arriving at varied velocity from a variety of internal and external sources.
Hence, there exists a need for a data storage system that pre-processes data and stores it based on its content type, such that analysis and retrieval of the data are done efficiently and seamlessly.
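To make the stated need concrete, the following is a minimal illustrative sketch, not a description of any particular system. It assumes a hypothetical in-memory store in which each incoming item is pre-processed to detect its content type and is then indexed both by that content type and by a user-defined domain. The names `detect_content_type` and `ContentTypeStore`, the three content types, and the detection rules are all assumptions introduced only for illustration.

```python
import json
from collections import defaultdict


def detect_content_type(payload: str) -> str:
    """Hypothetical pre-processing step: classify a raw payload.

    The three categories ("json", "numeric", "text") are illustrative only.
    """
    text = payload.strip()
    try:
        parsed = json.loads(text)
        if isinstance(parsed, (dict, list)):
            return "json"
    except ValueError:
        pass  # not valid JSON; fall through to the other checks
    try:
        float(text)
        return "numeric"
    except ValueError:
        return "text"


class ContentTypeStore:
    """Hypothetical store that groups records by content type and by domain."""

    def __init__(self):
        self._by_type = defaultdict(list)    # content type -> records
        self._by_domain = defaultdict(list)  # user-defined domain -> records
        self._all = []

    def put(self, payload: str, domain: str) -> str:
        """Pre-process the payload, store it, and return its detected type."""
        record = {
            "payload": payload,
            "type": detect_content_type(payload),
            "domain": domain,
        }
        self._all.append(record)
        self._by_type[record["type"]].append(record)
        self._by_domain[domain].append(record)
        return record["type"]

    def get(self, content_type=None, domain=None):
        """Retrieve records, narrowing by content type and/or domain."""
        if content_type is not None:
            candidates = self._by_type.get(content_type, [])
        elif domain is not None:
            candidates = self._by_domain.get(domain, [])
        else:
            candidates = self._all
        return [
            r for r in candidates
            if (content_type is None or r["type"] == content_type)
            and (domain is None or r["domain"] == domain)
        ]
```

In this sketch, a lookup by content type or by domain reads only the matching group rather than scanning every stored record, which is the divide-and-conquer idea described above applied at storage time.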