In recent years, organizations have seen substantial growth in data volume. Continuous collection of large datasets that record information, such as customer interactions, product sales, results from advertising campaigns on the Internet by organizations, data coming from social media and mobile devices, etc. has led to a substantial growth in data volume. Many organizations today are facing tremendous challenges in managing the data due to the sudden growth in data volume, and also the unstructured nature of data. Consequently, storage and analysis of large volumes of data have emerged as a challenge for many enterprises, both big and small, across all industries.
In recent years, Big data technologies, such as Hadoop and NoSQL, have been widely adopted due to its capability of handling large sets of structured as well as unstructured data. The Hadoop is an open source technology for distributed computing with massive data sets using a cluster of multiple nodes. The Hadoop includes a Hadoop Distributed File System (HDFS) as a data storage layer and a Hadoop MapReduce framework as a data processing layer. Further, NoSQL is a technology to address new challenges of flexible schema needed for unstructured data and several other constraints associated with traditional database management systems, such as relational database management system (RDBMS).