1. Technical Field
The embodiments herein are generally related to the field of cloud computing. The embodiments herein are particularly related to a system providing a memory-based storage solution for big data applications. The embodiments herein more particularly relate to a system and method for providing a hierarchical cache for big data in a cloud infrastructure.
2. Description of the Related Art
Distributed systems are systems having CPUs on at least two computing boards or machines. Such systems are commonly used to meet workload requirements and to scale the storage capacity of a cluster computing system. These systems comprise different independent parts that are connected to each other through network links. Large-scale storage systems consolidate multiple hard disks, and in some cases more than one thousand disks are managed by a single distributed storage system. To recover from a hard disk failure, the storage software maintains replicas of the data on more than one disk, which increases the reliability of the data and ensures the recoverability of the system. The Distributed Storage Systems (DDS) are specific to a storage medium and operate independently of one another, each according to the medium used for storing the data. Each DDS independently sends data through network links for replication purposes, in order to increase the reliability of the stored data in case of failures.
The amount of data generated by sensors, machines, and individuals is increasing exponentially. Intelligence, Surveillance and Reconnaissance (ISR) platforms have moved towards higher-resolution sensors and persistent surveillance, which has led to the collection of enormous volumes of data. Similarly, enterprises collect large amounts of operational data from Information Technology (IT) systems with the goal of improving operations and cyber security. Finally, the volume of data generated by people, especially in the context of social media, is growing explosively. This flow of multi-source data creates an opportunity to extract real-time information that is immediately relevant to users. Big data includes information garnered from social media, data from internet-enabled devices (including smart phones and tablets), machine data, video and voice recordings, and the continued preservation and logging of structured and unstructured data. Big data refers to the dynamic, large and disparate volumes of data created by people, tools and machines, which are distributed over a set of storage systems. The gathered data may be stored beforehand or may be a continuous stream that is accessed, stored and analyzed with distributed algorithms and frameworks. Big Data analytics inherently requires a set of distributed computing, networking and storage resources that may be available locally or rented from a cloud infrastructure. Such systems occasionally need memory-based storage solutions for Big Data processing applications. However, the volatility of data held in RAM modules challenges the data reliability, and hence the data availability, of a DDS, especially when a RAM-based DDS is used as a caching mechanism for a disk-based DDS. Because such systems operate independently of one another, the same data is unnecessarily duplicated in transfers through the network links.
The aforementioned drawbacks create a need for a better system, with more efficient methods, that eliminates these redundancies and better combines DDS over different storage media. Further, there is a need for a system that combines different types of memory in order to improve the efficiency of network connectivity in the replication process.
The above-mentioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.