In this era of stringent regulatory and compliance requirements, there is an ever-increasing need for storage capacity for digital archives and historical data, that is, for digital preservation. Unstructured data such as documents, images, emails and multimedia needs to be stored for retrieval and analysis at a later date. In a large organization, such data can easily range from a few hundred terabytes to petabytes. There is a clear trend towards the use of disks for storing archives. Disk-based archives provide random and faster access to required files, offer large capacity and bandwidth at low cost, and make it possible to proactively check and maintain the integrity of the archives.
However, the major technical challenges in creating a large disk-based storage archive are:
Obtaining large capacity at low cost remains a major challenge where hundreds of terabytes to tens of petabytes of data storage are required.
The archive must deliver high read and write throughput while files are continuously added to the archive system on a daily basis and retrieved from it in response to queries.
Further, archives must be stored for a considerably long period of time, so the hardware and operating system will inevitably have to be refreshed periodically. The challenge here lies in migrating data automatically and transparently to the refreshed hardware whenever the infrastructure changes.
Lastly, data integrity is a critical issue: data must be stored in the archival system without any loss and with its integrity intact throughout the lifetime of the archive.
US Patent Application US20100199123 presents a system, method and a computer program which replaces a failed node storing data relating to a portion of a data file. An indication of a new storage node to replace the failed node is received at each of a plurality of available storage nodes. The available storage nodes each contain a plurality of shares generated from a data file. These shares may have been generated based on pieces of the data file using erasure coding techniques. A replacement share is generated at each of the plurality of available storage nodes. These replacement shares may later be used to reconstruct the data file.
US Patent Application US20100064166 shows an exemplary system and method providing a plurality of data services by employing splittable, mergeable and transferable redundant chains of data containers. The chains and containers may be automatically split and/or merged in response to changes in storage node network configurations and may be stored in erasure coded fragments distributed across different storage nodes. Data services provided in a distributed secondary storage system utilizing redundant chains of containers may include global de-duplication, dynamic scalability, support for multiple redundancy classes, data location, fast reading and writing of data, and rebuilding of data due to node or disk failures.
However, the above-cited prior art does not provide granular control of QoS for individual files stored in the system, nor does it provide any mechanism to automate self-repair and reconstruction of failed nodes to improve system performance. This adversely impacts the scalability, throughput and instant data availability of the archival systems.
What is needed, therefore, is a system that addresses the above-stated technical problems of the prior art. To this end, the present invention proposes a novel system and method which introduces a reliability parameter indicative of the QoS level provided to each file. This parameter allows granular, file-level control over the protection given to a file against data loss in the event of hardware failure, and improves the overall performance of the system by making it more responsive and reliable. What is also needed is a data archival system and method that ensures data integrity and provides increased data transfer bandwidth between the user and the archival system.