Many enterprises suffer from a data explosion problem. Data grows at an exponential rate each year, and the growth is largely a mixture of active, inactive and unusable data. A growing number of government regulations, for instance the Sarbanes-Oxley Act, the Health Insurance Portability and Accountability Act (HIPAA) and Basel II (the second of the Basel Accords), mandate that enterprises keep business-critical data for a certain timeframe. Statistics show that the percentage of inactive and unusable data increases nonlinearly in an operational environment. This problem leads to the following major issues:
1. data explosion;
2. application performance degradation; and
3. increase in information technology (IT) costs, e.g., maintenance, hardware and storage costs.
Data archiving is a practical approach that selects inactive and unusable data from an operational environment and moves this data to an archive space for future use. When the archived data is no longer needed, it can be removed from the archive space. This is known as a data purge. Thus, not only can the performance of enterprise applications be enhanced, but costs can also be reduced. Aside from a “doing-nothing” approach, which leads to performance degradation and increasing IT costs, there are three major data archive solutions adopted today.
The first is to do the data archiving manually. In this case, users select, move and remove data manually. For instance, database administrators may issue queries in Structured Query Language (SQL), or use generic database utilities, to query relational databases and save the query results as files. They then transfer the files to another location using, as an example, File Transfer Protocol (FTP), which is a standard network protocol used to exchange and manipulate files over a TCP/IP based network, such as the Internet. This solution may seem simple and does not create large upfront costs, but it carries high risk and frequently leads to data integrity issues. The method is therefore potentially damaging to an enterprise and may even result in disaster.
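By way of illustration only, the manual select, move and remove steps described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a procedure taken from any particular enterprise: the `orders` table, its columns, the cutoff date and the archive file name are all hypothetical, an in-memory SQLite database stands in for the operational database, and the FTP transfer step is omitted.

```python
import csv
import os
import sqlite3
import tempfile


def manual_archive(conn, cutoff_date, archive_path):
    """Export rows older than cutoff_date to a CSV file, then delete them.

    Table and column names (orders, id, created, amount) are hypothetical.
    Returns the number of rows written to the archive file.
    """
    # Step 1: select the inactive rows with a plain SQL query.
    rows = conn.execute(
        "SELECT id, created, amount FROM orders WHERE created < ? ORDER BY id",
        (cutoff_date,),
    ).fetchall()

    # Step 2: save the query result as a file. In the manual approach the
    # file would then be transferred elsewhere (e.g. by FTP); omitted here.
    with open(archive_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created", "amount"])
        writer.writerows(rows)

    # Step 3: remove the archived rows from the operational table.
    # Nothing ties this delete to a verified, completed transfer, which is
    # one place where the manual approach can lose data.
    conn.execute("DELETE FROM orders WHERE created < ?", (cutoff_date,))
    conn.commit()
    return len(rows)


def demo():
    """Run the sketch against a tiny in-memory table (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, created TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2005-03-01", 10.0), (2, "2006-07-15", 20.0), (3, "2009-01-20", 30.0)],
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "orders_archive.csv")
        archived = manual_archive(conn, "2008-01-01", path)
        with open(path) as f:
            file_lines = len(f.read().splitlines())
    remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return archived, remaining, file_lines
```

In this sketch the export and the delete are independent steps, so a failed or partial file transfer after step 2 would go unnoticed by step 3. That gap is one way the data integrity issues noted above can arise.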
The second is for applications to have their own specific “data archive” function, provided there is sufficient planning and funding. This solution has the following drawbacks.
The first drawback is that when an application needs to be rewritten, the data archive function or component also needs to be rewritten due to tightly coupled architectures.
The second drawback is that it prolongs the project development lifecycle because the user has to do additional archive function development, such as analysis, design, implementation, and testing.
The third drawback is additional cost; every project has to devote resources to the implementation of the archive function.
Another drawback is that the data archive function is highly specialized for particular types of data and is not generic enough to satisfy changing data archive requirements for different applications.
A third major data archive solution is to use an off-the-shelf data archive product. Many software companies have offered their own data archive products. For example, International Business Machines Corporation (IBM), Hewlett-Packard Company (HP), Oracle Corporation, SAP AG and other corporations have developed data archive products. These data archive products generally may provide a configurable console and programmable tool for data archive.
However, these current products also have limitations. One such limitation is limited support for data sources and archive locations. Most of these tools support only specific relational databases (such as IBM® DB2® and the Oracle Database, commonly referred to as Oracle RDBMS or simply Oracle) and merely archive to tables or flat files.
Another limitation is that there is limited data type support. Most present data archive products support only common data types in a relational database.
In addition, some archive tools simply copy the documents (files) just like a backup, without considering business logic.
Finally, present archive tools are not flexible enough to allow archive rules to be changed. Though some archive tools integrate a larger set of archive rules from which users can select, these rules are hard-coded in the system and difficult to change.
The above issues and drawbacks limit the general usage of these solutions. As is known, in an enterprise environment, there is not only data stored in databases, but also data stored in files, documents, emails and XML (Extensible Markup Language).
Also, such tools have only a few basic, coarse-grained and fixed archive rules, e.g., which storage pool to target, what to do if a file is in use, and how long to keep the data. Therefore, there is a need to solve the problems described above.