Field of the Invention
The present invention relates to an apparatus, system and method for collecting, importing and modeling data, in particular data stored in a variety of computer systems.
Background of Invention
Organizations are running ever more sophisticated computer systems. For example, a small business with only 30 employees located at a single site may run one or two networks, with a single server. Employees may have different workstations or computers, manufactured by different OEMs and using different operating systems. The types of data created and manipulated by different employees will vary depending on their role, and the software they use.
As the requirements of IT systems grow organically, so the number of workstations, networks, servers and storage devices increases. Moreover, there is increasing variation in the OEM products and IT systems used within an organization. In larger organizations with thousands of employees spread across many sites, there is considerable variation in hardware and software both within and between the sites. Moreover, data retention and protection policies may vary between sites and between departments within (or between) sites. Accordingly, it is becoming increasingly difficult to manage data, especially within larger organizations, and to ensure that data is stored as efficiently and cost-effectively as possible, with maximum control and minimum access times. It is also difficult to manage the transfer of data from legacy hardware to replacement equipment as the IT infrastructure is refreshed.
Typically, all (or at least all important) information stored by an organization is backed up overnight or at other regular intervals. There are two primary reasons for backing up data. The first is to recover data after loss. The second is to allow recovery of data from an earlier time according to a user-defined retention policy. Accordingly, backed up data will commonly be given an expiry date setting the time for which the copy of the backed up data should be kept.
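By way of illustration only, the expiry date described above can be sketched as follows. The record layout and the names `BackupCopy`, `make_copy` and `expired` are hypothetical and do not correspond to any particular back up product; the sketch assumes a retention policy expressed simply as a number of days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    """One backed-up copy of a data set (hypothetical record layout)."""
    name: str
    created: datetime
    expires: datetime


def make_copy(name: str, created: datetime, retention_days: int) -> BackupCopy:
    """Stamp a copy with an expiry date derived from the retention period."""
    return BackupCopy(name, created, created + timedelta(days=retention_days))


def expired(copies: List[BackupCopy], now: datetime) -> List[BackupCopy]:
    """Return the copies whose expiry date has been reached at time `now`."""
    return [c for c in copies if c.expires <= now]
```

For example, a nightly copy kept for 30 days and a yearly copy kept for 365 days, both made on the same date, would expire at different times, so only the nightly copy would be returned by `expired` one month later.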
Since at least one copy must be made of all data on a computer system that is worth saving, storage requirements can be very large and back up systems can be very complicated. To add to the complexity, there are many different types of data storage device that are useful for making back ups, many different back up models, many different access types and many different providers of back up solutions.
Briefly, back ups can be unstructured or structured. Unstructured back ups are generally file system type back ups, in which a copy of data is made on a medium or series of media with minimal information about what was backed up and when. Structured back ups generally use product-specific formats such as SQL, Oracle and DB2.
Irrespective of whether structured or unstructured, back ups may be: full, in which complete system images are made at various points in time; incremental, in which data is organized into increments of change between different points in time; reverse delta, in which a mirror of the recent source data is kept together with a series of differences between the recent mirror and earlier states; and continuous, in which all changes to data are immediately stored.
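The difference between the incremental and reverse delta models can be illustrated with a minimal sketch. It assumes, purely for illustration, that a system state is a mapping from file path to file content; all function names are hypothetical and no vendor's actual format is implied.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, str]            # path -> file content
Delta = Dict[str, Optional[str]]  # path -> new content, or None if deleted


def diff(old: State, new: State) -> Delta:
    """Changes that turn `old` into `new`."""
    delta: Delta = {p: c for p, c in new.items() if old.get(p) != c}
    delta.update({p: None for p in old if p not in new})
    return delta


def apply_delta(state: State, delta: Delta) -> State:
    """Return a copy of `state` with `delta` applied."""
    out = dict(state)
    for path, content in delta.items():
        if content is None:
            out.pop(path, None)
        else:
            out[path] = content
    return out


def incremental(states: List[State]) -> Tuple[State, List[Delta]]:
    """Full image of the oldest state plus forward increments of change."""
    increments = [diff(a, b) for a, b in zip(states, states[1:])]
    return dict(states[0]), increments


def reverse_delta(states: List[State]) -> Tuple[State, List[Delta]]:
    """Mirror of the most recent state plus backward differences."""
    # After reversal, deltas[0] turns the mirror into the previous state.
    deltas = [diff(b, a) for a, b in zip(states, states[1:])][::-1]
    return dict(states[-1]), deltas


def restore(base: State, deltas: List[Delta], steps: int) -> State:
    """Apply the first `steps` deltas to `base`."""
    state = dict(base)
    for d in deltas[:steps]:
        state = apply_delta(state, d)
    return state
```

In this sketch, restoring the most recent state from an incremental back up requires every increment to be applied, whereas under the reverse delta model the most recent state is available directly from the mirror and only older states require differences to be applied.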
In addition, various media can be used for storing data, including magnetic tapes, hard disk, optical storage, floppy disk and solid state storage. Typically, an enterprise will hold its own back up media devices, but remote back up services are becoming more common.
To add a further layer of complexity, back up may be: on-line, in which an internal hard disk or disk array is used; near-line, such as a tape library with a mechanical device to move media units from storage to a drive where the media can be read/written; off-line, in which direct human action is required to make access to the storage media physically possible; off-site; or at a disaster recovery centre.
Moreover, the different back up providers use proprietary systems for organizing back ups. These systems can handle the copying or partial copying of files differently; and they can copy file systems differently, for example by taking a file system dump or by interrogating an archive bit or by using a versioning file system. They may also handle the back up of live data in different ways. In addition to copying file data, back up systems will commonly make a copy of the metadata of a computer system, such as a system description, boot sector, partition layout, file metadata (file permissions, owner, group etc), and system metadata (as different operating systems have different ways of storing configuration information).
In addition, the different back up providers frequently manipulate the data being backed up to optimize the back up speed, the restore speed, data security, media usage and bandwidth requirements. Such manipulation may involve compression, duplication and deduplication, encryption, multiplexing, refactoring and staging, and varies between the different products and different vendors.
It will be apparent that when a number of different back up systems are used, it can be very difficult to properly manage data. Similar or greater degrees of complexity arise in computer systems in the primary storage layer, which acts as the source of data to be backed up by back up systems.