I. Field of the Invention
The present invention relates generally to computer network management systems, and more specifically to the intelligent management of archival data in such networks.
II. Related Art
Given the current prevalence of personal computers (PCs), it is not uncommon for each employee in a company to have his or her own workstation for performing daily tasks such as word processing, CAD/CAM, creating spreadsheets, browsing the Internet, etc. Companies often employ computer networks to link these individual workstations for electronic mail communications and for sharing data and resources (e.g., peripherals). These networks, for a medium to large size company, can easily contain over 10,000 workstations. Consequently, it is now necessary for companies to have a computer services or management information systems (MIS) department.
These MIS departments are charged not only with maintaining the network but also with archiving the data located on it. It is typical for MIS departments to perform full and incremental "nightly" back-ups (i.e., archiving) of the data located on each of the 10,000 or so computers on the network. Back-ups (also commonly referred to as "dumps") consist of spare copies of all the files on a computer workstation's disk, which are made periodically and kept on magnetic tape or another removable storage medium. This essential precaution is needed, for example, in case an employee experiences a "disk crash" or accidentally deletes the only copy of a file containing critical data (e.g., a project file an employee has been working on for months).
As computer technology continues to advance at a rapid-fire pace, the data storage capacity, size of computer applications, and consequently, the size of user files are increasing dramatically. Thus, the task of performing back-ups is becoming increasingly complex. It is not uncommon for a computer network containing 10,000 computers to have two to four terabytes (one terabyte being 2^40 = 1,099,511,627,776 bytes, or roughly 10^12 bytes) of data that needs to be archived during "full" back-ups and 350-400 gigabytes (one gigabyte being 2^30 = 1,073,741,824 bytes) of data that needs to be archived during nightly "incremental" back-ups. Even with the price of storage media (i.e., hard drives, magnetic tapes, CD-ROMs, etc.) decreasing and the capacity and speed of such media increasing, the costs in terms of time, labor, and supplies can be a significant amount of overhead for any company; at least $500,000 per year is not uncommon.
MIS departments (and computer scientists in general) have contemplated the problem of archiving large amounts of data and have developed several solutions. Many of these solutions have concentrated on faster computers or data compression techniques to minimize the amount of storage space archived data occupies. There are many compression algorithms and utilities such as compress (the standard UNIX™ compression utility), gzip (written by the Free Software Foundation's GNU project), and pkzip (a product of PKware, Inc. primarily for MS-DOS™ machines).
Another conventional scheme for dealing with the problem is reducing the amount of data needing to be backed up on a regular basis. One such common scheme is known as the Grandfather-Father-Son (GFS) back-up scheme. In the GFS scheme, a full "Grandfather" back-up is made at every pre-defined interval of time (e.g., monthly). A "Son" back-up is made one week later, archiving the data that has changed since the last "Grandfather" back-up. Finally, a "Father" back-up is made another week later to archive the data that has changed since the last "Son" back-up. The "Father-Son" process is then repeated until it is time for the next "Grandfather" back-up and the entire GFS process then repeats itself.
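The GFS rotation described above can be sketched as a simple weekly schedule. This is only an illustrative sketch: the function name `gfs_schedule`, the four-week cycle length, and the choice to alternate Son and Father back-ups on odd and even weeks are assumptions made for the example, not details taken from any particular back-up product.

```python
def gfs_schedule(weeks_per_cycle=4):
    """Return (back-up type, baseline) for each week of one GFS cycle.

    Assumption: week 0 is the full Grandfather back-up, and the
    Son/Father pair then alternates weekly until the cycle ends.
    """
    schedule = []
    for week in range(weeks_per_cycle):
        if week == 0:
            # Full back-up of all data; it has no baseline.
            schedule.append(("Grandfather", None))
        elif week % 2 == 1:
            # Archives data changed since the last Grandfather back-up.
            schedule.append(("Son", "Grandfather"))
        else:
            # Archives data changed since the last Son back-up.
            schedule.append(("Father", "Son"))
    return schedule


for week, (kind, baseline) in enumerate(gfs_schedule()):
    since = f"changes since last {baseline}" if baseline else "full back-up"
    print(f"week {week}: {kind} ({since})")
```

Only the Grandfather back-up copies everything; the other weeks copy only what changed relative to their baseline, which is how the scheme reduces the volume archived on a regular basis.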
Another conventional back-up scheme to limit the amount of data needing to be archived is the use of an exclusion list. This scheme identifies pre-determined directories (e.g., application and operating system directories) which will not be archived from each user's workstation. This type of scheme results in redundant data not being archived. However, this scheme results in many lost files because users often are not savvy enough to know not to save files in these excluded directories. Furthermore, many programs often save special files (e.g., initialization or *.ini files) in these application and operating system directories. If these special files are not archived, a restoration process cannot reload them and thus a user's application preferences (e.g., mouse speed, screen color, Internet Web page bookmarks, etc.) are forever lost. This process can no doubt lead to a lot of frustration. In addition, none of the conventional compression techniques or back-up schemes have addressed the amount of redundant data that is repeatedly archived.
What is needed, therefore, is a system and method for the intelligent management of archival data in a computer network where the resources and associated costs of making back-ups are reduced. Further, what is needed is a system and method that allows users' unique files to be archived regardless of what directory contains them through the creation and use of intelligent inclusion/exclusion lists and rules.
The present invention is a novel and improved system and method for intelligently managing data in the archival and restoration processes of a computer network. The system includes a plurality of computer workstations and a server which initiates and controls the archival process via an Inclusion/Exclusion engine. The engine includes a set of rules which allows MIS personnel on the server and users on the workstations to select the files and directories to include in or exclude from the archival process.
The engine makes use of an inclusion list that contains the signatures of the files and directories included in the archival process based on the rules selected and applied. An exclusion list that contains the signatures of the files and directories excluded from the archival process based on the rules selected and applied is also included. The exclusion list also holds signatures of files deemed redundant data by the server.
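The way rules might populate the two lists can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not define how a signature is computed, so a SHA-256 digest of the file content is assumed here, and the names `signature`, `build_lists`, `exclude_rules`, and `redundant_signatures` are hypothetical.

```python
import hashlib


def signature(content: bytes) -> str:
    # Assumption: a file's signature is the SHA-256 digest of its content.
    return hashlib.sha256(content).hexdigest()


def build_lists(files, exclude_rules, redundant_signatures):
    """Sort file signatures into an inclusion list and an exclusion list.

    files: dict mapping path -> file content (bytes)
    exclude_rules: callables taking a path and returning True to exclude it
    redundant_signatures: signatures the server has deemed redundant
    """
    inclusion, exclusion = set(), set()
    for path, content in files.items():
        sig = signature(content)
        if sig in redundant_signatures or any(rule(path) for rule in exclude_rules):
            exclusion.add(sig)
        else:
            inclusion.add(sig)
    return inclusion, exclusion


files = {
    "C:/WINDOWS/notepad.exe": b"notepad binary",
    "C:/WINDOWS/win.ini": b"user preferences",
    "C:/TEMP/scratch.tmp": b"scratch data",
}
rules = [lambda path: path.endswith(".tmp")]
redundant = {signature(b"notepad binary")}

inclusion, exclusion = build_lists(files, rules, redundant)
print(len(inclusion), "included;", len(exclusion), "excluded")
```

In this example the user's `win.ini` is included even though it sits in an operating system directory, because the decision is made per file rather than per directory.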
The method of the present invention, during archiving, includes identifying a redundant file that is present in a pre-determined number of computer workstations, copying the redundant file to a low-latency storage area, and updating a commonality list with the signature of the redundant file. The redundant file is then assigned a commonality list identification number which is placed into the archive as a placeholder. Unique files present in the computer workstations are copied into the archive. The archive is then stored on a tape storage medium.
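The archiving steps above can be sketched in a few lines. This is an illustrative sketch only: in-memory dictionaries stand in for the workstations, the low-latency storage area, and the tape archive; a SHA-256 digest is assumed as the signature; and the names `archive_network`, `commonality`, and `low_latency` are hypothetical.

```python
import hashlib
from collections import Counter


def signature(content: bytes) -> str:
    # Assumption: a file's signature is the SHA-256 digest of its content.
    return hashlib.sha256(content).hexdigest()


def archive_network(workstations, threshold):
    """workstations: dict mapping name -> {path: content bytes}."""
    # Count how many workstations hold each distinct signature.
    counts = Counter()
    for files in workstations.values():
        counts.update({signature(c) for c in files.values()})

    commonality = {}  # signature -> commonality list identification number
    low_latency = {}  # identification number -> single stored copy
    archive = []      # entries: (workstation, path, kind, payload)
    for name, files in workstations.items():
        for path, content in files.items():
            sig = signature(content)
            if counts[sig] >= threshold:
                # Redundant file: store one copy, archive only a placeholder.
                if sig not in commonality:
                    commonality[sig] = len(commonality) + 1
                    low_latency[commonality[sig]] = content
                archive.append((name, path, "placeholder", commonality[sig]))
            else:
                # Unique file: copy the content itself into the archive.
                archive.append((name, path, "file", content))
    return archive, commonality, low_latency


workstations = {
    "ws1": {"app.exe": b"APP", "notes.txt": b"alice"},
    "ws2": {"app.exe": b"APP", "plan.txt": b"bob"},
}
archive, commonality, low_latency = archive_network(workstations, threshold=2)
for entry in archive:
    print(entry)
```

Here `app.exe` appears on both workstations, so its content is stored once in `low_latency` and each archive entry for it carries only the identification number.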
During restorations, when a placeholder (i.e., a commonality list identification number) is encountered in the archive, the file is located on the low-latency storage area using the commonality list identification number and then copied to the workstation being restored. When a unique file is encountered in the archive, it is copied directly from the archive to the computer workstation.
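The restoration logic just described can be sketched as a lookup keyed on the placeholder. This is a minimal sketch with assumed data shapes: each archive entry is taken to be a tuple of (workstation, path, kind, payload), the low-latency storage area is modeled as a dictionary keyed by commonality list identification number, and the name `restore_workstation` is hypothetical.

```python
def restore_workstation(name, archive, low_latency):
    """Rebuild the {path: content} mapping for one workstation."""
    restored = {}
    for owner, path, kind, payload in archive:
        if owner != name:
            continue
        if kind == "placeholder":
            # Payload is a commonality list identification number:
            # fetch the shared copy from the low-latency storage area.
            restored[path] = low_latency[payload]
        else:
            # Unique file: the content is held directly in the archive.
            restored[path] = payload
    return restored


low_latency = {1: b"APP"}
archive = [
    ("ws1", "app.exe", "placeholder", 1),
    ("ws1", "notes.txt", "file", b"alice"),
    ("ws2", "app.exe", "placeholder", 1),
]
restored = restore_workstation("ws1", archive, low_latency)
print(sorted(restored))
```

Because redundant files are read from the low-latency area rather than from tape, only the unique files require access to the archive medium.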
An advantage of the present invention is that not only can it speed up the back-up process by eliminating redundant data, but the restoration process is sped up as well.
Another advantage of the present invention is that, because less data is involved, the archival and restoration processes take up fewer network resources and leave more bandwidth for regular computing activities.
Another advantage of the present invention is that, unlike with a conventional exclusion list, computer workstation users within the network do not have to be aware of which directories files can or cannot be placed in.
Another advantage of the present invention is that new software applications installed on the network and then on individual workstations can automatically be labeled as redundant data and thus never be archived.
Yet another advantage of the present invention is that, by eliminating the repeated archiving of redundant data, a company can reduce its MIS department's overall operational costs. Further features and advantages of the invention as well as the structure and operation of various embodiments of the invention are described in detail below with reference to the accompanying drawings.