1. Field of the Invention
The invention relates to computer file backup systems and methods. More specifically, the invention relates to client-server systems for backing up computer files both locally at a customer or primary site and remotely at an offsite or datacenter site while providing rapid recoverability of lost or damaged files as well as file metadata in the event of destruction of all or part of the primary site (or at least its computer system).
2. Description of Related Art
Businesses, universities, and any entities using computers and computer files need to protect their files from accidental loss or damage. A fire, an ill-placed and spilled beverage, or an electrical surge can degrade or eliminate an entire server's contents in seconds. Many entities back up their files onto a supplemental medium such as a tape or similar device periodically, e.g., nightly or weekly. Some of the systems that have evolved utilizing tape backup require manual insertion of a tape into a tape drive or reader on a periodic basis. Any system requiring human intervention to replace a tape is potentially at the mercy of human error.
Automatic backup systems are also known. In many such systems, software on the onsite computer system automatically (e.g., at timed intervals) transfers a copy of the system's contents to a remote or offsite computer system. Although most file system backup occurs when there is little or no activity on the client or local onsite system (e.g., at night), such software-only systems have an inherent lag time in recovering lost or damaged files. Customers of such systems (i.e., the owners of the various local sites) typically must wait a significant and often impractically long time for recovered files to be transferred back to them, even over the Internet.
In 2005, the instant inventor devised a system in which a piece of backup equipment was installed onsite at the customer's site. This equipment was loaded with a mixture of Open Source software and proprietary code. Each evening, the onsite device would copy the customer's files and store them and then, later in the evening, transmit the changes to a central remote or offsite datacenter. Once at the datacenter, the customer's data was copied to tape. Because each customer was provided with its own onsite equipment, customers were able to recover data more rapidly. As an example, a modest-sized server would take a week to restore using a software-only system, whereas the same server could be recovered in one hour using the inventor's previous system utilizing an onsite backup device.
Yet even Applicant's previous system was not ideal. One problem that needed solving for file backup was caused whenever customers reorganized their files. Applicant's previous system would interpret this as a deletion of old content, and creation of new content. This meant that a simple reorganization could result in re-copying files back to the offsite datacenter in their entirety. Rather than minutes for an update, it could mean hours or even days of extra synchronization work.
The problem can best be illustrated with the following example. Say a customer has a primary device with a folder C:\FOLDER1 and that in that folder is a single file PROPOSAL.DOC. The first time the customer's files are backed up to the onsite device, the device creates a copy of the folder and file and stores the copy on its hard drive. Each night thereafter, the onsite device would compare its copy of PROPOSAL.DOC against the one on the primary device and would need only to update its copy with any changes.
The trouble arose when the customer renamed the folder as C:\FOLDER2. That night, the onsite device would recognize that there was no longer a folder called C:\FOLDER1, and so would delete its copy of the folder and its file. It would then recognize that there was a new folder C:\FOLDER2 that needed to be copied, forcing a complete new copy of the folder and file to be made and prepared for backup.
If PROPOSAL.DOC were very large, say in the range of hundreds of megabytes or even gigabytes, the additional copy could then take hours to transmit to the datacenter. The problem was even worse when the renamed folder held hundreds or thousands of files, all of which would then need to be recopied.
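The renamed-folder example above can be reproduced with a minimal sketch of a synchronization planner that identifies files by path alone. This is an illustration only, not the code of the previous system: the function name plan_path_keyed_sync, the Snapshot type, and the 500 MB file size are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: full path -> file size in bytes.
Snapshot = Dict[str, int]


def plan_path_keyed_sync(
    backup: Snapshot, primary: Snapshot
) -> Tuple[List[str], List[str], int]:
    """Plan a sync that recognizes files only by their path.

    Returns the paths to delete from the backup, the paths to copy in
    full from the primary, and the total number of bytes to copy.
    """
    to_delete = sorted(p for p in backup if p not in primary)
    to_copy = sorted(p for p in primary if p not in backup)
    bytes_to_copy = sum(primary[p] for p in to_copy)
    return to_delete, to_copy, bytes_to_copy


# State of the onsite copy after the first backup (size is an assumed figure).
backup = {r"C:\FOLDER1\PROPOSAL.DOC": 500_000_000}

# The customer renames the folder; the file contents are unchanged.
primary = {r"C:\FOLDER2\PROPOSAL.DOC": 500_000_000}

to_delete, to_copy, bytes_to_copy = plan_path_keyed_sync(backup, primary)
print("delete:", to_delete)
print("copy:", to_copy)
print("bytes to copy:", bytes_to_copy)
```

Because the planner compares paths only, an unchanged file under a renamed folder appears once as a deletion and once as a new file, so its full size is scheduled for copying again.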
Another problem was rooted in the Microsoft Windows® file protection system, referred to as Access Control Lists (ACLs). In Windows®, each file is considered to be owned by a user. Each user's credentials are represented by a code unique to the organization or customer, called a Security Identifier (SID). When a customer's files were backed up to the central datacenter, the SID information was stripped or removed from the datacenter copy. During a recovery, customers would need to first restore their files and then manually re-create the file security information. Even small companies can have tens of thousands of files, so this meant days of manual effort to re-create the ACLs.
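The effect of stripping SID information can be sketched in the same hedged way. The FileRecord type, the helper functions, and the sample SID value below are hypothetical and serve only to model the behavior described above; they do not use or represent any Windows® API.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Made-up SID value, used for illustration only.
SAMPLE_SID = "S-1-5-21-1111111111-2222222222-3333333333-1001"


@dataclass(frozen=True)
class FileRecord:
    path: str
    owner_sid: Optional[str]  # None models missing ownership information


def strip_security(record: FileRecord) -> FileRecord:
    """Model a datacenter copy that keeps the file but drops its SID."""
    return replace(record, owner_sid=None)


def count_needing_manual_security(records: List[FileRecord]) -> int:
    """Count restored files whose security must be re-created by hand."""
    return sum(1 for r in records if r.owner_sid is None)


primary_files = [
    FileRecord(r"C:\FOLDER1\PROPOSAL.DOC", SAMPLE_SID),
    FileRecord(r"C:\FOLDER1\BUDGET.XLS", SAMPLE_SID),
    FileRecord(r"C:\FOLDER1\NOTES.TXT", SAMPLE_SID),
]

# Back up to the datacenter (SIDs are stripped), then restore from that copy.
datacenter_copy = [strip_security(r) for r in primary_files]
restored_files = list(datacenter_copy)

manual_count = count_needing_manual_security(restored_files)
print("files needing manual security re-creation:", manual_count)
```

In this model every restored file comes back without an owner, so the manual effort grows in direct proportion to the number of files.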