The instant invention generally relates to systems for file sharing, backing up and distributing data and more specifically to a method and system for asynchronous transmission, backup, distribution of data and file sharing.
Since personal computers were first introduced into the workplace, a significant problem has been sharing, preserving by backing up, and distributing data between multiple users. This problem becomes even more difficult when the users are scattered over the country and reside in autonomous organizations. For example, if a company wishes to work on a single project with another company, an individual, a group of individuals or various combinations of entities, the question arises of how to selectively grant access and distribute data. How do you ensure that only data necessary for the project is distributed? How do you prevent unauthorized access to files or data which must remain segregated? And how do you simply keep track of exactly what stage the document is in?
One methodology includes the use of a local area network (commonly known as a LAN) or a wide area network (WAN) together with one or more fileservers, a fileserver being at least one centralized computer where the data is stored. Fileserver systems include NFS, Novell Netware, Microsoft's SMB, AFS, and other systems. These systems function well when employed by a single large user, such as a company, to allow its employees to access files and other data. A recognized shortcoming attendant to LAN systems is that they preclude third party usage by preventing access to an organization's internal network. The distribution of data is somewhat secure, but often members of the same organization can access data without proper need, authorization or clearance. It should be noted that a LAN interconnects computers within a limited geographical area and LAN systems are susceptible to unacknowledged viruses. Notwithstanding the fact that the employee is authorized, each person with LAN access has, as of right, access to all the files on the server. The files stored on a fileserver can be changed, deleted, modified, or updated. Of more importance, a "crash" of the central server leads to loss of data and failure, even with daily backup procedures, since the data may have been changed between the time of the last backup and the crash. Additionally, the users will not be able to access the data while the data is being restored from a traditional backup device, and immediate access to older versions of files is not possible. Another method utilizes a wide area network (WAN). WAN systems allow third parties to receive distributed data by a distributed network. WAN systems are cumbersome, expensive and do not address the "need to know" aspects of transmitting, storing, distributing and backing up data.
Distributed file systems arise from the communication between a file server and interconnected computers which enable users to share data. It is preferred that the system should include a number of replicated central file servers to distribute the data. One such system is the Andrew File System (AFS), distributed by Transarc Corporation. AFS is a network of computers where there is a hierarchy of client machines and server machines. A user is granted access to the server machine either with or without the power to change data. Distributed file servers solve some of the issues of data reliability, but still do not provide access to older versions of data except in a limited way. For example, distributed file servers may provide snapshots every hour from the past day only.
The Internet has made file and data sharing an everyday occurrence, whether it is within an organization or with third parties. The best known mode is e-mail (electronic mail), which allows the user to send files and data, via the Internet, to others by sending same to a URL and user name. E-mail transmission occurs over what may be termed a public network. The vagaries and frailties of Internet transmission of data are legion. They include omission of attachments, transmission to unintended parties, and the unauthorized access to confidential data.
Well known in the art is the use of firewalls and encryption. A firewall is merely a software/hardware gateway utilizing an access filter, which is installed where a network is connected to the Internet. By checking all requests against acceptable access criteria the software either allows or denies access to the server where the files and data are stored. Identifying information like a token, the user's machine identification, and the Internet Packet at the firewall results in access or denial. The frailties of this system are well known, since anyone who can intercept access information can enter the system. Conversely, users who have valid access are often denied because of corrupted information or software incompatibility.
One method allowing third parties access to internal fileservers is the virtual private network (commonly known as a VPN). A VPN allows one or more third party users to penetrate the firewall and gain access to the distributor's server. The VPN moderates data communications between members of a defined VPN group, and allows access to the file server on the internal organization network. While data reception may require various passwords, encryption/decryption techniques and authentication, each group of users must have a set series of rules for them. In order to utilize a VPN a user must be connected to the Internet, for example by dialing into an Internet Service Provider (commonly known as an ISP), and must maintain at least a temporary membership via a name and password (See U.S. Pat. No. 6,055,575 issued to Paulsen et al).
VPNs by their very nature suffer from a series of vagaries. First, the user must be connected to the Internet in order to access the data. If the connection fails on either side (the user's or the organization's), the user will not be able to access the data. Second, after access is granted through the firewall, the user has potential access to all of the internal network. Therefore, the proprietor must ensure that all of its servers do not respond to requests from outside computers and that users with limited access cannot access, delete, or modify requested data or files. Identification of the user, so that clearance or rejection may take place, relies on a modality which labels and reads Internet Packet level information relating to the address and destination of information. This method is complex and at times does not prevent unauthorized or inadvertent access to data, since every computer on the internal network must be protected. VPNs therefore are of limited use for giving third party access to data. Similar remote access methods such as SSH and PC Anywhere have similar problems.
Other distributed file systems and file sharing systems include the following: Coda, The Cedar Filesystem, Freenet, and CVS. None of the foregoing as set out in the description below provide the same or similar utility.
U.S. Pat. No. 6,175,917. Method and Apparatus for Swapping A Computer Operating System. The '917 patent discloses an invention which utilizes a computer system with a number of different storage memories, each with an operating system program and an identifier. The identifier may be switched between storage memories so if there is a failure, one can switch to the other version. The VPN unit is utilized to maintain lookup tables for members of particular groups. When data packets are sent between members of the same VPN, the VPN unit processes the packet, makes sure it is properly encrypted and adds authentication information to it. The receiving end also makes sure the corresponding VPN is authenticated, then decrypts and decompresses the packet.
U.S. Pat. No. 6,173,399. Apparatus for Implementing Virtual Private Networks. Discloses a protocol and architecture for implementing virtual private networks for using a public network space for secure private networks. A packet of information is sent between source and destination addresses, a check is made that both are members of the same VPN group, and the VPN Unit processes the data packet from the sending side to make sure it is encrypted, authenticated and optionally compressed. The receiving VPN Unit handles the process of decrypting and authenticating before sending it to its final destination. Also disclosed is the use by remote clients, where the VPN Unit is simulated in software which operates in conjunction with the communication software for connecting the remote client to the associated local Internet Service Provider.
U.S. Pat. No. 6,131,148. Snapshot copy of a secondary volume of a PPRC pair. The '148 invention provides a method and apparatus for setting up a Peer-to-Peer Remote Copy (PPRC) session and snapshot copying a remote secondary volume to another volume on the remote subsystem. The apparatus includes a primary storage subsystem having a primary data-storage device with at least a primary volume. A primary processing unit relays a request to perform a snapshot copy of at least a portion of the primary volume to a secondary storage subsystem. The secondary storage subsystem includes a secondary data-storage device having a secondary volume which is maintained in a duplexed state with the primary volume through the use of remote copy sessions. A secondary processing unit, responsive to the relayed request, can perform an equivalent of a snapshot copy of at least a portion of the primary volume by making a snapshot copy of a corresponding portion of the secondary volume.
U.S. Pat. No. 6,119,151. System and Method for Efficient Cache Management In a Distributed File System. Discloses a system for data management in a distributed file system in which a cache manager is used independent of the file system protocol. The cache is periodically written to non-volatile storage, from which it can be restored in the case of a power or system failure. The cache manager is the trigger that determines whether to store information.
U.S. Pat. No. 6,101,585. Mechanism for incremental backup of on-line files. A backup mechanism enables incremental backup operations for on-line files of a computer system having an archive bit attribute associated with each file. The mechanism comprises an archive bit change number (ABCN) attribute that is also associated with each file and that is manipulated by a file system of the computer to reflect a correct value of the archive bit when the file is modified during a current backup operation. The ABCN attribute is incremented each time the file is modified to ensure that the file is accurately copied to secondary backup storage during a subsequent incremental backup operation. One of the methods of accomplishing this consists of: (1) creating a snapshot file in a read-only container, and duplicating contents of the on-line file to the snapshot file, each file initially having an archive bit change number (ABCN) attribute of the same value; (2) modifying the on-line file to reflect changes by a user; (3) asserting the archive bit and incrementing the ABCN of the on-line file in response to the step of modifying; (4) backing up the contents of the snapshot file with the archive bit to a backup file on a secondary storage device; (5) issuing a clear archive bit command upon completion of an incremental backup operation directed to the snapshot file; (6) comparing the ABCN attribute of the on-line file with the ABCN attribute of the backup file; (7) maintaining assertion of the archive bit in the on-line file if the ABCN attributes of an on-line file ID and a snapshot file ID do not match; and (8) clearing the archive bit in the on-line file if the ABCN attributes of the on-line file ID and the snapshot file ID match, thereby enabling an accurate incremental backup operation.
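The eight steps summarized above can be illustrated with a minimal Python sketch. It is a simplified model of the archive bit and ABCN bookkeeping as described in this summary only, not the implementation of the '585 patent; the names `FileState`, `make_snapshot`, `modify` and `clear_archive_bit` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical in-memory model of a file with its backup attributes."""
    content: bytes
    archive_bit: bool = True  # asserted: the file still needs backing up
    abcn: int = 0             # archive bit change number


def make_snapshot(online: FileState) -> FileState:
    """Step (1): duplicate the on-line file, carrying the same ABCN value."""
    return FileState(online.content, online.archive_bit, online.abcn)


def modify(online: FileState, new_content: bytes) -> None:
    """Steps (2)-(3): a user change asserts the archive bit and bumps the ABCN."""
    online.content = new_content
    online.archive_bit = True
    online.abcn += 1


def clear_archive_bit(online: FileState, snapshot: FileState) -> None:
    """Steps (5)-(8): clear the bit only if the file did not change mid-backup."""
    if online.abcn == snapshot.abcn:
        online.archive_bit = False
    # On a mismatch the bit stays asserted, so the next incremental
    # backup picks up the change made while the backup was running.
```

In this model a file modified between the snapshot and the clear command keeps its archive bit asserted, which is the property the ABCN comparison is described as providing.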
U.S. Pat. No. 6,023,710. System and method for long-term administration of archival storage. Simplifies the long-term administration of remote or archive storage by collecting multiple portions of the same files stored in different backup or archive storage sessions at different times into a single updated version of the file which may then be placed on backup or archive storage. Identifies the various backup or archive sessions containing portions of the file of interest. It then retrieves these various portions and determines which is the most current version. The most current version of all portions are then assembled and coalesced into a single updated version. The system works with remote backup, local backup or archive storage and reduces the time necessary to retrieve information from a backup.
U.S. Pat. No. 5,835,953. Backup system that takes a snapshot of the locations in a mass storage device that has been identified for updating prior to updating. A system and method for maintaining logically consistent backups using minimal data transfer is disclosed. The system utilizes more than one device, having a backup storage device and one or more primary systems capable of storing information that is to be backed up on the backup storage device. The primary systems identify changes that are going to be made to the mass storage device. The combined affected locations in the mass storage device of these identified changes are then captured in a static snapshot when the mass storage device is in a logically consistent state. Only those data blocks changed since the last backup are then transferred to the backup system. The backup system can then store these changes or apply the changes to the backup storage device in order to bring the backup storage device current to a particular point in time.
U.S. Pat. No. 5,794,254. Incremental computer file backup using a two-step comparison of first two characters in the block and a signature with pre-stored character and signature sets. The system backs up computer files to a remote site via modem. Files of a user computer that are found in a common library at the remote site initially are not copied to the remote site, whereas files not in the library are copied to the remote site. Then, periodically the user computer determines which blocks have been changed, and the user computer transmits only changed blocks to the remote site. The blocks are gathered in "chunk" files, and when a chunk file reaches a predetermined size, it is transmitted to the remote site for updating the back up version of the respective file. The process then resumes identifying changed blocks. In addition to flagging the changed block for transfer, the process resynchronizes the local data file with the backed up version using a two-step comparison, first comparing the first two characters in the block with a pre-stored character set, and then, if the first comparison results in a match, comparing a digital signature of the changed block with a pre-stored signature. If either comparison results in a mismatch, the test is repeated using, as the first byte of the test block, the next byte in the sequence.
U.S. Pat. No. 5,771,354. Internet online backup system provides remote storage for customers using IDs and passwords which were interactively established when signing up for backup services. This invention makes it possible for a customer computer to connect to an online service provider computer by phone, Internet, or other method, pay a fee to said service provider, and obtain additional processing and storage resources for the customer's computer. Relevant in that it fools the computer into thinking that information is being stored locally. Offsite archival services are performed by accessing virtual disk drives. Customer files that are inactive for a specified period are automatically copied to on-line service virtual disks for offsite archiving. Many disks of varying size can exist for each customer. Virtual disks are mounted and customer files are copied to their original customer disk for restoration. Virtual disks inactive for a specified period can be copied to on-line service tape for long term offsite archival. A virtual disk can be considered an offsite archival storage area. Every customer file could be stored on virtual disk with directory structures maintained. A diskette can be provided to boot a customer computer and connect to the on-line service and boot a virtual disk copy of the customer computer system disk. An advantage virtual disks provide for offsite archival is that remote storage is accessible as if locally attached. Relevant in that it discloses the use of the Virtual disk drive.
U.S. Pat. No. 5,678,042. Network management System Having Historical Virtual Catalog Snapshots for Overview of Historical Changes to Files Distributively Stored Across Network Domain. Discloses a network management system where a domain administrating server (DAS) is coupled to a network-linking backbone of a network domain for scanning the network domain to retrieve or broadcast domain-wide information, where the DAS is capable of storing and maintaining a domain-wide virtual catalog, overseeing other domain-wide activities. In such a system the domain-wide virtual catalog contains file identifying information for plural files distributively stored in two or more file servers on the network domain. Workstations are coupled by way of the network-linking backbone to the domain administrating server for accessing the domain-wide information retrieved by the domain administrating server.
U.S. Pat. No. 5,410,697. Concurrency Management Using Version Identification of Shared Data as a Supplement to Use of Locks. Uses a token and a shared lock issued by a local lock manager and local cache manager in response to a page read request from a process to trigger storage of a new cache. When the request is processed the token issued to the process during a prior request must match the token stored in the resident copy. When this occurs all resident copies of the cache are invalidated, a new cache is written through to the backing store of the changed page and a copyback is given with a new token.
Known within the art are source code management systems such as Revision Control System (RCS) and Concurrent Versions System (CVS). They allow software developers to store a history of changes in the source code of a computer program, and in the more advanced systems to have multiple users work on different copies of the data and merge their changes together. These systems calculate the differences between the different versions and store them on the server, and are generally intended only for text based files. Also known within the art are file/data-sharing, groupware and distribution systems, such as Groove, which facilitate communication among groups of people. In accordance with the groove-type system, communication between Groove users is peer to peer (that is, one third party user's computer communicates directly with another third party user's computer). The connection is direct and the changes are sent directly to the other party. Unless both parties are connected to a network at the same time, and there is no hindering firewall, no communication is possible. Therefore, in order to use the Groove system a relay must be interposed: if client A wants to transmit to client B, then the data must be stored temporarily on a relay (a computer). Later, when B is connected to the network, it downloads the data from the relay, at which point the data is removed from the relay. In the instant invention, unlike the groove-type system, the relay is not just a temporary storage location; it is a database where the data resides permanently. In the instant invention, the database does not function merely to provide a drop off point where direct communication is not possible, but instead functions as the repository for the main copy of the data. In the groove-type system the data is ephemeral: it resides on the system only until a user requests it, so that once a client downloads the data it no longer exists there.
With the present invention once the user downloads the data, only a copy is transmitted and the immutable original continues to reside in the database, allowing others to download the data. As a result the user is secure in the knowledge that the data will be there in the future, so in the event of catastrophic destruction of data on the user-side the data can be retrieved from the database again. Simply, groove-net relies on a peer to peer solution unlike the present invention.
The Freenet system is yet another instance of a peer to peer system. To fashion a Freenet system one uses a large network of machines, communicating in a peer to peer fashion. Data can be inserted and requested, and is protected, by employing a key self-signing protocol. Freenet differs however in a number of fundamental points. First, the core of Freenet is a routing algorithm designed to move entries near "similar" entries, and to distribute them in such a way that popular data has more copies in the network than unpopular data. The result is a probabilistic system, where data that is inserted is not guaranteed to be retrievable by other users even if it exists somewhere in the system, depending on the location of the requestor. Less popular data, even if it is quite important, may be dropped from the system and disappear. The Freenet-type system is extremely decentralized, and requires many computers (hundreds or even thousands of computers). As a result a user who wishes to retrieve information may have to wait for a long period of time. Moreover, if the data is not accessed then it is likely it will be deleted from the system. This system is simply another peer to peer manifestation.
Email (using the SMTP protocol), although not strictly peer to peer and although an asynchronous data sharing mechanism, is quite different from the instant invention. Data is sent to a relay point, from which the recipient can retrieve it, optionally leaving it there for future use (e.g. the IMAP protocol). In email the message being sent is relayed through a series of SMTP servers until it reaches its destination, and the sender has no way of knowing if it arrived or was delayed or dropped on the way. Instead of being sent to a database, the data is merely sent to another server to either be re-sent or downloaded, or resides on the server (IMAP). In the present invention, the data is sent straight into the database, and the sender is assured that it will be stored as an immutable entry indefinitely. Email is less generic, supporting only a chronologically-ordered listing of messages, whereas the instant invention uses a more generalized database structure allowing much more complex and sophisticated forms of interchange. The present invention is not user oriented, but uses location-based access control to create shared areas, allowing shared areas for multiple users, integration with existent access control, preventing people from spamming addresses, and automatic creation of new areas, as opposed to email's extremely limited comparable facilities.
Usenet is also known in the art. Based on NNTP (Network News Transfer Protocol) and the Usenet message format, it is a method for building newsgroups that can be accessed by networked computers. A user posts a message to a specific newsgroup (e.g. 'comp.lang.python'), and other users can then list the contents of that newsgroup and download this message. The newsgroups are organized as a hierarchy of groups. For example, the newsgroup for the programming language Python is 'comp.lang.python'. It is inside the 'comp.lang' (computer languages) section of the top-level 'comp' section (computing related newsgroups). NNTP supports user/password based authentication. The Usenet protocols and design are meant for message based discussions: postings in the newsgroups are organized by a simple ordering (posting 1, posting 2 . . . ), unlike the present system where the data can be organized into more complex structures, hierarchies and versions. In the Usenet system the headers and meta-data cannot be encrypted. The present invention, on the other hand, resides on top of a generalized database, where all data is encrypted. Unlike Usenet's hierarchical newsgroups, the namespaces of the present invention are not structured in any way.
It is the principal object of the present invention to provide a system and method for distributing, backing-up and transmitting data to remote users.
Another object of the present invention is to provide a system for presenting immutable data.
An additional object of the present invention is to provide a system for sharing data where there is no single point of failure.
Yet another object of the instant invention is to provide a system whereby the user automatically receives data on an asynchronous basis.
Still another object of the instant invention is to provide a system whereby the user does not have direct access to the server and its files.
Another object is to ensure that data is never lost, always accessible, easily shared, and may be accessed without a direct connection to the Internet.
Finally, a further object is to create a system for the above-stated tasks that may be layered over existing software, that works in conjunction with TCP/IP firewalls, and that can act as a backup mechanism.
Therefore, in accordance with the present invention, a system and method for the asynchronous transmission, storage, backing-up and distribution of data, imparts a proprietor (e.g. a server proprietor) with the ability to transmit data to remote users. As a general precept the data is immutable, that is a recipient does not have the power to modify, delete or change the data, said data or files residing as a snapshot thereof. The system resides as a layer above resident software and provides a method for storing, transmitting, backing up and distributing the data in the form of snapshots of existing data.
Overall there are three main parts to the system. The first part is an asynchronous data sharing part. A second part is a database preferably composed of immutable entries, while the third aspect of the system is the authentication protocol.
The data is stored and backed up at a desired interval, with the snapshot of changes derived therefrom preferably being stored as immutable entries of the data. The data as stored is internally consistent, consisting of a copy or snapshot of the original data at given intervals. Since the system generates copies or snapshots of the original data at specified time intervals, either automatically or as specified by the user, it provides the user with a way of sharing data and with a backup for the data. The referenced interval can be short (on the order of zero to ten minutes), as opposed to a traditional backup system's one day interval. The system thus provides a historical view of the data as it changed, and allows other users to get an up to date, internally consistent snapshot or copy of the data as it is changed.
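By way of illustration only, the notion of a database of immutable snapshot entries can be sketched in Python as an append-only store. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Entry` and `SnapshotStore` are hypothetical, the store is held in memory, and SHA-256 is assumed merely as a convenient way to identify content.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    """One immutable entry: a snapshot of a file at a point in time."""
    path: str
    taken_at: float  # seconds, e.g. a timestamp of the snapshot pass
    content: bytes

    @property
    def digest(self) -> str:
        # Content fingerprint (an assumption of this sketch).
        return hashlib.sha256(self.content).hexdigest()


class SnapshotStore:
    """Append-only store: entries are added and read, never altered or removed."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, path: str, taken_at: float, content: bytes) -> Entry:
        entry = Entry(path, taken_at, bytes(content))
        self._entries.append(entry)
        return entry

    def history(self, path: str) -> Tuple[Entry, ...]:
        """All stored versions of a path, oldest first."""
        return tuple(e for e in self._entries if e.path == path)

    def latest(self, path: str) -> Entry:
        """Most recent stored version; raises IndexError if none exists."""
        return self.history(path)[-1]
```

Because each `Entry` is frozen and the store exposes no update or delete operation, a reader obtains a historical view through `history` and the current state through `latest`, while earlier snapshots remain available.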
By creating data which is identical and unchangeable all parties work on the same version. Without the ability to modify or delete data, each user is secure in the knowledge that there is but one updated version, and that this version is consistent and whole. The system provides on a snapshot basis immutable files which reside both centrally and locally. Moreover, the user needs only to save the data to a dedicated folder, and the system in accordance with the instant method will distribute the data upon the user connecting to a dial up connection (Internet, network, etc.). This solves the longstanding problem of automatically backing up data, and sharing the most recent incarnation of data among multiple users.
Additionally, once the data has been downloaded, the user can browse the data even while disconnected from the network, allowing offline access to the data.
In one embodiment, a proxy-based methodology tracks and accepts changes, naming each version and housing it in the database, where it may be stored as a version of the original. Stored in an external database, a copy of the data resides not only on at least one node, but also on the database node or the originating machine, and may be asynchronously transmitted to other users upon their request and authentication.
In order to ensure that only authorized personnel receive the data, a protocol might be added which authenticates access by a user who requests data. The act of logging on to the Internet, network or dial up connection is sufficient to effectuate receipt of updated data by the user (requester).
Unlike a VPN, pursuant to the instant system, there is no need for the remote or local user to establish a connection to the file server in order to store or retrieve files. Instead, the instant system and method employs a protocol whereby logging on to the Internet automatically creates a communication for exchange of data.
The system in accordance herewith does not change, modify or alter the software upon which it is layered, but is designed to reside thereupon, and all data is kept in local storage, thus obviating the requirement for user access to a primary server.
One example of the system is a form-fulfillment system, for example signing up for health services. The user fills in their data in an application, which is then stored, in an encrypted form on the local hard disc. When the user connects to the Internet or some other network, the program stores this data on a remote data-storage system.
Asynchronously, the people who this data is intended for can connect to the data-storage system and download any new forms that have been submitted.
Another example (which will be expanded later on) of the operation of the system is a file-backup system. The system in accordance herewith utilizes a data storage system which constantly stores a snapshot of the file system so that changes thereto are stored in the database on a file by file basis. If the snapshot detects a change in a file or in data, that file or data will be stored as a separate entry (file/data). Therefore, a snapshot is the database comprised of at least a file defined on a user by user basis.
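The file by file change detection described above can be sketched as follows. This is a hedged illustration rather than the disclosed mechanism: the function name `take_snapshot`, the `last_seen` mapping and the use of SHA-256 digests to decide whether a file has changed are all assumptions introduced for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def take_snapshot(root: Path, last_seen: Dict[str, str]) -> List[Tuple[str, bytes]]:
    """Return (relative path, content) for every file changed since the last pass.

    `last_seen` maps a relative path to the digest recorded on the previous
    pass and is updated in place, so repeated calls yield only new changes.
    """
    changed: List[Tuple[str, bytes]] = []
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        data = file.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        rel = file.relative_to(root).as_posix()
        if last_seen.get(rel) != digest:
            # New or modified file: record it as its own separate entry.
            last_seen[rel] = digest
            changed.append((rel, data))
    return changed
```

Each returned pair would then be stored as a separate entry in the database, so that an unchanged file produces no new entry on later passes.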
As stated hereinabove, unlike a VPN there is no need for a direct connection to the server; instead there is a protocol whereby logging on to the Internet creates a communication for exchange of data. This is the essence of the asynchronous aspect of the system. Unlike most systems, the instant system is not a peer to peer system, and a contemporaneous connection is not necessary.
The system in accordance herewith will queue each snapshot, saved in accordance with the timed update, and when the user logs on to the dial up network, will upload the snapshot to the primary or database node and pursuant to the instructions promulgated therewith, will request and receive any updates relating thereto.
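The queue-then-upload behavior described above may be sketched as follows. The sketch assumes two caller-supplied callables, `upload` and `fetch_updates`, standing in for whatever transport the database node actually uses; these names, and the class `SnapshotQueue`, are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class SnapshotQueue:
    """Holds snapshots taken while offline and flushes them on connection."""

    def __init__(
        self,
        upload: Callable[[bytes], None],
        fetch_updates: Callable[[], List[bytes]],
    ) -> None:
        self._upload = upload
        self._fetch_updates = fetch_updates
        self._pending: Deque[bytes] = deque()

    @property
    def pending(self) -> int:
        return len(self._pending)

    def enqueue(self, snapshot: bytes) -> None:
        """Called at each timed update, whether or not a network is available."""
        self._pending.append(snapshot)

    def on_connect(self) -> List[bytes]:
        """Called when the user logs on: upload in order, then request updates."""
        while self._pending:
            self._upload(self._pending[0])
            # Remove a snapshot only after its upload returned without error,
            # so a failed upload leaves it queued for the next connection.
            self._pending.popleft()
        return self._fetch_updates()
```

Keeping a snapshot at the head of the queue until its upload succeeds is a design choice of this sketch; it means an interrupted connection loses no queued data.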