The present invention relates generally to document archiving and document distribution, and in particular to a distributed secure peer-to-peer document archival system.
In a typical business workgroup IT infrastructure, two basic functions must be provided. The first is to ensure that team members are able to access their documents and share them with other members. The second is to ensure that no one else can access those documents. The first function typically requires a dedicated file server, centralized backups, a dedicated network, a static IP address and domain name service; the second requires firewalls, account and password management, and physical security for one's servers. Even when the membership of a team is clearly defined and relatively static, such an infrastructure is difficult and expensive for a small business to maintain. It is even more difficult when a team is made up of members from several different organizations who might collaborate in some areas and compete in others.
Current document archive systems tend to follow one of two models:

The groupware model provides features that are especially useful to a single work group, company or other well-defined group of collaborators who wish to maintain a “group memory.” These features include remote access to documents, restricted access for non-group members, security, version control, and unique handles for documents that allow both linking and the creation of compound documents. Groupware systems are most often provided by centralized architectures such as file servers and Web-based content managers.

Conversely, the personal archive model has features that support the mobile, distributed and loose-knit organizations that are becoming increasingly prevalent in today's business world. Knowledge workers in these environments tend to work on many projects at once and simultaneously belong to many overlapping (and potentially competing) communities. They are also increasingly mobile, and often find themselves in environments with slow, partitioned or no network access. Such knowledge workers need a sharable personal archive: one that is easy for a single person to maintain, works both on- and off-line, and supports an intuitive limited-publication model that allows an ad-hoc working group to share some pieces of information while keeping others confidential. These features all suggest a decentralized solution in which each user maintains his or her own archive and shares certain files with others, as is provided today by PDAs, locally-stored email archives and traditional paper-based documents.
From a user's perspective, the main difference between centralized and decentralized solutions is whether control naturally lies with the publisher or with the reader of a document. On the Web, the publisher of a site (or a designated site administrator) has ultimate control over, and responsibility for, who has access to a document, who can modify it and whether past versions are made available. The publisher may also decide to take a site down entirely, thus denying access to everyone. With email and paper-based solutions, it is the reader who has control. Anyone who receives a paper document can share it with someone else simply by making a photocopy, and once someone receives a paper document it is very difficult for the original author to “take it back.” Similarly, email is often forwarded to others, sometimes with modifications or annotations made by the person doing the forwarding. The decision to grant or deny access to a document is distributed among those who already have access, with limitations imposed through social (and sometimes legal) rules.
Whether publisher or reader control is “better” depends on the organization, the environment in which the information is produced and used, and sometimes on who is doing the judging. Centralized solutions such as password- and firewall-protected Web servers work well in environments where clearly defined groups of people need access to clearly defined sets of documents, and where there is a clear distinction between authors and consumers of information. In more collaborative environments, where group boundaries are fuzzier, a distributed solution is often better. Most workers today fall somewhere between these two environments, engaging in both ongoing and ad-hoc collaborations, and thus need the advantages of both centralized and decentralized systems.