Research, whether academic or industry-based, increasingly requires collaboration among multiple groups that conduct research on shared data. Collaborative research among groups of researchers or research teams allows each group to take advantage of the discoveries of the other groups, which can accelerate the pace of progress and provide opportunities for synergy between disparate research foci.
Collaborative research has its own challenges, however, including how to best share information among the collaborative partners in a secure and confidential manner. In one example, collaborative biomedical research between multiple institutions involves the sharing of each institution's data, results, and conclusions with the other institutions. Such data can be shared by direct transfer from each institution to each of the other institutions, referred to herein as a “distributed” solution, or data from all institutions can be collected in a common repository that can be accessed by all collaboration partners, referred to herein as a “centralized” solution. Such collaborations are challenging to construct and maintain, as multiple issues must be resolved, including agreement on data usage policies, establishing trust between collaborators that data usage policies will be abided by, and providing the technical means of sharing and integrating data securely and in a time-efficient manner.
In collaborative research in particular, there are often conflicting requirements: to provide ready access to the data by all collaboration partners and, at the same time, to restrict or block access to the data by any party or entity that is not a collaboration partner. Each collaborative entity should have ready access to the data without reducing the security level of the overall collaboration and without increasing the risk of leaking sensitive data to the outside world.
Many technical approaches have been tried to provide electronic-based infrastructure to facilitate such collaborations, but to date adoption has been limited. There are many reasons for this failure, including the inability of solutions to adapt to the rapidly changing requirements of researchers, infrastructure cost, infrastructure complexity, and the inability to properly address data security and data privacy concerns. Conventional approaches to this problem include attempting to centralize the infrastructure, relying on non-technical solutions, such as data use agreements, or relying on each group to properly implement protection. Each of these approaches has disadvantages.
Centralizing the infrastructure can be done in two ways. The first way is to copy all data to a shared location with centralized authentication and authorization. This requires duplication of resources and raises the issue of data coherency between the centralized copy and the remote copy. The second way is to have all data exist only in the shared location, i.e., moving the group to the centralized location rather than moving the data to the centralized location.
Non-technical solutions tend to rely on promises to abide by agreed-upon behavior and imposition of some punitive measures for a breach of these agreements. For example, the parties may sign a data use agreement in which each party promises to share data only with the other collaborative partners. Such agreements would cover sharing of data by email to the other parties, for example. These solutions are susceptible both to accidental sharing of confidential information due to human error and to deliberate sharing of confidential information by fraud or intentional breach.
De-centralized technical solutions, in which each group is responsible for properly implementing protection, are only as secure as the security policies and implementations of the least secure group member. Here also, it is surprisingly difficult to implement secure transport of data from one party to another or from each party to a shared repository. Email encryption programs, for example, have been in existence for decades but are still not widely used; the vast majority of email sent is not encrypted in any way.
Another approach is to create a custom system for secure, authenticated, and authorized data integration, but such solutions are by definition ad hoc, are usually so customer-specific as to be essentially non-reusable, and are thus usually very expensive to design and implement.
Thus, what is needed is an infrastructure that provides security, authentication, and authorization yet makes data sharing easy. Such an infrastructure would allow the researchers the flexibility to implement whichever data integration solution they deem best. Accordingly, there exists a need for methods, systems, and computer readable media for providing a secure virtual research space.