The present invention relates generally to collaborative data integration and more particularly to collaborative information systems.
Scientific research has become increasingly reliant on collaborative effort among multiple institutions and interdisciplinary consortia which share scientific experiments and data and collaborate on analysis of data and results. Traditional data management and integration systems focus on passively integrating existing data. Thus, the collaboration among data providers and users is limited.
For example, the increased complexity of biomedical problems requires collaborative effort from multiple institutions and interdisciplinary consortia. The National Institutes of Health (NIH) provides large-scale collaborative project awards for teams of independently funded investigators to synergize and integrate their efforts. In this way, consortia are formed to pool expertise, validate approaches, forge common instrumentation platforms, and rapidly translate new technologies toward clinical trials.
One example is the “Networks for Translational Research: Optical Imaging” (NTROI), which was structured to support four multi-site teams that would include broad national and international representation from academia, NIH intramural, and device and drug industry investigators. One team works on breast cancer research with Multi-Dimensional Diffuse Optical Imaging. The consortium consists of six research programs across multiple universities and hospitals and includes nearly one hundred researchers. Together with the three other teams, several hundred researchers are working on the problems of optical imaging. Researchers who are not located at the same site are limited in how closely they can collaborate because of the distributed nature of such large-scale research consortia.
However, current data integration systems only provide passive integration of existing data sources, which is a bottom-up approach. There are several problems with this approach. First, schemas of data sources continue to evolve throughout the investigative process, which can disrupt integration. Second, data providers are generally not aware of changes and updates from other data sources, and they generally cannot contribute to such changes. That is, they are isolated within their own research, and it is difficult for them to collaborate with other researchers. Third, data users can use data provided by others only after the fact. They cannot proactively participate in active discussion, data reviewing, data authoring, or schema definition with other members because of the passive nature of traditional data integration. The lack of active collaboration can also cause disagreement in collaborative data sharing.
The distribution and large scale of scientific data also pose new challenges for scientific data management and integration. A warehouse-based approach, in which data is collected into a central repository, is difficult because collecting large amounts of image data over the Internet can be very slow. Additionally, researchers generally prefer to keep control of their data on a server located in their own labs instead of storing the data elsewhere. As a result, without a collaborative environment, information is becoming further isolated.
Further, Web technology is transitioning to a new paradigm. The term Web 2.0 refers to a second generation of services available on the World Wide Web that let people collaborate and share information online. For example, product purchasing sites (e.g., booksellers, clothiers, etc.) use users and/or readers as contributors; contributory knowledge sites (e.g., Wikipedia, etc.) allow all content to be authored by users; weblogs generate content through participation (e.g., comments, etc.) instead of publishing only; and peer-to-peer file sharing sites (e.g., USENET, BitTorrent, Gnutella, FastTrack, etc.) radically decentralize data and work through large-scale participation of users. The Web is now shifting to strong interaction, participation, trust, and decentralization.
Therefore, for such large-scale research networks, there is a need for a collaborative environment and data integration system in which researchers can easily manage, share, and review their experiments and results and collaborate with one another.