Life science information is information relevant to the understanding of the structures, behaviors, operations, maladies, and processes of plant and animal life, and includes the nature of the work that generated it, the identity of the people who generated it, and assessments of its significance and context within the encyclopedic, ever-growing life science knowledge base of mankind.
Traditional methods of discerning and understanding the meaning of life science information are breaking down due to the large amount of material that must be absorbed and combined. New and old information is presented and stored in public, publicly accessible, proprietary, and private databases of different structures, printed or electronic journals, scholarly theses, patents, medical records, master files, books, clinical trial files, government data compilations, etc. These information sources exist in different formats, languages, and data structures, use conflicting vocabularies and ontologies, and are often presented on the basis of inconsistent and competing theories. The accessibility of these data for study and knowledge mining ranges from completely inaccessible trade secret data, to data available only by subscription, to current data generated by a colleague but not yet communicated, to obscure observations in a language foreign to the reader, to free public information a few clicks away. To form an effective understanding of a biological system, a life science researcher must synthesize information from many of these sources.
Understanding biological systems is made more difficult by the interdisciplinary nature of the life sciences. Forming an understanding of a system may require in-depth knowledge of genetics, cell biology, biochemistry, medicine, and many other fields. The literature in these fields is often addressed to specialists who do not frequently communicate outside their specialties: the protein chemist may not talk to, and does not read the literature of, the epidemiologist; the synthetic chemist may relate poorly to the molecular biologist.
Understanding a biological system may require that information of many different types be combined. Life science information may include material on basic chemistry, proteins, cells, tissues, and effects on organisms or populations, all of which may be interrelated. These interrelations may be complex, poorly understood, or hidden.
Knowledge useful in the development of human therapies and the like is gained by inspired individuals seeking out and combining disparate data and then reasoning from them. Currently, progress is made as scientists locate and access diverse data sources, pose questions, seek other data in an attempt to refine or eliminate a hypothesis or make a connection, and devise and conduct new experiments. The scientist then publishes or otherwise records the new data, exposing them for review, criticism, and use by others. As knowledge increases, it becomes apparent that no person can possibly access, much less assimilate, all the available data in any field. Furthermore, the amount of data generated in the life sciences is increasing dramatically, with no end in sight. Those seeking new insights and new knowledge are presented with the ever more difficult task of connecting the right data from mountains of information gleaned from vastly different sources. Thus, to the extent our current system of generating and recording life science data has been developed to permit knowledge mining, it is clearly far from optimal, and significant new efficiencies should be available.
What is needed is a way to assemble and store vast amounts of life science information, and to make that information available in a manner that enhances understanding of the interrelationships within the information. It would be desirable to provide a system and methods that allow researchers to assemble life science data and mine information in a comprehensive manner that facilitates the understanding and revelation of the possibly hidden interactions of a biological system.