The life sciences are undergoing a paradigm shift from a traditional, laboratory-driven (wet science) industry to a truly information-driven one. A new understanding of the workings of life at the genetic and molecular levels, together with laboratory automation, promises to make the processes of finding new drugs, therapies, and agricultural products radically faster, cheaper, and more effective. As a result, a formidable volume of data is pouring out of innovative technologies such as genomics, combinatorial chemistry, and high-throughput screening at an unprecedented rate.
The challenges of managing massive volumes of data may be compounded by the fact that life sciences data are often dispersed throughout the research and development (R&D) enterprise, across the public domain, and within the labs of external research partners. These data, which tend to be highly complex and constantly changing, are often stored in multiple heterogeneous formats, such as 3-D chemical structure databases, relational database tables, flat files, text stores, image repositories, and web sources. The data may further reside on different hardware platforms, under different operating systems, and in different database management systems.
The lack of structure in some data sources, or the use of differing structures among structured data sources, also presents challenges to those trying to process these diverse sources. Unstructured data sources often store data as plain strings of text (e.g., the text of a journal article), which makes it difficult to ascertain the relevance of a particular piece of data when it is read out of context. For example, a text search for the string "alanine" in an unstructured database may retrieve both a document in which "alanine" appears only in a single footnote and a document in which alanine is discussed in depth. In an unstructured database, it may be difficult to differentiate between the two.
Another challenge arising from the volume of information now inundating researchers is that it may be difficult to make intelligent decisions about which avenues of research and development to pursue. For example, a company may be developing several promising new drugs, each at a different stage of the regulatory approval process. Such a company would benefit from being able to make informed decisions about how to allocate its resources so as to maximize overall revenue given the current state of its drug portfolio. Existing systems do not provide tools to facilitate such decisions.
Many pharmaceutical and biotechnology companies have recognized that the information challenge they face may stem largely from inefficiencies in their existing information technology (IT) systems. As a result, many of these institutions have increased spending on IT research and development. Unfortunately, many drawbacks remain, because the newly adopted technologies generally focus on optimizing particular tasks within the data management process rather than on optimizing the data management process itself.
These and other drawbacks exist.