This invention relates to analyzing experimental data. More specifically, it relates to a method and system for identifying potential pharmaceutical drug candidates by interpreting and validating errors in experimental data with automated reasoning.
Historically, the discovery and development of new drugs has been an expensive, time-consuming and inefficient process. With the cost of bringing a single drug to market estimated at approximately 8 to 12 years and approximately $350 to $610 million, the pharmaceutical industry is in need of new technologies that can streamline the drug discovery process. Companies in the pharmaceutical industry are under fierce pressure to shorten research and development cycles for developing new drugs, while at the same time, novel drug discovery screening instrumentation technologies are being deployed, producing a huge amount of experimental data (e.g., gigabytes per day).
To fully exploit the potential of experimental data from high-volume data generating screening instrumentation, there is a need for new informatic and bioinformatic tools. As is known in the art, “bioinformatic” techniques are used to address problems related to the collection, processing, storage, retrieval and analysis of biological information including cellular information. Bioinformatics is defined as the systematic development and application of information technologies and data processing techniques for collecting, analyzing and displaying data obtained by experiments, modeling, database searching, and instrumentation to make observations about biological processes. Bioinformatic tools are being used to process experimental data to create and manipulate knowledge stores.
As is known in the art, “knowledge” includes a body of truth, information, expertise or principles obtained through the application of reasoning to facts or data. Knowledge is used for some task, e.g., to modify behavior based upon information and experience. A common view of knowledge is that it includes more value than mere data and information. At one level it is accepted that knowledge is something that mainly resides in the “heads of individuals,” i.e., experience that divides an expert from a non-expert in a particular domain. Terms such as “Use of Knowledge” or “Knowledge Management,” “Knowledge Capital,” “Knowledge Assets,” “Business Intelligence” and “Knowledge Culture” are becoming common in the pharmaceutical industry and industry in general.
One problem is that at best, knowledge in corporate databases can only be considered as declarative knowledge (i.e., information in computer readable form) or method and process knowledge (i.e., basic mathematical relationships). Another problem is that knowledge is viewed at some gross level as “just information” and thus the key to knowledge management is to improve information systems in some way.
Another problem is that there are many diverse approaches to knowledge storage and management. These knowledge storage and management approaches include, for example, basic repository; experience repository; corporate personal expertise base; knowledge transfer; knowledge culture; enhanced repository knowledge server; corporate rule based; data mining and data visualization; and data warehouse, data-mart, Online Analytical Processing (OLAP) coupled to Executive Information Systems (EIS) or Management Information Systems (MIS).
The basic repository includes knowledge extracted from human experts by some means and stored in a system for later access. The knowledge is mostly structured and primarily in the form of documents. The experience repository includes knowledge that is much less structured and in the form of insights and observations of experts, usually in the form of documents or threaded discussion databases. The corporate personal expertise base does not include knowledge as such but typically provides pointers to those individuals who do have knowledge. Knowledge transfer includes some means of transferring knowledge from individuals to other individuals. Knowledge culture includes knowledge promoted from a human resource perspective, including appreciation of the value of knowledge and a culture of knowledge sharing.
Enhanced repository knowledge servers include automated indexing, cross-referencing, annotation and presentation of information, with the expectation that this will lead to knowledge in some way. The corporate rule based approach includes knowledge from a true knowledge base using expert system technology to extract and codify knowledge into business rules that can be applied to information and data.
Data mining includes knowledge obtained from patterns in multidimensional data and then annotating those patterns to give them value. Data visualization includes transforming knowledge obtained from three-dimensional graphs to visual pattern representations. Data warehouse, data-mart, online analytical processing coupled to executive information systems or management information systems include knowledge obtained from business rules to summarize data and information into a second database where it is more readily accessible. Tools then present the enhanced information in various views, with drill down etc., so that an individual will be able to create the knowledge. These knowledge management and storage approaches differ widely, both in their manner and the technology used.
Another problem is that none of these approaches addresses managing knowledge in all of its forms throughout a business or multiple businesses and then using that knowledge as a fundamental driver of business. Another problem is that the whole drive towards knowledge management is in itself fundamentally flawed, since it is the ability to use knowledge to change corporate behavior that is the real problem; the power to act on knowledge is one of the most important factors of knowledge use.
Pharmaceutical, telecommunications, banking, aeronautical engineering, retail supermarket, insurance and other commercial sectors are applying knowledge based approaches at varying levels to successfully drive their business with knowledge. These industries are receiving very high returns in some cases (e.g., British Telecom (UK) estimates that implementing a knowledge based strategy for network maintenance scheduling will provide cost savings of 1 billion pounds per year).
Some of the companies that have implemented knowledge strategies to drive their businesses indicate that one or more of the following knowledge criteria need to be satisfied: (1) knowledge is extracted from a particular domain/discipline or business process (from experts, databases, etc.), in the form of rules or models of reality; (2) knowledge is encapsulated (usually into some form of software); (3) knowledge is delivered and used, either via or within a conventional information infrastructure; (4) knowledge is used, together with data and information, to change business behavior; and (5) knowledge management is combined with organizational restructuring in order to best use knowledge; ideally the restructuring itself is driven by knowledge, and new knowledge and refinement of old knowledge can be accommodated. Note that in many cases not all of these criteria were present in the approaches used by such companies, and the points above represent an idealized case.
However, few companies that have applied such knowledge management strategies have applied them to their entire business, or structured a business or business division completely around knowledge management. Knowledge management is thus seen as an add-on rather than the foundation of a business. Thus, it is important that knowledge management in drug discovery should be a business foundation rather than an add-on.
Another problem is that as drug discovery becomes more and more complex, knowledge storage and management become more and more specialized and compartmentalized. The pharmaceutical industry typically has as many as 7,000 compounds in active development at any one time. It is already questionable whether the pharmaceutical industry can support this numerical level of drug development, especially when at the end of the process the number of new drugs entering the market has not shown any increase.
One optimistic viewpoint is that it is perhaps too early for drug candidates derived from the new discovery technologies such as high throughput screening to have progressed to late development and market. Alternatively, there is, and will continue to be, an increasing “attrition rate” of new compounds that start active development but never reach market. This high attrition rate has cost implications, as successful drugs must support the increasing number of drug candidate failures.
New technologies (e.g., high throughput screening) may therefore simply increase the number of compounds available for active development, a number perhaps in excess of one million at any one time, and further aggravate the discovery problem. There is already a huge shortfall of compounds completing development (e.g., a 1 in 10 success rate), and the use of knowledge storage and management techniques known in the art is not improving the attrition rate for pharmaceutical compounds.
One of the key goals of the pharmaceutical industry is to use knowledge to reduce the attrition rate among new drug candidates accepted for development. Thus, the need for decision making/support systems based on knowledge has been identified as “critical” to address the attrition rate for new drug candidates.
One problem associated with reducing the attrition rate is that it is difficult to determine errors in pharmaceutical data collected from automated screening systems. When automated screening systems are used there are almost always common “physical screening problems” related to instrument and/or equipment errors (e.g., a clogged or partially clogged pipette head), common microplate preparation errors, microplate variances within runs, bio-chip problems, gel-electrophoresis problems, etc. It is desirable to remove such physical errors and others to improve interpretation and validation of any knowledge that is created.
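As an illustration only, the following minimal sketch shows one way a physical error such as a clogged pipette channel might surface in microplate data: a whole column reads far from the rest of the plate. The function name `flag_suspect_columns`, the plate layout (a list of rows of signal values), and the threshold of 3.5 are assumptions made for this example and are not taken from the described method. The sketch uses a robust z-score based on the median absolute deviation (MAD), which is a common outlier heuristic rather than the technique of the invention.

```python
from statistics import median


def flag_suspect_columns(plate, threshold=3.5):
    """Return indices of columns whose median signal is a robust outlier.

    `plate` is a list of rows, each row a list of numeric well signals.
    The layout and the threshold are illustrative assumptions.
    """
    n_cols = len(plate[0])
    # Median signal of each column across all rows.
    col_medians = [median(row[c] for row in plate) for c in range(n_cols)]
    center = median(col_medians)
    # Median absolute deviation of the column medians.
    mad = median(abs(m - center) for m in col_medians)
    if mad == 0:
        # No spread among most columns: flag any column that differs at all.
        return [c for c, m in enumerate(col_medians) if m != center]
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        c for c, m in enumerate(col_medians)
        if abs(0.6745 * (m - center) / mad) > threshold
    ]


# Hypothetical 4 x 6 plate in which column 3 reads abnormally low.
plate = [
    [100, 102, 98, 10, 101, 99],
    [101, 99, 100, 12, 100, 102],
    [99, 101, 102, 9, 98, 100],
    [102, 100, 99, 11, 99, 101],
]
suspect = flag_suspect_columns(plate)
```

A column-wise check is used here because a single faulty channel on a multi-channel pipetter would typically affect wells along one axis of the plate; a row-wise variant would follow the same pattern.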
Another problem is that “biological specific errors” such as errors in assays can also occur during automated screening. It is also desirable to remove biological errors when possible to improve any new knowledge generated from such pharmaceutical data. Therefore, it is desirable to provide an improved method and system to detect errors in data collected for the pharmaceutical and other industries. The method and system should include the ability to identify and manipulate error data associated with physical as well as biological errors.
In accordance with preferred embodiments of the present invention, some of the problems associated with removing errors from experimental data from automated screening systems are overcome. A method and system for interpreting and validating experimental data are presented.
Pharmaceutical data from a knowledge database is classified with a semantic representation. A set of reasons for any classified pharmaceutical data is provided. The set of reasons is used to help interpret the classified pharmaceutical data to remove errors such as “physical errors” (e.g., pipetter errors, common microplate preparation errors, microplate variances within runs, bio-chip errors, gel-electrophoresis errors, etc.) and “biological specific errors” such as errors in assays.
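The idea of attaching a set of reasons to each classified result can be sketched as follows. This is a hypothetical illustration, not the semantic representation of the invention: the `ClassifiedResult` type, the `classify_result` function, the labels ("suspect", "hit", "inactive"), the input flags and the cutoff value are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassifiedResult:
    """A screening result with a label and the reasons supporting it."""
    well_id: str
    label: str
    reasons: List[str] = field(default_factory=list)


def classify_result(well_id, signal, column_flagged, control_in_range,
                    hit_cutoff=50.0):
    """Classify one well and record why (all rules are illustrative)."""
    reasons = []
    if column_flagged:
        # Physical error: e.g. the well sits in a column with a pipetter fault.
        reasons.append("physical: column flagged for a possible pipetter error")
    if not control_in_range:
        # Biological specific error: e.g. the assay control misbehaved.
        reasons.append("biological: assay control outside its expected range")
    if reasons:
        return ClassifiedResult(well_id, "suspect", reasons)
    if signal >= hit_cutoff:
        return ClassifiedResult(
            well_id, "hit", [f"signal {signal} >= cutoff {hit_cutoff}"])
    return ClassifiedResult(
        well_id, "inactive", [f"signal {signal} < cutoff {hit_cutoff}"])


clean = classify_result("A1", 80.0, column_flagged=False, control_in_range=True)
doubtful = classify_result("B4", 80.0, column_flagged=True, control_in_range=False)
```

Keeping the reasons alongside the label means a result marked as suspect can later be reviewed or excluded according to whether the cause was physical, biological, or both.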
The method and system may be used to improve the identification, selection, validation and screening of new real or virtual pharmaceutical compounds by removing physical and/or biological specific errors in pharmaceutical data. The method and system may also be used to provide new bioinformatic techniques for storing and manipulating pharmaceutical knowledge.
The foregoing and other features and advantages of preferred embodiments of the present invention will be more readily apparent from the following detailed description. The detailed description proceeds with references to the accompanying drawings.