This invention relates to analyzing experimental data. More specifically, it relates to methods and systems for identifying potential pharmaceutical drug candidates by employing pattern acquisition, pattern map creation, pattern annotation and pattern recognition on pharmaceutical data.
Historically, the discovery and development of new drugs has been an expensive, time-consuming and inefficient process. Bringing a single drug to market is estimated to require approximately 8 to 12 years and approximately $350 to $610 million, and the pharmaceutical industry is therefore in need of new technologies that can streamline the drug discovery process. Companies in the pharmaceutical industry are under fierce pressure to shorten research and development cycles for developing new drugs, while at the same time, novel drug discovery screening instrumentation technologies are being deployed, producing a huge amount of experimental data (e.g., gigabytes per day).
To fully exploit the potential of experimental data from high-volume data generating screening instrumentation, there is a need for new informatic and bioinformatic tools. As is known in the art, “bioinformatic” techniques are used to address problems related to the collection, processing, storage, retrieval and analysis of biological information including cellular information. Bioinformatics is defined as the systematic development and application of information technologies and data processing techniques for collecting, analyzing and displaying data obtained by experiments, modeling, database searching, and instrumentation to make observations about biological processes. Bioinformatic tools are being used to process experimental data to create and manipulate knowledge stores.
As is known in the art, “knowledge” includes a body of truth, information, expertise or principles obtained through the application of reasoning to facts or data. Knowledge is used for some task, e.g., to modify behavior based upon information and experience. A common view of knowledge is that it includes more value than mere data and information. At one level it is accepted that knowledge is something that mainly resides in the “heads of individuals,” i.e., the experience that divides an expert from a non-expert in a particular domain. Terms such as “Use of Knowledge,” “Knowledge Management,” “Knowledge Capital,” “Knowledge Assets,” “Business Intelligence” and “Knowledge Culture” are becoming common in the pharmaceutical industry and industry in general.
One problem is that at best, knowledge in corporate databases can only be considered as declarative knowledge (i.e., information in computer readable form) or method and process knowledge (i.e., basic mathematical relationships). Another problem is that knowledge is viewed at some gross level as “just information” and thus the key to knowledge management is to improve information systems in some way.
Another problem is that there are many diverse approaches to knowledge storage and management. These knowledge storage and management approaches include, for example: basic repository; experience repository; corporate personal expertise base; knowledge transfer; knowledge culture; enhanced repository knowledge server; corporate rule based; data mining and data visualization; and data warehouse, datamart, Online Analytical Processing (OLAP) coupled to Executive (EIS) or Management (MIS) Information Systems.
The basic repository includes knowledge extracted from human experts by some means and stored in a system for later access. The knowledge is mostly structured and primarily in the form of documents. The experience repository includes knowledge that is much less structured and in the form of insights and observations of experts, usually in the form of documents or threaded discussion databases. The corporate personal expertise base does not include knowledge as such but typically provides pointers to those individuals who do have knowledge. Knowledge transfer includes some means of transferring knowledge from individuals to other individuals. Knowledge culture includes knowledge promoted from a human resource perspective: an appreciation of the value of knowledge and a culture of knowledge sharing.
Enhanced repository knowledge servers include automated indexing, cross-referencing, annotation and presentation of information, with the expectation that this will lead to knowledge in some way. The corporate rule based approach includes knowledge from a true knowledge base using expert system technology to extract and codify knowledge into business rules that can be applied to information and data.
Data mining includes knowledge obtained by finding patterns in multidimensional data and then annotating those patterns to give them value. Data visualization includes transforming knowledge obtained from three-dimensional graphs to visual pattern representations. Data warehouse, datamart and online analytical processing coupled to executive information systems or management information systems include knowledge obtained by using business rules to summarize data and information into a second database where it is more readily accessible. Tools then present the enhanced information in various views, with drill-down, etc., so that an individual will be able to create the knowledge. These knowledge management and storage approaches differ widely, both in their manner and the technology used.
Another problem is that none of these approaches addresses managing knowledge in all of its forms throughout a business or multiple businesses and then using that knowledge as a fundamental driver of business. Another problem is that the whole drive towards knowledge management is in itself fundamentally flawed, since it is the ability to use knowledge to change corporate behavior that is the real problem; the power to act on knowledge is one of the most important factors of knowledge use.
Pharmaceuticals, telecommunications, banking, aeronautical engineering, retail supermarkets, insurance and others are some of the commercial sectors that are applying knowledge based approaches at varying levels to successfully drive their business with knowledge. These industries are receiving very high returns in some cases (e.g., British Telecom (UK) estimates that implementing a knowledge based strategy for network maintenance scheduling will provide cost savings of 1 billion pounds per year).
Some of the companies that have implemented knowledge strategies to drive their businesses indicate that one or more of the following knowledge criteria need to be satisfied: (1) knowledge is extracted from a particular domain/discipline or business process (from experts, databases, etc.), in the form of rules or models of reality; (2) knowledge is encapsulated (usually into some form of software); (3) knowledge is delivered and used, either via or within a conventional information infrastructure; (4) knowledge is used, together with data and information, to change business behavior; and (5) knowledge management is combined with organizational re-structuring in order to best use knowledge; ideally the restructuring itself is driven by knowledge, and new knowledge and refinement of old knowledge can be accommodated. Note that in many cases not all of these criteria were present in the approaches used by such companies, and the points above represent an idealized case.
However, few companies that have applied such knowledge management strategies have applied them to their entire business, or structured a business or business division completely around knowledge management. Knowledge management is thus seen as an add-on rather than the foundation of a business. Thus, it is important that knowledge management in drug discovery should be a business foundation rather than an add-on.
Another problem is that as drug discovery becomes more and more complex, knowledge storage and management become more and more specialized and compartmentalized. The pharmaceutical industry typically has as many as 7,000 compounds in active development at any one time. It is already questionable whether the pharmaceutical industry can support this numerical level of drug development, especially when at the end of the process the number of new drugs entering the market has not shown any increase.
One optimistic viewpoint is that it is perhaps too early for drug candidates derived from the new discovery technologies such as high throughput screening to have progressed to late development and market. Alternatively, there is, and will continue to be, an increasing “attrition rate” of new compounds that start active development but never reach market. This high attrition rate has cost implications, as successful drugs must support the increasing number of drug candidate failures.
New technologies (e.g., high throughput screening) may therefore simply increase the number of compounds available for active development, a number perhaps in excess of one million at any one time, and further aggravate the discovery problem. Since there is already a huge shortfall of compounds completing development (e.g., a 1 in 10 success rate), the use of knowledge storage and management techniques known in the art is not improving the attrition rate for pharmaceutical compounds.
One of the key goals of the pharmaceutical industry is to use knowledge to reduce the attrition rate among new drug candidates accepted for development. Thus, the need for decision making/support systems based on knowledge has been identified as “critical” to address the attrition rate for new drug candidates.
One problem associated with reducing the attrition rate is that it is difficult to recognize patterns representing a desired biological or chemical activity concealed within a range of pharmaceutical data. The desired activity may include activity at a target site, selectivity, drug absorption, distribution, metabolism and excretion information (“ADME”), toxicology and other activities.
Another problem is that, due to the large amount of automated screening data generated, it is very difficult to process the screening data to recognize previously unrecognized patterns representing a desired activity concealed within a range of pharmaceutical data. Recognizing such unrecognized patterns may be beneficial or even crucial to finding new pharmaceutical drug candidates.
Another problem is that it is often desirable to compare patterns from screening data to other known patterns. Due to the complexity of comparing patterns with large numbers of elements, such pattern comparisons are not typically completed using pattern recognition systems known in the art.
Another problem is that it is becoming more and more desirable to create “virtual pharmaceutical compounds” and virtual assays. Virtual screening may be completed on such virtual pharmaceutical compounds to create simulation data. It is very difficult to compare any determined patterns from such a simulation to known patterns for real pharmaceutical compounds.
Therefore, it is desirable to provide an improved method and system to recognize previously unrecognized patterns representing a desired activity concealed within a range of pharmaceutical data. The method and system should include the ability to extract existing knowledge and create new knowledge from the unrecognized patterns.
In accordance with preferred embodiments of the present invention, some of the problems associated with locating previously unrecognized patterns representing a desired activity concealed within a range of data are overcome. A method and system for creating and using knowledge patterns are presented.
One or more patterns derived from one or more experimental data sources are acquired. A knowledge map is created using the one or more patterns. The knowledge map may be used to locate and describe previously unrecognized patterns concealed within a selected range of data. New knowledge can be added to the knowledge map by annotating selected regions of the knowledge map.
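By way of illustration only, the acquire, map, annotate and describe steps summarized above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `KnowledgeMap`, `Region`, `acquire`, `annotate` and `describe` are hypothetical, patterns are assumed to be short numeric feature vectors, a region is assumed to be a center point with a radius, and the compound names and values are invented sample data.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Region:
    """Hypothetical annotated region: a center point, a radius and a note."""
    center: tuple
    radius: float
    note: str


@dataclass
class KnowledgeMap:
    """Toy knowledge map: stored patterns plus annotated regions (an assumption)."""
    patterns: dict = field(default_factory=dict)  # pattern name -> feature vector
    regions: list = field(default_factory=list)   # annotations added by experts

    def acquire(self, name, vector):
        """Add one pattern derived from an experimental data source."""
        self.patterns[name] = tuple(vector)

    def annotate(self, center, radius, note):
        """Attach new knowledge to a selected region of the map."""
        self.regions.append(Region(tuple(center), radius, note))

    def describe(self, name):
        """Return the notes of every annotated region that contains the pattern."""
        point = self.patterns[name]
        return [r.note for r in self.regions if dist(point, r.center) <= r.radius]


# Invented sample data: two patterns and one annotated region.
kmap = KnowledgeMap()
kmap.acquire("compound_A", [0.9, 0.1])
kmap.acquire("compound_B", [0.2, 0.8])
kmap.annotate(center=[1.0, 0.0], radius=0.3, note="active at target site")

print(kmap.describe("compound_A"))  # falls inside the annotated region
print(kmap.describe("compound_B"))  # falls outside every annotated region
```

In this sketch a pattern is described by whatever annotations cover its position in the map, so knowledge added once to a region is applied to any pattern acquired later that lands in the same region. Euclidean distance is used only for simplicity; any other similarity measure could be substituted.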
The method and system may be used to improve the identification, selection, validation and screening of new real or virtual pharmaceutical compounds by locating previously unrecognized patterns of biological activity concealed within a range of pharmaceutical data. The method and system may also be used to provide new bioinformatic techniques for storing and manipulating pharmaceutical knowledge.