Research into biological systems is moving from manual experimental techniques to robotics, and toward automated fluorescent detection in high throughput and/or high content screening. Continuing improvements in automation and data processing are useful and important. Specific advanced software technologies within the bioinformatics industry, particularly association mining, reverse engineering, knowledge assembly and simulation components, have enhanced computational biology to create new capabilities that are needed to improve and accelerate biomedical research.
In research involving environmental systems, concerns about the build-up of carbon dioxide in the atmosphere have spawned modern global-warming research. With more carefully designed monitoring networks the movement of carbon dioxide through the atmospheric, biospheric and oceanic reservoirs can be understood more completely. Inverse dynamic modeling, and redirecting monitoring efforts based on modeling needs, can improve insight into the workings of the natural system. Much as in the case of running a river flow model in reverse to detect pollution sources, the Earth's biogeochemical cycles can be reverse-engineered to detect the workings of the coupled ocean-atmosphere-biosphere system.
As shown in FIG. 1, it has been understood for more than fifteen years that likely consequences of global warming will impose damages through storms, storm surge, erosion, flooding, disease vectors, sea-level rise and impacts upon domestic water, among other impacts. As the Earth's climate system becomes more energetic, it is likely that storm frequency and storm force will increase. Human populations, for the most part, live on the shoreline. Scenarios to assess risk in these vulnerable areas have been run in most major cities. For example, before Hurricane Katrina struck New Orleans in 2005, modeling exercises anticipating such flooding had been available to governmental managers at state and federal levels. Uncertainty in measurement and modeling, variability in human perception of risk, and avoidance of the costs of precautionary measures all combined to leave the city vulnerable.
The energy-technology feedback (ETF) is a relevant modeling component for multiple organizational levels (i.e., from human cells to global governance of energy resources), both as a physical force and/or as a dynamic process that could be susceptible to engineering. To understand the ETF will require better research and modeling tools, particularly advances in integrated monitoring, modeling and management (IM3) methods.
In U.S. Pat. No. 6,448,983, issued Sep. 10, 2002, incorporated herein by reference in its entirety, Ali et al. disclose a method for assisting a user in selecting an experimental design by obtaining attributes associated with many experimental designs; through user responses to questions about the objectives of the design of the experiment, user-selected attributes are determined, from which the process selects or de-selects one or more of the experimental designs and notifies the user of the selection.
Y. Wang et al. have previously disclosed a computer-implemented method of designing a set of experiments to be performed with a set of resources, which can include providing a set of parameters and a set of constraints, the parameters including a plurality of factors to be varied in a set of experiments and representing axes defining a parameter space, the set of constraints including one or more experimental constraints representing limitations on operations that can be performed with the set of resources, generating a plurality of configurations based on the parameters and constraints, each configuration including a plurality of experimental points, each point having a set of values for the parameters, selecting a configuration from the plurality of configurations, and defining a set of experiments based on the selected configuration (U.S. Pat. No. 6,996,550, issued Feb. 7, 2006, incorporated by reference herein in its entirety).
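By way of illustration only, the general idea of generating experimental points in a parameter space, filtering them by constraints, and selecting one configuration can be sketched as follows. This is a minimal sketch and not the method of the cited patent: the factor names, the single constraint, and the coverage score used for selection are all hypothetical assumptions introduced for the example.

```python
from itertools import combinations, product

def feasible_points(parameters, constraints):
    """Enumerate the full grid over the parameter axes and keep feasible points."""
    names = list(parameters)
    grid = (dict(zip(names, values)) for values in product(*parameters.values()))
    return [p for p in grid if all(ok(p) for ok in constraints)]

def coverage(configuration):
    """Hypothetical score: number of distinct factor levels the configuration visits."""
    names = configuration[0].keys()
    return sum(len({point[n] for point in configuration}) for n in names)

def select_configuration(points, size):
    """Pick the highest-scoring configuration among all subsets of the given size."""
    return max(combinations(points, size), key=coverage)

# Hypothetical factors (axes of the parameter space).
parameters = {"temperature_c": [25, 37, 50], "ph": [6.0, 7.0, 8.0]}
# Hypothetical resource limitation: the equipment cannot run 50 C at pH 8.
constraints = [lambda p: not (p["temperature_c"] == 50 and p["ph"] == 8.0)]

points = feasible_points(parameters, constraints)
best = select_configuration(points, size=3)
```

Exhaustive enumeration is practical only for small grids; it is used here solely to keep the sketch short.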
D. R. Dorsett has described a computer-implemented method for processing experimental data according to an object model, comprising providing an object model for representing experiments performed in a laboratory data management system, the object model including a first pre-defined experiment class that can be instantiated to define one or more experiment objects that represent data for particular experiments performed in the laboratory data management system, the first pre-defined experiment class having an associated variable definition template defining a plurality of variable types that can be used to represent data from experiments performed in the laboratory data management system, the first pre-defined experiment class being configurable to represent a plurality of different types of experiments performed by the laboratory data management system based on different sets of variable definitions; receiving input specifying a first set of one or more variable definitions defining a set of variables for a first experiment type to be represented by one or more instances of the first pre-defined experiment class, the variables in the set of variables having types selected from the plurality of variable types defined in the variable definition template; receiving data from an experiment of the first experiment type, the data including a plurality of values corresponding to variables defined in the first set of variable definitions; storing a first representation of the data from the experiment of the first experiment type in a format defined according to the plurality of variable types; and presenting a second representation of the data from the experiment of the first experiment type, the second representation being derived from the first representation and being presented in a format defined according to the first set of variable definitions (U.S. Pat. No. 7,213,034, issued May 1, 2007, incorporated by reference herein in its entirety).
L. B. Hales et al. have disclosed process control optimization systems that use adaptive optimization software with goal-seeking intelligent software objects that contain expert system, adaptive models, optimizer, predictor, sensor, and communication translation objects, arranged in a hierarchical relationship whereby the goal-seeking behavior of each intelligent software object can be modified by objects higher in the structure and in a relationship that corresponds to the controlled process (U.S. Pat. No. 6,112,126, issued Aug. 29, 2000, incorporated by reference herein in its entirety).
A. Bondarenko has described a system that digitally represents an experiment design with a definition that provides the logical structure for data analysis of scans from one or more biological experiments, and either directly reflects the experiment design in a one-to-one relationship, or the user can customize the experiment definition, where the experiment definitions are stored as a set of instructions in a database of experiment definitions, and a user can customize one or more automated analysis pipelines for processing the experiment definitions (U.S. Pat. No. 7,269,517, issued Sep. 11, 2007, incorporated by reference herein in its entirety).
T. Lorenzen et al. disclosed an expert system for the design and analysis of experiments that includes a descriptive mathematical model of the experiment under consideration, yielding tests that supply information for comparing different designs and choosing the best possible design, and that provides a layout for data collection. Once the data has been collected and entered, the system analyzes and interprets the results (U.S. Pat. No. 5,253,331, issued Oct. 12, 1993, incorporated by reference herein in its entirety).
U.S. Pat. No. 6,615,157 issued to Tsai on Sep. 2, 2003, herein incorporated by reference in its entirety, discloses a system and method and computer program product for automatically assessing experiment results obtained in a process by analyzing attributes representing experimental results of a process, where change in a control variable alters an attribute, where attributes that are expected to be affected by changes in the control variable of the process are listed in a knowledge base; comparing the altered attributes from an experiment with those listed; and identifying the altered attributes that are not listed and storing these in a non-conformity database.
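The comparison step described above can be illustrated, under stated assumptions, as a set difference between the attributes observed to change and those a knowledge base lists as expected to change. The following is a minimal sketch only; the attribute names, the control-variable name, and the dictionary layout of the knowledge base are hypothetical and are not taken from the cited patent.

```python
def find_nonconforming(altered, knowledge_base, control_variable):
    """Return altered attributes that the knowledge base does not list as expected."""
    expected = knowledge_base.get(control_variable, set())
    return sorted(set(altered) - expected)

# Hypothetical knowledge base: control variable -> attributes expected to respond.
knowledge_base = {"anneal_temperature": {"film_thickness", "sheet_resistance"}}

# Hypothetical experiment outcome: attributes observed to change.
altered = ["film_thickness", "particle_count"]

# Unlisted changes would be stored in a non-conformity database; here, a list.
nonconformities = find_nonconforming(altered, knowledge_base, "anneal_temperature")
```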
Development has occurred in structuring domain knowledge into specialized relational databases (knowledge bases) that can be interrogated by artificial intelligence methods. Aspects of these domain knowledge bases (KBs) can be domain ontologies, such as those developed for research in the life sciences. A method and system for managing and evaluating life science data is described in U.S. application Ser. No. 10/644,582 (D. N. Chandra, et al., filed Aug. 20, 2003), incorporated herein by reference in its entirety, where life science data is placed in a knowledge base and used for creating a knowledge base by generating two or more nodes indicative of the data, assigning to one or more pairs of nodes a representation descriptor that corresponds to a relationship between the nodes, and assembling the nodes and the relationship descriptor into a database, such that at least one of the nodes is joined to another node by a representation descriptor that can include a case frame that describes the relationships between elements of life science data.
U.S. application Ser. No. 10/992,973 (D. N. Chandra, et al., published Jul. 28, 2005), incorporated herein by reference in its entirety, includes methods for performing logical simulations within a biological knowledge base, including backward logical simulations, which proceed from a selected node upstream through a path of relationship descriptors to discern a node which is hypothetically responsible for the experimentally observed changes in the biological system, and forward logical simulations, which travel from the target node downstream in a causal network through a path of relationship descriptors to discern the extent to which a perturbation to the target node causes experimentally observed changes in the biological system. Also disclosed are methods to perform a logical simulation on a hypothetical perturbation and method steps for conducting an experiment on a biological specimen to determine if the hypothetical changes predicted by logical simulation correspond to the biologically observed change.
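The distinction between forward and backward traversal of a causal network can be illustrated with a plain graph search. The following is a minimal sketch, assuming a causal network stored as a dictionary of directed edges; the node names are hypothetical, and the sketch ignores the type and sign of each relationship descriptor, which the cited application takes into account.

```python
from collections import deque

def reachable(edges, start):
    """Breadth-first search: all nodes reachable from start along directed edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen

def reverse_edges(edges):
    """Flip every edge so that a forward search walks upstream."""
    flipped = {}
    for source, targets in edges.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped

# Hypothetical causal network: cause -> list of effects.
causal = {
    "drug": ["kinase"],
    "kinase": ["tf"],
    "stress": ["tf"],
    "tf": ["gene_a", "gene_b"],
}

# Forward: which nodes could a perturbation of "kinase" affect?
downstream = reachable(causal, "kinase")
# Backward: which nodes could be responsible for an observed change in "gene_a"?
upstream = reachable(reverse_edges(causal), "gene_a")
```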
U.S. application Ser. No. 10/717,224 (D. N. Chandra et al.), which is incorporated herein by reference in its entirety, discloses a system that uses an epistemic engine that accepts biological data from real or thought experiments probing a biological system, and uses these data to produce a network model of component interactions consistent with the data and prior knowledge about the system, and thereby ‘deconstructs’ biological reality and proposes testable hypotheses/explanations/models of the system operation. An associated method of proposing new knowledge is disclosed that includes providing a representation structure for certain biology concepts (where causal network nodes represent known conditions, processes, and physical structures, with interrelationships among nodes described qualitatively), proposing a biological model by specifying many pairs of nodes and descriptors between selected nodes, simulating the proposed model to produce simulated data, assigning a fitness measure to the proposed model as a measure of how the simulated data compares to measured biological behavior or properties (reality), iterating for many different proposed biological models, and selecting the best-fit proposed models based on fitness measures.
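The propose, simulate, score and select loop described above can be sketched in a few lines. This is an illustrative sketch only, under the assumption that each proposed model is a set of signed edges (+1 for activation, -1 for inhibition) and that fitness is simply the number of observations the simulation reproduces; the candidate models, node names and observations are hypothetical and do not come from the cited application.

```python
def simulate(model, source):
    """Propagate an 'up' perturbation of source through signed edges."""
    state = {source: 1}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for (src, dst), sign in model.items():
            if src == node and dst not in state:
                state[dst] = state[src] * sign
                frontier.append(dst)
    return state

def fitness(model, source, observed):
    """Count observed nodes whose simulated direction of change matches the data."""
    predicted = simulate(model, source)
    return sum(predicted.get(node, 0) == sign for node, sign in observed.items())

# Hypothetical candidate models: (cause, effect) -> sign of the relationship.
candidates = {
    "m1": {("A", "B"): 1, ("B", "C"): 1},
    "m2": {("A", "B"): 1, ("B", "C"): -1},
    "m3": {("A", "C"): 1},
}
# Hypothetical measurement after raising A: B goes up, C goes down.
observed = {"B": 1, "C": -1}

scores = {name: fitness(model, "A", observed) for name, model in candidates.items()}
best = max(scores, key=scores.get)
```

In this toy setting model "m2" reproduces both observations, so it is the one selected; a practical system would propose and score far more candidates than the three enumerated here.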
Biological systems have been investigated by dynamic simulation of cellular models. For instance, U.S. Pat. No. 7,415,359 issued Aug. 19, 2008 to Hill et al., which is incorporated herein by reference in its entirety, discloses systems and methods for cell simulation and cell-state prediction, where a cellular network can be simulated by representing interrelationships with equations solved to simulate a first state of the cell, then perturbing the network mathematically to simulate a second state of the cell which, upon comparison to the first state, identifies components as targets.
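The sequence of simulating a first state, perturbing the network mathematically, and comparing the resulting second state can be illustrated with a toy two-component model. The sketch below is an assumption-laden illustration, not the simulation of the cited patent: the two rate equations, the rate constants, the forward-Euler integrator and the 0.1 comparison threshold are all hypothetical.

```python
def simulate(rates, steps=5000, dt=0.01):
    """Integrate dx/dt = k_in - k_deg_x*x and dy/dt = k_act*x - k_deg_y*y by forward Euler."""
    x = y = 0.0
    for _ in range(steps):
        dx = rates["k_in"] - rates["k_deg_x"] * x
        dy = rates["k_act"] * x - rates["k_deg_y"] * y
        x, y = x + dt * dx, y + dt * dy
    return {"x": x, "y": y}

# Hypothetical rate constants for the unperturbed (first) state.
baseline_rates = {"k_in": 1.0, "k_deg_x": 0.5, "k_act": 1.0, "k_deg_y": 1.0}
# Mathematical perturbation: weaken the activation of y by x.
perturbed_rates = dict(baseline_rates, k_act=0.25)

first_state = simulate(baseline_rates)
second_state = simulate(perturbed_rates)

# Components whose level differs between the two states are flagged for follow-up.
changed = {name for name in first_state
           if abs(second_state[name] - first_state[name]) > 0.1}
```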
U.S. application Ser. No. 11/985,618 by Hill et al. (Filed Nov. 15, 2007; Publ. No. 20080208784, Published Aug. 28, 2008), which is incorporated herein by reference in its entirety, discloses using a probabilistic modeling framework for reverse engineering an ensemble of causal models from data, pertaining to numerous types of systems, and then forward simulating the ensemble of models to analyze and predict the behavior of the network, including data-driven techniques for developing causal models for biological networks. Here causal network models include computational representations of the causal relationships between independent variables such as a compound of interest and dependent variables such as measured DNA alterations, changes in mRNA, protein, and metabolites to phenotypic readouts of efficacy and toxicity.
Hood et al. (U.S. application Ser. No. 09/993,312, incorporated herein by reference in its entirety) disclose methods of predicting a behavior of a biochemical system by comparing data integration maps of the system under different conditions, comprising at least two networks, and identifying correlative changes in value sets between the maps to predict behavior of the system.
Methods of interrogating complex systems to understand dynamic behavior can be assisted by advanced data mining techniques, including reverse engineering relationships in a causal network that represents the system. First steps in reverse engineering include finding correlations or associations between pairs of nodes, or associations among three or four nodes, or preferably among much larger sets of nodes. Computationally, finding an optimal set of a large number of associated nodes in a complex system around which to structure behavioral simulation can become a nondeterministic polynomial-time hard (NP-hard) type problem. In this regard, U.S. Pat. No. 6,493,637 issued to Steeg on Dec. 10, 2002, which is incorporated herein by reference in its entirety, discloses a method and system for detecting coincidences in a data set of objects, where each object has a number of attributes, iteratively sampling equally-sized subsets of the data, and recording co-occurrences of a plurality of attribute values in one or more objects in the subset (coincidences), determining expected coincidence count and comparing with the observed to determine a measure of correlation, with a resulting set of attributes for which the measure of correlation is above a predetermined threshold (k-tuples) being reported. This ‘association mining’ method is useful for finding associations among large sets of associated nodes in complex system data (See also Evan W. Steeg, Derek A. Robinson, Ed Willis: Coincidence Detection: A Fast Method for Discovering Higher-Order Correlations in Multidimensional Data. KDD 1998: 112-120; incorporated herein by reference in its entirety).
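A simplified version of sampling-based coincidence counting can be sketched as follows. The sketch repeatedly samples equally sized subsets, counts how often each k-tuple of attribute values co-occurs in a sampled object, and compares that count with the count expected if the attributes were independent. It is an illustration under stated assumptions rather than the procedure of the cited patent or paper: the data set, the subset size, the iteration count, the ratio used as the measure of correlation, and the threshold of 1.5 are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def coincidence_scores(objects, k=2, subset_size=4, iterations=500, seed=0):
    """Ratio of observed to expected co-occurrence counts for each k-tuple of values."""
    rng = random.Random(seed)
    n, n_attrs = len(objects), len(objects[0])
    # Marginal frequency of each (attribute index, value) pair over the full data set.
    marginal = Counter((col, obj[col]) for obj in objects for col in range(n_attrs))
    observed = Counter()
    for _ in range(iterations):
        for obj in rng.sample(objects, subset_size):
            for cols in combinations(range(n_attrs), k):
                observed[(cols, tuple(obj[c] for c in cols))] += 1
    scores = {}
    for (cols, values), count in observed.items():
        p_independent = 1.0
        for col, value in zip(cols, values):
            p_independent *= marginal[(col, value)] / n
        expected = iterations * subset_size * p_independent
        scores[(cols, values)] = count / expected
    return scores

# Hypothetical data: attributes 0 and 1 always agree; attribute 2 is unrelated.
objects = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)] * 5

scores = coincidence_scores(objects)
# Report the k-tuples whose measure of correlation exceeds the threshold.
reported = {key for key, score in scores.items() if score > 1.5}
```

With this data only the value pairs on attributes 0 and 1 are reported, since their observed counts run at about twice the independent expectation while pairs involving attribute 2 stay near a ratio of one.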
U.S. Pat. No. 5,384,895 to Rogers et al. (issued Jan. 24, 1995), which is incorporated herein by reference in its entirety, describes a self-organizing neural network and method for classifying a pattern signature having N-features where the network provides a posteriori conditional class probability that the pattern signature belongs to a selected class from a plurality of classes with which the neural network was trained. In its training mode, a plurality of training vectors is processed to generate an N-feature, N-dimensional space being defined by a set of non-overlapping trained clusters. Each training vector has N-feature coordinates and a class coordinate. Each trained cluster has a center and a radius defined by a vigilance parameter. The center of each trained cluster is a reference vector that represents a recursive mean of the N-feature coordinates from training vectors bounded by a corresponding trained cluster.
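The roles of the vigilance parameter and the recursive mean can be illustrated with a small clustering routine. The following is a minimal sketch and not the network of the cited patent: it assumes that a training vector joins the nearest existing cluster whose center lies within the vigilance radius and otherwise starts a new cluster, and that a class probability is estimated from per-cluster class counts. Both assumptions, along with the data and the helper names, are hypothetical.

```python
import math
from collections import Counter

def train_clusters(training_vectors, vigilance):
    """Assign each (features, label) pair to a cluster, updating centers recursively."""
    clusters = []  # each cluster: {"center": [...], "count": int, "classes": Counter}
    for features, label in training_vectors:
        nearest, nearest_dist = None, None
        for cluster in clusters:
            dist = math.dist(features, cluster["center"])
            if dist <= vigilance and (nearest is None or dist < nearest_dist):
                nearest, nearest_dist = cluster, dist
        if nearest is None:
            clusters.append({"center": list(features), "count": 1,
                             "classes": Counter([label])})
        else:
            nearest["count"] += 1
            n = nearest["count"]
            # Recursive mean: new_center = old_center + (x - old_center) / n
            nearest["center"] = [c + (x - c) / n
                                 for c, x in zip(nearest["center"], features)]
            nearest["classes"][label] += 1
    return clusters

def class_probability(cluster, label):
    """Fraction of the cluster's training vectors that carry the given class label."""
    return cluster["classes"][label] / cluster["count"]

# Hypothetical two-feature training vectors with a class coordinate.
training = [((0.0, 0.0), "a"), ((0.2, 0.0), "a"), ((5.0, 5.0), "b"), ((0.1, 0.3), "b")]
clusters = train_clusters(training, vigilance=1.0)
p_a = class_probability(clusters[0], "a")
```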
In another approach to solving complex system functions in biological applications, U.S. application Ser. No. 11/668,671 to Shaw, filed Jan. 30, 2007 and incorporated herein by reference in its entirety, discloses a computational method of determining a set of proposed pharmacophore features describing interactions between a known biological target and ligands showing activity towards the target by identifying a set of n-dimensional inter-site distance (ISD) vectors, the set comprising at least one ISD vector from each of two or more ligands, each of the ISD vectors being associated with a specific set of pharmacophore sites within a single conformation of one of the ligands, the sites being identical in number and type to the pharmacophore features from which the set of ISD vectors is defined; and using a computerized process of hierarchical partitioning to determine, from a top-level multi-dimensional space, a refined, smaller multi-dimensional space defining the distance ranges for each dimension of the ISD vectors, said distance ranges being used to propose spatial relationships among said set of pharmacophore features.
A problem with the automation software utilized in the research equipment for systems research (including biotechnology and related biomedical research laboratories) is that existing solutions are created with many lines of custom code or threads written in programming languages such as C, C++, C#, or Java. This programming methodology originated in research labs and universities where the advanced research processes were developed and proven. These same processes and associated automation software have been moved to research equipment without change, in an attempt to maintain the original results. Optimization and maintenance of these islands of custom code have created a major obstacle for an information-enabled, high volume research environment.
At the same time, the industry is attempting to lower costs, reduce time to market, reduce start-up time, and achieve greater reliability and availability of the equipment and experimental process. The industry is reacting to the need to connect these islands of custom code while optimizing the research processes. Standards organizations are sponsoring multiple biotechnology-specific standards that have been written or are being developed to define an enhanced research environment. This environment focuses on optimizing the research processes by accessing process data and applying analysis and corrective actions within equipment and across multiple pieces of equipment. This approach, based on extending the existing code base, has created a more complex environment and, at this point, is not achieving the cost, research and optimization goals. This problem has not been completely solved to date, and the pieces that exist are mainly custom software code.
Further, the advent of multiple biotechnical research companies, each of which may specialize in a particular aspect or phase of an experiment, or phase of research in the development of research-based knowledge, has led to an opportunity to integrate these many aspects, or many research functionalities, into a coordinated ensemble and/or research progression. However, the tools to effect such an integration, and particularly to automate such a progression in a way that would allow rapid and iterative looping of experimental results from a previous experiment to automatically initiate the conditions and starting procedures for a next experiment, have not previously been developed. There is, therefore, an unmet need in industry to provide improved research methodologies in the biotechnology and/or biomedical industry, and particularly to provide improved software and hardware systems for managing automated laboratories and automated research methodologies.
There is a continuing need to improve the conduct and data processing aspects of research into complex systems. Particularly, there is a need to improve access to automated experimentation in order to accelerate the pace of productive research. A number of prior developments have used computing and expert systems in relation to experiments, experimental design and automation, and automated processing of results. Now, there is a pressing need to use the steady increase in computing power to better assist researchers in choosing experiments, getting them run, processing the data quickly, and using the results intelligently to rapidly inform the next round of experimentation.
Compounding environmental and economic stresses are threatening populations. There is a need for an Automated, Integrated Monitoring, Modeling and Management (AIM3) learning model to explore rapidly how energy dynamics relate to the growth and stability of social systems and subsystems, as this may assist managers in utilizing improved expert monitoring and modeling for guidance in avoiding environmental calamity. There is also a need for an AIM3 research model to study the subsystem behavior of the Energy-Technology Feedback (ETF) in the domain of global energy use.
It is a further goal to provide a business method and system that employs computerized, automated steps and standardization at numerous points in the process of providing a medical or biomedical service, such as, for example, stem cell extraction and freeze-dried storage at room temperatures, in order to further improve handling-efficiencies and thereby increase cost savings owing to these efficiencies and reduction of human labor costs.