1. Field of the Invention
The present invention relates to a method and apparatus for analyzing multivariable experimental data, and for drawing conclusions concerning relationships between these variables. The invention is especially useful for analyzing processes in complex systems consisting of many components, such as in biology, where the experimentally derived data sets characterize these components and the inducing and inhibiting interactions between them.
2. Prior Art
Modern biological research is revealing in growing detail the complexity of living systems. It is now generally appreciated that biological mechanisms are the outcome of a large number of highly interactive molecular events. Experimental techniques must therefore not only provide increased sensitivity and selectivity to minute (even individual) molecular components, but also characterize the whole biological context by simultaneously measuring many variables in each experimental procedure, reporting on as many as possible of the relevant components participating in the studied process.
Biological systems can be described at various levels and resolutions. Modern biology describes biological processes in terms of molecular components. Molecular mechanisms are typically described as pathways consisting of sets of rules for the way molecules interact, react to each other, and change properties to inhibit or induce their functional activity in response to a stimulus. As a result of these sets of rules many changes are induced, leading to a new cellular state (e.g. entry into a proliferative cell cycle). This process is often termed a cellular decision mechanism or a signaling pathway.
Molecular mechanisms are often presented graphically as a set of nodes connected by a network of line segments, where the nodes are usually the molecular components and the lines between them represent the interactions. A molecule suspected to be involved in a process helps to explore its mechanisms: changes are introduced in its activity, and correlated changes induced in other molecules are searched for.
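The node-and-line representation described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the invention: the pathway, the component names (stimulus, receptor, kinase_A, phosphatase_B, factor_C) and the function downstream_effects are all hypothetical. Each interaction is stored as a signed edge (+1 for inducing, -1 for inhibiting), and the expected direction of change in downstream components is obtained by multiplying the signs along the path from the perturbed molecule.

```python
from collections import deque

# Hypothetical pathway (assumed for illustration only).
# Key: (source, target); value: +1 if source induces target, -1 if it inhibits.
PATHWAY = {
    ("stimulus", "receptor"): +1,
    ("receptor", "kinase_A"): +1,
    ("kinase_A", "phosphatase_B"): -1,
    ("phosphatase_B", "factor_C"): -1,
}


def downstream_effects(edges, source):
    """Map each component reachable from `source` to the net sign of the effect.

    The net sign is the product of the edge signs along the first path found
    by a breadth-first search, so +1 means "expected to rise when `source`
    is activated" and -1 means "expected to fall".
    """
    effects = {}
    queue = deque([(source, +1)])
    while queue:
        node, sign = queue.popleft()
        for (src, dst), edge_sign in edges.items():
            if src == node and dst != source and dst not in effects:
                effects[dst] = sign * edge_sign
                queue.append((dst, sign * edge_sign))
    return effects


if __name__ == "__main__":
    # Activating the receptor: which components should change, and how?
    print(downstream_effects(PATHWAY, "receptor"))
```

In this toy pathway two successive inhibitions cancel, so activating the receptor is expected to raise factor_C even though no edge connects the two directly. Comparing such predicted signs with measured correlated changes is one simple way a proposed diagram could be checked against data.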
Exhaustive exploration of biological systems is the goal of present efforts to develop automated large scale assays, such as multisensor arrays and DNA chips, in order to document the expression patterns of genes. One of the most commonly used techniques in biology, namely polyacrylamide gel electrophoresis (PAGE), also provides information about the protein inventory of cells and tissues by quantifying tens (in one dimensional gels) to hundreds (in two dimensional gels) of bands. Comparison of electrophoretic patterns following specific treatments can reveal multiple changes in the levels of many protein bands. Moreover, combined with immunoblotting techniques, such changes can be assigned to specific proteins and their post-translational modifications and activation. However, these changes are usually too numerous to reveal, or even suggest, definitive causal relationships between individual molecular components. Such relationships can be studied by quantitative analysis methods, such as that of this invention.
An example illustrating the information extractable by quantitative analysis of biological measurements is signal transduction in cells. Signal transduction mechanisms have a logical structure with well defined input stimulants, and they lend themselves experimentally to multivariable measurements of cell output responses, for example by recording the phosphorylation changes of the many activated molecular components using phosphotyrosine gel blots.
Understanding of the molecular mechanisms underlying signal transduction in cells has advanced greatly in recent years; these mechanisms include activation, alteration, regulation, maintenance and termination of various cellular functions [Alberts et al. 1994]. Classically, signaling pathways were described as cascades of sequentially activated events leading to recognizable responses. Responses at the cellular level involve post-translational modifications of proteins (like phosphorylation or dephosphorylation) affecting molecular interactions and enzymatic activities, causing translocations between different cell compartments, notably between the nucleus and the cytoplasm, and leading to changes in the expression of genes and global alterations in morphology and in cell cycle. Due to the multitude of processes involved (directly or indirectly), characterization of the molecular mechanisms induced by cell stimulation is based on a wide variety of immunological, biochemical, genetic and cell biological approaches.
Genetic manipulations are undoubtedly the most powerful method to dissect molecular mechanisms. Many genetic manipulations are binary (yes/no, such as forced expression in transgenic cells and animals, or knockouts). This makes them ideally suited to assigning the function of specific molecules in signaling cascades and identifying the downstream events. Yet assigning attributes like cause and effect (upstream-downstream relationships) to molecular changes, which often appears obvious in model systems, becomes difficult in many realistic conditions. Isolation of homologous molecules in various species, ranging from bacteria and yeast to mammals, indicates the universality of signal transduction genes. However, their detailed functionality often displays system dependent quantitative as well as qualitative divergence from their characterization in specialized systems (such as lower forms of life or overexpressing cell lines). The increasing complexity in species with larger genomes is therefore attributed not only to the increasing number of independent cascade-like pathways, but also to the multiplicity of cross-talks and interactions between pathways, which turn them into interlinked networks. This network architecture is believed to account for the robustness of higher species against random mutations, without compromising the evolutionary potential: duplication of pathways gave rise to networks of cross-talking components, and independent mutations evolved abilities to respond specifically and non-linearly to a growing repertoire of stimuli in a cell-type dependent manner [Bray 1990]. There are, though, special and extremely important cases of degenerate network architectures. In order to guarantee synchrony of cellular functions (for example, cell cycle progression), cells evolved mechanisms that accumulate information about multiple conditions before deciding to act, and then spread out signals to many responses.
It is this reduced robustness of the critical locales of cellular decision nodes (or checkpoints) that is associated with cancer [Fearon and Vogelstein, 1990]. Characterizing the hierarchy of interactions between the components in signaling pathway networks and identifying the logical architecture of cellular decision making is therefore a critical question that draws intensive research efforts.
How is it possible to probe a network architecture such as that underlying cell signaling pathways? The problem of defining the content of an electronic black box by stimulating its inputs and measuring the signals emerging at the outputs was formulated for a network of linear elements long ago. The dynamic behavior of networks of interacting elements can in principle be modeled by sets of differential equations, whose solutions give the equilibrium states and the dynamic responses of the system to perturbations. For example, the behavior of chemical mixtures can be solved in terms of concentrations and chemical reaction constants (representing the interactions). The behavior of interlinked biological pathways was described in relation to metabolic cycles and their control [Chock and Stadtman, 1977]. Features like amplification and non-linear response, feedback and temporal integration all emerge from the interlinking [Hjelmfelt et al. 1993]. However, it is rarely the case that sufficient information exists to describe complex biological mechanisms at the level of detail required for such dynamic modeling. A number of recent works have applied neural networks to model biological pathways, such as receptor mediated signaling in bacterial chemotaxis [Bray et al. 1993] and segmentation in the embryonic development of Drosophila [Burstein, 1995]. These works used the known hierarchical structure of the molecular mechanisms to build neural networks that model the behavior of the studied biological systems. One purpose of the present invention is to do the inverse of common neural network analyses, namely to deduce the hierarchical network structure of the studied biological mechanism from the analysis of the raw experimental measurements.
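As a minimal illustration of the differential equation approach mentioned above (and not of the method of the invention), the following Python sketch integrates the rate equations of a single reversible reaction A <-> B, namely dA/dt = -k_forward*A + k_reverse*B and dB/dt = +k_forward*A - k_reverse*B, using a simple forward Euler scheme. The function name, the rate constants and the step size are assumptions chosen for illustration only.

```python
def simulate_reversible_reaction(a0, b0, k_forward, k_reverse,
                                 dt=0.001, steps=20000):
    """Integrate A <-> B with forward Euler and return the final (A, B).

    a0, b0     : initial concentrations (arbitrary units)
    k_forward  : rate constant for A -> B
    k_reverse  : rate constant for B -> A
    dt, steps  : illustrative step size and step count (assumed values)
    """
    a, b = a0, b0
    for _ in range(steps):
        # Net flux from A to B during this time step.
        flux = k_forward * a - k_reverse * b
        a -= flux * dt
        b += flux * dt
    return a, b


if __name__ == "__main__":
    a, b = simulate_reversible_reaction(a0=1.0, b0=0.0,
                                        k_forward=2.0, k_reverse=1.0)
    # At equilibrium the flux vanishes, so B/A approaches k_forward/k_reverse.
    print(a, b, b / a)
```

Even this two-component system requires both rate constants to be known in advance. A realistic signaling network would need one such constant for every interaction, which is the information that, as noted above, is rarely available.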