Analysis of functional or modal sets of data may focus on particular targeted characteristics or behaviors of subsets of the data, but may not provide global and/or dynamic perspectives (e.g., invariants) that can be inferred collectively from the set of data as a whole. The conventional use of controlled vocabularies to describe sets of data may exploit only the taxonomical properties (e.g., membership or set containment) of the ontology, and may not use process-oriented properties to present dynamical perspectives on whole systems, e.g., biological systems. However, such dynamical perspectives can be important in obtaining a better analysis, e.g., a process-level understanding of the underlying dynamics and relationships that may be acting to produce the observed data.
Useful information can be obtained for characterizing a dynamical system by encoding its properties into the vernacular of temporal logic. Temporal logic may be defined in terms of Kripke structures, which can be expressed in the form (V, E, P). Such a structure can be understood to represent a “semantic support” for hybrid systems. Here, (V, E) can be understood to represent a directed graph having a plurality of reachable states of the system as vertices, V, and state transitions of the system as directed edges, E. P can represent a labeling of the states of the system with properties that apply to each state. For example, a classic cell cycle can be characterized by six states: M, G1(I), G1(II), S, G2 and G0.
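The (V, E, P) form described above can be illustrated with a minimal Python sketch. The six state names are taken from the cell-cycle example; the particular transitions and the property labels (e.g., "dna_synthesis") are hypothetical assumptions chosen only for illustration, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class KripkeStructure:
    states: set   # V: states of the system
    edges: set    # E: directed (source, target) state transitions
    labels: dict  # P: maps each state to the set of properties holding in it

    def successors(self, state):
        """States reachable from `state` in exactly one transition."""
        return {dst for src, dst in self.edges if src == state}

    def reachable(self, start):
        """All states reachable from `start`, including `start` itself."""
        seen, frontier = {start}, [start]
        while frontier:
            for nxt in self.successors(frontier.pop()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen

    def ex(self, prop):
        """States with at least one successor labeled with `prop`
        (the temporal-logic operator EX applied to an atomic property)."""
        return {
            s for s in self.states
            if any(prop in self.labels.get(t, set()) for t in self.successors(s))
        }


# Hypothetical transitions and labels for the six cell-cycle states.
cell_cycle = KripkeStructure(
    states={"M", "G1(I)", "G1(II)", "S", "G2", "G0"},
    edges={
        ("M", "G1(I)"), ("G1(I)", "G1(II)"), ("G1(I)", "G0"),
        ("G0", "G1(I)"), ("G1(II)", "S"), ("S", "G2"), ("G2", "M"),
    },
    labels={
        "M": {"dividing"},
        "G1(I)": {"growth"},
        "G1(II)": {"growth"},
        "S": {"dna_synthesis"},
        "G2": {"growth"},
        "G0": {"quiescent"},
    },
)

print(cell_cycle.ex("dna_synthesis"))
print(cell_cycle.reachable("G0") == cell_cycle.states)
```

Under these assumed transitions, the query `ex("dna_synthesis")` asks which states can enter a DNA-synthesis state in a single step, which is the kind of property a temporal-logic formula can express over the labeled graph.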
Model systems and/or experimental conditions may conventionally be used to formally define a Kripke structure. Defining a Kripke structure can require defining the states, defining a state transition diagram, and providing a labeling of the states using a particular vocabulary.
A redescription can be understood to mean a shift of vocabulary, e.g., a different way of communicating a given aspect of information. Redescription mining is a technique that may be used to find sets (e.g., sets of genes) that can be associated with multiple definitions. In biological systems, the inputs to a redescription mining technique may take different forms, e.g., a universal set of open reading frames (“ORFs”) associated with a particular organism, and various subsets, or “descriptors,” which may be defined over this universal set. These subsets can be based on diverse sets of information, e.g., prior biological knowledge, or they may be defined by the outputs of algorithms operating on gene expression data. An exemplary descriptor can be from the field of systems biology, e.g., “genes involved in glucose biosynthesis.”
Redescription mining can connect diverse vocabularies by relating set-theoretic constructs formed over the descriptors. For example, it may be possible to determine, in a biological system, that “genes expressed in the desiccation experiment except those participating in universal stress response” is the same as “genes significantly expressed 2-fold positively or negatively in the salt stress experiment.” This redescription relates a set difference in the first descriptor to a set union in the second descriptor. Such equivalence relationships can assist in unifying diverse ways of qualifying information by identifying regions of similarity and/or overlap.
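The desiccation and salt stress example above can be sketched in Python using ordinary set operations. The gene identifiers and descriptor contents below are hypothetical, and the Jaccard coefficient is used here only as one assumed way of measuring how closely two set expressions agree; the text above does not prescribe a particular measure.

```python
def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical descriptors defined over a universal set of genes.
desiccation_expressed = {"g1", "g2", "g3", "g4", "g5"}
universal_stress_response = {"g4", "g5"}
salt_stress_up_2fold = {"g1", "g2"}
salt_stress_down_2fold = {"g3"}

# Left-hand expression: a set difference over the first vocabulary.
lhs = desiccation_expressed - universal_stress_response
# Right-hand expression: a set union over the second vocabulary.
rhs = salt_stress_up_2fold | salt_stress_down_2fold

# A coefficient of 1.0 indicates an exact redescription; lower values
# indicate only partial overlap between the two expressions.
print(sorted(lhs), sorted(rhs), jaccard(lhs, rhs))
```

With real data the two expressions would rarely coincide exactly, so a threshold on the agreement measure could be used to decide which candidate redescriptions are worth reporting.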
Microarray technologies can be utilized to analyze biological processes, e.g., to characterize cellular transcriptional states by simultaneously measuring mRNA abundance of many thousands of genes. The levels of gene expression (absolute or relative), which can be measured while a cell is subjected to a particular ambient condition, can be analyzed using conventional statistical techniques, visualization techniques, and/or data mining algorithms/techniques. Statistical and data-mining analysis techniques may focus on targeted sets of genes, e.g., genes that vary in a well-correlated manner, are under similar regulatory control, or have consistent functional annotations or ontological categorizations. However, the full data set may contain additional information, possibly offering a richer and more detailed picture, which can remain unrecognized or be inadvertently discarded when using these techniques.
Biological processes such as circadian rhythms, cell division, metabolism, and development can occur as ordered sequences of events. The synchronization of these coordinated events can be important for proper cell function, and thus the determination of significant time points in biological processes can be an important component of all (or substantially all) biological investigations. For example, such significant time points can establish logical ordering constraints on subprocesses, impose prerequisites on temporal regulation and spatial compartmentalization, and/or situate dynamic reorganization of functional elements in preparation for subsequent stages. Thus, building temporal phenomenological representations of biological processes from genome-wide datasets can be relevant in formulating biological hypotheses on, e.g., how such processes can be mechanistically regulated, how such regulation can vary on an evolutionary scale, and how inadvertent dysregulation of such processes can lead to a diseased state or fatality.
Thus, there may be a need for methods, systems and software arrangements that are capable of providing global and dynamic perspectives on transcription states by combining quantitative analysis of data sets with formal models that can characterize various global phenomena, e.g., temporal evolution of biological processes or other sequential data patterns.