With rapidly growing databases of genome, transcriptome, and proteome sequence data, an outstanding challenge of post-genome biology is to map and understand the function of the underlying biological networks formed by an organism's genes. At the heart of these biological networks is the process of gene regulation. In general, gene regulation can involve control of the dynamics of transcription, splicing, transport, translation, modification, and degradation of gene products.
The survival and well-being of a cell depends heavily on its ability to regulate the activities of its genes in response to a changing environment and internal needs. Achieving the appropriate genetic responses requires the integration of signals from vastly different sources, e.g., availability of various nutrients, presence of hazardous chemicals, metabolic state of the cell, etc. The process of signal integration is carried out through the interactions of the signal carriers, i.e., the regulatory proteins, with each other and with DNA sequences in regulatory regions located in the vicinity of each gene. The regulatory program each gene executes is coded into the genome through the structures and chemical properties of the regulatory proteins and the composition and location of the regulatory DNA sequences.
The limited knowledge of quantitative biochemical details and the mathematical difficulties of accurately describing these complex networks have led to the development of simple models aimed at learning general principles. This approach is based on the belief that some fundamental principles of regulation are typical properties of the systems, robust to theoretical idealization, such that they can be expressed in terms of a simplifying model. One such model is the Boolean genetic network, first proposed by Kauffman in 1969, in which an element, or “node”, of the network represents a gene or a protein and can take only two values: ON (1) or OFF (0). Each node of the Boolean network is assumed to be described by a logic function that transforms a set of inputs to an output. The output is then sent to a number of other nodes through a set of pre-defined connections, i.e., “wiring”, defined by the specificity of the interactions encoded in the molecular structure of the element. Such Boolean networks have been widely used to model gene networks. However, these models do not address how the specific logic functions can be implemented. Also, the binary inputs and outputs assumed in the Boolean network model are unrealistic approximations of the continuous protein concentrations or gene expression levels in a cell.
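As a minimal illustration of the Boolean network model described above, the following Python sketch updates a three-node network synchronously. The node names A, B, and C, the wiring, and the logic functions are hypothetical and chosen only for illustration; synchronous updating is likewise an assumption of this sketch, not a property of any particular gene network.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # maps node name to 0 (OFF) or 1 (ON)

# Hypothetical wiring and logic functions, for illustration only.
RULES: Dict[str, Callable[[State], int]] = {
    "A": lambda s: 1 - s["C"],        # A is ON only when C is OFF (NOT)
    "B": lambda s: s["A"] & s["C"],   # B is ON when A and C are both ON (AND)
    "C": lambda s: s["A"] | s["B"],   # C is ON when A or B is ON (OR)
}


def step(state: State) -> State:
    """Apply every node's logic function to the current state at once."""
    return {node: rule(state) for node, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps successive updates."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


if __name__ == "__main__":
    for s in trajectory({"A": 1, "B": 0, "C": 0}, 5):
        print(s)
```

Starting from A ON and B, C OFF, this particular wiring returns to its initial state after five updates. Each node holds only 0 or 1, so the sketch also makes visible the limitation noted above: it carries no information about concentrations or expression levels.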
The theoretical prediction of mRNA expression patterns, beginning with the regulatory sequences, is a formidable task and will likely remain an unsolved problem in the near future. One of the obstacles to prediction is the fact that the relevant binding sites for transcription factors (TFs) are difficult to identify using standard bioinformatics tools. Even when these binding sites are known, a much larger obstacle lies in the need to accurately describe the mutual interactions between RNA polymerase, TFs, and DNA.
It is well known that cis-regulatory transcription control is used extensively, for example, in many developmental genes of higher eukaryotic organisms, which are typically regulated by 5 to 8 different TFs (Arnone and Davidson 1997). Within the cis-regulatory region of such a gene, internal and external signals encoded in the concentrations of active TFs are processed directly on the DNA through protein-protein interactions among the TFs and the RNA polymerase to determine the resulting expression level. In general, this process, called “signal integration”, takes place at every “node” of a genetic network.
A gene network is unlike an electrical network such as an integrated circuit, which processes information through synchronized cascades of a large number of simple nodes (millions to billions of transistors), and for which connectivity is the main source of network complexity. A gene network typically consists of only a few tens to hundreds of nodes, which are the regulatory genes in the genome. These nodes are slow and asynchronous, yet are sophisticated in their capacity to integrate signals: each node can be regulated combinatorially, often by 4-5 other nodes, and the regulatory effect of one node on another can be either activating or repressive depending on the context. Combinatorial control is an important feature of regulatory networks, allowing different combinations of TFs, taken from the same larger set, to act in concert at different genes. The same combination can even implement very different functions.
Out of several different known mechanisms for gene regulation, transcription control by regulated recruitment appears to be the most flexible and general mechanism (Ptashne and Gann 1997, 1998, 2002), and it naturally allows for signal integration and combinatorial control. Regulated recruitment refers to a situation where TFs regulate transcription simply by “recruiting”, i.e., attracting, RNA polymerase to the promoter sequences on the DNA. This mechanism involves only simple and generic, “glue-like” protein-protein interactions, in contrast to other known mechanisms of activation which require specific contacts to induce allosteric transitions in the conformation of the molecules. Due to the simplicity of the interactions involved in recruitment, this mechanism is particularly flexible with respect to the choice of interaction partners, which facilitates combinatorial control of gene expression.
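The effect of recruitment on promoter occupancy can be illustrated with a simple equilibrium calculation. The sketch below assumes a single activator site adjacent to the promoter and assigns each binding configuration a statistical weight. The function name, the dimensionless parameters `q_p` and `q_a` (concentration divided by dissociation constant), and the cooperativity factor `omega` are assumptions introduced for illustration; they are not values or formulas taken from the cited works.

```python
def promoter_occupancy(q_p: float, q_a: float = 0.0, omega: float = 1.0) -> float:
    """Equilibrium probability that RNA polymerase occupies the promoter.

    Assumed configurations and weights (hypothetical, for illustration):
      empty DNA                      1
      polymerase only                q_p
      activator only                 q_a
      activator and polymerase       omega * q_a * q_p
    omega > 1 models the "glue-like" contact by which the activator
    recruits polymerase; omega == 1 means no interaction.
    """
    if q_p < 0 or q_a < 0 or omega < 0:
        raise ValueError("weights must be non-negative")
    bound = q_p * (1.0 + omega * q_a)   # configurations with polymerase
    unbound = 1.0 + q_a                 # configurations without polymerase
    return bound / (bound + unbound)


if __name__ == "__main__":
    print(promoter_occupancy(0.1))               # weak promoter, no activator
    print(promoter_occupancy(0.1, 10.0, 20.0))   # activator recruits polymerase
```

With these illustrative numbers, a weak promoter (`q_p = 0.1`) is occupied about 9% of the time on its own and about 65% of the time when an activator with `q_a = 10` and `omega = 20` is present. Unlike a Boolean node, the output varies continuously with the activator concentration.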
A simple theoretical model for gene regulation in prokaryotes, introduced by Shea and Ackers (1985), is based on the mechanism of gene activation by recruitment and a competitive binding mechanism for repression. As the logical functions to be implemented become more complex, however, general difficulties arise in the design process. The existing art, known as synthetic genetic circuits, requires the expression of multiple genes and performs computation by “cascading” their results to build up increasingly complex logical functions. Accordingly, the need remains for a method for modeling gene regulation which does not involve the complexities of cascading.