The programmable chemistry of nucleic acid base pairing enables the rational design of self-assembling molecular structures, devices, and systems. However, specifying a target nucleic acid structure and then designing a nucleic acid sequence that will adopt this target structure remains challenging and computationally intensive.
Secondary Structure Model
For an RNA strand with N nucleotides, the sequence, φ, is specified by base identities φi ∈ {A, C, G, U} for i = 1, ..., N (T replaces U for DNA). The secondary structure, s, of one or more interacting RNA strands can be defined by a set of base pairs (each a Watson-Crick pair [A-U or C-G] or wobble pair [G-U]). Using the set of base pairs, a polymer graph for a secondary structure can then be constructed by ordering the strands around a circle, drawing the backbones in succession from 5′ to 3′ around the circumference with a nick between each strand, and drawing straight lines connecting paired bases.
A secondary structure is “pseudoknotted” if every strand ordering corresponds to a polymer graph with crossing lines. A secondary structure is connected if no subset of the strands is free of the others. An “ordered complex” corresponds to the unpseudoknotted structural ensemble, Γ, comprising all connected polymer graphs with no crossing lines for a particular ordering of a set of strands. For a secondary structure, s ∈ Γ, the free energy, ΔG(φ, s), is calculated using nearest-neighbor empirical parameters for RNA in 1M Na+ or for DNA in user-specified Na+ and Mg++ concentrations. This physical model provides the basis for rigorous analysis and design of equilibrium base-pairing in the context of the free energy landscape defined over ensemble Γ.
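The crossing-line criterion can be illustrated with a minimal Python sketch. It assumes bases are numbered consecutively from 5′ to 3′ along the concatenated strands for one fixed strand ordering, and the function name `has_crossing` is hypothetical. Note that it tests a single ordering only; by the definition above, a multi-strand structure is pseudoknotted only if every strand ordering produces crossing lines.

```python
from itertools import combinations

def has_crossing(pairs):
    """Return True if any two base pairs cross in the polymer graph.

    `pairs` is an iterable of (i, j) base indices, numbered 5' to 3'
    around the circle for one fixed strand ordering (an assumption of
    this sketch).
    """
    chords = [tuple(sorted(p)) for p in pairs]
    for (i, j), (k, l) in combinations(chords, 2):
        # Two chords of a circle cross exactly when their endpoints interleave.
        if i < k < j < l or k < i < l < j:
            return True
    return False

hairpin = {(0, 9), (1, 8), (2, 7)}  # nested pairs: no crossing lines
knot = {(0, 5), (3, 8)}             # interleaved pairs: crossing lines

print(has_crossing(hairpin), has_crossing(knot))
```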
Characterizing Equilibrium Secondary Structure
Using this secondary structure energy model, the equilibrium state of a nucleic acid complex can be characterized by calculating the partition function
$$Q(\phi) = \sum_{s \in \Gamma} e^{-\Delta G(\phi, s)/k_B T}$$
over the unpseudoknotted structure ensemble Γ. Using the partition function, it is possible to then evaluate the equilibrium probability
$$p(\phi, s) = \frac{1}{Q(\phi)} \, e^{-\Delta G(\phi, s)/k_B T}$$
of any secondary structure s ∈ Γ. Here, kB is the Boltzmann constant and T is temperature. The secondary structure with the highest probability at equilibrium is the minimum free energy (MFE) structure, satisfying
$$s_{\mathrm{MFE}}(\phi) = \arg\min_{s \in \Gamma} \Delta G(\phi, s).$$
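As a rough illustration of these three quantities, the following Python sketch evaluates Q, p, and the MFE structure by brute force over a tiny, explicitly enumerated ensemble. The structure labels and free energies are made up for illustration, and the function name `boltzmann_stats` is hypothetical. Energies are assumed to be in kcal/mol, so the Boltzmann constant is used in its molar form (the gas constant).

```python
import math

# Boltzmann constant in molar units (gas constant), kcal/(mol K).
KB = 0.0019872

def boltzmann_stats(energies, temperature=310.15):
    """Brute-force Q, p(s), and s_MFE for an enumerated ensemble.

    energies: dict mapping a structure label to its free energy (kcal/mol).
    temperature: kelvin (37 C by default, an assumption of this sketch).
    """
    kT = KB * temperature
    weights = {s: math.exp(-dg / kT) for s, dg in energies.items()}
    Q = sum(weights.values())                    # partition function
    probs = {s: w / Q for s, w in weights.items()}
    s_mfe = min(energies, key=energies.get)      # arg min of the free energy
    return Q, probs, s_mfe

# Made-up three-structure ensemble, for illustration only.
toy = {"open": 0.0, "hairpin": -2.0, "misfold": -1.5}
Q, probs, s_mfe = boltzmann_stats(toy)
print(s_mfe, probs[s_mfe])
```

Enumeration of this kind is feasible only for toy ensembles, since the size of Γ grows exponentially with N.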
The equilibrium structural features of ensemble Γ are quantified by the base-pairing probability matrix, P(φ), with entries Pi,j(φ) ∈ [0, 1] corresponding to the probability,
$$P_{i,j}(\phi) = \sum_{s \in \Gamma} p(\phi, s) \, S_{i,j}(s),$$
that base pair i·j forms at equilibrium. Here, S(s) defines a structure matrix with entries Si,j(s) ∈ {0, 1}. If structure s contains pair i·j, then Si,j(s) = 1, otherwise Si,j(s) = 0. For convenience, the structure and probability matrices are augmented with an extra column to describe unpaired bases. The entry Si,N+1(s) is unity if base i is unpaired in structure s and zero otherwise; the entry Pi,N+1(φ) ∈ [0, 1] denotes the equilibrium probability that base i is unpaired over ensemble Γ. Hence the row sums of the augmented S(s) and P(φ) matrices are unity.
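The augmented matrices can be sketched in Python with NumPy as follows. This is an illustrative brute-force construction, not the dynamic program used in practice. Indices are 0-based, so the extra "unpaired" column is column N, and the ensemble probabilities are invented for the example. The function names `structure_matrix` and `pair_probabilities` are hypothetical.

```python
import numpy as np

def structure_matrix(pairs, n):
    """Augmented N x (N+1) structure matrix S(s), 0-indexed.

    S[i, j] = 1 if bases i and j are paired; S[i, n] = 1 if base i is unpaired.
    """
    S = np.zeros((n, n + 1))
    S[:, n] = 1.0                    # start with every base unpaired
    for i, j in pairs:
        S[i, j] = S[j, i] = 1.0      # pairing is symmetric
        S[i, n] = S[j, n] = 0.0      # paired bases leave the extra column
    return S

def pair_probabilities(ensemble, n):
    """Probability-weighted sum of structure matrices.

    ensemble: list of (probability, set_of_pairs) covering all of Gamma.
    """
    return sum(p * structure_matrix(pairs, n) for p, pairs in ensemble)

# Made-up equilibrium probabilities for a 6-nucleotide strand.
ensemble = [(0.7, {(0, 5), (1, 4)}), (0.2, {(0, 5)}), (0.1, set())]
P = pair_probabilities(ensemble, 6)
print(P.sum(axis=1))  # each row of the augmented matrix sums to one
```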
The distance between two secondary structures, s1 and s2, is the number of nucleotides paired differently in the two structures:
$$d(s_1, s_2) = N - \sum_{\substack{1 \le i \le N \\ 1 \le j \le N+1}} S_{i,j}(s_1) \, S_{i,j}(s_2).$$
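Because each row of an augmented structure matrix contains exactly one nonzero entry, the double sum counts the bases whose pairing state is identical in both structures. A minimal Python sketch under that observation (0-based indices, with index N standing for "unpaired"; the helper names are hypothetical):

```python
def partners(pairs, n):
    """Column index of the single 1 in each row of the augmented S(s).

    Index n marks an unpaired base (0-based numbering).
    """
    partner = [n] * n
    for i, j in pairs:
        partner[i], partner[j] = j, i
    return partner

def distance(pairs1, pairs2, n):
    """Number of nucleotides paired differently in the two structures."""
    p1, p2 = partners(pairs1, n), partners(pairs2, n)
    # Sum over i, j of S_ij(s1) * S_ij(s2): rows whose nonzero columns agree.
    matches = sum(1 for a, b in zip(p1, p2) if a == b)
    return n - matches

s1 = {(0, 5), (1, 4)}
s2 = {(0, 5)}
print(distance(s1, s2, 6))  # bases 1 and 4 differ
```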
The discrete delta function is defined as
$$\delta_{s_1, s_2} = \begin{cases} 1, & \text{if } d(s_1, s_2) = 0 \\ 0, & \text{otherwise} \end{cases}$$
with respect to secondary structure.
Although the size of the ensemble, Γ, grows exponentially with the number of nucleotides N, the MFE structure, the partition function, and the equilibrium base-pairing probabilities can be evaluated using Θ(N³) dynamic programs.
For a given target structure, s, the determination of the nucleotide sequence that will produce the target structure s can be specified as an optimization problem, minimizing an objective function with respect to sequence, φ. Rather than seeking a global optimum, optimization can be terminated if the objective function is reduced below a prescribed stop condition.
One strategy to determine a sequence whose lowest free energy structure corresponds to a particular target structure s is to minimize the MFE defect
$$\mu(\phi, s) = d(s_{\mathrm{MFE}}(\phi), s) = N - \sum_{\substack{1 \le i \le N \\ 1 \le j \le N+1}} S_{i,j}(s_{\mathrm{MFE}}(\phi)) \, S_{i,j}(s),$$
corresponding to the distance between the MFE structure sMFE(φ) and the target structure s. This approach hinges on whether or not the equilibrium structural features of ensemble Γ are well-characterized by the single structure sMFE(φ), which in turn depends on the specific sequence φ. If μ(φ, s) = 0, the target structure s is the most probable secondary structure at equilibrium. However, p(φ, s) can nonetheless be arbitrarily small due to competition from other secondary structures in Γ.
To address this concern, an alternative strategy to MFE defect minimization is to minimize the probability defect:
$$\pi(\phi, s) = 1 - p(\phi, s),$$
corresponding to the sum of the probabilities of all non-target structures in the ensemble Γ. If π(φ, s) ≈ 0, the sequence design is essentially ideal because the equilibrium structural properties of the ensemble are dominated by the target structure s. However, as π(φ, s) deviates from zero, it increasingly fails to characterize the quality of the sequence because the probability defect treats all non-target structures as being equally defective. This property is a concern for challenging designs where it may be infeasible to achieve π(φ, s) ≈ 0.
To address these shortcomings, still another strategy is to minimize the ensemble defect between the target structure s and the equilibrium properties of sequence φ. The ensemble defect
$$n(\phi, s) = \sum_{\sigma \in \Gamma} p(\phi, \sigma) \, d(\sigma, s) = N - \sum_{\substack{1 \le i \le N \\ 1 \le j \le N+1}} P_{i,j}(\phi) \, S_{i,j}(s),$$
corresponds to the average number of incorrectly paired nucleotides at equilibrium calculated over ensemble Γ.
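The second form of the ensemble defect needs only the pair probability matrix and the target structure matrix, which can be sketched in Python with NumPy as below. Indices are 0-based (column N is the "unpaired" column), P is assembled here by brute force from made-up structure probabilities rather than by a dynamic program, and the function names are hypothetical.

```python
import numpy as np

def structure_matrix(pairs, n):
    """Augmented N x (N+1) structure matrix; column n marks unpaired bases."""
    S = np.zeros((n, n + 1))
    S[:, n] = 1.0
    for i, j in pairs:
        S[i, j] = S[j, i] = 1.0
        S[i, n] = S[j, n] = 0.0
    return S

def ensemble_defect(P, target_pairs):
    """n(phi, s) = N - sum_ij P_ij(phi) * S_ij(s)."""
    n = P.shape[0]
    return n - float(np.sum(P * structure_matrix(target_pairs, n)))

# Made-up equilibrium probabilities for a 6-nucleotide strand.
ensemble = [(0.7, {(0, 5), (1, 4)}), (0.2, {(0, 5)}), (0.1, set())]
P = sum(p * structure_matrix(pairs, 6) for p, pairs in ensemble)

target = {(0, 5), (1, 4)}
defect = ensemble_defect(P, target)
print(defect)
```

For this toy ensemble the first form gives the same value: the three structures lie at distances 0, 2, and 4 from the target, so the probability-weighted distance is 0.7·0 + 0.2·2 + 0.1·4 = 0.8.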
These three objective functions, the ensemble defect, the MFE defect, and the probability defect, can be cast into a unified form to highlight their differences:
$$n(\phi, s) = \sum_{\sigma \in \Gamma} p(\phi, \sigma) \, d(\sigma, s),$$
$$\mu(\phi, s) = \sum_{\sigma \in \Gamma} \delta_{\sigma, s_{\mathrm{MFE}}} \, d(\sigma, s),$$
$$\pi(\phi, s) = \sum_{\sigma \in \Gamma} p(\phi, \sigma) \, (1 - \delta_{\sigma, s}).$$
Using n(φ, s) to perform ensemble defect optimization, the average number of incorrectly paired nucleotides at equilibrium is evaluated over ensemble Γ using p(φ, σ), the Boltzmann-weighted probability of each secondary structure σ ∈ Γ, and d(σ, s), the distance between each secondary structure σ ∈ Γ and the target structure s. By comparison, using μ(φ, s) to perform MFE defect optimization, p(φ, σ) is replaced by the discrete delta function δσ,sMFE, which is unity for sMFE and zero for all other structures σ ∈ Γ. Alternatively, using π(φ, s) to perform probability defect optimization, d(σ, s) is replaced by the binary distance function (1−δσ,s), which is zero for s and unity for all other structures σ ∈ Γ.
Hence, the MFE defect makes the optimistic assumption that sMFE will dominate Γ at equilibrium, while the probability defect makes the pessimistic assumption that all structures σ ∈ Γ with d(σ, s) ≠ 0 are equally distant from the target structure s. The objective function n(φ, s) quantifies the equilibrium structural defects of sequence φ even when μ(φ, s) and π(φ, s) do not.
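The contrast between the three objective functions can be seen on a small, explicitly enumerated ensemble. The Python sketch below is illustrative only: the four structures and their free energies are invented, energies are assumed to be in kcal/mol at 37 °C, indices are 0-based, and the function names are hypothetical.

```python
import math

KT = 0.0019872 * 310.15  # kcal/mol at 37 C (assumed for illustration)

def partners(pairs, n):
    """Partner index of each base, or None if the base is unpaired."""
    partner = [None] * n
    for i, j in pairs:
        partner[i], partner[j] = j, i
    return partner

def distance(a, b, n):
    """Number of nucleotides paired differently in structures a and b."""
    return sum(x != y for x, y in zip(partners(a, n), partners(b, n)))

def defects(ensemble, target, n):
    """Return (ensemble defect, MFE defect, probability defect).

    ensemble: dict mapping frozenset of pairs -> free energy (kcal/mol).
    """
    weights = {s: math.exp(-dg / KT) for s, dg in ensemble.items()}
    Q = sum(weights.values())
    prob = {s: w / Q for s, w in weights.items()}
    s_mfe = min(ensemble, key=ensemble.get)
    n_def = sum(prob[s] * distance(s, target, n) for s in ensemble)
    mu = distance(s_mfe, target, n)
    pi = sum(prob[s] for s in ensemble if distance(s, target, n) != 0)
    return n_def, mu, pi

N = 6
target = frozenset({(0, 5), (1, 4)})
toy = {                      # made-up free energies
    target: -1.0,
    frozenset({(0, 5)}): -0.8,
    frozenset({(1, 4)}): -0.8,
    frozenset(): 0.0,
}
n_def, mu, pi = defects(toy, target, N)
print(n_def, mu, pi)
```

With these invented energies the target is the MFE structure, so the MFE defect is zero even though the target holds less than half of the equilibrium probability. The probability defect registers the competition but weights every competitor equally, while the ensemble defect weights each competitor by its distance from the target.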