The gap between the rate at which new protein sequences become available and the much slower rate at which structures are determined experimentally provides fertile ground for the development of homology-based modelling techniques. Homology modelling algorithms essentially aim to position new side chains on a backbone taken from a homologous protein of known three-dimensional (3D) structure. When a correct or approximate main-chain structure is not available, for instance in the loop regions of two homologous proteins, it is sometimes possible to generate a set of main-chain structures, position the side chains on each, and use a scoring function to select the most probable global structure. This general approach is commonly applied in fields such as structure-based homology modelling, the prediction of loop conformations, peptide modelling and protein folding.
From a theoretical point of view, the problem of finding the optimal global arrangement of a set of side chains attached to a particular main-chain structure is a typical combinatorial problem. Not only do side chains interact with the backbone, but their conformation can also be influenced by neighbouring side chains. Yet, F. Eisenmenger et al. in J. Mol. Biol. (1993) 231:849-860 found that the majority of side chains can be correctly positioned by taking into account only their interactions with the template: applying this simple template-based method to a test set of 6 proteins, they found that, on average, 53% of all side chains had dihedral angles in agreement with the known structure and that for buried side groups (i.e. having less than 25% exposed accessible surface) this score increased to 74%. When each side chain was modelled in the presence of the complete known structure, the average prediction score increased only to 65% for all side chains and to 84% for the buried side groups. From these observations, the authors concluded that the combinatorial barrier in side-chain positioning hardly exists.
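The contrast between the combinatorial search space and a template-only placement can be illustrated with a minimal sketch. It is not the published Eisenmenger procedure: the residue labels, rotamer counts and energy values below are hypothetical, and the function name `place_side_chains_template_only` is introduced here purely for illustration. Each residue independently takes the rotamer with the lowest side-chain/template energy, so the cost grows with the sum of the rotamer counts, whereas the number of global arrangements grows with their product.

```python
from typing import Dict, List


def place_side_chains_template_only(
    self_energy: Dict[str, List[float]]
) -> Dict[str, int]:
    """Pick, for every residue, the rotamer index with the lowest
    side-chain/template energy. Side-chain/side-chain terms are ignored."""
    return {
        residue: min(range(len(energies)), key=energies.__getitem__)
        for residue, energies in self_energy.items()
    }


# Hypothetical side-chain/template energies (arbitrary units),
# one list entry per candidate rotamer of the residue.
toy_self_energy = {
    "LEU12": [1.8, -0.4, 0.9],
    "PHE45": [-2.1, 3.0],
    "SER70": [0.2, 0.1, -0.3, 0.5],
}

choice = place_side_chains_template_only(toy_self_energy)

# Size of the full combinatorial space: product of the rotamer counts.
search_space = 1
for energies in toy_self_energy.values():
    search_space *= len(energies)

# Work done by the template-only method: sum of the rotamer counts.
evaluations = sum(len(energies) for energies in toy_self_energy.values())
```

With these toy numbers the template-only method inspects 9 rotamers, while an exhaustive search would have to consider 24 global arrangements. The gap widens rapidly as residues are added.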
However, several authors have tackled the side-chain positioning problem by means of a combinatorial approach or an equivalent method. For instance, the Dead-End Elimination (DEE) method takes into account both side-chain/template and side-chain/side-chain interactions and uses a mathematically rigorous criterion to eliminate side-chain rotamers which cannot belong to the Global Minimum Energy Conformation (GMEC). Since the elimination routines usually do not yield a unique structure, a combinatorial end-stage routine is needed to determine the GMEC.
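The two stages described above can be sketched as follows. The elimination test used here is the single-rotamer criterion as it is commonly stated in the literature: rotamer r of residue i is dead-ending if some other rotamer t of the same residue satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). The energy tables, the function names and the brute-force end stage are assumptions made for illustration, not the implementation of any particular program.

```python
from itertools import product

# Hypothetical energies. SELF_E[i][r]: side-chain/template energy of rotamer r
# of residue i. PAIR_E[(i, j)][r][s]: pairwise energy between rotamer r of
# residue i and rotamer s of residue j, stored for i < j only.
SELF_E = [[0.0, 5.0], [0.0, 1.0, 6.0], [0.5, 0.0]]
PAIR_E = {
    (0, 1): [[0.0, -1.0, 0.5], [0.2, 0.0, 0.3]],
    (0, 2): [[0.0, 1.0], [0.5, 0.0]],
    (1, 2): [[0.0, -0.8], [2.0, 0.0], [0.0, 0.0]],
}


def pair(pair_e, i, r, j, s):
    """Symmetric lookup of the pairwise energy between i_r and j_s."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(assign, self_e, pair_e):
    """Global energy of one complete rotamer assignment."""
    n = len(assign)
    energy = sum(self_e[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assign[i]][assign[j]]
    return energy


def dee_prune(self_e, pair_e):
    """Iterate the elimination test until no further rotamer can be removed."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in list(alive[i]):
                # Best case for r: most favourable surviving partner everywhere.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j]) for j in others
                )
                for t in alive[i] - {r}:
                    # Worst case for t: least favourable surviving partner.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j]) for j in others
                    )
                    if best_r > worst_t:  # r can never be part of the GMEC
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def gmec(self_e, pair_e):
    """Elimination followed by a brute-force end stage over the survivors."""
    alive = dee_prune(self_e, pair_e)
    candidates = product(*(sorted(a) for a in alive))
    best = min(candidates, key=lambda a: total_energy(a, self_e, pair_e))
    return best, alive


best, alive = gmec(SELF_E, PAIR_E)
```

In this toy system the elimination stage leaves more than one rotamer for two of the three residues, so the end stage still has to enumerate the remaining combinations. That enumeration is the part whose cost grows with system size.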
Other published methods, such as Monte Carlo simulation, genetic algorithms, simulated annealing, mean-field optimisation, restricted combinatorial analysis and neural networks, also account, to varying extents, for side-chain/side-chain combinatorial effects. To date, the question of whether the combinatorial barrier in side-chain positioning actually exists remains unanswered, because the various authors have used different methods, protein test sets and evaluation criteria, and have interpreted their results from different viewpoints.
The present inventors have applied both the Eisenmenger method and the DEE method to a statistically significant test set and have evaluated the results on the basis of all scoring criteria used by those skilled in the art. They have thereby demonstrated that a true combinatorial approach leading to the GMEC, as opposed to the rudimentary Eisenmenger method, yields much better results in terms of potential energy, but also that the improvements are much less impressive in terms of the number of correctly predicted side-chain conformations. In other words, a mathematically rigorous combinatorial method such as the DEE algorithm can usually avoid inter-atomic clashes and thus reach favourable global energies, but this is usually accompanied by a gain of only about 10% in correctly positioned side chains.
From a practical point of view, the more important question is whether the improved accuracy obtained by more sophisticated methods justifies the extra computational effort. This cannot be answered in general, as it depends on the needs of the user. Yet an improvement in prediction accuracy at a low computational cost is a long-felt need, especially when the side-chain positioning algorithm is to be included as a sub-method in a larger program, e.g. for loop structure prediction, inverse folding or high-throughput homology modelling. There is also a need for a substantial gain in computational speed relative to the DEE method, if possible without a reduction in accuracy, since such a reduction could be problematic for some of the above-mentioned applications.
While the DEE method is relatively fast for small sets of side chains (<30) and thus useful in applications like flexible docking of peptides or inverse folding, its performance rapidly drops for larger systems.
This, in combination with an urgent need to accelerate protein side-chain computations in different applications, illustrates the need for an alternative method which preserves the accuracy of the DEE method, especially in terms of energy, while reducing the computational requirements relative to the DEE method.