The analysis of surface properties can be an important approach to understanding interactions between objects. The interaction between a solid object and external and/or internal forces may be greatly affected by surface properties. For example, failure of a material often begins on the surface, especially at edges or vertices. Interactions on the surfaces of bodily organs may provide insights into the functioning of these organs, potentially yielding important diagnostic information. Surface properties of various isosurfaces of fields, such as magnetic potential isosurfaces in a magnetic containment system, may provide important details as well.
One of the most exciting areas in which the analysis of surface properties may find application is computational chemistry. Analysis of the surface properties of molecules may allow chemists to reduce the complexity of some problems involving characterization of molecules and interactions between molecules.
One approach to the process of calculating molecular surface properties and shapes starts with a mathematical description of a molecule's shape and surface from an electron-density-derived/transferable atom equivalent data file or other property-encoded surface file. Transferable Atom Equivalents (TAEs) are a library of atomic charge density distributions and their properties that can be combined using the RECON program to provide for rapid retrieval of atomic charge density fragments and molecular assembly. Each atomic charge density fragment in the library is associated with a surface file and a data file. The surface file is a numerical representation of the 3-dimensional shape of the atomic charge density, and includes a set of electronic surface properties.
Transferable Atom Equivalent (TAE) descriptors encode the distributions of electron-density-based molecular properties, such as electronic kinetic energy densities, local average ionization potentials, electrostatic potentials, Fukui functions, electron density gradients and second derivatives, in addition to the density itself. Table 1 shows a complete list of TAE descriptors.
TABLE 1. Electron-density-derived TAE descriptors. ρ(r) represents the electron density distribution.

Integrated electronic properties: Energy; Electron population; Volume; Surface.

Surface electronic properties (extrema, surface integral averages and histogram bins are available for each property):

| Descriptor | Description | Definition |
|---|---|---|
| SIEP | Surface integral of electrostatic potential | |
| EP | Electrostatic potential | $\mathrm{EP}(r) = \sum_{\alpha} \frac{Z_{\alpha}}{\lvert r - R_{\alpha} \rvert} - \int \frac{\rho(r_2)\, dr_2}{\lvert r - r_2 \rvert}$ |
| DRN | Electron density gradient normal to 0.002 e/au³ electron density isosurface | $\nabla\rho \cdot n$ |
| G | Electronic kinetic energy density | $G(r) = -\tfrac{1}{2}\left(\nabla\psi^{*} \cdot \nabla\psi\right)$ |
| K | Electronic kinetic energy density | $K(r) = -\tfrac{1}{2}\left(\psi^{*}\nabla^{2}\psi + \psi\nabla^{2}\psi^{*}\right)$ |
| DGM | Gradient of the K electronic kinetic energy density normal to surface | $\nabla K \cdot n$ |
| DGN | Gradient of the G electronic kinetic energy density normal to surface | $\nabla G \cdot n$ |
| F | Fukui F+ function scalar value | $F^{+}(r) = \rho_{\mathrm{HOMO}}(r)$ |
| L | Laplacian of the electron density | $L(r) = -\nabla^{2}\rho(r) = K(r) - G(r)$ |
| BNP | Bare nuclear potential | $\mathrm{BNP}(r) = \sum_{\alpha} \frac{Z_{\alpha}}{\lvert r - R_{\alpha} \rvert}$ |
| PIP | Local average ionization potential | $\mathrm{PIP}(r) = \sum_{i} \frac{\rho_{i}(r)\, \lvert \varepsilon_{i} \rvert}{\rho(r)}$ |
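As an illustration of the summary statistics that Table 1 says are available for each surface property (extrema, surface integral averages and histogram bins), the following is a minimal sketch. It assumes the surface has already been discretized into elements with known areas; the function name `surface_descriptors`, its parameters, and the sample values are hypothetical and are not taken from RECON.

```python
import numpy as np

def surface_descriptors(values, areas, n_bins=10, value_range=None):
    """Summarize one surface property sampled on a discretized surface.

    values : property value at each surface element (for example EP)
    areas  : area of each surface element
    """
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    total_area = areas.sum()
    # Surface integral approximated as a sum over surface elements.
    integral = float(np.sum(values * areas))
    # Area-weighted histogram: how much surface falls in each property bin.
    hist, edges = np.histogram(values, bins=n_bins, range=value_range, weights=areas)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "integral": integral,
        "average": integral / total_area,
        "hist": hist / total_area,  # fraction of the surface area per bin
        "edges": edges,
    }

# Hypothetical data: four surface elements of equal area.
demo = surface_descriptors([-1.0, 0.0, 1.0, 3.0], [1.0, 1.0, 1.0, 1.0], n_bins=2)
print(demo["min"], demo["max"], demo["average"], demo["hist"])
```

Weighting the histogram by element area, rather than counting points, keeps the bins independent of how finely the surface happens to be sampled; whether RECON normalizes its bins this way is an assumption of this sketch.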
The TAE data files contain information describing topological features of the atomic charge density, and are used to orient the fragments into their proper molecular space orientations. The data files also contain atomic charge density-based descriptors which encode electronic and structural information relevant to the chemistry of intermolecular interactions, such as the van der Waals surface.
In practice, when calculating TAE data files, the external atomic surfaces are truncated where the electron density reaches 0.002 electrons per cubic Bohr. This serves to keep the atoms finite and roughly corresponds to the condensed phase van der Waals surface of the atoms in a molecular environment.
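To give a feel for the size of a surface defined by the 0.002 electrons per cubic Bohr cutoff, the sketch below solves for the cutoff radius of an isolated ground-state hydrogen atom, whose density has the closed form ρ(r) = exp(−2r)/π in atomic units. The isolated atom is used only as an illustration and is an assumption of this example; TAE fragments are atoms in molecular environments, and the function names are hypothetical.

```python
import math

BOHR_TO_ANGSTROM = 0.529177  # length of one bohr in angstroms

def hydrogen_1s_density(r):
    """Electron density of a ground-state hydrogen atom in e/bohr^3 (r in bohr)."""
    return math.exp(-2.0 * r) / math.pi

def hydrogen_cutoff_radius(rho_cut=0.002):
    """Radius at which the 1s density falls to rho_cut.

    Solving exp(-2 r) / pi = rho_cut gives r = -0.5 * ln(pi * rho_cut).
    """
    return -0.5 * math.log(math.pi * rho_cut)

r_cut = hydrogen_cutoff_radius()
print(f"cutoff radius: {r_cut:.3f} bohr = {r_cut * BOHR_TO_ANGSTROM:.3f} angstrom")
```

For this model atom the cutoff falls at roughly 2.5 bohr, which is of the same order as tabulated van der Waals radii for hydrogen.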
These TAE descriptors generally yield high-quality models. TAE descriptors, however, are non-orthogonal; traditional regression analysis (such as multiple regression analysis) is therefore not appropriate, as the system can become over-determined. Modeling techniques such as principal component analysis, artificial neural networks (M. J. Embrechts, et al., COMPUTATIONALLY INTELLIGENT DATA MINING FOR AUTOMATED DESIGN AND DISCOVERY OF NOVEL PHARMACEUTICAL IN SMART ENGINEERING SYSTEMS: NEURAL NETWORKS, FUZZY LOGIC, EVOLUTIONARY PROGRAMMING, DATA MINING AND ROUGH SETS, ASME Press (1998); Kewley, et al., NEURAL NETWORK ANALYSIS FOR DATA STRIP MINING PROBLEMS, in INTELLIGENT ENGINEERING SYSTEMS THROUGH ARTIFICIAL NEURAL NETWORKS, ASME Press, pp. 391-396 (1998)), kernel partial least squares regression, or Support Vector Machine (SVM) regression can be fruitfully employed on such data, with feature selection accomplished using genetic algorithms or sensitivity analysis, as described by M. J. Embrechts et al. in BAGGING NEURAL NETWORK SENSITIVITY ANALYSIS FOR FEATURE REDUCTION IN QSAR PROBLEMS (2001 INNS-IEEE International Joint Conference on Neural Networks (2001)). Some of these routines are incorporated in the StripMiner™ software package.
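One way to see why correlated (non-orthogonal) descriptors call for a technique such as principal component analysis is the small sketch below. It builds a synthetic descriptor matrix in which one column is almost a linear combination of the others, then projects it onto orthogonal principal components using a singular value decomposition. The data are randomly generated stand-ins, not TAE descriptors, and `pca_scores` is a hypothetical helper, not part of StripMiner.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centered descriptors onto their leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
# Synthetic descriptors: the third column nearly equals the sum of the first two.
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1] + 1e-6 * rng.normal(size=50),
])

scores, explained = pca_scores(X, n_components=2)
gram = scores.T @ scores  # off-diagonal entries vanish for orthogonal scores

print("condition number of centered descriptors:", np.linalg.cond(X - X.mean(axis=0)))
print("variance explained by two components:", explained.sum())
print("score cross term:", gram[0, 1])
```

The score columns are mutually orthogonal by construction, so a regression on them avoids the near-singular normal equations that the raw, collinear columns would produce.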
RECON is another algorithm for the rapid reconstruction of molecular charge densities and charge-density-based electronic properties of molecules that can be used in place of or in conjunction with TAE descriptors derived from ab initio or semi-empirical wave functions. RECON uses a library of atomic charge density fragments and is based on the quantum theory of atoms in molecules. This algorithm was developed at Rensselaer Polytechnic Institute.
In recent years, wavelet encoding has gained popularity in diverse applications as an efficient means of data compression and pattern recognition. The wavelet basis has advantages over the Fourier basis in that, while the trigonometric functions used in Fourier expansion are monochromatic in frequency but entirely delocalized in position, the wavelet basis is well localized in both frequency and position. Wavelet encoding and decoding are accomplished by a simple scaling and dilation algorithm.
The Discrete Wavelet Transform (DWT) is a fast linear operation on a data vector of length 2^n (where n is an integer) that transforms the original data vector into a wavelet coefficient vector of the same length. The resulting vector consists of 2^(n−1) scaling coefficients and 2^(n−1) detail coefficients. The former represent a smoothed envelope of the data, while the latter give the detailed deviations from this smoothed function. The scaling coefficient vector can, in turn, be subjected to another round of DWT, resulting in 2^(n−2) scaling coefficients and 2^(n−2) detail coefficients, the latter encoding detail at the next coarser scale. For a data vector of length 2^n, the DWT can be repeated n−1 further times after the first round, resulting in a single scaling coefficient and 2^n − 1 detail coefficients. This entire procedure can be reversed in the same iterative manner to decode the wavelet coefficient vector, reconstructing the original signal. Since molecular surface property distributions are smoothly varying functions in property space, it is reasonable to expect that the important physicochemical information relevant to intermolecular interactions will be contained in the scaling and first few levels of detail coefficients, rather than in the finer levels of detail. Discarding the finer levels of detail coefficients therefore results in significant data compression with little loss of signal. In RECON, each of the ten surface electronic properties in Table 1 is represented by a 1024-point distribution and encoded in the symmlet-8 wavelet basis, retaining only 32 wavelet coefficients. Property distributions reconstructed from these 32 wavelet coefficients reproduce the original distributions to greater than 95% accuracy.
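The iterative decomposition, truncation, and reconstruction described above can be sketched as follows. To keep the example self-contained, it substitutes the simple Haar wavelet for the symmlet-8 basis used in RECON, so the accuracy it reports is not the figure quoted in the text. The function names and the Gaussian test distribution are assumptions made for illustration only.

```python
import numpy as np

def haar_step(x):
    """One DWT round: split x into scaling (smooth) and detail halves."""
    pairs = x.reshape(-1, 2)
    scaling = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return scaling, detail

def haar_dwt(x):
    """Full decomposition of a length-2**n vector.

    Returns one scaling coefficient followed by 2**n - 1 detail
    coefficients, ordered from the coarsest level to the finest.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 1 or x.size & (x.size - 1):
        raise ValueError("length must be a power of two")
    details = []
    while x.size > 1:
        x, d = haar_step(x)
        details.append(d)
    return np.concatenate([x] + details[::-1])

def haar_idwt(coeffs):
    """Invert haar_dwt by undoing the rounds from coarsest to finest."""
    coeffs = np.asarray(coeffs, dtype=float)
    x = coeffs[:1]
    pos = 1
    while pos < coeffs.size:
        d = coeffs[pos:pos + x.size]
        pos += x.size
        out = np.empty(2 * x.size)
        out[0::2] = (x + d) / np.sqrt(2.0)
        out[1::2] = (x - d) / np.sqrt(2.0)
        x = out
    return x

def compress(x, keep=32):
    """Keep the first `keep` (coarsest) coefficients and reconstruct."""
    c = haar_dwt(x)
    c[keep:] = 0.0
    return haar_idwt(c)

# Hypothetical smooth 1024-point property distribution.
t = np.linspace(0.0, 1.0, 1024)
signal = np.exp(-((t - 0.4) / 0.15) ** 2)
approx = compress(signal, keep=32)
rel_error = np.linalg.norm(signal - approx) / np.linalg.norm(signal)
print(f"relative L2 error with 32 of 1024 coefficients: {rel_error:.3f}")
```

With the coefficients ordered coarse to fine, keeping the first 32 retains the scaling coefficient and the five coarsest detail levels (1 + 1 + 2 + 4 + 8 + 16). A smoother basis such as symmlet-8 would be expected to approximate smooth distributions more closely than the piecewise-constant Haar reconstruction shown here.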
TAE wavelet coefficient descriptors (WCDs) generated from ab initio quantum computations have been employed with success, in conjunction with other TAE and traditional descriptors, to model a variety of chemical and biochemical phenomena. Since ab initio quantum chemical descriptors are laborious to compute and impracticable to implement in high-throughput mode, it is of considerable value to obtain these WCDs through the RECON method. Just as for other TAE descriptors, the wavelet coefficients of atomic property distributions can be simply summed (weighted by the atomic surface area) to give molecular wavelet representations, from which approximate distributions in property space can be reconstructed, if desired. This has been implemented in Beta version 6.3 of RECON, and the atomic wavelet library is presently being constructed.
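The area-weighted summation of atomic wavelet coefficients can be sketched as below. Because the wavelet transform is linear, a weighted sum of atomic coefficient vectors equals the coefficient vector of the same weighted sum of atomic distributions. The function name, the array layout, and the numbers are hypothetical; in particular, whether RECON normalizes the result by the total surface area is not stated in the text and is not assumed here.

```python
import numpy as np

def molecular_wcd(atomic_wcds, atomic_areas):
    """Area-weighted sum of per-atom wavelet coefficient vectors.

    atomic_wcds  : array of shape (n_atoms, n_coeffs)
    atomic_areas : array of shape (n_atoms,), one surface area per atom
    """
    W = np.asarray(atomic_wcds, dtype=float)
    a = np.asarray(atomic_areas, dtype=float)
    if W.ndim != 2 or a.ndim != 1 or W.shape[0] != a.shape[0]:
        raise ValueError("expected one coefficient row and one area per atom")
    # Each atom's coefficients are scaled by its surface area, then summed.
    return a @ W

# Hypothetical two-atom example with four coefficients per atom.
wcds = [[1.0, 0.5, 0.0, -0.5],
        [2.0, -0.5, 1.0, 0.5]]
areas = [10.0, 30.0]
print(molecular_wcd(wcds, areas))
```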