1. Field of the Invention
The present invention relates generally to alignment and classification procedures and more specifically to alignment and classification procedures without a dependence on reference elements.
2. Background Information
The 3D (3 dimensional) structures of macromolecules can be investigated by single particle analysis, a powerful technique of electron microscopy that provides more penetrating structural analyses of macromolecules whose x-ray crystallography is problematical. Resolution and analyses of molecular interactions and conformational changes, using single particle analysis, have been advancing in pace with the image processing methods employed (Ueno and Sato, 2001). Such methods are needed because imaged macromolecules are obscured by noise backgrounds (with S/N ratios [SNR] often below unity). Noise has its origin partly in the low irradiation levels used to minimize specimen damage. To attenuate noise, one must average the 2D images with the same 3D orientations, a process requiring optimal alignments among the gallery of data set images. Signals then reinforce one another, and noise tends to average out.
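The noise attenuation described above can be illustrated with a minimal sketch. It assumes perfectly aligned images stored as flat lists of pixel values and additive Gaussian noise; the function names (`make_noisy_copies`, `average_images`, `residual_noise`) and all numerical values are hypothetical and are not taken from the procedure described here.

```python
import random
import statistics


def make_noisy_copies(signal, n_copies, noise_sigma, seed=0):
    """Return n_copies of `signal`, each with independent Gaussian noise added.

    Assumption: the noise is additive, zero-mean and uncorrelated between images.
    """
    rng = random.Random(seed)
    return [[s + rng.gauss(0.0, noise_sigma) for s in signal]
            for _ in range(n_copies)]


def average_images(images):
    """Pixel-wise mean of equally sized, already aligned images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def residual_noise(signal, image):
    """Standard deviation of the difference between an image and the true signal."""
    return statistics.pstdev([p - s for p, s in zip(image, signal)])


if __name__ == "__main__":
    signal = [0.0, 1.0] * 200                      # toy 400-pixel "particle"
    copies = make_noisy_copies(signal, n_copies=100, noise_sigma=2.0)
    single = residual_noise(signal, copies[0])     # SNR below unity
    averaged = residual_noise(signal, average_images(copies))
    print(f"noise in one image: {single:.2f}, after averaging: {averaged:.2f}")
```

Under these assumptions the residual noise falls roughly as the square root of the number of images averaged, which is why misaligned images, whose signals do not reinforce one another, degrade the result.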
As with other techniques of 3D reconstruction, 2D image alignment is critical when applying the RCT (random conical tilt) method to fixed particles, imaged either “face up” or “face down” in a membrane bilayer or film. To recover 3D information coherently, one must know the rotational orientation of each data set image in the plane of the bilayer. This is normally accomplished by bringing untilted images (those viewed “head on” at 0°) into a common rotational alignment. Once relative rotational orientations (the amounts of rotation of given images required to bring them into common alignment) are known for untilted images, they also become known for tilted images, because of pairwise imaging. Single particle analysis should, in principle, achieve atomic resolution. In practice, however, various circumstances prevent this.
Normally, one assumes the existence of a prototype reference image, against which the data set images can be aligned. However, this assumption is unjustified for inhomogeneous data sets. Another problem is that alignments are biased by the choice of reference images. One method to reduce this bias is to average the images aligned to a particular reference, to yield a revised reference (Penczek et al., 1992). However, when the images have a poor SNR, or represent different views of the same macromolecule, this procedure yields a final averaged reference that shares features of the original reference. “Iterative reference free alignment” selects and aligns two images at random. The process is then repeated with a third selected image, etc., until the data set images have been exhausted (Penczek et al., 1992). However, because the order of selection biases results, the process is repeated with random orders of selection, thereby reducing the bias. Because this method uses a changing global average for reference, it is not strictly reference free (van Heel et al., 2000).
The “state of the art” technique for generating and aligning reference images representing different views of a data set is designated “Multivariate Statistical Analysis/Multi Reference Alignment” (MSA/MRA; see Zampighi et al., 2004). Some variations between data set images may not reflect structural differences, but merely positional differences (e.g., in plane rotational orientation, in the case of RCT). For that reason, it is undesirable, when classifying data sets, to consider in plane rotational differences to be valid. Consequently, before classification, images must be brought into mutual rotational alignment, thereby eliminating particle orientation as an independent classification variable.
However, alignments using correlation based techniques are only well defined operations when galleries of images are homogeneous. But to produce representative classes, data set images must first be aligned, which requires an initial set of representative classes. To cope with this “circularity,” workers have resorted to iterative cycles of classification and alignment using MSA/MRA, until results stabilize. However, this procedure does not guarantee attainment of the global minimum of the “energy” function. In addition to these shortcomings, MSA/MRA hinges on subjective operator choices of many critical free variables that impact the final result. Consequently, such results typically are operator dependent. Finally, MSA/MRA often consumes months of processing (Bonetta, 2005).
In one embodiment, the technique developed here aligns, or orients, data set images produced by RCT by directly classifying their relative in plane rotational orientations, without rotating images into “alignment” with references, as is common to the above described techniques. Instead of starting with selected reference image(s), one typically starts with over 8 million random pixel values. Coupling this procedure with a sufficiently small influence of the data during each training cycle eliminates training bias, because the alignment results become independent of the order of the images used to train the network.
This alignment procedure bypasses the need for alternate cycles of alignment and classification in order to achieve alignment. It simultaneously classifies both structural differences and rotational orientations of images during each training cycle. It is a self organizing process, performed on the surface of a cylindrical array of artificial neurons. After reference free alignment, a more detailed classification according to structural differences may be required. Zampighi et al. (2004) found that the vertices of square planar SOMs (self organizing maps) map differing views of RCT imaged particles embedded in a membrane bilayer. But the homogeneity of vertex partitions generated by a 2-CUBE SOM is imperfect when the SNRs are low, or the data sets are heterogeneous. Also, the number of vertex classes is limited to four. To obviate the heterogeneity arising from 2D SOMs, and lessen restrictions on the number of vertex classes, we developed a method called “N Dimensional vertex partitioning” with which a data set's principal classes “migrate” to the vertices of ND (N dimensional) hypercubical arrays. Because an ND hypercube has 2^N vertices, it partitions 2^N vertex classes.
We found that, as the dimension is stepped up (at least to a value of N=4), the average homogeneity of the vertex classes improves. This likely eventuates from the high degree of data compression that occurs in mapping high D data onto low D grids. The higher the grid dimension, the less the data compression, and the greater the resulting homogeneity. If the operator desires two vertex classes of the data set, a value of N=1 is selected; for four vertex classes, a value of N=2; for eight vertex classes, a value of N=3; etc. This allows one to control the number and homogeneity of the generated vertex classes.
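The relation between the grid dimension N and the number of vertex classes can be made concrete with a short sketch. It assumes a hypercubical grid with the same number of nodes along every edge; the function name `hypercube_vertices` and the parameter `side` are hypothetical and serve only as illustration.

```python
from itertools import product


def hypercube_vertices(n_dims, side):
    """Grid coordinates of the corner nodes of an n_dims-dimensional array.

    Assumption: the array has `side` nodes along every edge, so each
    coordinate of a corner is either 0 or side - 1.
    """
    if n_dims < 1 or side < 2:
        raise ValueError("need n_dims >= 1 and side >= 2")
    return list(product((0, side - 1), repeat=n_dims))


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        corners = hypercube_vertices(n, side=5)
        # The count doubles with each added dimension: 2, 4, 8, 16.
        print(f"N={n}: {len(corners)} vertex classes")
```

Each returned coordinate tuple identifies one corner node, so the length of the list is the number of vertex classes available for a given N.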
In one embodiment, we call the combination of these reference free methods, SORFA, for “Self Organizing, Reference Free Alignment.” SORFA benefits from the intrinsically parallel architecture and topology/metric conserving characteristics of self-organizing maps. Here we demonstrate SORFA using the Kohonen self organizing artificial neural network, an unsupervised method of competitive learning that employs no target values decided upon in advance (Deco and Obradovic, 1996). The inspiration and methodology for this type of network are based on the circumstance that most neural networks comprising the mammalian brain exist in 2D arrays of signal processing units. The network's response to an input signal is focused mainly near the maximally excited neuron.
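A single training cycle of a Kohonen map on a cylindrical grid might be sketched as follows. This is an illustration under stated assumptions, not the implementation used here: the Gaussian neighborhood function, the squared Euclidean matching criterion, the `radius` parameter, and all function names are hypothetical choices, and the grid is far smaller than one initialized with millions of random pixel values.

```python
import math
import random


def init_weights(height, circumference, n_pixels, seed=0):
    """One random weight vector per node; nodes are keyed by (row, column)."""
    rng = random.Random(seed)
    return {(r, c): [rng.random() for _ in range(n_pixels)]
            for r in range(height) for c in range(circumference)}


def cylinder_distance(a, b, circumference):
    """Grid distance between two nodes; columns wrap around the cylinder."""
    d_row = a[0] - b[0]
    d_col = abs(a[1] - b[1])
    d_col = min(d_col, circumference - d_col)   # shorter way around
    return math.hypot(d_row, d_col)


def best_matching_unit(weights, image):
    """Node whose weight vector is closest to the image (the 'winner')."""
    return min(weights,
               key=lambda node: sum((w - x) ** 2
                                    for w, x in zip(weights[node], image)))


def train_step(weights, image, circumference, learning_rate, radius):
    """Pull the winner and its grid neighbors toward the image, in place."""
    winner = best_matching_unit(weights, image)
    for node, w in weights.items():
        d = cylinder_distance(node, winner, circumference)
        influence = math.exp(-d * d / (2.0 * radius * radius))  # assumed Gaussian
        for i, x in enumerate(image):
            w[i] += learning_rate * influence * (x - w[i])
    return winner


if __name__ == "__main__":
    weights = init_weights(height=3, circumference=10, n_pixels=4)
    winner = train_step(weights, [1.0, 0.0, 1.0, 0.0],
                        circumference=10, learning_rate=0.05, radius=2.0)
    print("winning node:", winner)
```

Because the update is strongest at the winning node and decays with grid distance, the response to each input is concentrated near the maximally excited neuron, and a small learning rate keeps the influence of any single image small.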
In one embodiment, SORFA uses this methodology to classify images according to azimuthal orientations and structural differences. It benefits from the SOMs' insensitivity to random initial conditions. Further, for alignment there are only two critical free parameters: learning rate, and circumference/height ratio of the cylinder. SORFA's speed benefits from its intrinsically parallel architecture. A data set of 3,361 noisy, heterogeneous electron microscopical (EM) images of Aquaporin 0 (AQP0), for example, was better aligned in less than 9 min than by an MSA/MRA run requiring six months. SORFA was far simpler to use, and involved far fewer chances for operator error, as was suspected in the above six month MSA/MRA alignment. These AQP0 channels, from the plasma membranes of native calf lens fibers, are tetramers, 64 Å on a side, MW ~118 kDa (Konig et al., 1997).
Images of the AQP0 channel obtained by RCT geometry already have been aligned and classified using the alignment through classification approach (Zampighi et al., 2003). By using knowledge gained in applying SORFA to test images, we aligned and classified the AQP0 images. Finally, we compared the structure of the 3D reconstructions to the atomic model of Aquaporin 1 (AQP1), a closely related and structurally similar channel with 43.6% identity and 62.6% similarity to AQP0 (Gonen et al., 2004).