Cross-product matrices are frequently generated by data processing systems that perform statistical analysis, such as systems that use the method of least squares to fit general linear models to data. In general, one can form a dense cross-product matrix (“X′X matrix”) by forming the row xi for the current observation and then adding its outer product, xi xi′, to the X′X matrix accumulated so far. Mathematically, this can be expressed as:
$$X'X = \sum_{i=1}^{n} x_i x_i'$$
where n denotes the number of observations, the matrix X′X is of order (p×p), and the vector xi is of order (p×1).
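The accumulation described above can be sketched in a few lines of Python. This is a minimal illustration only, not any particular system's implementation; the function name `crossproduct` and the sample data are hypothetical.

```python
def crossproduct(rows, p):
    """Accumulate X'X one observation at a time (illustrative sketch)."""
    # Start from a p-by-p matrix of zeros.
    xtx = [[0.0] * p for _ in range(p)]
    for x in rows:                          # x plays the role of x_i (length p)
        for j in range(p):
            for k in range(p):
                xtx[j][k] += x[j] * x[k]    # add element (j, k) of x_i x_i'
    return xtx


# Hypothetical data: an intercept column and one regressor, n = 3.
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
xtx = crossproduct(X, p=2)
```

Because each observation contributes independently to the sum, the rows can be read in a single pass whenever every xi can be formed from its own observation alone.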
Multi-pass algorithms for computing such matrices may be used, as one non-limiting example, when the elements of xi depend on elements of xj (where j is different from i). In these situations, it is customary to compute the X′X matrix in multiple passes through the data. For example, on a first pass one might compute the information necessary to subsequently construct the vector xi for any observation, and then compute the cross-product matrix on a second pass.
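As one possible illustration of such a dependency, assume each xi is the raw observation minus the column means, so that xi cannot be formed until every observation has been seen. The sketch below is hypothetical (the names `centered_crossproduct` and `read_rows` are not from the source); mean-centering is chosen only as a simple example of the two-pass pattern.

```python
def centered_crossproduct(read_rows, p):
    """Two-pass X'X where x_i depends on all rows via the column means."""
    # Pass 1: gather what is needed to build any x_i (here, the column means).
    n = 0
    sums = [0.0] * p
    for raw in read_rows():
        n += 1
        for j in range(p):
            sums[j] += raw[j]
    means = [s / n for s in sums]

    # Pass 2: build each x_i and accumulate its outer product.
    xtx = [[0.0] * p for _ in range(p)]
    for raw in read_rows():
        x = [raw[j] - means[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                xtx[j][k] += x[j] * x[k]
    return xtx


# read_rows is called once per pass, mimicking a re-read of the data source.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
xtx = centered_crossproduct(lambda: iter(data), p=2)
```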
As another non-limiting scenario, multi-pass algorithms are used when the columns of the X matrix depend on classification variables. Classification variables are variables whose raw values are mapped to an integer encoding. For example, a study of a species of fish might include a classification variable for gender with three categories: male, female, and undetermined. If a gender effect is in a statistical model for the study (i.e., it occupies columns in the X matrix), several pieces of information are required to construct the X matrix. These might include: (i) the number of levels of the gender effect that are represented in the data; (ii) the proper order for these levels; and (iii) the position of the first column of the gender effect in the X matrix, that is, which other terms precede the gender effect in the model and how many columns they occupy.
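To make items (i) through (iii) concrete, the sketch below derives a column layout for the gender effect. It assumes, purely for illustration, that levels are ordered alphabetically, that columns are numbered from zero, and that two single-column terms (say, an intercept and a continuous covariate) precede gender in the model; `effect_layout` is a hypothetical helper.

```python
def effect_layout(observed_values, preceding_widths):
    """Return the ordered levels and first column of a classification effect."""
    # (i) and (ii): the levels present in the data, in a fixed (sorted) order.
    levels = sorted(set(observed_values))
    # (iii): the effect starts after the columns used by earlier model terms.
    first_col = sum(preceding_widths)
    return levels, first_col


# Hypothetical raw values; "female" appears twice but forms one level.
genders = ["male", "female", "undetermined", "female"]
levels, first_col = effect_layout(genders, preceding_widths=[1, 1])

# Map each level to the X-matrix column that holds its indicator.
column_of = {lvl: first_col + k for k, lvl in enumerate(levels)}
```

None of these quantities is known until all observations have been read, which is why constructing the rows of X requires at least one earlier pass.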
Statistical analyses with classification variables in model effects are common in a number of SAS/STAT® procedures such as GLM, GENMOD, GLIMMIX, GLMSELECT, LOGISTIC, MIXED, and PHREG. These procedures construct the rows of X in up to three passes through the data. In the first pass, the unique values of the classification variables and their sort order are determined. In the second pass, the levels of the effects in which the classification variables are involved are determined. Finally, in the third pass, the row of X (i.e., xi for the ith observation) is constructed.
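The three passes can be sketched as follows. This is a simplified illustration and not the implementation used by any of the procedures named above: it assumes one indicator column per observed level of each effect, an intercept in column 0, and alphabetical sort order. The names `design_rows`, `read_obs`, and the sample variables are hypothetical.

```python
def design_rows(read_obs, class_vars, effects):
    """Build indicator-coded rows of X in three passes (illustrative sketch)."""
    # Pass 1: unique values of each classification variable and their order.
    seen = {v: set() for v in class_vars}
    for obs in read_obs():
        for v in class_vars:
            seen[v].add(obs[v])
    rank = {v: {val: r for r, val in enumerate(sorted(seen[v]))}
            for v in class_vars}

    # Pass 2: levels of each effect, i.e. the combinations actually observed.
    found = {e: set() for e in effects}
    for obs in read_obs():
        for e in effects:
            found[e].add(tuple(obs[v] for v in e))
    columns = {}
    next_col = 1                            # column 0 holds the intercept
    for e in effects:
        def by_rank(combo, e=e):
            return [rank[v][val] for v, val in zip(e, combo)]
        for combo in sorted(found[e], key=by_rank):
            columns[(e, combo)] = next_col
            next_col += 1

    # Pass 3: construct the row x_i for each observation.
    rows = []
    for obs in read_obs():
        x = [0.0] * next_col
        x[0] = 1.0
        for e in effects:
            x[columns[(e, tuple(obs[v] for v in e))]] = 1.0
        rows.append(x)
    return rows


fish = [
    {"gender": "male", "tank": "A"},
    {"gender": "female", "tank": "B"},
    {"gender": "male", "tank": "B"},
]
# Effects are tuples of variable names: a main effect and an interaction.
effects = [("gender",), ("gender", "tank")]
rows = design_rows(lambda: iter(fish), ["gender", "tank"], effects)
```

In this sketch the interaction effect receives only three columns, because the combination (female, A) never occurs in the data; this is the kind of information the second pass exists to discover.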