A formalism and an algorithm for the general-base parallel dispatch and sequencing state assignment of optimal general-base massively parallel multiprocessing architectures are presented. Transformations of a base-p hypercube, where p is an arbitrary integer, are shown to effect a dynamic, contention-free, general-base optimal memory allocation for parallel to massively parallel multiprocessing architectures. The formalism is shown to provide a single unique description of the architecture and of the sequencing of parallel operations. The approach is illustrated by factorizations involving the processing of matrices that are functions of four variables. The parallel operations are implemented as matrix multiplications. Each matrix, of dimension N×N, where $N = p^n$ and n is an integer, is a sampling matrix whose structure depends on a variable parameter k. The degree of parallelism, in the form of $M = p^m$ processors, can be chosen arbitrarily by varying m between zero and its maximum value of n−1. The result is an equation describing the solution as a function of the four variables n, p, k and m.
Applications of the approach are shown in relation to complex matrix structures of image processing and generalized spectral analysis transforms, but they cover a much larger class of parallel processing and multiprocessing systems.
Most computer arithmetic operations encountered in information processing algorithms in general, and in signal processing and sorting algorithms in particular, call for iterative multiplications of large matrices. An approach and a formalism for designing optimal parallel/pipelined algorithms and processor architectures for effecting such operations were recently proposed in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. The algorithms are optimal in that they minimize the addressing requirements, the shuffle operations and the number of memory partitions they call for. The algorithms and corresponding architectures involve general-base matrix factorizations. As an application, the factorizations and corresponding optimal architectures are developed in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, to obtain optimal parallel-pipelined processors for the Generalized Walsh-Chrestenson transform, of which the Discrete (fast) Fourier transform is but a special case.
This invention describes a technique for designing optimal multiprocessing parallel architectures which employ multiples of general-base processors operating in parallel in an optimal global architecture. A formalism and closed forms are developed defining the state and sequencing assignments in a programmable hierarchical level of parallelism at each step of the algorithm execution.
A class of hierarchically parallel multiprocessing architectures is presented. It employs general-base universal processing elements previously introduced as basic tools for multiprocessing in "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985, and "The Design of a class of Fast Fourier Transform Computers", Michael J. Corinthios, IEEE Trans. Comput., Vol. C-20, pp. 617-623, June 1971. Applications of the perfect shuffle matrices and hypercube representations to other classes of problems such as sorting and interconnection networks have received attention over the course of many years in "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985; "The Design of a class of Fast Fourier Transform Computers", Michael J. Corinthios, IEEE Trans. Comput., Vol. C-20, pp. 617-623, June 1971; "A Parallel Algorithm for State Assignment of Finite State Machines", G. Hasteer and P. Banerjee, IEEE Trans. Comput., Vol. 47, No. 2, pp. 242-246, February 1998; "Hypercube Algorithms and Implementations", O. A. McBryan and E. F. Van De Velde, SIAM J. Sci. Stat. Comput., Vol. 8, No. 2, pp. s227-287, March 1987; "Parallel Processing with the Perfect Shuffle", H. S. Stone, IEEE Trans. Comput., Vol. C-20, No. 2, pp. 153-161, February 1971; and "Design of a Massively Parallel Processor", K. E. Batcher, IEEE Trans. Comput., pp. 836-840, September 1980.
Advances in state assignment and memory allocation for array processors, using processing elements as multiprocessing cells, and in their interconnection networks have been made in the last two decades in "Parallel Processing with the Perfect Shuffle", H. S. Stone, IEEE Trans. Comput., Vol. C-20, No. 2, pp. 153-161, February 1971, and "Hierarchical Fat Hypercube Architecture for Parallel Processing Systems", Galles, Michael B., U.S. Pat. No. 5,669,008, September 1997. Many of these contributions applied parallel and multiprocessing architectures to signal processing applications and in particular to spectral analysis algorithms. In more recent years applications of parallel and multiprocessing techniques have focused on generalized spectral analysis, including Discrete Cosine, Haar, Walsh and Chrestenson Transforms, among others, in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459; "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985; "Parallel Processing with the Perfect Shuffle", H. S. Stone, IEEE Trans. Comput., Vol. C-20, No. 2, pp. 153-161, February 1971; "Access and Alignment of Data in an Array Processor", D. H. Lawrie, IEEE Trans. Comput., Vol. C-24, No. 2, December 1975, pp. 1145-1155; "Fast Fourier Transforms over Finite Groups by Multiprocessor Systems", Roziner, T. D., Karpovsky, M. G., and Trachtenberg, L. A., IEEE Trans. Acoust., Speech, and Sign. Proc., ASSP, Vol. 38, No. 2, February 1990, pp. 226-240; "An Architecture for a Video Rate Two-Dimensional Fast Fourier Transform processor", Taylor, G. F., Steinvorth, R. H., and MacDonald, J., IEEE Trans. Comput., Vol. 37, No. 9, September 1988, pp. 1145-1151; "Fault tolerant FFT Networks", Jou, Y.-Y. and Abraham, J. A., IEEE Trans. Comput., Vol. 37, No. 5, May 1988, pp. 548-561; "Design of Multiple-Valued Systolic System for the Computation of the Chrestenson Spectrum", Moraga, Claudio, IEEE Trans. Comput., Vol. C-35, No. 2, February 1986, pp. 183-188; "Matrix Representation for Sorting and the Fast Fourier Transform", Sloate, H., IEEE Trans. Circ. and Syst., Vol. CAS-21, No. 1, January 1974, pp. 109-116; and "Processor for Signal processing and Hierarchical Multiprocessing Structure Including At Least One Such Processor", Luc Mary and Barazesh, Bahman, U.S. Pat. No. 4,845,660, July 1989.
In "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985, three-dimensional parallel and pipelined architectures of cellular array multiprocessors employ Configurable Universal Processing Elements (CUPE) forming what were referred to as "Isostolic Arrays", applied to signals as well as images in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, and "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985.
Many patents deal with the subject of hypercube transformations, such as those described in U.S. Pat. Nos. 5,669,008, 5,644,517, 5,513,371, 5,689,722, 5,475,856, 5,471,412, 4,980,822, 916,657 and 4,845,660. The present invention is unique in its concept of a generalized level of massive parallelism. The formulation is presented for an arbitrary number M of processing elements, $M = p^m$, p being the general radix of factorization. The input data vector dimension N, or input data matrix dimension N×N, where $N = p^n$, the radix of factorization of the matrix p, the number of processors M, and the span of the matrix, as defined in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, are all variable. A unique optimal solution yielding parallel to massively parallel optimal architectures is presented, optimality being as defined in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459.
The approach, which was recently submitted for publication and submitted as a Disclosure Document, is illustrated by developing a formalism and optimal factorizations for the class of algorithms of generalized spectral analysis introduced recently in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. It has been shown in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, that transforms such as Fourier and, more generally, Chrestenson Generalized-Walsh (CGW) transforms can be factored into optimal forms.
In what follows we use some matrix definitions, such as those of a sampling matrix, a matrix's poles and zeros, a matrix span, a fixed topology processor and a shuffle-free processor, introduced in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. In addition we adopt the following definitions, which will be formally introduced in later sections.
General Base Processing Element
In what follows a general-base processing element PE with a base, or radix, p is a processor that receives simultaneously p input operands and produces simultaneously p output operands. The PE in general applies arithmetic or weighting operations on the input vector to produce the output vector. In matrix multiplication operations, for example, the PE applies a p×p matrix to the p-element input vector to produce the p-element output vector. The matrix elements may be real or complex.
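The behaviour of such a processing element can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `apply_pe` and the example weighting matrix are assumptions made here for demonstration and do not appear in the description above.

```python
from typing import List, Sequence


def apply_pe(weights: Sequence[Sequence[complex]],
             operands: Sequence[complex]) -> List[complex]:
    """Hypothetical base-p PE: apply a p x p matrix to p input operands."""
    p = len(operands)
    if len(weights) != p or any(len(row) != p for row in weights):
        raise ValueError("weights must be a p x p matrix matching the p operands")
    # Each output operand is the weighted sum of all p input operands.
    return [sum(w * x for w, x in zip(row, operands)) for row in weights]


# Illustration with p = 2 and a simple sum/difference weighting matrix.
example_weights = [[1, 1], [1, -1]]
print(apply_pe(example_weights, [3, 5]))  # [8, -2]
```

The elements may equally be complex numbers, since Python's arithmetic operators accept them unchanged.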
Due to the diversified general applicability of such a processing element, a Universal Processing Element UPE, which can be constructed in a 3D-type architecture, has been recently proposed in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, and "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985. Its 3D-type architecture is such that its intermediate computation results are propagated between planes rather than in 2D along a plane. It may be viewed as a base-p processing element, operating on the p elements of an input vector simultaneously, applying to it a general p×p matrix and producing p output operands as the p-element output vector. A UPE has $p \times p = p^2$ multipliers but may instead be realized in a 3D architecture, in particular if the matrix is a transformation matrix that can itself be factored as in "3-D cellular arrays for parallel/cascade image/signal processing", Michael J. Corinthios, in Spectral Techniques and Fault Detection, M. Karpovsky, Ed. New York: Academic Press, 1985, and "The Design of a class of Fast Fourier Transform Computers", Michael J. Corinthios, IEEE Trans. Comput., Vol. C-20, pp. 617-623, June 1971. The pipelining approach described in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, can thus be used, leading to a 3D-type "isostolic" architecture, as in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459.
In the context of this invention a UPE may be seen simply as a general base-p processing element PE as defined above, accepting p inputs, weighting them by the appropriate p×p matrix and producing p output operands.
Pilot Elements, Pilots Matrix
Similarly to signals and images, an N×N matrix may be sampled, and the result is "impulses", that is, isolated elements in the resulting N×N samples (sampling) matrix. We shall assume uniform sampling of rows and columns, yielding p uniformly spaced samples from each of p rows, and element alignment along columns, that is, p uniformly spaced samples along columns as well as rows. The samples matrix, which we may refer to as a "frame", thus contains p rows of p equally spaced elements each, a rectangular grid of $p^2$ impulses (poles) which we shall call a "dispatch". With $N = p^n$ the $N^2$ elements of the "main" (or "parent") matrix may thus be decomposed into $N^2/p^2 = p^{2n-2}$ such dispatches.
By fixing the row sampling period, the row span, as well as the column sampling period, the column span, it suffices to know the coordinates (indices) of the top left element, that is, the element with the smallest indices, of a dispatch to directly deduce the positions of all its other poles (elements). The top left element thus acts as a reference point, and we shall call it the "pilot element". The other $p^2 - 1$ elements associated with it may be called its "satellites".
In other words, if the element $a_{ij}$ of A is a pilot element, the dispatch consists of the elements

$$a_{i+kc,\; j+lr}; \quad k = 0, 1, \ldots, p-1, \quad l = 0, 1, \ldots, p-1$$

c and r being the column and row element spacings (spans), respectively.
A processing element assigned to a pilot element can thus access all $p^2$ operands of the dispatch, having deduced their positions from the given row and column spans.
Since each pilot element of a frame originated from the same position in the parent matrix, we can construct a "pilots matrix" by keeping only the pilot elements and forcing to zero all other elements of the parent matrix. The problem then is one of assignment, simultaneous and/or sequential, of the $M = p^m$ processors to the different elements of the pilots matrix.
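The relation between a pilot element, its satellites and the pilots matrix can be illustrated with a short Python sketch. The function names `dispatch_positions` and `pilots_matrix`, and the small example values, are hypothetical and chosen here only for illustration; the index rule follows the expression for the dispatch elements given above, with c added to the first index and r to the second.

```python
from typing import Iterable, List, Tuple

Index = Tuple[int, int]


def dispatch_positions(pilot: Index, p: int, c: int, r: int) -> List[Index]:
    """Positions (i + k*c, j + l*r), k, l = 0..p-1, of one dispatch."""
    i, j = pilot
    return [(i + k * c, j + l * r) for k in range(p) for l in range(p)]


def pilots_matrix(parent: List[List[float]],
                  pilots: Iterable[Index]) -> List[List[float]]:
    """Keep only the pilot elements of the parent matrix, zero the rest."""
    keep = set(pilots)
    return [[value if (u, v) in keep else 0 for v, value in enumerate(row)]
            for u, row in enumerate(parent)]


# Base p = 2, pilot at (0, 1), spans c = 4 and r = 2 (illustrative values).
print(dispatch_positions((0, 1), p=2, c=4, r=2))
# [(0, 1), (0, 3), (4, 1), (4, 3)]

parent = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(pilots_matrix(parent, [(0, 0), (1, 1)]))
# [[1, 0, 0, 0], [0, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The first entry returned by `dispatch_positions` is always the pilot itself; the remaining $p^2 - 1$ entries are its satellites.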
Hypercube Dimension Reduction
The extraction of a pilots matrix from its parent matrix leads to a dimension reduction of the hypercube representing its elements. The dimension reduction takes the form of a suppression, that is, a forcing to zero, of one of the hypercube digits. Let $C = (j_{n-1} \ldots j_1 j_0)_p$ be an n-digit base-p hypercube. We will write $C_{\bar{k}}$ to designate the hypercube C with the digit k suppressed, that is, forced to zero. Several digits can be similarly suppressed. For example, $C_{\bar{2},\bar{4}} = (j_{n-1} \ldots j_5\, 0\, j_3\, 0\, j_1 j_0)_p$, and $C_{\overline{n-1}} = (0\, j_{n-2} \ldots j_1 j_0)_p$. It is interesting to note that the hypercube dimension reduction implies a "skipping" over its zeros in permutation operations such as those involving the perfect shuffle. For example, if $A = C_{\bar{2}}$ then $PA = (j_0 j_{n-1} \ldots j_4\, 0\, j_3 j_1)_p$.
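Digit suppression and the resulting "skipping" behaviour of the perfect shuffle can be sketched symbolically in Python. The helper names `suppress` and `shuffle_skipping` are hypothetical, and the sketch assumes, following the example above, that the shuffle moves the least significant active digit to the most significant active position while suppressed positions stay fixed at zero.

```python
from typing import List, Set


def suppress(digits: List[str], positions: Set[int]) -> List[str]:
    """Force the digits at the given positions to zero.

    digits are listed most significant first; positions count from 0
    at the least significant digit.
    """
    n = len(digits)
    out = list(digits)
    for k in positions:
        out[n - 1 - k] = "0"
    return out


def shuffle_skipping(digits: List[str], suppressed: Set[int]) -> List[str]:
    """Rotate the active digits by one place, skipping suppressed positions."""
    n = len(digits)
    active = [idx for idx in range(n) if (n - 1 - idx) not in suppressed]
    values = [digits[idx] for idx in active]
    rotated = values[-1:] + values[:-1]  # last active digit moves to the front
    out = list(digits)
    for idx, value in zip(active, rotated):
        out[idx] = value
    return out


# n = 6 digits, digit 2 suppressed.
C = ["j5", "j4", "j3", "j2", "j1", "j0"]
A = suppress(C, {2})
print(A)                         # ['j5', 'j4', 'j3', '0', 'j1', 'j0']
print(shuffle_skipping(A, {2}))  # ['j0', 'j5', 'j4', '0', 'j3', 'j1']
```

Working with symbolic digit labels rather than numeric values makes the permutation itself visible, which is the point of the hypercube notation.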
A sequence of perfect shuffle operations effected through simple hypercube transformations can be made to broadcast the state and access assignments to the different processors. The overall approach is described by the following algorithm which will be developed step by step in what follows.
The Parallel Dispatch, State Assignment and Sequencing Algorithm 1 dispatches the $M = p^m$ processors for each stage of the matrix factorization. The base-p m-tuple $(i_{m-1} i_{m-2} \ldots i_1 i_0)_p$ is assigned to the parallel processors. The (n−m)-tuple $(j_{n-1} j_{n-2} \ldots j_m)$ is assigned to the sequencing cycles of each processor. The algorithm subsequently applies hypercube transformations as dictated by the type of matrix, the stage of matrix factorization and the number of dispatched processors. It tests optimality to determine the type of scan of matrix elements to be applied and evaluates parameters such as pitch and memory optimal queue length, to be defined subsequently. It then accesses the pilot elements and their satellites, proceeding to the parallel dispatch and sequencing of the processing elements.
Each processing element at each step of the algorithm thus reads its p input operands from memory and writes its output operands into memory. The algorithm, while providing an arbitrary hierarchical level of parallelism up to the ultimate massive parallelism, produces an optimal multiprocessing machine architecture that minimizes addressing, the number of memory partitions and the number of required shuffles. At the same time it produces a virtually wired-in pipelined architecture and properly ordered output.
In developing techniques for the multiprocessing of matrix multiplications it is convenient to effect a decomposition of a matrix into the sum of matrices. To this end let us define an "impulse matrix" as the matrix δ(i,j) of which all the elements are zero except for the element at position (i,j), that is,

$$[\delta(i,j)]_{uv} = \begin{cases} 1; & u = i,\ v = j \\ 0; & \text{otherwise} \end{cases} \qquad (4.1)$$
An N×N matrix A having elements $[A]_{i,j} = a_{ij}$ can be written as the sum

$$A = a_{0,0}\,\delta(0,0) + a_{0,1}\,\delta(0,1) + a_{0,2}\,\delta(0,2) + \ldots + a_{1,0}\,\delta(1,0) + a_{1,1}\,\delta(1,1) + \ldots + a_{N-1,N-1}\,\delta(N-1,N-1) \qquad (4.2)$$
where the δ(i,j) matrices are of dimension N×N each. The matrix A can thus be written in the form

$$A = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} a_{i,j}\,\delta(i,j) \qquad (4.3)$$
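As a quick numerical check of the impulse-matrix decomposition (4.3), the following Python sketch rebuilds a small matrix from its weighted impulse matrices. The function names `impulse` and `rebuild_from_impulses` are hypothetical, and plain nested lists are used here purely to keep the example self-contained.

```python
from typing import List

Matrix = List[List[float]]


def impulse(n: int, i: int, j: int) -> Matrix:
    """N x N impulse matrix: 1 at position (i, j), 0 elsewhere, as in (4.1)."""
    return [[1 if (u == i and v == j) else 0 for v in range(n)]
            for u in range(n)]


def rebuild_from_impulses(a: Matrix) -> Matrix:
    """Sum a[i][j] * delta(i, j) over all i, j, as in (4.3)."""
    n = len(a)
    total = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = impulse(n, i, j)
            for u in range(n):
                for v in range(n):
                    total[u][v] += a[i][j] * d[u][v]
    return total


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rebuild_from_impulses(a) == a)  # True
```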
Furthermore, in the parallel processing of matrix multiplication to a general base p it is convenient to decompose an N×N matrix with $N = p^n$ as the sum of dispatches, a dispatch being, as mentioned earlier, a matrix of $p^2$ elements arranged in a generally rectangular p×p pattern of p columns and p rows. Denoting by $\sigma_R$ and $\sigma_C$ the row and column spans, as defined in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459, of a dispatch, we can decompose a matrix A into the form

$$A = \sum_{i=0}^{N/p-1} \sum_{j=0}^{N/p-1} \sum_{k=0}^{p-1} \sum_{l=0}^{p-1} a_{i+k\sigma_C,\; j+l\sigma_R}\; \delta(i+k\sigma_C,\; j+l\sigma_R) \qquad (4.4)$$
More generally we may wish to decompose A in an order different from the uniform row and column scanning of this last equation. In other words we may wish to pick the dispatches in an arbitrary order rather than in sequence. As mentioned above, we shall call the top left element the pilot element and its $p^2 - 1$ companions its satellites. In this last equation the pilot elements are those where k = l = 0.
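The uniform scan of equation (4.4) can be checked with a small Python sketch. It rests on an assumption made here only for illustration: both spans are taken equal to N/p, which is the case in which the index ranges of (4.4) visit every matrix position exactly once. The function name `dispatch_cover` is hypothetical.

```python
from collections import Counter
from typing import Tuple


def dispatch_cover(n_size: int, p: int) -> "Counter[Tuple[int, int]]":
    """Count how often each (row, column) position is visited by (4.4).

    Assumption for this sketch: sigma_C = sigma_R = N / p.
    """
    span = n_size // p
    counts: "Counter[Tuple[int, int]]" = Counter()
    for i in range(span):          # pilot first index
        for j in range(span):      # pilot second index
            for k in range(p):     # satellites offset by k * sigma_C
                for l in range(p):  # satellites offset by l * sigma_R
                    counts[(i + k * span, j + l * span)] += 1
    return counts


counts = dispatch_cover(n_size=9, p=3)  # N = 3^2, p = 3
print(len(counts), set(counts.values()))  # 81 {1}
```

With N = 9 and p = 3 there are $(N/p)^2 = 9$ dispatches of $p^2 = 9$ elements each, and together they cover the 81 positions without overlap.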
To effect a parallel matrix decomposition to a general base p we use hypercubes described by base-p digits. The order of accessing the different dispatches is made in relation to a main clock. The clock K is represented by the hypercube to base p as
$$K \cong (k_{n-1} \ldots k_1 k_0)_p; \quad k_i \in \{0, 1, \ldots, p-1\} \qquad (4.5)$$
Its value at any time is given by

$$K = \sum_{t=0}^{n-1} p^t k_t \qquad (4.6)$$
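The correspondence between the clock value and its base-p digits in (4.5) and (4.6) can be sketched in Python as follows. The function names `clock_digits` and `clock_value` are hypothetical; the digit list is stored least significant first, so that index t holds $k_t$.

```python
from typing import List


def clock_digits(value: int, p: int, n: int) -> List[int]:
    """Return [k_0, k_1, ..., k_{n-1}], the n base-p digits of the clock."""
    if not 0 <= value < p ** n:
        raise ValueError("clock value must lie in [0, p**n)")
    digits = []
    for _ in range(n):
        value, digit = divmod(value, p)
        digits.append(digit)
    return digits


def clock_value(digits: List[int], p: int) -> int:
    """Evaluate K = sum over t of p**t * k_t, as in (4.6)."""
    return sum(k_t * p ** t for t, k_t in enumerate(digits))


digits = clock_digits(14, p=3, n=3)
print(digits)                    # [2, 1, 1], i.e. K = (1 1 2) in base 3
print(clock_value(digits, p=3))  # 14
```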
At each clock value K a set of M UPEs (PEs) is assigned a set of M dispatches simultaneously. We will reserve the symbols w and z to designate the row and column indices of a pilot element at clock K. In other words, at clock K each selected pilot element shall be designated $a_{w,z}$, that is, $[A]_{w,z}$, where w and z are functions of K to be defined. They will be determined in a way that optimizes the parallel and sequential operations for the given matrix structure and the number $M = p^m$ of available UPEs.
With $M = p^m$ base-p processing elements the hypercube representing K shall be re-written in the form

$$K \cong (j_{n-1} \ldots j_{m+1} j_m i_{m-1} \ldots i_1 i_0)_p \qquad (4.7)$$
where we have written

$$k_t = \begin{cases} i_t; & t = 0, 1, \ldots, m-1 \\ j_t; & t = m, m+1, \ldots, n-1 \end{cases} \qquad (4.8)$$
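The digit split of (4.7) and (4.8) can be illustrated by a short Python sketch. Reading the m low-order digits as a processor number and the n−m high-order digits as a sequencing-cycle number is an interpretation adopted here for illustration; the function name `split_clock` is hypothetical.

```python
from typing import Tuple


def split_clock(value: int, p: int, n: int, m: int) -> Tuple[int, int]:
    """Split clock K into (processor, cycle).

    processor is the value of the low-order digits (i_{m-1} ... i_0),
    cycle is the value of the high-order digits (j_{n-1} ... j_m).
    """
    if not 0 <= m <= n - 1:
        raise ValueError("m must lie between 0 and n - 1")
    if not 0 <= value < p ** n:
        raise ValueError("clock value must lie in [0, p**n)")
    cycle, processor = divmod(value, p ** m)
    return processor, cycle


# p = 2, n = 4, m = 2: M = 4 processors, each stepping through 4 cycles.
print(split_clock(13, p=2, n=4, m=2))  # (1, 3): K = (1 1 0 1) in base 2

# Clock values handled together in cycle 3 by the 4 parallel processors.
print([k for k in range(16) if split_clock(k, 2, 4, 2)[1] == 3])
# [12, 13, 14, 15]
```

With m = 0 the split degenerates to a single processor executing all $p^n$ clock values in sequence.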
The m-sub-cube $(i_{m-1} \ldots i_1 i_0)$ designates operations performed in parallel. The remaining (n−m)-sub-cube $(j_{n-1} \ldots j_{m+1} j_m)$ designates operations performed sequentially by each of the M dispatched parallel processors. With $M = p^m$ processors dispatched in parallel at clock $K \cong (j_{n-1} \ldots j_{m+1} j_m i_{m-1} \ldots i_1 i_0)_p$ the matrix A can be decomposed in the form

$$A = \sum_{k_{n-2}=0}^{p-1} \cdots \sum_{k_{m+1}=0}^{p-1} \sum_{k_m=0}^{p-1} \Big\langle \sum_{k_{m-1}=0}^{p-1} \cdots \sum_{k_1=0}^{p-1} \sum_{k_0=0}^{p-1} \sum_{l=0}^{p-1} \sum_{k=0}^{p-1} a_{w(k_0, k_1, \cdots, k_{n-
                     1                                                                                                                                                                              )                                                                                                                    +                                                                              k                                        ⁢                                                                                  xe2x80x83                                                                                ⁢                                                                                  σ                                          C                                                                                                                                                      ,                                                                                                                  z                                        ⁡                                                                                  (                                                                                                                                    k                                              0                                                                                        ,                                                                                          k                                              1                                                                                        ,                                                                                          xe2x80x83                                                                                        ⁢                                            ⋯                                            ⁢                                                                                          xe2x80x83     
                                                                                   ,                                                                                          k                                                                                              n                                                -                                                1                                                                                                                                                                              )                                                                                                                    +                                                                              l                                        ⁢                                                                                  xe2x80x83                                                                                ⁢                                                                                  σ                                          R                                                                                                                                                                                                                    ⁢                                                                  δ                                  ⁡                                                                      [                                                                                                                  {                                                                                                                              w                                            ⁡                                                                                          (                                                                                                                                                k             
                                     0                                                                                                ,                                                                                                  k                                                  1                                                                                                ,                                                ⋯                                                ⁢                                                                                                  xe2x80x83                                                                                                ,                                                                                                  k                                                                                                      n                                                    -                                                    2                                                                                                                                                                                              )                                                                                                                                +                                                                                      k                                            ⁢                                                                                          xe2x80x83                                                                                        ⁢                                                                                          σ                                              C                                                                                                                                                                      }                                                               
             ,                                                                              {                                                                                                                              z                                            ⁡                                                                                          (                                                                                                                                                k                                                  0                                                                                                ,                                                                                                  k                                                  1                                                                                                ,                                                ⋯                                                ⁢                                                                                                  xe2x80x83                                                                                                ,                                                                                                  k                                                                                                      n                                                    -                                                    2                                                                                                                                                                                              )                                                                                                                                +                                                                                      l                                            ⁢                                          
                                                xe2x80x83                                                                                        ⁢                                                                                          σ                                              R                                                                                                                                                                      }                                                                                                              ]                                                                                                                                                                                                                                                            ⟩                                                                                        (4.9)            
where the “parentheses” ⟨ and ⟩ enclose the elements accessed in parallel. In what follows we write $P_{v,\mu}$ to designate the pilot element of processor No. $v$ at real-time clock $\mu$.
The lowest-order base-p Chrestenson Generalized Walsh “core matrix” is the p-point Fourier matrix
$$W_p = \frac{1}{\sqrt{p}} \begin{bmatrix} w^0 & w^0 & \cdots & w^0 \\ w^0 & w^1 & \cdots & w^{p-1} \\ \vdots & & & \\ w^0 & w^{p-1} & \cdots & w^{(p-1)^2} \end{bmatrix}, \tag{5.1}$$
where
$$w = \exp(-j 2\pi / p); \quad j = \sqrt{-1}. \tag{5.2}$$
In the following, for simplicity, the scaling factor $1/\sqrt{p}$ will be dropped. We start by deriving three basic forms of the Chrestenson transform in its three different orderings.
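As an illustration of (5.1) and (5.2), the following minimal Python sketch builds the unscaled core matrix. The function name `core_matrix` and the list-of-rows representation are assumptions made for this example only and are not part of the described architecture.

```python
import cmath

def core_matrix(p):
    """Unscaled p-point Fourier core matrix W_p as a list of rows.

    Entry (r, c) is w**(r*c) with w = exp(-j*2*pi/p), following (5.1)-(5.2);
    the 1/sqrt(p) scaling factor is dropped, as in the text.
    """
    w = cmath.exp(-2j * cmath.pi / p)
    return [[w ** (r * c) for c in range(p)] for r in range(p)]

# Illustrative instances for p = 2 and p = 3.
W2 = core_matrix(2)
W3 = core_matrix(3)
```

For p = 2 the entries reduce, up to rounding, to the 2-point matrix with rows [1, 1] and [1, -1].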
The GWN Transformation Matrix
The GWN transformation matrix $W_N$ for $N = p^n$ data points is obtained from the Generalized-Walsh core matrix $W_p$ by the Kronecker multiplication of $W_p$ by itself n times.
$$W_{N,nat} = W_p \times W_p \times \cdots \times W_p \ (n \text{ times}) = W_p^{[n]}. \tag{5.3}$$
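The Kronecker power in (5.3) can be sketched in a few lines of Python. The helper names `kron`, `kron_power` and `gwn_matrix` are hypothetical and chosen for this example; matrices are plain lists of rows.

```python
import cmath

def core_matrix(p):
    """Unscaled core matrix W_p of (5.1), with w = exp(-j*2*pi/p)."""
    w = cmath.exp(-2j * cmath.pi / p)
    return [[w ** (r * c) for c in range(p)] for r in range(p)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def kron_power(m, n):
    """Kronecker product of m with itself n times (m^[n])."""
    result = [[1]]
    for _ in range(n):
        result = kron(result, m)
    return result

def gwn_matrix(p, n):
    """Natural-order matrix W_{N,nat} = W_p^[n] for N = p**n, as in (5.3)."""
    return kron_power(core_matrix(p), n)
```

For p = 2 and n = 2 the rows are, up to rounding, [1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1] and [1, -1, -1, 1].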
GWP Transformation Matrix
The Generalized Walsh transform in the GWP order is related to the transform in natural order by a digit-reverse ordering. The general-base digit-reverse ordering matrix $K_N^{(p)}$ can be factored using the general-base perfect shuffle permutation matrix $P^{(p)}$ and Kronecker products
$$K_N^{(p)} = \prod_{i=0}^{n-1} \left( P^{(p)}_{p^{(n-i)}} \times I_{p^i} \right). \tag{5.4}$$
Operating on a column vector x of dimension K, the base-p Perfect Shuffle permutation matrix of dimension K×K produces the vector
$$P_K x = [x_0, x_{K/p}, x_{2K/p}, \ldots, x_{(p-1)K/p}, x_1, x_{K/p+1}, \ldots, x_2, x_{K/p+2}, \ldots, x_{K-1}] \tag{5.5}$$
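The ordering in (5.5) can be made concrete with a short Python sketch. The function name `perfect_shuffle` is an assumption of this example; it works on a plain list whose length is a multiple of p.

```python
def perfect_shuffle(x, p):
    """Base-p perfect shuffle of a list x, following (5.5).

    The output lists x[0], x[K/p], x[2K/p], ..., then x[1], x[K/p + 1], ...
    where K = len(x).
    """
    K = len(x)
    if K % p != 0:
        raise ValueError("length of x must be a multiple of p")
    stride = K // p
    return [x[r * stride + q] for q in range(stride) for r in range(p)]
```

With K = 9 and p = 3, `perfect_shuffle(list(range(9)), 3)` returns `[0, 3, 6, 1, 4, 7, 2, 5, 8]`.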
The GWP matrix $W_{N,WP}$ can thus be written in the form
$$\begin{aligned} W_{N,WP} &= K_N^{(p)} W_{N,nat} \\ &= \prod_{i=0}^{n-1} \left( P^{(p)}_{p^{(n-i)}} \times I_{p^i} \right) W_p^{[n]}. \end{aligned} \tag{5.6}$$
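The digit-reverse ordering used in (5.4) and (5.6) can be sketched in Python in two ways: directly from the base-p digits of each index, and as a cascade of perfect shuffles. All function names are hypothetical. The cascade assumes that the factor with i = 0 in (5.4) acts on the data first; this ordering is an assumption of the example, chosen because it reproduces the digit-reversed order.

```python
def perfect_shuffle(x, p):
    """Base-p perfect shuffle of a list, as in (5.5)."""
    stride = len(x) // p
    return [x[r * stride + q] for q in range(stride) for r in range(p)]

def num_digits(N, p):
    """Return n such that N == p**n, or raise if N is not a power of p."""
    n, m = 0, 1
    while m < N:
        m *= p
        n += 1
    if m != N:
        raise ValueError("length must be a power of p")
    return n

def digit_reverse(index, p, n):
    """Reverse the n base-p digits of index."""
    rev = 0
    for _ in range(n):
        index, d = divmod(index, p)
        rev = rev * p + d
    return rev

def digit_reverse_order(x, p):
    """Reorder x so that output position m holds x[digit_reverse(m)]."""
    n = num_digits(len(x), p)
    return [x[digit_reverse(m, p, n)] for m in range(len(x))]

def digit_reverse_by_shuffles(x, p):
    """Apply P_{p^n}, then P_{p^(n-1)} x I_p, and so on (assumed order)."""
    N = len(x)
    num_digits(N, p)          # validates that N is a power of p
    y = list(x)
    block = 1                 # size p**i of the identity factor
    while block * p <= N:
        # (P x I_block) permutes groups of `block` consecutive elements.
        groups = [y[g * block:(g + 1) * block] for g in range(N // block)]
        y = [v for g in perfect_shuffle(groups, p) for v in g]
        block *= p
    return y
```

For p = 2 and N = 8 both routines return the familiar bit-reversed order `[0, 4, 2, 6, 1, 5, 3, 7]`.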
GWK Transformation Matrix
The GWK transformation matrix is related to the GWP matrix through a p-ary to Gray transformation matrix $G_N^{(p)}$.
$$W_{N,WK} = G_N^{(p)} W_{N,WP}. \tag{5.7}$$
The following factorizations lead to shuffle-free optimal parallel-pipelined processors.
A. GWN Factorization
A fixed topology factorization of the GWN transformation matrix has the form
$$W_{N,nat} = \prod_{i=0}^{n-1} P_N C_N = \prod_{i=0}^{n-1} P_N \left( I_{N/p} \times W_p \right), \tag{5.7}$$
which can be re-written in the form
$$W_{N,nat} = P \left\{ \prod_{i=0}^{n-1} CP \right\} P^{-1} = P \left\{ \prod_{i=0}^{n-1} F \right\} P^{-1}, \tag{5.8}$$
where
$$C = C_N = I_{p^{n-1}} \times W_p \tag{5.9}$$
and $F = CP$, noting that the matrix F is $p^2$-optimal (see “Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis”, Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459).
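The fixed topology GWN factorization, in which the same stage $P_N C_N$ is applied n times, can be checked numerically against the Kronecker power (5.3). The Python sketch below is only an illustration: all helper names are hypothetical, matrices are dense lists of rows, and the shuffle matrix follows the convention of (5.5).

```python
import cmath

def core_matrix(p):
    """Unscaled core matrix W_p of (5.1)."""
    w = cmath.exp(-2j * cmath.pi / p)
    return [[w ** (r * c) for c in range(p)] for r in range(p)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def identity(k):
    return [[1 if r == c else 0 for c in range(k)] for r in range(k)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def shuffle_matrix(K, p):
    """Permutation matrix P with (P x)[q*p + r] = x[r*(K/p) + q], as in (5.5)."""
    stride = K // p
    P = [[0] * K for _ in range(K)]
    for q in range(stride):
        for r in range(p):
            P[q * p + r][r * stride + q] = 1
    return P

def gwn_by_factorization(p, n):
    """Product of n identical stages P_N (I_{N/p} x W_p)."""
    N = p ** n
    stage = matmul(shuffle_matrix(N, p), kron(identity(N // p), core_matrix(p)))
    result = identity(N)
    for _ in range(n):
        result = matmul(result, stage)
    return result

def gwn_by_kronecker(p, n):
    """Reference W_{N,nat} = W_p^[n] from (5.3)."""
    result = [[1]]
    for _ in range(n):
        result = kron(result, core_matrix(p))
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Calling `max_abs_diff(gwn_by_factorization(3, 2), gwn_by_kronecker(3, 2))` gives a value at rounding-error level, which is consistent with every stage having the same topology.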
B. GWP Factorization
The fixed topology factorization of the GWP matrix has the form
$$W_{N,WP} = \prod_{i=0}^{n-1} J_i C_N \tag{5.10}$$
where
$$J_i = \left( I_{p^{n-i-1}} \times P_{p^{i+1}} \right) = H_{n-i-1} \tag{5.11}$$
Letting
$$Q_i = C_N J_{i+1} = C_N H_{n-i-2}; \quad i = 0, 1, \ldots, n-2; \qquad Q_{n-1} = C_N \tag{5.12}$$
we obtain
$$W_{N,WP} = \prod_{i=0}^{n-1} Q_i, \tag{5.13}$$
where each matrix $Q_i$; $i = 0, 1, \ldots, n-2$, is $p^2$-optimal, while $Q_{n-1}$ is p-optimal.
C. GWK Factorization
The fixed topology GWK factorization has the form
$$W_{N,WK} = P \left\{ \prod_{i=0}^{n-1} P^{-1} H_i C_N E_i \right\} P^{-1}. \tag{5.14}$$
Letting
$$H_i = I_{p^i} \times P_{p^{n-i}}, \quad E_i = I_{p^i} \times D'_{p^{n-i}} \tag{5.15}$$
$$D'_{p^n} = \mathrm{quasidiag}\left( I_{p^{n-1}}, D_{p^{n-1}}, D^2_{p^{n-1}}, \ldots, D^{(p-1)}_{p^{n-1}} \right) \tag{5.16}$$
$$D^i_{p^{n-1}} = D^i_p \times I_{p^{n-2}}$$
$$D_p = \mathrm{diag}\left( w^0, w^{-1}, w^{-2}, \ldots, w^{-(p-1)} \right). \tag{5.17}$$
we have
$$W_{N,WK} = P \left\{ \prod_{i=0}^{n-1} P^{-1} H_i G_i \right\} P^{-1}, \tag{5.18}$$
where
$$G_i = C_N E_i. \tag{5.19}$$
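Since $D'_{p^n}$ and $E_i$ are diagonal, it is enough to store their diagonals. The Python sketch below builds them from (5.15) to (5.17); the function names `twiddle_diagonal` and `e_diagonal` are hypothetical. The sketch is restricted to sizes $p^m$ with m ≥ 2, because (5.16) as written refers to $I_{p^{m-2}}$, so the m = 1 case is deliberately left out.

```python
import cmath

def twiddle_diagonal(p, m):
    """Diagonal of D'_{p^m} per (5.16)-(5.17), for m >= 2.

    Block j (j = 0..p-1) is (D_p)^j x I_{p^(m-2)}, where
    D_p = diag(w^0, w^-1, ..., w^-(p-1)) and w = exp(-j*2*pi/p).
    """
    if m < 2:
        raise ValueError("this sketch only covers m >= 2")
    w = cmath.exp(-2j * cmath.pi / p)
    inner = p ** (m - 2)
    diag = []
    for j in range(p):
        for d in range(p):
            # d-th diagonal entry of (D_p)^j is w^(-j*d), repeated `inner` times
            diag.extend([w ** (-j * d)] * inner)
    return diag

def e_diagonal(p, n, i):
    """Diagonal of E_i = I_{p^i} x D'_{p^(n-i)} from (5.15), for n - i >= 2."""
    return twiddle_diagonal(p, n - i) * (p ** i)
```

For p = 2 and m = 2 the diagonal is, up to rounding, [1, 1, 1, -1]; the first block of $p^{m-1}$ entries is always the identity block.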
Letting
$$S_i = P^{-1} H_i P = \left( I_{p^{i-1}} \times P_{p^{n-i}} \times I_p \right) \tag{5.20}$$
we have
$$W_{N,WK} = P^2 \left\{ \prod_{i=0}^{n-1} P^{-1} G_i S_{i+1} \right\} P^{-1} \tag{5.21}$$
with
$$S_{n-1} = S_n = I_N. \tag{5.22}$$
The factorization can also be re-written in the form
$$W_{N,WK} = P \left\{ \prod_{i=0}^{n-1} \Gamma_i \right\} P^{-1}, \tag{5.23}$$
where
$$\begin{aligned} \Gamma_i &= P^{-1} G_i S_{i+1} \\ &= P^{-1} G_i \left( I_{p^i} \times P_{p^{n-i-1}} \times I_p \right), \quad i = 1, 2, \ldots, n-1; \\ \Gamma_0 &= G_0 S_1. \end{aligned} \tag{5.24}$$
The matrices $\Gamma_i$ are $p^2$-optimal, except for $\Gamma_0$ which is maximal span. These are therefore optimal algorithms which can be implemented by an optimal parallel processor, recirculant or pipelined, with no shuffling cycle called for during any of the n iterations.
The potential of the optimal algorithms for enhanced processing speed is all the more evident in the context of real-time image processing applications. For 2D signals, algorithms of generalized spectral analysis can be applied on sub-images or on successive column-row vectors of the input image. Factorizations of the algorithms of the Chrestenson transform applied on an N×N-point matrix X representing an image, with $N = p^n$, can be written for the different transform matrices. The GWN 2D transformation for optimal pipelined architecture can be written in the form
$$\begin{aligned} Y_{nat} &= P \left\{ \prod_{i=0}^{n-1} F \right\} P^{-1} \times \left[ P \left\{ \prod_{i=0}^{n-1} F \right\} P^{-1} \right]^T \\ &= P \left\{ \prod_{i=0}^{n-1} F \right\} P^{-1} \times P \left\{ \prod_{i=0}^{n-1} F \right\} P^{-1}, \end{aligned} \tag{6.1}$$
where $T$ stands for transpose. The GWP factorization can be written in the form
$$\begin{aligned} Y_{WP} &= \prod_{i=0}^{n-1} Q_i \times \left(\prod_{i=0}^{n-1} Q_i\right)^{T} \\ &= \prod_{i=0}^{n-1} Q_i \times \prod_{i=0}^{n-1} Q_{n-i-1}^{T}, \end{aligned} \tag{6.2}$$
$$Q_i^{T} = C_N\left(I_{p^{n-i-1}} \times P_{p^{i+1}}^{-1}\right). \tag{6.3}$$
The GWK factorization for optimal pipelined architecture can be written in the form
$$\begin{aligned} Y_{WK} &= P^{2}\left\{\prod_{i=0}^{n-1} \Gamma_i\right\}P \times \left[P^{2}\left\{\prod_{i=0}^{n-1} \Gamma_i\right\}P\right]^{T} \\ &= P^{2}\left\{\prod_{i=0}^{n-1} \Gamma_i\right\}P \times P^{-1}\left\{\prod_{i=0}^{n-1} \Gamma_{n-i-1}^{T}\right\}P^{-2}, \end{aligned} \tag{6.4}$$
$$\Gamma_i^{T} = \left(I_{p^{i}} \times P_{p^{n-i-1}}^{-1} \times I_p\right) G_i^{-1} P. \tag{6.5}$$
These fast algorithms are all $p^2$-optimal, requiring no shuffling between iterations of a pipelined processor. In applying these factorizations, the successive iterations are effected on successive sub-images such that after $\log_p N$ stages the transform image $Y$ is pipelined at the processor output. Applications include real-time processing of video signals.
The Fourier transform is but a special case of the Chrestenson Generalized Walsh transform. The Fourier matrix for $N$ points is the matrix $F_N$ defined above in (1) with $p$ replaced by $N$:
$$F_N = \begin{bmatrix} w^{0} & w^{0} & \cdots & w^{0} \\ w^{0} & w^{1} & \cdots & w^{N-1} \\ w^{0} & w^{2} & \cdots & w^{2(N-1)} \\ \vdots & \vdots & & \vdots \\ w^{0} & w^{N-1} & \cdots & w^{(N-1)^{2}} \end{bmatrix} \tag{6.6}$$
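The structure of (6.6) can be illustrated numerically. The following is a minimal sketch, not part of the original description: the function name `fourier_matrix` is hypothetical, and the value $w = e^{-j2\pi/N}$ is an assumption, since the definition of $w$ is given in an earlier part of the text that is not reproduced in this section.

```python
import numpy as np

def fourier_matrix(N: int) -> np.ndarray:
    """Return the N x N matrix whose (r, c) entry is w**(r*c), as in (6.6)."""
    # Assumption: w = exp(-2j*pi/N). The section relies on an earlier
    # definition of w; the opposite sign convention works the same way.
    w = np.exp(-2j * np.pi / N)
    r, c = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return w ** (r * c)

F = fourier_matrix(8)
# Rows of F are mutually orthogonal: F times its conjugate transpose is N*I.
is_orthogonal = np.allclose(F @ F.conj().T, 8 * np.eye(8))
```

Under the stated assumption, the first row and first column consist of $w^0 = 1$ and the last entry is $w^{(N-1)^2}$, matching the corner elements of (6.6).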
For images the factorization leads to the optimal form
$$Y_F = \left\{\prod_{i=0}^{n-1} F_i\right\} \times \left\{\prod_{k=0}^{n-1} F_{n-k-1}\right\} \tag{6.7}$$
and for unidimensional signals the corresponding form for the Fourier matrix is
$$F_N = \prod_{i=0}^{n-1} (F_i) \tag{6.8}$$
$$F_i = U_i C_i$$
$$C_i = C J_{i+1}; \quad i = 0, 1, \ldots, n-1$$
$$C_{n-1} = C \tag{6.9}$$
$$U_1 = I_N$$
$$U_i = I_{p^{n-i-1}} \times D_{p^{i+1}} = I_{p^{n-i-1}} \times D_{N/p^{n-i-1}}$$
$$D_{N/m} = \mathrm{diag}\left(I_{N/(pm)}, K_m, K_{2m}, \ldots, K_{(p-1)m}\right)$$
$$K_t = \mathrm{diag}\left(w^{0}, w^{t}, \ldots, w^{[N/(mp)-1]t}\right) \tag{6.10}$$
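The diagonal matrices of (6.10) can be assembled directly from their definitions. The sketch below is illustrative only: the function names `K`, `D` and `U` are hypothetical, $w = e^{-j2\pi/N}$ is assumed, and the symbol $\times$ between matrices is assumed to denote the Kronecker product.

```python
import numpy as np

def K(t: int, size: int, N: int) -> np.ndarray:
    """K_t = diag(w^0, w^t, ..., w^((size-1)*t)), with size = N/(m*p)."""
    w = np.exp(-2j * np.pi / N)  # assumed definition of w
    return np.diag(w ** (t * np.arange(size)))

def D(N: int, p: int, m: int) -> np.ndarray:
    """D_{N/m} = diag(I_{N/(pm)}, K_m, K_{2m}, ..., K_{(p-1)m})."""
    size = N // (p * m)
    blocks = [np.eye(size, dtype=complex)]
    blocks += [K(s * m, size, N) for s in range(1, p)]
    out = np.zeros((N // m, N // m), dtype=complex)
    for b, blk in enumerate(blocks):
        out[b * size:(b + 1) * size, b * size:(b + 1) * size] = blk
    return out

def U(i: int, n: int, p: int) -> np.ndarray:
    """U_i = I_{p^(n-i-1)} x D_{p^(i+1)}, taking x as the Kronecker product."""
    N = p ** n
    m = p ** (n - i - 1)          # so that N/m = p^(i+1)
    return np.kron(np.eye(p ** (n - i - 1)), D(N, p, m))
```

For example, `D(8, 2, 1)` is the 8 x 8 diagonal matrix whose upper half is $I_4$ and whose lower half is $\mathrm{diag}(w^0, w^1, w^2, w^3)$, and each `U(i, n, p)` is an $N \times N$ diagonal matrix with unit-modulus entries.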
The hypercube transformations approach is illustrated using the important matrices of the Chrestenson Generalized Walsh-Paley (CGWP), Generalized Walsh-Kaczmarz (CGWK) and Fourier transforms.
We note that the matrices $C_i$ in the Fourier transform expansion are closely related to the matrices $J_i$ and $H_i$ in the Chrestenson Generalized Walsh-Paley factorization. In fact the following relations are readily established:
$$C_N \triangleq C$$
$$C_i = C J_{i+1} = C H_{n-i-2} = Q_i \tag{7.1}$$
$$Q_{n-1} = C_{n-1} = C \tag{7.2}$$
Therefore, the CGWP matrices Qi are the same as the Ci matrices and have the same structure as the Fi matrices in the Fourier matrix factorization. Writing
$$B_k = C H_k \tag{7.3}$$
$$H_k = I_{p^{k}} \times P_{p^{n-k}} \tag{7.4}$$
the post-multiplication by $H_k$ has the effect of permuting the columns of $C$ so that at row $w$,
$$w \cong (0\, j_{n-2} \ldots j_1 j_0) \tag{7.5}$$
the pilot element is at column $z$ as determined by the permutation $H_k$, that is,
$$z \cong (j_k\, 0\, j_{n-2} \ldots j_{k+1} j_{k-1} \ldots j_1 j_0) \tag{7.6}$$
with the special case $k = n-2$ producing
$$z \cong (j_{n-2}\, 0\, j_{n-3} \ldots j_1 j_0) \tag{7.7}$$
and that of $k = n-1$ yielding
$$z \cong (0\, j_{n-2} \ldots j_1 j_0) \tag{7.8}$$
Alternatively, we can write $z$ directly as a function of $w$ by using previously developed expressions of permutation matrices in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. For example,
$$B_0 = C H_0 = C P \tag{7.9}$$
and using the expression defining $P$, namely,
$$\left[P_{p^{n}}^{k}\right]_{uv} = \begin{cases} 1; & u = 0, 1, \ldots, p^{n}-1; \quad v = \left[u + (u \bmod p^{k})(p^{n}-1)\right]/p^{k} \\ 0; & \text{otherwise;} \end{cases} \qquad k = 0, 1, \ldots, N-1, \tag{7.10}$$
with $k = 1$, we can write
$$z = \left[w + (w \bmod p)(p^{n}-1)\right]/p \tag{7.11}$$
a relation that defines the pilot elements matrix.
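The closed form (7.11) can be checked against the digit description (7.6) with $k = 0$. The following sketch is illustrative and not part of the original description; the helper names are hypothetical, and $w$ is restricted to values whose digit of weight $n-1$ is zero, as in (7.5).

```python
def to_digits(x: int, p: int, n: int) -> list:
    """Base-p digits of x, most significant first: (j_{n-1}, ..., j_0)."""
    return [(x // p**i) % p for i in range(n - 1, -1, -1)]

def from_digits(digits: list, p: int) -> int:
    """Inverse of to_digits."""
    value = 0
    for d in digits:
        value = value * p + d
    return value

def z_from_digits(w: int, p: int, n: int, k: int) -> int:
    """Digit form (7.6): move digit j_k of w to the leading position."""
    d = to_digits(w, p, n)            # d[0] is the zero digit of weight n-1
    pos = n - 1 - k                   # list index of j_k
    rest = d[:pos] + d[pos + 1:]      # (0, j_{n-2}, ..., j_{k+1}, j_{k-1}, ..., j_0)
    return from_digits([d[pos]] + rest, p)

def z_closed_form(w: int, p: int, n: int) -> int:
    """Closed form (7.11): z = [w + (w mod p)(p^n - 1)] / p."""
    return (w + (w % p) * (p**n - 1)) // p

# Compare both descriptions for every w with a zero leading digit.
p, n = 3, 4
agree = all(z_from_digits(w, p, n, 0) == z_closed_form(w, p, n)
            for w in range(p ** (n - 1)))
```

As a small worked case with $p = 2$, $n = 3$ and $w = 3 \cong (0\,1\,1)$, both forms give $z = 5 \cong (1\,0\,1)$. With $k = n-1$ the digit form returns $z = w$, in agreement with (7.8).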
Similarly,
$$B_1 = C H_1 = C\left(I_p \times P_{p^{n-1}}\right) \tag{7.12}$$
and from the definition given in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis", Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459:
$$\left[P_i^{t}\right]_{uv} = \begin{cases} 1; & u = 0, 1, \ldots, p^{n}-1; \\ & v = p^{\,i - t \bmod (n-i)}\left[p^{-i}\left(u - u \bmod p^{i}\right) + \left\{\left[p^{-i}\left(u - u \bmod p^{i}\right)\right] \bmod p^{\,t \bmod (n-i)}\right\}\left(p^{n-i}-1\right)\right] + u \bmod p^{i}; \\ 0; & \text{otherwise;} \end{cases} \tag{7.13}$$
with $i = 1$ and $t = 1$ we have
$$z = \left[p^{-1}(w - w \bmod p) + \left\{\left[p^{-1}(w - w \bmod p)\right] \bmod p\right\}\left(p^{n-1}-1\right)\right] + w \bmod p. \tag{7.14}$$
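Relation (7.14) can be compared with the digit form (7.6) for $k = 1$, which maps $(0\, j_{n-2} \ldots j_2 j_1 j_0)$ to $(j_1\, 0\, j_{n-2} \ldots j_2 j_0)$. The sketch below is an illustration under that reading; the function names are hypothetical and $n \ge 2$ is assumed.

```python
def z_closed_form_k1(w: int, p: int, n: int) -> int:
    """Closed form (7.14), i.e. (7.13) with i = 1 and t = 1."""
    a = (w - w % p) // p                      # p^{-1}(w - w mod p)
    return (a + (a % p) * (p ** (n - 1) - 1)) + w % p

def z_digits_k1(w: int, p: int, n: int) -> int:
    """Digit form (7.6) with k = 1: j_1 moves to the leading position."""
    d = [(w // p**i) % p for i in range(n - 1, -1, -1)]   # (0, j_{n-2}, ..., j_0)
    permuted = [d[n - 2]] + d[:n - 2] + [d[n - 1]]        # (j_1, 0, ..., j_2, j_0)
    z = 0
    for digit in permuted:
        z = z * p + digit
    return z

# Compare both descriptions for every w with a zero leading digit.
p, n = 3, 3
agree = all(z_closed_form_k1(w, p, n) == z_digits_k1(w, p, n)
            for w in range(p ** (n - 1)))
```

For instance, with $p = 3$, $n = 3$ and $w = 5 \cong (0\,1\,2)$, both forms give $z = 11 \cong (1\,0\,2)$: the least significant digit $j_0$ stays in place while the remaining digits are shuffled.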
Consider the permutation matrix
$$R_N = R_{p^{n}} = I_{p^{m}} \times P_{p^{j}} \times I_{p^{k}} \tag{7.15}$$
Let the base-$p$ hypercube describing the order in a vector $x$ of $N = p^{n}$ elements be represented as the $n$-tuple
$$x \cong (j_{n-1} \ldots j_1 j_0)_p; \quad j_i \in \{0, 1, \ldots, p-1\} \tag{7.16}$$
The application of the matrix $R_{p^{n}}$ on the $n$-tuple vector $x$ results in the $n$-tuple:
$$v = (j_{n-1} \ldots j_{n-k+1}\, j_{n-k}\, j_m\, j_{n-k-1} \ldots j_{m+2}\, j_{m+1}\, j_{m-1} \ldots j_1 j_0) \tag{7.17}$$
We note that with respect to x the left k digits and the right m digits are left unchanged while the remaining digits are rotated using a circular shift of one digit to the right.
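The digit rearrangement of (7.17) can be written as a short routine acting on the $n$-tuple itself. This is a minimal sketch with a hypothetical function name `apply_R`; it works on digit labels rather than on the permutation matrix, and follows the description above: the left $k$ digits and the right $m$ digits stay fixed and the middle block is rotated one place to the right.

```python
def apply_R(digits: tuple, m: int, k: int) -> tuple:
    """Apply the digit rearrangement of (7.17) to (j_{n-1}, ..., j_0)."""
    n = len(digits)
    left = digits[:k]                 # j_{n-1} ... j_{n-k}, unchanged
    middle = digits[k:n - m]          # j_{n-k-1} ... j_m, to be rotated
    right = digits[n - m:]            # j_{m-1} ... j_0, unchanged
    rotated = middle[-1:] + middle[:-1]   # circular shift one digit to the right
    return left + rotated + right

x = ("j5", "j4", "j3", "j2", "j1", "j0")   # n = 6
v = apply_R(x, m=1, k=2)
# v == ("j5", "j4", "j1", "j3", "j2", "j0")
```

In this example $n = 6$, $m = 1$ and $k = 2$, so the middle block $(j_3\, j_2\, j_1)$ becomes $(j_1\, j_3\, j_2)$, with $j_m = j_1$ brought to the front of the block as in (7.17).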
The pilot-elements matrix $\beta_k$ corresponding to the matrix $B_k$ is obtained by restricting the values of $w$ (and hence the corresponding $z$ values) to $w = 0, 1, \ldots, p^{n-1}-1$.
Moreover, we note that if we write
$$L_i = P^{-1} G_i = P^{n-1} G_i \tag{7.18}$$
and note that $G_i$ is similar in structure to $C_N$, we have
$$z = \left[w + (w \bmod p^{k})(p^{n}-1)\right]/p^{k} \tag{7.19}$$
with $k = n-1$.
To obtain the pilot elements matrix $\lambda_i$ corresponding to $L_i$ we write
$$z' = z \bmod p^{n-1} \tag{7.20}$$
in order to reveal all satellite elements accompanying each pilot element. We then eliminate all the repeated entries in $z'$ and the corresponding $w$ values, retaining only pilot element positions. Alternatively, we simply force to zero the digit of weight $n-2$ in $w$ and that of weight $n-1$ in $z$.
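The two procedures just described can be compared numerically. The sketch below is an interpretation rather than the original implementation: the function names are hypothetical, and "eliminating repeated entries" is assumed to mean keeping the first occurrence of each $z'$ while $w$ is scanned in ascending order.

```python
def z_from_w(w: int, p: int, n: int, k: int) -> int:
    """Closed form (7.19): z = [w + (w mod p^k)(p^n - 1)] / p^k."""
    return (w + (w % p**k) * (p**n - 1)) // p**k

def pilots_by_dedup(p: int, n: int) -> list:
    """Reduce z modulo p^(n-1) as in (7.20) and keep first occurrences only."""
    seen, pairs = set(), []
    for w in range(p ** n):
        z_prime = z_from_w(w, p, n, n - 1) % p ** (n - 1)
        if z_prime not in seen:       # assumption: first occurrence is the pilot
            seen.add(z_prime)
            pairs.append((w, z_prime))
    return pairs

def pilots_by_zeroing(p: int, n: int) -> list:
    """Keep only w whose digit of weight n-2 is zero; clear digit n-1 of z."""
    pairs = []
    for w in range(p ** n):
        if (w // p ** (n - 2)) % p == 0:
            z = z_from_w(w, p, n, n - 1)
            pairs.append((w, z % p ** (n - 1)))
    return pairs

same = pilots_by_dedup(2, 3) == pilots_by_zeroing(2, 3)
```

For $p = 2$ and $n = 3$ both routines return the pairs $(w, z')$ = (0, 0), (1, 2), (4, 1), (5, 3), that is, $p^{n-1}$ pilot positions, under the stated assumption about which repeated entry is retained.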
We presently focus our attention on the matrices
$$B_k = C H_k; \quad k = 0, 1, \ldots, n-1 \tag{8.1}$$
In evaluating the pilot element coordinates we begin by setting the number of processors $M = 1$. The corresponding $w$-$z$ relation of the pilot elements is thus evaluated with $m = 0$. Once this relation has been established, it is subsequently used as the reference "w-z conversion template" to produce the pilot element positions for a general number of $M = p^{m}$ processors. A "right" scan is applied to the matrix in order to produce the $w$-$z$ template with an ascending order of $w$. In this type of scan the algorithm advances the first index $w$ from zero, selecting pilot elements by evaluating their displacement to the right as the second index $z$. Once the template has been evaluated, the value $m$ corresponding to the number of processors to be dispatched is used to perform successive $p$-ary divisions in proportion to $m$ to assign the $M$ processors with maximum spacing, leading to maximum possible lengths of memory queues. A "down" scan is subsequently applied, where $p$-ary divisions are applied successively while proceeding downward along the matrix columns, followed by a selection of the desired optimal scan.
The template evaluation and subsequent $p$-ary divisions for the assignment of the $M$ processors through a right type scan produce the following hypercube assignments. The assignments are, as expected, functions of the four variables $n$, $p$, $k$ and $m$. The conditions of validity of the different assignments are denoted by numbers and letters for subsequent referencing. With $K$ denoting the main clock, the following hypercube transformations are obtained:
$$K \cong (j_{n-1} \ldots j_{m+1}\, j_m\, i_{m-1} \ldots i_1 i_0)_p$$
$$K_{\overline{n-1}} \cong (0\, j_{n-2} \ldots j_{m+1}\, j_m\, i_{m-1} \ldots i_1 i_0)_p$$
$$K_{\overline{n-2}} \cong (j_{n-1}\, 0\, j_{n-3} \ldots j_{m+1}\, j_m\, i_{m-1} \ldots i_1 i_0)_p \tag{8.2}$$
I. $k < n-2$
(1) x: $m = 0$
$$w \cong K_{\overline{n-1}} \tag{8.3}$$
$$z \cong \left[\left(I_{p^{k}} \times P_{p^{n-k}}\right) K\right]_{\overline{n-2}} \tag{8.4}$$
(2) y: $1 \le m \le n-k-2$
$$w \simeq \left[\left(P_{p^{k+1}} \times I_{p^{n-k-1}}\right) \prod_{t=1}^{m-1}\left(I_{p^{t}} \times P_{p^{n-t-1}} \times I_p\right) K\right]_{\overline{n-1}} \tag{8.5}$$
$$z \simeq \left[P_{p^{n}} \prod_{t=1}^{m-1}\left(I_{p^{t}} \times P_{p^{n-t-1}} \times I_p\right) K\right]_{\overline{n-2}} \tag{8.6}$$
(3) z: nxe2x88x92kxe2x88x921xe2x89xa6mxe2x89xa6nxe2x88x921                    w        ≃                              [                                          (                                                      P                                          p                                              k                        +                        1                                                                              xc3x97                                      I                                          p                                              n                        -                        k                        -                        1                                                                                            )                            ⁢                                                ∏                                      t                    =                    1                                                        m                    -                    1                                                  ⁢                                                      (                                                                  I                                                  p                          t                                                                    xc3x97                                              P                                                  p                                                      n                            -                            t                            -                            1                                                                                              xc3x97                                              I                        p                                                              )                                    ⁢                  K                                                      ]                                       
       n              -              1                        _                                              (8.7)                                z        ≃                              [                                          P                                  p                  n                                            ⁢                                                ∏                                      t                    =                    1                                                        m                    -                    1                                                  ⁢                                                      (                                                                  I                                                  p                          t                                                                    xc3x97                                              P                                                  p                                                      n                            -                            t                            -                            1                                                                                              xc3x97                                              I                        p                                                              )                                    ⁢                  K                                                      ]                                              n              -              2                        _                                              (8.8)            
II. $k = n-2$
(1) u: $m = 0$
$w \simeq K_{\overline{n-1}}$
$z \simeq \left[ \left( I_{p^{n-2}} \times P_{p^2} \right) K \right]_{\overline{n-2}}$  (8.9)
(2) v: $m \ge 1$
$w \simeq \left[ \prod_{t=0}^{m-1} \left( I_{p^t} \times P_{p^{n-t-1}} \times I_p \right) K \right]_{\overline{n-1}}$  (8.10)
$z \simeq \left[ P_{p^n} \prod_{t=1}^{m-1} \left( I_{p^t} \times P_{p^{n-t-1}} \times I_p \right) K \right]_{\overline{n-2}}$  (8.11)
t: $k = n-1$
$w = z \simeq \left[ \prod_{t=0}^{m-1} \left( I_{p^t} \times P_{p^{n-t-1}} \times I_p \right) K \right]_{\overline{n-1}}$  (8.12)
Evaluated, these hypercubes yield the following pilot element assignments:
x: $k < n-2$, $m = 0$
$w = \sum_{t=0}^{n-2} p^t j_t$  (8.13)
$z = \sum_{t=0}^{k-1} p^t j_t + p^{n-1} j_k + \sum_{t=k+1}^{n-2} p^{t-1} j_t$  (8.14)
y: $k < n-2$, $1 \le m \le n-k-2$
$w = p^k i_0 + \sum_{s=1}^{m-1} p^{n-1-s} i_s + \sum_{t=m}^{m+k-1} p^{t-m} j_t + \sum_{t=m+k}^{n-2} p^{t-m+1} j_t$  (8.15)
$z = p^{n-1} i_0 + \sum_{s=1}^{m-1} p^{n-2-s} i_s + \sum_{t=m}^{n-2} p^{t-m} j_t$  (8.16)
z: $k < n-2$, $n-k-1 \le m \le n-1$
$w = p^k i_0 + \sum_{s=1}^{n-k-2} p^{n-1-s} i_s + \sum_{s=n-k-1}^{m-1} p^{n-2-s} i_s + \sum_{t=m}^{n-2} p^{t-m} j_t$  (8.17)
$z = p^{n-1} i_0 + \sum_{s=1}^{m-1} p^{n-2-s} i_s + \sum_{t=m}^{n-2} p^{t-m} j_t$  (8.18)
u: $k = n-2$, $m = 0$
$w = \sum_{t=0}^{n-2} p^t j_t$  (8.19)
$z = \sum_{t=0}^{n-3} p^t j_t + p^{n-1} j_{n-2}$  (8.20)
v: $k = n-2$, $m \ge 1$
$w = \sum_{s=0}^{m-1} p^{k-s} i_s + \sum_{t=m}^{n-2} p^{t-m} j_t$  (8.21)
$z = p^{n-1} i_0 + \sum_{s=1}^{m-1} p^{k-s} i_s + \sum_{t=m}^{n-2} p^{t-m} j_t$  (8.22)
t: $k = n-1$
$w = z = \sum_{s=0}^{m-1} p^{n-2-s} i_s + \sum_{t=m}^{n-2} p^{t-m} j_t$  (8.23)
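As an illustration, the case-y expressions (8.15) and (8.16) can be evaluated numerically. The following is a minimal Python sketch, not part of the original text: the function name `pilot_y` and the way the digits are passed (a list `i` indexed by s and a mapping `j` indexed by t) are assumptions made for the example.

```python
def pilot_y(n, p, k, m, i, j):
    """Evaluate (w, z) from (8.15) and (8.16) for case y.

    i: sequence with i[s] for s = 0 .. m-1   (hypothetical input layout)
    j: mapping with j[t] for t = m .. n-2    (hypothetical input layout)
    """
    if not (k < n - 2 and 1 <= m <= n - k - 2):
        raise ValueError("case y requires k < n-2 and 1 <= m <= n-k-2")

    # (8.15): w = p^k i_0 + sum p^(n-1-s) i_s + sum p^(t-m) j_t + sum p^(t-m+1) j_t
    w = p**k * i[0]
    w += sum(p**(n - 1 - s) * i[s] for s in range(1, m))
    w += sum(p**(t - m) * j[t] for t in range(m, m + k))
    w += sum(p**(t - m + 1) * j[t] for t in range(m + k, n - 1))

    # (8.16): z = p^(n-1) i_0 + sum p^(n-2-s) i_s + sum p^(t-m) j_t
    z = p**(n - 1) * i[0]
    z += sum(p**(n - 2 - s) * i[s] for s in range(1, m))
    z += sum(p**(t - m) * j[t] for t in range(m, n - 1))
    return w, z


# n = 4, p = 2, k = 1, m = 1: w = 2*i0 + j1 + 4*j2 and z = 8*i0 + j1 + 2*j2
print(pilot_y(4, 2, 1, 1, [1], {1: 1, 2: 1}))  # (7, 11)
```

For these small parameters the sums collapse to three terms each, which makes the weight carried by each digit easy to check by hand.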
A processor is considered optimal if it requires a minimum number of memory partitions, is shuffle-free (no clock cycles are used solely for shuffling), and produces an ordered output given an ordered input; see "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis," Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. We have seen in that reference that $p^2$-optimal algorithms and processors lead to a minimum of $p^2$ partitions, each with a queue length of $N/p^2$. With $M = p^m$ base-$p$ processors operating in parallel, the number of partitions increases to $p^{m+2}$ and the queue length of each partition is reduced to $N/p^{m+2}$.
An optimal multiprocessing algorithm should satisfy such optimality constraints. The horizontal spacing between simultaneously accessed pilot elements defines the input memory queue length. The vertical spacing defines the output memory queue length. With M processors applied in parallel, the horizontal spacing between the accessed elements will be referred to as the "input pitch" and the vertical spacing as the "output pitch".
By choosing the pilot elements that lead to the maximum possible pitch, where the pitch of a scan is the smaller of its minimum input pitch and its minimum output pitch, optimality in the form of a queue length of $N/p^{m+2}$ is achieved.
We note that the optimal minimum memory queue length (MMQL) satisfies
$\mathrm{MMQL} = \begin{cases} p^{n-m-2}; & m \le n-2 \\ 1; & m = n-1 \end{cases}$
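The relation between the partition count, the queue length and the MMQL can be checked with a few lines of Python. This is only a sketch; the function names `mmql` and `partition_layout` are hypothetical and chosen for the example.

```python
def mmql(n, p, m):
    """Minimum memory queue length: p^(n-m-2) for m <= n-2, and 1 for m = n-1."""
    if not 0 <= m <= n - 1:
        raise ValueError("m must satisfy 0 <= m <= n-1")
    return p**(n - m - 2) if m <= n - 2 else 1


def partition_layout(n, p, m):
    """Partitions and queue length for M = p^m processors (valid for m <= n-2)."""
    N = p**n
    partitions = p**(m + 2)
    return partitions, N // partitions


# N = 2^4 = 16 points with M = 2 processors: 8 partitions of queue length 2
print(partition_layout(4, 2, 1))  # (8, 2)
print(mmql(4, 2, 1))              # 2
```

For $m \le n-2$ the queue length $N/p^{m+2}$ returned by `partition_layout` coincides with the MMQL, which is the optimality condition stated above.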
The following algorithm, Algorithm 2, describes this approach to state assignment optimality.
Algorithm 2: Optimality search
begin
Extract pilots matrix
Apply right scan
Evaluate input pitch
Evaluate output pitch
p_{i,min} = min[input pitch]
p_{o,min} = min[output pitch]
p_{r,min} = min[p_{i,min}, p_{o,min}]
Apply down scan
Evaluate input pitch
Evaluate output pitch
p_{i,min} = min[input pitch]
p_{o,min} = min[output pitch]
p_{d,min} = min[p_{i,min}, p_{o,min}]
Optimal pitch = max[p_{d,min}, p_{r,min}]
If p_{r,min} ≥ p_{d,min} then optimal = right scan Else optimal = down scan
Apply hypercube transformations
Dispatch and sequence M processors
end
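The selection step of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch only: it assumes that the input and output pitches of each scan have already been evaluated and are passed in as lists, and it leaves out the extraction of the pilots matrix, the hypercube transformations and the dispatch step. The function name `optimality_search` is hypothetical.

```python
def optimality_search(right_input, right_output, down_input, down_output):
    """Choose between right scan and down scan from their evaluated pitches."""
    # Smallest pitch seen in each scan (input and output sides combined).
    pr_min = min(min(right_input), min(right_output))
    pd_min = min(min(down_input), min(down_output))

    # The optimal pitch is the larger of the two minima; ties go to right scan.
    optimal_pitch = max(pd_min, pr_min)
    scan = "right" if pr_min >= pd_min else "down"
    return scan, optimal_pitch


# Hypothetical pitch values: the right scan is limited by an input pitch of 2,
# the down scan by an output pitch of 4, so the down scan is selected.
print(optimality_search([2, 16], [32, 8], [16, 8], [8, 4]))  # ('down', 4)
```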
In following the algorithm, we note that under validity condition y of the $B_k$ matrix, y: $1 \le m \le n-k-2$, the results obtained are such that the digit $i_0$ of w has a weight of $p^k$. Hence the input pitch is $p^k$, while the output pitch, which can be deduced from the position of $i_0$ in z, is $p^{n-1}$, that is, the maximum possible. The input pitch is thus a function of k and can be low if k is small. By performing a down scan of $B_k$ we obtain the following solution:
$k < n-2$
y: $1 \le m \le n-k-2$
w: $0 \; i_0 \; i_1 \ldots i_{m-1} \; j_{n-2} \ldots j_{m+1} \; j_m$
z: $j_{m+k} \; 0 \; i_0 \; i_1 \ldots i_{m-1} \; j_{n-2} \ldots j_{m+k+1} \; j_{m+k-1} \ldots j_{m+1} \; j_m$
where now it is $i_{m-1}$ that leads to a minimum pitch, and it has a weight of $p^{n-m-1}$ in w and $p^{n-m-2}$ in z. We deduce that the minimum pitch in this solution is $p^{n-m-2}$, which is the optimum sought. The same reasoning leads to the optimal assignment for the case
$k < n-2$,
z: $n-k-1 \le m \le n-1$
w: $0 \; i_0 \; i_1 \ldots i_{m-1} \; j_{n-2} \ldots j_{m+1} \; j_m$
z: $i_{n-2-k} \; 0 \; i_0 \; i_1 \ldots i_{n-3-k} \; i_{n-1-k} \; i_{n-k} \ldots i_{m-1} \; j_{n-2} \ldots j_{m+1} \; j_m$
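The digit weights quoted for the case-y down-scan solution can be checked mechanically by building the two templates as lists of digit labels, most significant digit first. The sketch below is illustrative; the helper names `down_scan_templates_y` and `weight_exponent` are hypothetical.

```python
def down_scan_templates_y(n, k, m):
    """Digit templates of w and z for case y (1 <= m <= n-k-2), MSD first."""
    if not (k < n - 2 and 1 <= m <= n - k - 2):
        raise ValueError("case y requires k < n-2 and 1 <= m <= n-k-2")
    i_digits = [f"i{s}" for s in range(m)]               # i_0 ... i_{m-1}
    j_desc = [f"j{t}" for t in range(n - 2, m - 1, -1)]  # j_{n-2} ... j_m
    moved = f"j{m + k}"                                  # digit moved to the front of z
    w = ["0"] + i_digits + j_desc
    z = [moved, "0"] + i_digits + [d for d in j_desc if d != moved]
    return w, z


def weight_exponent(template, label):
    """Exponent e such that the digit carries weight p^e in the template."""
    return len(template) - 1 - template.index(label)


n, k, m = 6, 1, 2
w, z = down_scan_templates_y(n, k, m)
print(w)  # ['0', 'i0', 'i1', 'j4', 'j3', 'j2']
print(z)  # ['j3', '0', 'i0', 'i1', 'j4', 'j2']
print(weight_exponent(w, "i1"), weight_exponent(z, "i1"))  # 3 2
```

With n = 6 and m = 2 the exponents 3 and 2 agree with $n-m-1$ and $n-m-2$ respectively.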
These are the only two cases of the matrix that need be thus modified for optimality. All results obtained above for the other validity conditions can be verified to be optimal.
In the above, the value of k is incremented from one iteration to the next. In each iteration, once the pilot element matrix coordinates (w, z) are determined as shown above, each processor accesses p elements spaced by the row span, starting with the pilot element, and writes its p outputs at addresses spaced by the column span. The row and column spans of a matrix are evaluated as shown in "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis," Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. In particular we note that the matrix
$B_k = C H_k$  (9.1)
has the same column span as that of C, namely $\sigma_c(B_k) = \sigma_c(C) = p^{n-1}$. The row span of $B_k$ is evaluated by noticing that $B_k$ has the same structure as C with its columns permuted in accordance with the order implied by
$H_k^{-1} = I_{p^k} \times P_{p^{n-k}}^{-1}$  (9.2)
The transformation of the hypercube $(i_{n-1} \ldots i_1 i_0)$ corresponding to $H_k^{-1}$ is one leading to a most significant digit equal to $i_{n-2}$. Since this digit changes value from 0 to 1 in a cycle length of $p^{n-2}$, we deduce that the row span of all the $B_k$ matrices is simply
$\sigma_R(B_k) = p^{n-2}$  (9.3)
Each processing element thus accesses p operands spaced $p^{n-2}$ points apart and writes its p outputs at points which are $p^{n-1}$ points apart.
The sampling matrices of the GWK factorization are more complex in structure than the other generalized spectral analysis matrices. They are defined by
$\Gamma_i = P^{-1} G_i S_{i+1}$  (11.1)
Let
$L_i \triangleq P^{-1} G_i$  (11.2)
we have
$\Gamma_i = L_i S_{i+1}$  (11.3)
We note that the sampling matrix $G_i$ has the same structure in poles and zeros, that is, in the positions of non-zero and zero elements respectively, as the matrix $C_N$; see "Optimal Parallel and Pipelined Processing Through a New Class of Matrices with Application to Generalized Spectral Analysis," Michael J. Corinthios, IEEE Trans. Comput., Vol. 43, April 1994, pp. 443-459. We can write for the matrix $G_i$
$w_{G_i} \simeq (j_{n-2} \ldots j_1 j_0)$
$z_{G_i} \simeq (j_{n-2} \ldots j_1 j_0)$  (11.4)
as the pilot element positions.
Given the definition of the matrix $L_i$, a hypercube rotation corresponding to the matrix $P^{-1}$ would yield the w and z values of $L_i$ as:
$w_{L_i} \simeq (j_{n-2} \, 0 \, j_{n-3} \ldots j_1 j_0)$
$z_{L_i} = P^{-1} w_{L_i} \simeq (0 \, j_{n-3} \ldots j_1 j_0 \, j_{n-2})$  (11.5)
Alternatively, a z-ordered counterpart can be written as:
$z_{L_i} \simeq (0 \, j_{n-2} \ldots j_1 j_0)$
$w_{L_i} \simeq (j_0 \, 0 \, j_{n-2} \ldots j_2 j_1)$  (11.6)
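Read as operations on digit strings, (11.5) is a one-place left circular rotation of the digits of w, and (11.6) is the matching right rotation of the z-ordered template. A minimal Python sketch with digit labels, most significant digit first, is given below; the helper names are hypothetical.

```python
def rotate_left(digits):
    """One-place left circular rotation, as P^{-1} acts on the digits in (11.5)."""
    return digits[1:] + digits[:1]


def rotate_right(digits):
    """One-place right circular rotation, the inverse of rotate_left."""
    return digits[-1:] + digits[:-1]


# n = 5: w of L_i is (j3 0 j2 j1 j0); rotating left gives z of (11.5).
w_L = ("j3", "0", "j2", "j1", "j0")
print(rotate_left(w_L))   # ('0', 'j2', 'j1', 'j0', 'j3')

# z-ordered counterpart (11.6): z is (0 j3 j2 j1 j0); rotating right gives w.
z_L = ("0", "j3", "j2", "j1", "j0")
print(rotate_right(z_L))  # ('j0', '0', 'j3', 'j2', 'j1')
```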
Similarly, the matrix $\Gamma_0 = G_0 S_1$, which is obtained from $G_0$ by permuting its columns according to the order dictated by
$S_1^{-1} = P_{p^{n-1}}^{-1} \times I_p$  (11.7)
leads to the m=0 template assignment
$w_{\Gamma_0} \simeq (0 \, j_{n-2} \ldots j_1 j_0)$  (11.8)
$z_{\Gamma_0} = S_1 w_{\Gamma_0} \simeq (0 \, j_0 \, j_{n-2} \ldots j_2 j_1)$  (11.9)
and a similar z-ordered state assignment counterpart.
For
$\Gamma_k = G_0 S_k$; $k > 0$  (11.10)
we have
$S_k^{-1} = I_{p^{k-1}} \times P_{p^{n-k}}^{-1} \times I_p$  (11.11)
which leads to the state template assignment
$w_{\Gamma_k} \simeq w_{L_i} \simeq (j_{n-2} \, 0 \, j_{n-3} \ldots j_1 j_0)$,
$z_{\Gamma_k} = S_{k+1} z_{L_i} \simeq (0 \, j_{k-1} \, j_{n-3} \ldots j_{k+1} \, j_k \, j_{k-2} \ldots j_1 j_0 \, j_{n-2})$; $k > 0$.  (11.12)
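The templates (11.9) and (11.12) are consistent with reading $S_k = I_{p^{k-1}} \times P_{p^{n-k}} \times I_p$ as follows: the most significant digit and the $k-1$ least significant digits stay in place, and the $n-k$ digits in between are rotated right by one place. That reading is an assumption inferred from the two templates, not a convention stated in the text. Under that assumption, a short Python sketch (the function name `apply_S` is hypothetical) reproduces both results:

```python
def apply_S(digits, k):
    """Apply S_k to a digit template (MSD first) under the inferred convention:
    keep the first digit and the last k-1 digits, rotate the middle n-k right."""
    n = len(digits)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n-1")
    head = digits[:1]
    middle = digits[1:n - k + 1]
    low = digits[n - k + 1:]
    return head + middle[-1:] + middle[:-1] + low


# (11.9) with n = 5: S_1 applied to (0 j3 j2 j1 j0)
print(apply_S(("0", "j3", "j2", "j1", "j0"), 1))
# ('0', 'j0', 'j3', 'j2', 'j1')

# (11.12) with n = 6, k = 2: S_3 applied to z of L_i = (0 j3 j2 j1 j0 j4)
print(apply_S(("0", "j3", "j2", "j1", "j0", "j4"), 3))
# ('0', 'j1', 'j3', 'j2', 'j0', 'j4')
```

The second result matches $(0 \, j_{k-1} \, j_{n-3} \ldots j_{k+1} \, j_k \, j_{k-2} \ldots j_0 \, j_{n-2})$ for n = 6 and k = 2.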
With m made variable a right scan yields the following expressions for the different validity conditions
1. $k = 0$

a: $k = 0$, $m = 0$

$$w \simeq K_{\underline{n-1}}$$

$$z \simeq P_{p^{n}} K_{\underline{n-1}} \equiv \left[ \left( P_{p^{n-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.13}$$

b: $k = 0$, $m \geq 2$

$$w \simeq \left[ \prod_{t=1}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.14}$$

$$z \simeq \left[ \prod_{t=0}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.15}$$

2. $1 \leq k \leq n-3$

c: $m = 0$

$$w \simeq \left[ \left( I_{p^{n-2}} \times P_{p^{2}} \right) K \right]_{\underline{n-2}} \tag{11.16}$$

$$z \simeq \left[ \left( I_{p^{k}} \times P_{p^{n-k-1}} \times I_{p} \right) \left( P_{p^{n-1}}^{-1} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.17}$$

d: $m = 1$

$$w \simeq \left[ \left( I_{p^{n-2}} \times P_{p^{2}} \right) \left( P_{p^{k}} \times I_{p^{n-k}} \right) K \right]_{\underline{n-2}} \tag{11.18}$$

$$z \simeq \left[ \left( I_{p} \times P_{p^{n-2}} \times I_{p} \right) \left( P_{p^{n-1}}^{-1} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.19}$$

e: $m \geq 2$

$$z \simeq \left[ \left( P_{p^{n-1}} \times I_{p} \right) \prod_{t=2}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.20}$$

α) $m \geq n-k$

$$w \simeq \left[ \left( P_{p^{k}} \times I_{p^{n-k}} \right) \prod_{t=1}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-2}} \tag{11.21}$$

β) $2 \leq m \leq n-k$

$$w \simeq \left[ \left( P_{p^{k}} \times I_{p^{n-k}} \right) \prod_{t=1}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-2}} \tag{11.22}$$

3. $k \geq n-2$

$$w \simeq \left[ \left( P_{p^{n-2}} \times P_{p^{2}} \right) K \right]_{\underline{n-2}} \tag{11.23}$$

$$z \simeq \left[ \left( P_{p^{n-1}}^{-1} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.24}$$

g: $m = 1$

$$w \simeq \left[ \left( I_{p^{2}} \times P_{p^{n-2}} \right) \left( P_{p^{n-2}} \times I_{p^{2}} \right) K \right]_{\underline{n-2}} \tag{11.25}$$

$$z \simeq \left[ \left( P_{p^{n-2}}^{-1} \times I_{p^{2}} \right) \left( P_{p^{n-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.26}$$

h: $m \geq 2$

$$w \simeq \left[ \left( P_{p^{n-2}} \times I_{p^{2}} \right) \prod_{t=1}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-2}} \tag{11.27}$$

i: $2 \leq m \leq n-2$

$$z \simeq \left[ \left( P_{p^{n-1}} \times I_{p} \right) \prod_{t=2}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.28}$$

j: $m = n-1$

$$z \simeq \left[ \left( P_{p^{n-1}} \times I_{p} \right) \prod_{t=2}^{m-1} \left( I_{p^{t}} \times P_{p^{n-t-1}} \times I_{p} \right) K \right]_{\underline{n-1}} \tag{11.29}$$
A “down” scan of the $\Gamma_k$ matrix yields optimal assignments for two validity conditions:
1. k=0
a: k=0, m=1
w: $0 \;\; i_0 \;\; j_{n-2} \; \ldots \; j_2 \;\; j_1$
z: $0 \;\; j_1 \;\; i_0 \;\; j_{n-2} \; \ldots \; j_3 \;\; j_2$
b: k=0, m≥2
w: $0 \;\; i_0 \;\; i_1 \; \ldots \; i_{m-1} \;\; j_{n-2} \; \ldots \; j_{m+1} \;\; j_m$
z: $0 \;\; j_m \;\; i_0 \;\; i_1 \; \ldots \; i_{m-2} \;\; i_{m-1} \;\; j_{n-2} \; \ldots \; j_{m+1}$
All other assignments generated by the “right” scan are optimal and need not be replaced.
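The two digit layouts for k=0 can be written out mechanically. The following Python sketch is illustrative only: the function name `digit_layout` and the string labels are hypothetical, and the code merely spells out the w and z digit sequences of case b for given n and m. Setting m=1 in the same formula gives the layout of case a.

```python
def digit_layout(n, m):
    """Return the symbolic w and z digit sequences for k=0 (case b layout).

    Labels such as "i0" and "j3" are illustrative names for the digits
    i_0 and j_3 used in the text; n is the number of base-p digits.
    """
    i_digits = [f"i{t}" for t in range(m)]                  # i_0 ... i_{m-1}
    # w: 0 i_0 ... i_{m-1} j_{n-2} ... j_{m+1} j_m
    w = ["0"] + i_digits + [f"j{t}" for t in range(n - 2, m - 1, -1)]
    # z: 0 j_m i_0 ... i_{m-1} j_{n-2} ... j_{m+1}
    z = ["0", f"j{m}"] + i_digits + [f"j{t}" for t in range(n - 2, m, -1)]
    return w, z


w, z = digit_layout(6, 2)
print("w:", " ".join(w))
print("z:", " ".join(z))
```

Both sequences always contain n digits, and z differs from w only in that the digit $j_m$ moves from the last position to the position just after the leading 0.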
Using the same approach we deduce the spans of the different CGWK factorization matrices.
We have
$$\sigma_R(L_i) = \sigma_R(G_i) = p^{n-1} \qquad (11.30)$$
$$\sigma_c(L_i) = p^{n-2} \qquad (11.31)$$
$$\sigma_R(\Gamma_0) = p^{n-1} \qquad (11.32)$$
$$\sigma_c(\Gamma_0) = \sigma_c(G_0) = p^{n-1} \qquad (11.33)$$
and
$$\sigma_R(\Gamma_i) = p^{n-1} \qquad (11.34)$$
$$\sigma_c(\Gamma_i) = \sigma_c(P^{-1}G_i) = \sigma_c(L_i) = p^{n-2} \qquad (11.35)$$
With N=16 and $M=p^m$, the pilot matrices $\beta_{k,m}$ for different values of k and m are deduced from the results shown above. The pilot elements' positions thus evaluated, associated with each $\beta_{k,m}$, and the processor dispatched thereat at the appropriate clock, are listed below for some values of k and m.
$$\beta_{0,1}: \begin{bmatrix}
P_{00} & & & & & & & & & \\
 & & & & & P_{01} & & & & \\
 & P_{02} & & & & & & & & \\
 & & & & & & P_{03} & & & \\
 & & P_{10} & & & & & & & \\
 & & & & & & & P_{11} & & \\
 & & & P_{12} & & & & & & \\
 & & & & & & & & P_{13} &
\end{bmatrix}$$
$$\beta_{2,3}: \begin{bmatrix}
P_{00} & & & & & & & & & \\
 & P_{40} & & & & & & & & \\
 & & P_{20} & & & & & & & \\
 & & & P_{60} & & & & & & \\
 & & & & & P_{10} & & & & \\
 & & & & & & P_{50} & & & \\
 & & & & & & & P_{30} & & \\
 & & & & & & & & P_{70} &
\end{bmatrix}$$
$$B_{3,2}: \begin{bmatrix}
P_{00} & & & & & & & \\
 & P_{01} & & & & & & \\
 & & P_{20} & & & & & \\
 & & & P_{21} & & & & \\
 & & & & P_{10} & & & \\
 & & & & & P_{11} & & \\
 & & & & & & P_{30} & \\
 & & & & & & & P_{31}
\end{bmatrix}$$
For the matrix $B_k$ with k=1, N=729 and M=9 we have
w={0, 81, 162, 27, 108, 189, 54, 135, 216, 1, 82, 163, 28, . . . ,
2, 83, 164, . . . , 218, 3, 84, 165, . . . , 18, 99, 180, . . . }
z={0, 27, 54, 9, 36, 63, 18, 45, 72, 1, 28, 55, 10, . . . , 2, 29, 56, . . . ,
74, 243, 270, 297, . . . , 6, 33, 60, . . . }
Nine elements are dispatched in one real-time clock. The memory minimum queue length MMQL = minimum pitch = 9 = $3^{n-2-m}$, confirming the optimality of the state assignment.
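The pitch figure can be checked directly on the listed values. The sketch below is a hedged illustration, not the patent's procedure: it assumes that "pitch" means the gap between two addresses dispatched in the same clock, and that the minimum is taken over both the w and the z sequences. The helper name `min_pitch` is hypothetical. Only the first nine listed values of each sequence (one clock with M=9) are used.

```python
from itertools import combinations


def min_pitch(addresses, M):
    """Smallest gap between any two addresses dispatched in the same clock.

    Assumes consecutive groups of M addresses share a clock; requires M >= 2.
    """
    best = None
    for start in range(0, len(addresses) - M + 1, M):
        group = addresses[start:start + M]
        gap = min(abs(a - b) for a, b in combinations(group, 2))
        best = gap if best is None else min(best, gap)
    return best


p, n, m = 3, 6, 2          # N = p**n = 729, M = p**m = 9
M = p ** m
w_first_clock = [0, 81, 162, 27, 108, 189, 54, 135, 216]
z_first_clock = [0, 27, 54, 9, 36, 63, 18, 45, 72]

pitch = min(min_pitch(w_first_clock, M), min_pitch(z_first_clock, M))
print(pitch, p ** (n - 2 - m))
```

Under these assumptions the w values are spaced by at least 27 and the z values by at least 9, so the overall minimum agrees with $3^{n-2-m}$ for n=6 and m=2.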
For the matrix $B_k$ with k=2, N=729 and M=243 processors we have
w={0, 81, 162, 27, 108, 189, 54, 135, 216, 9, 90, 171, 117, . . . ,
18, 99, 180, . . . , 3, 84, 165, . . . , 6, 87, 168, . . . , 1, 82, 163, . . . , 2, 83, 164, . . . }
z={0, 27, 54, 9, 36, 63, 18, 45, 72, 243, 270, 297, 252, . . . ,
486, 513, 640, . . . , 3, 30, 57, . . . , 6, 33, 60, . . . , 1, 28, 55, . . . 2, 29, 56, . . . }
MMQL=1. We note that if M=81 we obtain the same w and z values, but 81 pilot elements are then dispatched in one clock rather than the 243 dispatched for m=5. With m=4, MMQL=3.
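The effect of changing M on the queue length can be reproduced with a small mixed-radix counter. This is a sketch under stated assumptions, not the patent's algorithm: the digit weights below are inferred from the leading listed values of w and z for k=2 (they are not given explicitly in the text), "pitch" is taken to be the gap between addresses dispatched in the same clock, and the names `scan_addresses` and `min_pitch` are hypothetical.

```python
from itertools import product


def scan_addresses(p, weights):
    """Count in base p; weights[0] belongs to the fastest-changing digit."""
    out = []
    for digits in product(range(p), repeat=len(weights)):
        # product() varies its last position fastest, so reverse the digits
        # to pair the fastest digit with weights[0].
        out.append(sum(d * wt for d, wt in zip(reversed(digits), weights)))
    return out


def min_pitch(addresses, M):
    """Smallest gap between addresses sharing a clock (groups of M, M >= 2)."""
    gaps = []
    for start in range(0, len(addresses), M):
        group = sorted(addresses[start:start + M])
        gaps.append(min(b - a for a, b in zip(group, group[1:])))
    return min(gaps)


p = 3
w = scan_addresses(p, [81, 27, 9, 3, 1])     # assumed weights: 0, 81, 162, 27, ...
z = scan_addresses(p, [27, 9, 243, 3, 1])    # assumed weights: 0, 27, 54, 9, ...

for M in (243, 81):
    print(M, min(min_pitch(w, M), min_pitch(z, M)))
```

With these assumed weights, dispatching all 243 pilot elements in one clock places adjacent addresses in the same clock, while groups of 81 keep simultaneously accessed addresses at least three apart.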
For the matrix $\Gamma_k$ with k=3, N=729 and M=3, the “right” scan, which emphasizes scanning the upper rows before performing p-ary division from the top down, applied to the above $\Gamma_k$ results gives
w={0, 9, 18, 1, 10, 19, 2, 11, 20, . . . , 8, 17, 26, 27, 36, 45, 54, 63, 72, . . . ,
57, 66, 165, . . . , 243, 252, 261, 244, 253, . . . }
z={0, 81, 162, 3, 84, 165, 6, 87, 168, . . . , 24, 105, 186, 27, 108, 189,
54, 135, 216, . . . , 141, 222, 403, . . . , 1, 82, 163, 4, 85, . . . }
We note that:
MMQL=minimum pitch=9
With m=1 the optimal memory queue length is 27. Using a “down” scan, applying a p-ary division from the top down, we obtain the optimal assignment by a simple shuffle of the above values:
w={0, 27, 54, 1, 28, 55, . . . , 8, 35, 62, 9, 36, 63, 10, 37, 56, . . . }
z={0, 27, 54, 3, 30, 57, 6, 33, 60, 9, . . . , 24, 51, 78, 81, 108, 135, 84, 111, 138, . . . }