1. Field of the Invention
This invention relates to processors for performing neighborhood transformations on matrices of data elements for image processing and the like, and more particularly to a high speed processor containing a plurality of sections that operate simultaneously upon separate sections of a single data matrix.
2. Prior Art
Neighborhood processors are a class of devices that operate upon a first data array or matrix to generate a second matrix wherein each element has a value dependent upon the value of its equivalent element in the first matrix and the values of its neighboring elements in the first matrix. These devices are useful for pattern recognition, image enhancement, area correlation and like image processing functions. One form of prior art neighborhood processing device is constructed in a parallel array form with a single computing element for each matrix element or pixel. A parallel array neighborhood processor of this type is disclosed in U.S. Pat. No. 3,106,698 to Unger. It comprises a matrix of identical processing cells, each cell including a memory register for storing the value of a single data element (pixel), a neighborhood logic translator for computing the transformed value of that pixel as a function of the present value of the pixel and the neighborhood pixel values, and parallel connections between the translator and neighboring memory registers. The neighborhood logic may be fixed, in which case the same transformation is repeated indefinitely, or may be programmable, in which case the neighborhood transition function may be modified at required times in the transition sequence of the image processing scheme. A common clock causes a simultaneous transition in the state of all the pixel value registers to achieve a transformation of the entire matrix.
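The neighborhood transformation described above can be sketched in software as follows. This is only an illustrative model, not the circuitry of the cited patent: the 3 × 3 window, the border value of 0 for positions outside the matrix, and the names `neighborhood_transform` and `erode` are assumptions made for the example. Every output element is computed from the unmodified input matrix, which models the simultaneous transition of all pixel registers on a common clock.

```python
def neighborhood_transform(matrix, rule, border=0):
    """Return a new matrix whose elements are rule(window) of the old one.

    Assumption: the neighborhood is a 3x3 window, and positions outside
    the matrix read as `border`.
    """
    rows, cols = len(matrix), len(matrix[0])

    def at(r, c):
        # Read from the ORIGINAL matrix only, so all elements appear to
        # change at once, as with a common clock.
        return matrix[r][c] if 0 <= r < rows and 0 <= c < cols else border

    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [at(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out_row.append(rule(window))
        out.append(out_row)
    return out


def erode(window):
    # Hypothetical example rule: a pixel stays 1 only if it and all
    # eight neighbors are 1.
    return 1 if all(window) else 0


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
result = neighborhood_transform(image, erode)
```

Passing a different `rule` function corresponds loosely to reprogramming the neighborhood logic between transformations.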
The principal advantage of such a parallel array processor is speed. A neighborhood transformation of the entire image or matrix requires only a single clock pulse interval, so that transformations may be performed at rates of millions per second. The principal disadvantage of the parallel array processor configuration is complexity: the neighborhood logic must be replicated in every processor cell, so a processor for a large array, such as the 1000 × 1000 array that may be a reasonable size for a digitized image, is very large and costly.
A serial array processor represents an alternative approach to neighborhood processing which greatly simplifies the processor structure at the expense of speed when compared to the parallel array. Such a system is disclosed in U.S. Pat. No. 3,339,179 to Shelton, et al. and in my U.S. patent application Ser. No. 742,127. That system employs a chain of serial neighborhood processing stages, each stage capable of generating the transformed value of a single pixel within a single clock pulse interval. The serial neighborhood processing stage employs a neighborhood logic translator identical to its counterpart in the parallel array processor cell, and a line delay memory for receiving a serial pixel stream from a row by row raster scan of the input matrix and for configuring the neighborhood window by providing the appropriate matrix elements to the neighborhood logic translator. The serialized input matrix is provided to the line delay memory and the data bits are serially shifted through the line delays. When the line delay memory has been filled with input data, it contains the neighborhood configuration for the first element to be transformed. Taps at appropriate positions in the line delay memory provide parallel neighborhood element values to the neighborhood logic translator. The tapped elements of the line delay memory constitute the neighborhood window registers.
The output of a serial neighborhood processing stage occurs at the same rate as its input and has the same format. This allows the output of one stage to be provided to the input of a subsequent stage, which may perform the same or a different neighborhood logic transformation. A chain of serial neighborhood processing stages constitutes a serial array processor.
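A serial neighborhood processing stage of the kind described above might be modeled as below. This is a sketch under stated assumptions rather than the circuit of the cited references: it assumes a 3 × 3 window, so the line delay memory holds 2 × width + 3 elements; it masks taps that would wrap across a row edge and substitutes a border value of 0; and the names `serial_stage` and `erode` are hypothetical. One pixel is shifted in per step, and after an initial latency of width + 1 steps one transformed pixel is produced per step.

```python
from collections import deque
from itertools import chain, repeat


def serial_stage(pixels, width, rule, border=0):
    """Yield transformed pixels from a raster-scanned pixel stream.

    Assumption: 3x3 window, so the line delay memory spans two full
    rows plus three elements.
    """
    latency = width + 1            # steps before the first window is centered
    length = 2 * width + 3         # line delay memory size
    line_delay = deque([border] * length, maxlen=length)
    # Tap position (0 = newest element) for each window offset (dr, dc).
    taps = [(dc, latency - dr * width - dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    # Extra border pixels flush the last rows out of the delay memory.
    padded = chain(pixels, repeat(border, latency))
    for i, pixel in enumerate(padded):
        line_delay.appendleft(pixel)   # shift one element through the delay
        if i < latency:
            continue                   # still filling the memory
        col = (i - latency) % width    # column of the window's center pixel
        window = [line_delay[pos] if 0 <= col + dc < width else border
                  for dc, pos in taps]
        yield rule(window)


def erode(window):
    # Hypothetical example rule: 1 only if all nine window elements are 1.
    return 1 if all(window) else 0


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
stream = [p for row in image for p in row]   # row by row raster scan
once = list(serial_stage(stream, 5, erode))
# Output has the same format as input, so stages can be chained.
twice = list(serial_stage(serial_stage(stream, 5, erode), 5, erode))
```

Because the output stream has the same length and ordering as the input stream, feeding one generator into another models a two-stage chain.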
The most complex section of either a serial neighborhood processor stage or a parallel array processor cell is the neighborhood logic translator. The serial array processor is conservative of neighborhood logic translator circuitry, requiring only one translator circuit per stage, while a parallel array requires one translator circuit for each matrix element.
In most practical design applications where the input matrix represents an image, the matrix size must be relatively large in order to achieve high resolution. For example, when the input matrix is generated by a state-of-the-art television pick-up tube it may be digitized into a matrix of about 1,000 × 1,000 pixels. The designer of a processor for this image is faced with a choice between a parallel array processor, which can generate one transformation per clock time but will have 1,000,000 relatively complex cellular elements, and a serial array processor consisting of a chain of serial processing stages, one stage for each neighborhood transformation in the image processing algorithm.
The parallel array processor transforms images at the maximal rate of one neighborhood image transformation per clock pulse interval, while the cyclic serial array processor performs image transformations at the rate of K/P image transformations per discrete time step, where P is the total number of pixels in an image and K is the number of processing stages in the serial array. For large images, the ratio of serial array processor speed to parallel array processor speed can be very small. Since processing speed can only be increased by increasing the ratio of neighborhood logic modules to the total number of data elements in the matrix, the question arises as to whether it is possible to incorporate more than one neighborhood logic module per serial processor stage or, equivalently, whether it is possible to reduce the number of line delay memory elements associated with each neighborhood logic module.
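The speed comparison above can be worked through numerically. In the sketch below, the image size of 1,000 × 1,000 pixels follows the example given earlier, while the stage count K = 100 is purely a hypothetical value chosen for illustration; the function name `serial_rate` is likewise an assumption.

```python
from fractions import Fraction


def serial_rate(stages, pixels):
    """Image transformations per time step for a serial array: K / P."""
    return Fraction(stages, pixels)


PARALLEL_RATE = Fraction(1)   # one image transformation per clock interval

P = 1000 * 1000               # pixels in a 1,000 x 1,000 image
K = 100                       # hypothetical number of serial stages

rate = serial_rate(K, P)
speed_ratio = rate / PARALLEL_RATE   # serial speed relative to parallel
```

Under these assumed figures the serial array completes one image transformation every 10,000 time steps on average, which illustrates how small the ratio becomes for large images.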
Another design problem arises from the desirability of forming the processor using integrated circuit techniques. When chains of serial image processing stages must operate on large arrays, the total number of elements in the stage line delay memory may prohibit the custom integration of the processing stage circuitry on a single large scale integrated chip. When efforts are made to divide the serial processor among a number of smaller chips, the large number of interconnections between the chips frustrates the design approach.