The invention relates to signal processing. More specifically, the invention relates to a new apparatus and method implementing a new Distributed Arithmetic architecture for creating an inner product of a vector and a matrix, with particular usefulness in digital cameras and similar image processing applications.
As consumers become more comfortable with digital devices such as compact disks, computers, printers, and cell phones, they are more willing to accept digitally enabled technology in other areas of their lives, such as photography. Indeed, the current trend in photography is toward digital photography, which eliminates the requirement for film and instead uses digital storage devices to hold the pictures users have taken. However, the cost of digital cameras still remains outside the reach of most people, and efforts are being made to bring the cost down in order to allow for mass acceptance. In addition to lower costs, increasing the demand for digital cameras requires that the image quality of the pictures be comparable with that of a typical film-based photo. This image quality is improved by increasing the number of pixels (the light-to-electricity converters) in the image sensor within the digital camera. Unfortunately, this increase in the number of pixels further drives up the cost of the digital camera because of the increased processing required to convert the image captured on the image sensor into an acceptable digital format that fits within the limits of the digital storage device used in the camera. To allow an acceptable number of pictures to be taken with a digital camera and stored within it, some form of image compression is necessary to reduce the storage requirements.
Naturally, users are also demanding new features that take advantage of the digital properties of the pictures they have taken. For example, rather than correcting color balance, light levels, contrast, etc. on a personal computer after a set of photographs has been taken, users wish to have these operations performed automatically on the camera itself so the pictures can be reproduced directly on a color printer, thus bypassing the personal computer entirely.
Therefore, to enable the digital photography market, the cost of a digital camera must be reduced while its functionality is increased. Such a camera requires that the electronics within it be versatile enough to provide the additional functionality. In addition, the electronics must occupy less integrated circuit area so that costs are decreased.
Some previous attempts to reduce the size and cost of image processing circuits have focused on Distributed Arithmetic methods. Distributed Arithmetic (DA) gets its name because the arithmetic functions are distributed among various electronic devices in a non-conventional sense, rather than performed in discrete arithmetic blocks (e.g. addition, multiplication) that are coupled together. In image processing, the most frequently encountered form of arithmetic is the multiplication of a vector (a portion of the image) by a matrix (a transform function, such as image compression or expansion) to form an inner product. Fortunately, this inner product arithmetic is performed most efficiently by DA. In fact, previous DA methods have been successful in reducing the number of transistors used in an image processing integrated circuit by 50-80% compared with conventional architectures. However, the continuing need to reduce cost while providing still more functionality requires that a new DA method be implemented to further reduce the number of transistors in image processing circuits.
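The general idea of DA can be illustrated with a short software sketch. The following is a minimal model of a textbook bit-serial DA inner product, not a description of any particular prior circuit: the multiplications are replaced by a table of pre-computed coefficient sums plus shift-and-add steps. The function names `build_lut` and `da_inner_product` are hypothetical, and the inputs are assumed to be unsigned integers of a fixed bit width.

```python
def build_lut(coeffs):
    """Pre-compute the sum of every subset of the coefficients.

    Entry `addr` holds the sum of coeffs[k] for each k whose bit is set
    in `addr`, so the table has 2**len(coeffs) entries.
    """
    lut = [0] * (1 << len(coeffs))
    for addr in range(1, len(lut)):
        low = addr & -addr                 # lowest set bit of addr
        k = low.bit_length() - 1           # coefficient selected by that bit
        lut[addr] = lut[addr ^ low] + coeffs[k]
    return lut


def da_inner_product(lut, x, bits):
    """Inner product of the coefficients with unsigned inputs `x`.

    Assumption: every element of `x` fits in `bits` unsigned bits.
    No multiplier is used, only table lookups, shifts, and additions.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every input element into one table address.
        addr = 0
        for k, xk in enumerate(x):
            addr |= ((xk >> b) & 1) << k
        acc += lut[addr] << b              # weight the partial sum by 2**b
    return acc


coeffs = [3, -1, 4, 2]       # illustrative transform coefficients
pixels = [5, 7, 0, 9]        # illustrative 4-bit input samples
result = da_inner_product(build_lut(coeffs), pixels, bits=4)
```

In this sketch the table grows exponentially with the number of coefficients, which is one reason practical designs split the lookup across several smaller storage elements.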
An apparatus computes an inner product vector of a matrix and a vector. The matrix has a first set of coefficients and the vector has a second set of coefficients. At least one input register is used to store the second set of coefficients. A plurality of storage elements are used to store partial sums that are pre-calculated from the first set of coefficients of the matrix. The outputs of the at least one input register are used as the address inputs to the plurality of storage elements to select a subset of the partial sums. In addition, a select circuit is coupled to the storage elements' address lines to determine the row in the matrix with which the vector forms one element of the resultant inner product vector. The subset of partial sums from the outputs of the storage elements are added in an adder circuit to create a summation output that represents the element of the inner product vector of the matrix multiplied by the vector.
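The data flow described above can be modeled in software as a rough behavioral sketch. The model below is one possible reading of the summary, not the claimed circuit. It makes several assumptions that the text does not state: there is one storage element per input bit position, each storage element holds the same pre-calculated partial sums, the row select supplies the upper address bits, the vector coefficients are two's complement integers, and the adder applies the bit weights. The class and method names are hypothetical.

```python
class InnerProductUnit:
    """Behavioral model: input register, storage elements, row select, adder."""

    def __init__(self, matrix, bits):
        self.bits = bits                    # width of each vector coefficient
        self.rows = len(matrix)
        self.cols = len(matrix[0])
        # Partial sums pre-calculated from the matrix coefficients, one table per row.
        tables = [self._partial_sums(row) for row in matrix]
        # Assumption: one storage element per input bit position, same contents.
        self.storage = [tables for _ in range(bits)]
        self.input_register = [0] * self.cols

    @staticmethod
    def _partial_sums(row):
        sums = [0] * (1 << len(row))
        for addr in range(1, len(sums)):
            low = addr & -addr              # lowest set address bit
            sums[addr] = sums[addr ^ low] + row[low.bit_length() - 1]
        return sums

    def load(self, vector):
        if len(vector) != self.cols:
            raise ValueError("vector length must match the matrix width")
        mask = (1 << self.bits) - 1
        # Store each coefficient as a two's complement bit pattern.
        self.input_register = [v & mask for v in vector]

    def element(self, row_select):
        """Return one element of the inner product vector."""
        total = 0
        for b, store in enumerate(self.storage):
            # Register outputs: bit b of every coefficient forms the address.
            addr = 0
            for k, v in enumerate(self.input_register):
                addr |= ((v >> b) & 1) << k
            shifted = store[row_select][addr] << b
            # In two's complement the most significant bit has negative weight.
            total += -shifted if b == self.bits - 1 else shifted
        return total

    def multiply(self, vector):
        self.load(vector)
        return [self.element(r) for r in range(self.rows)]


unit = InnerProductUnit([[1, 2, 3], [4, -5, 6]], bits=8)
product = unit.multiply([3, -2, 7])
```

Stepping `row_select` through the rows while the input register holds the same vector yields the elements of the inner product vector one at a time, which mirrors the role the summary assigns to the select circuit.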