1. Field of the Invention
This invention relates generally to the detection of unresolved point targets in clutter using a Spatial Modulation staring array image sensor that could incorporate Foveal Enhanced Imaging.
2. Background Information
Detection of unresolved point targets in clutter using an imaging electro-optical (EO) or infrared (IR) sensor is required by many military and civilian system applications, for example missile threat warning (to protect aircraft from air-to-air and ground-to-air missiles), infrared search and track (IRST), and long-range ballistic missile early launch detection, as well as security/surveillance applications in perimeter control, port security, and airport security. The fundamental detection problem in all of these systems is the identification of a point target against a background of sensor noise and scene clutter. In many cases detection is clutter-limited, not noise-limited.
It is well known that to achieve a high target-to-clutter ratio, a high-resolution imaging sensor should be used. The power spectral density (PSD) of clutter typically falls off rapidly with increasing spatial frequency, whereas the PSD of a point target remains constant up to the highest spatial frequency passed by the sensor. The target-to-clutter ratio, therefore, generally increases with increasing spatial frequency. A high-resolution imaging sensor provides access to high spatial frequencies, thereby enabling a high target-to-clutter ratio to be realized.
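As a rough illustration of this argument, suppose (purely as an assumption; the text does not specify a clutter model) that the clutter PSD follows a power law in spatial frequency f with exponent p > 0, while the point-target PSD is flat up to the sensor cutoff f_c. The symbols S_c, S_t, K, A, p, and f_c are introduced here only for illustration.

```latex
S_c(f) = K\, f^{-p}, \qquad S_t(f) = A \quad (0 < f \le f_c)
\quad\Longrightarrow\quad
\frac{S_t(f)}{S_c(f)} = \frac{A}{K}\, f^{\,p}
```

Under that assumed model the ratio grows monotonically with f and is largest at the cutoff f_c, which is why access to higher spatial frequencies favors detection.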
The resolution of an imaging sensor is bounded by the angular subtense of a detector element in the sensor's detector array. Providing the high resolution (e.g., 1 milliradian (mr)) necessary for a high target-to-clutter ratio can be a particularly significant problem when this resolution must be provided continually over the wide field of view (WFOV) (e.g., 1.5 radians (r)) that is often required. This problem is quantified by the large number of pixels involved (e.g., 1,500×1,500 pixels for 1 mr resolution over a 1.5 r field of view (FOV)).
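The pixel count quoted above follows from simple division. The short sketch below reproduces it and compares the total with a 256×256 array, the small-format FPA example used elsewhere in this section. The function and variable names are illustrative only.

```python
def pixels_per_axis(fov_mrad: int, resolution_mrad: int) -> int:
    """Resolution elements needed to span the FOV along one axis (ceiling division)."""
    return -(-fov_mrad // resolution_mrad)


fov_mrad = 1500        # 1.5 r field of view, expressed in milliradians
resolution_mrad = 1    # 1 mr resolution

n_axis = pixels_per_axis(fov_mrad, resolution_mrad)
n_total = n_axis ** 2          # pixels over the full two-dimensional FOV
fpa_total = 256 * 256          # detector elements in a 256x256 FPA

print("pixels per axis:", n_axis)
print("total pixels:", n_total)
print("scene pixels per FPA element:", n_total / fpa_total)
```

The last figure shows why a one-detector-per-pixel design is impractical with such an array: each detector element would have to stand in for several tens of scene pixels.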
There are three basic types of imaging sensors that can be used for the detection of unresolved point targets: scanning, conventional staring, and Spatial Modulation staring. Each of these sensor types takes a different approach to the fundamental problem of sensing and processing a large number of pixels over the FOV.
In a scanning sensor, a single detector element or a small array of detector elements is scanned, opto-mechanically, over the required FOV. The detector array in a scanning sensor will have many fewer elements (e.g., 10×10) than there are pixels over the FOV (e.g., 1,500×1,500). However, the scanning sensor is undesirable in many applications because of the cost and complexity of opto-mechanical scanning systems, and also because any particular point in the scene is sensed only periodically, and briefly, as the small detector array is repetitively scanned over the scene in a sweeping motion. For these and other reasons a staring sensor is employed in many applications.
In a conventional staring sensor the entire FOV is imaged at one time onto a focal plane array (FPA) of discrete detector elements. Each detector element generates a picture element, or “pixel,” in the output image that is read out electronically from the FPA at a certain frame rate. The conventional staring sensor senses the large number of pixels over the object scene by employing an FPA that has the same large number of detector elements. A limitation of the conventional staring sensor is the relatively small format (e.g., 256×256) of affordable, or even available, FPAs. A conventional staring sensor with a small-format FPA cannot provide the high resolution necessary for high target-to-clutter detection over a WFOV.
An additional limitation of the conventional staring sensor is the high processing load required to process the large number of pixels. A very processing-intensive high-pass spatial-filtering operation, or its equivalent, must be performed on the output images from the detector FPA. This operation is needed to access the high end of the spatial frequency spectrum, where the target-to-clutter ratio is highest. The higher the resolution of the sensor, the greater the number of pixels per frame to process, and also the higher the FPA frame rate needed to avoid motion smear in a dynamic environment. Additionally, very processing-intensive temporal-filtering operations (such as track-before-detect) may be required to achieve maximum detection sensitivity when the resolution and frame rate are both high.
The spatial-modulation staring sensor is a new type of staring sensor for point-target detection in clutter. It is able to effectively sense a large number of pixels over the object scene using a detector FPA that has many fewer detector elements. High-performance point-target detection in clutter (e.g., with 1,500×1,500 pixels over the object scene) can thereby be implemented using an affordable and available small-format detector FPA (e.g., 256×256). Moreover, no spatial filtering of the detector FPA output is required, and temporal filtering can be effected by simple frame averaging because of the low pixel-to-subtense density of the output image from the FPA.
FIG. 1 is a schematic diagram of a spatial-modulation staring sensor. A high-resolution image 10 of the object scene is formed by high-resolution front-end optics 20. This image is formed over a chopping reticle 30 located at an intermediate focal plane 40. The reticle 30 is a uniform checkerboard pattern of transparent and opaque cells. The edges of the checkerboard cells define a set of rectangular x, y coordinate axes. Typically, the size of each cell is approximately the same as that of the point-spread function (PSF) of the front-end optics.
The input high-resolution image is multiplied, i.e., spatially modulated, by the reticle pattern. The checkerboard pattern can be characterized mathematically by a spatial function that has the value “1” over a clear checkerboard cell and the value “0” over an opaque cell. The modulated (reticle-modified) image appearing at the back surface of the reticle 30 is then re-imaged by relay optics 50 onto the detector FPA 60. There are approximately SMF×SMF times as many checkerboard cells covering that image as there are detector elements in the FPA, where SMF is a design parameter (e.g., SMF=6). That is, as FIG. 2 illustrates, a subarray 70 of SMF×SMF checkerboard cells (clear 76 and opaque 78) is imaged onto each detector element 80 in the FPA 60.
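To make the SMF relationship concrete, the sketch below counts the reticle cells seen by a 256×256 FPA with SMF=6, both example values from this section. The function name is hypothetical.

```python
def reticle_cells_per_axis(fpa_elements_per_axis: int, smf: int) -> int:
    """Checkerboard cells imaged across one axis of the FPA."""
    return fpa_elements_per_axis * smf


SMF = 6                 # example design value
FPA_PER_AXIS = 256      # small-format FPA example

cells_axis = reticle_cells_per_axis(FPA_PER_AXIS, SMF)
cells_per_detector = SMF * SMF   # cells in the subarray imaged onto each detector element

print("cells per axis:", cells_axis)
print("cells per detector element:", cells_per_detector)
```

With these example values the reticle presents slightly more cells per axis than the 1,500 resolution elements cited for a 1 mr, 1.5 r design.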
The reticle 30 is moved steadily in a linear motion along a fixed direction transverse to the intermediate focal plane. An actuator (not shown, but represented by multi-dimension arrows 37) is employed to move the reticle along a fixed line within a two-dimensional plane of movement. This fixed direction is inclined to the rectangular axes (x, y) so that, for example, when a point on the reticle has moved by four cell widths along one axis (e.g., the x axis) it will then have moved by one cell width along the other axis (e.g., the y axis). The reticle speed is such that it moves 0.5 cell width along one axis (e.g., x) and 0.125 cell width along the other axis (e.g., y) during each FPA frame period.
If a point target is present in a static scene, the image of that target on the reticle 30 will sometimes fall over a clear reticle cell, sometimes over an opaque cell, and sometimes between an opaque and a clear cell. Because the checkerboard reticle is moved at 0.5 cell width per frame, it advances one full cell width every two frames, so a target that falls over a clear cell in frame n will fall mostly over an opaque cell in frame n+2. In that case the target signal can be recovered, and the clutter cancelled, by subtracting frame n from frame n+2. In the general case it can be shown that the target signal can be recovered no matter where its image falls over the checkerboard reticle by similarly processing a sequence of eight frames.
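The per-frame displacement can be tabulated directly. The sketch below uses exact fractions to show the cumulative reticle offset, in cell widths, over a run of frames. It assumes the motion starts from zero offset at frame 0, and the function name is illustrative.

```python
from fractions import Fraction

DX_PER_FRAME = Fraction(1, 2)   # cell widths along x per frame period
DY_PER_FRAME = Fraction(1, 8)   # cell widths along y per frame period


def reticle_offset(frame: int):
    """Cumulative (x, y) reticle displacement in cell widths after `frame` frame periods."""
    return frame * DX_PER_FRAME, frame * DY_PER_FRAME


for n in range(9):
    x, y = reticle_offset(n)
    print(f"frame {n}: x = {x} cells, y = {y} cells")
```

After two frames the x offset is exactly one cell width, and after eight frames the offsets are four cells in x and one cell in y, matching the 4:1 inclination described above.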
As FIG. 3 indicates, processing circuitry 72 receives the sequence of raw output frames from the detector FPA 60. Those frames can be symbolized as Vn(x,y), where n is the frame number and (x,y) are the coordinates of a detector element or output pixel. The processing circuitry 72 demodulates this sequence of frames in the following manner to derive a sequence of spatial-modulation output frames VVn(x,y) in which clutter is strongly attenuated but a point target is not:
$$\mathrm{VV}_8(x,y) = \left[V_1(x,y) - V_3(x,y)\right]^2 + \left[V_2(x,y) - V_4(x,y)\right]^2 + \left[V_5(x,y) - V_7(x,y)\right]^2 + \left[V_6(x,y) - V_8(x,y)\right]^2$$
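A minimal sketch of the demodulation expression above, read as the sum of the four squared frame differences, might look as follows. It is not the circuitry of FIG. 3. Frames are assumed to be plain nested lists indexed as frame[y][x], and the function name is hypothetical.

```python
def demodulate(frames):
    """Combine eight raw frames V1..V8 into one spatial-modulation output frame.

    `frames` is a sequence of eight equally sized frames (lists of rows).
    Each output pixel is the sum of the squared differences
    (V1-V3)^2 + (V2-V4)^2 + (V5-V7)^2 + (V6-V8)^2 at that pixel.
    """
    if len(frames) != 8:
        raise ValueError("exactly eight frames are required")
    v1, v2, v3, v4, v5, v6, v7, v8 = frames
    pairs = ((v1, v3), (v2, v4), (v5, v7), (v6, v8))
    rows, cols = len(v1), len(v1[0])
    vv = [[0] * cols for _ in range(rows)]
    for a, b in pairs:
        for y in range(rows):
            for x in range(cols):
                d = a[y][x] - b[y][x]
                vv[y][x] += d * d
    return vv


# Tiny 1x2 example: the left pixel changes between paired frames, the right does not.
example = [[[3, 5]], [[0, 5]], [[1, 5]], [[0, 5]],
           [[0, 5]], [[0, 5]], [[0, 5]], [[2, 5]]]
print(demodulate(example))
```

A pixel whose value is the same in every frame contributes nothing to the output, which is the behavior expected of background that does not vary as the reticle moves.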
The signal from a point target is preserved by this frame-differencing action because the target image on the reticle is localized to a cell, and the moving reticle pattern effectively complements after every other frame. The target signal will thus appear in one or more of the four difference images V1−V3, V2−V4, V5−V7, V6−V8, depending upon the position of the target relative to the cell pattern. It can be shown that the target signal appears in VV at approximately its full amplitude no matter where the target falls over the reticle pattern.
However, the background clutter is virtually canceled by this same differencing action. This is because the clutter is not localized to a cell width, but extends continuously over the reticle cells. Each detector element receives the clutter signal falling over an SMF×SMF (e.g., 6×6) subarray of cells. The magnitude of this clutter signal is almost unchanged as the reticle cells complement. Thus the differencing action (e.g., V1−V3), which preserves the target signal, cancels the clutter almost to zero.
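The contrast between target preservation and clutter cancellation can be checked numerically with a deliberately simplified model of one detector element. Everything in the sketch below other than SMF=6 and the 0.5/0.125 cell-per-frame motion is an assumption made for illustration: each reticle cell is sampled on an 8×8 grid, the point target occupies a single sample (an idealized PSF), the clutter is a smooth planar ramp, and the output is taken as the sum of the four squared frame differences.

```python
W = 8          # sub-samples per reticle cell along each axis (assumed discretization)
SMF = 6        # reticle cells per detector element along each axis
N = SMF * W    # sub-samples across one detector footprint


def transmission(u, v):
    """Checkerboard reticle: 1 over a clear cell, 0 over an opaque cell."""
    return 1 if ((u // W) + (v // W)) % 2 == 0 else 0


def detector_output(scene, frame):
    """Signal collected by one detector element at the given frame number.

    The reticle is shifted by 0.5 cell in x and 0.125 cell in y per frame.
    """
    du, dv = frame * W // 2, frame * W // 8
    return sum(scene(u, v) * transmission(u - du, v - dv)
               for u in range(N) for v in range(N))


def vv(scene):
    """Sum of squared differences over the frame pairs (1,3), (2,4), (5,7), (6,8)."""
    V = {n: detector_output(scene, n) for n in range(1, 9)}
    return sum((V[a] - V[b]) ** 2 for a, b in ((1, 3), (2, 4), (5, 7), (6, 8)))


def clutter(u, v):
    return 100 + 3 * u + 2 * v               # smooth background, no fine detail


def target_only(u, v):
    return 50 if (u, v) == (20, 13) else 0   # single-sample point target


def clutter_plus_target(u, v):
    return clutter(u, v) + target_only(u, v)


print("clutter only:      ", vv(clutter))
print("target only:       ", vv(target_only))
print("clutter and target:", vv(clutter_plus_target))
```

In this idealized model the detector footprint spans a whole number of checkerboard periods, so a planar background integrates to the same value in every frame and cancels exactly. Real clutter with some fine structure would cancel only approximately, as the text states.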
After moving some distance the reticle motion is reversed. After moving some distance in this other direction the reticle motion is again reversed, and so on. The size of the reticle is large enough so that the image of the object scene is covered by the checkerboard pattern for all positions of the reticle as it is moved forward and backward along its direction of motion.
Frame differencing, as described above, provides (in VV) a high level of background-clutter suppression while preserving the signal from a point target at almost full amplitude. In addition to the spatial-modulation output VV, a conventional output VID can be derived by frame addition. For example:
$$\mathrm{VID}_n(x,y) = V_n(x,y) + V_{n+2}(x,y)$$
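The frame-addition example can be sketched in the same style. Frames are assumed to be held in a dictionary keyed by frame number, each as nested lists indexed frame[y][x]; the function name is hypothetical.

```python
def conventional_output(frames, n):
    """Return VID_n = V_n + V_(n+2), added pixel by pixel."""
    a, b = frames[n], frames[n + 2]
    return [[p + q for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Two 1x2 frames, two frame periods apart.
raw = {1: [[1, 2]], 3: [[10, 20]]}
print(conventional_output(raw, 1))
```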
This conventional output is essentially the same as would be provided by a conventional staring sensor covering the same FOV using the same FPA.
Point targets are generally not detectable in the conventional output image VID because of its low angular resolution. Point targets can be detected in the spatial-modulation output image VV because it is derived by spatial modulation from the high-resolution optical image formed on the reticle. A detection derived from VV can be used to annotate the conventional output VID presented by, for example, a display 74 to show the position of the detected target in relation to the background scene.
Human vision is very strongly based upon true foveal imaging. A Spatial Modulation sensor with Foveal Enhanced Imaging mimics the foveal-peripheral vision of the eye. It achieves high-resolution point target detection over a wide field of view (WFOV) using a low-density (e.g., 256×256) detector array, while providing conventional high-resolution imaging over a small area, also using a low-density detector array, and rapidly switching that small area within the total field of regard. In human vision the WFOV sensing performed by peripheral vision allows objects of any size to be detected, although not recognized. A Spatial Modulation sensor with Foveal Enhanced Imaging performs a different low-density sampling than human peripheral vision and can detect only unresolved (hot spot) point targets.