Electronic retinal prostheses for treating retinal degenerative diseases such as retinitis pigmentosa (RP) and age-related macular degeneration (AMD) are known. In these diseases, the photoreceptor cells of the retina are affected but other retinal cells remain relatively intact. Hence, the retinal prosthesis electrically activates these remaining cells of the retina to create artificial vision. External components are used to acquire and code image data for transmission to an implanted retinal stimulator.
FIG. 1 shows a schematic representation of the main components of a visual prosthesis taken from U.S. published patent application 2005/0288735, incorporated herein by reference in its entirety. In particular, the external portion (1) of the visual prosthesis comprises an imager (10), e.g., a video camera to capture video in real time, and a video data processing unit (20) comprising a Digital Signal Processor (DSP) to process video data and then send output commands (30) for a retinal stimulator to be implanted on the retina of a patient. Currently, the video camera (10) outputs images having a resolution of about 640×480 to the DSP in real time. The DSP receives these images and performs image processing on each input frame. The processing rate can be about 3 to 30 image frames per second, depending on the processing method or algorithm. The output (30) of the DSP is sent to the telemetry block (3) of the retinal prosthesis system, which will not be explained here in detail.
The electrode grid of the retinal stimulator (40) to be implanted in the eye is currently of size 10×6. This implies that the resolution of the image output (30) from the DSP contained in the video data processing unit (20) is to be 10×6. A 10×6 electrode grid array significantly reduces the field of vision that can be offered to a patient. Currently, such field of view is about 20 degrees. With a restricted field of vision, the patient has to scan the scene ahead with head movements to find the region that appears significant or worthy of attention.
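By way of illustration only, the reduction from a camera frame to the resolution of the electrode grid can be sketched as a block average. The sketch below is a minimal example under stated assumptions and is not the processing performed by the DSP of the referenced application: the function name `downsample_to_grid` is hypothetical, the frame is assumed to be a grayscale image stored as a list of rows, and the 10×6 grid is assumed to mean 10 columns by 6 rows, matching the width-by-height convention of 640×480.

```python
def downsample_to_grid(frame, grid_cols=10, grid_rows=6):
    """Block-average a grayscale frame (list of rows) to grid_rows x grid_cols.

    Assumption: each electrode receives the mean intensity of the
    rectangular block of pixels that maps onto it.
    """
    height = len(frame)
    width = len(frame[0])
    grid = []
    for r in range(grid_rows):
        # Pixel rows feeding electrode row r.
        y0 = r * height // grid_rows
        y1 = (r + 1) * height // grid_rows
        row = []
        for c in range(grid_cols):
            # Pixel columns feeding electrode column c.
            x0 = c * width // grid_cols
            x1 = (c + 1) * width // grid_cols
            total = sum(frame[y][x] for y in range(y0, y1) for x in range(x0, x1))
            row.append(total / ((y1 - y0) * (x1 - x0)))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Synthetic 640x480 frame whose intensity equals the column index.
    frame = [[x for x in range(640)] for _ in range(480)]
    grid = downsample_to_grid(frame)
    print(len(grid), "rows x", len(grid[0]), "columns")
```

With a 640×480 input, each of the 60 output values summarizes a block of 64×80 pixels, which illustrates how much scene detail is discarded at the electrode resolution.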
Saliency methods or algorithms are known in the art. Generally speaking, a saliency method treats different parts of the same image differently, focusing only on relevant portions or sections of that image. A saliency algorithm per se is known from Itti et al. (see “A model of saliency-based visual attention for rapid scene analysis” by Laurent Itti, Christof Koch and Ernst Niebur, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, Nov. 1998 and “A saliency-based search mechanism for overt and covert shifts of visual attention” by Laurent Itti and Christof Koch, both of which are incorporated herein by reference in their entirety). In particular, Itti et al. use a visual-attention model which is based on an early primate visual attention model and aims at finding the most salient region of an image frame at any given instant. According to Itti et al.'s algorithm, saliency at a given location is determined primarily by how different the location is from its surround in color, orientation, motion, depth and so on.
FIG. 2 shows a general overview of Itti et al.'s saliency algorithm. In particular, the algorithm uses the color (50), intensity (60) and orientation (70) information in any given image frame to calculate the most conspicuous location of an input image. Seven streams of information are extracted from the input image: intensity, red-green color, blue-yellow color, and orientations at 0, 45, 90 and 135 degrees. An image pyramid with 9 levels is constructed for each of the information streams. Feature-maps for the center-surround structure are obtained by taking the difference between the finer and coarser scales of the pyramids. Three conspicuity maps, one each for color, intensity and orientation, are constructed by summing the feature-maps of each kind of information after normalization. Normalization makes it possible to combine different modalities such as intensity, color and orientation: it promotes maps having strong peaks, suppresses maps having many comparable peaks, and brings the feature-maps of the different information streams to the same level so that they can be linearly summed to form the final saliency map. The saliency map is constructed by summing the conspicuity maps for color, intensity and orientation, thus obtaining a topographically arranged map that represents the visual saliency of the input image. A region surrounding the pixel with the highest gray-scale value is taken as the most salient region.
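The pyramid, center-surround, normalization and summation steps described above can be illustrated with a heavily simplified sketch. It is not Itti et al.'s implementation. It processes only the intensity stream, uses 4 pyramid levels instead of 9, uses hypothetical center/surround level pairs, and replaces the normalization operator with a plain rescaling to the range 0 to 1. All function names are hypothetical, and the image height and width are assumed to be divisible by 2 raised to (levels - 1).

```python
def downsample2(img):
    """One pyramid step: halve each dimension by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels):
    """Return [level 0 (finest), level 1, ...] by repeated 2x2 averaging."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid


def upsample_to(img, height, width):
    """Nearest-neighbour resize of img to height x width."""
    h, w = len(img), len(img[0])
    return [[img[y * h // height][x * w // width] for x in range(width)]
            for y in range(height)]


def center_surround(pyramid, center, surround):
    """Feature-map: absolute difference between a finer and a coarser level."""
    fine = pyramid[center]
    h, w = len(fine), len(fine[0])
    coarse = upsample_to(pyramid[surround], h, w)
    return [[abs(fine[y][x] - coarse[y][x]) for x in range(w)] for y in range(h)]


def normalize(fmap):
    """Rescale a map to [0, 1]; a simple stand-in for the normalization step."""
    lo = min(min(row) for row in fmap)
    hi = max(max(row) for row in fmap)
    if hi == lo:
        return [[0.0] * len(fmap[0]) for _ in fmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in fmap]


def intensity_saliency(img, levels=4, pairs=((0, 2), (0, 3), (1, 3))):
    """Sum normalized center-surround maps into one map at full resolution."""
    h, w = len(img), len(img[0])
    pyramid = build_pyramid(img, levels)
    saliency = [[0.0] * w for _ in range(h)]
    for center, surround in pairs:
        fmap = upsample_to(normalize(center_surround(pyramid, center, surround)), h, w)
        for y in range(h):
            for x in range(w):
                saliency[y][x] += fmap[y][x]
    return saliency


def most_salient_pixel(saliency):
    """Return (row, column) of a pixel holding the highest saliency value."""
    _, y, x = max((v, y, x) for y, row in enumerate(saliency)
                  for x, v in enumerate(row))
    return y, x


if __name__ == "__main__":
    # Dark 32x32 image with one bright 2x2 spot at rows 16-17, columns 20-21.
    image = [[0.0] * 32 for _ in range(32)]
    for y in (16, 17):
        for x in (20, 21):
            image[y][x] = 1.0
    print(most_salient_pixel(intensity_saliency(image)))
```

A fuller version would repeat the same steps for the color and orientation streams to obtain three conspicuity maps, and would sum those maps before selecting the region around the maximum.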