1. Field of the Invention
The present invention generally relates to visual attention prediction systems and methods, and more particularly to a learning-based visual attention prediction system and method for video signals.
2. Description of Related Art
Visual attention is an important characteristic of the biological visual system. It helps the human brain filter out excessive visual information and enables the eyes to focus on particular regions of interest. Visual attention has been a subject of research in neuroscience, physiology, psychology, and vision. Data gleaned from these studies can be used not only to enrich the current understanding of the psychophysical aspects of visual attention, but also to enhance the processing of video signals.
The fixation points in an image usually attract the most attention. If the attended regions of the image can be predicted, the video signals of the more attractive regions can be processed in greater detail, and visually more important areas can be better preserved in the coding process. A typical visual attention model consists of two parts: feature extraction and feature fusion. Feature maps are generated by extracting features from the image, and the feature maps are then fused to form a saliency map by nonlinear fusion, linear fusion with equal weights, or linear fusion with dynamic weights. However, improper weight assignment in the feature fusion process, or reliance on low-level features alone (such as color and orientation), can result in perceptual mismatches between the estimated saliency and the actual human fixation.
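The linear fusion step described above can be illustrated with a minimal sketch. It is not taken from any particular prior-art model: the function names `normalize` and `fuse_linear`, the min-max normalization, and the 2x2 sample maps are all assumptions made for illustration only. The sketch shows how the choice of weights changes the resulting saliency map.

```python
def normalize(feature_map):
    """Rescale a 2-D feature map (list of rows) to the range [0, 1]."""
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant map carries no saliency information.
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def fuse_linear(feature_maps, weights=None):
    """Fuse equally sized feature maps into one saliency map by a weighted sum.

    With weights=None every map receives the same weight (equal-weight fusion).
    """
    if weights is None:
        weights = [1.0 / len(feature_maps)] * len(feature_maps)
    if len(weights) != len(feature_maps):
        raise ValueError("one weight per feature map is required")
    maps = [normalize(m) for m in feature_maps]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, maps)) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2x2 feature maps (assumed values, not measured data).
color = [[0.0, 2.0], [4.0, 8.0]]
orientation = [[1.0, 1.0], [3.0, 5.0]]

# Equal-weight fusion versus an arbitrary, assumed weight assignment.
saliency_equal = fuse_linear([color, orientation])
saliency_weighted = fuse_linear([color, orientation], weights=[0.75, 0.25])
```

In this sketch the two weight assignments yield different saliency values at the same pixel, which illustrates why an improper weight assignment can shift the estimated saliency away from the actual fixation.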
Because conventional visual attention models cannot effectively predict visual attention, a need has arisen for a novel visual attention prediction system and method that can faithfully and easily predict visual attention.