Image sensors are semiconductor devices that capture and process light into electronic signals for forming still images or video. Their use has become prevalent in a variety of consumer, industrial, and scientific applications, including digital cameras and camcorders, hand-held mobile devices, webcams, medical applications, automotive applications, games and toys, security and surveillance, pattern recognition, and automated inspection, among others. The technology used to manufacture image sensors has continued to advance at a rapid pace.
There are two main types of image sensors available today: Charge-Coupled Device (“CCD”) sensors and Complementary Metal Oxide Semiconductor (“CMOS”) sensors. In either type of image sensor, light-gathering photosites are formed on a semiconductor substrate and arranged in a two-dimensional array. The photosites, generally referred to as picture elements or “pixels,” convert the incoming light into an electrical charge. The number, size, and spacing of the pixels determine the resolution of the images generated by the sensor.
Modern image sensors typically contain millions of pixels in the pixel array to provide high-resolution images. The image information captured in each pixel, e.g., raw pixel data in the Red, Green, and Blue (“RGB”) color space, is transmitted to an Image Signal Processor (“ISP”) or other Digital Signal Processor (“DSP”) where it is processed to generate a digital image.
Once generated, digital images may be stored locally at the image sensor device and/or transferred or transmitted to other devices for later display or processing. For example, digital images generated with digital cameras or hand-held mobile devices are usually transferred or transmitted to a computer or other processing device having a larger memory. The processing device may be able to store, manipulate, and distribute thousands, if not millions, of digital images.
With so many digital images available, it becomes imperative to have applications in place that are able to effectively manage and process vast amounts of digital data. For example, at any given time, a user may be dealing with a variety of digital images that may require archival, identification, time-stamping, geo-stamping, searching, digital enhancement and restoration, segmentation, and/or compression, among other applications. Each application may work with a set of image formats for organizing and storing the visual data, ranging from raw image formats storing raw pixel data to GIF, TIFF, JPEG, and the like.
Managing digital images effectively often requires that some kind of image metadata, i.e., data about the image, be associated with the images. The metadata may be stored externally to the visual data, such as in a header specified by the image format, or incorporated into the visual data itself, in which case the metadata is automatically accessible wherever the visual data goes. For example, steganography and digital watermarking techniques are typically used to embed a message, tag, or code in a digital image to create an embedded image, i.e., a digital image incorporating an embedded code. The embedded code may include metadata such as the source of the image, the image title, copyright information, time-stamps, geo-stamps, and camera settings, among others.
In steganography techniques, the code is made imperceptible and can only be recovered by intended recipients. In digital watermarking techniques, the code may be imperceptible or visible in the image, but it is made robust to potential attacks by intruders. Both techniques embed the code at a single location or at multiple locations in the image. The locations may be selected based on perceptual or other criteria, or on a key without which the code cannot be recovered. An alternative approach spreads the code throughout the image, so that any location in the image may contain some part of the code.
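The approach of spreading the code throughout the image can be illustrated with a spread-spectrum sketch, a textbook technique swapped in here for illustration rather than a method described in this document. Each code bit is carried by its own pseudo-random ±1 chip sequence added to every pixel, so every location holds a small part of every bit, and the detector recovers each bit from the sign of a correlation with the keyed sequence. The helper names (`_chips`, `embed_spread`, `detect_spread`), the string seeding scheme, the strength `alpha`, and the model of an image as a flat list of 8-bit grayscale values are all assumptions.

```python
import random


def _chips(key: str, bit_index: int, n: int) -> list[int]:
    # Hypothetical helper: pseudo-random +/-1 sequence derived from the key and bit index.
    rng = random.Random(f"{key}:{bit_index}")
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed_spread(pixels: list[int], bits: list[int], key: str, alpha: int = 2) -> list[int]:
    """Add one low-amplitude chip sequence per bit across ALL pixels."""
    marked = list(pixels)
    for j, bit in enumerate(bits):
        sign = 1 if bit else -1
        for i, c in enumerate(_chips(key, j, len(pixels))):
            marked[i] += alpha * sign * c
    # Clip back to the 8-bit range; heavy clipping would weaken detection.
    return [min(255, max(0, v)) for v in marked]


def detect_spread(pixels: list[int], n_bits: int, key: str) -> list[int]:
    """Recover each bit from the sign of its correlation with the keyed chips."""
    n = len(pixels)
    mean = sum(pixels) / n  # remove the DC level so it does not bias the correlation
    bits = []
    for j in range(n_bits):
        corr = sum((p - mean) * c for p, c in zip(pixels, _chips(key, j, n))) / n
        bits.append(1 if corr > 0 else 0)
    return bits
```

Because the correlation averages over the whole image, damage to one region degrades the estimate gradually instead of destroying a specific bit. In exchange, the host image itself acts as noise, so `alpha` and the number of pixels must be large enough for the correlation sign to be reliable.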
The code may be embedded in the spatial or frequency domain. Spatial-domain techniques directly alter the values of raw pixel data, while frequency-domain techniques alter frequency components of the image to incorporate the code in the image. Frequency-domain techniques may be more robust than spatial-domain techniques; spatial-domain techniques, on the other hand, are less computationally intensive and more suitable for applications where speed and power consumption are of crucial importance.
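To make the frequency-domain case concrete, the sketch below hides one bit per 8-pixel block of an image row in a single coefficient of an orthonormal DCT-II, using quantization index modulation (QIM): the coefficient is snapped to a multiple of `delta` whose parity encodes the bit. QIM is a standard technique chosen here for illustration, not one taken from this document. The block length, coefficient index `k`, step `delta`, and function names are assumptions; `delta` must be large enough that rounding the reconstructed pixels back to integers cannot push the coefficient across a quantization boundary. The pure-Python DCT keeps the example dependency-free at the cost of speed.

```python
import math

BLOCK = 8  # samples per block (assumed)


def _scale(k: int, n: int) -> float:
    # Orthonormal DCT-II scaling factor.
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(x: list[float]) -> list[float]:
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x)) for k in range(n)]


def idct(X: list[float]) -> list[float]:
    # Inverse of the orthonormal DCT-II (i.e., an orthonormal DCT-III).
    n = len(X)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(X)) for i in range(n)]


def embed_dct(row: list[int], bits: list[int], k: int = 3, delta: float = 8.0) -> list[int]:
    if len(bits) * BLOCK > len(row):
        raise ValueError("row too short for the code")
    out = list(row)
    for b, bit in enumerate(bits):
        start = b * BLOCK
        X = dct(row[start:start + BLOCK])
        q = round(X[k] / delta)
        if q % 2 != bit:
            # Move to the adjacent quantizer level on the side nearer the original value.
            q += 1 if X[k] / delta > q else -1
        X[k] = q * delta
        out[start:start + BLOCK] = [min(255, max(0, round(v))) for v in idct(X)]
    return out


def extract_dct(row: list[int], n_bits: int, k: int = 3, delta: float = 8.0) -> list[int]:
    # Each bit is the parity of the quantizer level nearest to the coefficient.
    return [round(dct(row[b * BLOCK:(b + 1) * BLOCK])[k] / delta) % 2 for b in range(n_bits)]
```

With an 8-sample orthonormal transform, integer rounding perturbs a coefficient by at most about 1.41, well under `delta / 2 = 4`, which is why extraction survives the round trip to integer pixels in this sketch.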
In general, any technique for embedding a message, tag, or code in an image, whether in the spatial or frequency domain, consists of three parts: (1) the code itself, (2) an embedding module for embedding the code in the image to generate an embedded image, and (3) a detection module for verifying and detecting the code in the embedded image. As described above, the code may include metadata associated with the image and may consist of a simple sequence of bits. The embedding module may incorporate the code in the spatial or frequency domain and may be implemented on-chip together with an image sensor array, on another chip co-located with the image sensor array, or at a remote location. The detection module may be implemented in a processing device capable of receiving and processing embedded images to determine whether they contain an embedded code or whether a particular code is present in a given embedded image.
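The first of the three parts, the code, can be as simple as metadata serialized into a bit sequence. A minimal sketch, assuming UTF-8 text metadata and a hypothetical 16-bit big-endian length header that tells the detection module how many bits to read; the function names are illustrative only.

```python
def _pack(bits: list[int]) -> bytes:
    # Group bits MSB-first into bytes.
    return bytes(sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))


def metadata_to_bits(text: str) -> list[int]:
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("metadata too long for a 16-bit length header")
    payload = len(data).to_bytes(2, "big") + data
    return [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]


def bits_to_metadata(bits: list[int]) -> str:
    if len(bits) < 16:
        raise ValueError("missing length header")
    length = int.from_bytes(_pack(bits[:16]), "big")
    body = bits[16:16 + 8 * length]
    if len(body) < 8 * length:
        raise ValueError("truncated code")
    # Bits after the declared length are ignored.
    return _pack(body).decode("utf-8")
```

The length header is one design choice among several; a terminator byte or a fixed-size field would serve the same purpose.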
For example, previous work on embedding a code in an image in the spatial domain has included modifying the least significant bits of some or all pixels in the image to incorporate the code, and using pseudo-random numbers to determine the locations in the image at which to embed the code, among others. Corresponding previous work on detecting the code in the embedded image has included extracting the code from the least significant bits of the affected pixels, and using a key containing the seed of the pseudo-random number generator to identify the locations in the image where the code is embedded, among others.
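The prior spatial-domain approach just described, least-significant-bit embedding at locations chosen by a seeded pseudo-random generator, can be sketched as follows. The seed plays the role of the key. The function names and the model of an image as a flat list of 8-bit grayscale values are assumptions, and Python's `random.Random` stands in for whatever generator a real system would use (it is not cryptographically secure).

```python
import random


def _positions(n_pixels: int, n_bits: int, seed: int) -> list[int]:
    # The same seed always yields the same distinct pixel indices.
    return random.Random(seed).sample(range(n_pixels), n_bits)


def embed_lsb(pixels: list[int], bits: list[int], seed: int) -> list[int]:
    if len(bits) > len(pixels):
        raise ValueError("code longer than the image")
    out = list(pixels)
    for pos, bit in zip(_positions(len(pixels), len(bits), seed), bits):
        out[pos] = (out[pos] & ~1) | bit  # overwrite only the least significant bit
    return out


def extract_lsb(pixels: list[int], n_bits: int, seed: int) -> list[int]:
    # Without the seed, the detector would not know which pixels to read.
    return [pixels[pos] & 1 for pos in _positions(len(pixels), n_bits, seed)]
```

Each pixel changes by at most one gray level, which keeps the method cheap and imperceptible, but any operation that perturbs low-order bits (for example, lossy recompression) erases the code.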
These and other techniques for embedding a code in digital images are limited in that the code is typically independent of the contents of the image. That is, the code does not depend on whether the image contains a particular scene, object, or person, nor is it used to represent the image's contents. It is often desirable to represent an image based on its contents. For example, current image search techniques on the Internet rely on markup language tags describing the names of image files and the titles of the images. Those tags are not, however, embedded in the images themselves and can only be used as part of the markup language.
In addition, techniques for embedding a code in digital images often do not record the location where the code is embedded. Without that location, the detection module must use sophisticated search techniques or a key to extract the code from the digital image, which consumes processing time and still does not guarantee that the exact location will be found. If the code location is not found or has been tampered with, those techniques fail completely.
Another limitation of current embedding techniques is that they are usually designed for the single application in which they provide the most benefit, e.g., digital rights management. For example, a given embedding technique may not be usable both to reliably embed image metadata in an image and to identify the image contents.
Accordingly, it would be desirable to provide an apparatus and method for embedding recoverable data in a digital image, where the data depends on features associated with the image and on the location where the data is embedded, and is robust against tampering.