1. Field of the Invention
The present invention relates generally to systems and methods for embedding audio information in pictures and video images.
2. Discussion of the Prior Art
Generally, books, magazines, and other media that include still images (pictures) provide no audio or sound to accompany those images. In the case of a picture of a seascape, for example, it would be desirable to provide the viewer with accompanying sounds such as wind and ocean waves. Likewise, for a video image, audio information may be carried in a separate audio track for simultaneous playback; however, the video content itself does not contain any embedded sound information that can be played back while the image is shown.
It would be highly desirable to provide a sound encoding system and method that enables the embedding of audio information directly within a picture or video image itself, and enables the playback or audio presentation of the embedded audio information associated with the viewed picture or video image.
The present invention relates to a system and method for encoding sound information in pixel units of a picture or image, and particularly in the pixel intensity. Small differences in pixel intensities are typically not detectable by the eye; however, they can be detected by scanning devices that measure the intensity differences between closely located pixels in an image. These differences are used to generate encoded numbers, which are mapped into sound representations (e.g., cepstra) that are capable of forming audio or sound.
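By way of illustration only, the following Python sketch shows one way small intensity differences between closely located pixels might carry encoded numbers. The pairing of horizontally adjacent pixels, the function names `encode_differences` and `decode_differences`, and the limit `max_code` are assumptions made for this example and are not taken from the description; the subsequent mapping of the decoded numbers into sound representations is not shown.

```python
from typing import List


def encode_differences(row: List[int], codes: List[int], max_code: int = 3) -> List[int]:
    """Store each code as the intensity difference within a pair of adjacent pixels.

    Assumption: pixels are 8-bit values (0..255) and pixel pairs are
    (0, 1), (2, 3), ... along a single row.
    """
    if 2 * len(codes) > len(row):
        raise ValueError("not enough pixel pairs for the given codes")
    out = list(row)
    for i, code in enumerate(codes):
        if not 0 <= code <= max_code:
            raise ValueError("code out of range")
        left = out[2 * i]
        # Step upward when possible; step downward near the top of the range.
        out[2 * i + 1] = left + code if left + code <= 255 else left - code
    return out


def decode_differences(row: List[int], n_codes: int) -> List[int]:
    """Recover the codes as absolute differences within each pixel pair."""
    return [abs(row[2 * i + 1] - row[2 * i]) for i in range(n_codes)]


if __name__ == "__main__":
    scanned = encode_differences([100, 100, 255, 255, 0, 7], [3, 2, 1])
    print(decode_differences(scanned, 3))
```

In this sketch the second pixel of each pair is overwritten, so the visible change is bounded by `max_code` intensity levels per pair.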
According to a first embodiment, audio information is carried in the low-order digits of a digital pixel value, i.e., the digits of intensity that follow some decimal point. For example, a pixel intensity may be represented digitally (in bytes/bits) as a number, e.g., 2.3567, with the first two digits representing intensity capable of being detected by a human eye. The remaining decimal digits, however, are very small and may be used to represent encoded sound/audio information. As an example of such an audio encoding technique, for a 256 color (or gray scale) display, there are 8 bits per pixel. Current high-end graphic display systems utilize 24 bits per pixel, e.g., 8 bits for red, 8 bits for green, and 8 bits for blue, resulting in 256 shades each of red, green and blue which may be blended to form a continuum of colors. According to the invention, if 8 bits per pixel quality is acceptable, then in a 24 bits per pixel graphics system there remain 16 bits in which audio data may be represented. Thus, for a 1000×1000 image there may be 16 Kbits for sound effects, which amount is sufficient to represent short phrases or sound effects (assuming a standard representation of a speech waveform requires 8 Kbits/sec).
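As a minimal sketch of the 8-bit/16-bit split described in this embodiment, the following Python example packs an 8-bit visible value and 16 bits of audio data into one 24-bit pixel word. The bit layout (visible value in the high 8 bits, audio in the low 16 bits), the big-endian byte order, and the names `pack_pixel`, `unpack_pixel`, `embed_audio` and `extract_audio` are assumptions chosen for illustration; the description does not specify them.

```python
from typing import List, Sequence, Tuple


def pack_pixel(display_value: int, audio_bits: int) -> int:
    """Combine an 8-bit visible value and a 16-bit audio payload into 24 bits."""
    if not 0 <= display_value <= 0xFF:
        raise ValueError("display value must fit in 8 bits")
    if not 0 <= audio_bits <= 0xFFFF:
        raise ValueError("audio payload must fit in 16 bits")
    return (display_value << 16) | audio_bits


def unpack_pixel(pixel: int) -> Tuple[int, int]:
    """Split a 24-bit pixel word into (visible value, audio payload)."""
    return (pixel >> 16) & 0xFF, pixel & 0xFFFF


def embed_audio(display_values: Sequence[int], audio: bytes) -> List[int]:
    """Embed two audio bytes per pixel; unused pixels carry a zero payload."""
    if len(audio) > 2 * len(display_values):
        raise ValueError("audio does not fit in the available pixels")
    pixels = []
    for i, value in enumerate(display_values):
        chunk = audio[2 * i:2 * i + 2].ljust(2, b"\x00")
        pixels.append(pack_pixel(value, int.from_bytes(chunk, "big")))
    return pixels


def extract_audio(pixels: Sequence[int], n_bytes: int) -> bytes:
    """Read the audio payloads back and keep the first n_bytes bytes."""
    out = bytearray()
    for pixel in pixels:
        _, payload = unpack_pixel(pixel)
        out += payload.to_bytes(2, "big")
    return bytes(out[:n_bytes])


if __name__ == "__main__":
    pixels = embed_audio([10, 200, 255], b"hello")
    print(extract_audio(pixels, 5))
```

The visible values are unchanged by the embedding, since only the 16 bits that this embodiment treats as available for audio are written.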
According to a second embodiment, audio information may be encoded in special pixels located in the picture or image, for example, at predetermined coordinates. These special pixels carry encoded sound information that may be detected by a scanner, but are located at coordinates chosen such that the overall viewing of the image is not affected.
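The second embodiment may be sketched as follows, assuming for illustration that the image is an 8-bit gray scale array, that the encoder and the scanner share the same list of predetermined coordinates, and that each special pixel carries one byte of audio data. The function names `write_special_pixels` and `read_special_pixels` are hypothetical.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, int]  # (row, column)


def write_special_pixels(image: List[List[int]], coords: Sequence[Coord],
                         payload: bytes) -> List[List[int]]:
    """Return a copy of the image with one payload byte stored per special pixel."""
    if len(payload) > len(coords):
        raise ValueError("payload needs more special pixels than are available")
    out = [row[:] for row in image]  # leave the original image untouched
    for (r, c), byte in zip(coords, payload):
        out[r][c] = byte
    return out


def read_special_pixels(image: List[List[int]], coords: Sequence[Coord],
                        n_bytes: int) -> bytes:
    """Read n_bytes of payload back from the predetermined coordinates."""
    return bytes(image[r][c] for r, c in coords[:n_bytes])


if __name__ == "__main__":
    picture = [[128] * 4 for _ in range(4)]
    corners = [(0, 0), (0, 3), (3, 0), (3, 3)]
    encoded = write_special_pixels(picture, corners, b"\x01\x02\x03")
    print(read_special_pixels(encoded, corners, 3))
```

Placing the special pixels at the corners is only one possible choice; any coordinates at which a changed pixel does not disturb the viewer would serve.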
In accordance with these embodiments, a scanning system is employed which enables a user to scan through the picture, for instance, with a scanning device which sends the pixel encoded sound information to a server system (via a wireless connection, for example). The server system may include devices for reading the pixel encoded data and converting the read data into audio (e.g., music, speech, etc.) for playback and presentation through a playback device.
The pixel encoded sound information may additionally include “meta information” provided in a file format such as Speech Mark-up language (Speech ML) for use with a Conversational Browser.
Advantageously, the encoded information embedded in a picture may include device-control codes which may be scanned and retrieved for controlling a device.