Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
FIG. 1 shows a block diagram that illustrates a system 10 including a computer system 11 and an associated Internet 22 connection. Such a configuration is typically used for computers (hosts) connected to the Internet 22 and executing server or client software (or a combination of both). The computer system 11 may be used as a portable electronic device such as a notebook/laptop computer, a media player (e.g., MP3 based or video player), a desktop computer, a cellular phone, a Personal Digital Assistant (PDA), an image processing device (e.g., a digital camera or video recorder), any other handheld or fixed location computing devices, or a combination of any of these devices. Note that while FIG. 1 illustrates various components of the computer system 11, it is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not germane. It will also be appreciated that network computers, handheld computers, cell phones and other data processing systems that have fewer components or perhaps more components may also be used. For example, the computer of FIG. 1 may be an Apple Macintosh computer or a PowerBook, or an IBM compatible PC. The computer system 11 includes a bus 13, an interconnect, or other communication mechanism for communicating information, and a processor 127, commonly in the form of an integrated circuit, coupled to the bus 13 for processing information and for executing the computer executable instructions. The computer system 11 also includes a main memory 125a, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to the bus 13 for storing information and instructions to be executed by the processor 127. The main memory 125a also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 127.
The computer system 11 further includes a Read Only Memory (ROM) 125b (or other non-volatile memory) or other static storage device coupled to the bus 13 for storing static information and instructions for the processor 127. A storage device 125c, that may be a magnetic disk or optical disk, such as a hard disk drive (HDD) for reading from and writing to a hard disk, a magnetic disk drive for reading from and writing to a magnetic disk, and/or an optical disk drive (such as DVD) for reading from and writing to a removable optical disk, is coupled to the bus 13 for storing information and instructions. The hard disk drive, magnetic disk drive, and optical disk drive may be connected to the system bus 13 by a hard disk drive interface, a magnetic disk drive interface, and an optical disk drive interface, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the general-purpose computing devices. Typically, the computer system 11 includes an Operating System (OS) stored in a non-volatile storage 125b for managing the computer resources and provides the applications and programs with an access to the computer resources and interfaces. An operating system commonly processes system data and user input, and responds by allocating and managing tasks and internal system resources, such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating networking and managing files. Non-limiting examples of operating systems are Microsoft Windows, Mac OS X, and Linux.
The computer system 11 may be coupled via the bus 13 to a display 17, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a flat screen monitor, a touch screen monitor or similar means for displaying text and graphical data to a user. The display 17 may be connected via a video adapter for supporting the display. The display 17 allows a user to view, enter, and/or edit information that is relevant to the operation of the system 10. An input device 18, including alphanumeric and other keys, is coupled to the bus 13 for communicating information and command selections to the processor 127. Another type of user input device is a cursor control 19, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to the processor 127 and for controlling cursor movement on the display 17. This cursor control 19 typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
The computer system 11 may be used for implementing the methods and techniques described herein. According to one embodiment, these methods and techniques are performed by the computer system 11 in response to the processor 127 executing one or more sequences of one or more instructions contained in the main memory 125a. Such instructions may be read into the main memory 125a from another computer-readable medium, such as the storage device 125c. Execution of the sequences of instructions contained in the main memory 125a causes the processor 127 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the arrangement. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term “processor” is used herein to include, but not limited to, any integrated circuit or any other electronic device (or collection of electronic devices) capable of performing an operation on at least one instruction, including, without limitation, a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. A processor, such as the processor 127, may further be a Reduced Instruction Set Core (RISC) processor, a Complex Instruction Set Computing (CISC) microprocessor, a Microcontroller Unit (MCU), or a CISC-based Central Processing Unit (CPU). The hardware of the processor 127 may be integrated onto a single substrate (e.g., silicon “die”), or distributed among two or more substrates. Furthermore, various functional aspects of a processor 127 may be implemented solely as a software (or a firmware) associated with the processor 127.
The terms “memory” and “storage” are used interchangeably herein and refer to any physical component that can retain or store information (that can be later retrieved) such as digital data on a temporary or permanent basis, typically for use in a computer or other digital electronic device. A memory can store computer programs or any other sequence of computer readable instructions, or data, such as files, text, numbers, audio and video, as well as any other form of information represented as a string or structure of bits or bytes. The physical means of storing information may be electrostatic, ferroelectric, magnetic, acoustic, optical, chemical, electronic, electrical, or mechanical. A memory may be in a form of an Integrated Circuit (IC, a.k.a. chip or microchip). Alternatively or in addition, a memory may be in the form of a packaged functional assembly of electronic components (module). Such a module may be based on a Printed Circuit Board (PCB) such as a PC Card according to the Personal Computer Memory Card International Association (PCMCIA) 2.0 standard, or a Single In-line Memory Module (SIMM) or a Dual In-line Memory Module (DIMM), standardized under the JEDEC JESD-21C standard. Further, a memory may be in the form of a separately rigidly enclosed box such as an external Hard-Disk Drive (HDD). Capacity of a memory is commonly expressed in bytes (B), where the prefix ‘K’ is used to denote kilo = 2^10 = 1024^1 = 1,024, the prefix ‘M’ is used to denote mega = 2^20 = 1024^2 = 1,048,576, the prefix ‘G’ is used to denote giga = 2^30 = 1024^3 = 1,073,741,824, and the prefix ‘T’ is used to denote tera = 2^40 = 1024^4 = 1,099,511,627,776.
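As an illustration of the binary prefixes defined above, the following minimal Python sketch converts between prefixed capacities and byte counts. The names PREFIXES, to_bytes, and format_capacity are hypothetical and chosen for this example only.

```python
# Binary multipliers as defined in the text: each prefix is a power of 1024.
PREFIXES = {"K": 1024**1, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(value, prefix):
    """Convert a prefixed capacity, e.g. (4, 'G'), into a byte count."""
    return value * PREFIXES[prefix]


def format_capacity(n_bytes):
    """Render a byte count with the largest prefix that keeps the value >= 1."""
    for prefix in ("T", "G", "M", "K"):
        if n_bytes >= PREFIXES[prefix]:
            return f"{n_bytes / PREFIXES[prefix]:.2f} {prefix}B"
    return f"{n_bytes} B"


if __name__ == "__main__":
    print(to_bytes(1, "G"))        # 1073741824
    print(format_capacity(1536))   # 1.50 KB
```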
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the processor 127 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the computer system 11 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal and appropriate circuitry can place the data on the bus 13. The bus 13 carries the data to the main memory 125a, from which the processor 127 retrieves and executes the instructions. The instructions received by the main memory 125a may optionally be stored on the storage device 125c either before or after execution by the processor 127.
The computer system 11 commonly includes a communication interface 129 coupled to the bus 13. The communication interface 129 provides a two-way data communication coupling to a network link 128 that is connected to a Local Area Network (LAN) 24. For example, the communication interface 129 may be an Integrated Services Digital Network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another non-limiting example, the communication interface 129 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. For example, an Ethernet connection based on the IEEE 802.3 standard may be used, such as 10/100BaseT, 1000BaseT (gigabit Ethernet), 10 gigabit Ethernet (10GE or 10 GbE or 10 GigE per the IEEE Std. 802.3ae-2002 standard), 40 Gigabit Ethernet (40 GbE), or 100 Gigabit Ethernet (100 GbE as per Ethernet standard IEEE P802.3ba). These technologies are described in Cisco Systems, Inc. Publication number 1-587005-001-3 (June 1999), “Internetworking Technologies Handbook”, Chapter 7: “Ethernet Technologies”, pages 7-1 to 7-38, which is incorporated in its entirety for all purposes as if fully set forth herein. In such a case, the communication interface 129 typically includes a LAN transceiver or a modem, such as the Standard Microsystems Corporation (SMSC) LAN91C111 10/100 Ethernet transceiver, described in the Standard Microsystems Corporation (SMSC) data-sheet “LAN91C111 10/100 Non-PCI Ethernet Single Chip MAC+PHY” Data-Sheet, Rev. 15 (Feb. 20, 2004), which is incorporated in its entirety for all purposes as if fully set forth herein.
An Internet Service Provider (ISP) 26 is an organization that provides services for accessing, using, or participating in the Internet 22. The Internet Service Provider 26 may be organized in various forms, such as commercial, community-owned, non-profit, or otherwise privately owned. Internet services, typically provided by ISPs, include Internet access, Internet transit, domain name registration, web hosting, and colocation. Various ISP Structures are described in Chapter 2: “Structural Overview of ISP Networks” of the book entitled: “Guide to Reliable Internet Services and Applications”, by Robert D. Doverspike, K. K. Ramakrishnan, and Chris Chase, published 2010 (ISBN: 978-1-84882-827-8), which is incorporated in its entirety for all purposes as if fully set forth herein.
An arrangement 20 of a computer system connected to the Internet 22 is shown in FIG. 2. A computer system or a workstation 27 is shown, including a main unit box 28, which encloses a motherboard on which the processor 127 and the memories 125a, 125b, and 125c are typically mounted. The workstation 27 includes a keyboard 212 (corresponding to the input device 18), a printer 211, a computer mouse (corresponding to the cursor control 19), and a display 29 (corresponding to the display 17). FIG. 2 illustrates various devices connected via the Internet 22, such as client device #1 21a, client device #2 21b, data server #1 23a, data server #2 23b, and the workstation 27, connected to the Internet 22 via the router or gateway 25 and the ISP 26.
Operating system. An Operating System (OS) is software that manages computer hardware resources and provides common services for computer programs. The operating system is an essential component of any system software in a computer system, and most application programs usually require an operating system to function. For hardware functions such as input/output and memory allocation, the operating system acts as an intermediary between programs and the computer hardware, although the application code is usually executed directly by the hardware and will frequently make a system call to an OS function or be interrupted by it. Common features typically supported by operating systems include process management, interrupts handling, memory management, file system, device drivers, networking (such as TCP/IP and UDP), and Input/Output (I/O) handling. Examples of popular modern operating systems include Android, BSD, iOS, Linux, OS X, QNX, Microsoft Windows, Windows Phone, and IBM z/OS.
A camera 30 shown in FIG. 3 may be a digital still camera, which converts a captured image into an electric signal upon a specific control, or can be a video camera, wherein the conversion of captured images to the electric signal is continuous (e.g., 24 frames per second). The camera 30 is preferably a digital camera, wherein the video or still images are converted using an electronic image sensor 32. The digital camera 30 includes a lens 31 (or a few lenses) for focusing the received light onto the small semiconductor image sensor 32. The image sensor 32 commonly includes a panel with a matrix of tiny light-sensitive diodes (photocells), converting the image light to electric charges and then to electric signals, thus creating a video picture or a still image by recording the light intensity. Charge-Coupled Device (CCD) and Complementary Metal-Oxide-Semiconductor (CMOS) sensors are commonly used as the light-sensitive elements. Linear or area arrays of light-sensitive elements may be used, and the light sensitive sensors may support monochrome (black & white), color or both. For example, the CCD sensor KAI-2093 Image Sensor 1920 (H)×1080 (V) Interline CCD Image Sensor or KAF-50100 Image Sensor 8176 (H)×6132 (V) Full-Frame CCD Image Sensor can be used, available from Image Sensor Solutions, Eastman Kodak Company, Rochester, N.Y.
An image processor block 33 receives the analog signal from the image sensor 32. The Analog Front End (AFE) in the block 33 filters, amplifies, and digitizes the signal, using an analog-to-digital (A/D) converter. The AFE further provides Correlated Double Sampling (CDS), and provides a gain control to accommodate varying illumination conditions. In the case of a CCD-based sensor 32, a CCD AFE (Analog Front End) component may be used between the digital image processor 33 and the sensor 32. Such an AFE may be based on VSP2560 ‘CCD Analog Front End for Digital Cameras’ from Texas Instruments Incorporated of Dallas, Tex., U.S.A. The block 33 further contains a digital image processor, which receives the digital data from the AFE, and processes this digital representation of the image to handle various industry-standards, and to execute various computations and algorithms. Preferably, additional image enhancements may be performed by the block 33 such as generating greater pixel density or adjusting color balance, contrast, and luminance. Further, the block 33 may perform other data management functions and processing on the raw digital image data. Commonly, the timing relationship of the vertical/horizontal reference signals and the pixel clock are also handled in this block. Digital Media System-on-Chip device TMS320DM357 from Texas Instruments Incorporated of Dallas, Tex., U.S.A. is an example of a device implementing in a single chip (and associated circuitry) part or all of the image processor 33, part or all of a video compressor 34 and part or all of a transceiver 35. In addition to a lens or lens system, color filters may be placed between the imaging optics and the photosensor array 32 to achieve desired color manipulation.
The processing block 33 converts the raw data received from the photosensor array 32 (which can be any internal camera format, including before or after Bayer translation) into a color-corrected image in a standard image file format. The camera 30 further comprises a connector 39a, and a transmitter or a transceiver 35 is disposed between the connector 39a and the image processor 33. The transceiver 35 also includes isolation magnetic components (e.g. transformer-based), balancing, surge protection, and other suitable components required for providing a proper and standard interface via the connector 39a. In the case of connecting to a wired medium, the connector 39a further contains protection circuitry for accommodating transients, over-voltage and lightning, and any other protection means for reducing or eliminating the damage from an unwanted signal over the wired medium. A band pass filter may also be used for passing only the required communication signals, and rejecting or stopping other signals in the described path. A transformer may be used for isolating and reducing common-mode interferences. Further, a wiring driver and wiring receivers may be used in order to transmit and receive the appropriate level of signal to and from the wired medium. An equalizer may also be used in order to compensate for any frequency dependent characteristics of the wired medium.
Other image processing functions performed by the image processor 33 may include adjusting color balance, gamma and luminance, filtering pattern noise, filtering noise using Wiener filter, changing zoom factors, recropping, applying enhancement filters, applying smoothing filters, applying subject-dependent filters, and applying coordinate transformations. Other enhancements in the image data may include applying mathematical algorithms to generate greater pixel density or adjusting color balance, contrast and/or luminance.
The image processing may further include an algorithm for motion detection by comparing the current image with a reference image and counting the number of different pixels, where the image sensor 32 or the digital camera 30 is assumed to be in a fixed location and thus assumed to capture the same image. Since images naturally differ due to factors such as varying lighting, camera flicker, and CCD dark currents, pre-processing is useful to reduce the number of false positive alarms. More complex algorithms are necessary to detect motion when the camera itself is moving, or when the motion of a specific object must be detected in a field containing other movement that can be ignored.
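The pixel-counting comparison described above may be sketched as follows, assuming grayscale frames represented as nested lists of integers. The function names and the threshold and min_changed values are hypothetical, illustrative choices; the per-pixel threshold stands in for the pre-processing that suppresses small differences caused by lighting changes or sensor noise.

```python
def count_changed_pixels(reference, current, threshold=10):
    """Count pixels whose grayscale value differs from the reference
    by more than `threshold` (an illustrative noise tolerance)."""
    changed = 0
    for ref_row, cur_row in zip(reference, current):
        for ref_px, cur_px in zip(ref_row, cur_row):
            if abs(ref_px - cur_px) > threshold:
                changed += 1
    return changed


def motion_detected(reference, current, threshold=10, min_changed=2):
    """Report motion when at least `min_changed` pixels have changed."""
    return count_changed_pixels(reference, current, threshold) >= min_changed


if __name__ == "__main__":
    reference = [[100, 100, 100],
                 [100, 100, 100],
                 [100, 100, 100]]
    # One pixel shows small noise (102); two pixels changed substantially (200).
    current = [[102, 100, 100],
               [100, 200, 200],
               [100, 100, 100]]
    print(count_changed_pixels(reference, current))  # 2
    print(motion_detected(reference, current))       # True
```

Such a comparison is only meaningful under the fixed-camera assumption stated above.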
The image processing may further include video enhancement such as video denoising, image stabilization, unsharp masking, and super-resolution. Further, the image processing may include a Video Content Analysis (VCA), where the video content is analyzed to detect and determine temporal events based on multiple images, and is commonly used for entertainment, healthcare, retail, automotive, transport, home automation, safety and security. The VCA functionalities include Video Motion Detection (VMD), video tracking, and egomotion estimation, as well as identification, behavior analysis, and other forms of situation awareness. A dynamic masking functionality involves blocking a part of the video signal based on the signal itself, for example because of privacy concerns. The egomotion estimation functionality involves determining the location of a camera or estimating the camera motion relative to a rigid scene, by analyzing its output signal. Motion detection is used to determine the presence of a relevant motion in the observed scene, while object detection is used to determine the presence of a type of object or entity, for example a person or car, as well as fire and smoke detection. Similarly, face recognition and Automatic Number Plate Recognition may be used to recognize, and therefore possibly identify, persons or cars. Tamper detection is used to determine whether the camera or the output signal is tampered with, and video tracking is used to determine the location of persons or objects in the video signal, possibly with regard to an external reference grid. A pattern is defined as any form in an image having discernible characteristics that provide a distinctive identity when contrasted with other forms.
Pattern recognition may also be used, for ascertaining differences, as well as similarities, between patterns under observation and partitioning the patterns into appropriate categories based on these perceived differences and similarities; and may include any procedure for correctly identifying a discrete pattern, such as an alphanumeric character, as a member of a predefined pattern category. Further, the video or image processing may use, or be based on, the algorithms and techniques disclosed in the book entitled: “Handbook of Image & Video Processing”, edited by Al Bovik, by Academic Press, ISBN: 0-12-119790-5, which is incorporated in its entirety for all purposes as if fully set forth herein.
A controller 37, located within the camera device or module 30, may be based on a discrete logic or an integrated device, such as a processor, microprocessor or microcomputer, and may include a general-purpose device or may be a special purpose processing device, such as an ASIC, PAL, PLA, PLD, Field Programmable Gate Array (FPGA), Gate Array, or other customized or programmable device. In the case of a programmable device as well as in other implementations, a memory is required. The controller 37 commonly includes a memory that may include a static RAM (random Access Memory), dynamic RAM, flash memory, ROM (Read Only Memory), or any other data storage medium. The memory may include data, programs, and/or instructions and any other software or firmware executable by the processor. Control logic can be implemented in hardware or in software, such as a firmware stored in the memory. The controller 37 controls and monitors the device operation, such as initialization, configuration, interface, and commands. The term “processor” is meant to include any integrated circuit or other electronic device (or collection of devices) capable of performing an operation on at least one instruction including, without limitation, reduced instruction set core (RISC) processors, CISC microprocessors, microcontroller units (MCUs), CISC-based central processing units (CPUs), and digital signal processors (DSPs). The hardware of such devices may be integrated onto a single substrate (e.g., silicon “die”), or distributed among two or more substrates. Furthermore, various functional aspects of the processor may be implemented solely as software or firmware associated with the processor.
The digital camera device or module 30 requires power for its described functions such as for capturing, storing, manipulating, and transmitting the image. A dedicated power source may be used such as a battery or a dedicated connection to an external power source via connector 39b. The power supply 38 contains a DC/DC converter. In another embodiment, the power supply 38 is fed from an AC power supply via a cord and an AC plug serving as the connector 39b, and thus may include an AC/DC converter, for converting the AC power (commonly 115 VAC/60 Hz or 220 VAC/50 Hz) into the required DC voltage or voltages. Such power supplies are known in the art and typically involve converting 120 or 240 volt AC supplied by a power utility company to a well-regulated lower voltage DC for electronic devices. In one embodiment, the power supply 38 is integrated into a single device or circuit, in order to share common circuits. Further, the power supply 38 may include a boost converter, such as a buck boost converter, charge pump, inverter and regulators as known in the art, as required for conversion of one form of electrical power to another desired form and voltage. While the power supply 38 (either separated or integrated) can be an integral part and housed within the camera 30 enclosure, it may be enclosed as a separate housing connected via cable to the camera 30 assembly. For example, a small outlet plug-in step-down transformer shape can be used (also known as wall-wart, “power brick”, “plug pack”, “plug-in adapter”, “adapter block”, “domestic mains adapter”, “power adapter”, or AC adapter). Further, the power supply 38 may be a linear or switching type.
Various formats that can be used to represent the captured image are TIFF (Tagged Image File Format), RAW format, AVI, DV, MOV, WMV, MP4, DCF (Design Rule for Camera Format), ITU-T H.261, ITU-T H.263, ITU-T H.264, ITU-T CCIR 601, ASF, Exif (Exchangeable Image File Format), and DPOF (Digital Print Order Format) standards. In many cases, video data is compressed before transmission, in order to allow its transmission over a reduced bandwidth transmission system. A video compressor 34 (or video encoder) is shown in FIG. 3 disposed between the image processor 33 and the transceiver 35, allowing for compression of the digital video signal before its transmission over a cable or over-the-air. In some cases, compression may not be required, hence obviating the need for such compressor 34. Such compression can be lossy or lossless types. Common compression algorithms are JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group). The above and other image or video compression techniques can make use of intraframe compression commonly based on registering the differences between part of single frame or a single image. Interframe compression can further be used for video streams, based on registering differences between frames. Other examples of image processing include run length encoding and delta modulation. Further, the image can be dynamically dithered to allow the displayed image to appear to have higher resolution and quality.
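Run length encoding, mentioned above, can be illustrated by the following minimal Python sketch, which replaces each run of identical pixel values with a (value, count) pair. The function names are hypothetical, and the sketch is a generic lossless scheme rather than the encoding used by any specific standard named in this section.

```python
def rle_encode(data):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 7]
    encoded = rle_encode(row)
    print(encoded)                      # [(0, 3), (255, 2), (7, 1)]
    print(rle_decode(encoded) == row)   # True
```

The scheme is lossless, and it reduces size only when the data contains long runs of equal values.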
The single lens or a lens array 31 is positioned to collect optical energy representative of a subject or a scenery, and to focus the optical energy onto the photosensor array 32. Commonly, the photosensor array 32 is a matrix of photosensitive pixels, which generates an electric signal that is a representative of the optical energy directed at the pixel by the imaging optics.
While the digital camera 30 has been exampled above with regard to capturing a single image using the single lens 31 and the single sensor 32, it is apparent that multiple images can be equally considered, using multiple image capturing mechanisms. An example of two capturing mechanisms is shown for a digital camera 40 shown in FIG. 4. Lenses 31a and 31b are respectively associated with sensors 32a and 32b, which in turn respectively connect to image processors 33a and 33b. In the case where a compression function is used, video compressors 34a and 34b, respectively, compress the data received from the processors 33a and 33b. In one embodiment, two transceivers (each the same as the transceiver 35, for example) and two ports (each of the same type as port 39a, for example) are used. Further, two communication mediums (each similar to or the same as described above) can be employed, each carrying the image corresponding to the respective lens. Further, the same medium can be used using Frequency Division/Domain Multiplexing (FDM). In such an environment, each signal is carried in a dedicated frequency band, distinct from the other signals concurrently carried over the same medium. The signals are combined onto the medium and separated from the medium using various filtering schemes, employed in a multiplexer 41. In another embodiment, the multiple images are carried using Time Domain/Division Multiplexing (TDM). The digital data stream from the video compressors 34a and 34b is multiplexed into a single stream by the multiplexer 41, serving as a time multiplexer. The combined signal is then fed to the single transceiver 35 for transmitting onto the medium. Using two or more image-capturing components can further be used to provide stereoscopic video, allowing 3-D or any other stereoscopic view of the content, or other methods of improving the displayed image quality or functionality.
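The time multiplexing of two compressed streams into a single stream may be sketched as below. This is a simplified illustration that assumes two equal-length streams and fixed alternating time slots; the function names are hypothetical, and an actual multiplexer 41 would additionally need framing and synchronization, which are not modeled here.

```python
def tdm_multiplex(stream_a, stream_b):
    """Interleave two equal-length streams slot by slot: a0, b0, a1, b1, ..."""
    if len(stream_a) != len(stream_b):
        raise ValueError("this sketch assumes equal-length streams")
    combined = []
    for unit_a, unit_b in zip(stream_a, stream_b):
        combined.append(unit_a)   # even slots carry the first stream
        combined.append(unit_b)   # odd slots carry the second stream
    return combined


def tdm_demultiplex(combined):
    """Recover both streams from their fixed alternating slots."""
    return combined[0::2], combined[1::2]


if __name__ == "__main__":
    left = ["L0", "L1", "L2"]
    right = ["R0", "R1", "R2"]
    line = tdm_multiplex(left, right)
    print(line)                    # ['L0', 'R0', 'L1', 'R1', 'L2', 'R2']
    print(tdm_demultiplex(line))   # (['L0', 'L1', 'L2'], ['R0', 'R1', 'R2'])
```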
A prior art example of a portable electronic camera connectable to a computer is disclosed in U.S. Pat. No. 5,402,170 to Parulski et al. entitled: “Hand-Manipulated Electronic Camera Tethered to a Personal Computer”. A digital electronic camera which can accept various types of input/output cards or memory cards is disclosed in U.S. Pat. No. 7,432,952 to Fukuoka entitled: “Digital Image Capturing Device having an Interface for Receiving a Control Program”, and the use of a disk drive assembly for transferring images out of an electronic camera is disclosed in U.S. Pat. No. 5,138,459 to Roberts et al., entitled: “Electronic Still Video Camera with Direct Personal Computer (PC) Compatible Digital Format Output”, which are all incorporated in their entirety for all purposes as if fully set forth herein. A camera with human face detection means is disclosed in U.S. Pat. No. 6,940,545 to Ray et al., entitled: “Face Detecting Camera and Method”, and in U.S. Patent Application Publication No. 2012/0249768 to Binder entitled: “System and Method for Control Based on Face or Hand Gesture Detection”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
Face detection (also known as face localization) includes algorithms for identifying a group of pixels within a digitally acquired image that relates to the existence, locations and sizes of human faces. Common face-detection algorithms focus on the detection of frontal human faces, while other algorithms attempt to solve the more general and difficult problem of multi-view face detection, that is, the detection of faces that are either rotated along the axis from the face to the observer (in-plane rotation), or rotated along the vertical or left-right axis (out-of-plane rotation), or both. Various face detection techniques and devices (e.g. cameras) having face detection features are disclosed in U.S. Pat. RE33,682, RE31,370, U.S. Pat. Nos. 4,047,187, 4,317,991, 4,367,027, 4,638,364, 5,291,234, 5,386,103, 5,488,429, 5,638,136, 5,642,431, 5,710,833, 5,724,456, 5,781,650, 5,812,193, 5,818,975, 5,835,616, 5,870,138, 5,978,519, 5,987,154, 5,991,456, 6,097,470, 6,101,271, 6,128,397, 6,148,092, 6,151,073, 6,188,777, 6,192,149, 6,249,315, 6,263,113, 6,268,939, 6,282,317, 6,301,370, 6,332,033, 6,393,148, 6,404,900, 6,407,777, 6,421,468, 6,438,264, 6,456,732, 6,459,436, 6,473,199, 6,501,857, 6,504,942, 6,504,951, 6,516,154, 6,526,161, 6,940,545, 7,110,575, 7,315,630, 7,317,815, 7,466,844, 7,466,866 and 7,508,961, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Image. A digital image is a numeric representation (normally binary) of a two-dimensional image. Depending on whether the image resolution is fixed, it may be of a vector or raster type. Raster images have a finite set of digital values, called picture elements or pixels. The digital image contains a fixed number of rows and columns of pixels, which are the smallest individual element in an image, holding quantized values that represent the brightness of a given color at any specific point. Typically, the pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers, where these values are commonly transmitted or stored in a compressed form. The raster images can be created by a variety of input devices and techniques, such as digital cameras, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. Common image formats include GIF, JPEG, and PNG.
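As a minimal sketch of the raster representation described above, a grayscale image can be held as a two-dimensional array (rows by columns) of small integers, each a quantized brightness value. The helper names and the choice of 256 levels (8 bits per pixel) are assumptions made for this illustration.

```python
def make_raster(width, height, fill=0):
    """Create a raster as a list of `height` rows, each holding `width` pixels."""
    return [[fill for _ in range(width)] for _ in range(height)]


def quantize(brightness, levels=256):
    """Map a brightness in [0.0, 1.0] to an integer in 0..levels-1."""
    clamped = min(max(brightness, 0.0), 1.0)
    return min(int(clamped * levels), levels - 1)


if __name__ == "__main__":
    image = make_raster(width=4, height=3)
    image[1][2] = quantize(0.5)       # row 1, column 2
    print(len(image), len(image[0]))  # 3 4
    print(image[1][2])                # 128
```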
The Graphics Interchange Format (better known by its acronym GIF) is a bitmap image format that supports up to 8 bits per pixel for each image, allowing a single image to reference its palette of up to 256 different colors chosen from the 24-bit RGB color space. It also supports animations and allows a separate palette of up to 256 colors for each frame. GIF images are compressed using the Lempel-Ziv-Welch (LZW) lossless data compression technique to reduce the file size without degrading the visual quality. The GIF (GRAPHICS INTERCHANGE FORMAT) Standard Version 89a is available from www.w3.org/Graphics/GIF/spec-gif89a.txt.
JPEG (seen most often with the .jpg or .jpeg filename extension) is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality, and JPEG typically achieves 10:1 compression with little perceptible loss in image quality. JPEG/Exif is the most common image format used by digital cameras and other photographic image capture devices, along with JPEG/JFIF. The term “JPEG” is an acronym for the Joint Photographic Experts Group, which created the standard. JPEG/JFIF supports a maximum image size of 65535×65535 pixels, corresponding to between one and four gigapixels (one gigapixel being 1000 megapixels), depending on the aspect ratio (from panoramic 3:1 to square). JPEG is standardized as ISO/IEC 10918-1:1994 entitled: “Information technology—Digital compression and coding of continuous-tone still images: Requirements and guidelines”.
Portable Network Graphics (PNG) is a raster graphics file format that supports lossless data compression. It was created as an improved replacement for the Graphics Interchange Format (GIF), and is a commonly used lossless image compression format on the Internet. PNG supports palette-based images (with palettes of 24-bit RGB or 32-bit RGBA colors), grayscale images (with or without alpha channel), and full-color non-palette-based RGB images (with or without alpha channel). PNG was designed for transferring images on the Internet, not for professional-quality print graphics, and, therefore, does not support non-RGB color spaces such as CMYK. PNG was published as the ISO/IEC 15948:2004 standard entitled: “Information technology—Computer graphics and image processing—Portable Network Graphics (PNG): Functional specification”.
Metadata. The term “metadata”, as used herein, refers to data that describes characteristics, attributes, or parameters of other data, in particular, files (such as program files) and objects. Such data is typically structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource. Metadata typically includes structural metadata, relating to the design and specification of data structures or “data about the containers of data”; and descriptive metadata about individual instances of application data or the data content. Metadata may include the means of creation of the data, the purpose of the data, time and date of creation, the creator or author of the data, the location on a computer network where the data were created, and the standards used.
For example, metadata associated with a computer word processing file might include the title of the document, the name of the author, the company to whom the document belongs, the dates that the document was created and last modified, keywords which describe the document, and other descriptive data. While some of this information may also be included in the document itself (e.g., title, author, and date), metadata may be a separate collection of data that may be stored separately from, but associated with, the actual document. One common format for documenting metadata is eXtensible Markup Language (XML). XML provides a formal syntax, which supports the creation of arbitrary descriptions, sometimes called “tags.” An example of a metadata entry might be <title>War and Peace</title>, where the bracketed words delineate the beginning and end of the group of characters that constitute the title of the document that is described by the metadata. In the example of the word processing file, the metadata (sometimes referred to as “document properties”) is entered manually by the author, the editor, or the document manager. The metadata concept is further described in a National Information Standards Organization (NISO) Booklet entitled: “Understanding Metadata” (ISBN: 1-880124-62-9), in the IETF RFC 5013 entitled: “The Dublin Core Metadata Element Set”, and in the IETF RFC 2731 entitled: “Encoding Dublin Core Metadata in HTML”, which are all incorporated in their entirety for all purposes as if fully set forth herein. An extraction of metadata from files or objects is described in U.S. Pat. No. 8,700,626 to Bedingfield, entitled: “Systems, Methods and Computer Products for Content-Derived Metadata”, and in U.S. Patent Application Publication 2012/0278705 to Yang et al., entitled: “System and Method for Automatically Extracting Metadata from Unstructured Electronic Documents”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
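By way of a non-limiting illustration, the tag-delimited metadata entry described above may be read programmatically. The following is a minimal sketch that assumes a hypothetical record layout: the enclosing `metadata` element and the `author` and `keyword` elements are illustrative assumptions, and only the `title` entry is taken from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record. Only the <title> entry appears in the text;
# the enclosing <metadata> element and the other fields are assumptions.
RECORD = """<metadata>
  <title>War and Peace</title>
  <author>Leo Tolstoy</author>
  <keyword>novel</keyword>
  <keyword>history</keyword>
</metadata>"""


def parse_metadata(xml_text):
    """Return a dict mapping each tag name to the list of its text values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root:
        # A tag may repeat (e.g., several keywords), so values are collected in lists.
        fields.setdefault(element.tag, []).append(element.text)
    return fields


metadata = parse_metadata(RECORD)
print(metadata["title"][0])
print(metadata["keyword"])
```

Collecting the values in lists is one possible design choice that preserves repeated entries such as multiple keywords.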
Metadata can be stored either internally in the same file, object, or structure as the data (this is also called internal or embedded metadata), or externally in a separate file or field separated from the described data. A data repository typically stores the metadata detached from the data, but can be designed to support embedded metadata approaches. Metadata can be stored in either human-readable or binary form. Storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. However, these formats are rarely optimized for storage capacity, communication time, and processing speed. A binary metadata format enables efficiency in all these respects, but requires special libraries to convert the binary information into human-readable content.
Tag. A tag is a type of metadata in the form of a non-hierarchical keyword or term that is assigned to a digital image, describes the image, and allows the image to be found again by browsing or searching. Tags may be chosen informally and personally by the item's creator or by its viewer, depending on the system.
Color space. A color space is a specific organization of colors, allowing for reproducible representations of color, in both analog and digital representations. A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers (e.g., triples in RGB or quadruples in CMYK). When defining a color space, the usual reference standard is the CIELAB or the CIEXYZ color space, each of which was specifically designed to encompass all colors the average human can see. Colors are commonly created in printing with color spaces based on the CMYK color model, using the subtractive primary colors of pigment (Cyan (C), Magenta (M), Yellow (Y), and Black (K)). To create a three-dimensional representation of a given color space, the amount of magenta color may be assigned to the representation's X axis, the amount of cyan to its Y axis, and the amount of yellow to its Z axis. The resulting 3-D space provides a unique position for every possible color that can be created by combining those three pigments. Colors are typically created on computer monitors with color spaces based on the RGB color model, using the additive primary colors (red, green, and blue). A three-dimensional representation would assign each of the three colors to the X, Y, and Z axes. Popular color models include the RGB, CMYK, HSL, YUV, YCbCr, and YPbPr color formats.
Color spaces and the various color space models are described in an article by Marko Tkalcic and Jurij F. Tasic of the University of Ljubljana, Slovenia entitled: “Colour spaces—perceptual, historical and applicational background”, and in the article entitled: “Color Space Basics” by Andrew Oran and Vince Roth, published May 2012, Issue 4 of the journal ‘The Tech Review’ by the Association of Moving Image Archivists, which are both incorporated in their entirety for all purposes as if fully set forth herein. Conversions between color spaces or models are described in an article entitled: “Colour Space Conversions” by Adrian Ford and Alan Roberts (Aug. 11, 1998), and in an article by Philippe Colantoni et al. dated 2004, entitled: “Color Space Transformations”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
A color space maps a range of physically produced colors (from mixed light, pigments, etc.) to an objective description of color sensations registered in the eye, typically in terms of tristimulus values, but not usually in the LMS space defined by the cone spectral sensitivities. The tristimulus values associated with a color space can be conceptualized as amounts of three primary colors in a tri-chromatic additive color model. In some color spaces, including LMS and XYZ spaces, the primary colors used are not real colors, in the sense that they cannot be generated with any light spectrum.
CIE color space. The CIE 1931 color spaces were created by the International Commission on Illumination (CIE) in 1931 and include the CIE 1931 RGB and the CIE 1931 XYZ color spaces, from which the later CIELUV and CIEUVW color models are derived. When judging the relative luminance (brightness) of different colors in well-lit situations, humans tend to perceive light within the green parts of the spectrum as brighter than red or blue light of equal power. The luminosity function that describes the perceived brightnesses of different wavelengths is thus roughly analogous to the spectral sensitivity of M cones. The CIE model capitalizes on this fact by defining Y as luminance. Z is quasi-equal to blue stimulation, or the S cone response, and X is a mix (a linear combination) of cone response curves chosen to be nonnegative. The XYZ tristimulus values are thus analogous to, but different from, the LMS cone responses of the human eye. Defining Y as luminance has the useful result that for any given Y value, the XZ plane contains all possible chromaticities at that luminance. CIE color space is described in a paper by Gernot Hoffmann entitled: “CIE Color Space”, which is incorporated in its entirety for all purposes as if fully set forth herein.
RGB color space. RGB is an abbreviation for Red-Green-Blue. An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity within the triangle defined by those primary colors. The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve. RGB describes what kind of light needs to be emitted to produce a given color: light is added together, starting from darkness, and individual values are stored for red, green, and blue. RGB is a color model, and there are many different RGB color spaces derived from this color model, such as RGBA, which is RGB with an additional channel, alpha, to indicate transparency. RGB color spaces are described in an article published by The BabelColor Company by Danny Pascale (Revised 2003 Oct. 6) entitled: “A Review of RGB Color Spaces . . . from xyY to R′G′B′”, and in an article by Sabine Susstrunk, Robert Buckley, and Steve Swen from the Laboratory of audio-visual Communication (EPFL) entitled: “Standard RGB Color Spaces”, which are both incorporated in their entirety for all purposes as if fully set forth herein. The RGB color spaces include the RGB, sRGB, Adobe RGB, Adobe Wide Gamut RGB, ProPhoto RGB, Apple RGB, ISO RGB, and ROMM RGB color spaces, as well as those of International Telecommunication Union (ITU) Radiocommunication Sector (ITU-R) Recommendations ITU-R BT.709 and ITU-R BT.2020.
Luma plus chroma/chrominance (YUV). Some color spaces are based on separating the component (Y) that represents the luma information, from the components (U+V, or I+Q) that represent the chrominance information. YUV is a color space typically used as part of a color image pipeline, where it encodes a color image or video taking human perception into account, allowing reduced bandwidth for chrominance components, thereby typically enabling transmission errors or compression artifacts to be more efficiently masked by the human perception than using a “direct” RGB-representation. Other color spaces have similar properties, and the main reason to implement or investigate properties of Y′UV would be for interfacing with analog or digital television or photographic equipment that conforms to certain Y′UV standards.
The Y′UV model defines a color space in terms of one luma (Y′) and two chrominance (UV) components. The Y′UV color model is used in the PAL and SECAM composite color video standards. Previous black-and-white systems used only luma (Y′) information. Color information (U and V) was added separately via a sub-carrier so that a black-and-white receiver would still be able to receive and display a color picture transmission in the receiver's native black-and-white format. Y′ stands for the luma component (the brightness) and U and V are the chrominance (color) components; luminance is denoted by Y and luma by Y′—the prime symbols (′) denote gamma compression, with “luminance” meaning perceptual (color science) brightness, while “luma” is electronic (voltage of display) brightness. The YPbPr color model used in analog component video and its digital version YCbCr used in digital video are more or less derived from it, and are sometimes called Y′UV. (CB/PB and CR/PR are deviations from grey on blue-yellow and red-cyan axes, whereas U and V are blue-luminance and red-luminance differences.) The Y′IQ color space used in the analog NTSC television broadcasting system is related to it, although in a more complex way. YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light intensity is nonlinearly encoded based on gamma corrected RGB primaries. Color models based on the YUV color space include YUV (used in PAL), YDbDr (used in SECAM), YIQ (used in NTSC), YCbCr (described in ITU-R BT.601, BT.709, and BT.2020), YPbPr, xvYCC, and YCgCo. 
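By way of a non-limiting illustration, the separation of a luma component from two color-difference components may be sketched as follows. The sketch assumes 8-bit gamma-corrected R′G′B′ input and the full-range coefficients derived from ITU-R BT.601 as commonly used with JPEG/JFIF files; other standards (e.g., BT.709) use different coefficients, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R'G'B' values to full-range 8-bit Y'CbCr.

    Assumption: BT.601-derived full-range coefficients (as used with JFIF).
    """
    # Luma is a weighted sum in which green contributes the most.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma components are scaled blue and red differences, offset to 128.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp to the 0-255 range of an 8-bit sample.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(255, 255, 255))
print(rgb_to_ycbcr(0, 0, 0))
```

Any neutral (gray) input yields Cb and Cr equal to 128, which reflects the fact that the two chroma components carry only the deviation from gray.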
The YUV family is further described in an article published in the International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 2, March-April 2012, pp. 152-156 by Gnanatheja Rakesh and Sreenivasulu Reddy of the University College of Engineering, Tirupati, entitled: “YCoCg color Image Edge detection”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Hue and saturation. HSL (Hue-Saturation-Lightness) and HSV (Hue-Saturation-Value) are the two most common cylindrical-coordinate representations of points in an RGB color model, and are commonly used today in color pickers, in image editing software, and in image analysis and computer vision. The two representations rearrange the geometry of RGB in an attempt to be more intuitive and perceptually relevant than the Cartesian (cube) representation, by mapping the values into a cylinder loosely inspired by a traditional color wheel. The angle around the central vertical axis corresponds to “hue” and the distance from the axis corresponds to “saturation”. These first two values give the two schemes the ‘H’ and ‘S’ in their names. The height corresponds to a third value, the system's representation of the perceived luminance in relation to the saturation.
Perceived luminance is a notoriously difficult aspect of color to represent in a digital format, and this has given rise to two systems attempting to solve this issue: HSL (L for lightness) and HSV or HSB (V for value or B for brightness). A third model, HSI (I for intensity), common in computer vision applications, attempts to balance the advantages and disadvantages of the other two systems. While typically consistent, these definitions are not standardized. HSV and HSL color models are described in an article by Darrin Cardanu entitled: “Adventures in HSV Space”, and in an article by Douglas A. Kerr (Issue 3, May 12, 2008) entitled: “The HSV and HSL Color Models and the Infamous Hexcones”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
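By way of a non-limiting illustration, the HSV and HSL representations of the same RGB color may be compared using the `colorsys` module of the Python standard library. The helper name `hue_saturation` and the returned dictionary keys are hypothetical. It is noted that `colorsys.rgb_to_hls` returns its components in the order hue, lightness, saturation.

```python
import colorsys


def hue_saturation(r, g, b):
    """Describe an 8-bit RGB color in both HSV and HSL terms."""
    # colorsys operates on floating-point components in the 0-1 range.
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    hue, hsv_saturation, value = colorsys.rgb_to_hsv(rn, gn, bn)
    # Note the H, L, S ordering of the returned tuple.
    _, lightness, hsl_saturation = colorsys.rgb_to_hls(rn, gn, bn)
    return {
        "hue_degrees": hue * 360.0,  # angle around the cylinder axis
        "hsv_saturation": hsv_saturation,
        "value": value,
        "hsl_saturation": hsl_saturation,
        "lightness": lightness,
    }


print(hue_saturation(255, 0, 0))      # pure red
print(hue_saturation(255, 128, 128))  # a pale red
```

The two models share the same hue but generally report different saturation values for the same color, since each defines saturation relative to its own third coordinate.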
Optical flow. Optical flow or optic flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Motion estimation and video compression schemes have developed as a major aspect of the optical flow research, and typically use gradient-based optical flow estimation. Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually from adjacent frames in a video sequence. The motion vectors may be represented by a translational model or by many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom. Optical flow techniques are described in an article by David J. Fleet and Yair Weiss entitled: “Optical Flow Estimation”, and in an article by J. L. Barron, D. J. Fleet, and S. S. Beauchemin entitled: “Performance of Optical Flow Techniques”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
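By way of a non-limiting illustration, a motion vector under a purely translational model may be estimated by exhaustive block matching, as sketched below. It is noted that this is a simpler technique than the gradient-based estimation mentioned above and is used here only because it is compact; the frames are assumed to be grayscale images held as lists of rows, and all function names are hypothetical.

```python
def block_cost(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in prev and a shifted block in curr."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(prev[top + y][left + x] - curr[top + dy + y][left + dx + x])
    return total


def estimate_motion(prev, curr, top, left, size, search):
    """Return the (dy, dx) shift within +/-search that best matches the block."""
    height, width = len(curr), len(curr[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate positions that fall outside the current frame.
            if top + dy < 0 or left + dx < 0:
                continue
            if top + dy + size > height or left + dx + size > width:
                continue
            cost = block_cost(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector


def make_frame(height, width, top, left, size=2, value=200):
    """Build a dark synthetic frame containing one bright square patch."""
    frame = [[0] * width for _ in range(height)]
    for y in range(size):
        for x in range(size):
            frame[top + y][left + x] = value
    return frame


previous_frame = make_frame(8, 8, top=2, left=2)
current_frame = make_frame(8, 8, top=3, left=4)  # patch moved down 1, right 2
print(estimate_motion(previous_frame, current_frame, top=2, left=2, size=2, search=2))
```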
Digital photography is described in an article by Robert Berdan (downloaded from www.canadianphotographer.com) entitled: “Digital Photography Basics for Beginners”, and in a guide published in April 2004 by Que Publishing (ISBN 0-7897-3120-7) entitled: “Absolute Beginner's Guide to Digital Photography” authored by Joseph Ciaglia et al., which are both incorporated in their entirety for all purposes as if fully set forth herein.
Aperture. Camera aperture is the unit of measurement that defines the size of the opening in the lens (typically measured in f-stop), that can be adjusted to control the amount of light reaching the film or digital sensor, and determines the cone angle of a bundle of rays that come to a focus in the image plane. The aperture determines how collimated the admitted rays are, and if an aperture is narrow, then highly collimated rays are admitted, resulting in a sharp focus at the image plane, while if an aperture is wide, then uncollimated rays are admitted, resulting in a sharp focus only for rays with a certain focal length. Commonly, camera aperture refers to the diameter of the aperture stop rather than the physical stop or the opening itself. Most digital cameras provide automatic aperture control, which allows viewing and metering at the lens's maximum aperture, stops the lens down to the working aperture during exposure, and returns the lens to maximum aperture after exposure.
The aperture stop of a photographic lens can usually be adjusted to control the amount of light reaching the film or image sensor. In combination with variation of shutter speed, the aperture size regulates the image sensor degree of exposure to light. Typically, a fast shutter requires a larger aperture to ensure sufficient light exposure, and a slow shutter requires a smaller aperture to avoid excessive exposure. The lens aperture is usually specified as an f-number, the ratio of focal length to effective aperture diameter. A lens typically has a set of marked “f-stops” that the f-number can be set to. A lower f-number denotes a greater aperture opening which allows more light to reach the film or image sensor. The photography term “one f-stop” refers to a factor of √2 (approx. 1.41) change in f-number, which in turn corresponds to a factor of 2 change in light intensity. Typical ranges of apertures used in photography are about f/2.8-f/22 or f/2-f/16 covering 6 stops, which may be divided into wide, middle, and narrow of 2 stops each, roughly (using round numbers) f/2-f/4, f/4-f/8, and f/8-f/16 or (for a slower lens) f/2.8-f/5.6, f/5.6-f/11, and f/11-f/22.
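By way of a non-limiting illustration, the f-number arithmetic described above may be sketched as follows. The function names are hypothetical, and the sketch relies only on the relationships stated above: the f-number is the ratio of focal length to effective aperture diameter, and one stop corresponds to a factor of √2 in f-number and a factor of 2 in light.

```python
import math


def f_number(focal_length_mm, aperture_diameter_mm):
    """f-number as the ratio of focal length to effective aperture diameter."""
    return focal_length_mm / aperture_diameter_mm


def stops_between(wide_f_number, narrow_f_number):
    """Number of stops separating two f-numbers (one stop = factor sqrt(2))."""
    return 2.0 * math.log2(narrow_f_number / wide_f_number)


def light_ratio(wide_f_number, narrow_f_number):
    """How many times more light the wider aperture admits."""
    return (narrow_f_number / wide_f_number) ** 2


print(f_number(50.0, 25.0))       # a 50 mm lens with a 25 mm opening
print(stops_between(2.0, 16.0))   # the f/2-f/16 range
print(stops_between(2.8, 22.0))   # close to 6, since marked f-numbers are rounded
print(light_ratio(2.0, 16.0))
```

The f/2.8-f/22 range evaluates to slightly less than 6 stops only because the marked f-numbers are rounded values of successive powers of √2.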
Exposure Index (ISO). An Exposure Index (EI) rating (a.k.a. ISO setting) refers to a relationship between exposure and sensor data values that can be achieved by setting the signal gain of the sensor, and is specified by a digital camera manufacturer such that the image files produced by the camera have a lightness similar to what would be obtained with film of the same EI rating at the same exposure. Commonly, a few EI choices are provided by adjusting the signal gain of an image sensor in the digital realm.
Focal length. An optical focus, also called an image point, is the point where light rays originating from a point on the object converge, and although the focus is conceptually a point, physically the focus has a spatial extent, called the blur circle. An image, or image point or region, is in focus if the light from object points is converged almost as much as possible in the image, and out of focus if the light is not well converged. For a lens, or a spherical or parabolic mirror, it is a point onto which collimated light parallel to the axis is focused. Since light can pass through a lens in either direction, a lens has two focal points—one on each side. The distance in air from the lens or mirror's principal plane to the focus is referred to as the focal length. A photographic lens for which the focus is not adjustable is called a fixed-focus lens or sometimes focus-free, and the focus is set at the time of manufacture, and remains fixed. It is usually set to the hyperfocal distance, so that the depth of field ranges all the way down from half that distance to infinity, which is acceptable for most cameras used for capturing images of humans or objects larger than a meter.
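By way of a non-limiting illustration, the hyperfocal distance referred to above may be estimated with the commonly used thin-lens approximation H = f²/(N·c) + f, where f is the focal length, N is the f-number, and c is the largest acceptable blur circle diameter. The formula, the value c = 0.03 mm (a conventional figure for the 35 mm format), and the function names are assumptions that are not taken from the text above.

```python
def hyperfocal_distance_mm(focal_length_mm, f_number, blur_circle_mm):
    """Hyperfocal distance using the approximation H = f^2 / (N * c) + f."""
    return focal_length_mm ** 2 / (f_number * blur_circle_mm) + focal_length_mm


def near_limit_mm(focal_length_mm, f_number, blur_circle_mm):
    """Approximate near limit of acceptable sharpness when focused at H."""
    # When the focus is set to H, the depth of field extends from about H / 2 to infinity.
    return hyperfocal_distance_mm(focal_length_mm, f_number, blur_circle_mm) / 2.0


# Assumed example: a 50 mm lens at f/8 with a 0.03 mm blur circle.
h = hyperfocal_distance_mm(50.0, 8.0, 0.03)
print(round(h / 1000.0, 2), "m hyperfocal distance")
print(round(near_limit_mm(50.0, 8.0, 0.03) / 1000.0, 2), "m near limit")
```

A shorter focal length or a narrower aperture reduces H, which is one reason fixed-focus lenses are typically of short focal length and modest aperture.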
For an interchangeable lens camera, the Flange Focal Distance (FFD) (also known as the flange-to-film distance, flange focal depth, Flange Back Distance (FBD), Flange Focal Length (FFL), or register, depending on the usage and source) of a lens mount system is the distance from the mounting flange (the metal ring on the camera and the rear of the lens) to the film or sensor plane. This value is different for different camera systems, and the range of this distance which renders an image clearly in focus within all focal lengths is usually measured in hundredths of millimeters and is known as the depth of focus (not to be confused with the similarly named depth of field).
Autofocus (AF) systems rely on one or more sensors to determine correct focus. Some AF systems rely on a single sensor while others use an array of sensors. Most modern SLR cameras use through-the-lens optical AF sensors, with a separate sensor array providing light metering, although the latter can be programmed to prioritize its metering to the same area as one or more of the AF sensors. Through-the-lens optical autofocusing is often speedier and more precise than can be achieved manually with an ordinary viewfinder, although more precise manual focus can be achieved with special accessories such as focusing magnifiers. Autofocus accuracy within ⅓ of the depth of field (DOF) at the widest aperture of the lens is not uncommon in professional AF SLR cameras. Most multi-sensor AF cameras allow manual selection of the active sensor, and many offer an automatic selection of the sensor using algorithms that attempt to discern the location of the subject.
Exposure time. Shutter speed (or exposure time) is the length of time that a digital camera shutter is open when taking a photograph, so that the amount of light that reaches the film or image sensor is proportional to the exposure time. The camera's shutter speed, the lens's brightness (f-number), and the scene's luminance together determine the amount of light that reaches the film or sensor (the exposure), and Exposure Value (EV) is a quantity that accounts for the shutter speed and the f-number. In addition to its effect on exposure, the shutter speed changes the way movement appears in photographs. Very short shutter speeds can be used to freeze fast-moving subjects, for example at sporting events, while very long shutter speeds are used to intentionally blur a moving subject for artistic effect. Short exposure times are sometimes called “fast”, and long exposure times “slow”. Adjustment to the aperture controls the depth of field, the distance range over which objects are acceptably sharp; such adjustments need to be compensated by changes in the shutter speed. Shutter speed is one of several methods used to control the amount of light recorded by the camera's digital sensor or film, and may also be used to manipulate the visual effects of the final image beyond its luminosity.
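By way of a non-limiting illustration, the manner in which an Exposure Value combines the f-number and the shutter speed may be sketched using the conventional definition EV = log2(N²/t), where N is the f-number and t is the exposure time in seconds. This definition and the function name `exposure_value` are assumptions that are not taken from the text above.

```python
import math


def exposure_value(f_number, exposure_time_s):
    """Exposure Value from the conventional definition EV = log2(N^2 / t)."""
    return math.log2(f_number ** 2 / exposure_time_s)


# Opening the aperture by one stop while halving the exposure time
# leaves the exposure (and hence the EV) essentially unchanged.
print(exposure_value(8.0, 1 / 125))
print(exposure_value(5.6, 1 / 250))
```

The two settings shown differ slightly in EV only because the marked f-number 5.6 is a rounded value of 8/√2.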
A method and an apparatus for rating a captured image based on accessing a database of reference images that have an associated rating value, and selecting reference images to form a metadata-based subset of reference images, are described in U.S. Patent Application Publication No. 2012/0213445 to LUU et al., entitled: “Method, Apparatus, and System for Rating Images”, which is incorporated in its entirety for all purposes as if fully set forth herein. A method and an apparatus for disqualifying an unsatisfactory scene as an image acquisition control for a camera by analyzing mouth regions in an acquired image, are described in U.S. Pat. No. 8,265,348 to Steinberg et al., entitled: “Digital Image Acquisition Control and Correction Method and Apparatus”, which is incorporated in its entirety for all purposes as if fully set forth herein. An apparatus and a method for facilitating analysis of a digital image by using image recognition processing in a server, allowing for suggesting for meta-tagging the image by a user, are described in U.S. Pat. No. 8,558,921 to Walker et al., entitled: “Systems and Methods for Suggesting Meta-Information to a Camera User”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Systems and methods for determining the location where an image was captured using a central system that compares the submitted images to images in an image library to identify matches are described in U.S. Pat. No. 8,131,118 to Jing et al., entitled: “Inferring Locations from an Image”, which is incorporated in its entirety for all purposes as if fully set forth herein. Further, methods for automatically rating and selecting digital photographs by estimating the importance of each photograph by analyzing its content as well as its metadata, are described in an article by Daniel Kormann, Peter Dunker, and Ronny Paduscheck, all of the Fraunhofer Institute for Digital Media in Ilmenau, Germany, entitled: “Automatic Rating and Selection of Digital Photographs”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Various systems and methods are known for analyzing the quality of a digital image captured by a digital camera and for providing feedback regarding that quality to the user. A processor within a digital camera, which generates and utilizes a recipe data file and communicates with a network-based storage location for uploading and downloading, is described in U.S. Patent Application Publication No. 2013/0050507 to Syed et al., entitled: “Recipe Based Real-Time Assistance for Digital Image Capture and Other Consumer Electronics Devices”; a method and system for determining effective policy profiles that includes client devices configured to initiate a request for at least one effective policy profile, a server mechanism communicatively coupled to the client devices and configured to receive the request, and a policy data storage component configured to store a plurality of policy profiles, are described in U.S. Patent Application Publication No. 2010/0268772 to Romanek et al., entitled: “System and Method for Determining Effective Policy Profiles in a Client-Server Architecture”; methods and apparatuses for analyzing, characterizing and/or rating composition of images and providing instructive feedback or automatic corrective actions are described in U.S. Patent Application Publication No. 2012/0182447 to Gabay entitled: “Methods, Circuits, Devices, Apparatuses and Systems for Providing Image Composition Rules, Analysis and Improvement”; an approach for providing device angle image correction where an image (e.g., still or moving) of a subject is captured via a camera of a mobile device is described in U.S. Patent Application Publication No. 2013/0063538 to Hubner et al., entitled: “Method and Apparatus for Providing Device Angle Image Correction”; an apparatus and an associated method that facilitate capturing an image in an electronic camera with the image being completely focused are described in U.S. Patent Application Publication No. 2012/0086847 to Foster entitled: “Convergence Feedback Indicator, Provided When Taking a Picture in a Camera Application”; a method for providing real-time feedback of an estimated quality of a captured final image including calculating a quality score of a preliminary obtained image is described in U.S. Patent Application Publication No. 2014/0050367 to CHEN et al., entitled: “Smart Document Capture Based on Estimated Scanned-Image Quality”; and methods and systems for determining augmentability information associated with an image frame captured by a digital imaging part of a user device are described in PCT International Application Publication No. WO2013/044983 to Hofmann et al., entitled: “Feedback to User for Indicating Augmentability of an Image”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Further, a digital image acquisition system that includes a portable apparatus for capturing digital images and a digital processing component for detecting, analyzing, invoking subsequent image captures, and informing the photographer regarding motion blur, and reducing the camera motion blur in an image captured by the apparatus, is described in U.S. Pat. No. 8,244,053 entitled: “Method and Apparatus for Initiating Subsequent Exposures Based on Determination of Motion Blurring Artifacts”, and in U.S. Pat. No. 8,285,067 entitled: “Method Notifying Users Regarding Motion Artifacts Based on Image Analysis”, both to Steinberg et al. which are both incorporated in their entirety for all purposes as if fully set forth herein.
Furthermore, a camera that has the release button, a timer, a memory and a control part, and the timer measures elapsed time after the depressing of the release button is released, used to prevent a shutter release moment to take a good picture from being missed by shortening time required for focusing when a release button is depressed again, is described in Japanese Patent Application Publication No. JP2008033200 to Hyo Hana entitled: “Camera”, a through image that is read by a face detection processing circuit, and the face of an object is detected, and is detected again by the face detection processing circuit while half pressing a shutter button, used to provide an imaging apparatus capable of photographing a quickly moving child without fail, is described in a Japanese Patent Application Publication No. JP2007208922 to Uchida Akihiro entitled: “Imaging Apparatus”, and a digital camera that executes image evaluation processing for automatically evaluating a photographic image (exposure condition evaluation, contrast evaluation, blur or focus blur evaluation), and used to enable an image photographing apparatus such as a digital camera to automatically correct a photographic image, is described in Japanese Patent Application Publication No. JP2006050494 to Kita Kazunori entitled: “Image Photographing Apparatus”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Skin Detection. Various skin detection algorithms are based on identifying human skin by means of its color. For example, in a paper published in Volume 1, No. 3, November-December 2012 (ISSN 2319-2720) of the International Journal of Computing, Communication and Networking (IJCCN) entitled: “Face Detection using Skin Colour Model and distance between Eyes” by Pallabi Saikia, Gollo Janam, and Margaret Kathing, which is incorporated in its entirety for all purposes as if fully set forth herein, it is suggested that in the YCbCr color model, skin pixels and regions may be identified by having 77≦Cb≦127 and 133≦Cr≦173 in the 0-255 range, or 0.3≦Cb≦0.49 and 0.51≦Cr≦0.678 when using the normalized 0-1 range. Detecting human skin when other color models are used is described for example in a paper published by the International Journal of Applied Information Systems (IJAIS) Volume 3, No. 4, July 2012 (ISSN: 2249-0868), entitled: “Comparison between YCbCr Color Space and CIELab Color Space for Skin Color Segmentation” by Amanpreet Kaur and B. V. Kranthi, both of Lovely Professional University, Jalandhar, India, which is incorporated in its entirety for all purposes as if fully set forth herein.
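By way of a non-limiting illustration only, the Cb/Cr thresholding described above may be sketched as follows. The sketch assumes 8-bit RGB input and the full-range (JPEG/JFIF style) BT.601 conversion to YCbCr; the cited paper may use a different conversion, and the function names `rgb_to_ycbcr` and `skin_mask` are hypothetical names introduced here for illustration.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) RGB array in the 0-255 range to YCbCr.

    Assumption: full-range BT.601 coefficients (as used by JPEG/JFIF).
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a boolean mask that is True where a pixel falls inside
    the Cb and Cr ranges quoted in the text (0-255 scale)."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (
        (cb >= cb_range[0]) & (cb <= cb_range[1])
        & (cr >= cr_range[0]) & (cr <= cr_range[1])
    )


# Example: one skin-like pixel, one pure blue pixel, one neutral gray pixel.
pixels = np.array([[[200, 150, 120], [0, 0, 255], [128, 128, 128]]], dtype=np.uint8)
mask = skin_mask(pixels)
```

The luma component Y is computed but not used for the decision, so the test depends on chrominance only, which is the property that makes such a threshold less sensitive to brightness than an RGB threshold.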
Other techniques for skin detection are based on skin reflectance, using the physical properties of light reflection from human skin at a known light wavelength. Such techniques are described in a University of Pennsylvania, Department of Computer & Information Science Technical Report MS-CIS-99-29 (Dec. 20, 1999) by Elli Angelopoulou entitled: “The Reflectance Spectrum of Human Skin”, which is incorporated in its entirety for all purposes as if fully set forth herein, disclosing that human skin may be detected based on the reflection of human skin being typically 3 times stronger at 600-700 Nanometer than at 450 and 550 Nanometer. Another technique for skin detection that is based on skin reflectance is described in an article published in Computer Methods and Programs in Biomedicine (70(2): 179-186) by I. V. Meglinski and S. J. Matcher entitled: “Computer simulation of the skin reflectance spectra”, which is incorporated in its entirety for all purposes as if fully set forth herein. A model providing a prediction of various human face areas is described in a study published in ACM Transactions on Graphics (TOG), ISSN:0730-0301, Vol. 25, Issue 3, pp. 1013-1024, July 2006 (ACM Press) by Tim Weyrich et al. entitled: “Analysis of Human Faces Using a Measurement-Based Skin Reflectance Model”, which is incorporated in its entirety for all purposes as if fully set forth herein. Any light spectrum emitted by any light source may be equally used, such as a light bulb or a fluorescent light, and in particular a black body spectrum emitted by the sun, having intensity centered at 450-550 Nanometer, with substantial intensity also at 600-700 Nanometer (seen as Orange/Red).
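As a simplified illustration of the reflectance-ratio idea, a test may compare the reflectance measured in the 600-700 Nanometer band against the reflectance measured in the 450-550 Nanometer band. In the sketch below, only the nominal ratio of 3 comes from the text; the relative tolerance, the function name `reflectance_suggests_skin`, and the assumption that two band-averaged reflectance values are already available are hypothetical choices made for illustration.

```python
def reflectance_suggests_skin(reflectance_600_700, reflectance_450_550,
                              nominal_ratio=3.0, relative_tolerance=0.2):
    """Return True when the red-band to blue/green-band reflectance ratio
    is at least the nominal ratio, reduced by a relative tolerance.

    nominal_ratio follows the 'typically 3 times stronger' statement;
    relative_tolerance is an illustrative assumption, not a sourced value.
    """
    if reflectance_450_550 <= 0:
        # No usable reference measurement, so no decision can be made.
        return False
    ratio = reflectance_600_700 / reflectance_450_550
    return ratio >= nominal_ratio * (1.0 - relative_tolerance)


# Example: a surface reflecting 60% in the red band and 20% in the
# blue/green band, and a spectrally flat surface reflecting 30% in both.
skin_like = reflectance_suggests_skin(0.6, 0.2)
flat_surface = reflectance_suggests_skin(0.3, 0.3)
```

Because the test uses a ratio of two bands measured under the same illumination, it is largely independent of the overall brightness of the light source, though a practical system would need to calibrate for the source spectrum.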
Head Pose. Various systems and methods are known for estimating the head pose using a digital camera. A method for head pose estimation that includes receiving block motion vectors for a frame of video from a block motion estimator, selecting a block for analysis, determining an average motion vector for the selected block, and estimating the orientation of the user head in the video frame based on the accumulated average motion vector is described in U.S. Pat. No. 7,412,077 to Li et al., entitled: “Apparatus and Methods for Head Pose Estimation and Head Gesture Detection”; methods for generating a low dimension pose space and using the pose space to estimate head rotation angles of a user's head are described in U.S. Pat. No. 8,687,880 to Wei et al., entitled: “Real Time Head Pose Estimation”; techniques for performing accurate and automatic head pose estimation, integrated with a scale-invariant head tracking method based on facial features detected from a located head in images, are described in U.S. Pat. No. 8,781,162 to Zhu et al., entitled: “Method and System for Head Tracking and Pose Estimation”; a three-dimensional pose of the head of a subject determined based on depth data captured in multiple images is described in U.S. Patent Application Publication No. 2012/0293635 to Sharma et al., entitled: “Head Pose Estimation Using RGBD Camera”; and a device and method for estimating head pose and obtaining an excellent head pose recognition result free from the influence of an illumination change, the device including a head area extracting unit, a head pitch angle unit, a head yaw unit, and a head pose displaying unit, is disclosed in U.S. Patent Application Publication No. 2014/0119655 to LIU et al., entitled: “Device and Method for Estimating Head Pose”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
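The first of the approaches above (averaging block motion vectors and accumulating the average over frames) may be loosely illustrated by the following sketch. It is not the method of the cited patent; the array layout (rows, columns, 2) holding (dx, dy) per block, the decision threshold, the direction labels, and the function names are all assumptions introduced for illustration.

```python
import numpy as np


def average_block_motion(motion_vectors, region):
    """Average the (dx, dy) motion vectors inside a selected region.

    motion_vectors: array of shape (rows, cols, 2), one vector per block.
    region: (top, left, height, width) in block units (hypothetical layout).
    """
    top, left, height, width = region
    selected = motion_vectors[top:top + height, left:left + width]
    return selected.reshape(-1, 2).mean(axis=0)


def estimate_head_direction(frames, region, threshold=2.0):
    """Accumulate the per-frame average motion and map it to a coarse label.

    Directions are in image coordinates (x to the right, y downward).
    The threshold is an illustrative assumption.
    """
    accumulated = np.zeros(2)
    for motion_vectors in frames:
        accumulated += average_block_motion(motion_vectors, region)
    dx, dy = accumulated
    if max(abs(dx), abs(dy)) < threshold:
        return "still"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


# Example: three frames in which every block moves one unit to the right.
moving_right = np.zeros((4, 4, 2))
moving_right[..., 0] = 1.0
label = estimate_head_direction([moving_right] * 3, region=(1, 1, 2, 2))
```

Accumulating over several frames smooths out single-frame noise in the motion estimator, at the cost of a delay of a few frames before a direction is reported.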
Further head pose techniques are described in IEEE Transactions on Pattern Analysis and Machine Intelligence published in 2008 (Digital Object Identifier 10.1109/TPAMI.2008.106) by Erik Murphy-Chutorian and Mohan Trivedi entitled: “Head Pose Estimation in Computer Vision: A Survey”, and in an article by Xiangxin Zhu and Deva Ramanan of the University of California, Irvine, entitled: “Face detection, Pose Estimation, and Landmark Localization in the Wild”, which are both incorporated in their entirety for all purposes as if fully set forth herein. Further head-pose and eye-gaze information and techniques are described in a book by Jian-Gang Wang entitled: “Head-Pose and Eye-Gaze estimation: With Use of Face Domain knowledge” (ISBN-13: 978-3659132100).
Measuring the eye gaze using a monocular image that zooms in on only one eye of a person is described in an article published in Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003) by Jian-Gang Wang, Eric Sung, and Ronda Venkateswarlu, all of Singapore, entitled: “Eye Gaze Estimation from a Single Image of One Eye”; an Isophote Curvature method employed to calculate the location of iris centers, using faces detected by Haar-like features in images from a camera, is described in a paper published in the International Symposium on Mechatronics and Robotics (Dec. 10, 2013, HCMUT, Viet Nam), by Dinh Quang Tri, Van Tan Thang, Nguyen Dinh Huy, and Doan The Thao of the University of Technology, Ho Chi Minh, Viet Nam, entitled: “Gaze Estimation with a Single Camera based on an ARM-based Embedded Linux Platform”; an approach for accurately measuring the eye gaze of faces from images of irises is described in an article by Jian-Gang Wang and Eric Sung of the Nanyang Technological University, Singapore, entitled: “Gaze Detection via Images of Irises”; two novel approaches, called the “two-circle” and “one-circle” algorithm respectively, for measuring eye gaze using a monocular image that zooms in on two eyes or only one eye of a person are described in a paper by Jian-Gang Wang and Eric Sung of the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, entitled: “Gaze Direction Determination”; an ASEF eye locator is described in the web-site: ‘github.com/laoyang/ASEF’ (preceded by https://); locating the center of the eye within the area of the pupil on low resolution images using isophote properties to gain invariance to linear lighting changes is described in a paper published in IEEE Transactions on Pattern Analysis and Machine Intelligence (2011) by Roberto Valenti and Theo Gevers entitled: “Accurate Eye Center Location through Invariant Isocentric Patterns”; and an approach for accurate and robust eye center localization by using image gradients is described in an article by Fabian Timm and Erhardt Barth entitled: “Accurate Eye Localisation by Means of Gradients”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
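For illustration, eye center localization by image gradients may be sketched under the following commonly used formulation, which is assumed here rather than quoted from the cited article: each candidate center is scored by how well the unit vectors pointing from the candidate to strong-gradient pixels align with the unit image gradients at those pixels, and the best-scoring candidate is returned. The brute-force search, the relative gradient threshold, the clipping of negative alignments, and the function name `eye_center_by_gradients` are illustrative assumptions.

```python
import numpy as np


def eye_center_by_gradients(gray, relative_threshold=0.3):
    """Return (row, col) of the candidate that maximizes the mean squared
    alignment between displacement directions and gradient directions.

    Assumes a dark pupil/iris on a brighter surround, so gradients on the
    boundary point outward; negative alignments are therefore clipped to 0.
    """
    gray = np.asarray(gray, dtype=np.float64)
    grad_y, grad_x = np.gradient(gray)      # derivatives along rows, columns
    magnitude = np.hypot(grad_x, grad_y)
    if magnitude.max() == 0:
        raise ValueError("image has no gradients")

    # Keep only strong gradients and normalize them to unit length.
    keep = magnitude >= relative_threshold * magnitude.max()
    rows, cols = np.nonzero(keep)
    unit_gx = grad_x[keep] / magnitude[keep]
    unit_gy = grad_y[keep] / magnitude[keep]

    best_score, best_center = -1.0, (0, 0)
    height, width = gray.shape
    for cy in range(height):
        for cx in range(width):
            dx = cols - cx
            dy = rows - cy
            norm = np.hypot(dx, dy)
            norm[norm == 0] = 1.0           # displacement is zero there anyway
            alignment = (dx * unit_gx + dy * unit_gy) / norm
            score = np.mean(np.maximum(alignment, 0.0) ** 2)
            if score > best_score:
                best_score, best_center = score, (cy, cx)
    return best_center


# Example: a synthetic dark disk (radius 5) centered at row 10, column 14.
yy, xx = np.mgrid[0:25, 0:25]
image = np.full((25, 25), 200.0)
image[(yy - 10) ** 2 + (xx - 14) ** 2 <= 25] = 20.0
center = eye_center_by_gradients(image)
```

The exhaustive search over all pixels is quadratic in the image size and is used here only for clarity; a practical implementation would restrict the candidates to a detected eye region and typically smooth the image first.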
A method for controlling a zoom mode function of a portable imaging device equipped with multiple camera modules based on the size of an identified user's face or based on at least one of the user's facial features is described in U.S. Patent Application Publication No. 2014/0184854 to Musatenko, entitled: “Front Camera Face Detection for Rear Camera Zoom Function”; methods and apparatus for image capturing based on a first camera mounted on a rear side of a mobile terminal and a second camera mounted on the front side of the mobile terminal are described in U.S. Patent Application Publication No. 2014/0139667 to KANG, entitled: “Image Capturing Control Apparatus and Method”; a method and device for capturing accurate composition of an intended image/self-image/self-image with surrounding objects, with desired quality or high resolution and quality of the image achieved by using motion sensor/direction sensor/position sensor and by matching minimum number of contrast points, are described in PCT International Application Publication No. WO 2015/022700 to RAMSUNDAR SHANDILYA et al., entitled: “A Method for Capturing an Accurately Composed High Quality Self-Image Using a Multi Camera Device”; a method and computer program product for remotely controlling a first image capturing unit in a portable electronic device including a first and second image capturing unit, where the device detects and tracks an object via the second capturing unit and detects changes in an area of the object, are described in U.S. Patent Application Publication No. 2008/0212831 to Hope, entitled: “Remote Control of an Image Capturing Unit in a Portable Electronic Device”; methods and devices for camera aided motion direction and speed estimation of a mobile device based on capturing a plurality of images that represent views from the mobile device and adjusting perspectives of the plurality of images are described in U.S. Patent Application Publication No. 2014/0226864 to Subramaniam Venkatraman et al., entitled: “Camera Aided Motion Direction and Speed Estimation”; and a smart mobile phone with a front camera and a back camera, where the position coordinates of pupil centers in the front camera reference system are collected through the front camera when the mobile device holder watches a visual focus on a display screen, is described in the Abstract of Chinese Patent Application Publication No. CN 103747183 to Huang Hedong, entitled: “Mobile Phone Shooting Focusing Method”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
In consideration of the foregoing, it would be an advancement in the art to provide an image analysis solution and other methods and systems for improving image related functionalities, that are simple, secure, cost-effective, load balanced, redundant, reliable, easy to use, and faster, that provide lower CPU and/or memory usage, reduce latency, have a minimum part count and minimum hardware, and/or use existing and available components, protocols, programs and applications for providing better quality of service, overload avoidance, better or optimal resources allocation, better communication and additional functionalities, and that provide a better user experience.