Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
FIG. 1 shows a block diagram that illustrates a system 10 including a computer system 11 and an associated Internet 22 connection. Such a configuration is typically used for computers (hosts) connected to the Internet 22 and executing server software, client software, or a combination of both. The computer system 11 may be part of, or may be used as, a portable or fixed electronic device such as a notebook/laptop computer, a media player (e.g., an MP3 player or a video player), a desktop computer, a cellular phone, a Personal Digital Assistant (PDA), an image processing device (e.g., a digital camera or a video recorder), any other handheld or fixed location computing device, or a combination of any of these devices. Note that while FIG. 1 illustrates various components of the computer system 11, it is not intended to represent any particular architecture or manner of interconnecting the components. Further, apart from the devices mentioned above, other electronic devices such as network computers, handheld computers, cell phones, and other data processing systems that have fewer or more components may also be used. For example, the computer of FIG. 1 may be an Apple Macintosh computer or a PowerBook, or an IBM compatible PC. The computer system 11 may include a bus 13, an interconnect, or other communication mechanism for communicating information, and a processor 12, commonly in the form of an integrated circuit, coupled to the bus 13 for processing information and for executing computer-executable instructions. The computer system 11 may also include a main memory 15a, such as a Random Access Memory (RAM) or any other dynamic storage device, coupled to the bus 13 for storing information and instructions to be executed by the processor 12. The main memory 15a may also be used for storing temporary variables or other intermediate information during execution of the instructions to be executed by the processor 12. 
The computer system 11 further includes a Read Only Memory (ROM) 15b (or any other non-volatile memory) or other static storage device coupled to the bus 13 for storing static information and instructions for the processor 12. A storage device 15c, which may be a magnetic disk or an optical disk, such as a hard disk drive (HDD) for reading from and writing to a hard disk, a magnetic disk drive for reading from and writing to a magnetic disk, and/or an optical disk drive (such as a DVD drive) for reading from and writing to a removable optical disk, is coupled to the bus 13 for storing information and instructions. The hard disk drive, magnetic disk drive, or optical disk drive may be connected to the system bus 13 by a hard disk drive interface, a magnetic disk drive interface, or an optical disk drive interface, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the general-purpose computing device. Typically, the computer system 11 includes an Operating System (OS) stored in the non-volatile storage 15b for managing the computer resources. The operating system provides the applications and programs with access to the computer resources and interfaces. The operating system commonly processes system data and user inputs, and responds by allocating and managing tasks and internal system resources, such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating networking, and managing files. Non-limiting examples of operating systems are Microsoft Windows, Mac OS X, and Linux.
The computer system 11 may be coupled via the bus 13 to a display 17, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a flat screen monitor, a touch screen monitor or similar means for displaying text and graphical data to a user. The display 17 may be connected via a video adapter, and allows a user to view, enter, and/or edit information that is relevant to the operation of the system 10. An input device 18, including alphanumeric and other keys, is coupled to the bus 13 for communicating information and command selections to the processor 12. Another type of input device is a cursor control 18a, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to the processor 12 and for controlling cursor movement on the display 17. This cursor control 18a typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
The computer system 11 may be used for implementing the methods and techniques described herein. According to one embodiment, these methods and techniques are performed by the computer system 11 in response to the processor 12 executing one or more sequences of one or more instructions contained in the main memory 15a. Such instructions may be read into the main memory 15a from another computer-readable medium, such as the storage device 15c. Execution of the sequences of instructions contained in the main memory 15a causes the processor 12 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the arrangement. Thus, embodiments of the present invention are not limited to any specific combination of hardware circuitry and software.
The term “processor” is used herein to include, but is not limited to, any integrated circuit or any other electronic device (or collection of electronic devices) capable of performing an operation on at least one instruction, including, without limitation, a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. A processor, such as the processor 12, may further be a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) microprocessor, a Microcontroller Unit (MCU), or a CISC-based Central Processing Unit (CPU). The hardware of the processor 12 may either be integrated onto a single substrate (e.g., a silicon “die”), or distributed among two or more substrates. Furthermore, various functional aspects of the processor 12 may be implemented solely as software (or firmware) associated with the processor 12.
A memory can store computer programs or any other sequence of computer-readable instructions, or data, such as files, text, numbers, audio, and video, as well as any other form of information represented as a string or structure of bits or bytes. The physical means of storing information may be electrostatic, ferroelectric, magnetic, acoustic, optical, chemical, electronic, electrical, or mechanical. A memory may be in the form of an Integrated Circuit (IC, a.k.a. chip or microchip). Alternatively or in addition, a memory may be in the form of a packaged functional assembly of electronic components (module). Such a module may be based on a Printed Circuit Board (PCB), such as a PC Card according to the Personal Computer Memory Card International Association (PCMCIA) PCMCIA 2.0 standard, or a Single In-line Memory Module (SIMM) or a Dual In-line Memory Module (DIMM), standardized under the JEDEC JESD-21C standard. Further, a memory may be in the form of a separate, rigidly enclosed box, such as an external Hard-Disk Drive (HDD). The capacity of a memory is commonly expressed in bytes (B), where the prefix ‘K’ is used to denote kilo = 2^10 = 1024^1 = 1,024, the prefix ‘M’ is used to denote mega = 2^20 = 1024^2 = 1,048,576, the prefix ‘G’ is used to denote giga = 2^30 = 1024^3 = 1,073,741,824, and the prefix ‘T’ is used to denote tera = 2^40 = 1024^4 = 1,099,511,627,776.
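As a rough illustration of the binary prefixes above, the following Python sketch converts between prefixed capacities and byte counts. The names BINARY_PREFIXES, to_bytes, and human_readable are hypothetical helpers introduced here for illustration, and the sketch assumes the power-of-two convention described in this paragraph rather than decimal prefixes.

```python
# Binary (power-of-two) prefixes as described above: K = 2**10, M = 2**20, G = 2**30, T = 2**40.
BINARY_PREFIXES = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(value: float, prefix: str) -> int:
    """Convert a capacity such as (4, "G") into a byte count."""
    return int(value * BINARY_PREFIXES[prefix])


def human_readable(num_bytes: int) -> str:
    """Express a byte count with the largest prefix that keeps the value at least 1."""
    for prefix in ("T", "G", "M", "K"):
        factor = BINARY_PREFIXES[prefix]
        if num_bytes >= factor:
            return f"{num_bytes / factor:.2f} {prefix}B"
    return f"{num_bytes} B"


# Each prefix equals the corresponding power of 1024, as stated in the text.
assert all(BINARY_PREFIXES[p] == 1024 ** (i + 1) for i, p in enumerate("KMGT"))

print(to_bytes(4, "G"))           # 4294967296
print(human_readable(1_048_576))  # 1.00 MB
```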
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the processor 12 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer may load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the computer system 11 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on the bus 13. The bus 13 carries the data to the main memory 15a, from where the processor 12 retrieves and executes the instructions. The instructions received by the main memory 15a may optionally be stored on the storage device 15c either before or after execution by the processor 12.
The computer system 11 commonly includes a communication interface 9 coupled to the bus 13. The communication interface 9 provides a two-way data communication coupling to a network link 8 that is connected to a Local Area Network (LAN) 14. For example, the communication interface 9 may be an Integrated Services Digital Network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another non-limiting example, the communication interface 9 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. For example, an Ethernet connection based on the IEEE 802.3 standard may be used, such as 10/100BaseT, 1000BaseT (gigabit Ethernet), 10 gigabit Ethernet (10GE, 10 GbE, or 10 GigE per the IEEE Std. 802.3ae-2002 standard), 40 Gigabit Ethernet (40 GbE), or 100 Gigabit Ethernet (100 GbE per the Ethernet standard IEEE P802.3ba). These technologies are described in Cisco Systems, Inc. Publication number 1-587005-001-3 (6/99), “Internetworking Technologies Handbook”, Chapter 7: “Ethernet Technologies”, pages 7-1 to 7-38, which is incorporated in its entirety for all purposes as if fully set forth herein. In such a case, the communication interface 9 typically includes a LAN transceiver or a modem, such as a Standard Microsystems Corporation (SMSC) LAN91C111 10/100 Ethernet transceiver, described in the Standard Microsystems Corporation (SMSC) data-sheet “LAN91C111 10/100 Non-PCI Ethernet Single Chip MAC+PHY” Data-Sheet, Rev. 15 (Feb. 20, 2004), which is incorporated in its entirety for all purposes as if fully set forth herein.
An Internet Service Provider (ISP) 16 is an organization that provides services for accessing, using, or otherwise utilizing the Internet 22. The Internet Service Provider 16 may be organized in various forms, such as commercial, community-owned, non-profit, or otherwise privately owned. Internet services typically provided by ISPs include Internet access, Internet transit, domain name registration, web hosting, and co-location. Various ISP structures are described in Chapter 2: “Structural Overview of ISP Networks” of the book entitled: “Guide to Reliable Internet Services and Applications”, by Robert D. Doverspike, K. K. Ramakrishnan, and Chris Chase, published 2010 (ISBN: 978-1-84882-827-8), which is incorporated in its entirety for all purposes as if fully set forth herein.
A mailbox provider is an organization that provides services for hosting electronic mail domains with access to storage for mailboxes. It provides email servers to send, receive, accept, and store email for end users or other organizations. Internet hosting services provide email, web-hosting, or online storage services. Other services include virtual server, cloud services, or physical server operation. A virtual ISP (VISP) is an operation that purchases services from another ISP, sometimes called a wholesale ISP in this context, which allows the VISP's customers to access the Internet using services and infrastructure owned and operated by the wholesale ISP. It is akin to mobile virtual network operators and competitive local exchange carriers for voice communications. A Wireless Internet Service Provider (WISP) is an Internet service provider with a network based on wireless networking. Technology may include commonplace Wi-Fi wireless mesh networking, or proprietary equipment designed to operate over the open 900 MHz, 2.4 GHz, 4.9, 5.2, 5.4, 5.7, and 5.8 GHz bands, or over licensed frequencies in the UHF band (including the MMDS frequency band) and LMDS.
ISPs may engage in peering, where multiple ISPs interconnect at peering points or Internet exchange points (IXs), allowing data to be routed between their networks without charging one another for the data transmitted; such data would otherwise have passed through a third, upstream ISP, incurring charges from the upstream ISP. ISPs that require no upstream provider and have only customers (end customers and/or peer ISPs) are referred to as Tier 1 ISPs.
An arrangement 10a of a computer system connected to the Internet 22 is shown in FIG. 2. A computer system or a workstation 7 is shown, including a main unit box 6, which encloses a motherboard on which the processor 12 and the memories 15a, 15b, and 15c are typically mounted. The workstation 7 may further include a keyboard 2 (corresponding to the input device 18), a printer 4, a computer mouse 3 (corresponding to the cursor control 18a), and a display 5 (corresponding to the display 17). FIG. 2 further illustrates various devices connected via the Internet 22, such as a client device #1 24, a client device #2 24a, a data server #1 23a, a data server #2 23b, and the workstation 7, connected to the Internet 22 over a LAN 14 and via a router or a gateway 19 and the ISP 16.
The client device #1 24 and the client device #2 24a may communicate over the Internet 22 for exchanging or obtaining data from the data server #1 23a and the data server #2 23b. In one example, the servers are HTTP servers, sometimes known as web servers. A method describing a more efficient communication over the Internet is described in U.S. Pat. No. 8,560,604 to Shribman et al., entitled: “System and Method for Providing Faster and More Efficient Data Communication” (hereinafter the ‘604 patent’), which is incorporated in its entirety for all purposes as if fully set forth herein. A splitting of a message or a content into slices, and transferring each of the slices over a distinct data path is described in U.S. Patent Application No. 2012/0166582 to Binder entitled: “System and Method for Routing-Based Internet Security”, which is incorporated in its entirety for all purposes as if fully set forth herein.
The term “computer-readable medium” (or “machine-readable medium”) is used herein to include, but is not limited to, any medium or any memory that participates in providing instructions to a processor (such as the processor 12) for execution, or any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). Such a medium may store computer-executable instructions to be executed by a processing element and/or control logic, and data, which is manipulated by a processing element and/or control logic, and may take many forms, including but not limited to, non-volatile medium, volatile medium, and transmission medium. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise the bus 13. Transmission media may also take the form of acoustic or light waves, such as those generated during radio-wave or infrared data communications, or other forms of propagating signals (e.g., carrier waves, infrared signals, digital signals, etc.). Common forms of computer-readable media include a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch-cards, paper-tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Operating system. An Operating System (OS) is software that manages computer hardware resources and provides common services to various computer programs. The operating system is an essential component of the system software in a computer system, and most application programs require an operating system to function. For hardware functions such as input/output and memory allocation, the operating system acts as an intermediary between programs and the computer hardware, although the application code is usually executed directly by the hardware and frequently makes system calls to OS functions or is interrupted by the OS. Common features typically supported by operating systems include process management, interrupt handling, memory management, file systems, device drivers, networking (such as TCP/IP and UDP), and Input/Output (I/O) handling. Examples of popular modern operating systems include Android, BSD, iOS, Linux, OS X, QNX, Microsoft Windows, Windows Phone, and IBM z/OS.
A camera 30 shown in FIG. 3 may be a digital still camera that converts a captured image into an electric signal upon a specific control, or may be a video camera, wherein the conversion of captured images to the electronic signal is continuous (e.g., 24 frames per second). The camera 30 is preferably a digital camera, where the video or still images are converted using an electronic image sensor 32. The digital camera 30 includes a lens 31 (or a few lenses) for focusing the received light onto a small semiconductor image sensor 32. The image sensor 32 commonly includes a panel with a matrix of tiny light-sensitive diodes (photocells), converting the image light to electric charges and then to electric signals, thus creating a video picture or a still image by recording the light intensity. Charge-Coupled Device (CCD) and CMOS (Complementary Metal-Oxide-Semiconductor) sensors are commonly used as the light-sensitive elements. Linear or area arrays of light-sensitive elements may be used, and the light-sensitive sensors may support monochrome (black & white), color, or both. For example, the KAI-2093 Image Sensor 1920 (H)×1080 (V) Interline CCD Image Sensor or the KAF-50100 Image Sensor 8176 (H)×6132 (V) Full-Frame CCD Image Sensor may be used, available from Image Sensor Solutions, Eastman Kodak Company, Rochester, N.Y.
An image processor block 33 receives the analog signal from the image sensor 32. An Analog Front End (AFE) in the block 33 filters, amplifies, and digitizes the signal, using an analog-to-digital (A/D) converter. The AFE further provides Correlated Double Sampling (CDS) and a gain control to accommodate varying illumination conditions. In the case of a CCD-based sensor 32, a CCD AFE component may be used between the digital image processor 33 and the sensor 32. Such an AFE may be based on the VSP2560 ‘CCD Analog Front End for Digital Cameras’ from Texas Instruments Incorporated of Dallas, Tex., U.S.A. The block 33 further contains a digital image processor, which receives the digital data from the AFE and processes this digital representation of the image to handle various industry standards and to execute various computations and algorithms. Preferably, additional image enhancements may be performed by the block 33, such as generating greater pixel density or adjusting color balance, contrast, and luminance. Further, the block 33 may perform other data management functions and processing on the raw digital image data. Commonly, the timing relationship of the vertical/horizontal reference signals and the pixel clock is also handled in this block. The Digital Media System-on-Chip device TMS320DM357 from Texas Instruments Incorporated of Dallas, Tex., U.S.A. is an example of a device implementing on a single chip (and associated circuitry) part or all of the image processor 33, part or all of a video compressor 34, and part or all of a transceiver 35. In addition to a lens or lens system, color filters may be placed between the imaging optics and the photosensor (or array) 32 to achieve the desired color manipulation.
The processing block 33 converts the raw data received from the photosensor array 32 (which can be in any internal camera format, including before or after Bayer translation) into a color-corrected image in a standard image file format. The camera 30 further comprises a connector 39a, and a transmitter or a transceiver 35 disposed between the connector 39a and the image processor 33. The transceiver 35 includes isolation magnetic components (e.g., transformer-based), balancing, surge protection, and other suitable components required for providing a proper and standard interface via the connector 39a. In the case of connecting to a wired medium, the connector 39a further includes protection circuitry for accommodating transients, over-voltage, and lightning, and any other protection means for reducing or eliminating damage from an unwanted signal over the wired medium. A band pass filter may also be used for passing only the required communication signals and rejecting or stopping other signals in the described path. A transformer may be used for isolating and reducing common-mode interference. Further, a wiring driver and wiring receivers may be used to transmit and receive the appropriate level of signals to and from the wired medium. An equalizer may also be used to compensate for any frequency-dependent characteristics of the wired medium.
Other image processing functions performed by the image processor 33 may include adjusting color balance, gamma and luminance, filtering pattern noise, filtering noise using Wiener filter, changing zoom factors, recropping, applying enhancement filters, applying smoothing filters, applying subject-dependent filters, and applying coordinate transformations. Other enhancements in the image data may include applying mathematical algorithms to generate greater pixel density or adjusting color balance, contrast and/or luminance.
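Gamma adjustment, one of the functions listed above, is often implemented as a per-pixel lookup table. The Python sketch below assumes 8-bit grayscale pixels held in nested lists and a simple power-law curve; the function names gamma_lut and apply_gamma are hypothetical, and a real image processor 33 may use a different transfer curve entirely.

```python
def gamma_lut(gamma: float) -> list[int]:
    """Map each 8-bit input level to round(255 * (level / 255) ** (1 / gamma))."""
    return [round(255 * (level / 255) ** (1.0 / gamma)) for level in range(256)]


def apply_gamma(image: list[list[int]], gamma: float) -> list[list[int]]:
    """Apply the lookup table to every pixel of an 8-bit grayscale image."""
    lut = gamma_lut(gamma)
    return [[lut[pixel] for pixel in row] for row in image]


dark_image = [[0, 32, 64], [128, 192, 255]]
# A gamma above 1 brightens mid-tones while leaving pure black and pure white unchanged.
print(apply_gamma(dark_image, 2.2))
```

Building the 256-entry table once and indexing it per pixel avoids evaluating the power function for every pixel of a large image.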
The image processing may further include an algorithm for motion detection by comparing the current image with a reference image and counting the number of different pixels, where the image sensor 32 or the digital camera 30 is assumed to be in a fixed location and is thus expected to capture the same image. Since images naturally differ due to factors such as varying lighting, camera flicker, and CCD dark currents, pre-processing is useful to reduce the number of false positive alarms. More complex algorithms are necessary to detect motion when the camera itself is moving, or when the motion of a specific object must be detected in a field containing other movement that can be ignored.
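The pixel-counting motion detection described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes equally sized 8-bit grayscale frames given as nested lists, and the threshold and min_changed_fraction values, as well as the function names, are hypothetical. The per-pixel threshold plays the role of a very simple pre-processing step that ignores small differences caused by lighting changes, flicker, or sensor noise.

```python
def count_changed_pixels(reference, current, threshold=25):
    """Count pixels whose absolute grayscale difference exceeds the threshold."""
    return sum(
        1
        for ref_row, cur_row in zip(reference, current)
        for ref_pixel, cur_pixel in zip(ref_row, cur_row)
        if abs(cur_pixel - ref_pixel) > threshold
    )


def motion_detected(reference, current, threshold=25, min_changed_fraction=0.01):
    """Report motion when the fraction of changed pixels reaches min_changed_fraction."""
    total = len(reference) * len(reference[0])
    changed = count_changed_pixels(reference, current, threshold)
    return changed / total >= min_changed_fraction


background = [[10] * 10 for _ in range(10)]
# A uniform small brightening (e.g., a lighting change) stays under the threshold.
brighter = [[pixel + 5 for pixel in row] for row in background]
# A bright 3x3 object appears in the scene.
with_object = [[200 if 2 <= x < 5 and 2 <= y < 5 else 10 for x in range(10)] for y in range(10)]

print(motion_detected(background, brighter))     # False
print(motion_detected(background, with_object))  # True
```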
The image processing may further include video enhancement such as video denoising, image stabilization, unsharp masking, and super-resolution. Further, the image processing may include Video Content Analysis (VCA), where the video content is analyzed to detect and determine temporal events based on multiple images; VCA is commonly used for entertainment, healthcare, retail, automotive, transport, home automation, safety, and security. The VCA functionalities include Video Motion Detection (VMD), video tracking, and egomotion estimation, as well as identification, behavior analysis, and other forms of situation awareness. A dynamic masking functionality involves blocking a part of the video signal based on the video signal itself, for example because of privacy concerns. The egomotion estimation functionality involves determining the location of a camera, or estimating the camera motion relative to a rigid scene, by analyzing its output signal. Motion detection is used to determine the presence of relevant motion in the observed scene, while object detection is used to determine the presence of a type of object or entity, for example, a person or car, as well as for fire and smoke detection. Similarly, face recognition and Automatic Number Plate Recognition may be used to recognize, and therefore possibly identify, persons or cars. Tamper detection is used to determine whether the camera or the output signal is tampered with, and video tracking is used to determine the location of persons or objects in the video signal, possibly with regard to an external reference grid. A pattern is defined as any form in an image having discernible characteristics that provide a distinctive identity when contrasted with other forms. 
Pattern recognition may also be used, for ascertaining differences, as well as similarities, between patterns under observation and partitioning the patterns into appropriate categories based on these perceived differences and similarities; and may include any procedure for correctly identifying a discrete pattern, such as an alphanumeric character, as a member of a predefined pattern category. Further, the video or image processing may use, or be based on, the algorithms and techniques disclosed in the book entitled: “Handbook of Image & Video Processing”, edited by Al Bovik, published by Academic Press, ISBN: 0-12-119790-5, and in the book published by Wiley-Interscience, ISBN13 978-0-471-71998-4 (2005) by Tinku Acharya and Ajoy K. Ray entitled: “Image Processing—Principles and Applications”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
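One simple way to assign an observed pattern to a predefined category, as described above, is nearest-template matching. The Python sketch below uses the minimum Hamming distance between small binary patterns; the 3×5 templates for the characters ‘1’ and ‘7’ and the function names are hypothetical, and this technique is chosen only for illustration, not taken from the cited books.

```python
# Hypothetical 3x5 binary templates ('1' = ink, '0' = background), one per category.
TEMPLATES = {
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}


def hamming(a: list[str], b: list[str]) -> int:
    """Number of positions at which two equally sized binary patterns differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def classify(pattern: list[str]) -> str:
    """Assign the pattern to the category whose template it differs from least."""
    return min(TEMPLATES, key=lambda label: hamming(pattern, TEMPLATES[label]))


# A '7' with one corrupted pixel in the bottom row.
noisy_seven = ["111", "001", "010", "010", "011"]
print(classify(noisy_seven))  # 7
```

The noisy pattern differs from the ‘7’ template in one pixel and from the ‘1’ template in six, so it is assigned to the ‘7’ category despite the corruption.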
A controller 37, located within the camera device or module 30, may be based on discrete logic or an integrated device, such as a processor, microprocessor, or microcomputer, and may include a general-purpose device or may be a special purpose processing device, such as an ASIC, PAL, PLA, PLD, Field Programmable Gate Array (FPGA), Gate Array, or other customized or programmable device. In the case of a programmable device, as well as in other implementations, a memory is required. The controller 37 commonly includes a memory that may include a static RAM (Random Access Memory), dynamic RAM, flash memory, ROM (Read Only Memory), or any other data storage medium. The memory may include data, programs, and/or instructions and any other software or firmware executable by the processor. Control logic can be implemented in hardware or in software, such as firmware stored in the memory. The controller 37 controls and monitors the device operation, such as initialization, configuration, interface, and commands. The term “processor” is meant to include any integrated circuit or other electronic device (or collection of devices) capable of performing an operation on at least one instruction including, without limitation, reduced instruction set computing (RISC) processors, CISC microprocessors, microcontroller units (MCUs), CISC-based central processing units (CPUs), and digital signal processors (DSPs). The hardware of such devices may be integrated onto a single substrate (e.g., a silicon “die”), or distributed among two or more substrates. Furthermore, various functional aspects of the processor may be implemented solely as software or firmware associated with the processor.
The digital camera device or module 30 requires power for its described functions, such as capturing, storing, manipulating, and transmitting the image. A dedicated power source, such as a battery, may be used, or a dedicated connection to an external power source via connector 39b. The camera device 30 may further include a power supply 38 that contains a DC/DC converter. In another embodiment, the power supply 38 is fed from the AC mains via an AC plug as the connector 39b and a cord, and thus may include an AC/DC converter for converting the AC power (commonly 115 VAC/60 Hz or 220 VAC/50 Hz) into the required DC voltage or voltages. Such power supplies are known in the art and typically involve converting 120 or 240 volt AC supplied by a power utility company to a well-regulated lower-voltage DC for electronic devices. In one embodiment, the power supply 38 is integrated into a single device or circuit for sharing common circuits. Further, the power supply 38 may include a boost converter, a buck-boost converter, a charge pump, an inverter, and regulators as known in the art, as required for conversion of one form of electrical power to another desired form and voltage. While the power supply 38 (either separate or integrated) can be an integral part of, and housed within, the camera 30 enclosure, it may be enclosed in a separate housing connected via cable to the camera 30 assembly. For example, a small step-down transformer that plugs directly into an outlet can be used (also known as a wall-wart, “power brick”, “plug pack”, “plug-in adapter”, “adapter block”, “domestic mains adapter”, “power adapter”, or AC adapter). Further, the power supply 38 may be of a linear or switching type.
Various formats that can be used to represent the captured image include TIFF (Tagged Image File Format), RAW format, AVI, DV, MOV, WMV, MP4, DCF (Design rule for Camera File system), ITU-T H.261, ITU-T H.263, ITU-T H.264, ITU-R BT.601 (formerly CCIR 601), ASF, Exif (Exchangeable Image File Format), and DPOF (Digital Print Order Format) standards. In many cases, video data is compressed before transmission, in order to allow its transmission over a reduced-bandwidth transmission system. A video compressor 34 (or video encoder) is shown in FIG. 3 disposed between the image processor 33 and the transceiver 35, allowing for compression of the digital video signal before its transmission over a cable or over-the-air. In some cases, compression may not be required, hence obviating the need for such a compressor 34. Such compression can be lossy or lossless. Common compression algorithms are JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group). The above and other image or video compression techniques may use intraframe compression, commonly based on registering the differences between parts of a single frame or a single image. Interframe compression can further be used for video streams, based on registering differences between frames. Other examples of image coding techniques include run-length encoding and delta modulation. Further, the image can be dynamically dithered to allow the displayed image to appear to have higher resolution and quality.
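The interframe difference and run-length encoding mentioned above can be combined in a toy lossless scheme: when consecutive frames barely change, their byte-wise difference is mostly zeros, which run-length encoding stores compactly. The Python sketch below assumes frames given as equally sized bytes objects; it illustrates the two ideas only, not JPEG, MPEG, or the video compressor 34, and all function names are hypothetical.

```python
def frame_delta(previous: bytes, current: bytes) -> bytes:
    """Interframe difference: byte-wise (current - previous) modulo 256."""
    return bytes((c - p) % 256 for p, c in zip(previous, current))


def apply_delta(previous: bytes, delta: bytes) -> bytes:
    """Rebuild the current frame from the previous frame and the stored difference."""
    return bytes((p + d) % 256 for p, d in zip(previous, delta))


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length encode a byte string as (count, value) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][1] == value:
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            runs.append((1, value))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (count, value) pairs back into the original byte string."""
    return bytes(value for count, value in runs for _ in range(count))


previous_frame = bytes(range(8))
current_frame = bytes([0, 1, 2, 100, 4, 5, 6, 7])  # only one pixel changed

delta = frame_delta(previous_frame, current_frame)
runs = rle_encode(delta)
print(runs)  # [(3, 0), (1, 97), (4, 0)]

# Decoding is exact, so the scheme is lossless.
assert apply_delta(previous_frame, rle_decode(runs)) == current_frame
```

The modulo-256 arithmetic keeps every difference within a single byte, so the delta frame has the same size as the original before run-length encoding is applied.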
The single lens or lens array 31 is positioned to collect optical energy representative of a subject or scenery, and to focus the optical energy onto the photosensor array 32. Commonly, the photosensor array 32 is a matrix of photosensitive pixels, each of which generates an electric signal that is representative of the optical energy directed at the pixel by the imaging optics.
A prior art example of a portable electronic camera connectable to a computer is disclosed in U.S. Pat. No. 5,402,170 to Parulski et al. entitled: “Hand-Manipulated Electronic Camera Tethered to a Personal Computer”. A digital electronic camera which can accept various types of input/output cards or memory cards is disclosed in U.S. Pat. No. 7,432,952 to Fukuoka entitled: “Digital Image Capturing Device having an Interface for Receiving a Control Program”, and the use of a disk drive assembly for transferring images out of an electronic camera is disclosed in U.S. Pat. No. 5,138,459 to Roberts et al., entitled: “Electronic Still Video Camera with Direct Personal Computer (PC) Compatible Digital Format Output”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Bitmap. A bitmap (a.k.a. bit array or bitmap index) is a mapping from some domain (for example, a range of integers) to bits (values that are zero or one). In computer graphics, when the domain is a rectangle (indexed by two coordinates), a bitmap gives a way to store a binary image, that is, an image in which each pixel is either black or white (or any two colors). More generally, the term ‘bitmap’ is used herein to include, but is not limited to, a pixmap, which refers to a map of pixels, each of which may store more than two colors, thus using more than one bit per pixel. A bitmap is a type of memory organization or image file format used to store digital images.
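To illustrate a bitmap as a mapping from a two-dimensional domain to bits, the following sketch packs a binary image eight pixels per byte. The most-significant-bit-first order and the padding of each row to a whole byte are assumptions chosen for the example; actual formats differ in bit order and row alignment. The helper names are hypothetical.

```python
from typing import List


def pack_bitmap(rows: List[List[int]]) -> bytes:
    """Pack a binary image (0/1 per pixel), MSB first, each row padded to a byte."""
    out = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            byte = 0
            for i, bit in enumerate(row[start:start + 8]):
                if bit:
                    byte |= 0x80 >> i  # leftmost pixel goes into the high bit
            out.append(byte)
    return bytes(out)


def get_pixel(packed: bytes, width: int, x: int, y: int) -> int:
    """Read back the bit for pixel (x, y) from the packed buffer."""
    stride = (width + 7) // 8  # bytes per packed row
    return (packed[y * stride + x // 8] >> (7 - x % 8)) & 1
```

A 9-pixel-wide row therefore occupies two bytes, with the last seven bits of the second byte unused.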
In typical uncompressed bitmaps, image pixels are stored using a color depth of 1, 4, 8, 16, 24, 32, 48, or 64 bits per pixel. Pixels of 8 bits and fewer can represent either grayscale or indexed color. An alpha channel (for transparency) may be stored in a separate bitmap, where it is similar to a grayscale bitmap, or in a fourth channel that, for example, converts 24-bit images to 32 bits per pixel. The bits representing the bitmap pixels may be packed or unpacked (spaced out to byte or word boundaries), depending on the format or device requirements. Depending on the color depth, a pixel in the picture occupies at least n/8 bytes, where n is the bit depth. For an uncompressed bitmap packed within rows, such as is stored in the Microsoft DIB or BMP file format or in the uncompressed TIFF format, a lower bound on the storage size, in bytes, of an n-bit-per-pixel (2^n colors) bitmap can be calculated as: size = width · height · n / 8, where height and width are given in pixels. The formula above does not include the header size or the color palette size, if any.
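The lower-bound formula above can be checked directly. As a sketch, the second function also shows why it is only a lower bound: the BMP pixel array pads every row to a multiple of 4 bytes (32 bits), so rows whose byte length is not already a multiple of 4 take slightly more space. Headers and palettes are excluded in both cases, as in the text.

```python
def packed_size_lower_bound(width: int, height: int, n: int) -> float:
    """size = width * height * n / 8 bytes (no header, no palette)."""
    return width * height * n / 8


def bmp_pixel_array_size(width: int, height: int, n: int) -> int:
    """BMP pixel-array size: each row is padded up to a multiple of 4 bytes."""
    row_stride = ((width * n + 31) // 32) * 4
    return row_stride * abs(height)  # BMP height may be negative (top-down rows)


def palette_colors(n: int) -> int:
    """Number of distinct values representable with n bits per pixel."""
    return 2 ** n
```

For a 640 × 480 image at 24 bits per pixel both functions give 921,600 bytes, because each 1,920-byte row is already 4-byte aligned; for a 3 × 2 image at 24 bits per pixel the bound is 18 bytes, while the padded BMP array needs 24.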
The BMP file format, also known as bitmap image file or Device Independent Bitmap (DIB) file format or simply a bitmap, is a raster graphics image file format used to store bitmap digital images, independently of the display device (such as a graphics adapter), especially on Microsoft Windows and OS/2 operating systems. The BMP file format is capable of storing 2D digital images of arbitrary width, height, and resolution, both monochrome and color, in various color depths, and optionally with data compression, alpha channels, and color profiles. The Windows Metafile (WMF) specification covers the BMP file format.
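As a sketch of the BMP layout, the following builds and then parses a minimal uncompressed 24-bit file consisting of the 14-byte BITMAPFILEHEADER, the 40-byte BITMAPINFOHEADER, and a bottom-up pixel array with rows padded to 4 bytes. Only this common header version is handled; later header versions (for example with alpha masks or color profiles) are larger and are not parsed here. The function names are hypothetical.

```python
import struct


def make_bmp_24(width: int, height: int, bgr: bytes) -> bytes:
    """Build a minimal uncompressed 24-bit BMP; `bgr` is B,G,R per pixel, bottom row first."""
    stride = ((width * 24 + 31) // 32) * 4
    rows = []
    for y in range(height):
        row = bgr[y * width * 3:(y + 1) * width * 3]
        rows.append(row + b"\x00" * (stride - len(row)))  # pad row to 4-byte boundary
    pixels = b"".join(rows)
    offset = 14 + 40  # file header + info header
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,      # header size, width, height (positive = bottom-up)
        1, 24,                  # color planes, bits per pixel
        0, len(pixels),         # BI_RGB (no compression), image size
        2835, 2835,             # about 72 DPI, in pixels per meter
        0, 0,                   # colors used, important colors
    )
    return file_header + info_header + pixels


def read_bmp_info(data: bytes) -> dict:
    """Parse the fields of the file header and BITMAPINFOHEADER needed to locate pixels."""
    magic, file_size, _, _, offset = struct.unpack_from("<2sIHHI", data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    _, width, height, _, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    return {"file_size": file_size, "offset": offset, "width": width,
            "height": height, "bits_per_pixel": bpp, "compression": compression}
```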
Face detection. A camera with a human face detection means is disclosed in U.S. Pat. No. 6,940,545 to Ray et al., entitled: “Face Detecting Camera and Method”, and in U.S. Patent Application Publication No. 2012/0249768 to Binder entitled: “System and Method for Control Based on Face or Hand Gesture Detection”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
Face detection (also known as face localization) includes algorithms for identifying a group of pixels within a digitally acquired image that relates to the existence, locations, and sizes of human faces. Common face-detection algorithms focus on the detection of frontal human faces, while other algorithms attempt to solve the more general and difficult problem of multi-view face detection, that is, the detection of faces that are either rotated along the axis from the face to the observer (in-plane rotation), or rotated along the vertical or left-right axis (out-of-plane rotation), or both. Various face detection techniques and devices (e.g., cameras) having face detection features are disclosed in U.S. Pat. Nos. RE33,682, RE31,370, 4,047,187, 4,317,991, 4,367,027, 4,638,364, 5,291,234, 5,386,103, 5,488,429, 5,638,136, 5,642,431, 5,710,833, 5,724,456, 5,781,650, 5,812,193, 5,818,975, 5,835,616, 5,870,138, 5,978,519, 5,987,154, 5,991,456, 6,097,470, 6,101,271, 6,128,397, 6,148,092, 6,151,073, 6,188,777, 6,192,149, 6,249,315, 6,263,113, 6,268,939, 6,282,317, 6,301,370, 6,332,033, 6,393,148, 6,404,900, 6,407,777, 6,421,468, 6,438,264, 6,456,732, 6,459,436, 6,473,199, 6,501,857, 6,504,942, 6,504,951, 6,516,154, 6,526,161, 6,940,545, 7,110,575, 7,315,630, 7,317,815, 7,466,844, 7,466,866 and 7,508,961, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Image. A digital image is a numeric representation (normally binary) of a two-dimensional image. Depending on whether the image resolution is fixed, it may be of a vector or raster type. Raster images have a finite set of digital values, called picture elements or pixels. The digital image contains a fixed number of rows and columns of pixels, which are the smallest individual element in an image, holding quantized values that represent the brightness of a given color at any specific point. Typically, the pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers, where these values are usually transmitted or stored in a compressed form. The raster images can be created by a variety of input devices and techniques, such as digital cameras, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. Common image formats include GIF, JPEG, and PNG.
The Graphics Interchange Format (better known by its acronym GIF) is a bitmap image format that supports up to 8 bits per pixel for each image, allowing a single image to reference its own palette of up to 256 different colors chosen from the 24-bit RGB color space. It also supports animations and allows a separate palette of up to 256 colors for each frame. GIF images are compressed using the Lempel-Ziv-Welch (LZW) lossless data compression technique to reduce the file size without degrading the visual quality. The GIF (GRAPHICS INTERCHANGE FORMAT) Standard Version 89a is available from www.w3.org/Graphics/GIF/spec-gif89a.txt.
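The LZW technique referenced above can be sketched in its textbook form: the encoder grows a dictionary of previously seen byte strings and emits the code of the longest known prefix. This is a minimal sketch only; GIF's actual variant additionally starts from a code size tied to the palette depth, uses variable-width codes, and reserves clear and end-of-information codes, none of which are modeled here.

```python
from typing import Dict, List


def lzw_encode(data: bytes) -> List[int]:
    """Textbook LZW: emit dictionary codes for the longest known prefixes."""
    table: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes: List[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
        else:
            codes.append(table[w])
            table[wc] = next_code  # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decode(codes: List[int]) -> bytes:
    if not codes:
        return b""
    table: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:  # code not yet in the table: string is w + w[0]
            entry = w + w[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = w + entry[:1]  # rebuild the encoder's dictionary
        next_code += 1
        w = entry
    return b"".join(out)
```

Because the decoder rebuilds the same dictionary from the code stream, no table needs to be transmitted, which is what makes the scheme lossless and compact for repetitive pixel data.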
JPEG (seen most often with the .jpg or .jpeg filename extension) is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality; JPEG typically achieves 10:1 compression with little perceptible loss in image quality. JPEG/Exif is the most common image format used by digital cameras and other photographic image capture devices, along with JPEG/JFIF. The term “JPEG” is an acronym for the Joint Photographic Experts Group, which created the standard. JPEG/JFIF supports a maximum image size of 65,535×65,535 pixels, which amounts to roughly one to four gigapixels (one gigapixel being 1000 megapixels), depending on the aspect ratio (from panoramic 3:1 to square). JPEG is standardized as ISO/IEC 10918-1:1994, entitled: “Information technology—Digital compression and coding of continuous-tone still images: Requirements and guidelines”.
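The adjustable size/quality tradeoff in JPEG comes mainly from quantizing the coefficients of an 8 × 8 discrete cosine transform (DCT). The sketch below shows the level shift, the two-dimensional DCT-II, and quantization for a single block. As a simplifying assumption, a single uniform quantization step is used in place of JPEG's per-coefficient quantization tables, and the zig-zag ordering and entropy coding stages are omitted.

```python
import math
from typing import List

Block = List[List[float]]


def level_shift(pixels: List[List[int]]) -> Block:
    """Center 8-bit samples around zero, as JPEG does before the DCT."""
    return [[p - 128 for p in row] for row in pixels]


def dct_8x8(block: Block) -> Block:
    """Two-dimensional DCT-II of an 8x8 block."""
    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs: Block, step: float) -> List[List[int]]:
    """Uniform quantization: a larger step discards more detail (smaller file, lower quality)."""
    return [[round(f / step) for f in row] for row in coeffs]
```

For a flat block of value 138, all energy lands in the DC coefficient (80 after the level shift), and every other coefficient is zero; smooth image regions therefore quantize to a few nonzero values, which is where most of the compression comes from.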
Portable Network Graphics (PNG) is a raster graphics file format that supports lossless data compression. PNG was created as an improved replacement for the Graphics Interchange Format (GIF), and is the most commonly used lossless image compression format on the Internet. PNG supports palette-based images (with palettes of 24-bit RGB or 32-bit RGBA colors), grayscale images (with or without an alpha channel), and full-color non-palette-based RGB images (with or without an alpha channel). The PNG standard was designed for transferring images on the Internet (and not for professional-quality print graphics) and therefore does not support non-RGB color spaces such as CMYK. It was published as the ISO/IEC 15948:2004 standard entitled: “Information technology—Computer graphics and image processing—Portable Network Graphics (PNG): Functional specification”.
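A PNG file is an 8-byte signature followed by chunks, each holding a big-endian length, a 4-byte type, the data, and a CRC-32 over the type and data; the pixel data is zlib (deflate) compressed. The sketch below writes a tiny 8-bit grayscale image using filter type 0 on every scanline and then walks the chunks, verifying each CRC. It is only an illustration of the container structure; real encoders choose per-row filters and usually split image data across several IDAT chunks.

```python
import struct
import zlib
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """Length (big-endian), type, data, then CRC-32 computed over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_gray_png(width: int, height: int, pixels: bytes) -> bytes:
    """8-bit grayscale (color type 0), no interlace; each scanline uses filter type 0."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height))
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def read_chunks(png: bytes) -> List[Tuple[bytes, bytes]]:
    """Split a PNG into (type, data) pairs, rejecting any chunk with a bad CRC."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, chunks = 8, []
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        if zlib.crc32(kind + data) & 0xFFFFFFFF != crc:
            raise ValueError(f"bad CRC in {kind!r} chunk")
        chunks.append((kind, data))
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks
```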
Metadata. The term “metadata”, as used herein, refers to data that describes characteristics, attributes, or parameters of other data, in particular, files (such as program files) and objects. Such data typically includes structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource. Metadata typically includes structural metadata, relating to the design and specification of data structures or “data about the containers of data”; and descriptive metadata about individual instances of application data or the data content. Metadata may include the means of creation of the data, the purpose of the data, time and date of creation, the creator or author of the data, the location on a computer network where the data were created, and the standards used.
For example, metadata associated with a computer word processing file may include the title of the document, the name of the author, the company to whom the document belongs, the dates that the document was created and last modified, keywords which describe the document, and other descriptive data. While some of this information may also be included in the document itself (e.g., title, author, and date), metadata may be a separate collection of data that may be stored separately from, but associated with, the actual document. One common format for documenting metadata is eXtensible Markup Language (XML). XML provides a formal syntax, which supports the creation of arbitrary descriptions, sometimes called “tags.” An example of a metadata entry might be <title>War and Peace</title>, where the bracketed words delineate the beginning and end of the group of characters that constitute the title of the document that is described by the metadata. In the example of the word processing file, the metadata (sometimes referred to as “document properties”) is entered manually by the author, the editor, or the document manager. The metadata concept is further described in a National Information Standards Organization (NISO) Booklet entitled: “Understanding Metadata” (ISBN: 1-880124-62-9), in the IETF RFC 5013 entitled: “The Dublin Core Metadata Element Set”, and in the IETF RFC 2731 entitled: “Encoding Dublin Core Metadata in HTML”, which are all incorporated in their entirety for all purposes as if fully set forth herein. An extraction of metadata from files or objects is described in U.S. Pat. No. 8,700,626 to Bedingfield, entitled: “Systems, Methods and Computer Products for Content-Derived Metadata”, and in U.S. Patent Application Publication 2012/0278705 to Yang et al., entitled: “System and Method for Automatically Extracting Metadata from Unstructured Electronic Documents”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
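The XML title entry above can be produced and read back with Python's standard `xml.etree.ElementTree`. In this sketch the elements are placed in the Dublin Core element-set namespace (`http://purl.org/dc/elements/1.1/`) associated with RFC 5013; the wrapping `metadata` element and the helper names are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)          # serialize elements with the "dc:" prefix


def build_metadata(title: str, creator: str) -> str:
    """Serialize a couple of Dublin Core elements as an XML string."""
    root = ET.Element("metadata")  # hypothetical container element
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}creator").text = creator
    return ET.tostring(root, encoding="unicode")


def read_title(xml_text: str) -> str:
    """Parse the XML and return the text of the dc:title element."""
    return ET.fromstring(xml_text).findtext(f"{{{DC}}}title")
```

Calling `build_metadata("War and Peace", "Leo Tolstoy")` yields markup containing `<dc:title>War and Peace</dc:title>`, the namespaced counterpart of the entry described above.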
Metadata can be stored either internally in the same file, object, or structure as the data (this is also called internal or embedded metadata), or externally in a separate file or field separated from the described data. A data repository typically stores the metadata detached from the data, but can be designed to support embedded metadata approaches. Metadata can be stored in either human-readable or binary form. Storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools; however, these formats are rarely optimized for storage capacity, communication time, and processing speed. A binary metadata format enables efficiency in all these respects but requires special libraries to convert the binary information into human-readable content.
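The size tradeoff between human-readable and binary metadata can be made concrete. This sketch stores the same four hypothetical image fields once as XML text and once in a fixed binary layout (three unsigned 16-bit integers and one unsigned 32-bit integer, little-endian); both the field set and the layout are assumptions for illustration. The binary record is a fraction of the size but cannot be read without knowing the layout.

```python
import struct
from typing import Tuple

# Hypothetical fixed layout: width, height, ISO (uint16 each), timestamp (uint32).
RECORD = struct.Struct("<HHHI")


def to_xml(width: int, height: int, iso: int, timestamp: int) -> bytes:
    return (f"<image><width>{width}</width><height>{height}</height>"
            f"<iso>{iso}</iso><timestamp>{timestamp}</timestamp></image>").encode()


def to_binary(width: int, height: int, iso: int, timestamp: int) -> bytes:
    return RECORD.pack(width, height, iso, timestamp)  # always RECORD.size bytes


def from_binary(blob: bytes) -> Tuple[int, int, int, int]:
    """Decoding requires prior knowledge of the binary layout."""
    return RECORD.unpack(blob)
```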
Tag. A tag is a type of metadata relating to a non-hierarchical keyword or term assigned to a digital image, which describes the image and allows it to be found again by browsing or searching. Tags may be chosen informally and personally by the item's creator or by its viewer, depending on the system.
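Because tags are flat (non-hierarchical) keywords, finding images by tag amounts to an inverted index from each tag to the images carrying it. The class below is a minimal, hypothetical sketch; the case-insensitive matching is a design assumption, since systems differ in how they normalize tags.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagIndex:
    """Inverted index from each tag to the set of image identifiers carrying it."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_id: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._by_tag[tag.lower()].add(image_id)  # normalize case

    def search(self, *tags: str) -> Set[str]:
        """Return the images that carry all of the given tags."""
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()
```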
Color space. A color space is a specific organization of colors, allowing for reproducible representations of color, in both analog and digital representations. A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers (e.g., three values/channels in RGB or four in CMYK). When defining a color space, the usual reference standards are the CIELAB or CIEXYZ color spaces, which were specifically designed to encompass all colors the average human can see. Colors are commonly created in printing with color spaces based on the CMYK color model, using the subtractive primary colors of pigment (Cyan (C), Magenta (M), Yellow (Y), and Black (K)). To create a three-dimensional representation of a given color space, we can assign the amount of magenta color to the representation's X axis, the amount of cyan to its Y axis, and the amount of yellow to its Z axis. The resulting 3-D space provides a unique position for every possible color that can be created by combining those three pigments. Colors are typically created on computer monitors with color spaces based on the RGB color model, using the additive primary colors (red, green, and blue). A three-dimensional representation would assign each of the three colors to the X, Y, and Z axes. Popular color models include RGB, CMYK, HSL, YUV, YCbCr, and YPbPr color formats.
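The relation between the additive RGB model and the subtractive CMYK model can be sketched with the common naive conversion, in which black (K) absorbs the part of the color shared by all three inks. This is only a device-independent illustration; real print workflows convert through ICC profiles that account for ink behavior, which this sketch ignores.

```python
from typing import Tuple


def rgb_to_cmyk(r: float, g: float, b: float) -> Tuple[float, float, float, float]:
    """Naive RGB -> CMYK for r, g, b in [0, 1]; ignores inks and device profiles."""
    k = 1.0 - max(r, g, b)
    if k >= 1.0:  # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k
```

With this formula, pure red maps to full magenta plus full yellow, and a mid gray maps to black ink only.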
Color spaces and the various color space models are described in an article by Marko Tkalcic and Jurij F. Tasic (of the University of Ljubljana, Slovenia) entitled: “Colour spaces—perceptual, historical and applicational background”, and in the article entitled: “Color Space Basics” by Andrew Oran and Vince Roth, published May 2012, Issue 4 of the journal ‘The Tech Review’ by the Association of Moving Image Archivists, which are both incorporated in their entirety for all purposes as if fully set forth herein. Conversions between color spaces or models are described in an article entitled: “Colour Space Conversions” by Adrian Ford and Alan Roberts (Aug. 11, 1998), and in an article by Philippe Colantoni et al., dated 2004, entitled: “Color Space Transformations”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
A color space maps a range of physically produced colors (from mixed light, pigments, etc.) to an objective description of color sensations registered in the eye, typically in terms of tristimulus values, but not usually in the LMS space defined by the cone spectral sensitivities. The tristimulus values associated with a color space can be conceptualized as amounts of three primary colors in a tri-chromatic additive color model. In some color spaces, including LMS and XYZ spaces, the primary colors used are not real colors, in the sense that they cannot be generated with any light spectrum.
CIE color space. The CIE 1931 standards were created by the International Commission on Illumination (CIE) in 1931 and include the CIE 1931 RGB and CIE 1931 XYZ color spaces; related CIE color models include CIELUV and CIEUVW. When judging the relative luminance (brightness) of different colors in well-lit situations, humans tend to perceive light within the green parts of the spectrum as brighter than red or blue light of equal power. A luminosity function that describes the perceived brightness of the different wavelengths is thus roughly analogous to the spectral sensitivity of the M cones. The CIE model capitalizes on this fact by defining Y as luminance. Z is quasi-equal to blue stimulation, or the S cone response, and X is a mix (a linear combination) of cone response curves chosen to be nonnegative. The XYZ tristimulus values are thus analogous to, but different from, the LMS cone responses of the human eye. A useful consequence of defining Y as luminance is that, for any given Y value, the XZ plane contains all possible chromaticities at that luminance. CIE color space is described in a paper by Gernot Hoffmann entitled: “CIE Color Space”, which is incorporated in its entirety for all purposes as if fully set forth herein.
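As a sketch of how Y acts as luminance in practice, the following converts an sRGB color to CIE 1931 XYZ using the widely published sRGB/D65 matrix (coefficients rounded to four decimals), after undoing the sRGB transfer curve. It then projects XYZ onto the (x, y) chromaticity plane, which discards luminance. The choice of sRGB as the source space is an assumption for illustration.

```python
from typing import Tuple


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for a component in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """sRGB (D65 white) to CIE 1931 XYZ; Y is the relative luminance."""
    rl, gl, bl = (srgb_to_linear(v) for v in (r, g, b))
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    return x, y, z


def chromaticity(x: float, y: float, z: float) -> Tuple[float, float]:
    """Project XYZ to (x, y) chromaticity, removing the luminance component."""
    s = x + y + z
    if s == 0:
        raise ValueError("chromaticity is undefined for black")
    return x / s, y / s
```

The green weight (0.7152) dominating the Y row reflects the higher perceived brightness of green light noted above; white maps to Y = 1 with chromaticity near (0.3127, 0.3290), the D65 white point.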
RGB color space. RGB is an abbreviation for Red-Green-Blue. An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity within the triangle defined by those primary colors. The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve. The RGB (Red, Green, and Blue) is a color model that describes what kind of light needs to be emitted to produce a given color by storing individual values for red, green, and blue. Further, there are many different RGB color spaces derived from the RGB color model, such as RGBA, which is RGB with an additional channel, alpha, to indicate transparency. RGB color spaces are described in an article published by The BabelColor Company by Danny Pascale (Revised 2003 Oct. 6) entitled: “A Review of RGB Color Spaces . . . from xyY to R′G′B′”, and in an article by Sabine Susstrunk, Robert Buckley, and Steve Swen from the Laboratory of Audio-Visual Communication (EPFL) entitled: “Standard RGB Color Spaces”, which are both incorporated in their entirety for all purposes as if fully set forth herein. RGB color spaces include RGB, sRGB, Adobe RGB, Adobe Wide Gamut RGB, ProPhoto RGB, Apple RGB, ISO RGB, ROMM RGB, and the International Telecommunication Union (ITU) Radiocommunication Sector (ITU-R) Recommendations ITU-R BT.709 and ITU-R BT.2020.
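The alpha channel of RGBA is typically used to composite one image over another. As a sketch, the following implements the Porter-Duff "over" operator for straight (non-premultiplied) RGBA values in [0, 1]. It blends whatever component values it is given; whether those are gamma-encoded or linear is left to the caller, and blending in linear light is generally more physically accurate.

```python
from typing import Tuple

RGBA = Tuple[float, float, float, float]


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Porter-Duff 'over' for straight (non-premultiplied) RGBA in [0, 1]."""
    fa, ba = fg[3], bg[3]
    out_a = fa + ba * (1.0 - fa)  # combined coverage
    if out_a == 0.0:
        return 0.0, 0.0, 0.0, 0.0  # fully transparent result
    rgb = [(fg[i] * fa + bg[i] * ba * (1.0 - fa)) / out_a for i in range(3)]
    return rgb[0], rgb[1], rgb[2], out_a
```

Placing 50% transparent red over opaque blue, for example, yields an opaque purple, `(0.5, 0.0, 0.5, 1.0)`.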
Luma plus chroma/chrominance (YUV). Some color spaces are based on separating the component (Y) that represents the luma information, from the components (U+V, or I+Q) that represent the chrominance information. YUV is a color space typically used as part of a color image pipeline, where it encodes a color image or video taking human perception into account, allowing reduced bandwidth for chrominance components, thereby typically enabling transmission errors or compression artifacts to be more efficiently masked by the human perception than using a “direct” RGB-representation. Other color spaces may have similar properties, and the main reason to implement or investigate properties of Y′UV would be for interfacing with analog or digital television or photographic equipment that conforms to certain Y′UV standards.
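The bandwidth saving for chrominance mentioned above is commonly realized by chroma subsampling. The sketch below shows 4:2:0 subsampling of one chroma plane by averaging each 2 × 2 block, and a nearest-neighbor reconstruction. The plain box average and the even-dimension requirement are simplifying assumptions; real encoders may use other filters and chroma sample positions.

```python
from typing import List

Plane = List[List[float]]


def subsample_420(chroma: Plane) -> Plane:
    """Average each 2x2 block of a chroma plane (even width and height assumed)."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1]
              + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample_420(small: Plane) -> Plane:
    """Nearest-neighbor reconstruction back to full resolution."""
    out: Plane = []
    for row in small:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))  # and vertically
    return out
```

Since the luma plane is kept at full resolution while each chroma plane keeps one sample per four pixels, a 4:2:0 frame carries 1.5 samples per pixel instead of 3.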
The Y′UV model defines a color space in terms of one luma (Y′) and two chrominance (UV) components. The Y′UV color model is used in the PAL and SECAM composite color video standards. Previously known black-and-white systems used only luma (Y′) information. Color information (U and V) was added separately via a sub-carrier so that a black-and-white receiver would still be able to receive and display a color picture transmission in the receiver's native black-and-white format. Y′ stands for the luma component (the brightness), and U and V are the chrominance (color) components. Luminance is denoted by Y and luma by Y′, where the prime symbol (′) denotes gamma compression; “luminance” means perceptual (color science) brightness, while “luma” is electronic (voltage of display) brightness. The YPbPr color model used in analog component video and its digital version YCbCr used in digital video are more or less derived from it, and are sometimes called Y′UV. (CB/PB and CR/PR are deviations from gray on blue-yellow and red-cyan axes, whereas U and V are blue-luminance and red-luminance differences.) The Y′IQ color space used in the analog NTSC television broadcasting system is related to it, although in a more complex way. YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component, and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y (luminance) in that the light intensity is nonlinearly encoded based on gamma-corrected RGB primaries. Color models based on the YUV color space include YUV (used in PAL), YDbDr (used in SECAM), YIQ (used in NTSC), YCbCr (described in ITU-R BT.601, BT.709, and BT.2020), YPbPr, xvYCC, and YCgCo.
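As a sketch of the luma/chroma separation, the following converts 8-bit gamma-encoded R′G′B′ to Y′CbCr with the full-range BT.601 coefficients used by JFIF, and back. Other members of the family (for example BT.709, or the limited-range "studio swing" variants) use different coefficients and offsets, so these numbers should not be assumed for every Y′CbCr stream.

```python
from typing import Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range BT.601 (JFIF) conversion for 8-bit gamma-encoded R'G'B'."""
    y = 0.299 * r + 0.587 * g + 0.114 * b            # luma Y'
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float) -> Tuple[float, float, float]:
    """Inverse of the conversion above (results are not clamped or rounded)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b
```

Any gray input (R′ = G′ = B′) gives Cb = Cr = 128, the chroma "zero", so all color information sits in the deviation of Cb and Cr from that midpoint.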
The YUV family is further described in an article published in the International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 2, March-April 2012, pp. 152-156 by Gnanatheja Rakesh and Sreenivasulu Reddy of the University College of Engineering, Tirupati, entitled: “YCoCg color Image Edge detection”, which is incorporated in its entirety for all purposes as if fully set forth herein.
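As an illustration of the YCoCg member of the family, the sketch below implements the reversible YCoCg-R variant, which uses integer lifting steps so that the RGB values are recovered exactly. Whether the cited article uses this particular variant is not stated here; YCoCg-R is shown simply because its integer form makes the transform easy to verify.

```python
from typing import Tuple


def rgb_to_ycocg_r(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Lossless YCoCg-R forward transform built from integer lifting steps."""
    co = r - b               # orange chroma
    t = b + (co >> 1)
    cg = g - t               # green chroma
    y = t + (cg >> 1)        # luma
    return y, co, cg


def ycocg_r_to_rgb(y: int, co: int, cg: int) -> Tuple[int, int, int]:
    """Undo the lifting steps in reverse order; the shifts match exactly."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Python's `>>` floors negative values consistently in both directions, which is what makes the round trip exact; note that Co and Cg each need one more bit than the input components.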
Hue and Saturation. HSL (Hue-Saturation-Lightness) and HSV (Hue-Saturation-Value) are the two most common cylindrical-coordinate representations of points in an RGB color model, and are commonly used today in color pickers, in image editing software, and in image analysis and computer vision. These two representations rearrange the geometry of RGB in an attempt to be more intuitive and perceptually relevant than the Cartesian (cube) representation, by mapping the values into a cylinder loosely inspired by a traditional color wheel. The angle around the vertical central axis corresponds to “hue” and the distance from the axis corresponds to “saturation”. These first two values give the two schemes the ‘H’ and ‘S’ in their names. The height corresponds to a third value, the system's representation of the perceived luminance in relation to the saturation.
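Python's standard `colorsys` module provides both cylindrical representations, which makes the hue/saturation geometry easy to explore. In this sketch the helper name `describe` is hypothetical; note that `colorsys.rgb_to_hls` returns its values in (hue, lightness, saturation) order, and all hues are returned as fractions of a full turn, converted here to degrees.

```python
import colorsys


def describe(r: float, g: float, b: float) -> dict:
    """Return HSV and HSL coordinates for an RGB triple in [0, 1], hue in degrees."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h2, l, s2 = colorsys.rgb_to_hls(r, g, b)  # colorsys uses H, L, S order
    return {"hsv": (h * 360, s, v), "hsl": (h2 * 360, s2, l)}
```

Pure red sits at hue 0° in both models, with value 1 in HSV but lightness 0.5 in HSL, which illustrates how the two systems differ in their third coordinate.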
Perceived luminance is a notoriously difficult aspect of color to represent in a digital format, and this has given rise to two systems attempting to solve this issue: HSL (L for lightness) and HSV or HSB (V for value or B for brightness). A third model, HSI (I for intensity), common in computer vision applications, attempts to balance the advantages and disadvantages of the other two systems. While typically consistent, these definitions are not standardized. HSV and HSL color models are described in an article by Darrin Cardani entitled: “Adventures in HSV Space”, and in an article by Douglas A. Kerr (Issue 3, May 12, 2008) entitled: “The HSV and HSL Color Models and the Infamous Hexcones”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
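Since these definitions are not standardized, the following sketch uses one common textbook formulation of HSI: intensity is the mean of R, G, and B, saturation measures the distance from gray relative to that intensity, and hue is obtained geometrically with an arccosine. Reporting a hue of 0 for grays, where hue is undefined, is an arbitrary choice made for this example.

```python
import math
from typing import Tuple


def rgb_to_hsi(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Textbook RGB -> HSI for r, g, b in [0, 1]; hue in degrees."""
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:  # achromatic (gray): hue is undefined, report 0
        return 0.0, s, i
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta  # lower half of the color circle
    return h, s, i
```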
Grayscale. A grayscale (or greyscale) digital image is an image in which the value of each pixel is a single sample and carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest. Grayscale images range from one-bit bi-tonal black-and-white images, which in the context of computer imaging are images with only two colors, black and white (also called bilevel or binary images), to images with many shades of gray in between. Grayscale images are often the result of measuring the intensity of light at each pixel in a single band of the electromagnetic spectrum (e.g., infrared, visible light, or ultraviolet), and in such cases they are monochromatic proper when only a given frequency is captured. The intensity of a pixel is expressed within a given range between a minimum and a maximum, inclusive. This range is represented in an abstract way as a range from ‘0’ (total absence, black) to ‘1’ (total presence, white), with any fractional values in between. Although the grayscale can be computed through rational numbers, image pixels are stored in binary, quantized form. Some early grayscale monitors could only show up to sixteen (4-bit) different shades, but today grayscale images (as photographs) intended for visual display (both on screen and printed) are commonly stored with 8 bits per sampled pixel, which allows 256 different intensities (i.e., shades of gray) to be recorded, typically on a non-linear scale. The precision provided by this format is barely sufficient to avoid visible banding artifacts, but it is very convenient for programming, since a single pixel occupies a single byte.
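A common way to derive an 8-bit grayscale value from a color pixel is to take a weighted sum of the components and quantize it. This sketch uses the BT.601 luma weights, which is one conventional choice among several; because the inputs are gamma-encoded, the result is luma rather than linear luminance. It also maps an n-bit value back to the abstract 0-to-1 range described above.

```python
def to_gray8(r: int, g: int, b: int) -> int:
    """8-bit gamma-encoded RGB -> 8-bit gray using BT.601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return max(0, min(255, round(y)))  # quantize and clamp to one byte


def normalize(value: int, bits: int = 8) -> float:
    """Map a stored n-bit intensity onto the abstract range 0 (black) .. 1 (white)."""
    return value / ((1 << bits) - 1)
```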
Some technical uses often require more levels, to make full use of the sensor accuracy (typically 10 or 12 bits per sample) and to guard against roundoff errors in computations. Sixteen bits per sample (65,536 levels) is a convenient choice for such uses, as computers manage 16-bit words efficiently. TIFF and PNG, among other image file formats, support 16-bit grayscale natively, although browsers and many imaging programs tend to ignore the low-order 8 bits of each pixel. No matter what pixel depth is used, the binary representations assume that ‘0’ is black and the maximum value (255 at 8 bpp, 65,535 at 16 bpp, etc.) is white, if not otherwise noted. In an 8-bit color palette, each pixel value is represented by 8 bits, resulting in a 256-value palette (2^8 = 256). This is usually the maximum number of grays in ordinary monochrome systems; each image pixel occupies a single memory byte.
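The two ways of reducing a high-bit-depth sample mentioned above differ slightly. Keeping only the high-order byte is what a program does when it ignores the low 8 bits; a rescale instead maps 0 to 0 and the source maximum to the destination maximum, rounding to the nearest level, and also handles depths such as 12-bit sensor data. Both helpers are illustrative sketches with hypothetical names.

```python
def truncate_16_to_8(v: int) -> int:
    """Keep only the high-order byte of a 16-bit sample."""
    return v >> 8


def rescale(v: int, src_bits: int, dst_bits: int) -> int:
    """Map 0..(2^src_bits - 1) onto 0..(2^dst_bits - 1), rounding to nearest."""
    src_max, dst_max = (1 << src_bits) - 1, (1 << dst_bits) - 1
    return (v * dst_max + src_max // 2) // src_max
```

For example, an 8-bit value of 128 becomes 32,896 (128 × 257) when promoted to 16 bits, and a full-scale 12-bit value of 4,095 maps to 255 at 8 bits.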
Digital photography is described in an article by Robert Berdan (downloaded from www.canadianphotographer.com) entitled: “Digital Photography Basics for Beginners”, in a guide published in April 2004 by Que Publishing (ISBN 0-7897-3120-7) entitled: “Absolute Beginner's Guide to Digital Photography” authored by Joseph Ciaglia et al., and in the UPDIG Photographic Guidelines (downloaded June 2015) entitled: “Universal Photographic Digital Imaging Guidelines v 4.0”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
A method and an apparatus for rating a captured image based on accessing a database of reference images, each having an associated rating value, and selecting reference images to form a metadata-based subset of reference images, are described in U.S. Patent Application Publication No. 2012/0213445 to LUU et al., entitled: “Method, Apparatus, and System for Rating Images”, which is incorporated in its entirety for all purposes as if fully set forth herein. A method and an apparatus for disqualifying an unsatisfactory scene as an image acquisition control for a camera by analyzing mouth regions in an acquired image are described in U.S. Pat. No. 8,265,348 to Steinberg et al., entitled: “Digital Image Acquisition Control and Correction Method and Apparatus”, which is incorporated in its entirety for all purposes as if fully set forth herein. An apparatus and a method for facilitating analysis of a digital image by using image recognition processing by a server, allowing a user to be offered suggested meta-tags for the image, are described in U.S. Pat. No. 8,558,921 to Walker et al., entitled: “Systems and Methods for Suggesting Meta-Information to a Camera User”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Systems and methods for determining the location where an image was captured using a central system that compares the submitted images to images in an image library to identify matches are described in a U.S. Pat. No. 8,131,118 to Jing et al., entitled: “Inferring Locations from an Image”, which is incorporated in its entirety for all purposes as if fully set forth herein. Further, methods for automatically rating and selecting digital photographs by estimating the importance of each photograph by analyzing its content as well as its metadata, are described in an article by Daniel Kormann, Peter Dunker, and Ronny Paduscheck, all of the Fraunhofer Institute for Digital Media in Ilmenau, Germany, entitled: “Automatic Rating and Selection of Digital Photographs”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Various systems and methods are known for analyzing, and for providing feedback to the user regarding, the quality of a digital image captured by a digital camera. A processor within a digital camera, which generates and utilizes a recipe data file and communicates with a network-based storage location for uploading and downloading, is described in U.S. Patent Application Publication No. 2013/0050507 to Syed et al., entitled: “Recipe Based Real-Time Assistance for Digital Image Capture and Other Consumer Electronics Devices”. A method and system for determining effective policy profiles, which include client devices configured to initiate a request for at least one effective policy profile, a server mechanism communicatively coupled to the client devices and configured to receive the request, and a policy data storage component configured to store a plurality of policy profiles, are described in U.S. Patent Application Publication No. 2010/0268772 to Romanek et al., entitled: “System and Method for Determining Effective Policy Profiles in a Client-Server Architecture”. Methods and apparatuses for analyzing, characterizing, and/or rating the composition of images and providing instructive feedback or automatic corrective actions are described in U.S. Patent Application Publication No. 2012/0182447 to Gabay, entitled: “Methods, Circuits, Devices, Apparatuses and Systems for Providing Image Composition Rules, Analysis and Improvement”. An approach for providing device angle image correction, where an image (e.g., still or moving) of a subject is captured via a camera of a mobile device, is described in U.S. Patent Application Publication No. 2013/0063538 to Hubner et al., entitled: “Method and Apparatus for Providing Device Angle Image Correction”. An apparatus and an associated method that facilitate capturing an image in an electronic camera with the image being completely focused are described in U.S. Patent Application Publication No. 2012/0086847 to Foster, entitled: “Convergence Feedback Indicator, Provided When Taking a Picture in a Camera Application”. A method for providing real-time feedback of an estimated quality of a captured final image, including calculating a quality score of a preliminarily obtained image, is described in U.S. Patent Application Publication No. 2014/0050367 to CHEN et al., entitled: “Smart Document Capture Based on Estimated Scanned-Image Quality”. Methods and systems for determining augmentability information associated with an image frame captured by a digital imaging part of a user device are described in PCT International Application Publication No. WO2013/044983 to Hofmann et al., entitled: “Feedback to User for Indicating Augmentability of an Image”. All of the above are incorporated in their entirety for all purposes as if fully set forth herein.
Further, a digital image acquisition system that includes a portable apparatus for capturing digital images and a digital processing component for detecting, analyzing, invoking subsequent image captures, and informing the photographer regarding motion blur, and for reducing the camera motion blur in an image captured by the apparatus, is described in U.S. Pat. No. 8,244,053 entitled: “Method and Apparatus for Initiating Subsequent Exposures Based on Determination of Motion Blurring Artifacts”, and in U.S. Pat. No. 8,285,067 entitled: “Method Notifying Users Regarding Motion Artifacts Based on Image Analysis”, both to Steinberg et al., which are both incorporated in their entirety for all purposes as if fully set forth herein.
Furthermore, a camera having a release button, a timer, a memory, and a control part, in which the timer measures the time elapsed after the depressed release button is released, is described in Japanese Patent Application Publication No. JP2008033200 to Hyo Hana entitled: “Camera”; the camera shortens the time required for focusing when the release button is depressed again, so that the right shutter-release moment for a good picture is not missed. An imaging apparatus in which a through image is read by a face detection processing circuit that detects the face of an object, and the face is detected again by the face detection processing circuit while the shutter button is half-pressed, used to photograph a quickly moving child without fail, is described in Japanese Patent Application Publication No. JP2007208922 to Uchida Akihiro entitled: “Imaging Apparatus”. A digital camera that executes image evaluation processing for automatically evaluating a photographic image (exposure condition evaluation, contrast evaluation, and blur or focus blur evaluation), used to enable an image photographing apparatus such as a digital camera to automatically correct a photographic image, is described in Japanese Patent Application Publication No. JP2006050494 to Kita Kazunori entitled: “Image Photographing Apparatus”. These are all incorporated in their entirety for all purposes as if fully set forth herein.
Object detection. Object detection (a.k.a. ‘object recognition’) is a process of detecting and finding semantic instances of real-world objects, typically of a certain class (such as humans, buildings, or cars), in digital images and videos. Object detection techniques are described in an article published in the International Journal of Image Processing (IJIP), Volume 6, Issue 6, 2012, entitled: “Survey of The Problem of Object Detection In Real Images” by Dilip K. Prasad, and in a tutorial by A. Ashbrook and N. A. Thacker entitled: “Tutorial: Algorithms For 2-dimensional Object Recognition” published by the Imaging Science and Biomedical Engineering Division of the University of Manchester, which are both incorporated in their entirety for all purposes as if fully set forth herein. Various object detection techniques are based on pattern recognition, as described in Chapter 4 of Computer Vision (March 2000) entitled: “Pattern Recognition Concepts”, and in a book entitled: “Hands-On Pattern Recognition—Challenges in Machine Learning, Volume 1”, published by Microtome Publishing, 2011 (ISBN-13: 978-0-9719777-1-6), which are both incorporated in their entirety for all purposes as if fully set forth herein.
Various object detection (or recognition) schemes in general, and face detection techniques in particular, are based on using Haar-like features (Haar wavelets) instead of the raw image intensities. A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region, and calculates the difference between these sums. This difference is then used to categorize subsections of an image. The Viola-Jones object detection framework, when applied to face detection using Haar features, is based on the assumption that all human faces share some similar properties, such as that the eye region is darker than the upper cheeks, and that the nose bridge region is brighter than the eyes. The Haar features are used by the Viola-Jones object detection framework, described in articles by Paul Viola and Michael Jones, such as the International Journal of Computer Vision 2004 article entitled: “Robust Real-Time Face Detection” and the Accepted Conference on Computer Vision and Pattern Recognition 2001 article entitled: “Rapid Object Detection using a Boosted Cascade of Simple Features”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
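The following is a minimal Python sketch (standard library only) of how a two-rectangle Haar-like feature can be evaluated in constant time using an integral image, the summed-area table used in the Viola-Jones framework. The function names and the vertical two-rectangle layout are illustrative assumptions, not code from the cited articles; a real detector evaluates many such features inside a boosted cascade.

```python
def integral_image(img):
    """Summed-area table with an extra zero row/column:
    ii[y][x] = sum of all img pixels in rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Hypothetical vertical two-rectangle feature: the upper w x h block minus the
    w x h block directly beneath it. Strongly negative values indicate a dark band
    above a bright band (e.g. an eye region above brighter cheeks)."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x, y + h, w, h)
```

Because each rectangle sum costs four table lookups regardless of its size, features can be evaluated at many scales without rescanning the pixels.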
Edge detection. Edge detection is a name for a set of mathematical methods that aim at identifying points in a digital image at which the image brightness changes sharply, or more formally, has discontinuities. The points at which image brightness changes sharply are typically organized into a set of curved line segments, which are termed ‘edges’. The purpose of detecting sharp changes in image brightness is to capture important events and changes in properties, and it can be shown that under rather general assumptions for an image formation model, discontinuities in image brightness are likely to correspond to discontinuities in depth, discontinuities in surface orientation, changes in material properties, and variations in scene illumination.
Ideally, the result of applying an edge detector to an image may lead to a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings as well as curves that correspond to discontinuities in surface orientation. Thus, applying an edge detection algorithm to an image may significantly reduce the amount of data to be processed and may therefore filter out information that may be regarded as less relevant, while preserving the important structural properties of the image. If the edge detection step is successful, the subsequent task of interpreting the information contents in the original image may, therefore, be substantially simplified.
A typical edge might be the border between a block of red color and a block of yellow color. In contrast, a line (as can be extracted by a ridge detector) may be a small number of pixels of a different color on an otherwise unchanging background; a line therefore usually has one edge on each side. There are many methods for edge detection, but most of them can be grouped into two major categories: search-based and zero-crossing based. The search-based methods detect edges by first computing a measure of edge strength, usually a first-order derivative expression such as the gradient magnitude, and then searching for local directional maxima of the gradient magnitude using a computed estimate of the local orientation of the edge, usually the gradient direction. The zero-crossing based methods search for zero-crossings in a second-order derivative expression computed from the image to find edges, usually the zero-crossings of the Laplacian or the zero-crossings of a non-linear differential expression. As a pre-processing step to edge detection, a smoothing stage, typically Gaussian smoothing, is almost always applied (see also noise reduction). The general criteria for edge detection include: detection of edges with a low error rate, meaning that the detection should accurately catch as many of the edges shown in the image as possible; accurate localization, meaning that each edge point detected by the operator should lie on the center of the edge; and a single response, meaning that a given edge in the image should be marked only once and, where possible, image noise should not create false edges.
Various edge detection techniques are described in a paper by Djemel Ziou (of Universite de Sherbrooke, Quebec, Canada) and Salvatore Tabbone (of Crin-Cnrs/Inria Lorraine, Nancy, France) (downloaded July 2015) entitled: “Edge Detection Techniques—An Overview”, in an International Journal of Computer Science Issues (IJCSI), Vol. 9 Issue 5, No. 1, September 2012 [ISSN (online): 1694-0814] paper by G. T. Shrivakshan (of Bharathiar University, Tamilnadu, India) and Dr. C. Chandrasekar (of Periyar University Salem, Tamilnadu, India) entitled: “A Comparison of various Edge Detection Techniques used in Image Processing”, in a technical report CES-506 by the University of Essex (dated 29 Feb. 2010) ISSN 1744-8050 entitled: “A Survey on Edge Detection Methods”, in a paper in Applied Mathematical Sciences, Vol. 2, 2008, no. 31, 1507-1520 by Ehsan Nadernejad, Sara Sharifzadeh, and Hamid Hassanpour entitled: “Edge Detection Techniques: Evaluations and Comparisons”, and in a paper by Tzu-Heng Henry Lee (of National Taiwan University, Taipei, Taiwan, ROC) (downloaded July 2015) entitled: “Edge Detection Analysis”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
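To make the two families concrete, here is a hypothetical one-dimensional Python sketch applied to an already-smoothed step profile: the search-based variant looks for local maxima of the first-derivative magnitude, while the zero-crossing variant looks for sign changes of the discrete second derivative (interpolated to sub-sample precision). The thresholds and helper names are assumptions for illustration; two-dimensional detectors apply the same ideas along the gradient direction.

```python
def first_derivative(f):
    """Central difference f'(i) ~ (f[i+1] - f[i-1]) / 2 for interior samples i = 1..n-2."""
    return [(f[i + 1] - f[i - 1]) / 2 for i in range(1, len(f) - 1)]

def second_derivative(f):
    """Discrete Laplacian f''(i) ~ f[i+1] - 2 f[i] + f[i-1] for interior samples i = 1..n-2."""
    return [f[i + 1] - 2 * f[i] + f[i - 1] for i in range(1, len(f) - 1)]

def search_based_edges(f, threshold):
    """Indices into f where |f'| is a local maximum and at least `threshold`."""
    g = [abs(v) for v in first_derivative(f)]  # g[k] belongs to f index k + 1
    edges = []
    for k in range(1, len(g) - 1):
        if g[k] >= threshold and g[k] >= g[k - 1] and g[k] > g[k + 1]:
            edges.append(k + 1)
    return edges

def zero_crossing_edges(f, min_swing):
    """Positions (in f indices) where f'' changes sign with a swing of at least min_swing."""
    d2 = second_derivative(f)  # d2[k] belongs to f index k + 1
    edges = []
    # Case 1: f'' is exactly zero at a sample flanked by opposite signs.
    for k in range(1, len(d2) - 1):
        left, mid, right = d2[k - 1], d2[k], d2[k + 1]
        if mid == 0 and left * right < 0 and abs(left - right) >= min_swing:
            edges.append(k + 1)
    # Case 2: f'' changes sign between two samples; interpolate linearly.
    for k in range(len(d2) - 1):
        a, b = d2[k], d2[k + 1]
        if a * b < 0 and abs(a - b) >= min_swing:
            edges.append(k + 1 + a / (a - b))
    return sorted(edges)
```

For the symmetric blurred step `[0, 0, 0, 1, 4, 7, 8, 8, 8]`, both variants place the edge at index 4, the midpoint of the transition.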
Various existing tools may be used for edge detection such as the Apple Inc. Quartz™ 2D drawing engine (available from Apple Inc.) and described in Apple Inc. Developer guide (dated 2014 Sep. 17) entitled: “Quartz 2D Programming Guide”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Canny edge detection. The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images, and may be used to filter out spurious edges. The Canny edge detection algorithm can be broken down into five steps: (1) apply a Gaussian filter to smooth the image in order to remove noise, (2) find the intensity gradients of the image, (3) apply non-maximum suppression to get rid of spurious responses to edge detection, (4) apply a double threshold to determine potential edges, and (5) track edges by hysteresis, finalizing the detection by suppressing all other edges that are weak and not connected to strong edges. Canny edge detection (and any variants thereof) is described in an IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, November 1986 paper (0162-8828/86/1100-0679$01.00) by John Canny entitled: “A Computational Approach to Edge Detection”, in a tutorial 09gr820 (dated Mar. 23, 2009) entitled: “Canny Edge Detection”, and in an International Journal of Computer Vision 53(3), 225-243, 2003 paper authored by R. Kimmel and A. M. Bruckstein (of the Technion, Haifa, Israel) entitled: “Regularized Laplacian Zero Crossings as Optimal Edge Integrators”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Differential edge detection. Differential edge detection is a second-order edge detection approach that automatically detects edges with sub-pixel accuracy by detecting zero-crossings of the second-order directional derivative in the gradient direction.
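Steps (4) and (5) are the part of the pipeline most specific to Canny. The Python sketch below assumes that steps (1) to (3) have already produced a non-maximum-suppressed gradient-magnitude grid, and that the caller chooses the `low` and `high` thresholds; it marks pixels at or above `high` as strong edges, then keeps pixels between the two thresholds only when they are 8-connected, directly or through other kept pixels, to a strong one. The function name is hypothetical.

```python
from collections import deque

def hysteresis(mag, low, high):
    """Double threshold plus edge tracking by hysteresis on a 2-D magnitude grid.

    Returns a boolean grid: True for strong pixels and for weak pixels
    (low <= magnitude < high) that are 8-connected to a strong pixel.
    """
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    # Step (4): strong pixels seed the edge map.
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:
                edges[y][x] = True
                queue.append((y, x))
    # Step (5): grow from strong pixels into connected weak pixels;
    # weak pixels never reached from a strong pixel stay suppressed.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and mag[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges
```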
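For reference, one common way to write this condition is the scale-space formulation associated with Lindeberg, given here as an assumption about the intended definition. It uses the Gaussian-smoothed image L, its partial derivatives L_x, L_y, L_xx, and so on, and a local coordinate v aligned with the gradient direction. Edge points are where the second-order directional derivative along v crosses zero while the third-order derivative is negative:

```latex
L_v^{2}\,L_{vv} \;=\; L_x^{2}L_{xx} + 2L_xL_yL_{xy} + L_y^{2}L_{yy} \;=\; 0,
\qquad
L_v^{3}\,L_{vvv} \;=\; L_x^{3}L_{xxx} + 3L_x^{2}L_yL_{xxy} + 3L_xL_y^{2}L_{xyy} + L_y^{3}L_{yyy} \;<\; 0
```

Multiplying by powers of L_v = sqrt(L_x^2 + L_y^2) avoids dividing by the gradient magnitude. Because L_v > 0 at an edge point, the sign conditions on L_vv and L_vvv are unchanged, and sub-pixel edge positions can be obtained by interpolating where the first expression changes sign between neighboring pixels.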
Prewitt operator. The Prewitt operator is a discrete differentiation operator for computing an approximation of the gradient of an image intensity function. At each point in the image, the result of the Prewitt operator is either the corresponding gradient vector or the norm of this vector. The Prewitt operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions, and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation produced is relatively crude, in particular for high-frequency variations in the image. In simple terms, the operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase in intensity (from dark to light) and the rate of change in that direction. The result therefore shows how “abruptly” or “smoothly” the image changes at that point, and therefore how likely it is that a part of the image represents an edge, as well as how that edge is likely to be oriented. In practice, the magnitude calculation (effectively the likelihood of an edge) is more reliable and easier to interpret than the direction calculation.
Mathematically, the gradient of a two-variable function (here the image intensity function) is at each image point a 2D vector with the components given by the derivatives in the horizontal and vertical directions, and the operator uses two 3×3 kernels that are convolved with the original image to calculate approximations of the derivatives—one for horizontal changes, and one for vertical. At each image point, the gradient vector points in the direction of the largest possible intensity increase, and the length of the gradient vector corresponds to the rate of change in that direction. This implies that the result of the Prewitt operator at an image point which is in a region of constant image intensity is a zero vector and at a point on an edge is a vector which points across the edge, from darker to brighter values. The Prewitt operator is described in a paper by Judith M. S. Prewitt (of University of Pennsylvania, Philadelphia, Pa., U.S.A.), entitled: “Object Enhancement and Extraction”, which is incorporated in its entirety for all purposes as if fully set forth herein.
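As a worked form of the two kernels, writing A(x, y) for the image intensity at column x and row y (notation assumed here, not taken from the cited paper), the Prewitt approximations of the horizontal and vertical derivatives, their equivalent cross-correlation kernels, and the resulting magnitude and direction can be written as:

```latex
G_x(x,y) = \sum_{j=-1}^{1} \bigl[ A(x+1,\,y+j) - A(x-1,\,y+j) \bigr],
\qquad
G_y(x,y) = \sum_{i=-1}^{1} \bigl[ A(x+i,\,y+1) - A(x+i,\,y-1) \bigr]

\mathbf{K}_x = \begin{bmatrix} -1 & 0 & +1 \\ -1 & 0 & +1 \\ -1 & 0 & +1 \end{bmatrix},
\qquad
\mathbf{K}_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ +1 & +1 & +1 \end{bmatrix}

G = \sqrt{G_x^{2} + G_y^{2}}, \qquad \Theta = \operatorname{atan2}(G_y,\, G_x)
```

For instance, at a pixel on a vertical boundary with intensity 0 in column x-1 and 10 in column x+1 (all three rows identical), G_x = 3(10 - 0) = 30 and G_y = 0, so the gradient points horizontally from the darker toward the brighter side. In a region of constant intensity both sums vanish, giving the zero vector.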
Sobel operator. The Sobel operator (also referred to as the Sobel-Feldman operator, and sometimes called the Sobel filter) is used in image processing and computer vision, particularly within edge detection algorithms, to create an image that emphasizes edges and transitions. Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel operator is either the corresponding gradient vector or the norm of this vector. The Sobel operator is based on convolving the image with a small, separable, and integer-valued filter in the horizontal and vertical directions, and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation that it produces is relatively crude, in particular for high-frequency variations in the image. Since the intensity function of a digital image is only known at discrete points, derivatives of this function cannot be defined unless it is assumed that there is an underlying continuous intensity function that has been sampled at the image points. With some additional assumptions, the derivative of the continuous intensity function may be computed as a function of the sampled intensity function, i.e. the digital image. It turns out that the derivatives of the continuous intensity function at any particular point are functions of the intensity values at virtually all image points. However, approximations of these derivative functions may be defined at lesser or larger degrees of accuracy.
The Sobel operator represents a rather inaccurate approximation of the image gradient, but is still of sufficient quality to be of practical use in many applications. More precisely, it uses intensity values only in a 3×3 region around each image point to approximate the corresponding image gradient, and it uses only integer values for the coefficients that weight the image intensities to produce the gradient approximation. The Sobel operator (and variants thereof) is described in a paper by Irwin Sobel (Updated Jun. 14, 2015), entitled: “History and Definition of the so-called “Sobel Operator” more appropriately named the Sobel-Feldman Operator”, in an article by Guennadi (Henry) Levkine (of Vancouver, Canada) Second Draft, June 2012 entitled: “Prewitt, Sobel, and Scharr gradient 5×5 convolution Matrices”, and in an article in Proceedings of Informing Science & IT Education Conference (InSITE) 2009 by O. R. Vincent and O. Folorunso (both of University of Agriculture, Abeokuta, Nigeria), entitled: “A Descriptive Algorithm for Sobel Image Edge Detection”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
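A minimal pure-Python sketch of the 3×3 Sobel computation follows. It applies the two integer kernels by cross-correlation (x growing to the right and y growing downward, both conventions assumed here), leaves border pixels at zero rather than padding the image, and returns the gradient magnitude and direction; the helper names are hypothetical.

```python
import math

# Sobel kernels written for cross-correlation: x grows to the right, y grows downward.
SOBEL_X = ((-1, 0, 1),
           (-2, 0, 2),
           (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1),
           (0, 0, 0),
           (1, 2, 1))

def correlate3x3(img, kernel, x, y):
    """Weighted sum of the 3x3 neighbourhood of (x, y) with an integer kernel."""
    return sum(kernel[j + 1][i + 1] * img[y + j][x + i]
               for j in (-1, 0, 1) for i in (-1, 0, 1))

def sobel(img):
    """Return (magnitude, direction) grids; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = correlate3x3(img, SOBEL_X, x, y)
            gy = correlate3x3(img, SOBEL_Y, x, y)
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)  # points from darker toward brighter
    return mag, ang
```

On a vertical step from 0 to 10, the interior pixels adjacent to the step get a magnitude of 40 (weights 1 + 2 + 1 times a difference of 10) and a direction of 0 radians.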
Deriche edge detector. Deriche edge detector (often referred to as Canny-Deriche detector) is an edge detection operator that includes a multistep algorithm to obtain an optimal result of edge detection in a discrete two-dimensional image, targeting the following criteria for optimal edge detection: Detection quality—all existing edges should be marked and no false detection should occur, Accuracy—the marked edges should be as close to the edges in the real image as possible, and Unambiguity—a given edge in the image should only be marked once, where no multiple responses to a single edge in the real image should occur. This differential edge detector can be seen as a reformulation of Canny's method from the viewpoint of differential invariants computed from a scale space representation leading to a number of advantages in terms of both theoretical analysis and sub-pixel implementation. The Deriche edge detector is described in an article by Rachid Deriche (of INRIA, Le Chesnay, France) published in International Journal of Computer Vision, 167-187 (1987), entitled: “Using Canny's criteria to Derive a Recursively Implemented Optimal Edge Detector”, and in a presentation by Diane Lingrand (of University of Nice, Sophia Antipolis, France) dated August 2006, entitled: “Segmentation”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
RANSAC. RANdom SAmple Consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. A basic assumption is that the data consists of “inliers”, i.e., data whose distribution can be explained by some set of model parameters, though they may be subject to noise, and “outliers”, i.e., data that do not fit the model. The outliers may come from extreme values of the noise, from erroneous measurements, or from incorrect hypotheses about the interpretation of data. RANSAC also assumes that, given a (usually small) set of inliers, there exists a procedure that can estimate the parameters of a model that optimally explains or fits this data.
The RANSAC algorithm is a learning technique to estimate parameters of a model by random sampling of observed data. Given a dataset whose data elements contain both inliers and outliers, RANSAC uses a voting scheme to find the optimal fitting result. Data elements in the dataset are used to vote for one or multiple models. The implementation of this voting scheme is based on two assumptions: that the noisy features will not vote consistently for any single model (few outliers), and that there are enough features to agree on a good model (few missing data). The RANSAC algorithm is essentially composed of two steps that are iteratively repeated. In the first step, a sample subset containing minimal data items is randomly selected from the input dataset, and a fitting model and the corresponding model parameters are computed using only the elements of this sample subset; the cardinality of the sample subset is the smallest sufficient to determine the model parameters. In the second step, the algorithm checks which elements of the entire dataset are consistent with the model instantiated by the estimated model parameters obtained from the first step. A data element is considered an outlier if it does not fit the model instantiated by the set of estimated model parameters within some error threshold that defines the maximum deviation attributable to the effect of noise. The set of inliers obtained for the fitting model is called the consensus set. The RANSAC algorithm iteratively repeats the above two steps until the consensus set obtained in a certain iteration has enough inliers.
RANSAC is described in SRI International (Menlo Park, Calif., U.S.A.) Technical Note 213 (March 1980) by Martin A. Fischler and Robert C. Bolles entitled: “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, and in an article by Anders Hast, Johan Nysjo (both of Uppsala University, Uppsala, Sweden) and Andrea Marchetti (of IIT, CNR, Pisa, Italy) entitled: “Optimal RANSAC—Towards a Repeatable Algorithm for Finding the Optimal Set”, which are both incorporated in their entirety for all purposes as if fully set forth herein. Using RANSAC for edge detection is described in U.S. Patent Application Publication No. 2011/0188708 to AHN et al. entitled: “Three Dimensional Edge Extraction Method, Apparatus and Computer-Readable Medium Using Time of Flight Camera”, in U.S. Pat. No. 8,121,431 to Hwang et al. entitled: “Method and Apparatus for Detecting Edge of Image and Computer Readable Medium Processing Method”, in U.S. Pat. No. 8,224,051 to Chen et al. entitled: “Method for Detection of Linear Structures and Microcalcifications in Mammographic Images”, and in U.S. Pat. No. 8,265,393 to Tribelhorn et al. entitled: “Photo-Document Segmentation Method and System”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
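The two repeated steps can be sketched for the simplest case, fitting a 2-D line, where the minimal sample is two points. In this hypothetical Python sketch the error threshold is a perpendicular distance, the loop stops either after a fixed number of iterations or once an optional `min_inliers` count is reached, and a seeded random generator keeps runs reproducible; none of these parameter names come from the cited references.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, normalized so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2  # normal to the direction (x2 - x1, y2 - y1)
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: identical points
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, threshold, iterations=200, min_inliers=None, seed=0):
    """Return (line, consensus_set) for the largest consensus set found."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        # Step 1: draw a minimal sample (two points) and fit the model to it.
        line = fit_line(*rng.sample(points, 2))
        if line is None:
            continue
        a, b, c = line
        # Step 2: the consensus set is every point within `threshold` of the line.
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
            if min_inliers is not None and len(best_inliers) >= min_inliers:
                break  # the consensus set is large enough
    return best_line, best_inliers
```

With ten points on y = 2x + 1 plus three far-away outliers, the returned consensus set is exactly the ten collinear points, and the recovered slope -a/b is 2.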
Line segment detection. Straight-line detection techniques are described in an article in J Math Imaging Vis (DOI 10.1007/s10851-008-0102-5) by Rafael Grompone von Gioi et al. (published by Springer Science+Business Media, LLC 2008) entitled: “On Straight Line Segment Detection”, and in a Norwegian University of Science and Technology (NTNU) Master work submitted June 2010 by Kari Haugsdal entitled: “Edge and line detection of complicated and blurred objects”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
LSD is a common linear-time Line Segment Detector providing subpixel accurate results, designed to work on any digital image without parameter tuning. It controls its own number of false detections, and on average, one false alarm is allowed per image. The process starts by computing a level-line angle at each pixel to produce a level-line field, i.e., a unit vector field such that all vectors are tangent to the level line going through their base point. Then, this field is segmented into connected regions of pixels that share the same level-line angle up to a certain tolerance. Various Line Segment Detectors (LSD) are described in an article published in Image Processing On Line (IPOL) 2012 Mar. 24 (ISSN 2105-1232) by Rafael Grompone von Gioi et al. entitled: “LSD: a Line Segment Detector”, in an article in International Conference on Remote Sensing, Environment and Transportation Engineering (RSETE 2013) by TAN Xi, ZHAO Lingjun, and SU Yi (of NUDT, Changsha, China), entitled: “Linear Feature Extraction from SAR Images based on the modified LSD Algorithm”, in a paper dated September 2011 by Rafael Grompone von Gioi et al. entitled: “LSD: a Line Segment Detector”, and in an article by Xiaohu Lu, Jian Yao, Kai Li, and Li Li (of Wuhan University, P.R. China), entitled: “Cannylines: A Parameter-Free Line Segment Detector”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
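The first two stages described above, computing the level-line field and grouping pixels that share a level-line angle within a tolerance, can be sketched in Python as follows. The 2×2 gradient mask, the 22.5-degree tolerance, and the running mean of the region angle follow our reading of the IPOL description and should be treated as assumptions; the later LSD stages (rectangle approximation and the a-contrario false-alarm test) are omitted, and the function names are hypothetical.

```python
import math

def level_line_angles(img):
    """Level-line angle per pixel from a 2x2 gradient mask; None where the gradient ~ 0.
    The result has one row and one column fewer than the image."""
    h, w = len(img) - 1, len(img[0]) - 1
    angles = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            gx = (b + d - a - c) / 2
            gy = (c + d - a - b) / 2
            if math.hypot(gx, gy) > 1e-9:
                # The level line is perpendicular to the gradient (gx, gy).
                angles[y][x] = math.atan2(gx, -gy)
    return angles

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def grow_region(angles, seed, tol=math.radians(22.5)):
    """Collect 8-connected pixels whose level-line angle stays within `tol` of the
    region angle, which is updated as the mean direction of the pixels added so far."""
    h, w = len(angles), len(angles[0])
    y0, x0 = seed
    sum_cos, sum_sin = math.cos(angles[y0][x0]), math.sin(angles[y0][x0])
    region, visited, stack = [seed], {seed}, [seed]
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or (ny, nx) in visited:
                    continue
                theta = angles[ny][nx]
                region_angle = math.atan2(sum_sin, sum_cos)
                if theta is not None and angle_diff(theta, region_angle) <= tol:
                    visited.add((ny, nx))
                    region.append((ny, nx))
                    stack.append((ny, nx))
                    sum_cos += math.cos(theta)
                    sum_sin += math.sin(theta)
    return region
```

On a vertical step edge, the pixels straddling the step share the same level-line angle and are grouped into one thin vertical region, the raw material for a line-segment candidate.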
Hough transform. The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of this technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure that is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by an algorithm for computing the Hough transform. The classical Hough transform is concerned with the identification of lines in the image, but may be used for identifying positions of arbitrary shapes, most commonly circles or ellipses. Hough transform is described in an article in Computer Vision, Graphics, and Image Processing 44, 87-116 (1988) [0734-189X/88] by J. Illingworth and J. Kittler entitled: “A Survey of the Hough Transform”, and in an article by Allam Shehata Hassanein et al. (of Electronic Research Institute, El-Dokki, Giza, Egypt) entitled: “A Survey on Hough Transform, Theory, Techniques and Applications”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
Detecting lines by using the Hough Transformation is described in Graphics and Image Processing (Association for Computing Machinery, 1972) by Richard O. Duda and Peter E. Hart (of Stanford Research Institute, Menlo Park, Calif., U.S.A.), entitled: “Use of the Hough Transformation To Detect Lines and Curves in Pictures”, and in Chapter 2 of a book “Real-Time detection of Lines and grids” by Herout, A., Dubska, M., and Havel, J. (ISBN: 978-1-4471-4413-7), entitled: “Chapter 2—Review of Hough Transform for Line Detection”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
Corner detection. A corner is defined herein as the intersection of two edges, or as a point for which there are two dominant and different edge directions in a local neighborhood of the point. Techniques for corner detection are described in a paper in 2010 10th International Conference on Computer and Information Technology (CIT 2010) by Andres Solis Montero, Milos Stojmenovic, and Amiya Nayak (of the University of Ottawa, Ottawa, Canada) [978-0-7695-4108-2/10, DOI 10.1109/CIT.2010.109] entitled: “Robust Detection of Corners and Corner-line links in images”, in a paper by Chris Harris and Mike Stephens of The Plessey Company plc. 1988 [AVC 1988 doi:10.5244/C.2.23] entitled: “A Combined Corner and Edge Detector”, and in an April 1980 paper by Les Kitchen and Azriel Rosenfeld (of University of Maryland, College Park, Md., U.S.A.) [DARPA TR-887, DAAG-53-76C-0138] entitled: “Gray-Level Corner Detection”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
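The voting procedure can be illustrated with a minimal Python sketch of the classical line case, using the normal parametrization rho = x·cos(theta) + y·sin(theta): every input point votes for all quantized (rho, theta) pairs of lines passing through it, and the accumulator cell with the most votes is the strongest line candidate. The bin sizes and function names are illustrative assumptions; a practical implementation would also smooth the accumulator and extract several local maxima.

```python
import math
from collections import Counter

def hough_accumulator(points, n_theta=180, rho_step=1.0):
    """Each point votes for every quantized (rho, theta) line passing through it."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # theta sampled in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(round(rho / rho_step), t)] += 1  # quantize rho into bins
    return acc

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (rho, theta, votes) of the accumulator cell with the most votes."""
    acc = hough_accumulator(points, n_theta, rho_step)
    (rho_bin, t), votes = acc.most_common(1)[0]
    return rho_bin * rho_step, math.pi * t / n_theta, votes
```

For ten points on the vertical line x = 5 plus two stray points, the winning cell collects all ten votes at |rho| = 5 with theta near 0 (or, equivalently, near pi).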
Other corner detection techniques are described in U.S. Pat. No. 4,242,734 to Deal entitled: “Image Corner Detector Using Haar Coefficients”, in U.S. Pat. No. 5,311,305 to Mahadevan et al. entitled: “Technique for Edge/Corner Detection/Tracking in Image Frames”, in U.S. Pat. No. 6,124,896 to Kurashige entitled: “Corner Detection Device and Corner Detection Method”, in U.S. Pat. No. 8,873,865 to Sung entitled: “Algorithm for Fast Corner Detection”, and in U.S. Patent Application Publication No. 2013/0135689 to Shacham et al. entitled: “Automatic detection of Corners of a Scanned Document”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
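As one concrete instance of the two-dominant-directions definition, the Harris and Stephens detector cited above scores each point with R = det(M) - k·trace(M)², where M sums products of image derivatives over a local window: R is positive at corners, negative along edges, and near zero in flat regions. The Python sketch below is a simplification that uses central differences and an unweighted 3×3 window instead of a Gaussian weighting, with the commonly used k = 0.04 as an assumed default; the function name is hypothetical and (x, y) must lie at least two pixels from the border.

```python
def harris_response(img, x, y, k=0.04):
    """Harris corner response at (x, y) from central-difference derivatives
    summed over an unweighted 3x3 window (a simplification of the original)."""
    sxx = syy = sxy = 0.0
    for j in (-1, 0, 1):
        for i in (-1, 0, 1):
            u, v = x + i, y + j
            ix = (img[v][u + 1] - img[v][u - 1]) / 2  # horizontal derivative
            iy = (img[v + 1][u] - img[v - 1][u]) / 2  # vertical derivative
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    # Structure tensor M = [[sxx, sxy], [sxy, syy]]
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace
```

On an image containing a bright square, the response is positive at the square's corner, negative in the middle of one of its sides, and zero in the flat background.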
Neural networks. Neural Networks (or Artificial Neural Networks (ANNs)) are a family of statistical learning models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that may depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected “neurons” that send messages to each other. The connections have numeric weights that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning. For example, a neural network for handwriting recognition is defined by a set of input neurons that may be activated by the pixels of an input image. After being weighted and transformed by a function (determined by the network designer), the activations of these neurons are passed on to other neurons, and this process is repeated until, finally, an output neuron is activated and determines which character was read. Like other machine learning methods (systems that learn from data), neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition. A class of statistical models is typically referred to as “Neural” if it contains sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and is capable of approximating non-linear functions of its inputs. The adaptive weights can be thought of as connection strengths between neurons, which are activated during training and prediction. Neural Networks are described in a book by David Kriesel entitled: “A Brief Introduction to Neural Networks” (ZETA2-EN) [downloaded May 2015 from www.dkriesel.com], which is incorporated in its entirety for all purposes as if fully set forth herein.
Neural networks based techniques may be used for image processing, as described in an article in Engineering Letters, 20:1, EL_20_1_09 (Advance online publication: 27 Feb. 2012) by Juan A. Ramirez-Quintana, Mario I. Chacon-Murguia, and F. Chacon-Hinojos entitled: “Artificial Neural Image Processing Applications: A Survey”, in an article published 2002 by Pattern Recognition Society in Pattern Recognition 35 (2002) 2279-2301 [PII: S0031-3203(01)00178-9] authored by M. Egmont-Petersen, D. de Ridder, and H. Handels entitled: “Image processing with neural networks—a review”, and in an article by Dick de Ridder et al. (of the Utrecht University, Utrecht, The Netherlands) entitled: “Nonlinear image processing using artificial neural networks”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
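The forward pass described in the handwriting example can be sketched as repeated weighted sums followed by a non-linear function. In this minimal Python sketch, the sigmoid activation and the hand-picked weights in `XOR_LAYERS` are assumptions chosen so that the two-layer network computes XOR, a function no single linear neuron can represent. No learning algorithm is shown, so the weights are fixed rather than tuned from data.

```python
import math

def sigmoid(z):
    """Logistic activation mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron weights every input, adds its bias,
    and applies the activation function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the activations through each (weights, biases) layer in turn."""
    for weights, biases in layers:
        x = dense(x, weights, biases, sigmoid)
    return x

# Hypothetical hand-picked weights: hidden neurons approximate OR and NAND,
# and the output neuron approximates AND of the two, which yields XOR.
XOR_LAYERS = [
    ([[20, 20], [-20, -20]], [-10, 30]),
    ([[20, 20]], [-30]),
]
```

For example, `forward([1, 0], XOR_LAYERS)` returns a single activation close to 1, while `forward([1, 1], XOR_LAYERS)` returns one close to 0.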
Neural networks may be used for object detection as described in an article by Christian Szegedy, Alexander Toshev, and Dumitru Erhan (of Google, Inc.) (downloaded July 2015) entitled: “Deep Neural Networks for Object Detection”, in a CVPR2014 paper provided by the Computer Vision Foundation by Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov (of Google, Inc., Mountain-View, Calif., U.S.A.) (downloaded July 2015) entitled: “Scalable Object Detection using Deep Neural Networks”, and in an article by Shawn McCann and Jim Reesman (both of Stanford University) (downloaded July 2015) entitled: “Object Detection using Convolutional Neural Networks”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Using neural networks for object recognition or classification is described in an article (downloaded July 2015) by Mehdi Ebady Manaa, Nawfal Turki Obies, and Dr. Tawfiq A. Al-Assadi (of Department of Computer Science, Babylon University), entitled: “Object Classification using neural networks with Gray-level Co-occurrence Matrices (GLCM)”, in a technical report No. IDSIA-01-11 January 2011 published by IDSIA/USI-SUPSI and authored by Dan C. Ciresan et al. entitled: “High-Performance Neural Networks for Visual Object Classification”, in an article by Yuhua Zheng et al. (downloaded July 2015) entitled: “Object Recognition using Neural Networks with Bottom-Up and top-Down Pathways”, and in an article (downloaded July 2015) by Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman (all of Visual Geometry Group, University of Oxford), entitled: “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Using neural networks for object recognition or classification is further described in U.S. Pat. No. 6,018,728 to Spence et al. entitled: “Method and Apparatus for Training a Neural Network to Learn Hierarchical Representations of Objects and to Detect and Classify Objects with Uncertain Training Data”, in U.S. Pat. No. 6,038,337 to Lawrence et al. entitled: “Method and Apparatus for Object Recognition”, in U.S. Pat. No. 8,345,984 to Ji et al. entitled: “3D Convolutional Neural Networks for Automatic Human Action Recognition”, and in U.S. Pat. No. 8,705,849 to Prokhorov entitled: “Method and System for Object Recognition Based on a Trainable Dynamic System”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Saliency. Salience (also called saliency) of an item—be it an object, a person, a pixel, etc.—is a state or a quality by which it stands out relative to its neighbors. Saliency detection is considered to be a key attentional mechanism that facilitates learning and survival by enabling organisms to focus their limited perceptual and cognitive resources on the most pertinent subset of the available sensory data. Saliency typically arises from contrasts between items and their neighborhood, such as a red dot surrounded by white dots, a flickering message indicator of an answering machine, or a loud noise in an otherwise quiet environment. Saliency detection is often studied in the context of the visual system, but similar mechanisms operate in other sensory systems. What is salient can be influenced by training: for example, for human subjects particular letters can become salient by training.
When attention deployment is driven by salient stimuli, it is considered to be bottom-up, memory-free, and reactive. Attention can also be guided by top-down, memory-dependent, or anticipatory mechanisms, such as when looking ahead of moving objects or sideways before crossing streets. Humans and other animals have difficulty paying attention to more than one item simultaneously, so they are faced with the challenge of continuously integrating and prioritizing different bottom-up and top-down influences.
Saliency map. A ‘Saliency Map’ is a topographically arranged map that represents the visual saliency of a corresponding visual scene. Saliency maps, as well as techniques for creating and using saliency and saliency maps, are described in an article by Tilke Judd, Frédo Durand, and Antonio Torralba (2012) entitled: “Supplemental Material for A Benchmark of Computational Models of Saliency to Predict Human Fixations”, in an ICVS article (pages 66-75, Springer, 2008) by R. Achanta, F. Estrada, P. Wils, and S. Susstrunk (of I&C EPFL) entitled: “Salient Region Detection and Segmentation”, in a CVPR article (pages 1597-1604, 2009) by R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk entitled: “Frequency-tuned Salient Region Detection”, in an IEEE article (TPAMI, 20(11):1254-1259, 1998) by L. Itti, C. Koch, and E. Niebur entitled: “A Model of Saliency based Visual Attention for Rapid Scene Analysis”, in a CVPR article (2010) by S. Goferman, L. Zelnik-Manor, and A. Tal (all of the Technion, Haifa, Israel) entitled: “Context-Aware Saliency Detection”, and in a CVPR (2011) article by M. M. Cheng, G. X. Zhang, N. J. Mitra, X. Huang, and S. M. Hu entitled: “Global Contrast based Salient Region Detection”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Techniques for generating saliency maps, and for using such maps for image analysis or manipulation are described in U.S. Patent Application Publication No. 2013/0156320 to Fredembach entitled: “Method, Apparatus and System for Determining a Saliency Map for an Input Image”, in U.S. Pat. No. 8,437,543 to Chamaret et al. entitled: “Method and Device of Determining a Saliency Map for an Image”, in U.S. Pat. No. 8,649,606 to Zhao et al. entitled: “Method and Systems for Generating Saliency Models Through Linear and/or Nonlinear Integration”, in U.S. Pat. No. 8,660,351 to Tang entitled: “Auto-Cropping Using Salience Maps”, in U.S. Pat. No. 8,675,966 to Tang entitled: “System and Method for Saliency Map Generation”, in PCT International Publication No. WO 2008/043204 to GU et al. entitled: “Device and Method for Generating a Saliency Map of a Picture”, in European Patent Application No. EP 2034439 to Zhu et al. entitled: “Method for Establishing the Saliency Map of an Image”, and in European Patent Application No. EP 2731074 to Chevet entitled: “Method for Reframing an Image Based on a Saliency Map”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Scaling. Image scaling is the process of resizing a digital image. Scaling is a non-trivial process that involves a trade-off between efficiency, smoothness, and sharpness. In bitmap graphics, as the size of an image is reduced or enlarged, the pixels that form the image become increasingly visible, making the image appear “soft” if pixels are averaged, or jagged if not. With vector graphics, the trade-off may be in the processing power needed for re-rendering the image, which may be noticeable as slow re-rendering with still graphics, or as a slower frame rate and frame skipping in computer animation. Apart from fitting a smaller display area, image size is most commonly decreased (subsampled or downsampled) to produce thumbnails. Enlarging an image (upsampling or interpolating) is generally used, for example, to make smaller imagery fit a bigger screen in full-screen mode. Several methods exist for increasing the number of pixels that an image contains, which even out the appearance of the original pixels. Typically, scaling an image, such as enlarging or reducing it, involves mapping one or more pixels of the original image into one or more pixels of the target image. In many applications, image scaling must be executed in real time, requiring high processing power. Scaling or resizing of an image is typically expressed as a ratio (in percentage, for example) of the dimensions or the number of pixels of the resulting image relative to those of the original image. In general, sizing and resizing an image makes it more suitable for viewing, transmission, downloading, sharing, editing, and further processing.
Image interpolation techniques (a.k.a. image resizing, image resampling, digital zooming, image magnification or enhancement) use an image interpolation algorithm to convert an image from one resolution (dimension) to another resolution without losing the visual content of the picture. An image may be interpolated from a higher resolution to a lower resolution (traditionally referred to as image down-scaling or down-sampling), or from a lower resolution to a higher resolution (referred to as image up-scaling or up-sampling). Most of the image interpolation techniques in the literature interpolate pixels based on the characteristics of local features, such as edge information or nearest-neighbor criteria. Image interpolation techniques can be broadly divided into two categories: adaptive and non-adaptive. Adaptive interpolation algorithms rely on the intrinsic features or contents of the input image, so their computational logic mostly depends on those features and contents. Non-adaptive algorithms do not rely on the image features or contents, and the same computational logic is repeated for every pixel or group of local pixels irrespective of the image contents. A scaling may use, or be based on, the algorithms and techniques disclosed in the book entitled: “Handbook of Image & Video Processing”, edited by Al Bovik, published by Academic Press, ISBN: 0-12-119790-5, and in the book published by Wiley-Interscience, ISBN-13 978-0-471-71998-4 (2005) by Tinku Acharya and Ajoy K. Ray entitled: “Image Processing: Principles and Applications”, which are both incorporated in their entirety for all purposes as if fully set forth herein. Further, various scaling techniques are described in an ACM Ubiquity Vol. 8, 2007 article by Tinku Acharya and Ping-Sing Tsai entitled: “Computational Foundations of Image Interpolation Algorithms”, in an International Journal of Application or Innovation in Engineering & Management (IJAIEM) Vol. 2, Issue 5, May 2013 article by Sudhir Sharma and Robin Walia (of Maharishi Markandeshwar University, Mullana, India), entitled: “Zooming Digital Images using Modal Interpolation”, and in a Digital Light & Color (2001) publication by Jonathan Sachs entitled: “Image Resampling”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Non-Adaptive Algorithms. In non-adaptive image interpolation algorithms, certain computations are applied uniformly to the whole image for interpolation, regardless of its contents. Common non-adaptive image interpolation algorithms include nearest-neighbor replacement, bilinear interpolation, bicubic interpolation, and some widely used digital filtering based approaches.
Nearest Neighbor Replacement. The simplest interpolation method is to replace the interpolated point with the nearest neighboring pixel, providing the advantages of simplicity and low computation. However, the resulting pixelization or blocky effect makes the image quality unacceptable for the highest-quality imaging applications.
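A minimal sketch of nearest-neighbor resizing on a grayscale image stored as a list of rows. The floor-based mapping of output coordinates back to source coordinates is one common convention, assumed here; other implementations align pixel centers instead.

```python
from typing import List


def nearest_neighbor_resize(img: List[List[int]], new_h: int, new_w: int) -> List[List[int]]:
    """Resize by copying, for each output pixel, the source pixel it maps to."""
    h, w = len(img), len(img[0])
    # Output pixel (y, x) maps back to source row y*h//new_h and column
    # x*w//new_w; no arithmetic is done on the pixel values themselves.
    return [
        [img[(y * h) // new_h][(x * w) // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


small = [[1, 2],
         [3, 4]]
for row in nearest_neighbor_resize(small, 4, 4):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

Enlarging by 2 simply duplicates each pixel into a 2×2 block, which is exactly the blocky effect described above.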
Bilinear Interpolation. Bilinear interpolation can be considered a weighted average of four neighboring pixel values, and is an extension of linear interpolation for interpolating functions of two variables (e.g., x and y) on a rectilinear 2D grid. It is based on performing linear interpolation first in one direction, and then again in the other direction. Although each step is linear in both the sampled values and the position, the interpolation as a whole is not linear but rather quadratic in the sample location. When an image is scaled up, each pixel of the original image is moved to a new position based on the scale factor, and, particularly when the scale factor is non-integral, some destination pixels (i.e., holes) are not assigned appropriate pixel values. Those holes should be assigned appropriate RGB or grayscale values so that the output image has no non-valued pixels.
Bilinear interpolation can be used where a perfect image transformation with pixel matching is impossible, to calculate and assign appropriate intensity values to pixels. Unlike nearest-neighbor interpolation, which copies a single pixel, and bicubic interpolation, which uses a 4×4 neighborhood, bilinear interpolation uses only the 4 nearest pixel values, located in the diagonal directions from a given pixel, to find the appropriate color intensity values of that pixel. Bilinear interpolation considers the closest 2×2 neighborhood of known pixel values surrounding the unknown pixel's computed location, and takes a weighted average of these 4 pixels to arrive at its final interpolated value. The weight on each of the 4 pixel values is based on the computed pixel distance (in 2D space) from each of the known points.
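The 2×2 weighted average can be sketched as follows, assuming a grayscale image stored as a list of rows, clamping at the borders, and a corner-aligned mapping for resizing (all assumptions; libraries differ in how they map output pixels to source coordinates). The helper names are hypothetical.

```python
import math
from typing import List


def bilinear_sample(img: List[List[float]], y: float, x: float) -> float:
    """Interpolate img at fractional (row y, column x) from its 2x2 neighborhood."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp so samples at the border are defined
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Linear interpolation along x on the two rows, then along y between them.
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def bilinear_resize(img: List[List[float]], new_h: int, new_w: int) -> List[List[float]]:
    """Resize so that the corner pixels of the input and output coincide."""
    h, w = len(img), len(img[0])
    sy = (h - 1) / (new_h - 1) if new_h > 1 else 0.0
    sx = (w - 1) / (new_w - 1) if new_w > 1 else 0.0
    return [[bilinear_sample(img, i * sy, j * sx) for j in range(new_w)]
            for i in range(new_h)]


img = [[0, 10],
       [20, 30]]
print(bilinear_resize(img, 3, 3))
# [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

The new center pixel (15.0) is the equal-weight average of the four original pixels, since it lies at the same distance from each of them.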
Bicubic Interpolation. Bicubic interpolation is an extension of cubic interpolation for interpolating data points on a regular two-dimensional grid, and the surface it produces is smoother than the corresponding surfaces obtained by bilinear or nearest-neighbor interpolation. Bicubic interpolation can be accomplished using Lagrange polynomials, cubic splines, or the cubic convolution algorithm. In contrast to bilinear interpolation, which takes only 4 pixels (2×2) into account, bicubic interpolation considers 16 pixels (4×4). Images resampled with bicubic interpolation are smoother and have fewer interpolation artifacts. In its general form, bicubic interpolation uses the function values, the gradients (first derivatives) in both the x and y directions, and the cross derivative at each of the four corners of the unit square, resulting in 16 equations that determine the 16 coefficients of the interpolating polynomial.
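One way to realize bicubic interpolation is the cubic convolution algorithm mentioned above. This sketch assumes the Keys cubic convolution kernel with the commonly used parameter a = -0.5 and replicates border pixels; both choices are assumptions, not requirements of the text. With a = -0.5 the kernel passes through the original samples and reproduces a linear ramp exactly, which the example checks.

```python
import math
from typing import List


def keys_kernel(t: float, a: float = -0.5) -> float:
    """Cubic convolution kernel; a = -0.5 is a commonly used choice."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def bicubic_sample(img: List[List[float]], y: float, x: float, a: float = -0.5) -> float:
    """Interpolate img at (y, x) as a weighted sum over the surrounding 4x4 pixels."""
    h, w = len(img), len(img[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    total = 0.0
    for m in range(-1, 3):                  # rows y0-1 .. y0+2
        yy = min(max(y0 + m, 0), h - 1)     # replicate border rows
        wy = keys_kernel(y - (y0 + m), a)
        for n in range(-1, 3):              # columns x0-1 .. x0+2
            xx = min(max(x0 + n, 0), w - 1)  # replicate border columns
            total += img[yy][xx] * wy * keys_kernel(x - (x0 + n), a)
    return total


ramp = [[0, 10, 20, 30, 40] for _ in range(5)]
print(bicubic_sample(ramp, 2.0, 1.5))  # 15.0
```

Unlike the bilinear case, two of the four column weights here are negative (-0.0625 each), which is what lets cubic convolution sharpen edges slightly instead of only averaging.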
Spline interpolation. Spline interpolation is a form of interpolation where the interpolant is a special type of piecewise polynomial called a spline. Spline interpolation is often preferred over polynomial interpolation because the interpolation error can be made small even when using low-degree polynomials for the spline. Spline interpolation avoids the Runge phenomenon, in which oscillation can occur between points when interpolating with high-degree polynomials.
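A minimal sketch of one-dimensional natural cubic spline interpolation: piecewise cubics that pass through the data points, have continuous first and second derivatives, and have zero second derivative at both ends. The "natural" end condition is an assumption here (other end conditions exist), and the tridiagonal solve follows the standard textbook formulation.

```python
import bisect
from typing import Callable, List


def natural_cubic_spline(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Return S(x), the natural cubic spline through (xs[i], ys[i]); xs must increase."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the quadratic coefficients c[i] (c = S''/2),
    # with the natural end conditions c[0] = c[n] = 0.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):  # forward elimination
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def spline(x: float) -> float:
        # Pick the interval [xs[j], xs[j+1]] containing x (clamped to the ends).
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t**2 + d[j] * t**3

    return spline


s = natural_cubic_spline([0, 1, 2, 3], [0, 2, 4, 6])
print(s(1.5))  # 3.0 (linear data stays linear)
```

Each segment is only cubic no matter how many points are given, which is why the oscillation of a single high-degree polynomial does not build up.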
Filtering-based Techniques. Filtering-based methods (also known as re-sampling methods) transform discrete image pixels defined in one coordinate system to a new coordinate system of a different resolution. The re-sampling technique is frequently used to up-sample an image to enhance its resolution and appearance.
Lanczos interpolation. Lanczos resampling and Lanczos filtering are two applications of the same mathematical formula. It can be used as a low-pass filter, or to smoothly interpolate the value of a digital signal between its samples; in the latter case, each sample of the given signal maps to a translated and scaled copy of the Lanczos kernel, which is a sinc function windowed by the central lobe of a second, longer, sinc function. The sum of these translated and scaled kernels is then evaluated at the desired points. Lanczos resampling is typically used to increase the sampling rate of a digital signal, or to shift it by a fraction of the sampling interval. The theoretically optimal reconstruction filter for band-limited signals is the sinc filter, which has infinite support; the Lanczos filter is one of many practical (finitely supported) approximations of the sinc filter. Each interpolated value is the weighted sum of 2a consecutive input samples, so by varying the parameter a one may trade computation speed for improved frequency response. The parameter also allows one to choose between a smoother interpolation and the preservation of sharp transients in the data. For image processing, the trade-off is between the reduction of aliasing artifacts and the preservation of sharp edges. As with any such processing, no results are defined for the borders of the image, and increasing the length of the kernel increases the cropping of the image edges. The Lanczos filter and other filters are described in an article published in Graphics Gems I [Academic Press, pp. 147-165, ISBN 978-0-12-286165-9] authored by Ken Turkowski and Steve Gabriel (April 1990) entitled: “Filters for Common Resampling Tasks”, which is incorporated in its entirety for all purposes as if fully set forth herein.
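The following sketch evaluates the Lanczos kernel and interpolates a one-dimensional signal from its 2a nearest samples (an image is resampled by applying this along rows and then columns). Two choices are assumptions rather than part of the text: indices beyond the ends reuse the edge sample, and the weights are divided by their sum so that a constant signal stays constant.

```python
import math
from typing import Sequence


def lanczos_kernel(x: float, a: int = 3) -> float:
    """L(x) = sinc(x) * sinc(x / a) for |x| < a, else 0 (normalized sinc)."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_interpolate(samples: Sequence[float], x: float, a: int = 3) -> float:
    """Estimate the signal at fractional position x from the 2a nearest samples."""
    i0 = math.floor(x)
    num = den = 0.0
    for i in range(i0 - a + 1, i0 + a + 1):  # the 2a samples around x
        k = lanczos_kernel(x - i, a)
        # Indices past either end reuse the edge sample (assumed border rule).
        s = samples[min(max(i, 0), len(samples) - 1)]
        num += s * k
        den += k
    # Normalizing by the weight sum keeps flat regions flat; this is a common
    # practical choice, not part of the kernel definition.
    return num / den


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(round(lanczos_interpolate(ramp, 3.5), 6))  # 3.5
```

Larger values of a widen the window (more samples per output value), which is the speed versus frequency-response trade-off described above.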
Adaptive Algorithms. Adaptive image interpolation algorithms typically exploit the intrinsic image features such as hue or edge information.
Downscaling techniques are further described in U.S. Patent Application Publication No. 2008/0260291 to Alakarhu et al. entitled: “Image Downscaling by Binning”, in U.S. Patent Application Publication No. 2009/0016644 to Kalevo et al. entitled: “Method and Apparatus for Downscaling a Digital Matrix Image”, and in U.S. Pat. No. 6,205,245 to Yuan et al. entitled: “Method and Apparatus for Rapid Down-Scaling of Color Images Directly from Sensor Color Filter Array Space”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Photographic paper. Photographic paper is a paper coated with a light-sensitive chemical formula, used for making photographic prints: when the photographic paper is exposed to light, it captures a latent image that is then developed to form a visible image. The light-sensitive layer of the paper is called the emulsion, and while it is commonly based on silver salts, other alternatives are used as well. The print image is traditionally produced by interposing a photographic negative between the light source and the paper, either by direct contact with a large negative (forming a contact print) or by projecting the shadow of the negative onto the paper (producing an enlargement). The initial light exposure is carefully controlled to produce a gray-scale image on the paper with appropriate contrast and gradation.
Photographic papers typically fall into three categories: papers that were or are used for negative-positive processes (including all current black-and-white papers and chromogenic color papers); papers that were or are used for positive-positive processes, in which the “film” is the same as the final image (e.g., the Polaroid process); and papers that were or are used for positive-positive film-to-paper processes, in which a positive image is enlarged and copied onto a photographic paper. Typically, photographic papers consist of a light-sensitive emulsion of silver halide salts suspended in a colloidal material (usually gelatin), coated onto a paper, resin-coated paper, or polyester support. In black-and-white papers, the emulsion is normally sensitized to blue and green light but is insensitive to wavelengths longer than 600 nm, in order to facilitate handling under red or orange safelighting. In chromogenic color papers, the emulsion layers are sensitive to red, green, and blue light, respectively producing cyan, magenta, and yellow dye during processing.
Modern black-and-white papers are coated on a small range of bases: baryta-coated paper, resin-coated paper, or polyester, while most color photographic materials available today are coated on either RC (resin-coated) paper or solid polyester. The photographic emulsion used for color photographic materials consists of three color-forming emulsion layers (cyan, yellow, and magenta) along with other supporting layers. The color layers are sensitized to their corresponding colors. Although it is commonly believed that the layers in negative papers are shielded against light of wavelengths other than the one intended for each layer by color filters that dissolve during processing, this is not so. The color layers in negative papers are actually produced with speeds that increase from cyan (red-sensitive) to magenta (green-sensitive) to yellow (blue-sensitive), so that when filtered during printing, the blue light is “normalized” and there is no crosstalk. For example, the yellow (blue-sensitive) layer is nearly ISO 100, while the cyan (red-sensitive) layer is about ISO 25. After enough yellow filtration is added to make the print neutral, the blue sensitivity of the slow cyan layer is lost.
Photograph. A photograph or photo as used herein includes any black-and-white or color print of an image created by photographic printing, derived from light falling on a light-sensitive surface, usually photographic film or an electronic medium such as a CCD or a CMOS chip. Most photographs are created using a camera, which uses a lens to focus the scene's visible wavelengths of light into a reproduction that can be seen by a human. The photo may consist of any photographic material, such as photographic paper, film, plates, diazo print, or transparency. Photo sizes may be standardized and categorized as small and large photo prints, where small photo prints typically use sizes such as 4″×6″ or 5″×7″, while large photo prints commonly include sizes of 6″×9″, 8″×10″, 11″×17″, 8″×12″, and 20″×24″.
Photo deterioration. Two main types of deterioration are associated with photographic materials: chemical deterioration, which occurs when the chemicals in the photograph undergo reactions (either through contact with outside catalysts, or because the chemicals are inherently unstable) that damage the material, and physical or structural deterioration, which occurs without chemical reactions and is due to abrasion and tearing. In addition to aging, both types of deterioration are caused by environmental storage conditions (such as temperature and humidity), inappropriate storage enclosures and repair attempts, and human use and handling. Chemical damage can also be caused by improper chemical processing. Different types of photographic materials are particularly susceptible to different types and causes of deterioration. Hence, over time, the optical density, color balance, luster, and other qualities of a print will degrade. The rate at which deterioration occurs depends primarily on two main factors: the print itself (in particular, the colorants used to form the image and the medium on which the image resides) and the type of environment to which the print is exposed. Photo deterioration is described in a guide by Gawain Weaver (of Image Permanence Institute, Rochester Institute of Technology) published 2008 entitled: “A GUIDE to Fiber-Base Gelatin Silver Print Condition and Deterioration”, which is incorporated in its entirety for all purposes as if fully set forth herein.
Photo Album. A photo album is a binder or book structure having front and rear covers (commonly opaque and rigid), in which pages are bound along one edge by gluing, by sewing, or by metal posts or rings. The photographs stored in an album may be attached to pages that may have protective plastic cover sheets, or inserted into pocket pages, envelopes, or any other compartments into which the photos may be slipped. Other albums use heavy paper with an abrasive surface covered with clear plastic sheets, on which photos can be placed. Older-style albums are often simply books of heavy paper to which photos could be glued or attached with adhesive corners or pages. Each page may include a different number of photos, and different types (portrait, landscape, round, oval, diamond, square, etc.) of images. A landscape format describes photos arranged so that their lateral dimension is longer than their vertical dimension, and a portrait format refers to photos positioned so that their vertical dimension is longer than their lateral dimension.
An example of a photo album 41 is shown closed in a view 40 in FIG. 4, and open in a view 40a in FIG. 5. FIG. 5 shows one side of an exemplary page 42 that contains no photos, and thus shows only a background consisting of multiple flower drawings, such as a flower 43. The photo album 41 further encloses three photos, designated as a photo #1 44a, a photo #2 44b, and a photo #3 44c, stored on a page 42a.
Color balance. In photography and image processing, color balance may be defined as a global adjustment in the intensities of the colors (typically red, green, and blue primary colors). An important goal of this adjustment is to render specific colors—particularly neutral colors—correctly; hence, the general method is sometimes called gray balance, neutral balance, or white balance. Color balance changes the overall mixture of colors in an image and is used for color correction; generalized versions of color balance correction are used to get colors other than neutrals also to appear correct or pleasing.
Color balancing may be based on scaling all relative luminances in an image so that objects that are believed to be neutral appear so. Using an 8-bit RGB color space, each pixel is associated with (r, g, b) values, each value between 0 and 255. Assuming a surface with r=240 is believed to be a white object, and 255 is the count that corresponds to white, scaling may be achieved by multiplying all red values by 255/240. Doing the same for green and blue would result, at least in theory, in a color-balanced image. Such simple scaling may be formalized as a transformation by a 3×3 diagonal matrix:
$$
\begin{pmatrix} r \\ g \\ b \end{pmatrix} =
\begin{pmatrix} 255/r'_w & 0 & 0 \\ 0 & 255/g'_w & 0 \\ 0 & 0 & 255/b'_w \end{pmatrix}
\begin{pmatrix} r' \\ g' \\ b' \end{pmatrix}
$$
where r, g, and b are the color-balanced red, green, and blue components of a pixel in the image; r′, g′, and b′ are the red, green, and blue components of the pixel before color balancing; and r′_w, g′_w, and b′_w are the red, green, and blue components, before color balancing, of a pixel that is believed to be a white surface in the image.
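A minimal sketch of the diagonal-matrix white balance above, applied to a list of 8-bit (r, g, b) tuples. The reference white pixel is assumed to be supplied by the caller and to have nonzero channels; rounding half up and clipping to 0..255 are also assumptions. The function name is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def white_patch_balance(pixels: List[RGB], white: RGB) -> List[RGB]:
    """Scale each channel by 255 / (that channel of the reference white pixel)."""
    rw, gw, bw = white  # assumed nonzero
    # Diagonal entries of the 3x3 color-balance matrix.
    gains = (255 / rw, 255 / gw, 255 / bw)
    balanced = []
    for pixel in pixels:
        # Multiply channel-wise, round half up, and clip to the 8-bit range.
        balanced.append(tuple(min(255, int(c * k + 0.5)) for c, k in zip(pixel, gains)))
    return balanced


# A surface believed to be white was recorded as (240, 250, 200).
print(white_patch_balance([(240, 250, 200), (120, 0, 0)], (240, 250, 200)))
# [(255, 255, 255), (128, 0, 0)]
```

Because the matrix is diagonal, each channel is corrected independently; colors brighter than the reference white saturate at 255.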
Color balancing techniques are further described in a paper by Jonathan Sachs (1999) entitled: “Color Balancing Techniques”, in an article by Francesca Gasparini and Raimondo Schettini (of Universita degli Studi di Milano-Bicocca, Milano, Italy) (downloaded July 2015) entitled: “Color Balancing of Digital Photos Using Simple Image Statistics”, and in a book by Alexis Van Hurkman published 2014 by Peachpit Press [ISBN-13: 978-0-321-92966-2], 2nd Edition, entitled: “Color Correction Handbook: Professional Techniques for Video and Cinema, Second Edition”, which are all incorporated in their entirety for all purposes as if fully set forth herein. Various color balance correction algorithms are further described in U.S. Pat. No. 7,557,969 to Sone entitled: “Color Balance Correction Chart, Color Balance Correction Method, and Image Forming Apparatus”, in U.S. Pat. No. 7,664,319 to Toyoda et al. entitled: “Method and Device for Color Balance Correction, and Computer Product”, and in U.S. Pat. No. 7,702,148 to Hayaishi entitled: “Color Balance Correction Based on Color Cast Attribute”, which are all incorporated in their entirety for all purposes as if fully set forth herein.
Contrast enhancement. Contrast is a difference in appearance of two or more parts of a field seen simultaneously or successively, such as the difference in luminance or color that makes an object (or its representation in an image or display) distinguishable. In images, contrast is typically determined by the difference in the color and brightness of the object and other objects within the same field of view, and the maximum contrast of an image is referred to as the contrast ratio or dynamic range. As a photo deteriorates, its contrast degrades over time, affecting the quality of the photo image.
Contrast enhancement is frequently referred to as one of the most important issues in image processing. Contrast is created by the difference in luminance reflected from two adjacent surfaces. In visual perception, contrast is determined by the difference in the color and brightness of an object compared with other objects. The human visual system is more sensitive to contrast than to absolute luminance; therefore, we perceive the world similarly regardless of considerable changes in illumination conditions. If the intensities of an image are highly concentrated in a narrow range, information may be lost in the areas where the values are excessively and uniformly concentrated. Contrast stretching may be used to increase the dynamic range of gray levels in the image being processed. Linear and nonlinear digital techniques are two widely practiced methods of increasing the contrast of an image.
Linear contrast enhancement. Linear contrast enhancement, also referred to as linear contrast stretching, linearly expands the original digital values of the image data into a new distribution. Expanding the original input values of the image allows the full range of sensitivity of the display device to be utilized. Linear contrast enhancement also makes subtle variations within the data more obvious. These types of enhancements are best applied to remotely sensed images with Gaussian or near-Gaussian histograms, meaning that all the brightness values fall within a narrow range of the histogram and only one mode is apparent. Three common methods of linear contrast enhancement are the min-max linear contrast stretch, the percentage linear contrast stretch, and the piecewise linear contrast stretch.
When using the minimum-maximum linear contrast stretch, the original minimum and maximum values of the data are assigned to a newly specified set of values that utilize the full range of available brightness values. Consider an image with a minimum brightness value of 45 and a maximum value of 205: when such an image is viewed without enhancement, the display values of 0 to 44 and 206 to 255 are not used. Important spectral differences can be detected by stretching the minimum value of 45 to 0 and the maximum value of 205 to 255. This method may be expressed as g(x,y)=((f(x,y)−min)/(max−min))*N, where f(x,y) represents the input image, g(x,y) represents the output image, “min” and “max” are the minimum and maximum intensity values in the current image, and N is the highest intensity value that can be assigned to a pixel. For example, in gray-level images the lowest possible intensity is normally 0 and the highest is 255, thus N=255.
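The min-max stretch formula can be sketched on the 45 to 205 example from the text. Rounding half up to the nearest integer level, and returning a flat image unchanged, are assumptions; the function name is hypothetical.

```python
from typing import List


def minmax_stretch(gray: List[List[int]], n: int = 255) -> List[List[int]]:
    """g = (f - min) / (max - min) * N, rounded half up to an integer level."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[int((v - lo) * n / (hi - lo) + 0.5) for v in row] for row in gray]


# The data occupy only 45..205, as in the example above.
print(minmax_stretch([[45, 85], [125, 205]]))
# [[0, 64], [128, 255]]
```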
The percentage linear contrast stretch is similar to the minimum-maximum linear contrast stretch, except that this method uses specified minimum and maximum values that lie a certain percentage of pixels away from the mean of the histogram. A standard deviation from the mean is often used to push the tails of the histogram beyond the original minimum and maximum values.
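One way to realize a percentage stretch is sketched below. As an assumption, "percentage" is interpreted as quantiles of the sorted pixel values (for example, ignoring the darkest and brightest 5% of pixels) rather than a number of standard deviations from the mean; pixels in the clipped tails saturate at 0 or N. The function name is hypothetical.

```python
from typing import List


def percentage_stretch(gray: List[List[int]], percent: float = 0.05, n: int = 255) -> List[List[int]]:
    """Linear stretch between the `percent` and `1 - percent` quantiles, with clipping."""
    ordered = sorted(v for row in gray for v in row)
    last = len(ordered) - 1
    lo = ordered[int(percent * last)]        # new "minimum"
    hi = ordered[int((1 - percent) * last)]  # new "maximum"
    if hi <= lo:
        return [row[:] for row in gray]

    def stretch(v: int) -> int:
        v = min(max(v, lo), hi)  # values in the tails saturate at 0 or N
        return int((v - lo) * n / (hi - lo) + 0.5)

    return [[stretch(v) for v in row] for row in gray]


# Two outliers (0 and 255) would defeat a plain min-max stretch.
img = [[0, 100, 110, 120, 130],
       [140, 150, 160, 170, 255]]
print(percentage_stretch(img, percent=0.15))
# [[0, 0, 43, 85, 128], [170, 213, 255, 255, 255]]
```

Without clipping, the outliers at 0 and 255 would leave the min-max stretch with nothing to do; ignoring them spreads the bulk of the data (100 to 160) over the full range.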
When the histogram of an image is bimodal or trimodal, an analyst may stretch certain values of the histogram for increased enhancement in selected areas. This method of contrast enhancement is called a piecewise linear contrast stretch. A piecewise linear contrast enhancement involves the identification of a number of linear enhancement steps that expand the brightness ranges in the modes of the histogram. For three segments, this can be expressed by y(x)=ax for 0≤x≤x1, y(x)=b(x−x1)+y(x1) for x1<x≤x2, and y(x)=c(x−x2)+y(x2) for x2<x≤B, where x is the input intensity of a pixel, y(x) is the stretched output intensity, a, b, and c are appropriate constants that are the slopes in the respective regions, and B is the maximum intensity value.
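A sketch of the three-segment stretch above. Here the slopes a, b, and c are derived from two hypothetical breakpoints (x1, y1) and (x2, y2), with the last segment ending at (B, B); specifying breakpoints instead of slopes is a convenience assumed for this example, not part of the original formulation.

```python
from typing import List


def piecewise_linear(x: float, x1: float, y1: float, x2: float, y2: float, b_max: float = 255) -> float:
    """Three-segment map through (0, 0), (x1, y1), (x2, y2) and (B, B)."""
    a = y1 / x1                    # slope on [0, x1]
    b = (y2 - y1) / (x2 - x1)      # slope on (x1, x2]
    c = (b_max - y2) / (b_max - x2)  # slope on (x2, B]
    if x <= x1:
        return a * x
    if x <= x2:
        return b * (x - x1) + y1
    return c * (x - x2) + y2


def piecewise_stretch(gray: List[List[int]], x1, y1, x2, y2, b_max=255) -> List[List[int]]:
    """Apply the map to every pixel, rounding half up to an integer level."""
    return [[int(piecewise_linear(v, x1, y1, x2, y2, b_max) + 0.5) for v in row] for row in gray]


# Expand the mid-tones 64..192 (slope 1.5) at the expense of the extremes.
print(piecewise_stretch([[0, 32], [128, 255]], 64, 32, 192, 224))
# [[0, 16], [128, 255]]
```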
Nonlinear contrast enhancement. Nonlinear contrast enhancement typically involves histogram equalization performed by an algorithm. The nonlinear contrast stretch method has one major disadvantage: a value in the input image can map to several values in the output image (as in locally adaptive methods), so that objects in the original scene lose their correct relative brightness values. Four common methods of nonlinear contrast enhancement are histogram equalization, adaptive histogram equalization, unsharp mask, and homomorphic filter.
Histogram equalization is a common form of nonlinear contrast enhancement. It equalizes the image's histogram by redistributing all pixel values of the image so that there is approximately an equal number of pixels in each of the user-specified output gray-scale classes (e.g., 32, 64, or 256 classes). Contrast is increased at the most populated range of brightness values of the histogram (the “peaks”), and it is automatically reduced in the very light or dark parts of the image associated with the tails of a normally distributed histogram. Histogram equalization can also separate pixels into distinct groups if there are few output values over a wide range. Histogram equalization is most effective when the original image has poor contrast to start with; otherwise, it may degrade the image quality.
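A minimal sketch of global histogram equalization for an 8-bit grayscale image, using the cumulative histogram (CDF) as the mapping. The normalization that sends the darkest level present to 0 and the brightest to 255 is one common variant, assumed here; the function name is hypothetical.

```python
from typing import List


def equalize_histogram(gray: List[List[int]], levels: int = 256) -> List[List[int]]:
    """Map each level through the normalized cumulative histogram (CDF)."""
    flat = [v for row in gray for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    total = len(flat)
    if total == cdf_min:  # a single intensity: nothing to redistribute
        return [row[:] for row in gray]
    # Lookup table: darkest present level -> 0, brightest -> levels - 1.
    lut = [max(0, int((c - cdf_min) * (levels - 1) / (total - cdf_min) + 0.5)) for c in cdf]
    return [[lut[v] for v in row] for row in gray]


print(equalize_histogram([[50, 50], [100, 150]]))
# [[0, 0], [128, 255]]
```

The three input levels span only 50 to 150; after equalization they are spread over the full 0 to 255 range in proportion to their cumulative pixel counts.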
In adaptive histogram equalization, the image is divided into several rectangular domains, an equalizing histogram is computed for each, and the levels are modified so that they match across boundaries, depending on the nature of the nonuniformity of the image. Adaptive histogram equalization acts as a local operation: it uses the histogram equalization mapping function computed over a local window of a certain size to determine each enhanced density value. Therefore, regions occupying different gray-scale ranges can be enhanced simultaneously. A histogram modification may be applied to each pixel to improve local contrast based on the histogram of the pixels neighboring that pixel, typically resulting in strong local contrast enhancement. According to this method, the given image is partitioned into blocks of suitable size and the histogram of each sub-block is equalized; to eliminate the artificial boundaries created by the process, the intensities are interpolated across the block regions using bicubic interpolating functions.
A homomorphic filter is a filter that controls both high-frequency and low-frequency components. The homomorphic filtering technique is based on a multiplicative model and aims at handling images with a large intensity (dynamic) range. When images are acquired by optical means, the image of the object is the product of the illuminating light source and the reflectance of the object, as described by f(x,y)=I(x,y)ρ(x,y), where I is the intensity of the illuminating light source, f is the image, and 0≤ρ≤1 is the reflectance of the object. To enhance an image with poor contrast, this model can be used to selectively filter out the light-source component while boosting the reflectance component. To separate the two components, they must be additive; therefore, the image is transformed into the log domain, where the multiplicative components become additive: ln(f)=ln(I)+ln(ρ). Since the illumination typically varies slowly across the scene while the reflectance changes abruptly at object boundaries and details, ln(I) is mainly low-frequency and ln(ρ) is mainly high-frequency, so the log image z=ln(f), having additive components, can be selectively filtered by a linear filter. To enhance an image, the homomorphic filter must have a higher response in the high-frequency region than in the low-frequency region, so that the details, which fall in the high-frequency region, are accentuated while the illumination component is lowered; the filtered result is then transformed back by exponentiation.
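A minimal sketch of the homomorphic idea in pure Python. As a simplifying assumption, the low-frequency part of the log image (taken as ln I) is estimated with a spatial box blur rather than a frequency-domain filter, and ln(1 + f) is used to avoid ln(0). The gains `gamma_low` and `gamma_high` and the function names are illustrative. The output is left unscaled, so it may need a contrast stretch before display.

```python
import math
from typing import List

Image = List[List[float]]


def box_blur(img: Image, radius: int) -> Image:
    """Mean over a (2*radius+1)^2 window, shrunk at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def homomorphic_filter(gray: Image, radius: int = 2,
                       gamma_low: float = 0.5, gamma_high: float = 1.5) -> Image:
    """Attenuate slowly varying illumination and boost reflectance detail in the log domain."""
    z = [[math.log1p(v) for v in row] for row in gray]  # ln(1 + f) avoids ln(0)
    low = box_blur(z, radius)  # low-pass part, taken as the illumination ln(I)
    out = []
    for z_row, l_row in zip(z, low):
        # gamma_low < 1 compresses illumination; gamma_high > 1 emphasizes ln(rho).
        out.append([math.expm1(gamma_low * l + gamma_high * (zz - l))
                    for zz, l in zip(z_row, l_row)])
    return out


# A small bright detail on a uniformly lit background.
spot = [[100] * 5 for _ in range(5)]
spot[2][2] = 120
result = homomorphic_filter(spot, radius=1)
print(result[2][2] > result[0][0])  # True
```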
The unsharp mask method is a technique for increasing the apparent sharpness (edge contrast) of an image. Unsharp masking can be expressed by y(m,n)=f(m,n)+a*g(m,n), where f is the input image, y is the sharpened image, g is a high-pass (detail) image, such as a gradient image or the difference between f and a blurred copy of f, and a is a contrast constant greater than zero.
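A minimal sketch of unsharp masking, assuming the detail image g is taken as the input minus a box-blurred copy (one common choice of high-pass image) and that results are clipped to 0..255 and rounded half up. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Mean over a (2*radius+1)^2 window, shrunk at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def unsharp_mask(gray: Image, amount: float = 1.0, radius: int = 1) -> List[List[int]]:
    """y = f + a * g, with g = f - blur(f), clipped to the 8-bit range."""
    blurred = box_blur(gray, radius)
    out = []
    for f_row, b_row in zip(gray, blurred):
        row = []
        for f, bl in zip(f_row, b_row):
            y = f + amount * (f - bl)  # add back the high-pass detail
            row.append(int(min(255.0, max(0.0, y)) + 0.5))
        out.append(row)
    return out


edge = [[100, 100, 100, 200, 200, 200]]
print(unsharp_mask(edge))
# [[100, 100, 67, 233, 200, 200]]
```

Flat regions are unchanged, while the two pixels next to the step overshoot in opposite directions (67 and 233), which is what makes the edge look sharper.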
Various contrast enhancement techniques may be used to improve the contrast of a digital image, including linear and non-linear stretching, histogram equalization or specification, and adaptive histogram modification, such as the contrast enhancement techniques that are described in a presentation by Yao Wang (of Polytechnic University, Brooklyn, N.Y.) (downloaded July 2015) entitled: “EL5123 - Image Processing - Contrast Enhancement”, in an article by S. Gayathri, N. Mohanapriya, and Dr. B. Kalaavathi (all of Tiruchengode, Namakkal, India) published in International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 11, November 2013 [ISSN: 2319-5940], entitled: “Survey on Contrast Enhancement Techniques”, in an article by Manpreet Kaur, Jasdeep Kaur, and Jappreet Kaur (of Guru Nanak Engineering College, Ludhiana, India), published in International Journal of Advanced Computer Science and Applications (IJACSA) Vol. 2, No. 7, 2011, entitled: “Survey of Contrast Enhancement Techniques based on Histogram Equalization”, in an article by Sandeep Singh and Sandeep Sharma (of GNDU, Amritsar) published in International Journal of Computer Science (IIJCS) Volume 2, Issue 5, May 2014 [ISSN 2321-5992] entitled: “A Survey of Image Enhancement Techniques”, and in an article by Mr. Salem Saleh Al-amri et al. published in International Journal of Computer Science and Network Security (IJCSNS) Vol. 10 No. 2, February 2010 entitled: “Linear or Non-Linear Contrast Enhancement Image”, which are all incorporated in their entirety for all purposes as if fully set forth herein. Various image contrast enhancement techniques are further described in U.S. Pat. No. 6,463,173 to Tretter entitled: “System and Method for Histogram-Based Image Contrast Enhancement”, and in U.S. Pat. No. 8,228,560 to Hooper entitled: “Image Contrast Enhancement”, which are both incorporated in their entirety for all purposes as if fully set forth herein.
Line segment. A line segment herein refers to a part of a straight line that is bounded by two distinct end-points and contains every point on the line between the end-points. Examples of line segments include the sides of a triangle or a square. More generally, when both of the segment end-points are vertices of a polygon or polyhedron, the line segment is either an edge or side (of that polygon or polyhedron) if they are adjacent vertices, or otherwise a diagonal. Photos in a photo album are commonly in the form of a rectangle, which is any quadrilateral with four right angles.
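The definition can be made concrete with a small Python sketch: a point lies on a segment when it is collinear with the two end-points and falls between them, and a segment joining two polygon vertices is an edge when the vertices are adjacent and a diagonal otherwise. The vertex-index convention (vertices numbered 0 to n−1 in order around the polygon), the tolerance `eps`, and the function names are assumptions of this sketch.

```python
def on_segment(p, a, b, eps=1e-9):
    """True if point p lies on the closed segment from a to b (a != b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    # Collinearity: the cross product of (b - a) and (p - a) must vanish.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > eps:
        return False
    # Betweenness: the dot product of (p - a) and (b - a) must lie in [0, |b - a|^2].
    dot = (px - ax) * (bx - ax) + (py - ay) * (by - ay)
    return -eps <= dot <= (bx - ax) ** 2 + (by - ay) ** 2 + eps


def classify_segment(num_vertices, i, j):
    """Classify the segment joining vertices i and j of a polygon."""
    if i == j:
        raise ValueError("a segment needs two distinct end-points")
    # Vertices are adjacent when their indices differ by one, modulo n.
    adjacent = (i - j) % num_vertices in (1, num_vertices - 1)
    return "edge" if adjacent else "diagonal"


print(classify_segment(4, 0, 2))  # opposite corners of a rectangle
```

For a rectangular photo with corners numbered 0 to 3, segments 0–1, 1–2, 2–3, and 3–0 are its sides, while 0–2 and 1–3 are its diagonals; in a triangle every pair of vertices is adjacent, so it has no diagonals.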
Before the digital era, consumers took pictures that were stored as hard-copy prints. These photos (or prints) were taken on various personal and professional occasions, often had great sentimental value to the taker and the people associated with them, and were typically kept for later viewing. Because taking photographs was relatively inexpensive, people often took many photographs over the years. While some attempts have been made to organize the images and pictures taken by consumers, the sheer number of pictures often means that the photographic prints are stored in random fashion in various containers such as photo albums and shoe boxes. Thus, over the course of many years, people often accumulate hundreds of pictures in boxes, with the images in no particular order or organization. While there are various solutions for arranging and storing digitally captured images (such as images captured by digital cameras), there is a need for an easy way for consumers to sort and organize their photographs, rather than going through piles of stored pictures or photo albums. Further, there is a need to harmonize and align the methods of arranging and handling digital images with those used for hard-copy photos.
Various providers offer a service of scanning photos and photo albums. However, these services require the customer to physically send the photos to the service provider for scanning, which is inconvenient, costly, and time-consuming. Such services are provided, for example, by ScanCafe Inc. (Headquartered in Hayward, Calif., U.S.A.) offering services as described in the web page www.scancafe.com/services/photo-scanning (preceded by http://) downloaded July 2015, which is incorporated in its entirety for all purposes as if fully set forth herein, or by EverPresent (Headquartered in Newton, Mass., U.S.A.) offering services as described in the web page everpresentonline.com/services/photo-scanning-to-digital (preceded by http://) downloaded July 2015, which is incorporated in its entirety for all purposes as if fully set forth herein.
Various services associated with digital images are known, such as digital archive and Internet storage services. Such services are provided, for example, by FOREVER.com (Headquartered in Pittsburgh, Pa., U.S.A.) offering services as described in the web page www.forever.com/features (preceded by https://) downloaded July 2015, which is incorporated in its entirety for all purposes as if fully set forth herein, or by iMemories Inc. (Headquartered in Scottsdale, Ariz., U.S.A.), offering services as described in the web page www.imemories.com/features (preceded by http://) downloaded July 2015, which is incorporated in its entirety for all purposes as if fully set forth herein.
The availability of digitized photos enables users to store, organize, manage, edit, enhance, and share digital images locally, or over the Internet using a web browser or other software applications. A user may also share photos, post photos online, and create personalized photo products or projects. Creating personalized image products, however, can take a considerable amount of time and effort. Additionally, it is challenging to personalize image products using mobile devices, because these devices often have smaller displays, lower communication bandwidth, and possibly lower computing power than desktop computers. Users of mobile devices also tend to have shorter attention spans than users of desktop or laptop computers. Further, customers are often interested in designing and personalizing their products. The term “personalized” refers to information that is specific to the recipient, the user, the gift product, and the occasion, which may include personalized content, personalized text messages, personalized images, and personalized designs that can be incorporated in the image products. The personalization content may be provided by the user or selected by the user from a library of content provided by the service provider. Examples of image-based products include image prints, photo books, photo calendars, photo greeting cards, holiday cards, photo stationery, photo mugs, and photo T-shirts, which incorporate image content provided by the user or the image service provider.
A computer-implemented method for creating an image collage is described in U.S. Patent Application Publication No. 2014/0307980 to Hilt entitled: “Adaptive and fast Image Collage Creation”, which is incorporated in its entirety for all purposes as if fully set forth herein. The method includes automatically selecting, from a group of images, a first image that best fits an image-collage template based on predetermined criteria; placing and fitting the first image in the image-collage template; automatically selecting from the group one or more additional images that best fit the image-collage template, given the one or more already-placed images (which include the first image); and placing and fitting the one or more additional images in the image-collage template by the computer system. An image collage is formed after all the images in the group are placed in the image-collage template. A computer software product and a method of organizing and searching images, where the digital images may be obtained by digitally scanning a plurality of hard-copy prints, and where the digital images are analyzed in accordance with predetermined criteria, are described in U.S. Pat. No. 7,260,587 to Testa et al., entitled: “Method for Organizing Digital Images”, which is incorporated in its entirety for all purposes as if fully set forth herein.
A digital image manipulation system for automatically cropping acquired digital images is described in U.S. Pat. No. 8,406,515 to Cheatle entitled: “Method for Automatically Cropping Digital Images”, which is incorporated in its entirety for all purposes as if fully set forth herein. The system includes a memory device configured to store at least one acquired digital image; a crop analysis segmentation subsystem configured to divide at least one image into a set of similarly colored regions; a classification subsystem configured to classify each region into one of a set of possible classes, which include at least subject, background, and distraction; an optimization search configured to search possible crop boundaries; and a selection module configured to automatically select the highest-rated crop boundary determined from the search, based on an optimization criterion that is derived from the results of the crop analysis segmentation and classification subsystems.
An image processing apparatus and method, an image capturing apparatus, and a program that make it possible to crop an image with an optimal composition, even for a subject other than a person, are described in U.S. Patent Application Publication No. 2010/0290705 to Nakamura entitled: “Image Processing Apparatus and Method, Image Capturing Apparatus, and Program”, which is incorporated in its entirety for all purposes as if fully set forth herein. In the disclosed apparatus, a composition pattern setting section sets a composition pattern corresponding to an input image, on the basis of the number of salient regions to which attention is directed in the input image and the scene of the input image. On the basis of the composition pattern set by the composition pattern setting section, a composition analyzing section determines the crop region in the input image that is optimal for an image to be cropped from the input image in that composition pattern.
A method and system for cropping an image is described in U.S. Pat. No. 7,529,390 to Zhang et al. entitled: “Automatically cropping an image”, which is incorporated in its entirety for all purposes as if fully set forth herein. The cropping system automatically crops an image by selecting the image template whose condition is best satisfied by the image, and then selecting the cropping of the image that best attains the goal of the selected image template; it may use a metric or objective function to rate how well a cropping attains that goal. The cropping system may apply various optimization algorithms to identify the best cropping as indicated by the metric, and can then automatically crop the image based on the identified cropping.
An image processing apparatus that determines crop positions for an image including a plurality of objects in a preferred manner is described in U.S. Patent Application Publication No. 2014/0176612 to Tamura et al. entitled: “Image Processing Apparatus, Image Capturing Apparatus, Image Processing Method, and Storage Medium”, which is incorporated in its entirety for all purposes as if fully set forth herein. The image processing apparatus specifies object regions from the image and sets a plurality of crop region candidates for each of the specified object regions. The image processing apparatus selects a predetermined number of crop regions from among the plurality of crop region candidates based on evaluation values obtained for the plurality of crop region candidates and on similarities among the plurality of crop region candidates.
Methods and systems for cropping images of book pages are disclosed in U.S. Pat. No. 7,945,116 to Curtis entitled: “Computer-Assisted Image Cropping for Book Scans”, which is incorporated in its entirety for all purposes as if fully set forth herein. The method may include identifying reference images and receiving cropping rectangles for the reference images. The cropping rectangles associated with the reference images may then be used to generate cropping rectangles for images of the book pages between the reference images, for example by linearly interpolating the reference cropping rectangles according to the number of pages between the reference images. The method may also display one or more images of book pages with the associated one or more cropping rectangles superimposed thereon. A user may then have the opportunity to adjust the position and/or size of the cropping rectangles.
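The interpolation step can be sketched as follows, assuming each cropping rectangle is a (left, top, right, bottom) tuple in pixels and that two reference pages bracket the pages to be cropped. The function name and the rectangle representation are hypothetical and are not taken from the cited patent; interpolated coordinates are rounded to whole pixels.

```python
def interpolate_crops(ref_a, ref_b, page_a, page_b):
    """Linearly interpolate crop rectangles for the pages strictly between
    two reference pages.

    ref_a, ref_b: (left, top, right, bottom) rectangles of the reference pages.
    page_a, page_b: page numbers of the references, with page_a < page_b.
    Returns a dict mapping each intermediate page number to its rectangle.
    """
    if page_b <= page_a:
        raise ValueError("page_b must come after page_a")
    span = page_b - page_a
    crops = {}
    for page in range(page_a + 1, page_b):
        t = (page - page_a) / span  # fraction of the way from ref_a to ref_b
        crops[page] = tuple(round(a + t * (b - a)) for a, b in zip(ref_a, ref_b))
    return crops


print(interpolate_crops((0, 0, 100, 200), (40, 20, 140, 240), 10, 14))
```

Interpolating each coordinate independently lets the rectangle both drift and grow or shrink gradually between the references, which suits a sequence of scans in which the book shifts slowly on the scanner; a user can then correct any page where the guess is off.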
A method for auto-cropping is described in U.S. Pat. No. 8,660,351 to Tang entitled: “Auto-Cropping Images Using Saliency Maps”, which is incorporated in its entirety for all purposes as if fully set forth herein. The method includes creating a saliency map corresponding to a digital image, the saliency map indicating the relevance of pixels within the digital image by saliency values within a number range whose lower end is less than zero and whose upper end is greater than zero. The method further includes analyzing the saliency map to find a potential cropping rectangle, the potential cropping rectangle having the maximum sum of saliency values within its borders.
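Finding the rectangle with the maximum sum of saliency values is the maximum-sum subrectangle problem. Because the saliency values may be negative, the best rectangle does not simply grow to cover the whole image: including a region of negative (irrelevant) pixels lowers the sum. A minimal sketch, assuming the saliency map is a list of rows of numbers, uses the standard O(rows²·cols) extension of Kadane's algorithm; this is a textbook method and not necessarily the procedure used in the cited patent, and the function name is hypothetical.

```python
def max_sum_rectangle(saliency):
    """Return (best_sum, (top, left, bottom, right)) for the rectangle with
    the largest sum of saliency values; bounds are inclusive indices."""
    rows, cols = len(saliency), len(saliency[0])
    best_sum, best_rect = float("-inf"), None
    for top in range(rows):
        col_sums = [0] * cols  # per-column sums over rows top..bottom
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += saliency[bottom][c]
            # 1-D Kadane's algorithm over the column sums.
            run_sum, run_left = 0, 0
            for c in range(cols):
                if run_sum <= 0:
                    # A non-positive prefix can only hurt: restart at column c.
                    run_sum, run_left = col_sums[c], c
                else:
                    run_sum += col_sums[c]
                if run_sum > best_sum:
                    best_sum, best_rect = run_sum, (top, run_left, bottom, c)
    return best_sum, best_rect


saliency = [
    [-1, -1, -1, -1],
    [-1, 2, 3, -1],
    [-1, 4, 5, -1],
    [-1, -1, -1, -1],
]
print(max_sum_rectangle(saliency))
```

In the example, the salient 2×2 center is returned and the negative border is excluded; if every saliency value were positive, the same search would return the whole image.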
A method and computer program/system for cropping a digital image is described in U.S. Pat. No. 6,654,506 to Luo et al. entitled: “A Method for Automatically Creating Cropped and Zoomed Versions of Photographic Images”, which is incorporated in its entirety for all purposes as if fully set forth herein. The method includes inputting a belief map of a photographic image (where a belief value at a location in the belief map indicates the importance of a photographic subject at the same location in the photographic image), selecting a zoom factor and a crop window, clustering regions of the belief map to identify background portions, secondary portions, and main portions, positioning the crop window such that it is centered around the main portion having the highest belief value, moving the crop window such that it is included completely within the image, moving the crop window such that the sum of belief values within it is at a maximum, and cropping the image according to the crop window.
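The window-positioning steps can be illustrated with a simplified Python sketch. It assumes the belief map is a list of rows of numbers and that the zoom factor has already been turned into a fixed window size h×w; it also omits the clustering step and centers the window on the single highest-belief pixel. The final "move to maximize" step is approximated here by hill-climbing one pixel at a time, using an integral image (summed-area table) so each window sum costs O(1). All function names are hypothetical, and this is not the cited patent's exact procedure.

```python
def integral_image(belief):
    """ii[r][c] holds the sum of belief[0..r-1][0..c-1]."""
    rows, cols = len(belief), len(belief[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = belief[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def window_sum(ii, top, left, h, w):
    """Sum of the h x w window whose top-left corner is (top, left)."""
    return ii[top + h][left + w] - ii[top][left + w] - ii[top + h][left] + ii[top][left]


def place_crop_window(belief, h, w):
    """Return (top, left, belief_sum) for an h x w crop window."""
    rows, cols = len(belief), len(belief[0])
    if h > rows or w > cols:
        raise ValueError("crop window larger than image")
    ii = integral_image(belief)
    # 1. Center the window on the highest-belief pixel (clustering omitted).
    peak_r, peak_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: belief[rc[0]][rc[1]],
    )
    top, left = peak_r - h // 2, peak_c - w // 2
    # 2. Move the window so that it lies completely within the image.
    top = min(max(top, 0), rows - h)
    left = min(max(left, 0), cols - w)
    # 3. Shift one pixel at a time while the belief sum strictly increases.
    best = window_sum(ii, top, left, h, w)
    improved = True
    while improved:
        improved = False
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = top + dr, left + dc
            if 0 <= r <= rows - h and 0 <= c <= cols - w:
                s = window_sum(ii, r, c, h, w)
                if s > best:
                    best, top, left, improved = s, r, c, True
    return top, left, best


belief = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 4, 0],
    [0, 0, 4, 4, 0],
    [0, 0, 0, 0, 0],
]
print(place_crop_window(belief, 2, 2))
```

Hill-climbing reaches a local maximum near the main subject, which matches the idea of adjusting an initially centered window; an exhaustive scan over all positions would find the global maximum at a higher cost.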
A method for selecting important digital images in a collection of digital images is described in U.S. Pat. No. 8,774,528 to Hibino et al. entitled: “Method of Selecting Important Digital Images”, which is incorporated in its entirety for all purposes as if fully set forth herein. The method comprises: analyzing the digital images in the collection to identify one or more sets of similar digital images; identifying the one or more sets of similar digital images having the largest number of similar digital images; selecting one or more digital images from the identified largest sets of similar digital images to be important digital images; and storing an indication of the selected important digital images in a processor-accessible memory.
U.S. Pat. No. 6,535,636 to Savakis, entitled “Method for Automatically Detecting Digital Images That are Undesirable for Placing in Albums”, which is incorporated in its entirety for all purposes as if fully set forth herein, teaches automatically determining an overall image quality parameter by assessing various technical image quality attributes (e.g., sharpness, contrast, noise, and exposure). U.S. Pat. No. 6,658,139 to Cookingham et al., entitled “Method for Assessing Overall Quality of Digital Images”, which is incorporated in its entirety for all purposes as if fully set forth herein, teaches a method for determining a numerical representation of the user-perceived overall image quality of a digital image. The method involves creating a series of digital reference images, each having a corresponding numerical representation of overall image quality. User inputs are collected while a digital test image is iteratively displayed in comparison with the digital reference images, and the user inputs are analyzed to infer a numerical representation of the overall image quality of the digital test image. U.S. Pat. No. 6,940,545 to Ray, entitled “Face Detecting Camera and Method”, which is incorporated in its entirety for all purposes as if fully set forth herein, teaches automatically assessing aesthetic image quality based on whether detected faces are positioned in a location consistent with the ‘rule of thirds’.
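The rule-of-thirds criterion can be illustrated with a hypothetical scoring function. The exact computation used in the cited patent is not described here, so the score below is an assumption: the image is divided by two vertical and two horizontal lines at one third and two thirds of its width and height, and a face center is scored by its distance to the nearest of the four intersections, normalized by the largest possible such distance (from an image corner, one third of the diagonal).

```python
import math


def thirds_score(face_center, width, height):
    """Hypothetical score in [0, 1]: 1.0 when the face center lies exactly on
    a rule-of-thirds intersection, decreasing linearly with distance from it."""
    x, y = face_center
    # The four intersections of the lines at 1/3 and 2/3 of width and height.
    points = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    nearest = min(math.hypot(x - px, y - py) for px, py in points)
    # The farthest any point can be from its nearest intersection is a corner,
    # at one third of the image diagonal.
    max_distance = math.hypot(width, height) / 3
    return max(0.0, 1.0 - nearest / max_distance)


print(thirds_score((100, 100), 300, 300))  # face on an intersection
print(thirds_score((150, 150), 300, 300))  # face at the image center
```

Under this assumed score, a face placed on a third intersection rates higher than one centered in the frame, which in turn rates higher than one pushed into a corner.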
Pre- and post-capture user interaction monitoring has also been used to determine important images. Such approaches are based on monitoring user behavior, changes in user expressions, or changes in user physiology while capturing, viewing, or utilizing images. These techniques often involve additional devices, such as cameras to monitor, record, and analyze facial expressions, eye gaze, or pupil dilation, or devices that monitor galvanic skin response (GSR), heart rate, breathing rate, or the like. In other cases, user interaction with images is monitored and recorded within the capture device itself. For example, interaction with the zoom control, the exposure button, and the exposure modes and settings can be monitored to determine the level of effort the user engaged in to capture the image. Similarly, post-capture interactions, such as reviewing images on a capture device's integrated display screen or after the images have been transferred to a computer or printer, can be analyzed using utilization models to determine which images are important to users. U.S. Pat. No. 7,620,270 to Matraszek et al., entitled “Method for Creating and Using Affective Information in a Digital Imaging System”, which is incorporated in its entirety for all purposes as if fully set forth herein, discloses a retrieval procedure for stored digital images based on a user's affective information. The affective information is obtained by a signal detecting means representing an emotional reaction of the user to one of the stored digital images, and the digital images are categorized based on the affective information.
U.S. Pat. No. 7,742,083 to Fredlund et al., entitled “In-camera Dud Image Management”, which is incorporated in its entirety for all purposes as if fully set forth herein, teaches automatically determining a value index from one or more of: user inputs to the camera during capture, usage of a particular image record following capture, semantic image content of an image record, and user reactions to the image record. Image records are classified into unacceptable image records, having value indexes within a predetermined threshold, and acceptable image records, having value indexes beyond the predetermined threshold.
In consideration of the foregoing, it would be an advancement in the art to provide an image analysis solution, and other methods and systems for improving the capturing, arrangement, storage, or any other handling of photos and other papers, that are simple, secure, cost-effective, load-balanced, redundant, reliable, easy to use, and fast; that provide lower CPU and/or memory usage and reduced latency; that have a minimum part count and minimum hardware; that use existing and available components, protocols, programs, and applications; that provide better quality of service, overload avoidance, better or optimal resource allocation, better communication, and additional functionalities; and that provide a better user experience.