Image recognition is a process, performed by computers, to analyze and understand an image (such as a photo or video clip). Images are generally produced by sensors, including light-sensitive cameras. Each image includes a large number (such as millions) of pixels. Each pixel corresponds to a specific location in the image. Additionally, each pixel typically corresponds to light intensity in one or more spectral bands, physical measures (such as depth, absorption or reflectance of sonic or electromagnetic waves), etc. Pixels are typically represented as color tuples in a color space. For example, in the well-known Red, Green, and Blue (RGB) color space, each color is generally represented as a tuple with three values. The three values of an RGB tuple express the intensities of red, green, and blue light that are added together to produce the color represented by the RGB tuple.
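By way of illustration only, the pixel representation described above can be sketched in Python. The sketch assumes 8-bit channels (values from 0 to 255) and a row-major grid of tuples; the names `add_light`, `pixel_at`, and `image` are hypothetical and chosen solely for this example.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]  # (red, green, blue), each assumed to be 0-255


def add_light(a: RGB, b: RGB) -> RGB:
    """Additively mix two colors, clamping each channel at 255."""
    return tuple(min(x + y, 255) for x, y in zip(a, b))


def pixel_at(img: List[List[RGB]], row: int, col: int) -> RGB:
    """Return the color tuple stored at a specific location in the image."""
    return img[row][col]


# A tiny 2x2 image (hypothetical data): each pixel is one RGB tuple.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Adding full-intensity red and green light yields yellow.
yellow = add_light((255, 0, 0), (0, 255, 0))
```

In this sketch, each location in the grid maps to exactly one tuple, and the color of a pixel is the additive combination of its three channel values.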
In addition to the data (such as color) that describes pixels, image data may also include information that describes an object in an image. For example, a human face in an image may be a frontal view, a left view at 30°, or a right view at 45°. As an additional example, an object in an image may be an automobile, instead of a house or an airplane. Understanding an image requires disentangling symbolic information represented by image data. Specialized image recognition technologies have been developed to recognize colors, patterns, human faces, vehicles, aircraft, and other objects, symbols, forms, etc., within images.
Scene understanding or recognition has also advanced in recent years. A scene is a view of a real-world surrounding or environment that includes more than one object. A scene image can contain a large number of physical objects of various types (such as human beings and vehicles). Additionally, the individual objects in the scene interact with or relate to each other or their environment. For example, a picture of a beach resort may contain three objects: a sky, a sea, and a beach. As an additional example, a scene of a classroom generally contains desks, chairs, students, and a teacher. Scene understanding can be extremely beneficial in various situations, such as traffic monitoring, intrusion detection, robot development, targeted advertisement, etc.
Facial recognition is a process by which a person within a digital image (such as a photograph) or video frame(s) is identified or verified by a computer. Facial detection and recognition technologies are widely deployed in, for example, airports, streets, building entrances, stadia, ATMs (Automated Teller Machines), and other public and private settings. Facial recognition is usually performed by a software program or application running on a computer that analyzes and understands an image.
Recognizing a face within an image requires disentangling symbolic information represented by image data. Specialized image recognition technologies have been developed to recognize human faces within images. For example, some facial recognition algorithms recognize facial features by extracting features from an image with a human face. The algorithms may analyze the relative position, size and shape of the eyes, nose, mouth, jaw, ears, etc. The extracted features are then used to identify a face in an image by matching features.
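As a minimal sketch of the kind of geometric feature extraction described above, the following Python example derives scale-normalized distances from facial landmark coordinates. It assumes that landmark positions have already been located by some other means; the landmark names, the choice of three ratios, and the function `geometric_features` are hypothetical and do not represent any particular algorithm.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def geometric_features(landmarks):
    """Describe relative landmark positions as ratios of the eye span.

    Dividing by the distance between the eyes makes the features
    independent of how large the face appears in the image.
    """
    left, right = landmarks["left_eye"], landmarks["right_eye"]
    eye_span = distance(left, right)
    eye_mid = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    return [
        distance(eye_mid, landmarks["nose_tip"]) / eye_span,
        distance(landmarks["nose_tip"], landmarks["mouth_center"]) / eye_span,
        distance(eye_mid, landmarks["chin"]) / eye_span,
    ]


# Hypothetical landmark coordinates in pixels.
face = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose_tip": (50, 60),
    "mouth_center": (50, 80),
    "chin": (50, 100),
}
features = geometric_features(face)
```

Because every distance is divided by the eye span, the same face photographed at twice the resolution would yield the same feature vector in this sketch.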
Image recognition in general, and facial and scene recognition in particular, have advanced in recent years. For example, the Principal Component Analysis (“PCA”) algorithm, the Linear Discriminant Analysis (“LDA”) algorithm, the Leave One Out Cross-Validation (“LOOCV”) algorithm, the K Nearest Neighbors (“KNN”) algorithm, and the Particle Filter algorithm have been developed and applied for facial and scene recognition. These example algorithms are more fully described in “Machine Learning, An Algorithmic Perspective,” Chapters 3, 8, 10, 15, Pages 47-90, 167-192, 221-245, 333-361, Marsland, CRC Press, 2009, which is hereby incorporated by reference to materials filed herewith.
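Of the algorithms listed above, KNN is the simplest to illustrate. The following Python sketch classifies a query feature vector by majority vote among its k closest labeled vectors under Euclidean distance. The gallery values, the labels, and the function name `knn_classify` are hypothetical, and the sketch is not taken from the cited reference.

```python
import math
from collections import Counter


def knn_classify(query, labeled_vectors, k=3):
    """Return the most common label among the k vectors nearest to query."""
    nearest = sorted(
        labeled_vectors,
        key=lambda item: math.dist(query, item[0]),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical gallery of (feature vector, identity label) pairs.
gallery = [
    ([0.50, 0.50, 1.50], "alice"),
    ([0.52, 0.48, 1.47], "alice"),
    ([0.40, 0.60, 1.30], "bob"),
    ([0.41, 0.62, 1.33], "bob"),
    ([0.39, 0.58, 1.28], "bob"),
]

predicted = knn_classify([0.51, 0.49, 1.49], gallery, k=3)
```

The value of k is a design choice: a small k reacts to individual outliers, while a large k may let a heavily represented label outvote a closer but less represented one.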
Despite the developments in recent years, facial recognition and scene recognition have proved to be challenging problems. At the core of the challenge is image variation. For example, at the same place and time, two different cameras typically produce two pictures with different light intensity and object shape variations, due to differences in the cameras themselves, such as variations in the lenses and sensors. Additionally, the spatial relationship and interaction between individual objects have an infinite number of variations. Moreover, a single person's face may be cast into an infinite number of different images. Present facial recognition technologies become less accurate when the facial image is taken at an angle of more than 20° from the frontal view. As an additional example, present facial recognition systems are ineffective at dealing with facial expression variation.
A conventional approach to image recognition is to derive image features from an input image, and compare the derived image features with image features of known images. For example, the conventional approach to facial recognition is to derive facial features from an input image, and compare the derived facial features with facial features of known images. The comparison results determine whether the input image matches one of the known images. The conventional approach to recognizing a face or scene generally sacrifices matching accuracy for recognition processing efficiency, or vice versa.
People manually create photo albums, such as a photo album for a specific stop during a vacation, a weekend visit to a historical site, or a family event. In today's digital world, the manual photo album creation process proves to be time consuming and tedious. Digital devices, such as smart phones and digital cameras, usually have large storage capacity. For example, a 32 gigabyte (“GB”) storage card allows a user to take thousands of photos, and record hours of video. Users oftentimes upload their photos and videos onto social websites (such as Facebook, Twitter, etc.) and content hosting sites (such as Dropbox and Picasa) for sharing and anywhere access. Digital camera users desire an automatic system and method to generate albums of photos based on certain criteria. Additionally, users desire to have a system and method for recognizing their photos, and automatically generating photo albums based on the recognition results.
Accordingly, there is a need for a system, apparatus and method for providing accurate and fast facial and scene recognition. Furthermore, there is a need for a system, apparatus and method for automatically generating photo albums based on image recognition results. Additionally, it is desirable that the automatic generation of photo albums factors in metadata embedded in and tags attached to the photos.