Image recognition is a process, performed by computers, to analyze and understand an image (such as a photo or video clip). Images are generally produced by sensors, including light-sensitive cameras. Each image includes a large number (such as millions) of pixels. Each pixel corresponds to a specific location in the image. Additionally, each pixel typically corresponds to light intensity in one or more spectral bands, physical measures (such as depth, absorption or reflectance of sonic or electromagnetic waves), etc. Pixels are typically represented as color tuples in a color space. For example, in the well-known Red, Green, and Blue (RGB) color space, each color is generally represented as a tuple with three values. The three values of an RGB tuple express the intensities of red, green, and blue light that are added together to produce the color represented by the RGB tuple.
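As a minimal illustration of the pixel and RGB tuple representation described above, the following sketch stores a tiny image as rows of RGB tuples and mixes two colors additively. The 8-bit channel range (0 to 255), the helper name `add_light`, and the 2x2 sample image are assumptions made for illustration only and are not part of this disclosure.

```python
from typing import List, Tuple

# (red, green, blue); each channel assumed to be an 8-bit value, 0-255.
RGB = Tuple[int, int, int]


def add_light(a: RGB, b: RGB) -> RGB:
    """Additively mix two colors, clamping each channel at 255."""
    return tuple(min(x + y, 255) for x, y in zip(a, b))


# A hypothetical 2x2 image: image[row][col] is the pixel at that location.
image: List[List[RGB]] = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

red = image[0][0]    # pixel at row 0, column 0
green = image[0][1]  # pixel at row 0, column 1
yellow = add_light(red, green)  # red light plus green light
```

Here `yellow` evaluates to `(255, 255, 0)`, which shows how the three values of a tuple combine to describe a single color.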
In addition to the data (such as color) that describes pixels, image data may also include information that describes an object in an image. For example, a human face in an image may be a frontal view, a left view at 30°, or a right view at 45°. As an additional example, an object in an image may be an automobile rather than a house or an airplane. Understanding an image requires disentangling symbolic information represented by image data. Specialized image recognition technologies have been developed to recognize colors, patterns, human faces, vehicles, aircraft, and other objects, symbols, forms, etc., within images.
Scene understanding or recognition has also advanced in recent years. A scene is a view of a real-world surrounding or environment that includes more than one object. A scene image can contain a large number of physical objects of various types (such as human beings and vehicles). Additionally, the individual objects in the scene interact with or relate to each other or their environment. For example, a picture of a beach resort may contain three objects: a sky, a sea, and a beach. As an additional example, a scene of a classroom generally contains desks, chairs, students, and a teacher. Scene understanding can be extremely beneficial in various situations, such as traffic monitoring, intrusion detection, robot development, targeted advertisement, etc.
Facial recognition is a process by which a person within a digital image (such as a photograph) or video frame(s) is identified or verified by a computer. Facial detection and recognition technologies are widely deployed in, for example, airports, streets, building entrances, stadia, ATMs (Automated Teller Machines), and other public and private settings. Facial recognition is usually performed by a software program or application running on a computer that analyzes and understands an image.
Recognizing a face within an image requires disentangling symbolic information represented by image data. Specialized image recognition technologies have been developed to recognize human faces within images. For example, some facial recognition algorithms recognize facial features by extracting features from an image with a human face. The algorithms may analyze the relative position, size and shape of the eyes, nose, mouth, jaw, ears, etc. The extracted features are then used to identify a face in an image by matching features.
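The feature extraction step described above, which analyzes the relative position of facial parts, can be sketched as follows. This is only an illustrative assumption of one possible geometric feature, not a description of any particular facial recognition algorithm: the landmark names, the coordinates, and the function `geometric_features` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a facial landmark


def geometric_features(landmarks: Dict[str, Point]) -> List[float]:
    """Distances between landmarks, divided by the distance between the eyes.

    Dividing by the inter-eye distance makes the features independent of
    image resolution, so only relative positions are captured.
    """
    eye_dist = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    pairs = [("left_eye", "nose"), ("right_eye", "nose"), ("nose", "mouth")]
    return [math.dist(landmarks[a], landmarks[b]) / eye_dist for a, b in pairs]


# Hypothetical landmark positions for one face.
face = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose": (50.0, 60.0),
    "mouth": (50.0, 80.0),
}
# The same face at twice the resolution (all coordinates doubled).
face_2x = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

features = geometric_features(face)
features_2x = geometric_features(face_2x)
```

Both calls yield the same feature values, so two images of the same face at different resolutions could be matched by comparing these features.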
Image recognition in general, and facial and scene recognition in particular, have advanced in recent years. For example, the Principal Component Analysis (“PCA”) algorithm, the Linear Discriminant Analysis (“LDA”) algorithm, the Leave One Out Cross-Validation (“LOOCV”) algorithm, the K Nearest Neighbors (“KNN”) algorithm, and the Particle Filter algorithm have been developed and applied for facial and scene recognition. These example algorithms are more fully described in “Machine Learning, An Algorithmic Perspective,” Chapters 3, 8, 10, 15, Pages 47-90, 167-192, 221-245, 333-361, Marsland, CRC Press, 2009, which is hereby incorporated by reference to materials filed herewith.
Despite the development in recent years, facial recognition and scene recognition have proved to be challenging problems. At the core of the challenge is image variation. For example, at the same place and time, two different cameras typically produce two pictures with different light intensity and object shape variations, due to differences in the cameras themselves, such as variations in the lenses and sensors. Additionally, the spatial relationships and interactions between individual objects have an infinite number of variations. Moreover, a single person's face may be cast into an infinite number of different images. Present facial recognition technologies become less accurate when the facial image is taken at an angle of more than 20° from the frontal view. As an additional example, present facial recognition systems are ineffective at handling variations in facial expression.
A conventional approach to image recognition is to derive image features from an input image, and compare the derived image features with image features of known images. For example, the conventional approach to facial recognition is to derive facial features from an input image, and compare the derived facial features with facial features of known images. The comparison results determine a match between the input image and one of the known images. The conventional approach to recognizing a face or scene generally sacrifices matching accuracy for recognition processing efficiency or vice versa.
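The conventional derive-and-compare approach described above can be sketched as follows. The feature used here (a coarse brightness histogram), the Euclidean distance measure, and the names `derive_features`, `best_match`, and `known_images` are assumptions chosen for illustration; actual systems typically use far richer features.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def derive_features(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Toy feature vector: normalized histogram of per-pixel brightness."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        brightness = (r + g + b) / 3  # average of the channels, 0-255
        hist[min(int(brightness * bins / 256), bins - 1)] += 1
    total = len(pixels) or 1  # avoid division by zero for an empty image
    return [h / total for h in hist]


def best_match(query: List[float],
               known: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the label and distance of the closest known feature vector."""
    label = min(known, key=lambda k: math.dist(query, known[k]))
    return label, math.dist(query, known[label])


# Hypothetical known images, reduced to a few pixels each.
known_images = {
    "dark_scene": derive_features([(10, 10, 10), (30, 20, 25), (5, 0, 0)]),
    "bright_scene": derive_features(
        [(250, 240, 245), (200, 220, 210), (255, 255, 255)]
    ),
}
query = derive_features([(240, 240, 240), (210, 215, 220), (20, 20, 20)])
label, distance = best_match(query, known_images)
```

In this sketch the query is matched to `bright_scene`. The `bins` parameter hints at the trade-off noted above: more bins give a finer description of each image but make every comparison more expensive.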
People manually create photo albums, such as a photo album for a specific stop during a vacation, a weekend visit to a historical site, or a family event. In today's digital world, the manual photo album creation process proves to be time-consuming and tedious. Digital devices, such as smart phones and digital cameras, usually have a large storage capacity. For example, a 32 gigabyte (“GB”) storage card allows a user to take thousands of photos and record hours of video. Users oftentimes upload their photos and videos onto social websites (such as Facebook, Twitter, etc.) and content hosting sites (such as Dropbox and Picasa) for sharing and anywhere access. Digital camera users want an automatic system and method to generate albums of photos based on certain criteria. Additionally, users desire to have a system and method for recognizing their photos, and automatically generating photo albums based on the recognition results.
Accordingly, there is a need for a system, apparatus and method for providing accurate and fast facial and scene recognition. Furthermore, there is a need for a system, apparatus and method for automatically generating photo albums based on image recognition results. Additionally, it is desirable that the automatic generation of photo albums factors in metadata embedded in and tags attached to the photos.
Objects of the Disclosed System, Method, and Apparatus
Accordingly, it is an object of this disclosure to provide a system, apparatus and method for recognizing the scene category of a scene image.
Another object of this disclosure is to provide a system, apparatus and method for matching a scene image to an image stored in a database.
Another object of this disclosure is to provide a system, apparatus and method for efficiently and accurately understanding a scene through an iterative process.
Another object of this disclosure is to provide a system, apparatus and method for segmenting a scene image into multiple images, recognizing each of the multiple images, and, based on the recognition results, matching the scene image to a scene category in a set of scene categories.
Another object of this disclosure is to provide a system, apparatus and method for segmenting a scene image into multiple images, recognizing each of the multiple images, and, based on the recognition results, matching the scene image to an image in a set of images.
Another object of this disclosure is to provide a system, apparatus and method for deriving raw image features and selecting significant image features from the raw image features for training images.
Another object of this disclosure is to provide a system, apparatus and method for deriving image features from training images.
Another object of this disclosure is to provide a system, apparatus and method for refining scene understanding using an iterative training process.
Another object of this disclosure is to provide a system, apparatus and method for recognizing a scene image using a client-server computing platform.
Another object of this disclosure is to provide a system, apparatus and method for recognizing a scene image using a client-server computing platform in an offline mode.
Another object of this disclosure is to provide a system, apparatus and method for recognizing a scene image using a cloud computing platform.
Another object of this disclosure is to provide a system, apparatus and method for organizing images by scene types.
Another object of this disclosure is to provide a system, apparatus and method for associating a scene type with photos on a web page.
Another object of this disclosure is to provide a system, apparatus and method for associating a scene type with a web video clip.
Another object of this disclosure is to provide a system, apparatus and method for crawling social network sites for facial recognition and model training on photos hosted on the social network sites.
Another object of this disclosure is to provide a system, apparatus and method for recognizing a face in a set of images.
Another object of this disclosure is to provide a system, apparatus and method for recognizing a face in a video clip.
Another object of this disclosure is to provide a system, apparatus and method for training models using a video clip.
Another object of this disclosure is to provide a system, apparatus and method for detecting a face within an image using deep learning algorithms.
Another object of this disclosure is to provide a system, apparatus and method for performing facial recognition by detecting facial feature points, extracting partial facial features, and concatenating partial facial features.
Another object of this disclosure is to provide a system, apparatus and method for performing facial recognition using metric learning algorithms.
Another object of this disclosure is to provide a system, apparatus and method for facial recognition that supports different facial recognition mechanisms depending on facial recognition time requirements.
Another object of this disclosure is to provide a system, apparatus and method for facial recognition that corrects and improves facial recognition results.
Another object of this disclosure is to provide a system, apparatus and method for facial recognition using a client-server based facial recognition system.
Another object of this disclosure is to provide a system, apparatus and method for facial recognition using a client-server based parallel facial recognition system.
Another object of this disclosure is to provide a system, apparatus and method for facial recognition using a cloud based facial recognition system.
Another object of this disclosure is to provide a system, apparatus and method for uploading batch images for facial recognition.
Another object of this disclosure is to provide a system, apparatus and method
Other advantages of this disclosure will be clear to a person of ordinary skill in the art. It should be understood, however, that a system or method could practice the disclosure while not achieving all of the enumerated advantages, and that the protected disclosure is defined by the claims.