Conventional image processing methods for object recognition generally utilize “deep learning” neural network approaches. Artificial neural networks loosely simulate biological neural networks, such as the human brain, using simple simulated neurons (also referred to as nodes) connected through a series of layers. Such neural networks “learn” from feedback that compares their outputs against the correct responses provided to the neural networks. This process is also referred to as “training.” In the context of neural networks, the term “deep” refers to the number of layers within a neural network: a deep network has more layers than a shallow network.
A neural network specifically designed for image processing is referred to as a Convolutional Neural Network (CNN). The convolutional layers in such neural networks apply filters to local regions of the image, looking for certain visual attributes. For example, one convolution might look for narrow vertical bars. CNNs have been utilized for visual object recognition. In some instances, CNNs approach or exceed human object recognition performance.
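As a non-limiting illustration of the vertical bar example above, the following minimal sketch slides a small filter over an image and sums the element-wise products at each position. The function name, the filter weights, and the 5x5 test images are hypothetical and chosen only for illustration; in an actual CNN the filter weights are learned during training rather than written by hand.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    This is the sliding-window operation used by CNN convolutional layers
    (strictly a cross-correlation; the kernel below is left-right symmetric,
    so flipping it would not change the result).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical hand-written filter: rewards a bright center column and
# penalizes bright neighbors, so it responds to narrow vertical bars.
VERTICAL_BAR_KERNEL = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

# 5x5 test images: a one-pixel-wide vertical bar and a horizontal bar.
vertical_bar = [[1 if col == 2 else 0 for col in range(5)] for _ in range(5)]
horizontal_bar = [[1 if row == 2 else 0 for _ in range(5)] for row in range(5)]

vertical_response = correlate2d_valid(vertical_bar, VERTICAL_BAR_KERNEL)
horizontal_response = correlate2d_valid(horizontal_bar, VERTICAL_BAR_KERNEL)

print(vertical_response)    # strongest where the filter is centered on the bar
print(horizontal_response)  # all zeros: each filter row sums to zero
```

In this sketch the filter produces its largest output where it is centered on the vertical bar and no output for the horizontal bar, which is the sense in which a convolution “looks for” a particular visual attribute.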
With respect to wearable item image analysis, a number of neural network approaches have been proposed. As an example, one available neural network approach takes a user-submitted image, recognizes a wearable item included in the image, and identifies the same or a similar wearable item in an inventory. That neural network approach applies to a broad range of products in addition to wearable items. While the neural network approach described above may have some merits, there are two recognized issues with the neural network approach: (1) the significant amount of resources required and (2) the lack of explainability.
With respect to the first issue, the neural network approach requires a significant amount of data and computational resources to train a neural network model. As an example, a million images may be considered a typical number of images used for training a neural network model. Furthermore, such images must be pre-labeled with correct responses. For example, images of wearable items used for training must also include the correct style characteristics. For specialized uses, such as wearable item style analysis, data sets with correct style characteristics are difficult to find and/or are expensive. Moreover, the hardware used to train neural network models at any level of efficiency (e.g., graphics processing units, “GPUs,” or tensor processing units, “TPUs”) is specialized and is expensive to buy or rent. For example, typical third party cloud services rent GPUs for 1 to 24 dollars per hour, and a typical training run may last several days.
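As a rough illustration of the rental figures above, the following sketch estimates the cost of a single training run. The three-day duration and the single rented device are assumptions made only for this example, since the description above states only that a run may last “several days”; the function name is likewise hypothetical.

```python
HOURS_PER_DAY = 24


def training_cost_usd(hourly_rate_usd, days, device_count=1):
    """Rental cost of one training run, assuming continuous use."""
    return hourly_rate_usd * HOURS_PER_DAY * days * device_count


# Assumption for illustration only: "several days" is taken as 3 days
# on a single rented GPU.
ASSUMED_DAYS = 3

low_estimate = training_cost_usd(1, ASSUMED_DAYS)    # at 1 dollar per hour
high_estimate = training_cost_usd(24, ASSUMED_DAYS)  # at 24 dollars per hour

print(f"Estimated cost per run: {low_estimate} to {high_estimate} dollars")
```

Under these assumptions a single run costs between 72 and 1,728 dollars, and the total scales linearly with the number of devices rented and with the number of runs needed to arrive at a usable model.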
With respect to the second issue, while the results provided by the neural network approach may be accurate, it is difficult to explain how the neural network models reached such results. Most of the processing for neural network models is conducted in “hidden” layers between an input (e.g., an image) and an output (e.g., results). This lack of transparency makes it difficult to explain how the results were achieved, and therefore makes it difficult to act on the results at a functional level (e.g., by providing recommendations to merchandising).
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art, or suggestions of the prior art, by inclusion in this section.