The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.
One of the most challenging aspects of generating training data is that the training data should resemble the underlying distribution of “real-world data.” As used here, “real-world data” is data similar to what a user is trying to match when the user is presented with documents or images on a screen.
Roughly described, the technology disclosed relates to an overall process of providing a service using a trained model. The trained model uses algorithms for generating predictions in the form of images and/or screens that are believed to draw the customer to their target image (e.g., an image in their mind that they are trying to reach, such as a specific product). The images and/or screens are produced using embeddings created by the trained model.
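By way of illustration only, the following minimal sketch shows one way embeddings could be used to choose the images for a next screen. It is an assumption-laden example, not the disclosed implementation: the cosine-similarity ranking, the function names `cosine_similarity` and `next_screen`, the `screen_size` parameter, and the toy three-dimensional embeddings in `catalog` are all hypothetical and do not come from this section. In practice the embeddings would be produced by the trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def next_screen(catalog, selected_id, screen_size=3):
    """Return the ids of the images closest to the one the user selected.

    `catalog` maps an image id to its embedding. Ranking by cosine
    similarity is an assumption made for this sketch.
    """
    query = catalog[selected_id]
    scored = [
        (cosine_similarity(query, embedding), image_id)
        for image_id, embedding in catalog.items()
        if image_id != selected_id  # do not show the selected image again
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:screen_size]]


# Hypothetical toy embeddings; real ones would come from the trained model.
catalog = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "red_boot": [0.8, 0.3, 0.1],
    "blue_sneaker": [0.1, 0.9, 0.0],
    "green_sandal": [0.0, 0.2, 0.9],
}

screen = next_screen(catalog, "red_sneaker", screen_size=2)
print(screen)
```

In this sketch, each user selection becomes the query for the following screen, so repeated selections are expected to move the displayed images toward the user's target image. The quality of that movement depends entirely on how well the embeddings capture similarity, which is why the training data matters.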
The outcome of the service is only as good as the trained model. Better or more comprehensive training data allows for the creation of a better (e.g., more accurate or realistic) model, because the model is only as “smart” as the data that was used to train it. This is why it is important to improve the training data generation process. Training data should satisfy two important requirements: (i) comprehensiveness, i.e., having richly tagged real-world images that are captured in a wide spectrum of uncontrolled environments (e.g., arbitrary perspectives, textures, backgrounds, occlusion, illumination), so that the model is proficient at handling a diverse array of image requests from customers during production or inference; and (ii) scale, i.e., having large amounts of such tagged real-world images, so that the model is adequately trained. There is a shortage of such training data because collecting and tagging real-world images is tedious, time-consuming, and error-prone.
Therefore, an opportunity arises for preparing a data object creation and recommendation database for use in a data object creation and recommendation system.