Image editing applications receive image editing requests (IERs) via a user interface to edit images, such as for cropping, adjusting color, adding or removing objects, and the like. A user interface of an image editing application usually receives IERs via user selection of items exposed in the user interface. For instance, a user may select a tab in a user interface with a cursor of a mouse to apply a filter designated by the tab to an image. Furthermore, an IER may require multiple user inputs. For instance, to crop an image, a user may have to navigate to a crop tool, select the crop tool, move multiple handles of the crop tool to crop the image, and deselect the crop tool. Because the editing of images can require a large number of IERs, and an IER may require multiple user inputs, image editing with image editing applications can require significant user effort, in terms of time and input to the image editing application.
Furthermore, image editing applications do not usually accept voice commands from a user as IERs, since voice commands that alone sufficiently describe an IER are often complex, and therefore are generally not appropriate to train a neural network for use in an image editing application to understand IERs. For instance, a user may speak the IER “Move the dog by the leg of the table from under the table” to request to move one of multiple dogs in an image. However, this voice command may be too complex to adequately train a neural network to understand the intent of the voice command, because of the number of terms in the voice command and the relationships among them.
Some websites accept written IERs along with an image to be edited, and (for a small fee) will return an image to a user that has been edited in accordance with the submitted written IER. However, images and associated IERs from these websites are generally not appropriate to use as training data for a neural network to recognize IERs because the IERs from these websites are either highly abstract (e.g., “Please make this image more instagrammable”), or contain information about the image that is superfluous to the IER (e.g., “My dog Rover passed away. We used to walk every day except Sunday. Could anyone make him look like a good dog?”), or both.
Accordingly, there is a lack of appropriate data to train a neural network, machine learning algorithm, artificial intelligence model, and the like, to recognize IERs in an image editing application. As a result, a user must still rely on user selection of items exposed in the user interface, such as by selecting tabs and menu items with a mouse cursor, to implement an IER. Hence, user interfaces of image editing applications remain inefficient and require significant user input to accomplish image editing tasks.