In the related art, input for image identification and machine vision recognition is performed by first taking pictures and then determining a target object. Specifically, pictures are first taken, and foreground environment images and background environment images of the target object are saved. The target object is then selected by delimiting it on a screen with a finger, and it is segmented so that image identification can be performed. In this approach, the captured pictures are subject to human intervention; for example, they must be delimited manually. As a result, the operation steps are complicated and the user experience is not smooth. In addition, the above process is applicable only to smart terminals with touch screens, and thus its applicability is poor.