Conventionally, a technology has been known that performs a recognition process, such as a character recognition process or an object recognition process, on an input image and overlays the result of the recognition process on the input image. An image display apparatus to which such a technology is applied can present to the user, in an easily understandable form, a target that is included in the input image and is assumed to attract the user's interest. However, acquiring the result of the recognition process for an image takes time. Improved responsiveness is therefore desired, for example, in an application whose input image is a scene image captured with the camera of a mobile terminal.