Technology has been disclosed for generating combined image data representing an image obtained by joining a first image and a second image. For example, when a document is too large to be read in a single scanning operation, the document is scanned in two scanning operations, thereby acquiring scan data representing the first image and scan data representing the second image. The two sets of scan data are then used to generate output image data representing an image obtained by joining the first and second images. In this case, pattern matching is used to determine the position at which the first and second images are joined.
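The joining step described above can be illustrated with a minimal sketch. The description does not specify the matching method, so this example assumes a simple one: the right edge of the first image is compared with the left edge of the second image over every candidate overlap width, and the width with the smallest mean absolute difference is taken as the joining position. The function names `find_join_overlap` and `join_images`, the grayscale list-of-rows representation, and the restriction to a purely horizontal overlap between images of equal height are all assumptions made for illustration.

```python
def find_join_overlap(first, second, min_overlap=1):
    """Return the overlap width (in columns) at which the right edge of
    `first` best matches the left edge of `second`.

    Both images are lists of rows of grayscale values with equal height.
    This is an assumed, simplified matching criterion (mean absolute
    difference), not a method taken from the description.
    """
    height = len(first)
    width_first = len(first[0])
    width_second = len(second[0])
    max_overlap = min(width_first, width_second)

    best_overlap, best_score = None, float("inf")
    for overlap in range(min_overlap, max_overlap + 1):
        total = 0
        for y in range(height):
            left_strip = first[y][width_first - overlap:]
            right_strip = second[y][:overlap]
            total += sum(abs(a - b) for a, b in zip(left_strip, right_strip))
        # Normalize by area so wide and narrow overlaps are comparable.
        score = total / (overlap * height)
        if score < best_score:
            best_overlap, best_score = overlap, score
    return best_overlap


def join_images(first, second, overlap):
    """Join two images row by row, dropping the overlapping columns of `second`."""
    return [row_a + row_b[overlap:] for row_a, row_b in zip(first, second)]


# Hypothetical example: a 4 x 10 "document" scanned in two passes that
# share three columns (columns 4 to 6).
document = [[x * 20 + y * 5 for x in range(10)] for y in range(4)]
first_scan = [row[:7] for row in document]
second_scan = [row[4:] for row in document]

overlap = find_join_overlap(first_scan, second_scan)
joined = join_images(first_scan, second_scan, overlap)
```

In this toy case the two scans agree exactly in the shared columns, so the search recovers the original document. Real scan data would differ in brightness, skew, and noise between the two passes, which an exhaustive edge comparison of this kind does not account for.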