1. Field of the Invention
The present invention relates to a technology for predicting an optimum function for image data in a data processing apparatus.
2. Description of the Related Art
In recent years, with the spread of color scanners and digital cameras, printed documents are commonly scanned, and the resulting document image data is accumulated, output, and reused by user terminals. Moreover, like encoded electronic document data, the scanned document image data is transmitted to remote locations via a network.
This means that, on the network, document data is circulated in the form of scanned document image data or encoded electronic document data. The scanned document image data or the encoded electronic document data is transmitted between user terminals over the network and accumulated so that users can use the data according to their tasks or preferences.
For later reuse of data, it is desirable to classify the data according to criteria determined by the users when storing the data in storage members.
Conventional document classification systems for classifying data are disclosed in, for example, Japanese Patent No. 3441500, Japanese Patent No. 3792411, and Japanese Patent No. 3771047. In these systems, data is classified based on language information such as keywords extracted from electronic document data or keywords extracted from document image data read by an optical character reader.
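By way of illustration only, keyword-based classification of the kind described above can be sketched as follows. This is a minimal sketch and is not the method of any of the cited patents, whose details are not given here. The category names, the keyword table, and the function names `extract_keywords` and `classify_by_keywords` are all hypothetical, and the input is assumed to be text already obtained from an electronic document or from an optical character reader.

```python
import re
from collections import Counter

# Hypothetical category-to-keyword table (assumption: not taken from the source).
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "payment", "total"},
    "report": {"summary", "results", "analysis"},
}


def extract_keywords(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def classify_by_keywords(text, default="unclassified"):
    """Return the category whose keywords occur most often in the text.

    Falls back to `default` when no keyword of any category is found.
    """
    counts = extract_keywords(text)
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

In this sketch, a document that contains none of the listed keywords, such as a scanned page consisting mostly of images, falls into the default category, because the approach relies on language information alone.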
For a system to handle electronic document data and document image data (hereinafter, “document data”) in forms best suited to various purposes of use, the system needs to support various types of document data according to the preferences and purposes of its users. In particular, layouts and coloring in recent documents have become much more diverse. Therefore, in classifying document data, it is necessary to consider not only language information such as keywords but also features of images.
When document data is classified based on features of images, however, it is difficult to represent those features with “language” and “signs”, so users may find it difficult to designate classification criteria.
Moreover, because classification criteria differ from user to user, each user has to designate a classification category for each of a large number of images. This imposes a burden on the users and degrades work efficiency.