1. Field of the Invention
The present invention relates to a technology for encoding input data and searching for data based on the encoded data.
2. Description of the Related Art
Upon receipt of data such as an image file, a data searching device encodes the input data and stores the input data and the encoded data in association with each other. When retrieving stored data, the data searching device encodes a title, a keyword, or both specified for the desired data and, based on the resulting code, retrieves and outputs the desired data.
For example, Japanese Patent Application Laid-open No. H9-270902 discloses a conventional technology that stores image data read from a material document in a storage as an image file, and searches the stored image files for a desired image file. Specifically, the ratio of the width to the height of each character rectangle in a text area is extracted from the image data as a material document attribute. The ratio is encoded based on a threshold, and the obtained code is written to each rectangle. The code is managed as registration key information in association with the image file and is used to search for the image file.
Another conventional technology has been proposed in which a projection histogram is created as an attribute of a text area included in image data. The projection histogram is normalized and then encoded based on the number of black pixels at each position in the normalized projection histogram. The encoded projection histogram is managed in association with the image data and is used to search for the image data.
However, according to the former conventional technology, the ratio of the width to the height of a character rectangle is calculated individually for each extracted rectangle, and a code is assigned to each rectangle based on the calculated ratio. Consequently, if rectangles are extracted from pieces of text that have the same contents but are enlarged or reduced with different aspect ratios, different codes are assigned to the same characters in those pieces. As a result, if the subject image data has a different aspect ratio (see FIG. 27), it cannot be retrieved based on the codes assigned to it.
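The problem described above can be illustrated with a minimal sketch. The threshold value of 1.0, the binary code, and the rectangle sizes are assumptions chosen only for illustration; they are not taken from the cited publication.

```python
def ratio_code(width, height, threshold=1.0):
    """Encode one character rectangle: 1 if width/height >= threshold, else 0.

    The single threshold and the binary code are hypothetical simplifications.
    """
    return 1 if width / height >= threshold else 0


def encode(rects, threshold=1.0):
    """Assign a code to every (width, height) rectangle independently."""
    return [ratio_code(w, h, threshold) for w, h in rects]


# Hypothetical character rectangles (width, height) from one line of text.
original = [(8, 10), (12, 10), (9, 10), (15, 10)]

# Same text scaled uniformly: each width/height ratio is preserved.
uniform = [(w * 3, h * 3) for w, h in original]

# Same text enlarged only horizontally: every ratio doubles.
stretched = [(w * 2, h) for w, h in original]

codes_original = encode(original)    # [0, 1, 0, 1]
codes_uniform = encode(uniform)      # [0, 1, 0, 1]
codes_stretched = encode(stretched)  # [1, 1, 1, 1]

print(codes_original, codes_uniform, codes_stretched)
```

Under these assumptions, uniform scaling leaves the code sequence unchanged, whereas horizontal-only enlargement yields a different sequence for the same text, so a search keyed on the original codes would not match.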
According to the latter conventional technology, the projection histogram is encoded based on the number of black pixels corresponding to each position in the projection histogram. If the image data is enlarged only horizontally, the number of black pixels changes. As a result, different codes are assigned to image data having the same contents.
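A simplified sketch of this effect follows. It assumes a row-wise projection (black pixels counted per row) and a fixed quantization step of 2, both of which are hypothetical. The normalization step of the conventional technology is not detailed here and is omitted from the sketch.

```python
def row_projection(image):
    """Count black pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def encode_histogram(hist, step=2):
    """Quantize each black-pixel count with a fixed step (hypothetical rule)."""
    return [count // step for count in hist]


def stretch_horizontally(image, factor):
    """Enlarge a binary image only horizontally by repeating each pixel."""
    return [[px for px in row for _ in range(factor)] for row in image]


# Hypothetical 3x4 binary image; 1 marks a black pixel.
image = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
wide = stretch_horizontally(image, 2)

hist = row_projection(image)         # [2, 4, 1]
hist_wide = row_projection(wide)     # [4, 8, 2]
codes = encode_histogram(hist)       # [1, 2, 0]
codes_wide = encode_histogram(hist_wide)  # [2, 4, 1]

print(codes, codes_wide)
```

Because the codes depend on absolute black-pixel counts, the horizontally enlarged image receives different codes even though its contents are the same.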