1. Field of the Invention
The present invention relates to a data retrieval apparatus and a data retrieval method that perform high-speed retrieval processing using a database engine.
2. Description of the Related Art
With the increase in flash memory capacity in recent years, even resource-constrained devices with limited computational resources, such as digital cameras, digital photo frames, and inkjet multifunction peripherals, now have enough storage capacity to hold tens of thousands to hundreds of thousands of pieces of data. Further, improvements in recognition technology make it possible to provide a wide variety of attribute values, such as person names and location names.
There is a growing need, even in resource-constrained devices, to perform a large number of diverse retrievals at high speed using these attribute values. For photo data, attribute values such as the date, shooting parameters, and Global Positioning System (GPS) coordinates serving as position information are assigned at the time of shooting, and attribute values such as favorite ratings or printing specifications are assigned during playback. A typical format for storing these attribute values is the Exchangeable image file format (Exif). These attributes are useful for retrieving the data a user wants. However, when the attribute values are numerous and diverse, performing a full scan of all pieces of data for each retrieval requires an enormous amount of computation and therefore delays the response.
Thus, a common approach to high-speed retrieval is to build an index in advance and use it during retrieval to shorten the response time. When a plurality of indexes are available, one of them is selected and used. Selecting which index to use preferentially, according to the retrieval condition, is therefore important for improving the response speed. However, the rules generally used for index selection, such as preferentially using the index on a column whose condition matches only a small amount of data and therefore has a high refinement effect, do not take into account the hit ratio of the cache that holds the index.
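The conventional, cache-unaware selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this invention or of any cited reference: the `Index` class, its `estimated_matches` field, and the column names are hypothetical, and the match estimates are assumed to be supplied by the database engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    """Hypothetical index descriptor (names are illustrative only)."""
    column: str
    estimated_matches: int  # assumed estimate of rows matching the condition


def select_index_by_selectivity(candidates: list) -> Index:
    """Conventional rule: pick the index whose condition matches the fewest
    rows, i.e. the one with the highest refinement effect.
    Note that nothing here considers whether the index is in the cache."""
    if not candidates:
        raise ValueError("no usable index for this retrieval condition")
    return min(candidates, key=lambda idx: idx.estimated_matches)


if __name__ == "__main__":
    candidates = [
        Index("shooting_date", 12_000),
        Index("person_name", 150),
        Index("location_name", 900),
    ]
    print(select_index_by_selectivity(candidates).column)  # person_name
```

Because the choice depends only on the estimated match count, the same index is chosen whether its pages are resident in the cache or must be read from the database file.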
However, when retrievals are performed repeatedly, data is frequently read from the database file, and the cache hit ratio greatly influences the response speed. This is because the transfer speed of the external storage unit holding the database file is lower than that of the random-access memory (RAM) serving as the main storage, so frequent reads degrade the computing performance of the entire apparatus. This problem is generally addressed by providing an intermediate layer, such as a caching mechanism in the RAM serving as the main storage unit, to suppress the frequency of reads and writes. In a resource-constrained device, however, the limited computational resources keep the cache small, so the hit ratio has a particularly large effect on performance. For example, in a digital camera or the like, metadata is synchronized and updated between a personal computer (PC) and the camera. As a result, records for a large amount of data are retrieved and updated, the cache hit ratio drops, and reads from the database file occur frequently.
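The following sketch illustrates why a small cache suffers when many records are touched. It assumes a least-recently-used (LRU) replacement policy and models the workload as repeated sequential passes over a set of database pages; neither the policy nor the workload model is specified in the text, and `LRUPageCache` and `repeated_scan_hit_ratio` are hypothetical names.

```python
from collections import OrderedDict


class LRUPageCache:
    """Minimal LRU page cache that counts hits and misses (illustrative)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> None, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def access(self, page_id: int) -> None:
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
        else:
            self.misses += 1  # models a read from the database file
            self.pages[page_id] = None
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def repeated_scan_hit_ratio(capacity: int, working_set: int, passes: int) -> float:
    """Hit ratio when `working_set` pages are scanned in order `passes` times."""
    cache = LRUPageCache(capacity)
    for _ in range(passes):
        for page_id in range(working_set):
            cache.access(page_id)
    return cache.hit_ratio


if __name__ == "__main__":
    print(repeated_scan_hit_ratio(capacity=100, working_set=50, passes=10))   # 0.9
    print(repeated_scan_hit_ratio(capacity=100, working_set=200, passes=10))  # 0.0
```

When the pages touched fit in the cache, only the first pass misses; once they exceed the cache capacity, a sequential LRU scan evicts each page just before it is needed again, and every access becomes a read from storage.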
Conventional retrieval methods that focus on index selection include, as discussed in Japanese Patent Application Laid-Open No. 07-311699, for example, a method that selects an index based on the results of an advance analysis such as of the data distribution, and a method that rebuilds the database file so as to lower the retrieval cost.
However, conventional methods such as the above-described example presuppose operation on a PC or a server with abundant resources. Analyzing the data distribution in advance or rebuilding the database file is not suitable for a resource-constrained device with limited computational resources. For this reason, the resource-constrained device must perform index selection control in response to the specified retrieval condition without relying on substantial computational resources and regardless of the data distribution. Analysis or rebuilding could be performed during standby time, but in many cases the available electric power or computing capability is limited.