In recent years, high-traffic technology is sometimes used for searching the data stored in database systems (see, for example, Japanese Laid-open Patent Publication No. 2007-241516). The high-traffic technology realizes reduction in search time even when search requests are simultaneously transmitted. The high-traffic technology mentioned here is a technology for creating a single automaton from a plurality of search requests and then matching, by using the created automaton, the search requests with target data of the search. FIG. 13 is a schematic diagram illustrating an example of a data search performed using high-traffic technology.
As illustrated in FIG. 13, for example, a database system integrates search requests 1 to 4, which are transmitted simultaneously. The database system then collectively searches the search target data for data corresponding to the integrated search request, and allocates the respective search results to the search requests 1 to 4. Accordingly, even when there is a high degree of overlap among the simultaneously transmitted search requests, the search target data are searched collectively; therefore, the time taken to search for the data (hereinafter referred to as a “search time”) can be reduced (for related technology, see Japanese Laid-open Patent Publication No. 11-232302, for example).
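The integrate, search-once, and allocate flow of FIG. 13 can be sketched as follows. This is only a minimal illustration and not the automaton-based method of the cited publication: it assumes that each search request is a plain keyword, and the names `collective_search`, `requests`, and `records` are hypothetical.

```python
def collective_search(requests, records):
    """Scan the target data once and allocate hits to every request.

    requests: dict mapping a request id to a keyword (assumed request form)
    records:  list of strings standing in for the search target data
    """
    # Integrate the requests: group request ids by keyword so that a keyword
    # shared by several requests is tested only once per record.
    ids_by_keyword = {}
    for request_id, keyword in requests.items():
        ids_by_keyword.setdefault(keyword, []).append(request_id)

    results = {request_id: [] for request_id in requests}
    for record in records:  # single pass over the search target data
        for keyword, request_ids in ids_by_keyword.items():
            if keyword in record:
                # Allocate the hit to each request that asked for this keyword.
                for request_id in request_ids:
                    results[request_id].append(record)
    return results


# Hypothetical example: requests 1 and 3 overlap completely.
requests = {1: "apple", 2: "pear", 3: "apple", 4: "plum"}
records = ["apple pie", "pear tart", "plum jam", "apple and pear"]
results = collective_search(requests, records)
```

In this sketch the overlapping requests 1 and 3 cost one keyword test per record instead of two, which is the kind of saving the collective search is intended to provide.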
Furthermore, an index technology is used as a technology for reducing the search time of data. The index technology is a technology which allows for directly selecting (searching) a record having a predetermined value in a column of a table in a relational database, and for directly searching data indicated by a specific path from data that has a predetermined structure, such as an XML document. FIG. 14 is a schematic diagram illustrating an example of a data search using the index technology.
As illustrated in FIG. 14, the search request 1 searches for data d2 and d4 (d2′ and d4′) indicated by an index path 2 and an index path 4, whereas the search request 2 searches for data d3 and d4 (d3′ and d4′) indicated by an index path 3 and the index path 4. Accordingly, the search request 1 and the search request 2 search a minimum required number of data without searching all of the data, thus reducing the data search time.
However, data searched according to one search request may also be searched according to another search request. In other words, searched data may overlap. In the example illustrated in FIG. 14, the data d4 (d4′) indicated by the index path 4 is searched by both the search request 1 and the search request 2, as indicated by D1 (D1′). Because the overlapped data d4 (d4′) is searched twice, the data search efficiency is reduced due to the redundant search.
To further reduce the data search time, a technology that adapts the index technology for high-traffic technology may be employed. This technology integrates search requests that are simultaneously transmitted, creates a union of sets of index paths of the integrated search requests, and collectively searches data indicated by the index paths belonging to the created union. FIG. 15 is a schematic diagram illustrating a technology that uses the index technology for high-traffic technology.
As illustrated in FIG. 15, the search request 1 searches for data indicated by the index path 2 and the index path 4, whereas the search request 2 searches for data indicated by the index path 3 and the index path 4. In such a case, this technology creates the union of the sets of index paths, namely the index paths 2, 3, and 4, for the search request 1 and the search request 2, and collectively searches for data d2, d3, and d4 (d2′, d3′, and d4′) indicated by the index paths belonging to the created union. Accordingly, even when data searched by one search request is also searched by another search request, this data is searched only once, as indicated by D2 (D2′) in FIG. 15, thus improving the data search efficiency.
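The union-based search of FIG. 15 can be sketched as follows. This is a minimal illustration under stated assumptions: the index is modeled as a dictionary from an index path number to a data item, and the names `search_union`, `index`, and `request_paths` are hypothetical.

```python
def search_union(index, request_paths):
    """Search each index path in the union once, then allocate the results.

    index:         dict mapping an index path to the data it indicates
    request_paths: dict mapping a request id to the index paths it needs
    """
    # Create the union of the sets of index paths of all requests.
    union = set()
    for paths in request_paths.values():
        union.update(paths)

    # Collectively search: each index path in the union is read exactly once.
    fetched = {path: index[path] for path in sorted(union)}

    # Allocate the fetched data back to the individual requests.
    results = {
        request_id: [fetched[path] for path in paths]
        for request_id, paths in request_paths.items()
    }
    return results, len(fetched)


# Index paths 1 to 4 and the two requests of FIG. 15.
index = {1: "d1", 2: "d2", 3: "d3", 4: "d4"}
request_paths = {1: [2, 4], 2: [3, 4]}
results, reads = search_union(index, request_paths)
```

Here three reads serve both requests, whereas searching each request separately would need four, because d4 would be read twice.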
Furthermore, there is a technology for allocating search requests transmitted almost simultaneously, in other words, allocating search expressions, to a plurality of search expression sets in accordance with predicted search speed of each search expression; sequentially performing a search based on the search expression sets in a descending order of the predicted search speed; and thus performing a search collectively based on the search expressions in a corresponding search expression set (see Japanese Laid-open Patent Publication No. 2009-251686, for example).
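The allocation of search expressions to sets can be sketched as follows. This is a rough illustration only, not the method of the cited publication: it assumes that a predicted search speed is already available as a number for each search expression and that the sets have a fixed size, and the names `allocate_to_sets`, `predicted_speed`, and `set_size` are hypothetical.

```python
def allocate_to_sets(predicted_speed, set_size):
    """Group search expressions into sets, fastest predicted speed first.

    predicted_speed: dict mapping a search expression to its predicted speed
    set_size:        number of expressions per set (an assumed grouping rule)
    """
    # Order the expressions by predicted search speed, highest first.
    ordered = sorted(predicted_speed, key=predicted_speed.get, reverse=True)
    # Cut the ordered list into consecutive sets; each set would then be
    # searched collectively, one set after another in this order.
    return [ordered[i:i + set_size] for i in range(0, len(ordered), set_size)]


# Hypothetical expressions and predicted speeds.
predicted_speed = {"expr_a": 5.0, "expr_b": 1.0, "expr_c": 9.0, "expr_d": 3.0}
expression_sets = allocate_to_sets(predicted_speed, set_size=2)
```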
However, when the index technology is adapted for high-traffic technology, there is a problem in that the data search efficiency is not always improved. Specifically, if the volume of overlapped data searched by simultaneously transmitted search requests is small, the effect of searching for the overlapped data only once is trivial, and thus the data search efficiency may not be improved.
In the following, the problem of lack of improvement in the data search efficiency will be described more specifically with reference to FIG. 16. FIG. 16 is a schematic diagram illustrating a problem that arises when the index technology is adapted for high-traffic technology. As illustrated in FIG. 16, the search request 1 searches for data indicated by the index path 2 and the index path 4, whereas the search request 2 searches for data indicated by the index path 1 and the index path 3. In such a case, the technology creates the union of the sets of index paths, namely the index paths 1, 2, 3, and 4, for the search requests 1 and 2, and collectively searches for data d1 to d4 (d1′ to d4′) indicated by the index paths belonging to the created union. However, because there is no overlapped data searched by both the search request 1 and the search request 2, the effect of searching for overlapped data only once is not obtained. Accordingly, the data search efficiency is not improved.
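The difference between the cases of FIG. 15 and FIG. 16 can be made concrete by counting how many index path reads the union saves. The following is a minimal sketch under the assumption that the search cost is proportional to the number of index paths read; the name `reads_saved_by_union` is hypothetical.

```python
def reads_saved_by_union(request_paths):
    """Return how many index path reads the union avoids.

    request_paths: dict mapping a request id to the index paths it needs
    """
    # Reads needed when every request is searched on its own.
    individual = sum(len(set(paths)) for paths in request_paths.values())
    # Reads needed when the union of all index paths is searched once.
    union = set().union(*request_paths.values())
    return individual - len(union)


fig15_paths = {1: [2, 4], 2: [3, 4]}  # index path 4 is shared
fig16_paths = {1: [2, 4], 2: [1, 3]}  # no shared index path

saved_fig15 = reads_saved_by_union(fig15_paths)
saved_fig16 = reads_saved_by_union(fig16_paths)
```

Under this assumption, one read is saved in the case of FIG. 15 and none in the case of FIG. 16, while the cost of integrating the requests is incurred in both cases.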
Furthermore, even if a technology is used that sequentially performs searches based on search expression sets in descending order of predicted search speed, the data search efficiency is not improved when the data searched by the search expressions contained in the search expression sets do not overlap.
The problem described above occurs not only when searching for data stored in database systems, but also when searching for data using a search engine on the Internet or the like.