In conventional full text retrieval, an index is made for character strings to execute high-speed retrieval; however, no index is made for numerical values. Further, although comparison for matching numerical character strings is performed, comparison for matching numerical values is not. For example, different numerical expressions such as a string including two-byte numerals “6850” and a string including one-byte numerals “¥6,850” are determined to be “not matching” when compared as numerical character strings notwithstanding the fact that both express an identical numerical value.
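The difference between matching numerical character strings and matching numerical values can be illustrated with a minimal Python sketch. The function name `parse_amount` is hypothetical, and the normalization rules (Unicode NFKC folding followed by stripping currency symbols and digit separators) are assumptions made for illustration, not a method taken from the conventional techniques described here.

```python
import re
import unicodedata
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Hypothetical helper: reduce a numerical expression to its value."""
    # NFKC folds full-width (two-byte) digits and punctuation to their
    # one-byte equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # Assumption: everything except digits, the decimal point and the
    # minus sign (currency symbols, commas) is decoration.
    digits = re.sub(r"[^0-9.\-]", "", folded)
    return Decimal(digits)


two_byte = "\uff16\uff18\uff15\uff10"  # "6850" written with full-width numerals
one_byte = "¥6,850"

# Comparison as numerical character strings: not matching.
print(two_byte == one_byte)
# Comparison as numerical values: matching.
print(parse_amount(two_byte) == parse_amount(one_byte))
```

Comparing the raw strings fails because the code points differ, whereas comparing the parsed values succeeds because both expressions reduce to the same number.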
The Electronic Ledger Storage Law stipulates that a forms system must have a retrieval function suitable for designation of a numerical range, such as that disclosed, for example, in Japanese Laid-Open Patent Application Publication No. H3-19081. Electronic ledgers are conventionally stored in a compressed/encoded form for data volume reduction and security.
Thus, in numerical value retrieval with respect to stored electronic ledgers, the file of each ledger is decompressed and decoded, and each numerical value of the data is then compared in size against the retrieval condition. This process requires enormous hardware resources and extensive processing time. Relevant full text retrieval methods are classified into the following three types, none of which improves the speed of numerical value retrieval.
Advanced-index, full-text retrieval: In full text retrieval with respect to the Internet, an advanced index is used to increase retrieval speed. However, numerical value retrieval requires an index of an even greater size, and is, therefore, not improved in terms of retrieval speed with this retrieval method.
Simplified-index, full-text retrieval: This retrieval method increases the speed of full text retrieval using a simplified index, such as a character composition table for kanji; however, the speed of numerical value retrieval does not increase.
Indexless full-text retrieval: This method involves performing character string checks with respect to all data using a high-speed retrieval engine, but does not increase the speed of numerical value retrieval.
With the conventional techniques above, however, numerical value matching retrieval with respect to noncompressed/nonencoded numerical values cannot be made faster and is therefore extremely time-consuming. In addition, conventional numerical value matching retrieval does not determine different numerical expressions such as “6850” and “¥6,850” to be “matching”. A need therefore exists for faster matching retrieval using a simplified index and for an improved method of determining “matching”.
When numerical value matching retrieval is performed with respect to compressed/encoded numerical values according to conventional matching retrieval, electronic ledger data, etc., which have been compressed and encoded for storage and security, are subject to size comparison only after decompression and decoding, which requires a large amount of processing time. Therefore, a problem arises in that improving retrieval speed involves revision of the decompression and decoding processes.
When numerical value range retrieval is performed with respect to noncompressed/nonencoded numerical values according to conventional range retrieval, a problem arises in that faster retrieval involves using a simplified index, similar to the above case of “numerical value matching retrieval”.
When numerical value range retrieval is performed with respect to compressed/encoded numerical values according to conventional range retrieval, a problem arises in that improving retrieval speed involves revision of the decompression and decoding processes, similar to the above case of “numerical value matching retrieval”.
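The cost of the conventional approach to compressed data can be seen in a minimal Python sketch of range retrieval by full scan. The function name `range_scan`, the use of `zlib` as a stand-in for the unspecified compression/encoding scheme, and the one-value-per-line ledger layout are all assumptions made for illustration.

```python
import zlib


def range_scan(compressed_ledgers, low, high):
    """Hypothetical full scan: decompress every ledger, then compare sizes."""
    hits = []
    for ledger_id, blob in compressed_ledgers.items():
        # Every file is decompressed and decoded in full, whether or not
        # it contains a value inside the requested range.
        text = zlib.decompress(blob).decode("utf-8")
        for line_no, line in enumerate(text.splitlines()):
            try:
                value = int(line)
            except ValueError:
                continue  # assumption: non-numerical lines are skipped
            if low <= value <= high:
                hits.append((ledger_id, line_no, value))
    return hits


ledgers = {
    "a": zlib.compress(b"100\n6850\nabc\n7000"),
    "b": zlib.compress(b"5000\n9000"),
}
print(range_scan(ledgers, 5000, 7000))
```

The work performed grows with the total volume of stored data rather than with the number of matching values, which is why the decompression and decoding steps dominate the processing time in this kind of scan.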
Because the conventional techniques offer no particular method for it, abstracted retrieval with respect to numerical values concerning clinical test data, etc., relies on numerical value range retrieval adjusted by human judgment and on a combination of various retrievals. Hence, establishing a technique for data abstraction and improving retrieval speed are desirable.