One technology for searching electronic documents is disclosed, for example, in Japanese Patent Laid-Open No. 2006-73035 (Patent Document 1). The electronic document search system disclosed therein comprises: an index storage means that stores an index word, the document frequency and document identifiers of the registered documents including the index word, and the in-document frequency and appearing positions of the index word within each registered document; a document division means that divides a registered document into index words, each of which is a chain of n characters (n is an integer no less than 1); a search word division means that divides a search word into one or more n-character chains covering the search word; a search condition analysis means that, when the search word is divided into two or more index words, generates a search condition tree incorporating a position operator which specifies the distances among the appearing positions of those index words; and a search condition evaluation means that carries out search result synthesizing processing based on the search condition tree and acquires a search result.
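The configuration of Patent Document 1 can be illustrated by the following minimal sketch. It is not the implementation disclosed in that document; the function names (ngrams, build_index, cover, search) and the rule used to choose the covering n-character chains are assumptions made for illustration only. The condition that each chain must appear at a fixed offset from the first chain plays the role of the position operator, and search words shorter than n characters are not handled.

```python
from collections import defaultdict

def ngrams(text, n):
    """Return (position, chain) pairs for every n-character chain in text."""
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]

def build_index(docs, n):
    """Map each index word to {document identifier: [appearing positions]}.

    The document frequency of a word is len(index[word]); its in-document
    frequency is the length of the position list for that document.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, gram in ngrams(text, n):
            index[gram].setdefault(doc_id, []).append(pos)
    return index

def cover(word, n):
    """Choose (offset, chain) pairs that together cover the search word.

    Assumed rule: step through the word n characters at a time and add one
    overlapping chain at the end if the length is not a multiple of n.
    """
    if len(word) < n:
        raise ValueError("search word is shorter than n")
    offsets = list(range(0, len(word) - n + 1, n))
    if offsets[-1] != len(word) - n:
        offsets.append(len(word) - n)
    return [(o, word[o:o + n]) for o in offsets]

def search(index, word, n):
    """Return {document identifier: [start positions of the search word]}."""
    parts = cover(word, n)
    _, first_gram = parts[0]  # the first chain always has offset 0
    hits = {}
    for doc_id, positions in index.get(first_gram, {}).items():
        starts = []
        for start in positions:
            # Position operator: every other chain must appear at exactly
            # its offset from the start of the first chain.
            if all(start + off in index.get(gram, {}).get(doc_id, ())
                   for off, gram in parts[1:]):
                starts.append(start)
        if starts:
            hits[doc_id] = starts
    return hits
```

For example, with n = 2 the search word "text" is divided into the chains "te" and "xt", and a document matches only where "xt" appears exactly two positions after "te".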
In addition, Japanese Patent Laid-Open No. 2008-140357 (Patent Document 2) discloses the following method. When a document identification number is compressed into a byte string by the variable-byte method, w bits within the byte string are used to represent the number of occurrences of the index word within the document, and x bits are used to represent attribute information of the posting. An occurrence count that cannot be represented in w bits is handled by first writing into the byte string a special value indicating that the count does not fit in w bits, and then appending the count itself, encoded by the variable-byte method. Here, x and w are integers given as parameters. In addition, a means is provided by which a compressed posting can be read even from a position in the middle of the inverted list, which makes binary search on the inverted list possible.
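The posting format described above can be sketched as follows. The bit layout is an assumption, since it is not specified here: the x attribute bits and the w count bits are placed in the low-order bits of one integer, the document identification number in the remaining high-order bits, and the all-ones value of the w-bit field is taken as the special value. The byte convention (seven data bits per byte, high bit set on the last byte) and all function names are likewise hypothetical. Reading from the middle of the inverted list is not covered by this sketch.

```python
def vbyte_encode(value):
    """Encode a non-negative integer, 7 bits per byte, low-order bits first.

    Assumed convention: the high bit is set only on the final byte.
    """
    out = bytearray()
    while value >= 128:
        out.append(value & 0x7F)
        value >>= 7
    out.append(value | 0x80)
    return bytes(out)

def vbyte_decode(data, pos):
    """Decode one integer starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            return value, pos

def encode_posting(doc_id, count, attr, w, x):
    """Pack doc_id, a w-bit count field and x attribute bits into bytes."""
    if not 0 <= attr < (1 << x):
        raise ValueError("attribute does not fit in x bits")
    escape = (1 << w) - 1  # special value: the count does not fit in w bits
    count_field = count if count < escape else escape
    packed = (doc_id << (w + x)) | (count_field << x) | attr
    out = vbyte_encode(packed)
    if count_field == escape:
        out += vbyte_encode(count)  # the full count follows the posting
    return out

def decode_posting(data, pos, w, x):
    """Return ((doc_id, count, attr), next position)."""
    escape = (1 << w) - 1
    packed, pos = vbyte_decode(data, pos)
    attr = packed & ((1 << x) - 1)
    count_field = (packed >> x) & escape
    doc_id = packed >> (w + x)
    if count_field == escape:
        count, pos = vbyte_decode(data, pos)
    else:
        count = count_field
    return (doc_id, count, attr), pos
```

Under these assumptions, with w = 2 and x = 1, a posting whose count is 2 occupies a single byte for a small document identification number, whereas a count of 9 costs one additional byte for the appended value.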
In addition, a technology for searching electronic documents using an inverted index is also described in Zobel, Justin and Moffat, Alistair, “Inverted Files for Text Search Engines”, ACM Computing Surveys (New York: Association for Computing Machinery), Vol. 38, No. 2, Article 6, pp. 8-9 and pp. 19-23, July 2006 (Non-patent Document 1).
In addition, an example of a data compression technology for a tree structure is disclosed in National Publication of International Patent Application No. 2003-501749 (Patent Document 3). In that technology, a memory is implemented as a directory structure comprising a tree-shaped hierarchy having nodes on a large number of different hierarchy levels. In this directory structure, pointers are first added to width-compressed nodes, that is, nodes whose table contains a given first number of elements. In order to maximize the performance of the functional tree structure, adding a pointer to an individual width-compressed node is allowed until the number of pointers within the node reaches a prescribed threshold value smaller than the above-mentioned first number. As soon as the number of pointers to be accommodated in the width-compressed node exceeds the threshold value, the width-compressed node is converted into a cluster of nodes formed of a parent node and individual child nodes.
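The threshold-based conversion described above can be sketched as follows. This is only an illustration under stated assumptions, not the structure disclosed in Patent Document 3: the class names, the child_size parameter, and the rule that each child node covers a contiguous range of child_size elements of the original table are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class WidthCompressedNode:
    """Logical table of table_size elements; only occupied ones are stored."""
    table_size: int
    pointers: dict = field(default_factory=dict)  # element index -> pointer

@dataclass
class ClusterNode:
    """Parent node; each child covers child_size consecutive elements."""
    table_size: int
    child_size: int
    children: dict = field(default_factory=dict)  # child number -> {slot: pointer}

def add_pointer(node, index, pointer, threshold, child_size):
    """Add a pointer and return the node that now represents the table."""
    if not 0 <= index < node.table_size:
        raise IndexError("index outside the node's table")
    if isinstance(node, ClusterNode):
        child = node.children.setdefault(index // node.child_size, {})
        child[index % node.child_size] = pointer
        return node
    # Still width-compressed: accept the pointer while the threshold holds.
    if index in node.pointers or len(node.pointers) < threshold:
        node.pointers[index] = pointer
        return node
    # Threshold exceeded: convert into a parent node with child nodes.
    cluster = ClusterNode(node.table_size, child_size)
    for i, p in list(node.pointers.items()) + [(index, pointer)]:
        add_pointer(cluster, i, p, threshold, child_size)
    return cluster

def lookup(node, index):
    """Return the pointer stored for index, or None if there is none."""
    if isinstance(node, ClusterNode):
        child = node.children.get(index // node.child_size, {})
        return child.get(index % node.child_size)
    return node.pointers.get(index)
```

With a table of 16 elements and a threshold of 3, for example, the node remains width-compressed for the first three pointers and is converted into a cluster when a fourth pointer is added.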