Latent word meaning generally refers to the implied meaning of a word or phrase, which can usually be expressed by one or more other words or phrases. For example, the latent word meaning of “fridge” is generally “refrigerator,” and the latent word meaning of “cotton slippers” is generally “all-cotton slippers.”
Many studies have addressed the automatic discovery of latent semantics. Most attempt to find near-synonyms using the co-occurrence and link relations of words. Some existing techniques use synonyms to determine the relationship between words. The number of vocabulary entries that can be obtained from manually labeled corpora, however, is limited, and it can be difficult to guarantee the quality of automatically discovered synonyms.
The indexing modes of a search engine typically include separate word indexing, word partitioning indexing, and hybrid indexing. With separate word indexing, the distance between the separate words in a file typically needs to be calculated, so efficiency is often low and accuracy often poor. The problem is particularly pronounced in languages without natural word separators (e.g., spaces) between words, such as Chinese. For example, “pesticides,” “Shen Nong pharmaceuticals,” and “Shen Nong pesticides factory” cannot be readily distinguished using separate word indexing. In contrast, word partitioning indexing is faster and more accurate, but its recall rate is often low. For example, a search for “fridge” returns only results containing “fridge” and misses results containing “refrigerator.” Hybrid indexing combines the two techniques: it usually queries first according to word partitioning indexing and then according to separate word indexing. For example, when “glass bottle” is queried, the results for “glass bottle” are found according to word partitioning indexing, and other results are then found according to separate word indexing. This compensates for some of the disadvantages of the two methods described above. However, for the results found according to separate word indexing, the search engine cannot distinguish between “glass bottles” and “causing bottle neck,” so accuracy is still affected. More effective search techniques are therefore needed.
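The three indexing modes described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a description of any particular search engine: the toy documents are unspaced Latin strings standing in for text without word separators, the vocabulary `VOCAB` is hypothetical, and forward maximum matching is used as one simple way to partition text into words. The word-level search also ignores word order, which a real engine would likely take into account.

```python
from collections import defaultdict

# Hypothetical toy data: unspaced strings stand in for text without word separators.
DOCS = {1: "glassbottle", 2: "glassbottleneck", 3: "plasticbottle"}
VOCAB = {"glass", "plastic", "bottle", "bottleneck"}


def build_char_index(docs):
    """Separate word (single-character) index: char -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch][doc_id].append(pos)
    return index


def char_search(index, query):
    """Return ids of documents in which the query characters occur consecutively."""
    if not query:
        return set()
    hits = set()
    for doc_id, starts in index.get(query[0], {}).items():
        for start in starts:
            # Distance check: the k-th query character must sit at start + k.
            if all(start + k in index.get(ch, {}).get(doc_id, ())
                   for k, ch in enumerate(query)):
                hits.add(doc_id)
                break
    return hits


def segment(text, vocab):
    """Forward maximum matching: take the longest vocabulary word at each position."""
    max_len = max(map(len, vocab), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def build_word_index(docs, vocab):
    """Word partitioning index: word -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in segment(text, vocab):
            index[word].add(doc_id)
    return index


def word_search(index, query, vocab):
    """Return ids of documents containing every word of the segmented query."""
    words = segment(query, vocab)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def hybrid_search(word_index, char_index, query, vocab):
    """Word-level hits first, then any extra character-level hits."""
    primary = sorted(word_search(word_index, query, vocab))
    fallback = sorted(char_search(char_index, query) - set(primary))
    return primary + fallback


char_index = build_char_index(DOCS)
word_index = build_word_index(DOCS, VOCAB)

word_hits = word_search(word_index, "glassbottle", VOCAB)
char_hits = char_search(char_index, "glassbottle")
hybrid_hits = hybrid_search(word_index, char_index, "glassbottle", VOCAB)
```

In this toy setup, the word-level search for `glassbottle` matches only document 1, because document 2 is partitioned into `glass` and `bottleneck`. The character-level search also matches document 2, since the characters of the query appear there consecutively. The hybrid search therefore returns document 1 followed by document 2, which mirrors the accuracy problem described above: the second-stage results cannot separate a true match from an accidental character overlap.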