This invention pertains to technology used for data search, particularly data search over the Internet.
In many practical applications, such as document storage, search, comparative analysis, and reconstruction, it is extremely important to have a solution that allows a user to compare and rank the different keywords associated with documents.
Unfortunately, no such solution exists today, and there are several reasons for this.
First, the number of keywords used in any language is very large. For example, there are over 200,000 general-purpose words and over 500,000 special words and abbreviations in the English language alone. The number of keywords that combine two, three, or four English words (called terms) is larger by roughly 5, 10, and 15 orders of magnitude, respectively.
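The orders of magnitude quoted above can be checked with a short calculation. The following is a minimal sketch, assuming that a term is an ordered sequence of words drawn, with repetition allowed, from the 200,000 general-purpose words only; the names VOCAB and extra_orders_of_magnitude are illustrative and do not come from the invention.

```python
import math

# Assumption: only the ~200,000 general-purpose English words are counted.
VOCAB = 200_000


def extra_orders_of_magnitude(n_words: int, vocab: int = VOCAB) -> float:
    """Return log10(vocab**n_words / vocab).

    This is how many orders of magnitude the number of n-word terms
    exceeds the number of single-word keywords, under the assumption
    that every ordered combination of words counts as a term.
    """
    return (n_words - 1) * math.log10(vocab)


for n in (2, 3, 4):
    # Print the term length, the raw count of terms, and the excess
    # in orders of magnitude over the single-word vocabulary.
    print(n, VOCAB ** n, round(extra_orders_of_magnitude(n), 1))
```

Under these assumptions the excess works out to about 5.3, 10.6, and 15.9 orders of magnitude for two-, three-, and four-word terms, which is consistent with the rounded figures of 5, 10, and 15. Including the 500,000 special words and abbreviations would make the counts larger still.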
Second, many parameters characterize keywords from different points of view, and some of them conflict with others. For example, a search engine operator can characterize keywords by the number of available matches, the number of assigned advertisements, or the click-through rate (CTR). An Internet user can characterize keywords by language, complexity, length, or popularity. An advertiser can characterize keywords by bidding prices, generated traffic, cost per thousand impressions (CPM), or conversion rate.
Third, there is no theoretical model that effectively aggregates these multiple viewpoints into a unified working system that addresses the practical problem of comparing and ranking keywords and terms.
The proposed invention defines a method and apparatus for computing keyword masses using the keyword mass computation technology described herein.