In recent years there has been a massive movement towards computerizing medical data for various health service organizations. However, requiring doctors to write their examination documents and diagnoses using specific codes and fixed sentences for the prognosis of each patient will inevitably lower their productivity. Thus, most modern systems designed for computerizing medical data rely on natural language processing (NLP): doctors write down their prognosis the way they are used to, and computer analysis extracts vital information, such as information about the patient, illnesses, treatments, etc.
Naturally, this process presents many problems. One of them is the need to analyze and normalize sentences. For example, the prognosis “there is no sign of a hernia” can be written in many forms in natural language, such as “hernia has been ruled out” or “no apparent sign of a hernia”. These variations appear in different documents, yet they all express the same concept.
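As a rough illustration of the normalization problem described above, the following minimal sketch maps the three example phrasings to one normalized concept. The pattern list, the `normalize` function, and the `(finding, "absent")` output form are hypothetical names chosen for illustration; they are not taken from any particular system, and a practical normalizer would need far broader coverage than two hand-written patterns.

```python
import re

# Hypothetical surface patterns; each one captures the finding that is negated.
NEGATION_PATTERNS = [
    re.compile(r"\bno (?:apparent )?sign of (?:a |an )?(?P<finding>\w+)"),
    re.compile(r"\b(?P<finding>\w+) has been ruled out"),
]


def normalize(sentence):
    """Return (finding, "absent") if a known negation pattern matches, else None."""
    text = sentence.lower()
    for pattern in NEGATION_PATTERNS:
        match = pattern.search(text)
        if match:
            return (match.group("finding"), "absent")
    return None


variants = [
    "There is no sign of a hernia",
    "Hernia has been ruled out",
    "No apparent sign of a hernia",
]
# All three variants collapse to a single normalized form.
normalized = {normalize(v) for v in variants}
```

Under these assumptions the three differently worded sentences yield the same normalized value, which is the property a retrieval system needs in order to treat them as one concept.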
Most algorithms, such as the ones described in the public Stanford NLP pages and in many patents, refer to web searches. In these cases users fail to choose effective query terms. Often, documents that satisfy the user's information need use different words than the query terms. We are interested in professional information retrieval systems intended for use by a professional community, such as a health data retrieval system. In this case the query is expressed with the exact terms, but the meaning of the query depends on the whole phrase. In many cases the query defines allowed distances between words but does not require that the words in the phrase be in the same sentence. Thus wrong results can be retrieved.
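The wrong-result case described above can be sketched as follows. This is an illustrative toy, not the query engine of any real retrieval system: `proximity_match`, its parameters, and the sample note are all assumptions, and sentence boundaries are detected naively from `.`, `!`, and `?` (abbreviations would need additional handling).

```python
import re

SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split text into lowercase words and sentence-ending punctuation marks."""
    return re.findall(r"[a-z]+|[.!?]", text.lower())


def proximity_match(text, first, second, max_distance, same_sentence=False):
    """True if `first` and `second` occur within `max_distance` word positions.

    With same_sentence=True, both words must also lie in the same sentence.
    """
    first_hits, second_hits = [], []
    word_idx = sent_idx = 0
    for tok in tokenize(text):
        if tok in SENTENCE_END:
            sent_idx += 1  # punctuation closes the sentence and is not counted as a word
            continue
        if tok == first:
            first_hits.append((word_idx, sent_idx))
        if tok == second:
            second_hits.append((word_idx, sent_idx))
        word_idx += 1
    for i, s1 in first_hits:
        for j, s2 in second_hits:
            if abs(i - j) <= max_distance and (not same_sentence or s1 == s2):
                return True
    return False


note = "No sign of fracture. Hernia is present."
# Distance-only query: "no" and "hernia" are 4 words apart, so the note matches
# even though the negation belongs to a different sentence.
across_sentences = proximity_match(note, "no", "hernia", 5)
# Adding the same-sentence constraint rejects the misleading match.
within_sentence = proximity_match(note, "no", "hernia", 5, same_sentence=True)
```

In this sketch the distance-only query retrieves a note stating that a hernia is present when the user was looking for its absence, while the sentence-constrained variant does not.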