Fast information retrieval systems and methods, such as web-based search engines, are increasingly important and popular tools in many business areas as well as for private use.
Web-based search engines, such as those provided by Google and other companies, are popular because they deliver search results quickly and are easy for most users to use. Such search engines are optimised in various ways to provide links to documents or web pages (henceforth denoted simply as documents), where the retrieved documents are often sorted or ranked according to whether they contain the keywords specified in the search and according to some measure of the popularity of the retrieved document. The popularity measure(s) or metric(s) may include how often a given document is linked to by other documents or sources and how popular those linking documents themselves are (PageRank), user visit rates, and/or other forms of user recommendation. Such measures favour documents with many in-links (backlinks) or results that are often viewed by users.
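By way of illustration only, the link-based popularity measure mentioned above can be sketched as a simplified PageRank-style power iteration. The toy link graph, the document names, the damping factor of 0.85, and the function name `pagerank` are all assumptions made for this sketch and do not describe how any particular search engine is implemented.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each document to the list of documents it links to.
    Every link target must itself be a key of links.
    """
    docs = list(links)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Every document receives a small baseline share.
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for src, targets in links.items():
            if targets:
                # A document passes its rank on to the documents it links to.
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A document without out-links spreads its rank evenly.
                share = damping * rank[src] / n
                for d in docs:
                    new_rank[d] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: nothing links to the rare-disease document.
toy_links = {
    "common_cold": ["flu"],
    "flu": ["common_cold"],
    "headache": ["common_cold", "flu"],
    "rare_disease": ["common_cold"],
}
scores = pagerank(toy_links)
for doc, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.4f}")
```

In this toy graph the document without any in-links ends up with the minimum possible score, regardless of how relevant its content might be to a given query.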
The focus of such search engines on quickly returning a search result entails some trade-offs, and they are often optimised for search queries containing only a few keywords, typically about 2-3.
This makes such search engines less usable for certain tasks, areas, or domains, not because of the speed with which the search result is delivered but because of the relevance of the retrieved documents in the search result.
Within the area of medically related information, the internet has become a primary source of information about illnesses and/or treatments, with exponential growth in both the volume of information and the number of entries available. This source of information is used by both non-expert and expert users, e.g. private persons and medical professionals.
One example of a medical expert is a clinician, who may use web-based search engines to assist with the iterative cycle in which hypotheses about a given disease are formulated from evidence, followed by the collection of additional discriminating evidence.
One medically related area where current web-based search engines do not perform well is the area of rare or so-called orphan diseases. The exact definition of what constitutes a rare or orphan disease (henceforth denoted simply as a rare disease), e.g. in terms of prevalence threshold and requirement for severity, varies across the globe, but a disease may in general be said to be rare if it affects fewer than about one in two thousand individuals. Currently around seven thousand rare diseases are known, and it is estimated that about 6-8% of the population will be affected by a rare disease during their lifetime. Due to their rarity and large number, ordinary diagnosis of rare diseases is difficult and is often associated with delays of several years and with diagnostic errors.
A study by the European Organisation for Rare Diseases (EURORDIS) showed, for example, that 40% of rare disease patients were wrongly diagnosed before the correct diagnosis was given and that 25% of patients had diagnostic delays ranging between 5 and 30 years.
One reason why current web-based search engines do not perform well or optimally within this particular area is precisely that such diseases are rare, and thus any ranking of the relevance of a document using a popularity-based measure or metric will tend to disregard documents about them. Information on rare diseases is (relatively) very sparse and less hyperlinked than other medical content.
Additionally, efficiency concerns may have led to brute-force index pruning, e.g. by removing low-frequency terms and/or terms that are (relatively) unusually long (e.g. removing the term “hydrochlorofluorocarbons”), which is not beneficial when retrieving relevant documents related to rare diseases.
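The effect of such pruning can be sketched, purely as an illustration, with a small inverted index. The thresholds `min_doc_freq` and `max_term_length`, the function names, and the three toy documents are assumptions made for this sketch; actual pruning strategies used by search engines are not known from the present text.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index mapping each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def prune_index(index, min_doc_freq=2, max_term_length=15):
    """Drop terms that occur in too few documents or that are unusually long.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    return {
        term: postings
        for term, postings in index.items()
        if len(postings) >= min_doc_freq and len(term) <= max_term_length
    }


# Hypothetical toy collection.
docs = {
    "d1": "fever cough fatigue",
    "d2": "fever headache",
    "d3": "fever hydrochlorofluorocarbons cough",
}
index = build_index(docs)
pruned = prune_index(index)
print(sorted(index))   # all terms
print(sorted(pruned))  # only frequent, short terms survive
```

After pruning, only the frequent and short terms remain searchable, so a query using a rare or long term can no longer retrieve the one document that contains it.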
Another reason is, as mentioned, that most current web-based search engines are optimised for very short queries (often about 2-3 terms long), whereas a useful medically related query comprising patient symptoms (for both rare and non-rare diseases) and/or characteristics of a patient usually needs to be much longer to be meaningful. Such queries may easily be as long as 10-20 terms.
Furthermore, such relevant queries often contain symptoms expressed as multi-word units, whereas most current web-based search engines make term independence assumptions in order to increase efficiency. As an example, most current web-based search engines will not distinguish between the two different queries “sleep deficiency, increased sexual appetite” and “sexual deficiency, increased sleep”, and hence return non-relevant search results.
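As a minimal sketch of why a term independence assumption blurs these two queries, the following compares them first as sets of independent terms and then as sets of comma-separated multi-word units. The use of Jaccard similarity and the helper names `terms`, `phrases`, and `jaccard` are assumptions made for illustration only.

```python
import re


def terms(query):
    """Treat the query as a set of independent single terms."""
    return set(re.findall(r"[a-z]+", query.lower()))


def phrases(query):
    """Treat each comma-separated symptom as one multi-word unit."""
    return {" ".join(re.findall(r"[a-z]+", part.lower())) for part in query.split(",")}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


q1 = "sleep deficiency, increased sexual appetite"
q2 = "sexual deficiency, increased sleep"

term_similarity = jaccard(terms(q1), terms(q2))
phrase_similarity = jaccard(phrases(q1), phrases(q2))
print(term_similarity)    # 0.8
print(phrase_similarity)  # 0.0
```

Viewed as independent terms, the two queries share four of their five distinct terms, although they describe opposite symptoms; viewed as multi-word units, they have nothing in common.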
Furthermore, some symptoms listed in a query may not apply to the correct disease, and/or some pertinent symptoms of the correct disease may be missing from the query because they are masked under different conditions. However, many or most current web-based search engines are designed to maximise the match between all the query terms and the returned documents.
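The consequence of requiring every query term to match can be sketched as follows. The symptom lists are invented for illustration, and the fractional overlap score is merely one simple alternative shown for contrast; it is an assumption of this sketch and not a method taken from the present text.

```python
def matches_all(query_terms, doc_terms):
    """Strict conjunctive matching: every query term must occur in the document."""
    return set(query_terms) <= set(doc_terms)


def overlap_score(query_terms, doc_terms):
    """Fraction of the query terms that occur in the document (tolerates mismatches)."""
    query = set(query_terms)
    return len(query & set(doc_terms)) / len(query)


# Hypothetical query in which "hair loss" does not apply to the correct disease.
query = ["fever", "rash", "joint pain", "hair loss"]
# Hypothetical description of the correct disease.
document = ["fever", "rash", "joint pain", "fatigue"]

strict = matches_all(query, document)
partial = overlap_score(query, document)
print(strict)   # False
print(partial)  # 0.75
```

Under strict matching, a single non-applicable symptom is enough to exclude the document describing the correct disease, whereas a score that tolerates partial matches would still rank it highly.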
Specific medical decision support or expert systems have also existed for quite a long time, and a number of success stories exist. However, they require user training, and keeping their information up to date and expanding it is relatively costly and requires the use of experts, which has hindered widespread and sustained use. It may be virtually impossible to keep such a system up to date, especially in fields like medicine, where the amount of information found in textbooks, case studies, research articles, etc. doubles approximately every 5 years.