Today, people search the internet for information on nearly every topic, largely because information can be disseminated there at an unparalleled rate and accessed easily. Unfortunately, not all information published on the internet is correct, and no ratings or standards are applied to every website to alert people that its information may be incorrect. As a result, many people rely on incorrect information. Certain groups of people may be especially susceptible to doing so. For example, it has been reported that cancer patients are susceptible to relying on unproven cancer treatments posted on the internet.
Unproven cancer treatments are known as quackery, and the quacks who promote them are defined as untrained people who pretend to be physicians and dispense medical advice and treatment. The internet allows quacks to advocate inaccurate and unproven cancer treatments. These unproven treatments have had adverse outcomes, and in some extreme cases they have proven fatal. Studies have reported that a large majority of cancer patients have used at least one unproven treatment, and some cancer patients have gone as far as purchasing unconventional medical therapies over the internet.
Several manual methods are used to alert cancer patients to unproven cancer treatments on the internet. The Health-on-the-Net Foundation advocates self-regulation of health-related web pages. The foundation applies strict criteria to web pages and grants them a seal of approval if they pass. In another approach, experts produce rating tools that allow consumers to rate web pages so that future users have a basis for evaluating a web page. A third method is the manual review of individual web pages, with the reviews published either in print or electronically.
Unfortunately, all of these methods have drawbacks: self-regulation relies on knowledge of the certification and on a vigilant public to report failing web pages; rating tools depend on a knowledgeable public to apply them, are difficult to validate, are time-consuming to produce, and do not always produce consistent ratings; and manual review is limited by reviewer time and by the selection of web pages to review.
To address these issues, automated approaches have been developed to identify high-quality web pages. One approach is to evaluate web page content by combining, for each page, scores that measure proxies for quality. Although this algorithm can discriminate between desirable and undesirable web pages, it does not measure content quality directly, and some of the criteria it uses may not correlate with true content quality. In addition, Google has become a de facto standard for identifying and ranking web pages. Researchers have explored the use of the PageRank score in assessing the quality of health web pages and found that it is not inherently useful for discriminating, or helping users to avoid, inaccurate or poor information.
What is needed is an automated approach for evaluating web page content that measures content quality directly and identifies both high-quality and low-quality pages.