Review websites often feature user-generated opinions. Such review websites may permit a user to provide reviews for any type of service provider, including, for example, a restaurant, bar, hotel, transportation company (e.g., airline, train, livery service), shopping venue, spa and beauty service provider, health care provider or institution, house repair/maintenance service provider, automotive service provider, pet-related service provider, professional or financial services provider, religious organization, or other provider. Review websites also may be directed to certain types of products, such as consumer products or industrial products, or to a business as a whole.
Review websites are generally open for any user to submit a review, and accordingly, are subject to abuse. Users sometimes generate inappropriate or fraudulent reviews, which are termed “opinion spam” for purposes of this application. Some users systematically generate opinion spam on one or more review websites for monetary gain.
Opinion spam can range from annoying self-promotion of an unrelated website or blog to deliberate review fraud. An example of deliberate review fraud may occur when a company hires people to write positive reviews for an otherwise poorly reviewed product. Although opinion spam may be a falsely positive review (as in this example), it also may be a falsely negative review, for instance one intended to damage the reputation of a competitor.
Conventional methods for detecting and preventing spam are directed to email spam and Web spam. However, because email spam and Web spam differ from opinion spam in many respects, such as delivery method and ease of recognition, known methods are not easily adapted to detecting opinion spam.
Apparatuses and procedures for detecting and preventing certain types of opinion spam have been developed. Specifically, such procedures are directed to detecting what is termed “disruptive opinion spam”, that is, non-review information entered into review fields. Examples of disruptive opinion spam include advertisements, questions, and other irrelevant or non-opinion text entered into review fields on review websites. Typically, a user can easily identify disruptive opinion spam because, for example, it does not actually review a product or service. Accordingly, while disruptive opinion spam may be a nuisance, it poses minimal risk to users, who can readily identify and ignore it.
Other known procedures for detecting opinion spam are configured to identify duplicate opinions. Duplicate or near-duplicate opinions are opinions that appear more than once in the corpus with the same or similar text. Such procedures are trained to assess the review text, reviewer identification, and product in order to distinguish duplicate opinions (considered opinion spam) from non-duplicate opinions (considered truthful). Although duplicate opinions are likely to be deceptive, such procedures do not identify deceptive opinions that are not duplicates.
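The text-similarity step on which duplicate detection relies can be illustrated with a minimal sketch. It assumes word 3-gram "shingles" and Jaccard similarity with a hypothetical threshold of 0.7. It compares review text only, whereas the procedures described above also use reviewer identification and product, and it is not the trained method of any particular prior system. The function names `shingles`, `jaccard`, and `near_duplicate_pairs` are illustrative.

```python
import itertools
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word k-grams) in text.

    Assumption: case and punctuation are ignored so that trivially edited
    copies of a review produce the same shingles.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short reviews are represented by their whole word sequence.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_pairs(
    reviews: list[str], threshold: float = 0.7, k: int = 3
) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every review pair at or above threshold.

    The threshold of 0.7 and shingle size k=3 are hypothetical defaults,
    not values taken from any published procedure.
    """
    sigs = [shingles(r, k) for r in reviews]
    pairs = []
    # Exhaustive pairwise comparison over all unordered pairs of reviews.
    for i, j in itertools.combinations(range(len(reviews)), 2):
        sim = jaccard(sigs[i], sigs[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


if __name__ == "__main__":
    reviews = [
        "Great hotel, friendly staff and a clean room.",
        "Great hotel, friendly staff and a clean bathroom.",
        "The airport shuttle was late and the lobby was noisy.",
    ]
    for i, j, sim in near_duplicate_pairs(reviews):
        print(f"reviews {i} and {j} look near-duplicate (Jaccard {sim:.2f})")
```

Here the first two reviews share 5 of 7 distinct shingles (similarity about 0.71) and are flagged, while the third is not. The exhaustive comparison is quadratic in the number of reviews; for a large corpus, an approximate scheme such as MinHash with locality-sensitive hashing would typically replace it. As the passage above notes, no similarity measure of this kind can flag a deceptive review that is written only once.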
Additional known procedures for detecting opinion spam incorporate analysis of the psycholinguistic qualities of deceptive opinions. In studies of this phenomenon, participants are asked to give both their true and untrue views on personal issues (e.g., their stance on the death penalty). However, while these studies compare deception detection based on psycholinguistic qualities against a 50% random-guess baseline, they do not evaluate or compare any other computational or manual approach.
Another known procedure involves manual analysis of a small dataset, e.g., 40 non-deceptive reviews and 42 deceptive reviews. In one approach, the psychologically relevant linguistic differences between the non-deceptive and deceptive reviews are compared manually. In another approach, the dataset is manually reviewed for distortions in popularity rankings. Such manual analysis is time-intensive and relatively subjective.
Lastly, automatic approaches to determining review quality, review helpfulness, and review credibility have been developed. However, most of the measures employed in those determinations are based exclusively on human judgments, which are subjective and poorly calibrated for detecting opinion spam.
There is a need for a system and methods for detecting deceptive opinion spam that assess reviews automatically and time-efficiently and that minimize subjectivity. The present invention satisfies this need.