Deception is the intentional falsification of truth. Deception comes in many shades, ranging from outright lies to “spin.” Candidates for deception include content posted on social networking sites such as Craigslist and Facebook, blogs, emails, witness testimony in a court of law, answers to job interview questions, personal essays on online dating services, and insurance claim forms.
The Internet is evolving into a medium that goes well beyond web searching. Text-based applications such as email, social networks, chat rooms, and blogs are already popular or are gaining popularity, and email is one of the most commonly used communication media today. This widespread use presents opportunities for deceptive or fraudulent activity. Deception is interpreted to be the manipulation of a message to cause a false impression or conclusion, as discussed in Burgoon, et al., “Interpersonal deception: Ill effects of deceit on perceived communication and nonverbal behavior dynamics,” Journal of Nonverbal Behavior, vol. 18, no. 2, pp. 155-184 (1994) (“Burgoon”), which is incorporated by reference herein. Psychology studies show that human beings are poor at detecting deception. Therefore, automatic techniques for detecting deception are important.
There has been a reasonable amount of research on deception in face-to-face communication, as discussed, for example, in Burgoon, Buller, et al., “Interpersonal deception theory,” Communication Theory, vol. 6, no. 3, pp. 203-242 (1996) (“Buller”) and Burgoon, et al., “Detecting deception through linguistic analysis,” ISI, pp. 91-101 (2003) (“Burgoon II”), the disclosures of which are hereby incorporated by reference. There has been very little work on modeling and detecting deception in text, especially in relatively short texts such as emails and other electronic text messages. In face-to-face communication or in voice communication (e.g., cell phone calls), both verbal and nonverbal features (also called cues) can be used to detect deception. The problem is harder in, e.g., email communication, because only the textual information is available to the deception detector. Previous research on deception detection in text uses theories developed for deception detection in face-to-face communication, as discussed in Zhou, et al., “An exploratory study into deception detection in text-based computer-mediated communication,” Proceedings of the 36th Hawaii International Conference on System Sciences, Hawaii, U.S.A. (2003) (“Zhou”); Zhou, “Automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communication,” Group Decision and Negotiation, vol. 13, pp. 81-106 (2004) (“Zhou II”); Zhou, et al., “Language dominance in interpersonal deception in computer-mediated communication,” Computers in Human Behavior, vol. 20, pp. 381-402 (2004) (“Zhou III”); and Zhou, “An empirical investigation of deception behavior in instant messaging,” IEEE Transactions on Professional Communication, vol. 48, no. 2, pp. 147-160 (June 2005) (“Zhou IV”), the disclosures of which are incorporated by reference herein.
Theories applied to deception detection include media richness theory, channel expansion theory, interpersonal deception theory, statement validity analysis, and reality monitoring. Some studies suggest that the cue space may be richer for instant-messaging applications, as discussed in Zhou, et al., “Can online behavior unveil deceivers?—an exploratory investigation of deception in instant messaging,” Proceedings of the 37th Hawaii International Conference on System Sciences, Hawaii, U.S.A. (2004) (“Zhou V”) and Madhusudan, “On a text-processing approach to facilitating autonomous deception detection,” Proceedings of the 36th Hawaii International Conference on System Sciences, Hawaii, U.S.A. (2003) (“Madhusudan”), the disclosures of each of which are incorporated by reference herein.
It has been recognized that the cues that indicate deception (“deception indicators”) may differ between instant-messaging data and email data. In Zhou, et al., “A comparison of classification methods for predicting deception in computer-mediated communication,” Journal of Management Information Systems, vol. 20, no. 4, pp. 139-165 (2004) (“Zhou VI”), which is incorporated by reference herein, the authors apply discriminant analysis, logistic regression, decision trees, and neural networks to deception detection, and a neural-network-based classifier is observed to achieve the most consistent and robust performance. In Zhou, et al., “Modeling and handling uncertainty in deception detection,” Proceedings of the 38th Hawaii International Conference on System Sciences, Hawaii, U.S.A. (2005) (“Zhou VII”), which is incorporated by reference herein, the authors model the uncertainty inherent in deception detection and propose a neuro-fuzzy method that is shown to outperform the previous cue-based classifiers. See also Zhou, et al., “A statistical language modeling approach to online deception detection,” IEEE Transactions on Knowledge and Data Engineering (2008) (“Zhou VIII”), which is incorporated by reference herein. Analyzing emails for deception is a harder problem because emails are typically short, which leaves relatively few words from which to extract cues. What is needed is a method of detecting deception in emails.