User authorization methods based upon biometric data, for instance using users' physiological characteristics (such as fingerprints) or behavioral characteristics (such as keystroke patterns), have been employed as security measures. A biometric-pattern-based user authentication system can be employed for both (1) verification and (2) identification. In a user verification system, the user makes a claim of identity (such as by entering a login ID) and the system performs a one-to-one search to verify the claim. In a user identification system, the system performs a one-to-many search to identify a user from a set of users. A user verification system typically decides whether the claimed user is authentic using one or more user-defined thresholds. Once verified, the identified user may then be checked for status as an authorized user. As used herein, “user identification” includes verification, identification, and authorization functions.
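The difference between one-to-one verification and one-to-many identification can be sketched as follows. This is a minimal illustration only, assuming each enrolled user's profile is a vector of mean keystroke timing features compared by Euclidean distance. The profile format, the distance measure, the `threshold` value, and the function names `distance`, `verify`, and `identify` are hypothetical and are not intended to represent the methods of the present disclosure.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical profile format: each enrolled user ID maps to a vector of
# mean keystroke timing features (for example, latencies in milliseconds).
Profiles = Dict[str, Sequence[float]]


def distance(sample: Sequence[float], profile: Sequence[float]) -> float:
    """Euclidean distance between a typing sample and a stored profile (an assumed measure)."""
    if len(sample) != len(profile):
        raise ValueError("sample and profile must have the same number of features")
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(sample, profile)))


def verify(sample: Sequence[float], claimed_id: str, profiles: Profiles, threshold: float) -> bool:
    """One-to-one check: accept the claim only if the sample is close enough to the claimed user's profile."""
    profile = profiles.get(claimed_id)
    if profile is None:
        return False  # unknown login ID: reject the claim
    return distance(sample, profile) <= threshold


def identify(sample: Sequence[float], profiles: Profiles) -> Optional[str]:
    """One-to-many search: return the enrolled user whose profile is nearest to the sample."""
    if not profiles:
        return None
    return min(profiles, key=lambda uid: distance(sample, profiles[uid]))


if __name__ == "__main__":
    profiles = {"alice": [120.0, 95.0, 140.0], "bob": [180.0, 150.0, 210.0]}
    sample = [125.0, 90.0, 150.0]
    print(verify(sample, "alice", profiles, threshold=20.0))  # True
    print(verify(sample, "bob", profiles, threshold=20.0))    # False
    print(identify(sample, profiles))                         # alice
```

Verification needs a threshold because it must make an accept/reject decision about a single claim, whereas identification in this sketch simply ranks all enrolled users; a practical identification system might also apply a threshold to reject samples that match no one well.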
Prior-art identification using keystroke patterns relies on a fixed text string as the means of identification, not on a string of arbitrary symbols (i.e., “free text”). Keystroke patterns provide a cost-effective solution to the user authentication problem, because a keystroke-pattern-based user authentication system does not require a separate device to extract measurements from a user's typing to form a keystroke pattern. Additionally, such a system can be used for both static user authentication (i.e., authenticating a user before granting him (or her) access to the computer system) and continuous user authentication (i.e., periodically re-authenticating the user's identity after he or she has been granted access to the computer system). Continuous user authentication using keystroke patterns would not be intrusive for users, because the system could run in the background while the user is typing at a keyboard.
User authentication using a physiological biometric identifier is considered to be more successful than user authentication using keystroke patterns. One reason may be that keystroke patterns are behavioral biometric identifiers, and a behavioral identifier such as typing may change between two typing samples because of a change in the psychological (or physiological) condition of the user. In addition, a user's keystroke patterns may show some variation between two consecutively provided typing samples even without any evident change in the user's psychological (or physiological) condition.
Therefore, to minimize the effects of keystroke-pattern variability on the performance of user authentication systems, most previous studies have reported the performance of their proposed methods under the following experimental settings: (1) each user provided more than one typing sample of a fixed text string to create his (or her) typing profile; (2) users provided all the typing samples in one session (consecutive samples) to supply keystroke data for creating their typing profiles; and (3) a typing sample was discarded if the user made any typing error while providing it. Under these settings, the authentication methods created a user's typing profile through structured text analysis, in which the words typed and their arrangement are fixed. However, a typing profile derived from consecutively provided fixed-text samples may not accurately represent how the user types at a keyboard, because a user's keystroke patterns can change as his or her psychological condition changes. We also conclude that these authentication methods are not applicable to the problem of identifying a user from arbitrary text input (i.e., “free” text), or from portions of text randomly taken from a larger test manuscript. Because arbitrary text is not a fixed string of characters that the user always inputs, “impersonation” is more difficult to achieve. An arbitrary text model is therefore more desirable, as it would be more difficult for an impostor to replicate an authorized user's typing profile on arbitrary text. As used herein, “arbitrary” text or free text, in the broad sense, means a typing sample that is not a fixed string of symbols that the user would always type. For instance, a user ID/password is considered “fixed” text, not free text, because the text is constant over a period of time.
Arbitrary or free text means that the test text varies from session to session, and may simply be random text input by the user.
Furthermore, the presence of outliers in the data can adversely affect the performance of a keystroke-pattern-based user authentication system if the outliers are not detected and dealt with effectively. This is because when some observations deviate too much from the other observations (i.e., outliers) and are used to create a user's typing profile, the typing profile may not accurately represent the user's normal typing at a keyboard. Some prior studies have detected outliers in keystroke data, but they did so using standard statistical distribution techniques.
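For context, one widely used standard statistical rule flags any observation lying more than k standard deviations from the sample mean. The sketch below applies that rule to a list of keystroke latencies. It is an illustration of the general kind of technique referred to above, not a reproduction of any particular prior study and not the improved outlier detection sought here; the function name `remove_outliers_zscore` and the cutoff `k = 2.0` are assumptions.

```python
import statistics
from typing import List, Sequence


def remove_outliers_zscore(latencies: Sequence[float], k: float = 2.0) -> List[float]:
    """Drop observations more than k sample standard deviations from the mean.

    A generic z-score rule, shown only as an example of a standard
    statistical distribution technique; k = 2.0 is an assumed cutoff.
    """
    if len(latencies) < 2:
        return list(latencies)  # statistics.stdev needs at least two points
    mean = statistics.mean(latencies)
    sd = statistics.stdev(latencies)
    if sd == 0:
        return list(latencies)  # all values identical: nothing to flag
    return [x for x in latencies if abs(x - mean) <= k * sd]


if __name__ == "__main__":
    # Hypothetical key-to-key latencies in milliseconds; 900 ms might be a pause.
    data = [110.0, 115.0, 120.0, 118.0, 112.0, 116.0, 114.0, 119.0, 900.0]
    print(remove_outliers_zscore(data))  # the 900.0 observation is dropped
```

A known weakness of such rules is that a large outlier inflates the very mean and standard deviation used to detect it, so with few observations an extreme value can fall inside the cutoff and go undetected.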
An improved keystroke identification/authorization technique capable of using arbitrary text, and improved outlier detection methods, are needed. The following is based upon a PhD dissertation by Shrijit S. Joshi, supervised by Dr. Vir Phoha, entitled “Naïve Bayes and Similarity Based Methods for Identifying Computer Users Using Keystroke Patterns,” presented at the College of Engineering and Science, Louisiana Tech University, in Ruston, La., which is hereby incorporated by reference.