1. Field of Invention
This invention relates generally to personalizing a user's interaction with content query systems and, more specifically, to inferring which user of a collection of users is the current user of an electronic device based on biometric data about the current user's keystrokes.
2. Description of Related Art
Methods of and systems for performing searches for content items are presented in U.S. Patent Application Publication No. US 2006/0101503, published May 11, 2006, entitled Method and System For Performing Searches For Television Content Items Using Reduced Text Input. As described in that application, a user can enter a reduced text search entry directed at identifying desired content items (e.g., television shows, movies, music, etc.). The system identifies a group of one or more content items having descriptors matching the search entry. The results can be ordered based on a relevance function that can be a domain specific combination of, for example, popularity, temporal relevance, location relevance, personal preferences, and the number of words in the input search string.
Methods of and systems for ranking the relevance of the members of a set of search results are presented in U.S. Patent Application Publication No. US 2007/0005563, published Jan. 4, 2007, entitled Method And System For Incremental Search With Reduced Text Entry Where The Relevance Of Results Is A Dynamically Computed Function Of User Input Search String Character Count. As described in that application, a user can enter a reduced text entry for finding content items across multiple search spaces. Based on the number of characters of the reduced text entry, the results from one search space are selectively boosted over results from a different search space.
Methods of and systems for selecting and presenting content based on user preference information extracted from an aggregate preference signature are presented in U.S. patent application Ser. No. 11/682,695, filed Mar. 6, 2007, entitled Methods and Systems for Selecting and Presenting Content Based on User Preference Information Extracted From an Aggregate Preference Signature. As described in that application, a system discovers and tracks aggregate content preferences of a group of users based on content selections made through a common interface device. The content preferences of the individual users of the group are inferred from the aggregate content preferences using techniques described in that application. Once the individual content preferences are determined, the method enables a system to infer which user of the group is manipulating the common interface device at a later time by observing the content selections of the current user and finding the closest matching set of individual content item preferences. Upon inferring which user of the group is using the device, the system can then employ the content item preferences to enhance the user experience by promoting the relevance of content items that match the inferred user's preferences.
Several different methods of identifying users of a device on the basis of typing speed have been proposed. In short, prior art statistical classification systems apply first-order techniques and Gaussian likelihood estimations, followed by one or more geometric techniques. Also, non-statistical, combinatorial techniques are employed in validating the test sample. Clarke et al. present a non-statistical method in which typing speeds are collected on a fixed set of keys, and a neural network is used to classify the vector of typing speeds (N. L. Clarke, S. M. Furnell, B. M. Lines, and P. L. Reynolds, Keystroke Dynamics on a Mobile Handset: A Feasibility Study, Information Management & Computer Security, Vol. 11, No. 4, 2003, pp. 161-166, incorporated by reference herein). This is an example of a fixed text system, because it requires the user to enter a fixed set of keys during system initialization (D. Gunetti and C. Picardi, Keystroke Analysis of Free Text, ACM Transactions on Information and System Security, Vol. 8, No. 3, August 2005, pp. 312-347, incorporated by reference herein). Fixed text approaches are of limited utility, however, because it is often inconvenient for a user to enter a lengthy sequence of keys during this initialization phase. Most of the other work based on non-statistical techniques is described by Gunetti and Picardi, who conducted a comprehensive study on keystroke biometrics of free text on full-function keyboards.
Statistical techniques used for classification are presented by Joyce and Gupta and by Monrose and Rubin (R. Joyce and G. Gupta, Identity Authentication Based on Keystroke Latencies, Communications of the ACM, Vol. 33, No. 2, February 1990, pp. 168-176; F. Monrose and A. D. Rubin, Keystroke Dynamics as a Biometric for Authentication, Future Generation Computer Systems, Vol. 16, 2000, pp. 351-359, both incorporated by reference herein). Both of these studies use first-order techniques followed by geometric comparisons. Joyce and Gupta compute the mean and standard deviation of the reference sample, and, if the test sample is less than 0.5 standard deviations from the mean, it is considered to be valid. Additionally, this method treats the test sample of latencies as an n-dimensional vector and compares it to the reference vector by computing the L1-norm of the difference vector. If the L1-norm is small enough, the test sample is considered to be valid.
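The two checks attributed above to Joyce and Gupta can be illustrated with a minimal Python sketch. It is not taken from the cited paper: the function names, the use of the sample standard deviation, and the caller-supplied L1 threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev


def within_half_sigma(reference, test_value, k=0.5):
    """Return True if test_value lies less than k standard deviations
    from the mean of the reference latencies.

    Assumption: the sample standard deviation is used; the text above
    does not say whether sample or population deviation is intended.
    """
    mu = mean(reference)
    sigma = stdev(reference)
    return abs(test_value - mu) < k * sigma


def l1_distance(reference_vector, test_vector):
    """L1-norm of the difference between two latency vectors."""
    if len(reference_vector) != len(test_vector):
        raise ValueError("latency vectors must have the same length")
    return sum(abs(r - t) for r, t in zip(reference_vector, test_vector))


def is_valid(reference_vector, test_vector, threshold):
    """Accept the test sample when the L1 distance is small enough.

    The threshold is a hypothetical tuning parameter; the text above
    says only that the norm must be "small enough".
    """
    return l1_distance(reference_vector, test_vector) <= threshold
```

For example, with reference latencies of 100, 120, and 80 milliseconds and test latencies of 105, 118, and 90 milliseconds, the L1 distance is 5 + 2 + 10 = 17, so the sample would be accepted under a threshold of 20 and rejected under a threshold of 10.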
Monrose and Rubin perform similar vector comparisons, but use the Euclidean norm instead of L1-norm. They also compute the mean and standard deviation of the reference sample, which is assumed to be a Gaussian distribution. Given a reference sample with a mean μ and standard deviation σ, the probability of obtaining the observation value X is given by:
$$\mathrm{Prob}[X] = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^{2}}{2\sigma^{2}}} \qquad \text{(Equation 1)}$$
The probabilities are then summed over various features of the sample to obtain a “score” for each reference, and the reference sample achieving the maximum score is selected. In a variation of this approach, the weighted sum of probabilities is used. In both cases, the statistical technique used is (1) fitting a single Gaussian to the reference sample and (2) summing the resulting probabilities.
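The scoring scheme described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the implementation from the cited work: the dictionary layout (feature name mapped to a mean and standard deviation), the function names, and the handling of features missing from a reference are all hypothetical.

```python
import math


def gaussian_density(x, mu, sigma):
    """Value of Equation 1 for observation x, given mean mu and
    standard deviation sigma (sigma must be positive)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def score(reference, observation, weights=None):
    """Sum (optionally weighted) Gaussian values over the observed features.

    reference:   dict mapping feature -> (mu, sigma)   [assumed layout]
    observation: dict mapping feature -> observed value
    weights:     optional dict mapping feature -> weight (default 1.0)
    Features absent from the reference are skipped (an assumption).
    """
    total = 0.0
    for feature, x in observation.items():
        if feature not in reference:
            continue
        mu, sigma = reference[feature]
        w = 1.0 if weights is None else weights.get(feature, 1.0)
        total += w * gaussian_density(x, mu, sigma)
    return total


def best_reference(references, observation, weights=None):
    """Return the name of the reference with the maximum score."""
    return max(
        references,
        key=lambda name: score(references[name], observation, weights),
    )
```

Passing a `weights` dictionary gives the weighted-sum variation; omitting it gives the plain sum.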
Monrose and Rubin also propose a more sophisticated Bayesian classifier, in which, using the terminology of Gunetti and Picardi, n-graphs are used to cluster the reference samples into features and a feature vector is formed for both the reference samples and the test sample. A likelihood probability is then computed based on the presumption that each feature is a Gaussian distribution. This technique is similar to the prior technique in the sense that it relies upon first-order statistical characteristics of the data.
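The general idea of clustering latencies by n-graph and computing a Gaussian likelihood per feature can be sketched as below. This is only an illustrative sketch and should not be read as the classifier from the cited paper. It assumes digraphs as the n-graphs, statistically independent features, equal prior probabilities for all users, and log-likelihoods to avoid numerical underflow; all names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def fit_features(samples):
    """Cluster (digraph, latency) pairs by digraph and fit one Gaussian
    per digraph. Digraphs with fewer than two observations, or with zero
    spread, are dropped because no usable deviation can be estimated."""
    grouped = defaultdict(list)
    for digraph, latency in samples:
        grouped[digraph].append(latency)
    model = {}
    for digraph, latencies in grouped.items():
        if len(latencies) < 2:
            continue
        sigma = stdev(latencies)
        if sigma > 0:
            model[digraph] = (mean(latencies), sigma)
    return model


def log_likelihood(model, test_samples):
    """Sum of per-feature Gaussian log-densities for the test sample.

    Simplification: digraphs missing from the model are skipped, which
    is only a fair comparison when all models cover the same digraphs.
    """
    total = 0.0
    for digraph, latency in test_samples:
        if digraph not in model:
            continue
        mu, sigma = model[digraph]
        total += -math.log(sigma * math.sqrt(2 * math.pi))
        total -= (latency - mu) ** 2 / (2 * sigma ** 2)
    return total


def classify(models, test_samples):
    """Pick the user whose model gives the highest log-likelihood
    (equal priors assumed, so the prior term is omitted)."""
    return max(models, key=lambda user: log_likelihood(models[user], test_samples))
```

Unlike the summed-probability score, this sketch multiplies the per-feature likelihoods (by adding their logarithms), so a single badly matching feature lowers the overall result sharply.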