Employers and advertisers have used personality profiling for decades to target specific individuals for specific job functions, products, or services. Recently, there has been an increasing unease regarding the use of such psychological tools, especially with respect to liability exposure and invasion of privacy considerations. This unease may arise from having third-party companies use personality profiles without the consent and/or knowledge of individuals. A tool is desired that enables individuals to knowingly use their personal significance pattern to search for target information, such as information on jobs, products, and services, thereby reversing the traditional control of such profiling data and alleviating the nonconsensual use of such information.
Search engines, such as Alta Vista, Excite, Webcrawler, and the like, are available on the Internet. Users typically enter one or more keywords on the search engine's Web page, and the search engine returns a list of documents (e.g., as hyperlinks) in which the keywords may be found. (The terms “individuals” and “users” are used interchangeably herein.) Depending on several factors, such as the keywords used, the search engine's algorithms, available user-related data, and the like, the resulting list may contain hundreds or even thousands of documents. A way to refine a search result, i.e., shorten the list returned, based on the personal characteristics and/or archetypes (e.g., “personality”) of a user is highly desirable.
Targeted marketing of individuals on the Internet is also common. Displayed advertisements or offers may also be keyword-linked, such that advertisements indexed or related to certain keywords are displayed only if the user enters at least one of those keywords.
This can be seen, for example, when a user enters a keyword, e.g., “travel,” in a search engine's search box and advertisements related to that keyword, e.g., books on travel, travel agencies, cruises, and the like, are displayed on the resulting Web page. Such a keyword-linked mechanism, however, does not take into account the personality, behavior, or psychology of a user. (A user's personality, behavior, and psychology are herein collectively referred to as “personality.”) A way to take a user's personality into account, so as to make targeted marketing more efficient and effective, is highly desirable.
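The keyword-linked mechanism described above can be illustrated with a minimal sketch. The advertisement inventory, the name `AD_INDEX`, and the function `ads_for_query` are hypothetical and serve only to show that selection depends solely on the keywords entered, not on anything known about the user.

```python
# Hypothetical ad inventory: each advertisement is indexed by keywords.
AD_INDEX = {
    "Guidebooks on sale": {"travel", "books"},
    "Caribbean cruises": {"travel", "cruise"},
    "Discount laptops": {"computer", "laptop"},
}


def ads_for_query(query, ad_index=AD_INDEX):
    """Return the ads that share at least one keyword with the query."""
    terms = set(query.lower().split())
    # An ad is shown only if the query contains one of its keywords;
    # no information about the user's personality is consulted.
    return sorted(ad for ad, keywords in ad_index.items() if terms & keywords)


travel_ads = ads_for_query("travel")
```

Two different users who both enter “travel” receive the same advertisements under this scheme, which is the limitation noted above.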
Targeted marketing conventionally also employs information about the user. Internet service providers (ISPs), for example, monitor users who are logged into their systems. They collect information such as Web sites visited, purchasing patterns, types of advertisements clicked, gender, residential address, types of articles read, and the like. Using such information, a profile based on these prior and explicit declarations of interest is created for each user, such that only advertisements that would likely interest the user are displayed on a Web page. However, such personal profile information is usually obtained without the consent or knowledge of the user and typically does not adequately predict a user's preference when a new situation occurs, such as a search for an item that the user has never requested or explicitly expressed an interest in before. It is often difficult or impractical to obtain specific preference data for an individual relating to all the products, services, and information with which that individual may be usefully matched. Thus, a way to efficiently match users with target information (e.g., via a search engine or targeted marketing) that is not keyword-linked and does not require users to explicitly declare an interest in that information beforehand is desired.
Target information as defined herein includes all information that a user may want to search for, as well as information that a third party may want to present (e.g., audibly) or display to a user. It includes information on products and services, articles, music, logos, advertisements, images, videos, and the like.
Several patents address targeted marketing and searches on the Internet, but none addresses users' control over their significance patterns, which would enable them to use those patterns to search for target information based on their personality. None addresses the creation of user significance patterns by having users take an online psychological test and, based on that test, creating and maintaining classifications and archetypes that would be employed in matching target information to a particular user, whether such matching results from a search or from targeted marketing. None addresses the creation and maintenance of classifications based on characteristics and/or archetypes, typically independent of the content of the target information and abstracted from independent information obtained from a psychological test, or the use of such classifications to match information. U.S. Pat. No. 5,848,396 issued to Gerace teaches a method of targeting an audience based on profiles of users, which are created by recording the computer activity and viewing habits of the users. This method is based on the explicitly declared interests of users. U.S. Pat. No. 5,835,087 issued to Herz et al. teaches a method of automatically selecting target objects, such as articles of interest to a user. The method disclosed in Herz generates sets of search profiles for the users based on attributes such as the relative frequency of occurrence of words in the articles read by the users, and uses these search profiles to identify future articles of interest. This method depends on the use of keywords, which also requires an explicit declaration of interest from the user.
European Patent Application EP-A-0718784 describes a system for retrieving information based on a user-defined profile. A server acting on behalf of the client identifies information on the basis of the user-defined profile to generate a personalized newspaper, which is delivered to the user. This provides for an automatic sorting of the large volume of data available on the World Wide Web to generate a subset of information tailored to the user's specific interests. However, this system is only used for providing newspaper data to a static user whose desires may change periodically.
Traditional marketing methodology often involves making deductions of interest based on crude demographic attributes such as age, education level, gender and household income. However, these methods of ascertaining user interest in a specific product or service are typically very inaccurate and the level of targeting achievable through these demographic methods is typically poor. Moreover, some of these user attributes (such as education, age, and income) are subject to change over time. In the present invention, a method is described where the user's cognitive style is abstracted from a set of specific responses. This is a relatively stable “signature” or significance pattern qualifying an individual's interest in products, services and information (i.e., target information) in a fundamental manner. This significance pattern is not based on demographic attributes.
From the discussion above, it should be apparent that there is a need for an online psychological patterning system that enables users to classify themselves based on characteristics and/or archetypes, and to use such characteristics and/or archetypes to obtain or receive target information better suited to their personality. Such a system would have much wider applicability than currently used systems, because specific declarations of interest through selection of keywords or other similar user input would not be required for each user. Once the user's cognitive style is ascertained, the user's abstracted significance pattern would be applicable to a variety of foreseen and unforeseen situations over time.
What is needed is a system where the psychological significance pattern is under the user's control, where the user is classified under a classification that is created through an online psychological test, where the classification is used to match users with target information, and which contains the above features and addresses the above-described shortcomings in the prior art.
The methodology for the technical solution to these problems, described hereunder, represents a generic set of procedures for rapidly analyzing complex biological data sets and uncovering novel relationships within them. This innovation is relevant to meeting (a) the general need for new tools to investigate complex systems; and (b) the practical need for shortcuts that will generate useful predictions from complex data, even under the computational constraints of “point-of-use” devices.
Multivariate data derived from a variety of sources represent a vector of measures that describe the state or condition of a particular subject. Accessing the descriptive and predictive capabilities inherent in these vectors requires the use of powerful but general analytic techniques. Standard statistical analysis packages that contain this “toolbox” of techniques are commercially available (e.g., SAS™, SPSS™, BMDP™), as is an array of texts describing general multivariate techniques (Johnson, 1998; Sharma, 1996; Tabachnick and Fidell, 1996; Srivastava and Carter, 1983; Romesburg, 1984). However, while supplying the basic tools for formal analysis, none of these resources specifically addresses the issues faced when trying to extrapolate from these kinds of data to probable outcomes in “real-world, real-time” settings.
Significant efforts to understand the complex dynamics captured by these kinds of data are presently underway across an array of scientific disciplines. For example, RNA expression data generated from genome-wide expression patterns in the budding yeast S. cerevisiae were used by Eisen et al. (1998) to understand the life cycle of the yeast. They employed a cluster analysis to identify patterns of genomic expression that appear to correspond with the status of cellular processes within the yeast during diauxic shift, mitosis, and heat shock disruption. The clustering algorithm employed was hierarchical, based on the average linkage distance method. Similarly, Heyer and colleagues (Heyer et al., 1999) developed a new clustering methodology that they refer to as a “jackknifed correlation analysis,” and generated a complete set of pairwise jackknifed correlations between expressed genes, which they then used to assign similarity measures and clusters to the yeast genome.
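As an illustration of hierarchical clustering with average linkage, the following is a minimal sketch and not a reproduction of the cited authors' implementation. It assumes a distance of one minus the Pearson correlation between expression profiles; the profile names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Distance between profiles is 1 - Pearson correlation (an assumption);
    distance between clusters is the mean over all cross-cluster pairs.
    """
    names = list(profiles)
    dist = {frozenset((p, q)): 1.0 - pearson(profiles[p], profiles[q])
            for p, q in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression profiles measured at four time points.
demo = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
groups = average_linkage(demo, 2)
```

In this toy example the rising profiles (`g1`, `g2`) and the falling profiles (`g3`, `g4`) end up in separate clusters. The pairwise search is quadratic in the number of clusters at each step, which is acceptable for a sketch but not for genome-scale data.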
Applying graph theory to this same kind of problem, Ben-Dor et al. (1999) developed another form of clustering algorithm, which they eventually applied to similar data. Tamayo et al. (1999), Costa and Netto (1999), and Toronen et al. (1999) each approached this kind of multivariate problem by developing a series of self-organizing maps (SOMs), a variation on the k-means clustering theme. Tamayo's experience is illustrative. Microarray data for 6416 human genes were generated from four cell lines, each undergoing normal hematopoietic differentiation. After applying a variance filter, 1036 genes were clustered into a 6×4 SOM. These developed into archetypes descriptive of the expression patterns roughly associated with cell line and maturation stage.
Other techniques try to project the problem from the multivariate space into a series of bivariate ones. Walker et al. (1999) developed a “Guilt-by-Association” model that in essence reduces a gene-by-tissue library to a matrix of “present” or “absent” calls in a series of standard 2×2 contingency tables. In their model, under the assumptions of the null hypothesis, the “presence” and “absence” calls across libraries for each fixed pair of genes should follow a chi-square distribution. Using Fisher's Exact test, a p-value testing the assumption of “no association” is then calculated. They decrease their analysis-wide false positive rate by applying the appropriate Bonferroni correction factor to the multiple comparison problem. Applying this technique to a set of 40,000 human genes across 522 cDNA libraries, they were able to identify a number of associations between unidentified genes and those with known links to prostate cancer, inflammation, steroid synthesis, and other physiological processes.
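The combination of a 2×2 contingency table, Fisher's Exact test, and a Bonferroni correction can be sketched as follows. This is an illustrative sketch only: the function names, the gene names, and the counts are hypothetical and are not taken from the cited study. Each table records, for one pair of genes, the number of libraries in which both are present, only the first is present, only the second is present, and both are absent.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def bonferroni_hits(tables, alpha=0.05):
    """Return the gene pairs whose p-value survives a Bonferroni correction."""
    threshold = alpha / len(tables)  # one test per gene pair
    return [pair for pair, cells in tables.items()
            if fisher_exact_two_sided(*cells) <= threshold]


# Hypothetical counts per gene pair:
# (both present, only first present, only second present, both absent)
tables = {
    ("geneA", "geneB"): (8, 1, 1, 10),
    ("geneA", "geneC"): (3, 1, 1, 3),
}
hits = bonferroni_hits(tables)
```

With two tests, the corrected threshold is 0.05 / 2 = 0.025. Only the first pair, whose co-occurrence pattern is strongly non-random, falls below it.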
Greller and Tobin (1999) developed a more general approach to the pattern recognition/discrimination problem. They derived a measure of statistical discrimination by establishing an analysis that transposes the clustering question into an outlier detection problem. Assuming a uniform distribution of interstate expression, and accounting for both a statistical distribution of baseline measures and uncertainty in the observation technology, they derive a decision function that assigns a subject, in their case a gene, to one of three states: selectively upregulated, selectively downregulated, or unchanged. Brown et al. (2000) derived a knowledge-based analysis engine based on a technique known as “support vector machines” (SVMs). These “machines” are actually nonlinear in silico discrimination algorithms that “learn” to discriminate between, and derive archetypes for, data with binary attributes.
Complex biological systems often yield measurements that cannot easily be analyzed by reductionist means. As new technologies expand the rate, scope, and precision with which such measurements are made, there is an accompanying need for new analytical tools with which to understand the underlying biological phenomena. Furthermore, ubiquitous access to modest computational power (in handheld devices, for instance, or on web client-server systems) has made it possible to imagine a range of field applications for such analytical tools, provided they are simpler and easier to use than more formal statistical packages. Protigen, Inc. (Applicant herein) has been testing the use of conventional web server-based architectures (accessible through desktop and wireless handheld devices) for real-time analysis of complex biological data, consistent with the modest computational overhead that can be afforded each simultaneous user in a large web community. The goal is to explore the possibility of applying these tools to areas such as the real-time adjustment of online education to a user's cognitive (learning) style, point-of-care serum diagnostics for osteoporotic women, and the accurate prediction of a protein's solubility in a heterologous system based on its sequence.
Those skilled in the art will further recognize the wide applicability of such methodology to problems in areas ranging from psychology, knowledge management, artificial intelligence, and text-searching to cancer and pharmacogenomics. The following cited example data sets are not intended to limit the scope of the invention:
1. Cognitive test and behavioral preference data from a cohort of 1373 anonymous online users. The fundamental assumption underlying this type of psychometric analysis, a staple of personality psychology over the past fifty years, is that the human mind is a complex biological system whose state attributes can be reliably measured by self-reports. A second assumption is that these state attributes influence human behavior. The results obtained from our preliminary analysis are described in greater detail below.
2. Detailed serum biochemistry and 3-year bone mineral density data from a cohort of 220 osteoporotic women. A point-of-care diagnostic that could deduce the rate of aggregate bone loss from multivariate clues provided by the serum levels of insulin-like growth factors, selected binding proteins, and CICP would be invaluable for identifying post-menopausal women at high risk of developing complications from osteoporosis. An exciting possibility is that the relative levels of these biochemical markers carry information that cannot be derived from any individual level considered alone.
3. Solubility and amino acid sequence data from a set of 180 eukaryotic proteins expressed in E. coli as part of a genomics program. The effects of amino acid composition on heterologous protein solubility have been investigated by a number of groups (Wilkinson and Harrison, 1991; Zhang et al., 1998), but the interaction of a protein's structural and chemical attributes with a foreign environment appears to be multivariate in nature and has, so far, eluded all predictive algorithms. Since fewer than 30% of random cDNA sequences will result in soluble (i.e., assayable) protein when expressed in an E. coli host, even with the use of fusion partners such as thioredoxin, there is built-in inefficiency in any high-throughput screen employing a bacterial cell for evaluating eukaryotic collections. An appropriate in silico pre-screen could lower screening costs by a factor of 3 or more.