It is useful to determine a set of attributes that identify a “good” target audience for achieving a marketing goal, such as acquisition, retention or monetization. Conventionally, such a determination has been made primarily by analyzing how various attributes of dataset records (such as declared or inferred attributes of user interaction with an online service) are correlated with a predetermined measure of success (such as click-through rates, registration rates or purchase activity), in an attempt to determine which attributes are most strongly associated with “good” records.
In accordance with a conventional supervised classification approach, target objectives are classified by humans into “positive” (e.g., revenue greater than $10) and “negative” (e.g., profit less than $0) measures of “goodness.” Each record is then marked with its target objective value. The thus-classified records are then used to create a scoring algorithm that ranks the importance of the record attributes as predictors of the target objective. There is a substantial risk, however, that heterogeneous clusters of records within the data (e.g., attributes associated with males having a different correlation with the target objective than those associated with females) will disadvantageously bias the resulting ranking of input attributes.
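The bias risk described above can be illustrated with a minimal sketch. The records, the segment names and the helper functions `pearson` and `attribute_score` are hypothetical, and the data are deliberately extreme; the sketch simply scores one binary attribute by its Pearson correlation with a binary target, once over all records and once per segment, and is not intended to represent any particular scoring algorithm.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy records: (segment, has_attribute, is_good).
# In the "male" segment the attribute always accompanies a good record;
# in the "female" segment it always accompanies a bad one.
records = (
    [("male", 1, 1)] * 4 + [("male", 0, 0)] * 4
    + [("female", 1, 0)] * 4 + [("female", 0, 1)] * 4
)

def attribute_score(rows):
    """Score the attribute as a predictor of the target for the given rows."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])

pooled = attribute_score(records)
by_segment = {
    seg: attribute_score([r for r in records if r[0] == seg])
    for seg in ("male", "female")
}

print(pooled)      # 0.0
print(by_segment)  # {'male': 1.0, 'female': -1.0}
```

In this toy data set the attribute is a perfect predictor within each segment, yet a ranking computed over the pooled records would place it at the bottom, because the two segments cancel each other out.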
On the other hand, in an unsupervised classification approach, records are classified by statistical processing that groups together sets of similar records without regard to the meaning associated with their attributes. In the statistical processing, the records' attributes are essentially treated as random variables, with no a priori assumptions about their usefulness as targeting attributes. This can result in groupings of records that, while consistent with the statistical processing, are incongruous with a meaningful marketing segmentation (e.g., each cluster is more likely to have a homogeneous distribution of “good” records as the number of attributes in the data set that are not correlated with the target objective increases).
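The effect of uncorrelated attributes on an unsupervised grouping can likewise be sketched under stated assumptions. The example below uses a plain two-cluster k-means (Lloyd's algorithm) as a stand-in for the statistical processing, which the description above does not specify. The synthetic records, the function names and the `purity` measure are all hypothetical. Each record has one attribute equal to its target value and `n_noise` uniformly random attributes that are unrelated to the target.

```python
import random

def make_records(n_noise, n_per_class=30, seed=0):
    """Return (records, labels): one informative attribute plus n_noise irrelevant ones."""
    rng = random.Random(seed)
    records, labels = [], []
    for good in (0, 1):  # bad records first, good records last
        for _ in range(n_per_class):
            noise = [rng.random() for _ in range(n_noise)]
            records.append([float(good)] + noise)
            labels.append(good)
    return records, labels

def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(records, max_iter=50):
    """Plain Lloyd's k-means with k=2, seeded from the first and last record."""
    centroids = [list(records[0]), list(records[-1])]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            0 if sq_dist(r, centroids[0]) <= sq_dist(r, centroids[1]) else 1
            for r in records
        ]
        if new_assignment == assignment:
            break  # converged: no record changed cluster
        assignment = new_assignment
        for k in (0, 1):
            members = [r for r, a in zip(records, assignment) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def purity(assignment, labels):
    """Fraction of records that belong to the majority target class of their cluster."""
    total = 0
    for k in (0, 1):
        member_labels = [l for l, a in zip(labels, assignment) if a == k]
        good = sum(member_labels)
        total += max(good, len(member_labels) - good)
    return total / len(labels)

def clustering_purity(n_noise):
    """Cluster synthetic records with n_noise irrelevant attributes and score the result."""
    records, labels = make_records(n_noise)
    return purity(two_means(records), labels)

print(clustering_purity(0))    # only the informative attribute
print(clustering_purity(600))  # informative attribute plus 600 irrelevant ones
```

With no irrelevant attributes the two clusters coincide with the “good” and “bad” records, so the purity is 1.0. With many irrelevant attributes the distances are dominated by noise, and each cluster can be expected to hold a mix of “good” and “bad” records close to the overall proportion, which is the incongruity described above.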