Published GB patent application GB 2 343 075 of Sony United Kingdom, Ltd., describes a broadcast receiver containing a data store for holding a set of user preferences relating to categories of broadcast programs. Electronic program guide information is received by the apparatus and the bibliographic details of the program guide are compared with the user preferences. Those programs exhibiting at least a predetermined degree of match with the user preferences are displayed to the user.
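The comparison of program guide details with stored preferences can be illustrated with a minimal sketch. The scoring rule (the fraction of a program's categories that appear in the preference set), the function names, the sample guide entries, and the 0.5 threshold are all assumptions made for illustration; the cited application does not specify them here.

```python
def match_degree(categories, preferences):
    """Hypothetical score: share of a program's categories found in the preferences."""
    unique = set(categories)
    if not unique:
        return 0.0
    return len(unique & preferences) / len(unique)


def select_programs(guide, preferences, threshold=0.5):
    """Keep titles whose match degree reaches the predetermined threshold."""
    return [title for title, categories in guide
            if match_degree(categories, preferences) >= threshold]


# Illustrative program guide entries: (title, categories).
guide = [
    ("Evening News", ["news", "politics"]),
    ("Cup Final", ["sport", "football"]),
    ("Cooking Hour", ["lifestyle", "food"]),
]
preferences = {"sport", "news", "football"}

selected = select_programs(guide, preferences, threshold=0.5)
print(selected)
```

Raising the threshold narrows the displayed list to programs that match the stored preferences more closely.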
Unpublished PCT patent application PCT/IL2006/001003 of NDS Ltd., filed 29 Aug. 2006, describes a method and system for rating programs, the method including: receiving a sample of viewing logs from a plurality of set top boxes (STBs); determining, from the sample of viewing logs, at least groups of viewers sharing similar interests and groups of programs sharing similar audience; computing time dynamics of rating distribution for the groups of viewers and the groups of programs; and incorporating at least one of the following into broadcast metadata: the time dynamics of rating distributions of the groups of viewers for each of the groups of programs, rating distributions of the groups of viewers for each of the groups of programs marginalized by time, rating distributions of the groups of viewers for each program marginalized by time, relative sizes of each group of viewers, and a mapping of each program to groups of programs, thereby making the broadcast metadata available to the plurality of STBs for use in computing ratings.
The Information Bottleneck Method, by Tishby et al., Proceedings of the 37th Annual Conference on Communication, Control, and Computing, 1999, defines relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y; it also requires specifying which features of X play a role in the prediction. The problem is formalized as that of finding a short code for X that preserves the maximum information about Y. That is, the information that X provides about Y is squeezed through a ‘bottleneck’ formed by a limited set of codewords X̃. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. The approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm.
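For orientation, the optimization problem and self-consistent equations summarized above are commonly written as follows. The trade-off parameter \beta, the normalization Z(x, \beta), and the Kullback-Leibler divergence D_{KL} are symbols from the usual statement of the method and do not appear in the summary above, so this is a sketch of the standard form rather than a quotation.

```latex
% Functional minimized over the coding rule p(\tilde{x} \mid x):
\mathcal{L}\bigl[p(\tilde{x} \mid x)\bigr] = I(\tilde{X}; X) - \beta \, I(\tilde{X}; Y)

% Self-consistent equations satisfied at a stationary point:
p(\tilde{x} \mid x) = \frac{p(\tilde{x})}{Z(x, \beta)}
    \exp\Bigl(-\beta \, D_{KL}\bigl[\, p(y \mid x) \,\big\|\, p(y \mid \tilde{x}) \,\bigr]\Bigr)

p(\tilde{x}) = \sum_{x} p(x) \, p(\tilde{x} \mid x)

p(y \mid \tilde{x}) = \frac{1}{p(\tilde{x})} \sum_{x} p(y \mid x) \, p(\tilde{x} \mid x) \, p(x)
```

In this form the Kullback-Leibler divergence plays the role of the distortion measure d(x, x̃), and iterating the three equations in turn gives a re-estimation procedure of the Blahut-Arimoto type.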
The Power of Word Clusters for Text Classification, by Slonim, et al., Proceedings of the 23rd European Colloquium on Information Retrieval Research, 2001, cites the above-mentioned Tishby, et al. The Information Bottleneck method provides an information theoretic framework for extracting features of one variable that are relevant for the values of another variable. Several previous works have suggested applying this method to document clustering, gene expression data analysis, spectral analysis and more. In this work, the authors present a novel implementation of this method for supervised text classification. Specifically, they apply the information bottleneck method to find word-clusters that preserve the information about document categories and use these clusters as features for classification.
Unsupervised Document Classification Using Sequential Information Maximization, by Slonim, et al., Proceedings of the 25th ACM International Conference on Research and Development of Information Retrieval, SIGIR 2002, Tampere, Finland, Copyright 2002 ACM 1-58113-561-0/02/0008, presents a novel sequential clustering algorithm motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential (sIB) approach is guaranteed to converge to a local maximum of the information, as required by the original IB principle. Moreover, the time and space complexity are significantly improved. The authors apply this algorithm to unsupervised document classification. In their evaluation, on small and medium size corpora, the sIB is found to be consistently superior to all the other clustering methods examined, typically by a significant margin. Moreover, the sIB results are comparable to those obtained by a supervised Naive Bayes classifier. Finally, the authors propose a simple procedure for trading cluster recall to gain higher precision, and show how this approach can extract clusters which match the existing topics of the corpus almost perfectly.
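The sequential procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the merge cost usually associated with sIB, namely (p(x) + p(t)) multiplied by the weighted Jensen-Shannon divergence between p(y|x) and p(y|t). The function names, the toy joint distribution, and the deterministic round-robin starting partition are assumptions made for reproducibility; the method is normally started from a random partition.

```python
import math


def merge_cost(x_row, t_row):
    """Assumed sIB merge cost: (p(x) + p(t)) * weighted JS divergence."""
    p_x, p_t = sum(x_row), sum(t_row)
    if p_x == 0.0 or p_t == 0.0:
        return 0.0  # merging with an empty cluster loses no information
    total = p_x + p_t
    cost = 0.0
    for a, b in zip(x_row, t_row):
        m = (a + b) / total  # p(y) of the merged cluster
        if a > 0.0:
            cost += a * math.log((a / p_x) / m)
        if b > 0.0:
            cost += b * math.log((b / p_t) / m)
    return cost


def sequential_ib(joint, k, max_passes=100):
    """joint[x][y] holds p(x, y); returns a cluster index for every x."""
    n, n_y = len(joint), len(joint[0])
    assign = [x % k for x in range(n)]  # round-robin start (assumption)
    for _ in range(max_passes):
        changed = False
        for x in range(n):
            current = assign[x]
            # Evaluate the current cluster first so that ties keep x in place.
            order = [current] + [t for t in range(k) if t != current]
            best_t, best_cost = current, None
            for t in order:
                # Joint mass of cluster t with x drawn out of it.
                t_row = [0.0] * n_y
                for other in range(n):
                    if other != x and assign[other] == t:
                        for y in range(n_y):
                            t_row[y] += joint[other][y]
                cost = merge_cost(joint[x], t_row)
                if best_cost is None or cost < best_cost:
                    best_t, best_cost = t, cost
            if best_t != current:
                assign[x] = best_t
                changed = True
        if not changed:
            break  # no element moved during a full pass
    return assign


# Toy joint distribution: x0 and x1 favor y0, x2 and x3 favor y1.
joint = [
    [0.20, 0.05],
    [0.20, 0.05],
    [0.05, 0.20],
    [0.05, 0.20],
]
clusters = sequential_ib(joint, k=2)
print(clusters)
```

Each element is drawn out of its cluster and reassigned to the cluster where merging it costs the least information about Y; the loop stops when a full pass moves nothing. Recomputing cluster masses from scratch for every candidate keeps the sketch short at the expense of speed.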
Evaluating Collaborative Filtering Recommender Systems, by J. Herlocker et al., published in ACM Transactions on Information Systems, Vol. 22, issue, January 2004, reviews the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, the paper presents empirical results from the analysis of various accuracy metrics on one content domain where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalence class were strongly correlated, while metrics from different equivalence classes were uncorrelated.
Multivariate Information Bottleneck, by Slonim, et al., published by the MIT Press, 2006, describes the information bottleneck (IB) method, an unsupervised, model-independent data organization technique. Given a joint distribution, p(X, Y), this method constructs a new variable, T, that extracts partitions, or clusters, over the values of X that are informative about Y. Algorithms that are motivated by the IB method have already been applied to text classification, gene expression, neural code, and spectral analysis. A general principled framework for multivariate extensions of the IB method is introduced. This allows considering multiple systems of data partitions that are interrelated. The approach utilizes Bayesian networks for specifying the systems of clusters and which information terms should be maintained. It is shown that this construction provides insights about bottleneck variations and enables characterization of the solutions of these variations. Four different algorithmic approaches are developed, allowing construction of solutions in practice and applying them to several real-world problems.
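What it means for a partition T over the values of X to be informative about Y can be illustrated by computing mutual information before and after coarsening. The sketch below is an illustration only: the function names, the toy joint distribution, and the two hard partitions are assumptions, and the cited work also covers soft partitions and multivariate constructions not shown here.

```python
import math


def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution given as joint[a][b]."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0.0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def coarsen(joint, partition, k):
    """Merge rows of p(X, Y) into p(T, Y) under a hard partition of X into k clusters."""
    n_y = len(joint[0])
    merged = [[0.0] * n_y for _ in range(k)]
    for x, t in enumerate(partition):
        for y in range(n_y):
            merged[t][y] += joint[x][y]
    return merged


# Toy joint distribution: x0 and x1 favor y0, x2 and x3 favor y1.
joint = [
    [0.20, 0.05],
    [0.20, 0.05],
    [0.05, 0.20],
    [0.05, 0.20],
]

i_xy = mutual_information(joint)
i_good = mutual_information(coarsen(joint, [0, 0, 1, 1], 2))  # groups similar rows
i_bad = mutual_information(coarsen(joint, [0, 1, 0, 1], 2))   # mixes dissimilar rows
print(i_xy, i_good, i_bad)
```

A partition that groups values of X with the same conditional distribution over Y retains all of I(X; Y), while a partition that mixes dissimilar values loses it. No partition can yield more information about Y than X itself provides.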
Biclustering Algorithms for Biological Data Analysis: A Survey, by S. Madeira, et al., published in IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 1, Issue 1 (January 2004), pages 24-45, describes how a large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. Madeira, et al. refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, the authors analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters which can be found, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
Websites such as www.shopping.com, www.zap.co.il, and shopping.yahoo.com all comprise displays of a large content database arranged in an efficient fashion.
The disclosures of all references mentioned above and throughout the present specification, as well as the disclosures of all references mentioned in those references, are hereby incorporated herein by reference.