Pattern classification in a distributed environment often involves vertically partitioned data. Vertically partitioned data is data for which each classifier can observe only a subset of the attributes in the data, and the classifiers do not share the data sets between themselves for reasons of privacy and security.
There may be an overlap between the attribute sets available to different classifiers, and each classifier often knows which of its attributes are also observed by another classifier. The problem that arises is how to make a classification decision based on the decisions made by the local classifiers.
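As an illustration of the setting described above, the following minimal sketch projects a small table onto per-site attribute subsets. The site names, attribute names, and records are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
def vertical_views(records, attribute_sets):
    """Project each record onto the attributes each site is allowed to observe."""
    return {
        site: [{name: record[name] for name in attrs} for record in records]
        for site, attrs in attribute_sets.items()
    }

# Hypothetical records and attribute assignment (illustrative assumptions).
records = [
    {"age": 34, "income": 52000, "zip": "94110", "balance": 1200},
    {"age": 51, "income": 87000, "zip": "10027", "balance": 300},
]
attribute_sets = {
    "site_a": ["age", "zip"],
    "site_b": ["zip", "income", "balance"],
}

views = vertical_views(records, attribute_sets)

# Attributes observed by both sites; each site may know this overlap
# without seeing the other site's remaining attributes.
overlap = set(attribute_sets["site_a"]) & set(attribute_sets["site_b"])
```

Each site would induce its local classifier from its own view only; the views themselves are never exchanged.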
The mixture-of-experts framework (described in Jacobs, R. A., Jordan, M. I., Nowlan, S. J., and Hinton, G. E., “Adaptive mixtures of local experts”, Neural Computation, 1991, volume 3, no. 1, pages 79 to 87) proposes that each expert solve a simpler problem and that the combination of the outputs of the individual experts provide a solution to the more complex problem. Though each expert in such a mixture-of-experts framework typically “sees” the entire input, each expert can conceivably observe only certain features, so the framework remains usable even when data is vertically partitioned.
Each expert in a mixture-of-experts framework partitions the input space and establishes local regression surfaces in each partition. When used with vertically partitioned data, such regression surfaces are defined over regions of a subspace, and there is no guarantee that the approximation is close to the desired approximation (unless the function to be approximated is separable).
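The role of separability can be illustrated with a small numerical sketch. It assumes that "separable" means additively separable, that is, f(x1, x2) = g(x1) + h(x2), and it uses a hypothetical four-point grid and two hypothetical target functions. Each single-feature fit is the conditional mean of the target given that feature, which is the best least-squares fit available from that feature alone on this grid.

```python
from itertools import product

def conditional_mean(points, target, axis):
    """Least-squares fit of target from one coordinate only: E[f | x_axis]."""
    sums, counts = {}, {}
    for p in points:
        key = p[axis]
        sums[key] = sums.get(key, 0.0) + target(p)
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def additive_fit_error(points, target):
    """Largest absolute error when the two single-feature fits are summed.

    The overall mean is subtracted once because both fits include it.
    """
    mean = sum(target(p) for p in points) / len(points)
    fit0 = conditional_mean(points, target, 0)
    fit1 = conditional_mean(points, target, 1)
    return max(abs(fit0[p[0]] + fit1[p[1]] - mean - target(p)) for p in points)

# Hypothetical inputs: a uniform grid over two features.
grid = list(product([-1.0, 1.0], repeat=2))

def separable(p):
    return 2.0 * p[0] + 3.0 * p[1]   # g(x1) + h(x2)

def non_separable(p):
    return p[0] * p[1]               # interaction term only

separable_error = additive_fit_error(grid, separable)          # fits exactly
non_separable_error = additive_fit_error(grid, non_separable)  # both fits are flat
```

For the product target, each single-feature fit is identically zero, so no combination of the two subspace fits by summation recovers the function, whereas the additive target is recovered exactly.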
Inducing a classifier with vertically partitioned data may also be viewed from the perspective of missing data. A classifier induced from the features in a data partition may view the unobserved features as features whose value is always missing. Ahmad and Tresp (Ahmad, S., and Tresp, V., “Some solutions to the missing feature problem in vision”, Proceedings of Advances in Neural Information Processing Systems, 1993, Hanson, S. J., Cowan, J. D., and Giles, C. L. (Editors), San Mateo, Calif.) describe computing the posterior probabilities by “marginalizing out” the missing features. Usually, such an approach is useful when the amount of missing information is small and the available information can, for the most part, constrain the class label. For vertically partitioned data, however, the information available to each classifier is small. That is, the number of observed features is often a small fraction of the total number of features.
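The marginalization idea can be sketched for a small discrete model. The code below is not the method of the cited reference; it is a generic illustration that assumes a fully known joint distribution over a class label and two binary features, with a hypothetical distribution in which the class follows the second feature with probability 0.9 and is independent of the first.

```python
def posterior_given_observed(joint, observed):
    """P(class | observed features), summing the joint over unobserved features.

    joint: dict mapping (class_label, feature_tuple) -> probability.
    observed: dict mapping feature index -> observed value.
    """
    scores = {}
    for (label, features), prob in joint.items():
        # Keep entries consistent with what was observed; unobserved
        # features are summed over, i.e. "marginalized out".
        if all(features[i] == value for i, value in observed.items()):
            scores[label] = scores.get(label, 0.0) + prob
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Hypothetical joint: features are uniform; the class equals feature 1
# with probability 0.9 and does not depend on feature 0.
joint = {}
for x0 in (0, 1):
    for x1 in (0, 1):
        for label in (0, 1):
            joint[(label, (x0, x1))] = 0.25 * (0.9 if label == x1 else 0.1)

full_view = posterior_given_observed(joint, {0: 1, 1: 1})
partial_view = posterior_given_observed(joint, {0: 1})
```

With both features observed, the posterior for class 1 is 0.9. With only feature 0 observed, the posterior falls back to 0.5 for each class, which illustrates why marginalization helps little when the observed features carry a small share of the information.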
Accordingly, improved techniques for distributed classification of vertically partitioned data are desirable.