Clustering and classification tend to be important operations in certain data mining applications. For instance, data within a dataset may need to be clustered and/or classified in a data system with a purpose of assisting a user in searching and automatically organizing content, such as recorded television programs, electronic program guide entries, and other types of multimedia content.
Generally, many clustering and classification algorithms work well when the dataset is numerical (i.e., when datum within the dataset are all related by some inherent similarity metric or natural order). Numerical datasets often describe a single attribute or category. Categorical datasets, on the other hand, describe multiple attributes or categories that are often discrete, and therefore, lack a natural distance or proximity measure between them.