In data processing, a machine may be configured to analyze data items and group them into clusters, a process that may be referred to as clustering the data items. Typically, data items are clustered according to commonalities in their attributes. These attributes may be specified by the data items themselves, specified in corresponding metadata, or any suitable combination thereof. In some situations, a data item (e.g., a media item, such as a video file or an audio file, or an identifier of a media item) can be described by one or more attribute-value pairs, and a group of such attribute-value pairs can be represented (e.g., in a computer memory) as a multidimensional vector. As an example, for a data item describable by 100 attribute-value pairs, a 100-dimensional descriptive vector of the data item can be generated such that each of the 100 dimensions represents a different attribute and has a corresponding scalar value. Data items represented by such descriptive vectors can thus be clustered by clustering their descriptive vectors.
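As a non-limiting illustration, the mapping from attribute-value pairs to descriptive vectors, and the clustering of those vectors, may be sketched as follows. The attribute names, the example items, the helper names `to_vector` and `k_means`, and the choice of k-means seeded with the first k vectors are all assumptions made for this sketch; the text above does not prescribe any particular clustering algorithm, and only three dimensions are used in place of 100 for brevity.

```python
from math import dist

# Hypothetical attribute names; a real system might use 100 or more.
ATTRIBUTES = ["tempo", "energy", "vocals"]


def to_vector(pairs):
    """Map attribute-value pairs to a fixed-order vector.

    Each dimension corresponds to one attribute; a missing attribute
    is assumed to take the scalar value 0.0.
    """
    return [float(pairs.get(name, 0.0)) for name in ATTRIBUTES]


def k_means(vectors, k, iterations=10):
    """Plain k-means, seeded with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical media items described by attribute-value pairs.
items = {
    "song_a": {"tempo": 0.9, "energy": 0.8, "vocals": 0.1},
    "song_b": {"tempo": 0.2, "energy": 0.1, "vocals": 0.9},
    "song_c": {"tempo": 0.85, "energy": 0.9},
    "song_d": {"tempo": 0.1, "energy": 0.2, "vocals": 0.8},
}

vectors = [to_vector(pairs) for pairs in items.values()]
labels = k_means(vectors, k=2)
clusters = dict(zip(items, labels))  # item name -> cluster index
```

In this sketch, the data items are clustered indirectly: only their descriptive vectors are compared, and each item inherits the cluster index assigned to its vector.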