Embodiments of the present disclosure generally relate to data item clustering.
In a fraud investigation, an analyst may have to make decisions regarding the selection of electronic data items within an electronic collection of data. Such a collection of data may include a large number of data items that may or may not be related to one another, and which may be stored in an electronic data store or memory. For example, such a collection of data may include hundreds of thousands, millions, tens of millions, hundreds of millions, or even billions of data items, and may consume significant storage and/or memory. Determination and selection of relevant data items within such a collection of data may be extremely difficult for the analyst. Further, processing of such a large collection of data (for example, as an analyst uses a computer to sift and/or search through huge numbers of data items) may be extremely inefficient and consume significant processing and/or memory resources.
In some instances, related electronic data items may be clustered and stored in an electronic data store. Even when electronic data items are clustered, however, the electronic collection of data may include hundreds of thousands, millions, tens of millions, hundreds of millions, or even billions of clusters of data items. As with individual data items, determination and selection of relevant clusters of data items within such a collection of data may be extremely difficult for the analyst. Further, processing such clusters of data items and presenting them to an analyst in an efficient way may be a very challenging task.