Modern media content providers offer streaming and/or downloadable media from a large content catalog. Indeed, streaming music services may offer access to millions of songs. In order to provide the best service to their customers, content providers offer many different ways for users to search for and identify content to consume. For example, in the context of a streaming music provider, users are able to search for individual tracks or albums, or to search by artist.
For both content providers and consumers, it is convenient to associate each content item with a unique artist identifier, so that tracks by a particular artist can be quickly and easily located. Typically, it is sufficient to apply the same unique artist identifier to all tracks associated with the same “artist name” (which is a common metadata field for music tracks). Sometimes, though, different artists have the same name, which can lead to tracks from multiple artists being associated with the same artist identifier in the content catalog. This can make it difficult for users to locate certain tracks or certain artists, and can reduce the visibility of real-world artists who should be separately identified. For example, the name “Prince” is associated with both the well-known U.S.-based pop musician and a lesser-known Caribbean artist. Unless such ambiguities are recognized and the catalog is corrected to associate different real-world artists with different unique artist identifiers, ambiguous artist identifiers will continue to plague search results, leading to user confusion and a generally poor user experience.
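The name collision described above can be illustrated with a minimal sketch. It assumes a toy catalog in which identifiers are assigned purely by the “artist name” field; the track identifiers, the `artist-N` identifier format, and the function names are hypothetical and are not taken from any actual catalog. The third field of each row is ground truth included only to make the collision visible.

```python
from collections import defaultdict

# Hypothetical catalog rows: (track_id, artist_name, real_world_artist).
# The third field is illustrative ground truth; a real catalog would not have it.
TRACKS = [
    ("t1", "Prince", "U.S. pop musician"),
    ("t2", "Prince", "Caribbean artist"),
    ("t3", "Prince", "U.S. pop musician"),
]


def assign_artist_ids(tracks):
    """Assign one identifier per distinct artist name (name-keyed scheme)."""
    ids_by_name = {}
    assignment = {}
    for track_id, name, _ in tracks:
        # The first time a name is seen it gets the next identifier;
        # every later track with the same name reuses it.
        ids_by_name.setdefault(name, f"artist-{len(ids_by_name) + 1}")
        assignment[track_id] = ids_by_name[name]
    return assignment


def ambiguous_ids(tracks, assignment):
    """Return identifiers that cover more than one real-world artist."""
    people_by_id = defaultdict(set)
    for track_id, _, person in tracks:
        people_by_id[assignment[track_id]].add(person)
    return {
        artist_id: sorted(people)
        for artist_id, people in people_by_id.items()
        if len(people) > 1
    }


assignment = assign_artist_ids(TRACKS)
collisions = ambiguous_ids(TRACKS, assignment)
```

In this sketch all three tracks receive the same identifier, so the identifier covers two different real-world artists. Because a real catalog lacks the ground-truth field, the ambiguity cannot simply be read off in this way.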
Given the large number of artists in the database, though, it is not feasible to manually review every artist identifier to ensure that it is not ambiguous (i.e., associated with content items from multiple different real-world artists). Accordingly, there is a need to provide ways to detect artist ambiguity in a large content catalog.