Online targeting and delivery of content accounts for a substantial portion of the revenue generated using media such as the Internet and World Wide Web (“web”). For example, online advertisements can be targeted to specific users or types of users at advertising rates that are directly related to the degree of targeting accuracy. In some conventional solutions, user activity can be monitored by observing text entry or other input provided by the user. However, conventional solutions are inaccurate, problematic, and, in many cases, crude.
Conventionally, advertisements are a form of content that may be generated in various formats, including text, audio, video, images, photos, and others. Analyzing content to determine which advertisements should be presented to a user is a challenging task that often relies upon observing user and system inputs, including preferences, behavior, and other parameters. When user behavior is observed, advertisements are presented based on associated identifying information such as metadata. As an example, an automotive advertisement in the form of a static display banner can be identified and placed in an “automotive” category by the advertising agency or other provider of the advertisement. In some conventional solutions, however, problems can occur when advertisements that include content other than text or static display information (e.g., video, audio, or multimedia) are analyzed.
With multimedia content (i.e., content that includes video, audio, text, images, photos, or a combination thereof), determining which content to deliver to a user based on observed behavior, specified preferences, or other criteria is difficult. Conventional solutions for analyzing multimedia content to determine how to classify and target it also require highly customized application development, which entails high development costs and substantial resources. Techniques such as speech recognition can be used to analyze, classify, and categorize content (e.g., audio, video, text, graphics, images, and others), but they incur significant costs. For example, audio content, such as audio files (e.g., songs and other music files), video containing audio, and audio signals transmitted as digital data using protocols such as voice-over-Internet-Protocol (“VoIP”), is difficult to analyze. Analyzing such content requires processor-intensive speech recognition techniques that consume substantial time and processor/compute resources, as well as highly skilled programmers (i.e., developers) to write complex applications that employ analytical techniques such as neural networks and Hidden Markov Models (“HMMs”). Consequently, conventional solutions employing these techniques are expensive and impose substantial system, processor, and development requirements.
Thus, a solution for audio comparison without the limitations of conventional techniques is needed.