As technology has progressed over the last decade, more and more content is being made available to users online, including audio clips, images, and videos. Hundreds of millions of people view billions of videos online every year. Often these online videos are advertisements for various goods and/or services. If a user watches an advertisement and sees something that they would like to purchase or learn more about, the user typically has to visit the seller's website, or another website, to make the purchase or get additional information. For example, if a user sees a video advertisement for a pair of shoes from Company A, the user may have to leave the current webpage (e.g., the webpage with the video advertisement) and visit Company A's website. However, the user may not know Company A's web address and may resort to a search engine or other means to find a retailer that sells the shoes. As a result, sales may be diverted away from Company A, even though its advertisement prompted the user to purchase the shoes.
In other instances, a single video advertisement shows many different goods and/or services. For example, an advertisement for a clothing company may include shoes, shirts, pants, and a variety of other goods and services that the company offers. As before, the user may have to resort to search engines or other methods to locate a particular product that they saw in the video advertisement, which again may divert sales away from the clothing company. Some current solutions require a user to manually add metadata to a video after the video is recorded. Unfortunately, this approach is labor intensive and time consuming. Also, object recognition is often too inaccurate and unreliable to track objects within an image. Thus, it is with respect to these considerations and others that the invention has been made.