Every year the film industry loses billions of dollars to pirated copies that are published online and become instantly available to a large number of users. Piracy means copying and distributing works, in particular films, cartoons, clips, TV series, and other video content protected by copyright, without the permission of the author or owner, or in violation of the conditions of an agreement on the use of such works. According to Group-IB, an average pirate movie theater earns its owners approximately $90,000 a year, approximately 60% of the target audience is lost to pirate sites, and media companies may lose as much as 45% of their revenue as a result of the actions of pirates.
Improved content distribution tools, the concealment of pirated copies online, and the growing scale of violations combine to make the development of mechanisms to detect and suppress piracy a priority. Broadly speaking, the requirements for antipiracy solutions include speed, low search cost, and reduced involvement of human analysts or assessors.
Traditional solutions for finding pirated video content parse web pages. However, many web pages that contain pirated copies may be indistinguishable from pages with trailers that can be legally distributed. In addition, the first few episodes of a TV series may, for example, be freely distributed, so it is important to identify not only the TV series but also the specific episode whose duplicate is posted on a particular web resource.
Another method known in the art is metadata analysis, which allows a comparison of the length, frame size, recording quality, and other technical information. However, such a search for a pirated copy of a film is likely to succeed only when the copy is identical to the original or not significantly different from it. When the original video content is recoded, a fuzzy duplicate may be created in which most of the metadata differs from the metadata of the original video content. Metadata may change, for example, as a result of recoding or compressing the original video content, or of changing its length by deleting or adding frames or entire sections (in particular, by embedding advertisements). Furthermore, the metadata of a file can be edited separately.
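The limitation of metadata analysis can be illustrated with a minimal sketch. It assumes a hypothetical VideoMetadata record and a hypothetical relative tolerance of 2%; the field names, the tolerance, and the sample values are illustrative assumptions and are not taken from any particular known solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMetadata:
    """Hypothetical metadata record; the fields are illustrative only."""
    duration_s: float
    width: int
    height: int
    bitrate_kbps: int


def metadata_match(original: VideoMetadata, candidate: VideoMetadata,
                   rel_tol: float = 0.02) -> bool:
    """Return True when every field of the candidate lies within
    rel_tol (relative to the original value) of the original."""
    pairs = [
        (original.duration_s, candidate.duration_s),
        (original.width, candidate.width),
        (original.height, candidate.height),
        (original.bitrate_kbps, candidate.bitrate_kbps),
    ]
    return all(abs(a - b) <= rel_tol * abs(a) for a, b in pairs)


# Assumed sample values, chosen only to show the two cases.
original = VideoMetadata(5400.0, 1920, 1080, 8000)
# A nearly identical copy: the comparison succeeds.
near_copy = VideoMetadata(5401.0, 1920, 1080, 7950)
# A recoded copy with an embedded advertisement: every field has drifted.
fuzzy_copy = VideoMetadata(5520.0, 1280, 720, 2500)

near_result = metadata_match(original, near_copy)
fuzzy_result = metadata_match(original, fuzzy_copy)
```

In this sketch the nearly identical copy is matched, while the recoded copy is missed even though it carries the same film, which is the failure mode described above.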
To identify fuzzy copies of video content, an automated review algorithm that simulates user behavior may be used. Automated web browser tools that can test the content of web pages are known in the art (see, e.g., the Selenium project).
There are known technical solutions that extract various characteristics from the original video and compare them with the characteristics of potential fuzzy copies that have been found. The disadvantages of these methods include an increased load on computing resources and data transmission networks, caused by the need to download the entire file of a potential fuzzy copy for analysis. Global experience in the battle against cybercrime shows that digital pirates use reliable tools to bypass traditional algorithms for the detection of pirated content. Both the metadata and the content itself are changed: video content is cropped, noise is introduced, the color range is changed, etc. For this reason, developing new methods of detecting pirated content that are substantially insensitive to changes in the video content is a very high priority in the art of identifying pirated copies of video content.
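What it means for an extracted characteristic to be insensitive to a content change can be shown with a small sketch. It uses a generic average-hash fingerprint, chosen here purely for illustration and not as the technique of any solution mentioned above. The frames are assumed to be tiny grayscale grids held as nested lists, and all pixel values are invented for the example.

```python
def average_hash(frame):
    """Fingerprint a grayscale frame: one bit per pixel, set when the
    pixel is brighter than the mean brightness of the frame."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


# Assumed 4x4 grayscale frame (values 0..255), illustrative only.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [220, 230, 30, 40],
    [225, 235, 35, 45],
]

# The same frame with the brightness of every pixel raised by 20.
brightened = [[min(255, p + 20) for p in row] for row in original]

# A frame with a different layout of light and dark regions.
unrelated = [
    [200, 210, 220, 230],
    [190, 200, 210, 220],
    [20, 30, 40, 50],
    [10, 20, 30, 40],
]

same_scene_distance = hamming(average_hash(original), average_hash(brightened))
other_scene_distance = hamming(average_hash(original), average_hash(unrelated))
```

Because each bit depends only on whether a pixel is above the frame's own mean, a uniform brightness change leaves the fingerprint unchanged, while a frame with a different layout yields a fingerprint that differs in many positions. A fingerprint this simple would not by itself withstand cropping or heavy noise; it only illustrates the notion of a characteristic that tolerates one kind of change.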