Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of copyright laws in the EU member states sets out the rules for copying and scanning. The equivalent US legislation is the Digital Millennium Copyright Act (DMCA).
Scanning and copying can be performed by means of a conventional scanner, but since conventional scanning of copyright-protected documents violates copyright laws, there is a need for a solution to this problem.
The EU Copyright Directive Article 2 provides the most fundamental “copy” right. It provides exclusive rights over the reproduction of “direct or indirect, temporary or permanent” copies of works to performers, phonogram producers, film producers, broadcasting organisations and authors.
Article 5 of the EU Copyright Directive sets out the limitations and exceptions that may apply to the rights provided in Article 2. The mandatory exception to the reproduction right in Article 5(1) covers “transient and incidental” reproduction that is an “essential and integral” part of a network transmission by an intermediary, or of a lawful use of a work, and that has no “independent economic significance.”
This exception also prevents right holders from controlling all access to works through digital technologies, which by their very design make temporary “copies” of works as they are transferred from a medium such as a DVD to the player's memory for processing, and then to a display or speaker.
Monitoring of copyright-protected publications such as newspapers, magazines, trade journals, scientific journals, and other periodicals is performed systematically, e.g. by companies in the media monitoring business, which serve their clients by identifying articles or other text sections of interest. Monitoring is performed to help client companies and individuals keep track of how often and to what degree they are mentioned in the news media.
Conventionally, monitoring is based on manual reading of e.g. newspapers. When the newspapers are received, e.g. at the media-monitoring company, they are handed over to qualified human readers, who speed-read through the paper looking for relevant articles, e.g. articles in which their clients are mentioned. The readers look for words such as company names, names of individuals, and/or other keywords representing certain subjects, topics or themes in order to determine which of the articles are relevant.
The reader marks the keywords when finding them on a page of the newspaper. When the whole page has been read and all keywords have been marked, the reader performs an evaluation keyword-by-keyword to determine whether the article is relevant for a client. If the article is found to be relevant, the reader or an assistant then performs a physical cutting of the article(s) for the client and sends it to him.
The time-consuming part of the process is finding the keywords. The reading time per page, starting at the upper left corner and ending at the bottom right corner, is largely independent of the number of keywords on the page. This results in a high time consumption per cutting if there are only a few relevant articles in a newspaper. Most of the time spent on reading is thus used inefficiently.
To some extent, the process can be automated by use of conventional scanners that scan the entire newspaper page by page and produce a digital image of each page, e.g. in JPG, TIFF or PDF format, which is stored in a file system or a database. Subsequently, each file is retrieved for Optical Character Recognition (OCR) in order to produce files in which each recognised character is represented according to a certain encoding scheme (e.g. ASCII). These files are also stored in a file system or database. Further, a so-called search engine is loaded with a set of keywords and searches the encoded files in order to provide an output in the form of cutting lists. A cutting list directs the person who cuts the articles from the newspapers to the relevant page in the physical newspaper; it states the title of the article to be cut and its approximate location. This automated process gives a considerable increase in productivity over the manual process.
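The search step of the automated process described above can be illustrated by the following minimal sketch. It assumes that OCR has already produced the text of each article together with its page number, title and approximate location; the names `Article`, `CuttingEntry` and `build_cutting_list` are hypothetical and chosen for illustration only, and the whole-word, case-insensitive matching is one simple choice among many rather than the method of any particular search engine.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Article:
    """OCR result for one article (hypothetical structure)."""
    page: int       # page number in the physical newspaper
    title: str      # article title as recognised by OCR
    location: str   # approximate position on the page, e.g. "upper left"
    text: str       # recognised body text


@dataclass(frozen=True)
class CuttingEntry:
    """One line of the cutting list handed to the person doing the cutting."""
    page: int
    title: str
    location: str
    keywords: Tuple[str, ...]   # keywords that were found in the article


def build_cutting_list(articles: List[Article], keywords: List[str]) -> List[CuttingEntry]:
    """Return one entry per article that contains at least one keyword.

    Matching is case-insensitive and restricted to whole words. This assumes
    each keyword starts and ends with a letter or digit.
    """
    patterns = [
        (kw, re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE))
        for kw in keywords
    ]
    entries = []
    for article in articles:
        hits = tuple(kw for kw, pattern in patterns if pattern.search(article.text))
        if hits:
            entries.append(
                CuttingEntry(article.page, article.title, article.location, hits)
            )
    # Sort by page so the cutter can work through the paper front to back.
    return sorted(entries, key=lambda entry: entry.page)


if __name__ == "__main__":
    sample = [
        Article(3, "Acme opens new plant", "upper left", "Acme Corp announced a new plant."),
        Article(1, "Weather", "bottom right", "Rain is expected tomorrow."),
    ]
    for entry in build_cutting_list(sample, ["Acme Corp"]):
        print(entry.page, entry.title, entry.location, entry.keywords)
```

In this sketch the cutting list carries only the page, title, approximate location and matched keywords, which corresponds to the information the person cutting the articles needs according to the description above.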
However, since the digital image represents an electronic copy of the original material, this process is considered a violation of the author's copyright under many legislations. Under certain legislations, even showing the scanned image on a display screen is considered an act violating the author's copyright. The digital images are not directly searchable for text, but require conversion to a coded digital form by means of OCR. However, the output of this conversion, i.e. the coded digital form, will also be considered a violation of the copyright.
In some countries it may be considered not to be a violation of the author's copyright if the electronic copy is “transient and incidental” and an “essential and integral” part of a search-process or summary writing.
If the traditional process of media monitoring by manual reading of textual media is automated, copies of the textual media will be made. The creation of such copies is a problem in relation to copyright laws, which can therefore hinder the automation of media monitoring. Automating the monitoring of textual media while avoiding violation of copyright laws thus remains an unsolved problem, and there is a need for a technical solution that both automates the monitoring of textual media (e.g. newspapers, books) and avoids violating copyright laws.