Digital content has been developed for as long as computers have been around. It exists in the form of computer programs, text documents, digital images, digital video, digital audio, software components, and blocks of computer code. Digital content producers integrate, compile, and distribute digital content productions to end-users who want them for their value, not for the technology. Examples of such producers include software vendors, web site designers, and audiovisual content producers. In recent years, organizations producing digital content have chosen, or been forced, to leverage externally developed content to gain efficiency in research and development. As a result, some organizations have chosen to develop digital content components for distribution not to end-users but to digital content producers themselves. For example, some companies sell digital photographs to web site producers for use in their web sites. Another class of content producer has emerged that produces digital content or digital content components and then distributes them for free, or under liberal licenses. A subset of these free content developers distributes their content freely, but licensed in a way that requires content producers who use the free works, either directly or to produce derivative works, to release their own work under the same terms. Another trend in content development is the advent and increasing use of the Internet and the World Wide Web.
Finding digital content has become easier, faster, and more acceptable, to the extent that it is often expedient for digital content developers and their companies to acquire digital content or digital content components from the Internet and produce a derivative work, rather than producing original content from scratch. Alternatively, developers are increasingly taking externally sourced digital content, or digital content components, and embedding them within their own digital content. For example, a developer writing software for an MP3 music player might download and embed a search algorithm that allows the user to easily search for the song they want, or an enhanced display driver produced by another developer already using the same LCD display.
Whilst the increased breadth and speed of global access to digital content has significantly eased the digital content development process, a commercial enterprise's ability to establish the intellectual property rights of its digital content has become more difficult. The complexity increases continuously as developers select and embed external content in real time, in some instances with multiple, globally distributed development teams providing 24-hour code development or addressing multiple elements of the digital content. Knowing these intellectual property rights is crucial when establishing the valuation of businesses that derive revenue or cut costs by generating and distributing original digital content, such as software companies, or of companies that use digital content to derive revenue or cut costs, such as television broadcasters. When a business is being audited and evaluated, it must produce accurate records detailing all external digital content in its digital content systems, including the copyright ownership, license agreements, and other terms and conditions. Given that it takes only seconds to copy significant amounts of external digital content into an enterprise, by anything from subsystem copying, to downloading software from the Internet, to cutting and pasting images and text from hypertext documents on the Internet, the continuous monitoring and establishment of these property rights is difficult.
For a digital content provider a typical high-level process for documenting external content is as follows:
1. Identify and document each piece of external digital content in your digital content system;
2. Compare each documented piece of external content with publicly comparable external content, and if there is a match annotate the content with copyright owner, license, author(s), etc.;
3. Compare all of your content with publicly comparable content, and if there is a match annotate the content with copyright owner, license, author(s);
4. For the remaining external content still not annotated, annotate them manually to the best of your ability with the copyright owner, license, author(s), etc.
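The documentation process above can be illustrated with a minimal Python sketch. It is not a method taken from this description: it assumes that "comparing" content means an exact SHA-256 match against an in-memory index of public content, whereas a practical system would likely need fuzzy or partial matching. The names Provenance, fingerprint, annotate, public_index, and declared_external are hypothetical and introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Provenance:
    """Ownership metadata for one piece of content (hypothetical fields)."""
    copyright_owner: str
    license: str
    authors: Tuple[str, ...]


def fingerprint(data: bytes) -> str:
    """Exact-match fingerprint. Assumption: byte-identical copies only."""
    return hashlib.sha256(data).hexdigest()


def annotate(
    local_files: Dict[str, bytes],
    public_index: Dict[str, Provenance],
    declared_external: Set[str],
) -> Dict[str, Optional[Provenance]]:
    """Annotate files whose content matches the public index.

    declared_external is the result of the first step: the paths already
    identified as external. A value of None in the result marks declared
    external content with no public match, which is left for manual
    annotation (the last step).
    """
    annotations: Dict[str, Optional[Provenance]] = {}
    for path, data in local_files.items():
        match = public_index.get(fingerprint(data))
        if match is not None:
            # Second and third steps: any file that matches public content
            # is annotated, whether or not it was declared as external.
            annotations[path] = match
        elif path in declared_external:
            # Last step: declared external but unmatched, so flag it.
            annotations[path] = None
    return annotations
```

Files that are neither declared as external nor matched against the index are treated as original work and are omitted from the result.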
Intellectual property lawyers and software experts are often brought into the digital content developer's business to drive this process, and key content developers and project leaders must spend considerable time compiling these lists and reports. In reality this process is often prohibitively expensive because it requires manual labor and guesswork by highly qualified and expensive intellectual property lawyers and content developers. It is also error-prone, and subject to abuse by developers intent on hiding the source of their specific portions of the overall code forming the digital content offered by their employer or contract provider.
Additionally, a large volume of digital content, such as a software suite or video game, may contain a significant number of inserted portions of external content from a similarly large number of sources. Many such sources may in fact be private repositories of digital content, individuals developing digital content, or other sources which are difficult to locate and access, making it difficult to verify that the digital content they host was employed within the produced digital content.
It would therefore be beneficial for digital content providers and developers to have available a centralized repository of information relating to external digital content, allowing effective automation of the process described above and thus enabling them to confidently declare the intellectual property ownership of their digital content productions. Additionally, it would be beneficial for digital content providers and developers to have a means of bringing uniformity to both the digital content and the digital content metadata, thereby reducing content production costs and/or liabilities. Such uniformity is typically established via policies or rules within a development organization, each organization having different policies. Many of the aspects that these policies and rules address, and that affect the development organization, are not necessarily those the developer focuses on when sourcing and introducing external content. Hence, a developer may be more interested in aspects such as file size, speed of processing, code complexity, image resolution, etc., whereas the development organization is concerned with licensing, territory restrictions, copyright, cost of use, the organization the content was sourced from, etc.
As a result, any automated or even non-automated means of verifying, checking, or reviewing any aspect of external digital content introduced into the development environment, and into digital content under development, benefits from access to the fullest extent of information relating to that external digital content. As such, it would be beneficial to identify such external digital content upon its introduction and to extract the fullest extent of information from a centralized repository of information relating to external digital content. It would be typical for such a centralized repository to employ search engines (typically referred to as web crawlers) to explore the Internet, identify digital content, and store all related information gathered from the external source in association with the external digital content so that it is available to development organizations.
It would be evident that, given the immense number of files upon the Internet (World Wide Web) and the rate at which this content is increasing, the web crawlers of a centralized repository have a very difficult task, perhaps one that is not achievable without expending unsustainable resources, to initially identify all new sources, identify content, and extract the pertinent data for the activities discussed supra. Additionally, the web crawlers should periodically return to all identified digital source locations to identify new content, verify previously identified content's information, or establish modifications to such previously identified external content. A development organization may therefore suffer delays in establishing the verified information relating to an item of external digital content, and these delays impact the development of its digital content.
It would therefore be advantageous for the web crawlers to have information relating to the activities within a development organization in order to establish a weighting in the search activities of the web crawlers. Accordingly, a file modification event within a development organization provides data, relating to the external digital content introduced, that is used to adjust in a predetermined manner the activities of the web crawlers.
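The weighting described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the "predetermined manner" is a linear boost: each file modification event attributed to a source raises that source's crawl weight by a fixed amount, and sources are visited in descending order of weight. The class CrawlScheduler, its methods, and the parameters base_weight and boost are hypothetical and are not taken from this description.

```python
from collections import Counter
from typing import Iterable, List


class CrawlScheduler:
    """Orders crawl targets by development-side activity (illustrative only)."""

    def __init__(self, sources: Iterable[str],
                 base_weight: float = 1.0, boost: float = 1.0) -> None:
        self._sources: List[str] = list(sources)
        self._base = base_weight   # weight of a source with no events
        self._boost = boost        # assumed fixed increment per event
        self._events: Counter = Counter()

    def record_file_event(self, source: str) -> None:
        """Register a file modification event attributed to a source."""
        self._events[source] += 1
        if source not in self._sources:
            # A source first seen through an event becomes a crawl target.
            self._sources.append(source)

    def weight(self, source: str) -> float:
        """Linear weighting: base plus a boost for each recorded event."""
        return self._base + self._boost * self._events[source]

    def crawl_order(self) -> List[str]:
        """Sources sorted by descending weight, ties broken by name."""
        return sorted(self._sources, key=lambda s: (-self.weight(s), s))
```

Under these assumptions, a source from which developers have recently introduced content is revisited ahead of sources with no recorded activity. Other weightings, such as decaying older events over time, would fit the same interface.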
Accordingly, the invention provides a method of automatically adjusting the activities of dynamic search engines and web crawlers that access distributed, publicly and privately accessible sources of digital content, to improve both the decision making of development organizations introducing such digital content into their activities and the establishment of the appropriate intellectual property rights and accreditation.