The creative industries worldwide are facing a time-bomb that threatens their future profitability. It is data, and more specifically the thousands of terabytes of moving image content being created every day. At the same time, they are unable to access and realise the value of the hundreds of thousands of hours of archive content they have already created. This failure to "sweat" their most valuable asset, their content, is the biggest barrier to the success and long-term value of all video-based creative companies, from broadcasters and government agencies to independent television production companies, advertising agencies and beyond. They are already unable to cope with the current volume of data, and with the move to filming on digital film cameras, which no longer capture to film or tape, the problem is set to explode.
As a temporary expedient, the industry has resorted to stopgap measures, whereby millions of pounds' worth of footage is being stored on consumer-grade portable hard drives in insecure locations, with no backup. Thus, a library of tapes that can be stored securely for up to 30 years and be catalogued is being replaced by drives that have an average lifespan of 5 years, on which the data is degrading every single day, and which hold thousands of video files that cannot be searched.
A piece of footage is considered valueless if it cannot be found within two hours. As a result, creative companies are losing millions of pounds' worth of assets every single year. If this goes unchecked, hundreds of thousands of hours of content will be lost. This will not only affect direct initial revenue; companies will also, by definition, be unable to reuse the content in future productions, unable to deliver it to the fast-growing online video market (forecast to be worth £4.37 billion by 2012), and unable to market raw footage to other content creators.
Their archives of existing content are also sitting, unexploited, in costly storage facilities. In the UK, the BBC has 5.5 miles of shelves of undigitised archive content. IMG Media has 300,000 tapes stored at a cost of £2 per tape per year, or £600,000 per year in total. All such companies are missing out on valuable revenue, with the UK archive market alone forecast to be worth £1.5 billion by 2014. To rectify this, these companies face a huge investment in infrastructure and staffing. They simply cannot afford to make it, but nor can they afford not to. The obvious solution is outsourcing, and yet, until now, no commercial company has presented a viable alternative.
Video content cannot be found because it cannot be searched: it has no associated words. The key is to attach words to footage in the form of keywords, known as "metadata". Once "tagged" with such metadata, the content can be searched by a search engine, either an internal engine or an external one such as Google or Yahoo.
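To illustrate why tagging makes footage findable, here is a minimal sketch, assuming a simple in-memory keyword index rather than any particular search engine. The clip IDs, keywords and the helpers `build_index` and `search` are hypothetical. A clip with no tags never appears in any result, which is the situation described above for untagged footage.

```python
from collections import defaultdict

# Hypothetical clip records: each clip ID maps to its keyword metadata.
# "clip_004" has no tags, so no query can ever return it.
clips = {
    "clip_001": ["beach", "sunset", "surfer"],
    "clip_002": ["city", "night", "traffic"],
    "clip_003": ["beach", "dog", "morning"],
    "clip_004": [],
}

def build_index(tagged_clips):
    """Build an inverted index: lowercase keyword -> set of clip IDs tagged with it."""
    index = defaultdict(set)
    for clip_id, keywords in tagged_clips.items():
        for word in keywords:
            index[word.lower()].add(clip_id)
    return index

def search(index, *terms):
    """Return sorted clip IDs tagged with every one of the given terms."""
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown terms.
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))

index = build_index(clips)
print(search(index, "beach"))            # ['clip_001', 'clip_003']
print(search(index, "Beach", "sunset"))  # ['clip_001']
print(search(index, "desert"))           # []
```

An inverted index is chosen here only because it is the simplest structure that shows the idea; a production system would hand the same keyword metadata to a full search engine.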
Currently, companies are attempting to automate the addition of metadata to finished video content by using technologies such as speech-to-text, which is only 40% accurate, and face recognition. However, neither technique describes the actual content of a scene, which is crucial for making it searchable on multiple criteria, and hence valuable to an end user. Moreover, these methods are not reliable and, since they are currently used only on finished content, do not help content producers search their raw footage in order to create quicker, more profitable programming. In addition, there is often no money available to create accurate or adequate metadata, as a programme's budget has already been spent.
The only way of adding rich metadata is to have human beings do it. However, even then, adding multiple layers of metadata by typing them in manually is far too slow. For example, content may be tagged with multiple layers relating to Character, Location, Object, Story, Context and Emotion. Studies of the manual addition of such rich metadata show it taking between 4 and 8 hours per hour of content.
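The layered tagging and the 4 to 8 hour logging rate described above can be made concrete with a small sketch. The six layer names come from the text; the `ClipMetadata` record and the `manual_logging_hours` helper are hypothetical illustrations, not a description of any existing logging tool. The estimate simply multiplies the hours of content by the quoted rates.

```python
from dataclasses import dataclass, field

# Layer names taken from the text; the record structure is a hypothetical illustration.
LAYERS = ("Character", "Location", "Object", "Story", "Context", "Emotion")

@dataclass
class ClipMetadata:
    clip_id: str
    # One keyword list per layer, created fresh for each clip.
    layers: dict = field(default_factory=lambda: {name: [] for name in LAYERS})

    def tag(self, layer, *keywords):
        """Append keywords to one named layer, rejecting unknown layers."""
        if layer not in self.layers:
            raise ValueError(f"unknown layer: {layer}")
        self.layers[layer].extend(keywords)

def manual_logging_hours(content_hours, low_rate=4.0, high_rate=8.0):
    """Operator hours needed to log content by hand at 4 to 8 hours per content hour."""
    return content_hours * low_rate, content_hours * high_rate

clip = ClipMetadata("clip_001")
clip.tag("Location", "beach")
clip.tag("Emotion", "calm")
print(clip.layers["Location"])    # ['beach']
print(manual_logging_hours(100))  # (400.0, 800.0)
```

At these rates, 100 hours of footage would need roughly 400 to 800 operator hours to log, which shows why manual typed logging does not scale to large archives.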
Within the production community, basic metadata is being added to raw content by teams of untrained assistants who dislike the job, and hence do it poorly and slowly. "Logging", as this process is called, is frequently still done on paper. Moreover, the logging tools that do exist do not provide enough fields for rich metadata to be added, and because the metadata has to be typed in, the process is too slow. In addition, because these are bespoke systems, footage instantly loses its associated metadata once it leaves the system, rendering it less valuable. Some metatagging software has been created to facilitate the tagging process, a good example being "Frameline" (see http://www.frameline.tv). However, as it still involves manual entry by keyboard, it is too slow, and again, it is bespoke.
Since the only currently viable solution for adding metadata involves humans and manually typed input, the process is too slow, and adding metadata quickly to large quantities of footage becomes financially unviable.
The use of manually typed input is one of the key limiting factors. Even with an automatic spell-check facility, it is slow and inaccurate. It also means that an operator has to concentrate on the keyboard as well as the screen, regardless of whether they are a touch typist. In addition, in order to move between the different metadata layers in each clip, the operator typically has to use a computer mouse to select different entry boxes. While doing this, the operator is unable to type, which means that the footage being watched must temporarily be paused, slowing the process further still.
In addition to the issues discussed above, a number of other problems arise when humans are used for manual logging. Although straightforward in principle, the repetitive adding of metadata to large quantities of content results in lack of concentration and boredom. As a consequence, spending more than a few minutes concentrating on a single clip can lead to a rapid decrease in the quality of tagging. Further to this, a major problem is what might be termed "brainfreeze", where an operator simply runs out of things to say and is left unable to add metadata to content quickly enough. This again means that the footage has to be paused, or more likely rewound, demoralising the operator and resulting in further decreases in quality.
As will be appreciated, there is a clear need for an improved method of processing and metatagging image content such as video content, which would in turn facilitate the provision of such metatagged content and alleviate many of the problems outlined above.