According to a report by an information technology (IT) market research institution, the amount of digital information in the world was approximately 1.8 zettabytes as of 2011 and is anticipated to grow more than 50-fold by 2020, reaching the scale of big data (IDC & EMC 2011).
Big data refers to a collection of massive structured data (numbers, office databases, and the like) or unstructured data (multimedia such as video, SNS content, and the like) that exceeds the capabilities of existing database management tools for collecting, storing, managing, and analyzing data.
That is, compared with existing data, big data is too large in generation amount, period, format, and the like to be collected, stored, searched, and analyzed through related-art methods. Big data is characterized by the “3Vs” of “volume” (massiveness), “variety” (various forms), and “velocity” (fast speed), and may also be defined by “4Vs,” with “value” added as a fourth feature.
The value of big data has emerged as an important feature because most big data, in addition to being massive in size, is composed of unstructured text, images, and the like, and such data propagates very quickly over time. This makes it difficult to grasp the data in its entirety and to discover patterns in it, which in turn underscores the importance of value creation.
Processing of unstructured data may be divided into natural language processing, which is based on text or language, and visual language processing, in which meaning is conveyed through images such as video, photographs, TV, or movies.
Among these, visual language-based image analysis and search technology has come to prominence as a technology for comprehensively extracting and analyzing image information, creating new knowledge and information, and obtaining large social and economic gains by utilizing the vast information contained in images.
A visual language, which is a semantic conveyance system based on images such as video, photographs, TV, movies, and the like, differs from a natural language, which is based on language (text or voice).
A natural language conveys meaning conceptually, whereas a visual language conveys meaning specifically and directly. Likewise, a natural language describes objects abstractly and ideationally, whereas a visual language describes objects concretely, eliminating ambiguity in meaning.
Also, a visual language transcends the national boundaries of language. In order to construct a visual language from an image, information needs to be extracted from the provided image and analyzed.
In order to search image data among big data promptly and accurately, there is an increasing need for a technique that recognizes images, as well as the objects and scenes within them, and automatically provides metadata corresponding to the content expressed by the images, that is, a need for constructing visual annotations.
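As a rough illustration of why such annotations may speed up image search, the following minimal sketch builds an inverted index from annotation labels to image identifiers. It is only an assumption about how annotation metadata could be used, not a method described in the source: the labels are supplied by hand, the recognizer that would generate them automatically is omitted, and the class name `AnnotationIndex`, the image identifiers, and the labels are all hypothetical.

```python
from collections import defaultdict


class AnnotationIndex:
    """Hypothetical inverted index from annotation labels to image identifiers."""

    def __init__(self):
        self._label_to_images = defaultdict(set)

    def add(self, image_id, labels):
        # Normalize labels so that lookups are case-insensitive.
        for label in labels:
            self._label_to_images[label.strip().lower()].add(image_id)

    def search(self, *labels):
        # Return the set of images annotated with every requested label.
        if not labels:
            return set()
        result = None
        for label in labels:
            images = self._label_to_images.get(label.strip().lower(), set())
            result = set(images) if result is None else result & images
        return result


# Hand-written annotations standing in for the output of an image recognizer.
index = AnnotationIndex()
index.add("img_001.jpg", ["dog", "beach", "sunset"])
index.add("img_002.jpg", ["dog", "park"])
index.add("img_003.jpg", ["beach", "volleyball"])

print(sorted(index.search("dog")))
print(sorted(index.search("dog", "beach")))
```

With an index of this kind, a query touches only the label entries rather than the image data itself, which is one reason automatically generated annotations are attractive for searching large image collections.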