The present invention relates to a media recognition system, and more particularly to a media recognition system and an application program for media recognition to execute recognition processing of a media data stream of audiovisual and audio signals in accordance with a user's request.
Along with the development of audiovisual technology, it has become comparatively easy to configure application software for recording and/or playing images and music. Expectations are now rising for sophisticated media recognition applications which are not limited to merely playing recorded images, but permit automatic recognition of a specific object, such as goods or a person, or of a specific sound or sounds, and access to detailed information on the object by utilizing the result of recognition. Various algorithms that constitute the basis of recognition techniques are proposed under the ISO MPEG-7 Standard.
Media information which is offered as images is appreciated from different points of view depending on the user, and the object noted even in the same frame may differ from one user to another. Therefore, when a specific object in an image is to be automatically recognized by a media recognition application, it is necessary to match the form of designating the media content (also referred to as metadata or media characteristic quantity) to be recognized, the form of displaying the result of recognition, and the method of utilizing that result to the user's preference.
FIG. 8 shows an example of a conventional media recognition application for handling media information which continuously arises on a real time basis (hereinafter referred to as a media data stream), such as images and sounds. FIG. 8 is prepared to illustrate the basic structure of application software according to the prior art, and shows the configuration of a media recognition application provided with a function of recognizing a specific object region in an MPEG played image displayed on the screen, analyzing color distribution in the object region, and displaying the result of analysis on the screen. If, for instance, a function of recognizing a person or an object included in the object region is added instead of color distribution analysis, identifying information for accessing detailed information on the object can be obtained.
Reference numeral 100 denotes an image playing application for decoding MPEG encoded data read out from a picture file 130 and outputting sounds to a speaker 110 and images to a display 120. The image playing application 100 comprises an MPEG file reading module 101 for reading out MPEG encoded data from the picture file 130 and outputting them as a bit stream of each picture frame, and a media separation module 102 for separating the bit stream into an audio bit stream and a video bit stream.
The audio bit stream is decoded by a voice decoding module 103 and outputted to the speaker 110 as, for instance, a PCM audio waveform signal stream. On the other hand, the video bit stream is decoded by an MPEG picture decoding module 104 and outputted to the display 120 as a picture signal stream.
As stated above, a conventional image playing application has a data flow type program structure in which a plurality of program modules are linked according to the sequence of data processing and the data stream is consecutively processed. This program structure derives from the component-connecting type design concept, such as that of a digital TV broadcast receiver or a video player, for example a digital versatile disk (DVD) player, configured of hardware circuits. For instance, DirectShow (API), which is an OS-standard image playing framework in Microsoft Windows®, also uses a program structure of this type. As the data processing sequence can be readily visualized in a data flow type program structure in which data is successively transferred from preceding modules to subsequent modules, it has the advantage over a variable rewriting type program using a procedural language of permitting intuitive understanding of the functions of the application.
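By way of illustration only, the data flow type program structure described above may be sketched as follows. The sketch is not taken from the prior art reference or from any actual framework: the class name Module, the function play, the "audio|video" string format standing in for MPEG encoded data, and the string results standing in for decoded signals are all hypothetical and are assumed solely to show how data is transferred from preceding modules to subsequent modules.

```python
from typing import Any, Callable, List, Tuple


class Module:
    """One processing stage that forwards its result to every linked downstream module."""

    def __init__(self, name: str, process: Callable[[Any], Any]) -> None:
        self.name = name
        self.process = process
        self.downstream: List["Module"] = []

    def connect(self, other: "Module") -> "Module":
        """Link this module to a subsequent module and return that module."""
        self.downstream.append(other)
        return other

    def push(self, data: Any) -> None:
        """Process one unit of data and hand the result to the subsequent modules."""
        result = self.process(data)
        for module in self.downstream:
            module.push(result)


def play(frames: List[str]) -> Tuple[List[str], List[str]]:
    """Run hypothetical 'audio|video' frames through modules named after 101 to 104."""
    speaker: List[str] = []   # stands in for the speaker 110
    display: List[str] = []   # stands in for the display 120

    reading = Module("file_reading_101", lambda frame: frame)
    separation = Module(
        "media_separation_102",
        lambda frame: dict(zip(("audio", "video"), frame.split("|"))),
    )
    voice = Module("voice_decoding_103", lambda s: "pcm(" + s["audio"] + ")")
    picture = Module("picture_decoding_104", lambda s: "picture(" + s["video"] + ")")

    # Link the modules according to the sequence of data processing.
    reading.connect(separation)
    separation.connect(voice).connect(Module("speaker_110", speaker.append))
    separation.connect(picture).connect(Module("display_120", display.append))

    for frame in frames:
        reading.push(frame)
    return speaker, display
```

In this sketch, calling play(["A1|V1", "A2|V2"]) passes each frame through the linked modules in order, so that the audio part reaches the speaker list and the video part reaches the display list without any module referring to a shared variable.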
In order to develop the image playing application 100 into an image (media) recognition application provided with a recognizing function regarding a specific object region in an image, or a color distribution analyzing function in this context, a region recognition module 105 for recognizing the region in which a certain object exists in a picture frame is connected, for instance, to the MPEG picture decoding module 104.
The result of recognition by the region recognition module 105 should be displayed on a played image screen in a user-manipulable form. For this purpose, a region graphical user interface (GUI) module 106 is connected between the region recognition module 105 and the display 120. If color analysis of the picture frame is desired, a color analysis module 107 is connected to the MPEG picture decoding module 104 and, in order to display the result of the color analysis on the played image screen, a color-display graphical user interface (GUI) module 108 is connected between the color analysis module 107 and the display 120.
With these additional modules 105 through 108, it is possible to construct an image recognition application capable of executing region recognition and color analysis in parallel with image playing by the image playing application 100 and displaying, as shown in a display example 121, the result of region recognition and that of color analysis on the screen. In this case, connections among the modules can be either embedded in a software module or defined in a module connection table such as the one denoted by 122. Incidentally, the above-stated prior art is described in Mark D. Pesce, Programming Microsoft DirectShow for Digital Video and Television, p. 3 DirectShow Concepts, 2003.
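By way of illustration only, a module connection table of the kind denoted by 122 may be sketched as a list of source and destination pairs that is interpreted at run time. The table layout, the module names, the function run_pipeline, and the string results below are hypothetical and are not taken from FIG. 8 or from the cited reference; they are assumed solely to show how connections among the modules can be defined outside the modules themselves.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical connection table: each row links a preceding module to a subsequent module.
CONNECTION_TABLE: List[Tuple[str, str]] = [
    ("picture_decoding_104", "display_120"),
    ("picture_decoding_104", "region_recognition_105"),
    ("region_recognition_105", "region_gui_106"),
    ("region_gui_106", "display_120"),
    ("picture_decoding_104", "color_analysis_107"),
    ("color_analysis_107", "color_gui_108"),
    ("color_gui_108", "display_120"),
]

# Placeholder processing for each module; real modules would operate on picture data.
PROCESSORS: Dict[str, Callable[[Any], Any]] = {
    "picture_decoding_104": lambda bits: "frame(" + bits + ")",
    "region_recognition_105": lambda frame: "region-of-" + frame,
    "region_gui_106": lambda region: "overlay[" + region + "]",
    "color_analysis_107": lambda frame: "colors-of-" + frame,
    "color_gui_108": lambda colors: "chart[" + colors + "]",
    "display_120": lambda item: item,
}


def run_pipeline(
    table: List[Tuple[str, str]],
    processors: Dict[str, Callable[[Any], Any]],
    source: str,
    data: Any,
) -> List[Any]:
    """Push one item from `source` along the table; return what reaches the end modules."""
    links: Dict[str, List[str]] = {}
    for src, dst in table:
        links.setdefault(src, []).append(dst)

    outputs: List[Any] = []

    def push(name: str, item: Any) -> None:
        result = processors[name](item)
        targets = links.get(name, [])
        if not targets:
            # A module with no outgoing link (here the display) collects the result.
            outputs.append(result)
        for target in targets:
            push(target, result)

    push(source, data)
    return outputs
```

In this sketch, adding or removing a recognition function amounts to adding or removing rows of the table, for example deleting the three rows that mention modules 107 and 108 to omit color analysis, while the processing code of the remaining modules is left unchanged.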