1. Field of the Invention
The present invention relates to an apparatus and a method for filtering malicious multimedia services based on sequential data processing. More particularly, multimedia data in any of a variety of forms is input sequentially. These forms include multimedia streams transmitted online in real time, multimedia files stored in a storage space and being reproduced, and multimedia files stored in a storage space. For the input data, a maliciousness class ratio is calculated by using a maliciousness class classification model trained in advance. If the accumulated value of the ratio is equal to or greater than the maximum threshold value of a predetermined class, the multimedia data is determined to belong to that maliciousness class. If the accumulated value is equal to or less than the minimum threshold value of the predetermined class, the multimedia data is determined to belong to another class. If the accumulated value lies between the maximum threshold value and the minimum threshold value, the next data item is input and its maliciousness class ratio is calculated. The accumulated value is then updated, and the class is determined again in the same manner.
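The accumulate-and-compare loop described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation. The source does not say how the per-item ratios are accumulated. This sketch assumes a running sum of log-odds, log(r / (1 - r)), in the spirit of a sequential probability ratio test, so the accumulated value can move toward either threshold. The pre-trained classification model is stood in for by a hypothetical callable `class_ratio`. The function name `sequential_filter`, the representation of data items as lists of numbers, and the "undecided" result returned when the input ends before either threshold is reached are also illustrative assumptions.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

MALICIOUS = "malicious"
OTHER = "other"
UNDECIDED = "undecided"  # assumption: the source does not cover running out of data


def log_odds(ratio: float, eps: float = 1e-6) -> float:
    """Map a maliciousness class ratio in [0, 1] to log-odds.

    The ratio is clamped away from 0 and 1 so the logarithm stays finite.
    """
    r = min(max(ratio, eps), 1.0 - eps)
    return math.log(r / (1.0 - r))


def sequential_filter(
    items: Iterable[Sequence[float]],
    class_ratio: Callable[[Sequence[float]], float],  # hypothetical stand-in for the trained model
    max_threshold: float,
    min_threshold: float,
) -> Tuple[str, int]:
    """Classify a stream item by item; return (decision, number of items consumed)."""
    accumulated = 0.0
    consumed = 0
    for item in items:
        consumed += 1
        # Assumed accumulation rule: add the log-odds of this item's class ratio.
        accumulated += log_odds(class_ratio(item))
        if accumulated >= max_threshold:
            return MALICIOUS, consumed
        if accumulated <= min_threshold:
            return OTHER, consumed
        # Between the two thresholds: read the next item and re-evaluate.
    return UNDECIDED, consumed


if __name__ == "__main__":
    # Toy "model": the mean of an item's values (purely hypothetical).
    mean = lambda item: sum(item) / len(item)
    print(sequential_filter([[0.7, 0.8], [0.9, 0.9], [0.95]], mean, 3.0, -3.0))
```

With these toy inputs, the first item contributes log 3 (about 1.10) and the second log 9 (about 2.20). The sum crosses the maximum threshold of 3.0 after two items, so the third item is never read. This early stopping is the property the sequential approach relies on to avoid receiving the whole stream.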
2. Description of the Related Art
With the recent spread of Internet use and increases in data transmission speed, real-time multimedia services have been increasing. Examples include audio on demand (AOD), video on demand (VOD), e-learning, and online media. Non-real-time multimedia services have also grown; in these, multimedia data is received through P2P or other Internet services, stored on a PC, and reproduced. Among these services, cyber education, online news, and online theaters have positive social, economic, and academic effects. However, malicious multimedia services operated for commercial purposes have a harmful influence on Internet users who are immature and lack judgment and self-control. Multimedia services in particular have greater influence and side effects on users than conventional text information services. Accordingly, a method of filtering malicious multimedia information is needed so that juveniles and users who do not want such services are not exposed to it.
The mainstream conventional methods of determining the maliciousness of multimedia services do not use the content of the services. Instead, they use additional text information, such as service names and descriptions in the header of a service. This text is compared against dictionaries of malicious words by keyword matching. Because these methods do not base the determination on the content of the multimedia service, they are easily evaded and are not very effective.
To solve this problem, a method has been introduced that receives the entire data of a multimedia service and extracts a feature from it, for example the proportion of a predetermined color present in the data. The method then analyzes that feature to determine the maliciousness of the multimedia service. Because all data of the multimedia service must be received before it can be analyzed, this method requires a large storage space and much time to determine maliciousness. Also, because it uses a very simple feature to determine maliciousness, its classification performance is low. Furthermore, the determination is made only after the malicious multimedia has already been fully exposed to the user.
To address this, another method receives data in real time and processes the data items one at a time to determine their maliciousness and filter them. However, this method also has low classification performance because it determines maliciousness from only a simple feature of the data available at the time of examination. Moreover, it cannot exploit the continuous features of the data received up to the time of examination. As a result, harmless data may be misclassified as malicious, or malicious data may be misclassified as harmless and exposed to users.