(a) Field of the Invention
The present invention relates to a data processing method and system that extracts interactive communication sequences (ICSs) among a plurality of users from communication records by using a variable time window. From the extracted sequences, it then extracts interactive communication sequence patterns (ICSPs), that is, interactive communication sequences that occur frequently.
This work was supported by the IT R&D program of MIC/IITA [2006-S-009-02: The Development of Wibro Service and Operating Standard].
(b) Description of the Related Art
As Internet-based communication services have grown, they have been used for criminal conspiracy, strangulation, and abetment, and such use has also been increasing. Unlike the existing public switched telephone network (PSTN), computer networks around the world are freely interconnected through the Internet, and messages are transferred by packet switching based on the standardized Internet Protocol (IP).
Because of this packet-switching characteristic, crime-related messages are mixed with ordinary messages on the Internet and are difficult to distinguish from them. In addition, the routing path of a message varies dynamically with network conditions (e.g., bandwidth, delay, hop count, communication cost, load, and reliability), and packet contents may be encrypted depending on the application.
Because of these characteristics, it is difficult for law enforcement agencies (LEAs), such as prosecutors and police, to identify criminal communications among Internet-based communications.
Many countries have enacted laws requiring communication records to be retained so that Internet-based communication services can be intercepted easily and lawfully. Retaining communication records means storing call detail records (CDRs) or Internet protocol detail records (IPDRs) for a predetermined period.
That is, the retained records identify telephone callers, call receivers, and calling and call-receiving dates; email senders, email receivers, and email sending and receiving dates; and the users who accessed web pages together with the access dates. Communication contents are generally excluded from the retained records. LEAs can use the stored records to investigate crimes.
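To make the retained fields concrete, the following Python sketch models one record and shows one way records could be chained into a who-contacted-whom sequence. The `CommRecord` type, its field names, the fixed `window`, and the rule that a record continues a chain when its sender is the previous record's receiver are all hypothetical assumptions for illustration; this is not the variable-time-window method of the invention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CommRecord:
    """One retained communication record (hypothetical schema; no contents)."""
    sender: str      # caller, email sender, or web user
    receiver: str    # call receiver, email receiver, or web page
    time: datetime   # calling / sending / access time
    service: str     # e.g. "call", "email", "web"


def interactive_chains(records, window):
    """Greedily chain records: a record extends a chain when its sender is the
    previous record's receiver and it occurs within `window` of that record.
    Uses a FIXED window purely for illustration."""
    ordered = sorted(records, key=lambda r: r.time)
    used = set()
    chains = []
    for i, start in enumerate(ordered):
        if i in used:
            continue
        chain, last = [start], start
        used.add(i)
        for j in range(i + 1, len(ordered)):
            nxt = ordered[j]
            if nxt.time - last.time > window:
                break  # records are time-sorted, so later ones are even farther away
            if j not in used and nxt.sender == last.receiver:
                chain.append(nxt)
                used.add(j)
                last = nxt
        chains.append(chain)
    return chains


def participants(chain):
    """Users along a chain, e.g. A -> B -> C."""
    return [r.sender for r in chain] + [chain[-1].receiver]
```

For example, with a 30-minute window, a call from A to B followed five minutes later by an email from B to C forms one chain (A, B, C), while a call from C to D three hours later starts a new chain:

```python
t0 = datetime(2008, 1, 1, 12, 0)
recs = [
    CommRecord("A", "B", t0, "call"),
    CommRecord("B", "C", t0 + timedelta(minutes=5), "email"),
    CommRecord("C", "D", t0 + timedelta(hours=3), "call"),
]
print([participants(c) for c in interactive_chains(recs, timedelta(minutes=30))])
```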
Methods that an LEA can use to extract desired information from communication records include frequent itemset mining, sequential pattern mining, and subgraph pattern mining.
Frequent itemset mining collects information on goods bought together by customers at a shop and finds buying patterns, that is, sets of goods commonly bought together by a large number of customers. Sequential pattern mining finds goods-buying sequences that appear in common among many customers in the purchase histories of a large number of customers. Subgraph pattern mining finds subgraphs that occur frequently in a graph-structured data set.
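The core of the first two schemes is support counting, which the following Python sketch illustrates in brute-force form. The function names and the `max_size` cap are assumptions for illustration; practical miners (e.g. Apriori, FP-growth, PrefixSpan) prune the search instead of enumerating every candidate, and subgraph mining is omitted here because it needs graph isomorphism checks.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent itemset mining: count every itemset of size
    1..max_size occurring in a transaction, keep those in >= min_support."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # order is irrelevant within a basket
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def is_subsequence(pattern, sequence):
    """True if pattern's items occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def sequence_support(pattern, histories):
    """Number of customers whose buying history contains pattern in order."""
    return sum(is_subsequence(pattern, h) for h in histories)
```

A short usage example:

```python
baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"]]
print(frequent_itemsets(baskets, min_support=2))

histories = [["tv", "dvd", "speaker"], ["tv", "speaker"], ["dvd", "tv"]]
print(sequence_support(["tv", "speaker"], histories))
```

Neither count looks at when consecutive events happen or whether one user is answering another, which is the kind of information an interactive communication sequence depends on.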
However, these conventional schemes are unsuitable for extracting interactive communication sequence patterns, that is, communication patterns that occur frequently in communication records, because they assume different data characteristics and access methods and do not address the timing constraints that must be considered for communication records.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.