The popularity of social networking has led to many network crimes, such as spreading pornographic messages or committing online fraud by using multiple accounts. Internet users may switch accounts when posting content, or gradually change the keywords they use, in order to evade tracking. These criminal behaviors may leave traces in all kinds of social networks.
Among existing tracking technologies, for example, a police network reconnaissance system clarifies the features of account groups before carrying out communications analysis. In the field of crime information systems, related interactions between accounts are identified according to synonyms determined from the common features of two words. For example, the system may retrieve the longest common substring of two words and calculate the ratio of the length of this common substring to the length of the longer of the two words, in order to confirm whether the ratio is greater than a first threshold value; it also checks whether the calculated edit distance of the two words is greater than a second threshold value. When both conditions are satisfied, the two words are determined to be synonymous.
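The two measures described above can be sketched as follows. This is a minimal illustration, not the reconnaissance system's actual implementation: the function names and threshold arguments are hypothetical, the edit distance is assumed to be the Levenshtein distance, and the direction of both comparisons simply follows the description as written.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ends at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed here; the text does not name a variant)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(b)]


def substring_ratio(a: str, b: str) -> float:
    """Common-substring length divided by the length of the longer word."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 0.0
    return longest_common_substring_length(a, b) / longer


def meets_both_conditions(a: str, b: str,
                          first_threshold: float,
                          second_threshold: int) -> bool:
    """Apply the two threshold checks exactly as stated in the description."""
    return (substring_ratio(a, b) > first_threshold
            and edit_distance(a, b) > second_threshold)
```

For instance, `substring_ratio("account", "accounts")` divides the shared substring length 7 by the longer length 8, and `edit_distance("account", "accounts")` counts the single inserted character.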
FIG. 1 shows a schematic view of a context processing system for deciding the theme of a sentence. The system comprises a theme vector processor 110 that decides the theme of an input sentence 112. This technology first analyzes the part of speech of each word in the input sentence 112, and then uses an ontology to analyze the sentence, including identifying the semantics of each word to form a semantic theme vector, and comparing the semantic theme vector of the sentence with the semantic theme vector of a training corpus 120, to determine the theme and the class of the sentence.
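The comparison of semantic theme vectors can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy ontology, the function names, and the use of cosine similarity as the comparison measure are not taken from the system of FIG. 1, and the part-of-speech analysis step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy ontology mapping a word to a semantic concept.
TOY_ONTOLOGY = {
    "loan": "finance", "bank": "finance", "transfer": "finance",
    "lottery": "prize", "prize": "prize", "win": "prize",
    "meet": "social", "friend": "social",
}


def theme_vector(sentence, ontology):
    """Count the semantic concepts of the words that the ontology knows."""
    words = sentence.lower().split()
    return Counter(ontology[w] for w in words if w in ontology)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_corpus_vectors(labeled_sentences, ontology):
    """Aggregate one theme vector per theme label from a labeled corpus."""
    vectors = {}
    for sentence, theme in labeled_sentences:
        vectors.setdefault(theme, Counter()).update(
            theme_vector(sentence, ontology))
    return vectors


def decide_theme(sentence, corpus_vectors, ontology):
    """Return the theme whose corpus vector is most similar to the sentence."""
    vec = theme_vector(sentence, ontology)
    return max(corpus_vectors,
               key=lambda theme: cosine(vec, corpus_vectors[theme]))
```

A real ontology would be far larger and would resolve word senses; the sketch only shows how a sentence vector is formed and matched against per-theme vectors derived from a training corpus.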
One technology for monitoring and analyzing crime-related information uses event identifiers or word searches to mark the sections of crime-related information that are of concern to the police, reminding investigators to monitor the original voice data of those sections. An event identifier, such as a bookmark of an event of concern to the police, contains keywords of the concerned event and the voice data of one or more specific persons.
One technology for structuring a dataset performs clustering based on the personal information provided by users, and uses a series of pre-defined question databases to identify groups of communication records with suspected fraudulent behavior. The structured attribute information of the users that serves as the basis for clustering may include, for example, name, phone number, or address. In one network crime investigation technology, when an Internet user is online, the source identification code of the user's online device is simultaneously matched, on a criminal investigation web site, against the telephone number and authorization code of that device, to verify the true identity of the user.
Another technology for searching for the multiple identities of criminals uses basic individual feature data, such as name, gender, height, and weight, to match multiple identities; it then matches the multiple identities of a criminal according to the individual's role in a crime database and links the relationships among the multiple identities. Yet another technology detects crime groups through personal name identification and related-rules analysis of documents, to identify groups of names (accomplices) that frequently co-occur.
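The idea of finding names that frequently co-occur in documents can be sketched with a simple pair count. This is a simplified stand-in rather than the related-rules analysis itself: the function name and the `min_support` parameter are hypothetical, and the names are assumed to have already been extracted from each document.

```python
from collections import Counter
from itertools import combinations


def frequent_name_pairs(documents, min_support):
    """Return the name pairs that appear together in at least min_support documents.

    documents: iterable of name collections, one collection per document.
    """
    counts = Counter()
    for names in documents:
        # Count each unordered pair at most once per document.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Pairs that pass the support threshold could then be merged into larger candidate groups; that step is left out of the sketch.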
Technologies for discovering and detecting multiple identities include techniques for authorship identification, online writeprint identification, authorship attribution identification, and the like. Among them, one technology for authorship identification uses the N-gram features of a person's written text to match multiple identities; one technology for authorship attribution identification matches multiple identities through N-gram features of variable length. Another technology for authorship identification adjusts the N-gram feature weights through local histograms to match multiple identities.
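As a rough illustration of matching identities through N-gram features, the sketch below builds character N-gram profiles of variable length and compares them. The function names, the choice of character (rather than word) N-grams, the cosine measure, and the default threshold are all assumptions for illustration; none of the cited technologies is reproduced here, and the local-histogram weighting is not modeled.

```python
import math
from collections import Counter


def ngram_profile(text, n_values=(2, 3)):
    """Count character N-grams of every length listed in n_values."""
    profile = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            profile[text[i:i + n]] += 1
    return profile


def profile_similarity(p, q):
    """Cosine similarity of two N-gram profiles (0.0 if either is empty)."""
    dot = sum(p[g] * q[g] for g in p if g in q)
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def likely_same_author(text_a, text_b, threshold=0.8, n_values=(2, 3)):
    """Flag two texts whose profiles are more similar than a chosen threshold."""
    similarity = profile_similarity(ngram_profile(text_a, n_values),
                                    ngram_profile(text_b, n_values))
    return similarity > threshold
```

In practice the threshold would have to be tuned on posts of known authorship, and much longer texts than a single post are needed for the profiles to be stable.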
In present and future networks, a technology for discovering suspicious account groups needs language model adaptation functions with one or more near-synonyms, so as to analyze the similarity in language style among the post contents of accounts and then discover suspicious account groups with high speech homogeneity. After discovering such a group of accounts, the technology may also be coupled with communications analysis techniques to view the interaction connections between the accounts. Such technology for discovering suspicious account groups remains an issue to be explored.