Spam is defined as the transmission of unsolicited electronic messages and is considered one of the biggest problems with the Internet. Due to the electronic and anonymous nature of the Internet, entities can transmit thousands of emails to different recipients as part of marketing certain goods and services, most of which are not wanted by the recipients. Most email users receive more spam emails than regular emails and may end up spending much of their time culling through the spam to read their regular emails. A primary reason spam is so prolific is that no solutions were developed before the problem of spam arose. Accordingly, most solutions to the spam problem have been band-aid solutions that are one or more steps behind the spamming technology.
With the increasing deployment of Internet telephony, it is expected that a similar form of spam will arise. This SPam over Internet Telephony (SPIT) or Voice over IP (VoIP) spam is defined as the transmission of unsolicited calls over Internet telephony.
It is already possible for entities to transmit unsolicited calls over the traditional Public Switched Telephone Network (PSTN), where telemarketers typically initiate such calls. These unsolicited calls are limited, however, because of the relatively high cost of a PSTN call. The costs associated with using Internet telephony to place calls are substantially lower than those associated with the PSTN, since spam software is much easier to program for the Internet Protocol and a spammer can multiplex multiple calls on a single line. It has been reported that IP-based SPIT is roughly three times cheaper to send than traditional circuit-based telemarketing calls. The cost per spam call of IP-based SPIT is low and is essentially independent of the signaling protocol (e.g., SIP or H.323). These cost factors will undoubtedly increase the demand for SPIT in the future, especially as VoIP technology becomes more prevalent. A further factor that may encourage the spread of SPIT is that its source is easy to hide, which makes it difficult to associate a SPIT call with an entity. This type of anonymity was not previously available with PSTN calls. It is also quite easy to use unauthorized access and hijacked machines, such as zombie networks, to send SPIT for free.
Accordingly, there is a strong need for SPIT prevention systems that can anticipate and block SPIT messages. Traditional spam filtering techniques may be employed to block some types of SPIT, but these techniques cannot be relied upon to stop any sophisticated type of SPIT. Most of the problems currently encountered by spam detection and management technologies will likely be faced by SPIT prevention technologies as well, including false positives, false negatives, computational complexity, and processing demands.
Current SPIT detection/prevention techniques focus mainly on the caller, on caller authentication, and on statistical metrics such as call rates, spacing between calls, and call duration. These techniques have their advantages and drawbacks. Content filtering of SPIT, on the other hand, has been largely neglected because of the difficulty of using speech recognition to find SPIT. Indeed, if a system tried to analyze speech content for SPIT through traditional speech recognition and keyword detection (e.g., Bayesian spam filters), it would be easy for spammers to throw off the speech recognition system for the following reasons: (1) speech recognition is difficult and inaccurate if the vocabulary size is large; (2) speech recognition is degraded by noise and channel effects, so spammers can make calls with background noise to fool speech recognition systems; (3) different speakers with different accents or speaker-class characteristics can be used to fool content-based analysis tools; and (4) a smart spammer might introduce random variations into pre-recorded audio by inserting music and/or silence, making the audio content of each message seem different while remaining understandable to the end listener.
In short, spammers can easily place calls with background noise, poor grammar, or varied accents, or add randomness to pre-recorded templates, and any of these will throw off traditional recognition systems.
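For illustration only, the caller-side statistical metrics mentioned above (call rate, spacing between calls, and call duration) can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular deployed system: the `Call` record, the `caller_metrics` and `looks_like_spit` functions, and the threshold values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Call:
    """Hypothetical call record for one caller (not from any real API)."""
    start: float     # call start time, in seconds
    duration: float  # call length, in seconds


def caller_metrics(calls, window_s):
    """Compute call rate, mean spacing, and mean duration over an observation window."""
    if not calls or window_s <= 0:
        raise ValueError("need at least one call and a positive window")
    starts = sorted(c.start for c in calls)
    # Spacing between consecutive call starts.
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return {
        "calls_per_hour": len(calls) * 3600.0 / window_s,
        "mean_gap_s": mean(gaps) if gaps else None,
        "mean_duration_s": mean([c.duration for c in calls]),
    }


def looks_like_spit(metrics, max_calls_per_hour=30.0, min_mean_gap_s=60.0):
    """Flag a caller whose rate is high and whose calls are tightly spaced.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    gap = metrics["mean_gap_s"]
    return (
        metrics["calls_per_hour"] > max_calls_per_hour
        and gap is not None
        and gap < min_mean_gap_s
    )
```

A check of this kind looks only at calling behavior and never at the audio itself, which is why it is unaffected by the noise, accent, and randomization tricks described above, and also why it says nothing about the content of a call.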