Churn refers generally to the movement of customers from one service provider to another. As used herein, a “churner” is a customer of a service who unsubscribes from the service or otherwise ceases using the service. Churn is a serious problem in many industries, including the telecom industry, because customer churn leads to diminished profits for the telecom operator and, perhaps, increased business for a competitor. Moreover, in some respects, it is more important for a telecom operator to retain its existing customers than to sign up new customers (i.e., existing customers may be more profitable than new customers given the costs involved in attracting new customers). With the continuous addition of new telecom operators in the market, and with the availability of mobile number portability service, the number of churners is increasing at an alarming rate. Hence, telecom operators would like to identify potential churners so that improved services or other incentives may be offered to these customers in an effort to retain them.
Several methods have been proposed for predicting churners in different domains. Existing approaches to churn prediction rely on attribute-based analysis, which has proven to be relatively time consuming because the process has to be re-run every time the underlying dataset (“churn data”) is fed or updated. Moreover, the proposed classification models have difficulty with the skewness of the churn data. The churn data tends to be imbalanced because churners tend to be far fewer in number (on the order of 2%-5%) than non-churners. Because of this class imbalance, a high accuracy value obtained from a churn prediction model may provide little or no useful result in practice, since a model can reach high accuracy while identifying few or none of the actual churners. Other semi-supervised approaches present in the market are not suited to the purpose because of their inherent inability to scale well to huge datasets. Also, certain aspects of these algorithms are difficult to parallelize, which poses a problem in applying them to a telecom dataset. Another drawback is that certain attribute-based analyses were found to be specific to a particular dataset, such as data from a developed country, but the same model failed for a different dataset, such as data from a developing country.
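The class-imbalance concern described above can be illustrated with a small worked example. The following is a minimal sketch only, assuming a hypothetical population of 1,000 customers of whom 3% are churners (a value chosen from within the 2%-5% range mentioned above) and a trivial hypothetical "model" that labels every customer a non-churner; none of these figures or names come from any actual dataset or prior-art system.

```python
# Hypothetical figures for illustration only: 1,000 customers, 3% churners.
n_customers = 1000
n_churners = 30

# Ground-truth labels: 1 = churner, 0 = non-churner.
labels = [1] * n_churners + [0] * (n_customers - n_churners)

# A trivial "model" that predicts non-churner for every customer.
predictions = [0] * n_customers

# Accuracy: fraction of customers whose label is predicted correctly.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / n_customers

# Recall on the churner class: fraction of actual churners that are identified.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = true_positives / n_churners

print(f"accuracy = {accuracy:.2%}")
print(f"churner recall = {recall:.2%}")
```

Under these assumptions the trivial model is correct for 970 of 1,000 customers yet identifies none of the 30 churners, which is why accuracy alone may be a misleading measure for churn data of this kind.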