Every year, original equipment manufacturers (OEMs) in the automotive industry lose large amounts of money due to recalls of vehicles with safety-related issues. As long as a fault is unknown to the OEMs, the affected vehicles continue to be produced and shipped to the market. By the time the problem becomes known to the OEMs, the number of affected vehicles on the market is therefore already very high, and so are the costs of recalling and repairing them. OEMs are therefore seeking a quick and reliable way of detecting problematic spare parts in order to reduce the costs of recalls.
The article “What's buzzing in the blizzard of buzz? Automotive component isolation in social media postings” by Alan S. Abrahams, Jian Jiao, Weiguo Fan, Alan Wang and Zhongju Zhang, published in Decision Support Systems, Vol. 55 (2013), pp. 871-882, discusses the analysis of social media data in the automotive field. The article identifies a set of automotive smoke words that have a higher relative prevalence in defect postings than in non-defect postings, and in safety-issue postings than in other postings. This set of smoke words is used to automatically identify social media posts that might describe defects. Additionally, the authors appear to classify the postings according to the category of component affected by the defect (e.g. air conditioning, transmission, etc.). However, although the article appears to detect the category to which a defect belongs, it fails to recognize the nature of the defect and the correlated terms that would lead to identifying its potential cause. Furthermore, the article appears to rely on word stemming techniques, which do not seem suitable for fault analysis and failure detection: stemming tends to lump many different words into a single root, which may mislead automotive product managers. This is already evident in the article's own results, where some of the stemmed terms are so distorted that the original raw text has to be consulted to understand which words they originally were. Such manual checking is not feasible when the volume of data is overwhelming.
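The notion of a word having a higher relative prevalence in defect postings than in other postings can be illustrated with a minimal sketch. It is not the procedure used by Abrahams et al.; the function names, the simple regular-expression tokenizer, the document-frequency definition of prevalence and the smoothing constant are all hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Hypothetical illustration of relative-prevalence scoring;
# not the method described by Abrahams et al. (2013).

def doc_frequencies(posts):
    """Fraction of posts in which each (lower-cased, unstemmed) word occurs at least once."""
    counts = Counter()
    for post in posts:
        counts.update(set(re.findall(r"[a-z']+", post.lower())))
    n = len(posts)
    return {word: c / n for word, c in counts.items()}

def smoke_word_scores(defect_posts, other_posts, smoothing=1e-3):
    """Ratio of each word's prevalence in defect posts to its prevalence in other posts.

    The small smoothing constant (an assumption) avoids division by zero for
    words that never occur in the other posts.
    """
    p_defect = doc_frequencies(defect_posts)
    p_other = doc_frequencies(other_posts)
    return {
        word: (p + smoothing) / (p_other.get(word, 0.0) + smoothing)
        for word, p in p_defect.items()
    }

def top_smoke_words(defect_posts, other_posts, k=5):
    """Return the k words with the highest relative prevalence in defect posts."""
    scores = smoke_word_scores(defect_posts, other_posts)
    return sorted(scores, key=scores.get, reverse=True)[:k]

if __name__ == "__main__":
    defect = ["brakes failed on the highway", "airbag did not deploy", "brakes squeal and failed"]
    other = ["love the new paint color", "the highway mileage is great", "paint looks shiny"]
    print(top_smoke_words(defect, other, k=2))  # the two top words are "brakes" and "failed"
```

In this sketch no stemming is applied, so words such as "failed" and "failure" would remain distinct terms; a stemmer would merge them into one root, which is the kind of lumping criticised above. Counting each word at most once per post keeps a single verbose posting from dominating the scores.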
US2015/0058344 A1 relates to methods and systems for monitoring and analysing social media data, and proposes a method for classifying words or sentences, after tokenization, on the basis of sentiment analysis. US2015/0058344 A1 appears to represent a generic approach to social data analysis, but it is not able to detect correlations between terms, which could be used to determine specific meanings or cause-effect relationships.
For data mining in general, reference is made to the textbook “Data Mining with Rattle and R” by Graham Williams.