Software development generally involves programmers writing textual source code or other software artifacts. Software built from a corpus of source code can exhibit bugs or defects, which are deviations from the specification or expected behaviour of the software (e.g., inaccurate results, unexpected failures, or cosmetic differences from a user-interface specification). Debugging, that is, adjusting the program so that it no longer exhibits those bugs, requires locating the portion(s) of the source code that engender the buggy behaviour.
Fast and accurate localization of software defects continues to be a difficult problem, since defects can emanate from a large variety of sources and can often be intricate in nature. It is therefore desirable to provide a search engine that can retrieve software artifacts relevant to a given bug. Examples of software artifacts include source code files and subroutines such as procedures, functions, or methods of objects. Various Information Retrieval (IR) approaches have been proposed towards that end. In IR-based bug localization, a query describing some defective behavior of the software is run against the code base in order to rank the software artifacts in that code base. The expectation is that the highly ranked artifacts will be those most likely to have caused the defective behavior.
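The query-against-code-base ranking described above can be illustrated with a minimal sketch. It assumes a plain TF-IDF vector-space model with cosine similarity, which is only one of many possible IR models and is not taken from any of the cited works. The function names (`tokenize`, `rank_artifacts`), the three toy Java files, and the query text are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and break on non-alphanumerics."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def rank_artifacts(query, artifacts):
    """Return (score, name) pairs, best first, using TF-IDF cosine similarity.

    `artifacts` maps an artifact name (e.g. a file path) to its text.
    This is an illustrative model, not the method of any cited paper.
    """
    docs = {name: Counter(tokenize(body)) for name, body in artifacts.items()}
    n_docs = len(docs)

    # Document frequency: number of artifacts containing each term.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    # Smoothed inverse document frequency (always positive).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    def vectorize(counts):
        # Terms that never occur in the code base carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(Counter(tokenize(query)))
    scored = [(cosine(q_vec, vectorize(c)), name) for name, c in docs.items()]
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical toy code base and bug report.
artifacts = {
    "Parser.java": "class Parser { void parseDate(String input) { /* parse date string */ } }",
    "Cache.java": "class Cache { void evictEntry(String key) { /* evict stale cache entry */ } }",
    "Logger.java": "class Logger { void writeLine(String message) { /* write log line */ } }",
}
ranking = rank_artifacts("crash when parsing a date string with wrong format", artifacts)
for score, name in ranking:
    print(f"{score:.3f}  {name}")
```

In this toy example the file mentioning "date" is ranked first because that term is rare across the code base, whereas "string" occurs in every file and therefore discriminates poorly. Note that "parsing" in the query does not match "parse" in the code, since the sketch performs no stemming.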
For example, Ashok et al. (B. Ashok, J. Joy, H. Liang, S. Rajamani, G. Srinivasa, and V. Vangala, “DebugAdvisor: a recommender system for debugging,” in Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. ACM, 2009, pp. 373-382) use relationship graphs to retrieve source files and prior bugs in response to what they refer to as “fat queries” that include structured and unstructured data.
In an IR-based retrieval framework that leverages prior evolutionary information concerning the development of the software, Kagdi et al. (H. Kagdi, M. Gethers, D. Poshyvanyk, and M. Collard, “Blending conceptual and evolutionary couplings to support change impact analysis in source code,” in Reverse Engineering (WCRE), 2010 17th Working Conference on, October 2010, pp. 119-128) describe carrying out change impact analysis by exploiting the conceptual and evolutionary couplings that exist between the different software entities. Nguyen et al. (A. T. Nguyen, T. T. Nguyen, J. Al-Kofahi, H. V. Nguyen, and T. Nguyen, “A topic-based approach for narrowing the search space of buggy files from a bug report,” in Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on, November 2011, pp. 263-272) describe BugScout, an automated approach based on Latent Dirichlet Allocation that narrows down the search space while taking into account the defect proneness of the source files.
Reference is made to U.S. Pat. No. 7,685,091, U.S. Pat. No. 8,185,544, and U.S. Pat. No. 8,589,411. Reference is also made to:
S. Rao and A. Kak, “Retrieval from software libraries for bug localization: a comparative study of generic and composite text models,” in Proceeding of the 8th working conference on Mining software repositories. ACM, 2011, pp. 43-52.
T. Zimmermann, P. Weissgerber, S. Diehl, and A. Zeller, “Mining version histories to guide software changes,” IEEE Transactions on Software Engineering, pp. 429-445, 2005.
N. Nagappan, A. Zeller, T. Zimmermann, K. Herzig, and B. Murphy, “Change bursts as defect predictors,” in Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on. IEEE, 2010, pp. 309-318.