A feature is a functional requirement of a program that produces an observable behavior which users can trigger.
Feature location is the activity of identifying the source code elements (e.g., methods) that implement a feature. It is typically performed by applying techniques (systems and methods) that identify an initial location in the source code corresponding to a specific functionality. Such techniques are referred to herein as Feature Location Techniques (FLTs).
Source code may be analyzed at different levels of granularity, e.g., classes/files, methods or functions, and statements (e.g., basic blocks, lines of code, variables). At whichever level is used, the located code element is also called a feature location. Existing FLTs for determining an initial location in source code include dynamic, static, textual, historical, and hybrid techniques.
Identifying an initial location in the source code that corresponds to a specific functionality or feature is challenging. One problem with existing approaches is that they do not consider the internal behavior of each method, which leads to a loss of precision or recall.
For example, FIG. 1A shows a first example excerpt of a C++ program fragment that includes a method 10 named "sellHolding( )" taken from the source code of a legacy system. This sellHolding( ) method 10 takes three parameters (a userID string, a symbol, and an integer index) and, at method step 12, assigns a value to a variable "success" based on a call to another method, "removeHolding". FIG. 1B shows a second instance of a method with the same name, sellHolding( ) 10′, found in a more recent version of the same legacy software. This sellHolding( ) method 10′ takes an additional parameter (a userID string, a symbol, an integer index, and an integer quantity) and, at method step 12′, also assigns a value to the variable "success", but based on a call to a completely different method, "reduceHolding". Thus, because the two versions share the same name but exhibit different internal behaviors, a technique that locates this feature without considering internal behavior may return an inconsistent location, resulting in a loss of precision or recall.
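The inconsistency between the two sellHolding( ) versions can be sketched in C++. This is a minimal, hypothetical model: the `MethodVersion` struct and the `sameByName`/`sameByBehavior` functions are illustrative assumptions, not part of any existing FLT. Each method version is reduced to its name, parameter names, and the methods it calls, and a name-only comparison is contrasted with one that also compares signatures and callees.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of one version of a method: its name,
// its parameter names, and the set of methods it calls (its callees).
struct MethodVersion {
    std::string name;
    std::vector<std::string> paramNames;
    std::set<std::string> callees;
};

// Name-only matching: treats two versions as the same feature location
// whenever their method names agree, ignoring internal behavior.
bool sameByName(const MethodVersion& a, const MethodVersion& b) {
    return a.name == b.name;
}

// Behavior-aware matching (an illustrative assumption, not a specific FLT):
// additionally requires identical parameter lists and identical callee sets.
bool sameByBehavior(const MethodVersion& a, const MethodVersion& b) {
    return a.name == b.name && a.paramNames == b.paramNames &&
           a.callees == b.callees;
}

// sellHolding( ) as described for FIG. 1A: three parameters, calls removeHolding.
MethodVersion sellHoldingOld() {
    return {"sellHolding", {"userID", "symbol", "index"}, {"removeHolding"}};
}

// sellHolding( ) as described for FIG. 1B: adds quantity, calls reduceHolding.
MethodVersion sellHoldingNew() {
    return {"sellHolding", {"userID", "symbol", "index", "quantity"},
            {"reduceHolding"}};
}
```

Under this simplified model, `sameByName` reports the two sellHolding( ) versions as the same location, while `sameByBehavior` distinguishes them because their parameter lists and callees (removeHolding versus reduceHolding) differ. Comparing callee sets is only one possible proxy for a method's internal behavior.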
Because large systems are composed of hundreds of applications, middleware products, etc., such a system may contain thousands of components and interfaces and millions of lines of code, far exceeding what can be handled manually. The difficulty arises not only from the complexity of the architecture but also from the fact that such systems, e.g., banking systems, may change rapidly.
Thus, effectively identifying feature locations in large systems, such as legacy computer systems, with high precision and recall remains a challenge.