A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to a system providing methods for facilitating development and maintenance of software applications or systems, with particular emphasis on a compiler-assisted method for refactoring of software systems.
2. Description of the Background Art
Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a “computer program,” direct the operation of the computer. Naturally, the computer must understand the instructions it receives before it can undertake the specified activity.
Owing to their digital nature, computers essentially understand only “machine code,” i.e., the low-level, minute instructions for performing specific tasks: the sequence of ones and zeros that the computer's microprocessor interprets as specific instructions. Since machine language or machine code is the only language computers actually understand, all other programming languages represent ways of structuring human language so that humans can get computers to perform specific tasks. While it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available programming languages. The most widely used programming languages are the “high-level” languages, such as C, Pascal, or more recently Java. These languages allow data structures and algorithms to be expressed in a style of writing that is easily read and understood by fellow programmers.
A program called a “compiler” translates these instructions into the requisite machine language. In the context of this translation, the program written in the high-level language is called the “source code” or source program. The ultimate output of the compiler is a compiled module such as a compiled C “object module,” which includes instructions for execution ultimately by a target processor, or a compiled Java class, which includes bytecodes for execution ultimately by a Java virtual machine. A Java compiler generates platform-neutral “bytecodes,” an architecturally neutral, intermediate format designed for deploying application code efficiently to multiple platforms.
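The translation from source code to a compiled Java class can be observed from Java itself. The following minimal sketch assumes a JDK (not just a JRE) on Java 11 or later and uses the standard `javax.tools.ToolProvider` API to compile a one-line class; the class name `CompileDemo` and its helper methods are hypothetical, chosen only for illustration. It then reads the header of the resulting class file, which always starts with the magic number `0xCAFEBABE` followed by the minor and major format versions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical helper: compiles a source string with the JDK's compiler and inspects the output. */
class CompileDemo {
    /** Writes the source to a temporary directory, compiles it, and returns the .class bytes. */
    static byte[] compile(String className, String source) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler; a JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("compile-demo");
            Path sourceFile = dir.resolve(className + ".java");
            Files.writeString(sourceFile, source);
            // Equivalent to running: javac -d <dir> <dir>/<className>.java
            int status = javac.run(null, null, null, "-d", dir.toString(), sourceFile.toString());
            if (status != 0) {
                throw new IllegalStateException("compilation failed with status " + status);
            }
            return Files.readAllBytes(dir.resolve(className + ".class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a big-endian unsigned 16-bit value at the given offset. */
    private static int u2(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF);
    }

    /** The first four bytes of every class file (0xCAFEBABE). */
    static int magic(byte[] classFile) {
        return (u2(classFile, 0) << 16) | u2(classFile, 2);
    }

    /** The class file major version, stored at offset 6 (e.g. 55 for Java 11). */
    static int majorVersion(byte[] classFile) {
        return u2(classFile, 6);
    }
}
```

For instance, `CompileDemo.compile("Hello", "class Hello { int answer() { return 42; } }")` returns bytes whose `magic` is `0xCAFEBABE`; the major version depends on the JDK that performs the compilation.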
Java bytecodes are designed to be easy to interpret on any machine. Bytecodes are essentially high-level, machine-independent instructions for a hypothetical or “virtual” machine that is implemented by the Java interpreter and runtime system. The virtual machine, which is actually a specification of an abstract machine for which a Java language compiler generates bytecode, must be available for the various hardware/software platforms on which an application is to run. The Java interpreter executes Java bytecode directly on any machine for which the interpreter and runtime system of Java have been ported. In this manner, the same Java language bytecode runs on any platform supported by Java.
Conventionally, creation of a software program or system includes creation of individual source code modules. This approach simplifies program development by dividing functionality available in the program into separate source modules. When multiple source modules are employed for creating a program, interdependencies between the individual modules often exist. Program logic in one module can, for instance, reference variables, methods, objects, and symbols imported from another module. By the very same token, that module can also export its own methods, objects, and symbols, making them available for use by other modules.
“Visual” development environments, such as Borland's JBuilder™, are the preferred application development environments for quickly creating production applications. Such environments are characterized by an integrated development environment (IDE) providing a form painter, a property getter/setter manager (“inspector”), a project manager, a tool palette (with objects which the user can drag and drop on forms), an editor, and a compiler. In general operation, the user “paints” objects on one or more forms, using the form painter. Attributes and properties of the objects on the forms can be modified using the property manager or inspector. In conjunction with this operation, the user attaches or associates program code with particular objects on screen (e.g., a button object); the editor is used to edit program code which has been attached to particular objects. After the program code has been developed, the compiler is used to generate binary code (e.g., Java bytecode) for execution on a machine (e.g., a Java virtual machine).
Although visual development environments enable applications to be created quickly, problems remain with the development, implementation, and maintenance of production applications. One problem is that as a large software program or application evolves over time, its initial design is commonly lost as features that were not in the original specification are added. One way of anticipating such changes is to design everything for maximum flexibility. However, this often leads to unnecessary complexity in the software application, as it is unknown beforehand which parts of the application will require this additional flexibility. Irrespective of how well a system is initially designed or developed, the system is typically modified from time to time during its useful life to improve performance, to accommodate changing needs, to make the system easier to maintain, or for various other reasons. However, when adding features not envisioned in the original specification or otherwise modifying the system, one must track how particular terms are defined and used by the system in order to develop the modifications properly and to avoid introducing errors. Specifically, because of interdependencies between modules, when a particular source module is modified (e.g., edited by a developer), the developer must ensure that such modifications are compatible with the other modules of the program. A particular concern is, therefore, that a given change might “break” the system because the change is incompatible with other, dependent modules of the system.
“Refactoring” is the practice of making structured changes to software applications or systems that add the desired flexibility while keeping the functionality of the system the same. Refactoring involves taking small, well-defined individual steps that can be applied in succession to yield a more significant change in the application. For example, a developer may wish to perform a “rename refactoring” to change the name of a particular module (e.g., a class name in a Java program). To make this change, the developer must locate the definition of this class (i.e., the source code for the class) as well as all uses of the class in other portions of the system. In the case of a class name in a Java program, the class name is typically used not only for declaring variables, but also for constructing instances (or objects) of that class and for accessing static members of the class (i.e., class variables). Another example of refactoring may involve moving a specified class to a new package (referred to as “move refactoring”).
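To make the kinds of use listed above concrete, here is a small hypothetical example; the names `Account`, `Bank`, `openCount`, and `openTwo` are invented for illustration. A rename refactoring of `Account` has to update the class declaration and its constructor name as well as every commented use in `Bank`.

```java
/** Hypothetical class that a developer wants to rename, e.g. to CustomerAccount. */
class Account {
    static int openCount = 0; // class variable, reached through the class name

    Account() {
        openCount++;
    }
}

class Bank {
    static int openTwo() {
        Account first = new Account();  // use 1: variable type; use 2: constructor call
        Account second = new Account(); // the same two kinds of use again
        return Account.openCount;       // use 3: static member access via the class name
    }
}
```

Even in this tiny program, the name `Account` appears five times inside `Bank`, in three distinct roles, and each occurrence must change together with the declaration.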
Refactoring of a system may be small or extensive, but even small changes can introduce errors or “bugs” into the system. Accordingly, refactoring must be done correctly and completely in order to be effective. Good refactoring requires a mechanism for quickly and accurately identifying definitions and usage of a given symbol in a plurality of source files. The “symbols” that may be involved in refactoring include, for example, package names, class names, interfaces, methods, fields, variables, and properties. Identification of definitions and usage of a given symbol enables refactoring to be performed responsibly and durably so that no bugs are introduced and no behavior is changed beyond the desired improvements in features, performance, and/or maintainability.
The simplest approach for handling refactoring is to use a textual search and replace. However, this approach is both slow and inaccurate, because refactoring involves more than a simple search-and-replace task. References must all be accounted for and properly handled, and patterns must be recognized so that, for instance, overloaded names are handled correctly. When a rename refactoring is performed on an overloaded class name, the class's new name must be reflected in the class declaration, in every instance of that class, and in every other reference to that class. However, the new name must be reflected only in the target class, not in the other classes that share its original name or in their declarations, instances, references, methods, and the like. For instance, a class name may also be used as part of a method name in another class. A simple search and replace cannot be used, because one must understand the context in which each instance of the name or symbol is used in the various portions of a large system. All told, a textual search and replace is a very inefficient tool for such a complex operation, as it requires a user to manually review each usage of the target symbol (e.g., class name) to determine whether or not the symbol should be changed in that particular instance.
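The failure modes of a purely textual rename are easy to reproduce. The sketch below is illustrative only (the class `TextualRename` and its two methods are hypothetical, not part of any tool): `naive` replaces every substring, while `wholeWord` uses a regular-expression word boundary. Neither strategy knows anything about Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Two text-only rename strategies, neither of which understands Java syntax or scoping. */
class TextualRename {
    /** Replaces every occurrence of the substring, even inside longer identifiers. */
    static String naive(String source, String oldName, String newName) {
        return source.replace(oldName, newName);
    }

    /** Replaces whole words only; still blind to string literals, comments, and packages. */
    static String wholeWord(String source, String oldName, String newName) {
        String regex = "\\b" + Pattern.quote(oldName) + "\\b";
        return source.replaceAll(regex, Matcher.quoteReplacement(newName));
    }
}
```

Renaming `Account` to `Ledger` in a source containing `class AccountManager`, a method `getAccount()`, and a string literal `"Account"` shows both problems: `naive` also produces `LedgerManager` and `getLedger()`, and `wholeWord` leaves those alone but still rewrites the string literal to `"Ledger"`, and would equally rewrite an unrelated class named `Account` in another package.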
A slightly more elaborate approach involves combining the textual search with some language knowledge in the form of a source analysis tool. This type of source analysis tool may enable a user to at least narrow down possible candidates for replacement. Another approach is to use a source analysis tool to build an additional cross-reference index of the usage of symbols in the source code. Unfortunately, building an additional cross-reference index requires a separate pass to analyze the structure of the source code, before performing the refactoring. In addition, a problem with both of these approaches is that building this type of automated source analysis tool for a particular programming language largely involves recreating the compiler for the language in order to understand the context in which a particular symbol or token is used in a program. However, recreating the compiler does not take advantage of the native compiler that is available for the language. In addition, the process of attempting to recreate a compiler creates the potential for introducing errors as a result of differences between the newly created compiler and the native compiler that was used in development and implementation of the system.
A better approach is sought for refactoring which leverages a proven compiler that is certified for the language and is used in program development and implementation. The present invention fulfills this and other needs.
The following definitions are offered for purposes of illustration, not limitation, in order to assist with understanding the discussion that follows.
Bytecode: A virtual machine executes low-level virtual machine code instructions called “bytecodes.” Both the Sun Microsystems Java virtual machine and the Microsoft .NET virtual machine provide a compiler to transform the respective source program (i.e., a Java program or a C# program) into virtual machine bytecodes.
Compiler: A compiler is a program which translates source code into binary code to be executed by a computer. The compiler derives its name from the way it works, looking at the entire piece of source code and collecting and reorganizing the instructions. Thus, a compiler differs from an interpreter which analyzes and executes each line of code in succession, without looking at the entire program. A Java compiler translates source code written in the Java programming language into bytecode for the Java virtual machine.
Interpreter: An interpreter is a module that alternately decodes and executes every statement in some body of code. A Java runtime interpreter decodes and executes bytecode for the Java virtual machine.
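The alternating decode-and-execute cycle in this definition can be sketched in a few lines. The toy interpreter below is an assumption-laden illustration, not how a production Java virtual machine works: it supports only a handful of opcodes, whose numeric values are borrowed from the JVM instruction set (`iconst_0` through `iconst_5`, `bipush`, `iadd`, `isub`, `imul`, `ireturn`), and omits verification, local variables, method calls, and everything else. The class name `TinyInterpreter` is hypothetical.

```java
/**
 * Toy stack-machine interpreter. Opcode values are borrowed from a small subset
 * of the JVM instruction set; everything else a real JVM does is omitted.
 */
class TinyInterpreter {
    static final int ICONST_0 = 0x03; // iconst_0 .. iconst_5 are 0x03 .. 0x08
    static final int ICONST_5 = 0x08;
    static final int BIPUSH = 0x10;   // push the following signed byte as an int
    static final int IADD = 0x60;
    static final int ISUB = 0x64;
    static final int IMUL = 0x68;
    static final int IRETURN = 0xAC;  // return the int on top of the stack

    static int run(byte[] code) {
        int[] stack = new int[16]; // operand stack
        int sp = 0;                // number of values currently on the stack
        int pc = 0;                // index of the next instruction
        while (true) {
            int opcode = code[pc++] & 0xFF; // decode the next instruction ...
            if (opcode >= ICONST_0 && opcode <= ICONST_5) { // ... then execute it
                stack[sp++] = opcode - ICONST_0;
                continue;
            }
            switch (opcode) {
                case BIPUSH:
                    stack[sp++] = code[pc++]; // byte operand is sign-extended to int
                    break;
                case IADD:
                    sp--;
                    stack[sp - 1] += stack[sp];
                    break;
                case ISUB: // value1 - value2, where value2 is on top
                    sp--;
                    stack[sp - 1] -= stack[sp];
                    break;
                case IMUL:
                    sp--;
                    stack[sp - 1] *= stack[sp];
                    break;
                case IRETURN:
                    return stack[sp - 1];
                default:
                    throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }
}
```

For example, the bytes `05 06 60 10 0A 68 AC` (`iconst_2`, `iconst_3`, `iadd`, `bipush 10`, `imul`, `ireturn`) evaluate (2 + 3) * 10, so `run` returns 50.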
Java: Java is a general purpose programming language developed by Sun Microsystems. Java is an object-oriented language similar to C++, but simplified to eliminate language features that cause common programming errors. Java source code files (files with a .java extension) are compiled into a format called bytecode (files with a .class extension), which can then be executed by a Java interpreter. Compiled Java code can run on most computers because Java interpreters and runtime environments, known as Java virtual machines (VMs), exist for most operating systems, including UNIX, the Macintosh OS, and Windows. Bytecode can also be converted directly into machine language instructions by a just-in-time (JIT) compiler. Further description of the Java Language environment can be found in the technical, trade, and patent literature; see e.g., Gosling, J. et al., “The Java Language Environment: A White Paper,” Sun Microsystems Computer Company, October 1995, the disclosure of which is hereby incorporated by reference.
Refactoring: Refactoring is the process of making small, structured changes to improve the internal structure of an existing software system without changing its observable behavior. For example, if a user wants to add new functionality to a software system, he or she may decide to refactor the program first to simplify the addition of new functionality and to make the program easier to maintain over time. A software system that undergoes continuous change, such as having new functionality added to its original design, will eventually become more complex and can become disorganized as it grows, losing its original design structure. Refactoring of a software system facilitates building on an existing program in a structured manner that avoids introducing new bugs and problems into the system.
A system providing an improved method for compiler-assisted refactoring of a software application is described. Upon receiving a request for refactoring of a software application (i.e., changing a given symbol or element of the application) from a developer or user, the binary files of the application are parsed to identify those binary files containing references to the given symbol or element. The source files of the identified binary files are then retrieved and fed into a compiler. The compiler is used to generate a list of all uses of the given symbol or element in the software application. This list includes not only the text name of the symbol or element, but also type information and position information regarding its location(s) in the source file. Based upon the list, changes are applied to the software application.
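For Java, one way to obtain such a list from the same compiler used in development is the JDK's Compiler Tree API (`com.sun.source`, in module `jdk.compiler`). This is an assumption made for illustration: the description above does not name a particular compiler interface, and the classes `StringSource` and `SymbolUseFinder` are hypothetical. The sketch parses and attributes in-memory sources with javac, then walks each tree and keeps every class declaration or identifier that javac resolved to the target class, reporting its file, line, column, role, and element kind.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.LineMap;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical in-memory source file, so the sketch needs no files on disk. */
class StringSource extends SimpleJavaFileObject {
    private final String code;

    StringSource(String path, String code) {
        super(URI.create("string:///" + path), JavaFileObject.Kind.SOURCE);
        this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return code;
    }
}

/** Hypothetical helper: lists every declaration and use that javac resolves to one class. */
class SymbolUseFinder {
    /** Returns entries such as "/acme/AccountManager.java:3:5 reference (CLASS)". */
    static List<String> findUses(Map<String, String> sources, String qualifiedName) {
        List<JavaFileObject> units = new ArrayList<>();
        sources.forEach((path, code) -> units.add(new StringSource(path, code)));
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of("-proc:none"), null, units);
        List<String> uses = new ArrayList<>();
        try {
            Iterable<? extends CompilationUnitTree> parsed = task.parse();
            task.analyze(); // attribution binds each identifier to a typed symbol
            Trees trees = Trees.instance(task);
            for (CompilationUnitTree unit : parsed) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        report(node, "declaration");
                        return super.visitClass(node, p);
                    }

                    @Override
                    public Void visitIdentifier(IdentifierTree node, Void p) {
                        report(node, "reference");
                        return super.visitIdentifier(node, p);
                    }

                    @Override
                    public Void visitMemberSelect(MemberSelectTree node, Void p) {
                        report(node, "reference"); // e.g. a fully qualified acme.Account
                        return super.visitMemberSelect(node, p);
                    }

                    // Matches on the resolved symbol, never on the identifier's text.
                    private void report(Tree node, String role) {
                        Element element = trees.getElement(getCurrentPath());
                        if (element instanceof TypeElement
                                && ((TypeElement) element).getQualifiedName().contentEquals(qualifiedName)) {
                            long pos = trees.getSourcePositions().getStartPosition(unit, node);
                            LineMap lines = unit.getLineMap();
                            uses.add(unit.getSourceFile().getName() + ":" + lines.getLineNumber(pos) + ":"
                                    + lines.getColumnNumber(pos) + " " + role + " (" + element.getKind() + ")");
                        }
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return uses;
    }
}
```

Given a class `acme.Account` and a class `acme.AccountManager` that declares a field of type `Account`, constructs an `Account`, reads `Account.count`, and returns `Account` from a method named `getAccount`, the finder reports the one declaration and four references, while `AccountManager`, `getAccount`, and any string literal `"Account"` are not matched. Applying the rename at the reported positions is left out of this sketch.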
When source code changes are made to a software system, the system and method of the present invention may be utilized to locate dependencies on such source code changes. When a source code change is received, the binary modules of the software system are parsed to determine which binary modules contain dependencies on the source code change. The corresponding source files of those binary modules are then retrieved, and the compiler is used to identify each dependency present in the retrieved source code.
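For Java class files, the binary-parsing step can be approximated by scanning each file's constant pool, where references to other classes appear as internal names (for example `acme/Account`) or inside field and method descriptors (for example `(Lacme/Account;)V`). The following sketch follows the constant-pool layout in the Java Virtual Machine Specification; the class `ClassFileScanner` and its methods are hypothetical, and it is meant only as a cheap pre-filter that selects which source files to hand to the compiler, not as a complete dependency analysis.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical minimal constant-pool reader, used to pre-select dependent class files. */
class ClassFileScanner {
    /** Collects every CONSTANT_Utf8 string in a class file's constant pool. */
    static Set<String> utf8Constants(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor_version
            in.readUnsignedShort(); // major_version
            int count = in.readUnsignedShort(); // valid pool indices are 1 .. count-1
            Set<String> strings = new HashSet<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8: u2 length + modified UTF-8, exactly the format readUTF expects
                        strings.add(in.readUTF());
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        in.skipBytes(2);
                        break;
                    case 15: // MethodHandle: u1 kind + u2 index
                        in.skipBytes(3);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); // Integer, Float, member refs, NameAndType, Dynamic, InvokeDynamic
                        break;
                    case 5: case 6: // Long and Double: 8 bytes, and they occupy two pool slots
                        in.skipBytes(8);
                        i++;
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            return strings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the pool names the class (internal form, e.g. "acme/Account") or a descriptor mentioning it. */
    static boolean references(byte[] classFile, String internalName) {
        String descriptor = "L" + internalName + ";";
        for (String s : utf8Constants(classFile)) {
            if (s.equals(internalName) || s.contains(descriptor)) {
                return true;
            }
        }
        return false;
    }
}
```

One caveat: javac inlines compile-time constants (for example `static final` primitive or `String` fields), so a class that only reads such a constant from `Account` may carry no reference to `acme/Account` in its constant pool. A tool built along these lines would need to account for that case separately.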