1. Field of the Invention
The present invention relates, in general, to a method and system for use with data-processing systems utilizing programs. Specifically, the present invention relates to a method and system for use with data-processing systems utilizing programs written in mid-level programming languages, such as C++, which allow direct control of memory allocation and access. Yet more specifically, the present invention relates to a method and system for use with data-processing systems utilizing programs written in mid-level programming languages, such as C++, which allow direct control of memory allocation and access, where such method and system increase the accuracy, efficiency, and reliability of such programs.
2. Description of Related Art
A data-processing system is composed of one or more computers, peripheral equipment, and software that together perform data processing. A computer is a programmable functional unit that: is typically controlled by internally stored programs and uses common storage for all or part of such programs and for all or part of the data necessary for the execution of such programs; executes user-written or user-designated programs; performs user-designated data manipulation, including arithmetic operations and logic operations; and can execute programs that modify themselves during their execution. A digital computer operates on discrete data represented as strings of binary digits. Furthermore, a computer may be a stand-alone unit or may consist of several connected units. A program consists of a sequence of instructions suitable for processing by a computer, wherein such processing may include the use of an assembler, a compiler, an interpreter, or a translator to prepare the program for execution, as well as to execute it.
A program is written in a programming language. A programming language is any artificial language that can be used to define a sequence of instructions that can ultimately be processed and executed by a computer. Defining what is or is not a programming language can be problematic, but general usage implies that a programming language requires a translation process, performed by another program such as a compiler, from source code expressed in the programming language to the machine code that a computer, or data-processing system, needs to work with. Thus, English and other natural languages are typically ruled out, although some subsets of English are used and understood by some fourth-generation languages.
There are many different types of programming languages. Programming languages are typically viewed as belonging to one of three different conceptual classes: a low-level language class, a high-level language class, or a mid-level language class.
In a computer, the instructions contained within a program are interpreted and carried out by a processing unit such as a central processing unit. A processing unit is composed of one or more integrated circuits that process coded instructions and perform a task. The set of possible coded instructions for the central processing unit is called its instruction set. The instructions that a processing unit actually executes are expressed in machine language, which is typically generated by one or more of the following: an interpreter, an assembler, a compiler, or a linker. A low-level programming language is one that stays close to this machine language, and the most prevalent example of a low-level programming language is assembly language. Whereas machine language is binary coding that is machine specific, assembly language is a mnemonic representation of machine language intended to be more easily understandable by humans. Assembly language nevertheless remains closely tied to the instruction set of a particular processor; typically, assembly language is translated, or converted, to the machine language appropriate to that processor by an intermediary, machine-specific computer program known as an assembler.
Human programmers do not think or reason in terms of logical 1 and logical 0; consequently, human programmers often find it exceedingly difficult to program utilizing machine language or assembly language. Instead, human programmers tend to think or reason in terms of natural (i.e., human) language or a combination of natural and mathematical language. In light of this realization, high-level programming languages have been created.
A high-level programming language is a programming language whose concepts and structures are convenient for human reasoning, such as the following: COBOL (common business-oriented language) which is a high-level programming language, based on English, that is used primarily for business applications; FORTRAN (formula translation) which is a high-level programming language based on English and mathematical language, primarily designed for applications involving numeric computations such as scientific, engineering, and mathematical applications; and Pascal, a high-level general-purpose programming language based on English and mathematical language. A high-level programming language allows a human programmer to write instructions for a computer in a way which is much more analogous to human reasoning than is possible with a low-level programming language (such as assembly language). This is accomplished by employing multiple layers of translation programs which successively transform a program written in a high-level programming language into an equivalent set of machine language instructions which a processing unit can understand and execute.
While high-level programming languages relieve the human programmer from the burden of dealing directly with assembly or machine code, there is a cost associated with such relief: with a high-level programming language, a human programmer is no longer able to directly access the true logical structure of the processor in use. That is, because the machine code equivalent of the programmer's high-level program is produced via multiple layers of translation programs, the programmer is effectively "screened off" from accessing the true logical structure of the processor directly. Ordinarily, such "screening off" does not pose a problem; however, there are instances, such as programs which are very memory-intensive and computationally intensive (e.g., voice recognition programs), wherein the programmer would find it very advantageous to be able to create, access, control, and adjust certain processor and/or memory locations directly. On the other hand, even in such situations the programmer does not desire to return to the tedium and lack of ease-of-use associated with assembly/machine language.
Mid-level programming languages have been created to fill the gap between low-level and high-level programming languages. That is, mid-level programming languages have the "look and feel" of high-level programming languages in that they appear and read in a fashion more similar to ordinary human reasoning than low-level programming languages do. However, mid-level programming languages differ from high-level programming languages in that they allow relatively direct access to and manipulation of logical structures within the purview of the processing unit. This makes mid-level programming languages both powerful and dangerous from a programming perspective.
The mid-level programming languages are powerful in that they allow direct access to and control of logical structures (e.g., memory addresses), thereby allowing a programmer to make more efficient use of computational resources. However, such mid-level programming languages are dangerous in that they will allow a programmer to make logical mistakes without returning the error message that a high-level language would return if the same mistake were made. Furthermore, the "compromise" nature of mid-level programming languages gives rise to several unique dangers and inefficiencies.
Two good examples of mid-level programming languages are C and C++. C is a programming language considered by many to be more a machine-independent assembly language than a high-level language (and hence its characterization here as a "mid-level" language) which has the mid-level features discussed above. C++ is an object-oriented version of the C programming language which also contains the mid-level features discussed above. As used herein, the term "C" is intended to refer to both C and its C++ incarnation. C can be utilized to illustrate concrete examples of the foregoing described possible logical mistakes, unique dangers, and inefficiencies associated with mid-level programming languages.
One particular area in which C can be utilized to illustrate the foregoing described possible logical mistakes, unique dangers, and inefficiencies associated with mid-level programming languages is that related to the way C handles arrays.
C allows the programmer to specify an array of a defined specific size (e.g., an array with ten rows and six columns), which results in the reservation of memory sufficient to contain the array. However, due to its built-in power, C will also allow the programmer to attempt to specify and access an element of the array supposedly in the seventh row and tenth column of the array. Because the array has only six columns, such an element is clearly outside of the defined parameters of the array. In a high-level language, such an attempt to access a nonexistent array element would typically result in an immediate error message; in C, however, no error message will be generated, and in fact data may be returned in response to the attempt to access such a spurious array element. Thus, it is apparent that C will allow the programmer to commit logical errors that a high-level programming language would not permit.
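The bounds problem described above can be sketched in C++ as follows. This is a minimal illustration only, not the method of the present invention: the names `Grid`, `kRows`, `kCols`, and `checked_read` are hypothetical, and the checked path relies on the standard `std::array::at` member function, which throws `std::out_of_range` for an invalid index. The unchecked access appears only in a comment, because executing it is undefined behavior.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical dimensions matching the example in the text:
// ten rows and six columns.
constexpr std::size_t kRows = 10;
constexpr std::size_t kCols = 6;
using Grid = std::array<std::array<int, kCols>, kRows>;

// Unchecked form (deliberately not executed here): with a built-in array
// declared as `int a[10][6]`, the expression `a[6][9]` (seventh row, tenth
// column, counting from zero) compiles without any diagnostic being
// required, even though column index 9 lies outside the six declared
// columns. The behavior is undefined; in practice some unrelated data is
// often returned silently.

// Checked read: stores the element in `out` and returns true, or returns
// false when either index is outside the declared bounds.
bool checked_read(const Grid& g, std::size_t row, std::size_t col, int& out) {
    try {
        out = g.at(row).at(col);  // at() validates each index in turn
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With this sketch, a call such as `checked_read(grid, 6, 9, value)` reports failure instead of returning spurious data, at the cost of an index comparison on every access.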
C allows a programmer to size arrays dynamically, but directly supports this only for arrays with one dimension; that is, array sizes cannot be set dynamically for arrays having more than one dimension. Consequently, when the size of an array of dimension two or higher depends upon a value calculated at run time, the usual recourse is to reserve more memory than the array is expected to need. Such reservation can obviously give rise to programming inefficiencies in the event that the reserved space is not needed.
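One common workaround, shown here only to make the limitation concrete, is to place a two-dimensional index on top of a single dynamically sized one-dimensional buffer. The sketch below is an assumption-laden illustration rather than the method of the present invention; the class name `DynamicTable` and its members are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: a two-dimensional table whose row and column counts
// are both chosen at run time, backed by one contiguous one-dimensional
// buffer so that no worst-case reservation is needed.
class DynamicTable {
public:
    DynamicTable(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0) {}

    // Maps (row, col) onto the flat buffer in row-major order, rejecting
    // indices that fall outside either dimension.
    int& at(std::size_t row, std::size_t col) {
        if (row >= rows_ || col >= cols_) {
            throw std::out_of_range("DynamicTable index out of range");
        }
        return cells_[row * cols_ + col];
    }

    // Number of elements actually reserved: exactly rows * cols.
    std::size_t cell_count() const { return cells_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> cells_;
};
```

A table constructed as `DynamicTable t(3, 4)` reserves exactly twelve elements, whereas a fixed declaration sized for the largest anticipated case would reserve that maximum regardless of the values computed at run time.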
C stores data in arrays without compression. Consequently, if the data being stored in an array is relatively sparse (meaning that there are relatively few nonzero data elements), storing the data is inefficient, since most of the memory occupied by the array holds elements with relatively little information content. Furthermore, if the array in question is multidimensional (i.e., two-dimensional or higher), such inefficiencies are exacerbated.
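To make the cost of dense storage concrete, the following sketch stores only the nonzero elements of a two-dimensional table, keyed by their coordinates. It is a minimal illustration under stated assumptions, not the method of the present invention: the class name `SparseTable` is hypothetical, and `std::map` is chosen purely for brevity. Each stored entry carries bookkeeping overhead of its own, so a layout of this kind pays off only when the data is sufficiently sparse.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical sparse table: only nonzero cells occupy storage; every
// other cell is implicitly zero.
class SparseTable {
public:
    void set(std::size_t row, std::size_t col, int value) {
        const auto key = std::make_pair(row, col);
        if (value == 0) {
            cells_.erase(key);  // zero is the implicit default, so drop the entry
        } else {
            cells_[key] = value;
        }
    }

    int get(std::size_t row, std::size_t col) const {
        const auto it = cells_.find(std::make_pair(row, col));
        return it == cells_.end() ? 0 : it->second;
    }

    // Number of cells that actually consume storage.
    std::size_t stored_cells() const { return cells_.size(); }

private:
    std::map<std::pair<std::size_t, std::size_t>, int> cells_;
};
```

A dense 1000-by-1000 array of integers reserves one million elements no matter how many are nonzero, whereas this sketch stores one entry per nonzero element.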
As has been noted, C stores data in arrays without compression. Furthermore, each element stored within an array usually occupies the standard number of bits utilized to represent either a number (e.g., a 32-bit integer) or a character (e.g., an 8-bit character). If the arrays contain data which is limited in range, for example a multidimensional array of flags wherein the flags are restricted to the values zero and one, then it is very inefficient to use the standard number of bits for this purpose, since each flag carries only a single bit of information.
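The inefficiency for range-limited data can be illustrated by packing one flag per bit instead of one flag per integer. The sketch below is an assumption-based illustration, not the method of the present invention; the class name `FlagGrid` and its members are hypothetical, and bounds checking is omitted for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed grid of zero/one flags: eight flags share one
// byte. Indices are not range-checked in this sketch.
class FlagGrid {
public:
    FlagGrid(std::size_t rows, std::size_t cols)
        : cols_(cols), bits_((rows * cols + 7) / 8, 0) {}  // round up to whole bytes

    void set(std::size_t row, std::size_t col, bool value) {
        const std::size_t i = row * cols_ + col;  // row-major bit position
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
        if (value) {
            bits_[i / 8] |= mask;
        } else {
            bits_[i / 8] &= static_cast<std::uint8_t>(~mask);
        }
    }

    bool get(std::size_t row, std::size_t col) const {
        const std::size_t i = row * cols_ + col;
        return ((bits_[i / 8] >> (i % 8)) & 1u) != 0;
    }

    // Number of bytes actually used to hold all flags.
    std::size_t bytes_used() const { return bits_.size(); }

private:
    std::size_t cols_;
    std::vector<std::uint8_t> bits_;
};
```

For the ten-row, six-column case, the sixty flags fit in eight bytes, whereas an array of sixty 32-bit integers would occupy 240 bytes.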
In light of the foregoing, it is apparent that a need exists for a method and system, for use with mid-level programming languages such as C and C++, which will preserve the flexibility of such mid-level programming languages while preventing and/or eliminating the aforementioned logical errors, unique dangers, and inefficiencies arising from the inherent flexibility of such mid-level programming languages.