Field of Application
The present invention relates to the field of the management of data structures through electronic processing.
The invention relates to methods and systems for improved processing of operations on data structures containing large amounts of data, and particularly to methods and systems for a processing unit for databases, in the context of databases of large size and complexity.
Description of the Prior Art
In the technical field of the management of databases, the execution of operations on structures of coded data is an essential aspect.
In fact, as is well known, and as summarized herein in simplified terms, a set of physical-level 0/1 bits is generally coded into bytes. Given a certain code, each byte, e.g., an 8-bit byte, takes a precise meaning: a character, or a meta-character, or an indicator of separation and/or punctuation, or an integer, or a decimal, and so on.
In turn, multiple bytes can form strings of bytes, which may represent “semantic” structures at a higher level, i.e., words, sentences, up to entire files of coded data.
Furthermore, strings of bytes, of any of the above-mentioned types, can be inserted into information fields, suitably stored, which in turn may be mutually linked by relationships and/or logic connections, for example forming the so-called “database records”.
Thus, the structure set forth above defines an articulated “data structure”, having multiple levels, incorporating a plurality of correlations between information fields, each of which contains strings of coded bytes, each of which is formed, at the lowest level, and closest to the physical level, by a plurality of bits.
Therefore, in the context of database management, given a flow of bits, it is necessary to know the code with which they are coded, in order to identify the bytes, the strings of bytes, and their meaning; hence, it is necessary to know the type of data structure characterizing such data, in order to be able to determine how to operate thereon.
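The layering described above, and the dependence of the meaning on the code, may be illustrated by a minimal Python sketch. The sample bit flow, the function name `bits_to_bytes`, and the sample record are hypothetical and are given purely by way of example; they are not taken from any known system.

```python
def bits_to_bytes(bit_flow: str) -> bytes:
    """Group a flow of '0'/'1' symbols into 8-bit bytes."""
    if len(bit_flow) % 8 != 0:
        raise ValueError("bit flow length must be a multiple of 8")
    return bytes(int(bit_flow[i:i + 8], 2) for i in range(0, len(bit_flow), 8))

# Lowest level: a flow of bits (hypothetical sample).
bit_flow = "010000010110010001100001"

# Next level: the same bits grouped into three bytes.
raw = bits_to_bytes(bit_flow)

# The meaning of the bytes depends on the code that is assumed.
as_text = raw.decode("ascii")            # read as ASCII characters
as_integer = int.from_bytes(raw, "big")  # read as one unsigned big-endian integer

# Higher level: the decoded string placed in an information field of a record.
record = {"id": 1, "name": as_text}
```

In this sketch the same three bytes yield either a short string or a single integer, depending solely on the code chosen to interpret them.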
A number of data structures are known and commonly used in databases, including, e.g., ordered lists, multi-dimensional tables, data arrays, graphs, and so on.
In the context of a “data structure”, as defined above, there is the need to carry out “operations on data structures”: this term commonly denotes operations and/or functions relating to the management of the relationships among data. Usually, these operations are not arithmetic operations, nor, more generally, mathematical and/or numerical calculation operations. Instead, the operations on data structures more frequently relate to the order or the mutual correlation of data within the data structure: for example, the extraction or insertion of selected elements of a list or a table, the reordering of a list or of a table row or column according to a wide variety of criteria, the reconfiguration of the connections between graph nodes, and so on.
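By way of illustration only, the kinds of operations listed above may be sketched in Python as follows. The sample records, field names, and graph are hypothetical; the sketch merely shows that such operations concern the order and correlation of data rather than numerical calculation.

```python
# Hypothetical sample data; all names and values are assumptions.
records = [
    {"id": 3, "name": "Rossi", "city": "Milan"},
    {"id": 1, "name": "Bianchi", "city": "Rome"},
    {"id": 2, "name": "Verdi", "city": "Milan"},
]

# Extraction of selected elements of a list.
in_milan = [r for r in records if r["city"] == "Milan"]

# Insertion of a new element.
records.append({"id": 4, "name": "Neri", "city": "Turin"})

# Reordering according to a chosen criterion (here, the "id" field).
by_id = sorted(records, key=lambda r: r["id"])

# Reconfiguration of the connections between graph nodes:
# the edge a -> b is replaced by the edge a -> c.
graph = {"a": {"b"}, "b": {"c"}, "c": set()}
graph["a"].discard("b")
graph["a"].add("c")
```

None of these steps computes a numerical result; each one selects, places, or relinks existing elements.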
Currently, a number of known, commercially available systems are capable of carrying out these operations and/or functions; such systems take the form of, e.g., hardware-software platforms for managing databases.
Such known systems contemplate the use of hardware processing units of a standard type (for example, general-purpose CPUs, suitably provided with volatile and non-volatile memories), and they provide that the above-mentioned “operations on data structures” are generally performed by suitable specific software programs. Such software programs operate above the hardware physical level, and are further based on “lower level” software layers (such as, for example, an operating system supervising the operation of the machines in which the database is implemented): in such a sense, they can be defined as “high-level software programs”. A number of “software packages” of such a type exist for database management; such “software packages” are loaded and executed by processing units, e.g., standard computers.
The versatility, simplicity, and cost-efficiency of such solutions, which employ standard hardware resources, have been and are often recognized as an advantage in many possible applications; this explains the above-mentioned well-established trend towards a “software” implementation of the different operations to be carried out on data structures.
However, known solutions of this type also have some drawbacks.
In fact, to execute the requested operations on the data structures, the above-mentioned software programs implement corresponding algorithms, which perform the requested operations based on a plurality of basic steps executable by a general-purpose processor. Therefore, each operation on a data structure is carried out by multiple basic/elementary steps and through multiple interaction steps with the processor and the memories thereof.
As the size of the databases to be managed increases, the amount of hardware resources needed to perform the operations while keeping the response time constant also increases, approximately linearly. Conversely, with constant resources, the response time tends to increase linearly.
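The growth in elementary steps may be illustrated by a minimal Python sketch, under the assumption of a plain sequential scan with no index or other optimization. The function names `extract_matching` and `make_records`, as well as the sample data, are hypothetical; actual database software may behave differently.

```python
def extract_matching(records, key, value):
    """Select matching records by a plain scan, counting elementary comparisons."""
    steps = 0
    selected = []
    for rec in records:
        steps += 1  # one elementary comparison per record examined
        if rec[key] == value:
            selected.append(rec)
    return selected, steps

def make_records(n):
    """Build n hypothetical records with an 'id' and a 'group' field."""
    return [{"id": i, "group": i % 10} for i in range(n)]

# The same extraction on a small and on a ten-times-larger set of records.
_, steps_small = extract_matching(make_records(1_000), "group", 3)
_, steps_large = extract_matching(make_records(10_000), "group", 3)
```

Under this assumption, the step count equals the number of records, so a tenfold increase in size requires tenfold more elementary steps on the same general-purpose processor.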
This involves a problem of scalability of the hardware resources, i.e., the need to significantly increase the number of resources to be provided, in terms of general-purpose CPUs and memory resources, and therefore, in practice, in terms of the number of computers that are needed.
The above-mentioned drawbacks and problems, which could be considered marginal in the field of small- or medium-sized databases, are becoming pressing and limiting for current databases (for example, databases of administrative bodies, banks, press archives, and so on), whose size by now exceeds the order of magnitude of terabytes, and which involve increasingly sophisticated and complex relationships among their records.
The near-future trend towards even larger and more complex databases makes the currently available solutions prospectively less and less efficient and satisfactory.
Consequently, a significant need emerges for improving the way to manage very large-sized databases, in order to save resources, while keeping the performance constant, or to ensure a more satisfactory performance, while keeping the resources constant, or anyhow to find a better performance-resources trade-off, compared to the currently available solutions.
In fact, the above-mentioned currently available solutions do not allow meeting such a need to achieve an improvement in the management of very large-sized databases.
In this regard, it should be noted that the possible use of mathematical co-processors, known per se in other technical fields relating to electronic processing, would not solve the above-mentioned problems either. In fact, mathematical co-processors are capable of optimizing the execution of merely arithmetic and/or calculation operations, or of operations requiring intensive calculations; on the contrary, as already illustrated, the operations on data structures at issue herein are operations of a different type, i.e., operations that are neither arithmetic nor implementable by simple sequences of basic arithmetic calculations.