1. Field of Invention
This invention relates to a distributed structure and a parallel processing technology in computer technology and, in particular, to a distributed-structure-based parallel module structure and parallel processing method.
2. Discussion of Related Art
Over the last 30 years, substantial research funding and many qualified people have been invested in the fields of distributed structure and parallel processing of computers. However, many key problems still have not been solved.
The expected target of computer structure research is industrialization. This is particularly clear from the achievements of the single-CPU computer field, which owe much to its industrialization. For this target, the generality of a parallel computer system is far more important than its speed. On the way to generality, many wide gaps have not been bridged so far.
Despite the many difficulties, work continues. Today, research on distributed systems still aims at the distributed operating system, and parallel processing schemes focus on three types of structure: symmetric multiprocessor (MP), massively parallel processing system (MPP), and network of workstations (NOW).
FIG. 1A is a schematic diagram of shared memory structure in previous art.
In previous art, the hardware layer of MP adopts a shared-memory-oriented structure. Therefore, from the programming point of view, the common abstraction of MP is that a plurality of processors 100 shown in FIG. 1A are connected to a shared memory 102 through an internal connection network 101 having various structures, wherein the shared memory 102 may be distributed or centralized in structure.
The advantage of shared memory is that each processor can access it directly with the instructions of a single processor. The recent development trend is the implementation of unified addressing for distributed memory, in which read/write operations take different amounts of time depending on the memory location, known as non-uniform memory access (NUMA).
In a data-sharing model, the advantage of direct access by instruction is that no distributed memory structure parameters appear in programming. However, the data-sharing model does not provide the user with a principle of modeling in an MP hardware environment, and does not provide a direct solution for the program synchronization of N processors in general distributed computing.
FIG. 1B is a schematic diagram of a network of workstations (NOW) structure.
A plurality of computers in a distributed structure are connected by a communication interface. Their common configuration is that a plurality of program flows 103 are connected in a form defined by a communication mode 104, with the connection controlled by the communication program in each program flow 103. They have the advantage of a distributed structure, but modeling remains difficult for the user.
Regardless of whether data sharing or message passing is used, a multi-machine system has difficulty with data processing and synchronization between a plurality of program flows 100 (or 103). The structures shown in FIG. 1A and FIG. 1B cannot solve the problems of synchronization and modeling at the hardware layer. In fact, no common understanding of the key difficulty in distributed systems and parallel processing has been reached up to now. One viewpoint holds that the difficulties of distributed systems are real-time ability and schedulability, and that the difficulties of parallel processing are parallel programming and parallel compiling. However, parallel programming and parallel compiling are also problems for distributed systems, and real-time ability and schedulability are also needed in parallel processing. Generality, which is the most important problem, remains a research target for the distant future, and no one has the courage to face it today.
Like some experts, the inventor considers that “the previous research on parallel processing does not expose the main contradiction (or reach the real nucleus of the problem)”, and that corresponding work should be undertaken.
Summarizing the previous arts shown in FIG. 1A and FIG. 1B, on the premise of a multi-machine architecture, parallel processing must be characterized by the following two features:
(1) The control of system running must be a plurality of independent asynchronous-programs; and
(2) The data must be transferred in real-time among a plurality of programs when N distributed programs are performed in parallel (these data are known as data in parallel module, or “modular data” for short).
According to the features described, at least 3 problems must be tackled for implementing a proper model operation and programmability in parallel processing as follows:
(1) the validity of a transferred modular data item has to be determined before the program flow uses it;
(2) configuration parameters have to be generated for the data in the hardware transmission path between processors, but they are not parameters of the application model and should not appear in the programming interface; and
(3) in system operation, not only must the synchronization of programs be implemented, but deadlock must also be prevented.
Only when the three problems described above have a solution can parallel processing efficiency (that is, speed) be discussed.
For decades, these three problems (referred to briefly as the “3* problem” below) have been obvious and easy to analyze. However, an optimum scheme for a total solution of the 3* problem has still not been suggested.
The 3* problem is caused by the fact that almost all hardware provides a control interface for N programs and a naked processor comprised of “N independent programs+interconnection structure for multi-machine”. This interface is substantially intended to tackle the multi-machine problem within the field of the Turing machine.
A basic configuration of Turing machine comprising a Turing machine tape (named state tape below) and a read-write head is shown in FIG. 1C.
The Turing machine aims at state processing. Definitions including “step-by-step”, “left shift or right shift”, “contents of process state square”, etc. are provided only to the read-write head, and the reader is required to accept them. Based on the concept of the process “state tape” of the Turing machine in previous art, the “algorithm” discipline was developed and contributed greatly to the advance of computer technology. Thus, part of the functions of the read-write head, such as the way it moves, naturally belongs to the algorithm.
The Turing machine is a very important abstraction model in computer science, with a clear description of the “state element” in computer science and engineering. However, it was a product of the budding period of computers, when the problems of multi-machine architecture did not exist. Its limitation appears when it enters a multi-machine environment (distributed and parallel processing).
The feature of “N independent programs+interconnection structure between processors” corresponds to one state tape shared by a plurality of read-write heads, and does not depart from the field of the Turing machine. The invention recognizes that, from the viewpoint of multiple programs, the effect eventually corresponds to “one state tape shared by a plurality of read-write heads”, no matter whether shared memory or message passing is used in previous art.
In some previous arts, it is expected that the synchronization of read-write heads can be coordinated by examining the state-tape information so that the 3* problem can be resolved. The invention recognizes that the 3* problem is a real problem, but it is neither the whole problem nor the nucleus of the problem. The deeper problem cannot be described and summarized by the Turing machine, and a breakthrough is needed past the limitation of the Turing machine and the interface of “N programs+interconnection structure between processors” that results from the hardware structure. The invention recognizes that it is necessary to reconsider the problems in a wider range and at a deeper layer, and to introduce another element for computer science and engineering.
To simplify the description of a new computer element related to the invention, some new concepts and discoveries will be introduced herein.
Firstly, the concepts of “flow” and “sequence” will be introduced. A flow represents a dynamic, serial, independently driven configuration, and a sequence represents a static order. Generally, a flow results from the hardware structure, and a sequence results from the model analysis. Based on this definition, a computer structure can be abstracted into a flow structure, and the Petri net (which implies a state description) and the like are recognized as representations in the sequence aspect. The match between a flow and a sequence (which may be referred to as a flow-sequence) indicates that a sequence is allowed to be loaded on a flow, and the flow-sequence runs in accordance with the loaded sequence under the driving of the flow. The computer program is, in fact, the result of matching ‘flow’ with ‘sequence’; that is, it is represented with instructions, and it is also a flow-sequence with high controllability at the hardware layer (not all flow-sequences can be converted into a program).
A flow, a sequence and the match between them are three concepts. The various previous concepts in the computer field were developed on the basis of the matching of ‘flow’ with ‘sequence’.
The basic thought of the invention is a match made from the ‘sequence layer’ (representing the model) toward the structure and from the ‘flow layer’ (representing the structure) toward the model; after these two matchings, the invention discovers the sequence-net, which can match both the flow layer and the sequence layer. Therefore, the same structure and related symbols (for example, Sc and Si) indicate the sequence of a model in the model environment (without the flow concept of computer structure), indicate a flow in the computer structure environment (without the sequence concept of the model), and indicate a flow-sequence (in the flow-sequence matching state) in the application environment. In this specification, the symbols Sc, Si and the like may therefore seem to be used confusingly. However, this is a feature of the invention, and it is necessary for describing the thoughts of the invention. Thus, in the description of the sequence-net computer, these symbols should be read in the flow-sequence state, which combines flow with sequence.
When a model can be implemented on a single processor, the algorithm is solvable. The same model may be impossible to implement on a multi-machine system, but it cannot be concluded for this reason that the algorithm is unsolvable. Therefore, the relation between model and computer structure does not depend on the algorithm. In this invention, “adaptability” is introduced to represent the relation between model and computer structure. A good solution of adaptability means the achievement of generality of the computer structure. Thus, adaptability is the key research target for industrialization.
Two fundamental laws discovered in the research of flow and sequence aimed at adaptability are incorporated directly herein.
(1) An N-sequence parallel-processing model in fact comprises N+1 or more sequences, wherein the N+1st sequence is a global sequence.
(2) A flow from the structure is required to match each sequence from the model respectively to form a flow-sequence; this is the flow-sequence pairing law.
Furthermore, an inference can be drawn as follows:
N+1 sequences can be combined into one sequence (thus, a flow-sequence pairing in single processor is formed).
The laws and the inference show the following relation between research steps:
The first step of adaptability research is the permission of modeling. Algorithm research can begin only after modeling is permitted. According to the concepts of flow and sequence, the match between flow and sequence is the premise of the permission of modeling. Therefore, the quality of the match between flow and sequence is one of the key factors affecting adaptability.
In addition, they show that the research environments of the single-processor and the multi-machine differ as follows:
The single-processor has a natural match between flow and sequence, and enters algorithm research with the match condition satisfied. On the other hand, distributed and parallel processing enters algorithm research without a sufficient match condition.
From the research described above, an entirely new viewpoint emerges: there are two elements, “flow-sequence” and “state”, in computer science.
The “state” element is familiar through the Turing machine. Ever since the beginning of computers, the state element has been a very important factor. The Turing machine gives the most abstract and succinct summary of the state element. Research on the state element has been deepened by the algorithm for the purpose of running a model in a computer. In fact, the computer background for state element research is the single-processor structure, owing to its natural match property.
The “flow-sequence” is not familiar in form, but it has been perceived by everyone in the art, only without being expressed. The definition of the read-write head in the Turing machine implies a pair of matched flow and sequence that represents the feature of the single-processor structure. However, for the separation of flow and sequence, the implementation of sequence description, and the deepening of sequence research, the Turing machine is powerless, and a creative research approach and description method are required.
The invention recognizes that the flow-sequence is an independent layer in a computer architecture, and that the “flow-sequence element” is as important as the state element. The research of the flow-sequence element will be described by the sequence-net below, and one of the research targets is the adaptability between model and computer structure. The invention, as the first research on the flow-sequence element, yields an entirely new result. In the previous art, the flow-sequence element has always remained in the realm of necessity (an objective law that must be obeyed even without awareness of it). Thus, any operational application example of parallel processing inevitably comprises, unconsciously, a matching process for the flow-sequence element (but cannot yield a good result).
The invention recognizes that none of the previous theories allows the “flow and sequence” element to display its talent, and that a layer and method capable of describing the problem must be reconstructed. The invention will start from a most general flowchart.
An abstraction of a typical parallel processing flow chart is shown in FIG. 1D.
This is a structural representation abstracted from the kind of flowchart that often appears on a desk or a workshop wall. In FIG. 1D, the interconnection relation between flow steps is indicated in token mode, and it is taken as a typical parallel module and description method in the invention. According to the invention, the flowchart shows a plurality of parallel sequences in a model and the relations among them, and a flow in the flowchart is referred to as a sequence. The flowchart comprises three sequences S1-S3 and four sets of combined tokens 108. The sequences Si (i=1, 2, 3) (the specific values of index i are omitted in the description below) are three independent sequences, and the interconnection functions among them are indicated by the combined tokens T1, T2, T3 and T4. A combined token 108 consists of a plurality of tokens, including a source token 105 ⊙ and a destination token 106 ①, both inserted into the corresponding positions in the respective sequences Si. Among the sequences, token transmission path structures 109 are required by the combined tokens 108.
The parallel run of the three sequences functions as follows:
A source token value=1 (valid) is generated when a sequence Si passes a source token 105. The value (=1) of the source token 105 is sent to a destination token 106. The token value must be tested when a sequence Si passes a destination token 106. If the tested token value=0 (invalid), the sequence Si waits; if the tested token value=1 (valid), the sequence Si continues running.
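The token rule just described can be illustrated with a minimal sketch. It is not the structure of the invention or of any cited art; it only mimics the wait-until-valid behavior in software, under the assumption that each sequence Si runs as a thread. The names `CombinedToken`, `pass_source`, `pass_destination` and `run_demo` are hypothetical, and the sketch models a combined token with a single source and a single destination.

```python
import threading


class CombinedToken:
    """Hypothetical model of one combined token (one source, one destination)."""

    def __init__(self, name):
        self.name = name
        # Token value: Event unset = 0 (invalid), Event set = 1 (valid).
        self._valid = threading.Event()

    def pass_source(self):
        # A sequence passing the source token generates value = 1.
        self._valid.set()

    def pass_destination(self):
        # A sequence passing the destination token tests the value
        # and waits while it is still 0.
        self._valid.wait()


def run_demo():
    t1 = CombinedToken("T1")
    log = []

    def s1():
        log.append("S1: step a")
        t1.pass_source()

    def s2():
        t1.pass_destination()
        log.append("S2: step b")

    # S2 is started first on purpose: it must still wait for S1.
    th2 = threading.Thread(target=s2)
    th1 = threading.Thread(target=s1)
    th2.start()
    th1.start()
    th1.join()
    th2.join()
    return log
```

Because the step of S2 is only logged after the token becomes valid, `run_demo()` records the step of S1 before the step of S2 regardless of which thread starts first.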
For a model, the run coordination process of the several sequences Si described above is entirely clear to human understanding. In fact, it is the original form of the parallel module in previous art, even though it is not clearly defined there.
The sequence Ms of a module that exists in the interconnection of sequences Si is shown in FIG. 1E.
Based on the component features of the sequences Si and the tokens, a set of directional lines can be found in FIG. 1D. They run zigzag from left to right through the sequences and the combined tokens 108, branching and passing, to indicate the running order among the sequences, and are referred to as the sequence of the (parallel) module Ms 110, as shown in FIG. 1E.
Obviously, there are two different kinds of parameters in an Ms. One is an arrow line overlapping each sequence Si in the horizontal direction, representing the sequence of the combined tokens 108 during running. The other is an arrow line overlapping the internal transmission path structure in the vertical direction, which is generated by the multi-machine structure. The key content of Ms is the sequence of combined tokens, and Ms is independent of each Si. However, in a distributed structure environment, Ms is superimposed section by section on the sequences Si and implemented by token transfer, so that a large number of structure parameters are mixed into Ms.
The sequence of the module Ms is a global sequence that is distributed in its sequence structure. The global property refers to its being an autonomous sequence defined by the interconnection relation of the N sequences Si. The distribution property refers to the distribution of Ms across the sequences Si and its transfer relation among them. Thus, the sequences contained in an Ms cannot be separated from the sequences Si, and they also affect the structure of each Si.
The sequence Ms 110 indicates the character of the internal sequences of a parallel module. FIG. 1D shows that there is (at least) one global sequence Ms 110 in addition to the N (here N=3) local sequences. Therefore, one of the basic characters of a parallel module is the possession of (at least) N+1 sequences (which may be equivalent to N+2, N+3 . . . sequences).
That there are N+1 sequences in a parallel module is also a result of reading the flowchart from the viewpoint of computer structure. There is thus an easily neglected discrepancy between this result and the intuitive human understanding of FIG. 1D. Because of this misconception, the model's sequence Ms is neglected in previous art. At the same time, the interconnection of the multi-machine structure adds a huge number of data structures and synchronization parameters to the control, known as the information explosion, and the requirement of eliminating them is very hard to meet in previous art. Even if the added structure parameters are eliminated, the Ms described above is removed at the same time because of the misconception. However, Ms must not be eliminated, since it is a model parameter and not one of the parameters proliferating from the multi-machine structure. How can the model parameter Ms be input? This is not only the actual target of computer architecture research in previous art, but also the main source of all the problems. Furthermore, it is the point at which the invention departs from the previous art.
After the description of FIGS. 1A-1E, the points of previous art become more apparent when observed and explained again from the new viewpoint.
The existence of Ms is neglected in previous art.
In the previous parallel processing technology, efforts are made only toward the structure, without research into a general character spanning model and computer structure. The various architectures differ in their hardware interconnection structure, but they are consistent in neglecting Ms and attempting to control parallel running with N independent programs. Therefore, from the standpoint of the invention, the problems in previous art are of one common type: multi-machine problems are addressed with the concept of the Turing machine. There are in fact N+1 sequences in a model, and only N independent sequences can be supported by a structure of N computers. Thus, the question arises of what can support the N+1st global sequence Ms. Naturally, one may associate this problem with the basic concept from high school algebra that “N+1 unknowns cannot be solved with N equations”.
The invention recognizes that this is the problem in the thinking of previous art.
In the description of the 3* problem, which originated from structure research, the problem of the N+1st global sequence is not mentioned. In the previous art, synchronization is traditionally understood to relate only to N programs, and the N+1st sequence factor is not taken into account. Obviously, the concept of the 3* problem in previous art is in error, since the key problem, the match between flow and sequence, is neglected, and the objective effect of multi-machine synchronization is the implementation of the sequence Ms.
The reasons why the Turing machine shows its limitation when faced with distributed and parallel processing are described below:
In the structure defined by the Turing machine, the match between one flow and one sequence is complete, and the flow is combined with the sequence. Therefore, the Turing machine cannot serve as a base for research on the flow-sequence element, since it represents the state element abstractly under the condition that the match between flow and sequence has already been accomplished.
The reasons why a single-processor possesses a match between flow and sequence are described below:
The N+1 sequences (N local sequences and 1 global sequence) in a model can be combined into one sequence to support a match between flow and sequence in a single-processor.
In a single-processor, the parallel module shown in FIG. 1D corresponds to a multi-process. In the single-processor environment, the sequence Ms 110 of the module, which exists in fact, is converted into “a serial sequence generated and arranged in time order”, and each sequence Si is divided into small blocks. The small blocks are then arranged to meet the requirements of Ms and of the N sequences Si at the same time, and a serial sequence consistent with the original N+1 sequences is generated. After the plurality of sequences has been combined into one sequence, the match with the flow comprised in the single-processor structure is performed. The match is already complete at the hardware layer, and thus the single processor is capable of modeling at the hardware layer.
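The combination of N+1 sequences into one serial sequence can be sketched as a topological ordering. This is an illustration only, not the method of the invention: it assumes each sequence Si is a list of named blocks and each combined token is a (source block, destination block) pair. The layout in `SEQUENCES` and `TOKENS` is hypothetical and is not taken from FIG. 1D. The sketch uses the standard `graphlib` module (Python 3.9 or later); a cycle among the constraints corresponds to a deadlock.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical layout: three local sequences and three token links.
SEQUENCES = {
    "S1": ["a1", "a2", "a3"],
    "S2": ["b1", "b2"],
    "S3": ["c1", "c2"],
}
TOKENS = [("a1", "b1"), ("b1", "c1"), ("c1", "a3")]


def serialize(sequences, tokens):
    """Merge the local orders and the token order into one serial order."""
    preds = {}
    # Local order: each block follows the previous block of its own Si.
    for steps in sequences.values():
        for i, step in enumerate(steps):
            preds.setdefault(step, set())
            if i > 0:
                preds[step].add(steps[i - 1])
    # Global order (the role played by Ms here): a destination block
    # follows the source block of its token.
    for source_step, destination_step in tokens:
        preds[destination_step].add(source_step)
    return list(TopologicalSorter(preds).static_order())


def has_deadlock(sequences, tokens):
    """Return True if no serial order can satisfy all constraints."""
    try:
        serialize(sequences, tokens)
    except CycleError:
        return True
    return False
```

Any order returned by `serialize(SEQUENCES, TOKENS)` runs every block once while respecting both the order inside each Si and the order imposed by the tokens, which is the sense in which one flow can carry all N+1 sequences.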
Furthermore, in a single-processor, the environment of the 3* problem changes greatly. The “shared memory” of the multi-machine is replaced by the instruction-addressed memory of a single-processor. Each flow-sequence Si is converted into a process of the multi-process in a single-processor, without structure parameters in the data accessed from memory, and the data transmission among the sequences Si is converted into a “read after write” order on the same memory address. Therefore, in the single-processor structure, the token structure shown in FIG. 1D and FIG. 1E disappears, and the sequence element is hidden. For the sequence of the module Ms, the work reduces to a “read after write” process only, and becomes an intellectual act corresponding to the algorithm, without any formal relation to the computer structure. This is the reason for the natural possession of a match between flow and sequence: human effort is directed only at the processing form for Ms (for example, converting it into a direct valuated statement, or a grammar structure). Moreover, in these processes, Ms objectively enters the running control of the computer as input information included in the program. As a result, the single processor can proceed from the ability to match to better adaptability, and succeed in generality.
The models that can be matched on flow-sequence in previous art are described below:
In the multi-machine structure, there are naturally N flows, and N independent sequences can be supported. However, there are in fact N+1 sequences in a model, so they cannot be matched. In the previous research on distributed and parallel processing, research on the flow-sequence element is not introduced into the computer architecture, and the environment for solution is the environment of “N independent programs+interconnection structure for multi-machine”. Furthermore, the problem of matching flow and sequence is neglected.
However, according to the principle that the sequence is a model parameter, the user model in any architecture can not operate without the input of Ms. Furthermore, according to the principle of the match among N+1 flow-sequences, the N+1st “global flow” structure capable of supporting the global sequence Ms can be found from the system inevitably for any applicable case in previous art.
The “time slice” structure often used in a parallel system forms, by timer interruption, the N+1st self-driving flow, which is independent of the N single-processor programs. Obviously, during the matching of the N+1st flow-sequence, the global sequence Ms lies in the distributed sequence state shown in FIG. 1E, and the time slice flow is a global flow having a structure capable of matching the Ms sequence in its distributed state. As a result, the algorithm is limited to a mode in which “the synchronization is handled (to solve the communication problem) after the internal computing of time slice”. The control interface of this mode is “N programmable programs+one time slice flow-sequence”.
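The mode of computing inside a time slice and synchronizing afterwards can be approximated in software as follows. This is a rough sketch under stated assumptions, not the structure described in the previous art: a `threading.Barrier` stands in for the timer interrupt, its `action` callback stands in for the one step per slice of the N+1st global flow, and the names `run_time_sliced`, `worker` and `global_step` are hypothetical.

```python
import threading


def run_time_sliced(n_workers, n_slices):
    """Run N workers that compute per slice, then synchronize."""
    trace = []
    lock = threading.Lock()
    slice_no = [0]

    def global_step():
        # Runs in exactly one thread, once all workers have reached the
        # barrier and before any of them is released.
        trace.append(("sync", slice_no[0]))
        slice_no[0] += 1

    barrier = threading.Barrier(n_workers, action=global_step)

    def worker(i):
        for k in range(n_slices):
            with lock:
                # Internal computing of worker i in slice k.
                trace.append(("compute", i, k))
            # Synchronization is handled only at the end of the slice.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace
```

In the returned trace, every compute entry of slice k precedes the sync entry of slice k, and no compute entry of slice k+1 appears before it. The order of the compute entries within one slice is not fixed.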
The Ms structure shown in FIG. 1E can be implemented by meticulous programming of the programs in each flow-sequence Si. In this case, however, the N+1st flow is a virtually driven flow superimposed on the flow-sequences Si and connected by tokens. Therefore, to match the N+1st flow-sequence, not only must the N related programs be programmed, but a large number of multi-machine structure parameters must also be handled. Furthermore, this mode is tied to the specific model, and an individual matching is required each time. Although it can run, it cannot provide general modeling for the user. The invention recognizes that such a mode still does not reach the stage of a match between flow and sequence at the instruction layer. Therefore, with the previous art, a match between flow and sequence can be expected only at the high-level language layer.
The reasons why multi-machine adaptability has been unsuccessful are described below:
Firstly, in the multi-machine structure, there are naturally N flows, which can support N independent sequences, though there are actually N+1 sequences in a model. In previous art, however, working habits were not changed in the research of distributed and parallel processing, and the flow-sequence element was not introduced into the computer architecture (the same as in previous single-processor architecture research). Furthermore, the environment for solution is the environment of “N independent programs+interconnection structure for multi-machine”, and the problem of matching flow and sequence is neglected. As a result, in the previous art, a structure capable of matching can emerge only without the guidance of a well-founded theory and method.
Secondly, it is an important concept that the sequence of the module Ms is a parameter of the input information, and an essential breakthrough can hardly be made in architecture research without this concept. In the previous art, the algorithm of the single-processor environment is imitated, and the problem of solving the sequence Ms is excluded from architecture research and left to the algorithm or modeling. Thus, modeling is very difficult, and sequences Ms of varied character naturally arise even when the matching is successful. A unified multi-machine operating system cannot be produced, since there is no processing specification for Ms. Therefore, the arrangement of leaving Ms to the algorithm is a direct reason for the failure to generalize, so that industrialization can never be realized in previous art.
Thirdly, in the previous examples capable of matching, the N+1 flows are equal in number to the N+1 sequences, and the match between flow and sequence is obtained, so that a multi-machine system can be operated. Several different modes have been developed for the match between flow and sequence. However, the layer and control interface of all these matching modes cannot meet the need for diversity in adaptability.
In the previous technology capable of modeling, the distributed structure in a model is not improved, so that the N+1st flow built unconsciously by a user must be a “global+distributed” flow. Because it is very difficult to build such a flow in the multi-machine structure, a global parameter (for example, time or a synchronous instruction) must be introduced to generate a global flow based on the new parameter, and Ms must be modified to match the new parameter. After the associated match between the N+1st flow and the new parameter is implemented, the match between flow and sequence for the original N programs has to be changed to connect to the N+1st flow-sequence (based on other parameters). At last, the system is operational, and the system modeling is clear. However, the generality of the model is limited by the new parameter, and good adaptability is lost.
None of the N+1st flows so built can reside in the hardware layer and enter the instruction flow format as in the single-processor. Therefore, the matching quality of all N+1 flows (especially the N+1st flow) is the essential reason for the failure to popularize these applications.