A. Field of the Invention
This invention relates to the field of digital data processing systems wherein one or more host data processors utilize one or more supporting scientific processors in conjunction with storage systems that are commonly accessible. More particularly it relates to an improved High Performance Storage Unit (HPSU) memory resource for use in such a digital data processing system. Still more particularly it relates to an improvement in the manner of simultaneously moving a large number of data words within a particular functional area of such an HPSU, particularly between a number of memory storage areas concurrently operative for the reading of such data words to a number of output registers which supply such read data to those particular requestors of such HPSU which did make reference thereto. Still more particularly, it relates to a plurality of concurrently operative storage memory banks (nominally 8) containing storage memory modules (nominally 4 in each bank) the multiple output ports (nominally 4 called SP0, SP1, IOP, and IP) of each which module are respectively wired-OR interconnected to form nominally 4 very wide (nominally 144 data bits+16 parity bits) and fast (the data outputs of addressable memory stores of 4,194,304 words of 36-data-bits+8-error-syndrome and parity bits, which stores are of necessity physically extensive, will be moved to register/drivers outputting such data to appropriate requestors in nominal 22.5 nanoseconds) wired-OR data buses.
B. State of the Prior Art
1. Environment of the Invention
Digital data processing systems are known wherein one or more independently operable data processors function with one or more commonly accessible main storage systems. Systems are also known that utilize a support processor with its associated dedicated supporting, or secondary storage system. Such support processors are often configured to perform specialized scientific computations and are commonly under task assignment control of one of the independently operable data processors. The controlling data processor is commonly referred to as a "host processor". The host processor characteristically functions to cause a task to be assigned to the support processor; to cause required instructions and data to be transferred to the secondary storage system; to cause the task execution to be initiated; and to respond to signals indicating the task has been completed, so that results can be transferred to the selected main storage systems. It is also the duty of the host processor to recognize and accommodate conflicts in usage and timing that might be detected to exist. Commonly, the host processor is free to perform other data processing matters while the support processor is performing its assigned tasks. It is also common for the host processor to respond to intermediate needs of the support processor, such as providing additional data if required, responding to detected fault conditions and the like.
In the past, support scientific data processors have been associated with host data processing systems. One such prior art scientific processor is disclosed in U.S. Pat. No. 4,101,960, entitled "Scientific Processor" and assigned to Burroughs Corporation, of Detroit, Michigan. In that system, a single instruction multiple data processor, which is particularly suited for scientific applications, includes a high level language programmable front-end processor; a parallel task processor with an array memory; a large high speed secondary storage system having a multiplicity of high speed input/output channels commonly coupled to the front-end processor and to the array memory; and an over-all control unit. In operation of that system, an entire task is transferred from the front-end processor to the secondary storage system whereupon the task is thereafter executed on the parallel task processor under the supervision of the control unit, thereby freeing the front-end processor to perform general purpose input/output operations and other tasks. Upon parallel task completion, the complete results are transferred back to the front-end processor from the secondary storage system.
It is believed readily seen that the front-end processor used in this earlier system is a large general purpose data processing system which has its own primary storage system. It is from this primary storage system that the entire task is transferred to the secondary storage system. Further, it is believed to be apparent that an input/output path exists to and from the secondary storage system from this front-end processor. Since task transfers involve the use of the input/output path of the front-end processor, it is this input/output path and the transfer of data thereon between the primary and secondary storage systems which becomes the limiting link between the systems. Such a limitation is not unique to the Scientific Processor as disclosed in U.S. Pat. No. 4,101,960. Rather, this input/output path and the transfers of data are generally considered to be the bottleneck in many such earlier known systems.
The present scientific data processing system is considered to overcome the data transfer bottleneck by providing a unique system architecture using a high speed memory unit which is commonly accessible by the host processor and the scientific processor. Further, when multiple high speed storage units are required, a multiple unit adapter is coupled between a plurality of high speed memory units and the scientific processor.
Data processing systems are becoming more and more complex. With the advent of integrated circuit fabrication technology, the cost per gate of logic elements is greatly reduced and the number of gates utilized is ever-increasing. A primary goal in architectural design is to improve the through-put of problem solutions. Such architectures often utilize a plurality of processing units in cooperation with one or more multiple port memory systems, whereby portions of the same problem solution may be parcelled out to different processors or different problems may be in the process of solution simultaneously.
2. Description of the Prior Art
It is of known utility in the prior computer art that a multiplicity of requester processors, such as command-arithmetic processors, input/output processors, and/or scientific processors, should be communicative with a single, common, concurrently shared random access memory storage resource. Such requester users of the memory resource are communicative with the resource through physical ports, upon which ports requests are registered and results are obtained. Such a common, concurrently shared, computer memory resource may be called, especially if such resource is large and fast, a High Performance Storage Unit.
Within the common, concurrently shared, memory resource or High Performance Storage Unit, the numbers of the storage sites which are concurrently operative in parallel in service of multiple ones of communicating requesters is a function of the desired bandwidth of response of such memory resource to individual ones of such requesters, and to multiple ones or to all of such requesters in aggregate. A typical division, or site, of a memory resource is called a bank, of which, for example, there might be eight such each containing 524K 44 bit words within a very large memory unit. Normally, diversity of addressable references in pending requests permitting, all such banks may be simultaneously operative in the performance of separate and unrelated address references to storage locations therein, such as reading and writing references. Within a memory bank, further subdivisions, or subsites, of memory function may further exist. These subsites are nominally called storage modules, of which, for example, four such each comprising, for example, 131K words of 44 bits each, might constitute one bank. The storage modules within a single bank are also supportive, the addressable references and the types of operations performed permitting, of simultaneous operation on related, contiguous, addresses. Successive operations within each such storage module are also normally pipelined. The particular prior art numbers, constructions, and concurrencies of operation of memory banks and memory storage modules in enabling the prior art performance of the concurrently shared memory resource function is not of importance in understanding the present invention, the pertinent concept being only that within a high performance concurrently shared memory resource there are necessarily a number of memory storage areas, be they banks or storage modules or whatever, which are concurrently active to perform memory references, such as data reads and writes.
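The example organization just described can be checked with a few lines of arithmetic. The following is a minimal sketch only: the constant names are illustrative, and reading "131K" as 131,072 (2 to the 17th power) is an assumption, though it is consistent with the 524K words per bank and the 4,194,304 total words quoted elsewhere in this disclosure.

```python
# Nominal figures from the text; the constant names are illustrative only.
BANKS = 8
MODULES_PER_BANK = 4
WORDS_PER_MODULE = 2 ** 17   # 131,072 words: the "131K" of the text (assumed reading)
BITS_PER_WORD = 44           # 36 data bits + 8 error-syndrome and parity bits

words_per_bank = MODULES_PER_BANK * WORDS_PER_MODULE   # the "524K" of the text
total_words = BANKS * words_per_bank
total_bits = total_words * BITS_PER_WORD

print(f"words per bank: {words_per_bank:,}")
print(f"total words:    {total_words:,}")
print(f"total bits:     {total_bits:,}")
```

Under these assumptions the total word count agrees with the 4,194,304 words recited in the Field of the Invention.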
To this point, there are thus three concepts in the prior art construction of high performance memories which are pertinent to the understanding of the present invention. First, the memory resource is shared. Shared does not merely mean that the memory resource is universally communicative, and that the various requesters of such memory resource may ultimately sequentially obtain reference to all addressable parts thereof, but rather additionally means that all requesters do compete, in a priority scheme within the common front-end logics of the memory resource, for access to the total referencable stores thereof. The second pertinent prior art concept is that in order to support concurrent access, then such one memory resource, communicative with a multiplicity of requesters, will contain internal referencable and addressable memory stores which are granularized, by banks and/or by storage modules or by whatsoever named subdivision, to be concurrently and simultaneously operative in the provision of response to memory references, such as the reading of data words. Finally, the third pertinent prior art concept is that the referenced data simultaneously developed at the multiplicity of internal memory stores sections and subsections, the banks and storage modules, must be correctly distributed to the multiplicity of requesters which have made the corresponding references of the memory resource, and of the memory stores therein. It is this functional area of a concurrently shared memory resource--how a multiplicity of data words concurrently referenced within a shared memory store may be distributed to that multiplicity of requestors which (in a priority scheme) did give rise to such concurrent references--that is the area of the present invention.
The prior art computer science method for moving the outputs of a multiplicity of data sources, such as the banks and storage modules of a shared memory resource, to a multiplicity of destination sinks for such data, such as the requesters of a memory resource, is called multiplexing. The circuitry which performs this distribution function is a multiplexer. A multiplicity of multiplexers may operate concurrently. Therefore it starts to look like a lot of words which define the area of the present invention--words like "shared", "priority", and "bank" or "storage modules"--are irrelevant: multiplex the data distribution between the sources and the destinations and be done with it.
Now it is indeed known in the prior art to use brute force multiplexing within a memory resource to perform distribution between a multiplicity of source sites, or banks, concurrently operative therein to deliver data words and those destinations, or ports, to which such data words must be delivered. The alternative solution of the present invention will, however, be based on the consideration of some secondary factors reading on the desirability of brute force multiplexing of data from sources (banks) to destinations (ports) within a shared memory resource--factors associated with words like "shared" and "priority", plus "bank" and "storage module".
The prior art known to the disclosers of the present invention which is associated with the sharing of data stores, and the prioritization of multiple concurrent references thereto, does not extend so far that such prior art can be clearly seen to be a factor motivating that solution offered by the present invention. The improvement to such prior art taught in U.S. patent application Ser. No. 596,206 for a MULTILEVEL PRIORITY SYSTEM in the name of J. H. Scheuneman, et al., filed on an equal date with the present disclosure and assigned to the same assignee, does extend the computer art associated with the prioritization of multiple concurrent references so far that such art can more clearly be seen to be a factor motivating that solution offered by the present invention. All such art associated with the prioritization of multiple concurrent references to shared data stores, both the prior art and the extension to such art in U.S. patent application Ser. No. 596,206, is discussed in this BACKGROUND OF THE INVENTION section in order to sensitize the reader to certain features in such art which bear upon the present invention. These features are located in that prioritization occurring in the "front" end of a shared memory resource and thus seemingly remote from the multiplexed distribution of data occurring in the "back" end of the same shared memory resource. In the prior art it is known to prioritize a small number of requestors, say four requestors, for shared concurrent access to memory stores within a number of memory banks, say eight banks. The prior art multiplexed distribution of the data from the eight banks to the four ports then requires and uses four eight-to-one (8:1) multiplexers (sometimes called demultiplexers). In the prior art it is also known to prioritize a larger number of requestors, say eight requestors, for shared concurrent access to memory stores within a number of memory banks, say eight banks. 
The prior art multiplexed distribution of the data from the eight banks to the eight ports then requires and uses eight eight-to-one (8:1) multiplexers. But prioritization of this larger number of requestors, eight requestors, may be unsuitably time consuming in two ways if some one or ones of the eight requestors are uniquely time critical, and require a maximally fast response from the shared memory resource. First, the wider width of the priority resolution, eight requestors wide, becomes much slower by a time which may be a very significant fraction of the overall memory resource response time. Second, if the uniquely time critical requestor(s), which of course are accorded the highest priorities within any one priority scan, is (are) itself (themselves) very fast with repetitive requests to the memory resource (which is entirely probable--the reason that a critical requestor(s) wants memory data fast is usually that it runs very fast), then such time critical requestor(s) will monopolize the memory, locking out lower priority requestors from access. In order to prevent this, a snapshot priority, wherein all pending requests are serviced at least once within each priority scan, is often employed. But such snapshot priority intersperses numbers of lower priority requests with the (higher priority) requests of time critical requestors--thus defeating the maximization of the time performance of the memory resource to such time critical requestors.
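The lock-out and interspersal effects described above can be illustrated with a deliberately simplified model. The sketch below is not taken from any cited system: it assumes a single contended resource, requestors that re-issue a request immediately after being serviced, and a numbering in which requestor 0 holds the highest priority. The function names are hypothetical.

```python
def fixed_priority(always_pending, cycles):
    """Each cycle, service the highest-priority pending requestor (0 is highest).

    Because every requestor is assumed to re-request at once, the highest
    priority requestor wins every cycle and the others are locked out.
    """
    return [min(always_pending) for _ in range(cycles)]


def snapshot_priority(always_pending, cycles):
    """Freeze the set of pending requests, service each once, then re-scan."""
    served = []
    while len(served) < cycles:
        snapshot = sorted(always_pending)   # the priority scan "snapshot"
        served.extend(snapshot)             # every pending request serviced once
    return served[:cycles]


pending = {0, 3, 5}                         # requestor 0 is the time critical one
print(fixed_priority(pending, 6))           # requestor 0 monopolizes the resource
print(snapshot_priority(pending, 6))        # lower priorities are interspersed
```

In this model the fixed scheme services only requestor 0, while the snapshot scheme grants requestor 0 only every third service, which is the degradation to the time critical requestor noted in the text.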
The invention of a MULTILEVEL PRIORITY SYSTEM within U.S. patent application Ser. No. 596,206 is one approach to solving both problems: permitting a relatively narrower and faster priority resolution (at least to the time critical requestor(s)) while also according that a larger number of requestors (nominally eight) may be prioritized so that lower priority requestors are not locked out even though time critical (high priority) requestors are not unduly delayed in obtaining maximal time response from the memory resource. When such a MULTILEVEL PRIORITY SYSTEM is within the "front" end of a memory resource, the present invention of a multiple output port memory storage module is particularly advantageous. Basically, and without explaining both the invention of U.S. patent application Ser. No. 596,206 and the presently disclosed invention, the interaction between any, prior art or not, priority determination within the "front" end of a memory resource and the action of the present invention to distribute data within the "back" end of the same resource is as follows. If the priority system within the "front" end of a memory resource is able to prioritize a large number of requestors (e.g., ten) by classes or parts (e.g., a first class of one time critical requestor member plus a second class of one time critical requestor member plus a third class of four requestor members plus a fourth class of four requestor members) then, by definition, only one requestor within any of such classes or parts will be simultaneously serviced by the memory resource (e.g., four requestors, one from each class, can be simultaneously serviced). 
This means that at the very "back" end, output, data communication drivers which exist in sets (e.g., ten sets) for communication to each of the (ten) requestors, only one such set as is associated with one only requestor within a class of requestors, will be operative at any one time (e.g., only one set of four such sets serving the four requestor members of the third class could be operative at any one time). This means, still at the "back" end of memory, that one only data output register needs supply all the sets of data communication drivers as are associated with each class of requestors. The total data output registers (e.g., four such in service of four requestor classes) are the "destinations" to which the data simultaneously developed at the "sources" (e.g., the eight memory banks of which maximally four only can be simultaneously accessed by requests of the four requestor classes) needs be distributed (e.g., data from eight sources (banks) of which four are active needs be distributed to four destinations (data output registers)). Now, of course, this required data distribution can be straightforwardly multiplexed (e.g., by four eight-to-one (8:1) multiplexers) as is taught in the prior art. Certain information, mainly that one only of the requestors within each requestor class will be simultaneously operative, exists upon the existence of prioritization by parts, however, so that the improved apparatus and method of the present invention, utilizing such information, may be particularly advantageous. Of course, in the degenerate case wherein each requestor class consists of but a single requestor, the apparatus and method of the present invention will still function.
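The relationship between prioritization by classes and the number of data output registers can be sketched as follows. This is an illustrative model only: the class sizes (1, 1, 4, 4) follow the example in the text, but the requestor names, the ordering within a class, and the function names are hypothetical, and the multiplexer is modeled simply as an indexed selection.

```python
# Class sizes follow the example in the text; the member names are hypothetical.
CLASSES = {
    "class1": ["A"],
    "class2": ["B"],
    "class3": ["C0", "C1", "C2", "C3"],
    "class4": ["D0", "D1", "D2", "D3"],
}


def grant_one_per_class(pending):
    """Grant at most one pending requestor per class.

    The first listed member wins; that ordering is an assumption, not a
    statement of how any particular priority system resolves a class.
    """
    grants = {}
    for cls, members in CLASSES.items():
        for requestor in members:
            if requestor in pending:
                grants[cls] = requestor
                break
    return grants


def load_output_registers(bank_outputs, bank_for_class):
    """Prior-art style distribution: one 8:1 multiplexer per class register."""
    return {cls: bank_outputs[bank] for cls, bank in bank_for_class.items()}


grants = grant_one_per_class({"A", "C1", "C3", "D2"})
print(grants)   # at most four grants, however many of the ten requestors pend

bank_outputs = list(range(100, 108))        # placeholder words from eight banks
print(load_output_registers(bank_outputs, {"class1": 2, "class3": 7}))
```

Because no class ever holds more than one grant, four output registers suffice for the ten requestors of this example, whatever mechanism carries the data to those registers.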
Having dispensed with the somewhat complex concept that the prior art in establishing "priority" to concurrently "shared" memory stores, performed by circuitry within the "front" end of memory should have some bearing on the present invention of multiple output port memory storage modules, performed by circuitry within the "back" end of memory, it is also necessary to understand some prior art physical fundamentals of "banks" and "storage modules". Banks and storage modules, either or both, are likely to be modular replaceable units, such as on pluggable assemblies or printed circuit boards. Although some small quantum of additional room, in which logic circuits may be emplaced, may be present in a modular bank or storage module, it is not normally desired, either for maintainability or for the economics of partitioning functionality across replaceable assemblies, to distribute the output logics and drivers, which may in aggregate be very extensive, of a shared access memory resource onto these banks and/or storage module assemblies. Indeed, the memory resource final output registers, drivers, and control logics are often themselves sufficiently extensive so that such are themselves distributed across plural pluggable assemblies or printed circuit boards. Within such a functional partitionment onto different physical assemblies, which partitionment is extremely common if not mandated for very large scale shared memory resources, the banks and/or storage modules are going to be at one physical place and the final output registers, drivers, and control logics are going to be at another physical place. The information communicatively distributed between these two places is data, and in very large scale shared memory resources operating at very high external communication bandwidths (e.g., in the range of 11.4 gigabits/second to aggregate requestors), the amount of this data is prodigious in both the bit widths of the communication and in the rates thereof. 
Very high rates imply communication which is costly in both the active elements and the communication channel medium. Very wide bit widths of communication imply large numbers of these expensive high performance communication paths. Obviously it thus becomes desirable to maximize the duty cycle, to attempt to obtain 100% utilization, of these numerous and costly data communication paths between the banks and/or storage modules which originate data and the final output registers and data drivers which are the (immediate next) destination of such data. Although organization of the distributive communication paths between memory resource banks and/or storage modules can be considered an issue of partitionment, and not of functionality, the organization of such communication to be from multiple output port storage modules in accordance with the present invention will generally permit that in actual, common, physically realizable partitionments of a memory resource the duty cycle over these numerous and costly distributive communication paths will be 100%, or twice that 50% duty cycle obtained by the full multiplexed solution of the prior art.
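One reading of the 50% and 100% figures can be reproduced by counting paths. The sketch below rests on assumptions not stated in the text: that the fully multiplexed arrangement runs one data path from each of the eight banks to the multiplexers, that the multiple output port arrangement runs one shared wired-OR bus to each of the four data output registers, and that at most four transfers are active at once. The names are illustrative.

```python
BANKS = 8
OUTPUT_REGISTERS = 4   # one per requestor class in the example of the text


def duty_cycle(active_paths, total_paths):
    """Fraction of the inter-assembly data paths carrying data at one time."""
    return active_paths / total_paths


# At most one transfer per output register can be in progress at once.
max_active = min(BANKS, OUTPUT_REGISTERS)

# Assumption: full multiplexing needs a path from every bank to the multiplexers.
mux_duty = duty_cycle(max_active, BANKS)

# Assumption: multiple output ports wired-OR onto one bus per output register.
bus_duty = duty_cycle(max_active, OUTPUT_REGISTERS)

print(f"fully multiplexed: {mux_duty:.0%}")
print(f"wired-OR buses:    {bus_duty:.0%}")
```

Under these assumptions the ratio of the two duty cycles is two, in agreement with the comparison drawn in the text.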