The large growth in performance and gate density of modern gate arrays has made very complex design implementations possible. The resulting challenge is to develop fully verified complex designs while still achieving a short time to market. The current tendency is to manage complexity by exploiting the CAD (Computer Aided Design) tools available on the market, which support the integration of large net-lists directly on a silicon die, i.e. on Field Programmable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC). A parallel way to master the increasing complexity of designs and to ease their verification is to split them into macro-cells, which are self-consistent and pre-verified design elements that, connected as “virtual components”, build the system.
Macro-cells can be either developed by the user or bought on the market as Intellectual Properties (IP). IPs are sold as both hard macros and soft macros. Hard macros, or hard cores, are predefined logic blocks tied to a specific technology (the geometric boundaries of the logic functions are part of the description), with an accurate timing specification, that a user can simply drop into a chip. Soft macros, or soft cores, are predefined portable logic blocks (the geometric boundaries of the logic functions are not part of the description) that are not bound to a specific technology. Soft macros are almost always described using a Hardware Description Language (HDL). The user has to use a synthesis tool to create the gate-level representation of the soft core and to target it to a specific technology. Technology vendors sell both hard macros and soft macros, while third-party vendors are forced to offer soft macros.
At soft-core level a macro-cell can be defined as a self-consistent design element with the following properties:
1. it implements a well-defined behavior (the macro-cell function);
2. its behavior is generally described by means of a Hardware Description Language (HDL) such as VHDL (Very High Speed Integrated Circuit Hardware Description Language) or Verilog;
3. it can be composed of several primitives (e.g. memories, etc.);
4. it is linked to other components by means of a limited, well-specified set of interfaces;
5. it can be remapped to a different technology without ANY change in the description;
6. it is pre-verified at the logical (no timing) level, that is, simulations have verified that it effectively performs the macro-cell function.
A generic known Macro-Cell (MC) is shown in FIG. 1. Primary Inputs (PI) are located on the left side while Primary Outputs (PO) are located on the right side. These inputs and outputs relate to the function performed by the macro-cell, i.e. the function for which the macro-cell has been designed.
At the top side are the Configuration and Control inputs and outputs, used to configure the macro-cell for proper operation, to receive commands indicating the operations to perform, and to monitor the status of the macro-cell in order to verify proper operation.
All these inputs and outputs need support registers (flip-flops): configurations are stored in configuration registers, commands are evaluated in command registers, and status information is stored in status registers. These support registers are implemented in a portion of the macro-cell named MACRO-CELL LOGIC, which hosts the so-called “glue logic” (generic combinatorial and/or sequential logic networks) plus the control FSMs (Finite State Machines).
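The role of the three kinds of support registers can be illustrated with a minimal behavioral sketch. It is only a software model under stated assumptions: the class name, the command codes and the status bit are hypothetical and do not come from the macro-cell of FIG. 1.

```python
# Hypothetical command codes and status bit, chosen only for illustration.
CMD_START = 0x1
CMD_STOP = 0x2
STATUS_RUNNING = 0x1


class MacroCellRegisters:
    """Behavioral model of the support registers in the MACRO-CELL LOGIC."""

    def __init__(self):
        self.config = 0   # configuration register: stores the current setup
        self.command = 0  # command register: last command received
        self.status = 0   # status register: read back to monitor operation

    def write_config(self, value):
        # A configuration is simply stored until it is overwritten.
        self.config = value

    def write_command(self, code):
        # A command is evaluated; here the evaluation only updates the status.
        self.command = code
        if code == CMD_START:
            self.status |= STATUS_RUNNING
        elif code == CMD_STOP:
            self.status &= ~STATUS_RUNNING

    def read_status(self):
        return self.status


regs = MacroCellRegisters()
regs.write_config(0x5)
regs.write_command(CMD_START)
print(bool(regs.read_status() & STATUS_RUNNING))  # True
```

In real hardware the evaluation of a command would be carried out by a control FSM; the single status bit used here merely stands in for that behavior.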
A set of memory-based devices (pure memories and FIFO (First In First Out) buffers) is located at the bottom side: WRITE FIFO (WFi), READ FIFO (RFi) and MEMORY (Mi). These devices are mainly used to buffer the data streams transmitted from or received by the macro-cell via the microprocessor interface or the local bus interface. In most cases these data streams consist of functional data (data related to the macro-cell function). Controlling these memory-based devices requires some control FSMs, in particular: an FSM-based control block for the WRITE FIFO named WRITE FIFO CONTROLLER, an FSM-based control block for the READ FIFO named READ FIFO CONTROLLER, and an FSM-based control block for the MEMORY named MEMORY CONTROLLER. These control machines are embedded into the MACRO-CELL LOGIC block. The MEMORY LMj (LOCAL MEMORY j), shown at the top side, is local to the macro-cell. It is used as a storage resource for the algorithm performed by the macro-cell; alternatively it implements a FIFO in charge of exchanging data with another macro-cell, or with the Primary Inputs or Primary Outputs of the device which hosts the macro-cell. An FSM-based memory control block also exists for each LOCAL MEMORY block but is not drawn in FIG. 1.
An implementation note about FIFO controllers is necessary. FIFOs are usually realized using dual port memories. The circular buffer realized by a FIFO is obtained by means of a circuit named FIFO CONTROLLER, which manipulates the memory addresses to implement the circular list. This is the most general situation; in fact built-in FIFOs (FIFOs with an embedded controller, not based on dual port memories) are not widespread, especially in microelectronics. For this reason FIFO CONTROLLERs are widely used and are also provided as cores by IP vendors. Moreover, several types of FIFO CONTROLLER exist. A SYNCHRONOUS FIFO (a FIFO whose read port and write port are operated with the same clock) can be realized using a SYNCHRONOUS FIFO CONTROLLER. An ASYNCHRONOUS FIFO (a FIFO whose read port and write port are operated with different, mutually asynchronous clocks) needs an ASYNCHRONOUS FIFO CONTROLLER. Referring to FIG. 1, even though in most cases FIFOs are based on dual port memories, for the sake of simplicity they are not represented as a dual port memory plus a FIFO controller but as a FIFO buffer plus a FIFO controller. Although not represented in the generic known Macro-Cell of FIG. 1, a LOCAL READ FIFO and a LOCAL WRITE FIFO, with their respective controllers, can also be present.
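The address manipulation performed by a FIFO CONTROLLER can be sketched as follows. This is a simplified software model of the synchronous case only, assuming a single clock so that one write or one read happens per call; the class and attribute names are hypothetical, and a plain list stands in for the dual port memory.

```python
class SyncFifoController:
    """Software model of a synchronous FIFO controller over a memory array."""

    def __init__(self, depth):
        self.depth = depth
        self.memory = [None] * depth  # stands in for the dual port memory
        self.wr_addr = 0              # address used by the write port
        self.rd_addr = 0              # address used by the read port
        self.count = 0                # number of words currently stored

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == self.depth

    def write(self, word):
        """Store a word; return False if the FIFO is full."""
        if self.is_full():
            return False
        self.memory[self.wr_addr] = word
        # Wrapping the address is what turns the memory into a circular list.
        self.wr_addr = (self.wr_addr + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""
        if self.is_empty():
            return None
        word = self.memory[self.rd_addr]
        self.rd_addr = (self.rd_addr + 1) % self.depth
        self.count -= 1
        return word


fifo = SyncFifoController(depth=2)
fifo.write("a")
fifo.write("b")
print(fifo.write("c"))  # False: the FIFO is full
print(fifo.read())      # a
```

An asynchronous controller would additionally have to pass the read and write addresses between the two clock domains, which this sketch does not attempt to model.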
A macro-cell based design is a design developed by the user as a set of user-developed macro-cells, macro-cells bought as intellectual properties IPi, MEMORY resources Mi or LMj, READ FIFOs RFi and WRITE FIFOs WFi (First In First Out buffers), all connected together. This design constitutes a system or subsystem that can be physically implemented either on a board of FPGAs or on an ASIC.
While short design time and high reliability are made possible by macro-cell oriented design, the flexibility of the implemented systems is still due to the widespread use of microprocessors. Microprocessors are employed both as configuration and control processors and as elaboration processors. Elaboration processors are used in the system to process functional data by executing the system software (elaboration processors belong to several categories: general purpose, micro-controllers, Digital Signal Processors (DSP), coprocessors, etc.). The configuration and control processor, by contrast, is the microprocessor in charge of configuring and controlling the system; it may be coincident with, partially coincident with, or distinct from the elaboration processors used in the system.
The developed system, consisting of a macro-cell based design implemented on a set of ASICs or FPGAs, is generally placed on a board with at least one microprocessor acting as configuration and control processor. Here a first need arises: to make communication between the microprocessor and the developed components possible, those components need a microprocessor interface. The microprocessor interface is generally built as a macro-cell and embedded in one of the user-developed components on the board; it can be either user developed or bought on the market. If the developed system is complex, and/or needs to be modular, and/or must be easy to maintain, it is split over a set of boards. In the simplest implementation one of the boards hosts the main processor while the other boards do not. In multiprocessor systems, by contrast, several boards host a microprocessor. One or more boards grouped together constitute a subsystem of the whole system.
In the described case the system is generally constituted by a rack, which is a case hosting all the boards and connecting them by a back-plane. A back-plane is a physical medium on which a bus is implemented, i.e. a shared interconnection resource accessed in parallel that allows great modularity; this bus is called the back-plane bus. To promote common design criteria, standard back-plane buses are specified; a widely used standard is the so-called VME bus (Versa Module European, IEEE-P 1014). Generally, boards directly connected to the back-plane are “intelligent”, that is, each of them hosts a microprocessor. Back-plane buses allow very long buses with low to medium throughput, but the microprocessor is generally required to control the data exchange.
In very complex cases further bus-based interconnection resources are present. Two main cases arise: either each board has an internal hierarchy consisting of several devices connected by a local bus laid on the board itself, or several boards are connected together by means of a local bus laid on a back-plane parallel to the one which hosts the back-plane bus. Generally, in the latter case, one of the boards connected together by said local bus is also connected to the back-plane bus and hosts a microprocessor (“intelligent” board) while the other boards do not.
Some particular, but very widespread, architectures also exist. In small systems that nevertheless need to be modular and/or easy to maintain, like personal computers, the architecture is based on a main-board and the back-plane bus is not present. The main-board is characterized in that it hosts a main microprocessor and a set of cards (small peripheral boards, generally without a processor) connected together and to the main microprocessor by means of a local bus laid on said main-board.
Among the large variety of system implementations, another recurrent architecture based on a local bus is present. It is based on a “limited” number of boards connected by a local bus laid on a “short” back-plane. If the number of boards to connect exceeds the maximum allowed local bus length and capacitive load, then different local buses can be connected by means of local bus bridges.
While back-plane buses are used to connect “intelligent” modules (e.g. boards hosting microprocessors), local buses are generally used to allow communication between peripherals. Generally speaking a peripheral is a “stupid” device, that is, a device which does not embed a microprocessor; peripherals are usually lodged on each board or on the cards which are in turn hosted by a main-board. Again, to promote common design criteria, standard local buses are specified; a widely used standard is the so-called Peripheral Component Interconnect bus (PCI). The PCI bus offers a processor-independent data path among peripherals and between the microprocessor and peripherals; said peripherals can be directly hosted on a board, hosted on different boards, or hosted on cards which are in turn hosted on a main-board. Local buses allow very short buses with very high throughput and do not require the microprocessor to control the data exchange (microprocessor independence).
Due to this microprocessor independence, local buses are generally more complex than back-plane buses in terms of protocol and more sophisticated in terms of features. Moreover, several reasons, ranging from historical ones to implementation complexity to the convenience of integrating a local bus interface into a peripheral device, have promoted the development of several “virtual components” for local buses. Master interfaces, slave interfaces and bridges are now available on the market as IP (Intellectual Property) in the form of hard and soft cores.
Here a second need arises: to make communication between the different boards (cards) connected to the local bus possible, each card needs a local bus interface. This local bus interface can be either a physical component or a macro-cell embedded into a component developed by the user. In the latter case the macro-cell can be either user developed or bought on the market (IP).
The basic architecture described above is present in a large variety of electronic systems belonging to almost all areas of design: Information Technology, Communication, System Automation, Space and Military Electronics, and Automotive. In the control area, numerical axis control systems are characterized in that each axis has a dedicated control card and all the cards are hosted on the same board. In the communication area, telephone network switches are characterized in that each end-user has its own line termination card and a set of line termination cards is hosted by the same module. In the information technology area, a computer motherboard has several equivalent slots to host cards.
FIG. 2 shows an example of a complex bus-based multi-board architecture. Five boards are present: Board00, Board01, Board02, Board10 and Board11. Boards Board00, Board01 and Board02 are connected together by a local bus named LB0, while the other two boards, Board10 and Board11, are connected together by a local bus named LB1. LB0 and LB1 are physically separated, that is, no direct communication can take place between them. A back-plane bus BB spanning all the boards is present. Board02 and Board10 are directly connected to the back-plane bus BB. As a consequence Board02 and Board10 can communicate directly by means of BB, while the other boards cannot. If Board00 or Board01 wants to communicate with boards placed on local bus LB1, it has to pass through Board02. In the same manner, if Board11 wants to communicate with boards placed on local bus LB0, it has to pass through Board10.
The architecture of the boards is now examined. Board02 hosts a microprocessor MuP02 and a memory bank MM02 connected together by a bus uPB02 of the microprocessor MuP02. MuP02 is the main processor for the subsystem constituted by the group of boards Board00, Board01 and Board02, and MM02 is the main memory for the same subsystem. The microprocessor MuP02 is interfaced to the back-plane bus BB by a microprocessor to back-plane bus bridge named uP/BB02, also connected to the bus uPB02. A microprocessor to back-plane bus bridge is a device that allows communication between different protocols: a specific microprocessor protocol on one side and a specific back-plane bus protocol on the other side. This means that a microprocessor to back-plane bus bridge has a microprocessor bus on one side and a back-plane bus on the other side. An integrated circuit IC02 connected to the bus uPB02 embeds a local bus interface and is interfaced to the local bus LB0. Board01 hosts an integrated circuit named IC01 (either an ASIC or an FPGA, although the discussion is equally valid for a set of integrated circuits) directly interfaced with local bus LB0. Board00 hosts an integrated circuit named IC00 and a microprocessor named uP00, locally interfaced with IC00 via a bus uPB00 of the microprocessor uP00 (the microprocessor can be a coprocessor). The integrated circuit IC00 is further interfaced with local bus LB0. Board10 hosts a microprocessor MuP10 and a memory MM10 connected together by a bus uPB10 of the microprocessor MuP10. MuP10 is the main processor for the subsystem constituted by the group of boards Board10 and Board11, and MM10 is the main memory for the same subsystem. An integrated circuit named IC10 is also connected to the microprocessor bus uPB10, together with MuP10 and MM10.
Devices connected to uPB10 can communicate with local bus LB1 via a microprocessor to local bus bridge uP/LB10, which is connected on one side to the uPB10 bus and on the other side to local bus LB1. Moreover, the side of the bridge uP/LB10 that is connected to local bus LB1 is also connected to a local bus to back-plane bus bridge named LB/BB10, which in turn is interfaced with the back-plane bus BB. Board11 hosts an integrated circuit IC11 directly interfaced with local bus LB1; a microprocessor uP11 (which can be a coprocessor) is interfaced with the same local bus via a microprocessor to local bus bridge uP/LB11 connected to a bus uPB11 of the microprocessor uP11.
To summarize: boards belonging to the same subsystem can communicate via the local bus, while boards belonging to different subsystems can communicate via the back-plane bus.
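The connectivity of FIG. 2 can be checked with a small sketch that treats each bus as a group of boards and searches for the shortest chain of boards between a source and a destination. The bus membership is taken from the description above; the function names and the breadth-first search are assumptions made only for illustration.

```python
from collections import deque

# Boards attached to each bus, as described for FIG. 2.
BUSES = {
    "LB0": ["Board00", "Board01", "Board02"],
    "LB1": ["Board10", "Board11"],
    "BB": ["Board02", "Board10"],
}


def neighbours(board):
    """Boards that share at least one bus with the given board."""
    result = set()
    for members in BUSES.values():
        if board in members:
            result.update(members)
    result.discard(board)
    return sorted(result)


def route(src, dst):
    """Shortest chain of boards from src to dst (breadth-first search)."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(route("Board00", "Board11"))
# ['Board00', 'Board02', 'Board10', 'Board11']
```

The result reflects the text: traffic between the two subsystems has to pass through Board02 and Board10, the only boards attached to the back-plane bus BB.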
The detailed architecture of some of the boards shown in FIG. 2 is now discussed, with the goal of checking the needs of each integrated circuit on the boards in terms of microprocessor interface macro-cells or local bus interface macro-cells.
In FIG. 3 the detailed architecture of Board00 (FIG. 2) is shown. With reference to the figure, the uP00 processor, with its local ROM (Read Only Memory) and local RAM (Random Access Memory), is directly coupled to the integrated circuit IC00; this implies that the circuit IC00 embeds a microprocessor interface, the uP INTERFACE block (sketched in the Figure). The macro-cells embedded into IC00 are configured and controlled by the configuration and control processor (which for Board00 may be the local uP00 on the card itself) via said microprocessor interface. Moreover the integrated circuit IC00 can communicate with the rest of the system via local bus LB0; this implies that IC00 embeds a LOCAL BUS INTERFACE too (sketched in the Figure). It is useful to recall that configuration and control purposes generally do not require long burst transactions (small amounts of data are transferred and burst transfers are not required). The main purpose of standard local buses, by contrast, is to transfer functional data at high speed (e.g. disk data or video data on a Personal Computer), so transfers on a local bus generally involve large amounts of data in burst mode.
In FIG. 4 the detailed architecture of Board11 (FIG. 2) is shown. With reference to the figure, the uP11 processor, with its local ROM (Read Only Memory) and local RAM (Random Access Memory), is coupled to the circuit IC11 through a microprocessor to local bus bridge uP/LB11. IC11 is directly coupled with the local bus LB1; this implies that the circuit IC11 embeds a LOCAL BUS INTERFACE (sketched in the Figure). In this case this is the only interface of IC11 with the rest of the system; as a consequence it has to be used both for configuration and control of the macro-cells embedded into IC11 and for functional data transfer. This is true whether configuration and control are performed by the main processor on Board10 (Card 10 in this context) or by the local processor on Board11 (Card 11 in this context). Board01 does not have a local processor, so configuration and control of the macro-cells embedded into the circuit IC01 hosted by Board01 are necessarily performed by the main processor MuP02 on Board02. As a result, as in the case of Board11, both the configuration and control flow and the functional data flow pass through the LOCAL BUS INTERFACE embedded into IC01.
Very often the system clock (the one which clocks the macro-cells of the system) and the microprocessor and/or local bus clock differ from each other; this happens especially in the communication area. This must be taken into account in the design of both the microprocessor interface and the local bus interface, since in that case the interface has to communicate across different clock domains. As is known, communication between different clock domains can take place by means of synchronization systems; this topic is detailed later.
Another crucial general aspect of all designs involving macro-cells is the availability of drivers. A driver is a small program controlling a specific device based on one or more macro-cells, or part of a macro-cell, on behalf of the Microprocessor Operating System. It constitutes an interface between the hardware and the high-level application software. This topic is detailed at the end of the text.
“Classic” Architectures and Related Open Problems
Microprocessor interfaces and local bus interfaces, plus FIFOs, synchronizers and some other minor parts, constitute recurrent solutions. These solutions, being reusable, are implemented as macro-cells both by end users and by companies that sell them as IP (Intellectual Property) macro-cells. In this paragraph the “classic” solutions in the area of microprocessor interfaces and local bus interfaces are described and discussed, and their drawbacks are highlighted. In the current approach microprocessor interfaces and local bus interfaces are realized with different macro-cells. The microprocessor interface interfaces the configuration and control processor on one side and the user macro-cells which need to be configured and controlled on the other side. The number of interfaced user macro-cells is generally high. The microprocessor interface is in charge of:
1. interfacing a configuration and control processor;
2. interfacing user macro-cells, performing: configuration setting (on user macro-cells), command issuing (to user macro-cells) and status retrieval (from user macro-cells).
The purposes listed at point 2 generally do not require burst access. Moreover microprocessors, in general, are not optimized for burst transfers except in DMA (Direct Memory Access) mode. Local buses, by contrast, are generally specified for high performance in burst transfers.
Consider a board that is a subsystem or the system itself. In the “classic” approach, named for simplicity Centralized Microprocessor Interface (CMI), a unique microprocessor interface block, also named CMI in the following Figures, is present in the hierarchy of the subsystem constituted by the board. Moreover each block of each user macro-cell in the system which needs to be operated by the configuration and control processor is directly interfaced with the centralized microprocessor interface. Point 2 can be seen as a set of services offered to the software running on the configuration and control processor, in terms of primitives able to operate on the interfaced user macro-cells. To implement these services a certain amount of logic, based on registers (flip-flops) and FSMs (Finite State Machines), is required; this constitutes a set of hardware primitives such as configuration register, command register and status register. Moreover firmware has to be developed to handle the hardware primitives; this constitutes the set of firmware (software) primitives named driver. For the sake of simplicity, hereinafter, all the hardware primitives plus the memories and FIFOs that can be interfaced with the CMI will be referred to as resources. Two topologies of interconnection between the CMI and the user macro-cells are used:
1. A so-called Centralized Multi-Port Interface (CMPI) to user macro-cells, based on a set of ports, each one dedicated to a specific service such as configuration, command or status retrieval. The logic implementing said services is embedded into the user macro-cells and developed by users;
2. A so-called Centralized Bus Based Interface (CBBI) to user macro-cells, based on a bus; services such as configuration, command and status retrieval are embedded into the user macro-cells and have to be developed by the user so as to be consistent with the bus protocol.
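The split between hardware primitives and the firmware primitives that form a driver can be sketched as follows. This is only an illustrative model: `FakeCmi` stands in for the microprocessor interface, and the register offsets, the base address and all names are hypothetical rather than taken from any real CMI.

```python
class FakeCmi:
    """Stand-in for the microprocessor interface: maps addresses to registers."""

    def __init__(self):
        self.registers = {}

    def write(self, address, value):
        self.registers[address] = value

    def read(self, address):
        return self.registers.get(address, 0)


# Hypothetical offsets of the hardware primitives inside one user macro-cell.
CONFIG_REG = 0x00
COMMAND_REG = 0x04
STATUS_REG = 0x08


class MacroCellDriver:
    """Firmware primitives built on top of the hardware primitives."""

    def __init__(self, cmi, base):
        self.cmi = cmi
        self.base = base  # hypothetical base address of the macro-cell

    def set_configuration(self, value):
        self.cmi.write(self.base + CONFIG_REG, value)

    def issue_command(self, code):
        self.cmi.write(self.base + COMMAND_REG, code)

    def get_status(self):
        return self.cmi.read(self.base + STATUS_REG)


cmi = FakeCmi()
driver = MacroCellDriver(cmi, base=0x100)
driver.set_configuration(3)
driver.issue_command(1)
print(cmi.registers)  # {256: 3, 260: 1}
```

Application software would call only the three driver methods, so the register layout stays hidden behind the driver.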
The CMPI is generally implemented by end-users in custom designs implemented on ASIC (Application Specific Integrated Circuit), while the CBBI is used by IP (Intellectual Property) vendors in realizing microprocessor interfaces.
FIG. 5 shows an ASIC implementation of the unique microprocessor interface block Centralized Microprocessor Interface (CMI) when it assumes the architecture of a Centralized Multi-port Macro-cell Interface CMPI. The ASIC is organized in four clusters of macro-cells plus the CMI. There are four user-developed macro-cells, MC1, MC2, MC3 and MC4, and two macro-cells bought on the market, respectively IP1 and IP2. The memory resources are constituted by two local memories LM1 and LM2. The rectangles drawn close to each LOCAL MEMORY block represent the MEMORY CONTROLLERS of each LOCAL MEMORY block. The rectangles drawn inside the macro-cells MC1, MC2, MC3, MC4, IP1 and IP2 represent register-based hardware primitives which implement the set of services offered to the software running on the microprocessor interfaced with the CMI. From the Figure it is evident that each interfaced resource is connected point to point with the CMI.
FIG. 6 shows an ASIC implementation of the block CMI when it assumes the architecture of a Centralized Bus Based Macro-cells Interface (CBBI). The situation is the same as that described for FIG. 5, but all the resources are connected to the CMI via a bus CMI_bus; this is a more flexible solution than the Centralized Multi-port Macro-cell Interface (CMPI).
The drawbacks of the CMI architecture are:
1. The user is forced to design hardware primitives “ad hoc” for the specific application.
2. In the same way macro-cell drivers are designed as a single unstructured piece of code, so any new application-specific device requires a new driver designed from scratch.
3. Since the hardware primitives are embedded in user macro-cells, when the macro-cells implementing the microprocessor interface CMI change, a certain amount of redesign of the hardware primitives embedded in user macro-cells is required to allow interfacing with the new CMI.
4. The unique microprocessor interface, block CMI, is designed ad hoc for the particular microprocessor interfaced, and it does not appear that the art provides effective circuit facilities to simplify a possible change of microprocessor type.
5. A certain redesign of user macro-cells is also required when one of the two different clock domains changes; this is due to the embedding in user macro-cells of their clock-domain side of the synchronization circuit.
6. The architecture is feasible in the case of a chip implementation but not in the case of a multi-chip board implementation. This is evident for the CMPI topology of interconnection with user macro-cells. FIG. 5 shows the ASIC (Application Specific Integrated Circuit) implementation of a CMI with CMPI topology. An equivalent FPGA bread-boarding implementation of this ASIC is realized by replacing each “cluster” of macro-cells, and the block CMI itself, with an FPGA, all the devices being hosted on a board. Since, in the case of CMPI topology, the number of ports on the CMI equals the number of hardware primitives connected to it, the number of pins required by the CMI macro-cell can exceed the number of pads of the FPGA that hosts the CMI.
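The pin-count argument of point 6 can be made concrete with a short calculation. All figures below are hypothetical and chosen only to show the scaling; the CBBI estimate (address, data and control lines of a single shared bus) is likewise an assumption added for comparison and is not a figure given in the text.

```python
def cmpi_pin_count(num_primitives, pins_per_port):
    """CMPI: one dedicated port per connected hardware primitive."""
    return num_primitives * pins_per_port


def cbbi_pin_count(address_bits, data_bits, control_pins):
    """CBBI (assumed model): one shared bus, whatever the number of primitives."""
    return address_bits + data_bits + control_pins


def fits(pins_needed, pads_available):
    return pins_needed <= pads_available


# Hypothetical figures: 40 primitives, 10 pins per port, 300 usable FPGA pads.
PADS = 300
cmpi = cmpi_pin_count(num_primitives=40, pins_per_port=10)
cbbi = cbbi_pin_count(address_bits=8, data_bits=8, control_pins=3)
print(cmpi, fits(cmpi, PADS))  # 400 False
print(cbbi, fits(cbbi, PADS))  # 19 True
```

With CMPI the pin requirement grows linearly with the number of hardware primitives, so a sufficiently large design exceeds any fixed number of pads once the CMI is moved to its own FPGA.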
Points 1 to 5 lengthen the design phase, impacting time to market. Point 6 impacts portability from IC to board and vice versa. The latter aspect is now particularly important for rapid prototyping of systems, which consists in realizing a prototype of a system via FPGA bread-boarding. This is useful to verify the correctness of a system's architecture before production of the system starts. In general, for large production volumes the system will finally be integrated into an ASIC following a System On a Chip (SOC) strategy. “Classic” architectures very often do not meet this requirement of portability between an ASIC implementation and an FPGA bread-boarding implementation.
The macro-cell which implements the local bus interface interfaces the local bus on one side and the user macro-cells which need to transmit and/or receive stream data on the other side. The number of interfaced user macro-cells is generally low. The local bus interface is in charge of:
1. interfacing the local bus;
2. interfacing user macro-cells, performing: transmission and/or reception of data streams to/from user macro-cells.
The main purpose of local bus interfaces is expressed at point 2, even if in many designs the local bus interface is also used to perform functions typical of microprocessor interfaces (as in Board01 and Board11 of FIG. 2, or in FIG. 4); for this reason a third point can be added:
3. interfacing user macro-cells, performing: configuration setting (on user macro-cells), command issuing (to user macro-cells) and status retrieval (from user macro-cells).
Consider a board that is a subsystem, or the system itself. In the “classic” approach, named for simplicity Centralized Local Bus Interface (CLBI), a unique local bus interface block, also named CLBI in the following Figures, is present in the hierarchy of the subsystem constituted by the board. Moreover each block of each user macro-cell in the system which needs to be operated via the local bus is directly interfaced with the block CLBI via a bus named CLBI_bus in the following Figures. As in the case of the microprocessor interface, point 3 above can be seen as a set of services offered to the software running on an agent which controls the local bus, in terms of primitives able to operate on the interfaced user macro-cells. All the considerations made for the microprocessor interface are still valid, i.e. when a local bus interface is used for configuration and control purposes, the list of drawbacks of the “classic” solution is the same as that described for the CMI interface.
Point 2 above can also be seen as a set of services offered to the software running on the agent which controls the local bus, in terms of primitives able to operate on the interfaced user macro-cells. To implement these services a certain amount of logic, based on memory and FSMs (Finite State Machines), is required; this constitutes the set of hardware primitives (READ FIFO CONTROLLER, WRITE FIFO CONTROLLER, and MEMORY CONTROLLER). Moreover firmware has to be developed to handle the hardware primitives; this constitutes the set of firmware primitives named drivers. As in the case of block CMI, for the sake of simplicity, hereinafter, all the hardware primitives plus the memories and FIFOs that can be interfaced with block CLBI will be referred to as resources.
The following solutions present on the market are consistent with the CLBI architecture but differ in the topology of interconnection between the CLBI block and the user macro-cells:
1. A so-called Centralized Multi-Port Interface to user macro-cells (CMPI), based on a set of ports, each one dedicated to a specific service. The services are embedded into the user macro-cells and developed by users. Several different implementations are possible;
2. A so-called Centralized Bus Based Interface (CBBI) to user macro-cells, based on a bus; the services, embedded into the user macro-cells, have to be developed by the user so as to be consistent with the bus protocol.
FIG. 7 shows an ASIC implementation of the unique local bus interface Centralized Local Bus Interface (CLBI) when it assumes the architecture of a Centralized Bus Based Interface (CBBI). The ASIC is organized in four clusters of macro-cells plus the block CLBI. There are four user-developed macro-cells, MC1, MC2, MC3 and MC4, and two macro-cells bought on the market, respectively IP1 and IP2. The memory resources are constituted by two local memories LM1 and LM2, a READ FIFO RF1, a WRITE FIFO WF1 and a MEMORY M1. The rectangles close to each memory resource represent the controllers of the respective memory resources. The READ FIFO CONTROLLER drawn close to RF1, the WRITE FIFO CONTROLLER drawn close to WF1, and the MEMORY CONTROLLER drawn close to M1 represent FSM-based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. The rectangles drawn inside the macro-cells MC1, MC2, MC3, MC4, IP1 and IP2 represent register-based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. From the Figure it is evident that each interfaced resource is connected to the bus CLBI_bus.
Several IP interfaces on the market exhibit the CMPI topology of point 1; they are listed in the following points 1a, 1b and 1c. Point 1a concerns a centralized interface to user macro-cells based on a read port (FIFO based) and a write port (FIFO based), with no address bus available, as in a bridge. Point 1b concerns a centralized interface to user macro-cells based on memory-mapped I/O: two dual-port RAMs are used to exchange data to and from the user macro-cells, which are mapped to the memories. Point 1c concerns a more structured solution which combines solution 1a or 1b, for burst transactions, with a bussed interface like CBBI, for configuration and control purposes.
FIG. 8 shows an ASIC implementation of the architecture presented at point 1c. The CMPI (Centralized Multi-Port Interface) interconnection topology is used for a burst read port and a burst write port, while the CBBI (Centralized Bus Based Interface) interconnection topology is used for a bussed port (not burst capable) devoted to configuration and control of the macro-cells. The ASIC is organized in four clusters of macro-cells plus the CLBI (Centralized Local Bus Interface). There are four user-developed macro-cells, MC1, MC2, MC3 and MC4, and two macro-cells bought on the market, respectively IP1 and IP2. The memory resources consist of two LOCAL MEMORIES LM1 and LM2, a READ FIFO RF1 and a WRITE FIFO WF1. RF1 and WF1 are connected to the burst read port and to the burst write port respectively. The rectangles close to each memory resource represent the controllers of each memory resource. The READ FIFO CONTROLLER drawn close to RF1 and the WRITE FIFO CONTROLLER drawn close to WF1 represent FSM-based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. In turn, the rectangles drawn inside the macro-cells MC1, MC2, MC3, MC4, IP1 and IP2 represent register-based hardware primitives which implement a set of services offered to the software running on the agent which controls the local bus. All the register-based hardware primitives are connected to the bus CLBI_bus.
In conclusion, the CMPI (Centralized Multi-Port Interface) architecture of point 1 is suited to specific applications (like a bridge). By contrast, the CBBI (Centralized Bus Based Interface) architecture of point 2, even if more complex, is the most general and versatile, especially if many user macro-cells have to be interfaced.
The architectures of points 1a, 1b and 1c are oriented to interfacing up to two burst-capable applications (one in read and one in write). When more burst-capable applications are involved, the application's designer has to handle the contended access to the unique interface.
The architectural drawbacks of the CLBI (Centralized Local Bus Interface) topology are the same as those described for the CMI (Centralized Microprocessor Interface). The only difference is that in this case the resources interfaced with CLBI may differ from the ones interfaced with CMI. More precisely, when CLBI is employed for the goals set out at point 3, the interfaced resources embedded in the macro-cells are the same as in the case of CMI, while when CLBI is employed for the goals set out at point 2, the interfaced resources are FIFOs and memories.
All the architectures discussed above can be combined in applications; for instance IC00 of FIG. 3 (on Board00 of FIG. 2) uses a CMI macro-cell and a CLBI macro-cell, while IC111 of FIG. 4 (on Board11 of FIG. 2) uses only a CLBI macro-cell.
Until now the drawbacks of the known interfacing architectures have been outlined with regard to the hardware implementation of the macro-cells; these architectural drawbacks are also reflected in restrictions on the hardware-related protocol which governs the transactions on the CLBI_bus. In particular, the interfacing of resources in the known architectures of points 1a, 1b and 1c appears quite rigid. That is to say, transactions from the unique block CLBI toward the set of interfaced macro-cells, and vice versa, require a great deal of design effort dedicated to the FSMs embedded in the macro-cells in order to cope with the various interfacing transactions; this makes the design of the interfacing protocol very cumbersome and not portable.
The lack of a modular structure in the known interfacing architectures is also reflected in a similar lack in the software design of drivers for specific devices based on macro-cells. In fact, drivers of the known art are generally designed as single unstructured pieces of code, in the same way as the code of the respective applications (macro-cell devices), which are historically designed as belonging to a single entity (CMI). As a result, any new device requires a new driver designed from scratch. The contemporary hardware design style, oriented to the use of reusable functional blocks (macro-cells), should permit a more rational design style for device drivers too. Nevertheless, device drivers are still written in the traditional way. A plausible explanation is the absence in the art of a modular and well-structured microprocessor-to-macro-cells interface able to stimulate a new software design for drivers.
A serious attempt to reduce time-to-market and to allow maximum subsystem re-use in systems that span a wide range of performance characteristics is disclosed in U.S. Pat. No. 5,948,089 (Sonics, Inc.). The relevant claim 1 recites textually: “A computer bus system comprising:                a synchronous bus operative during a number of bus cycles, said number of bus cycles divided into recurring frames, each frame further divided into packets comprising at least one clock cycle;        at least one initiator subsystem coupled to the bus, the at least one initiator subsystem configured to have at least one packet pre-allocated to the at least one initiator subsystem, said initiator subsystem configured to send out a request during a clock cycle within the at least one pre-allocated packet, said request comprising a command and further comprising an address of a target subsystem;        at least one target subsystem, said target subsystem configured to receive the address of the request and determine if the address corresponds to an address of the target subsystem, wherein if the address of the request corresponds to an address of the target subsystem, said target subsystem responds to the request on a second clock cycle.”
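The framing recited in the claim can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the frame length, the initiator names, the one-cycle packet size and the helper names `slot_owner` and `target_matches` are all hypothetical and do not come from the cited patent.

```python
from typing import List, Optional

# Hypothetical frame: four packets, each pre-allocated to at most one
# initiator (None marks a packet with no pre-allocated owner).
FRAME_OWNERS: List[Optional[str]] = ["init_A", "init_B", "init_A", None]

# Assumed packet size; the claim only requires at least one clock cycle.
CYCLES_PER_PACKET = 1


def slot_owner(bus_cycle: int) -> Optional[str]:
    """Return the initiator owning the packet that contains this bus cycle.

    Frames recur, so the packet index wraps around the frame length.
    """
    packet_index = (bus_cycle // CYCLES_PER_PACKET) % len(FRAME_OWNERS)
    return FRAME_OWNERS[packet_index]


def target_matches(request_address: int, base: int, size: int) -> bool:
    """A target compares the request address with its own address range
    (base and size are illustrative) and responds only on a match."""
    return base <= request_address < base + size
```

In this sketch an initiator may issue a request only in a cycle for which `slot_owner` returns its own name, and each target decides on its own whether the request address belongs to it.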
In the introductory part of the cited patent document it is clearly stated that the computer bus works in such a way as to de-couple the frequency of the bus from the operating frequencies of the various client subsystems. In this way each subsystem may operate based on its own requirements, and the subsystem interface modules need not be redesigned when the operating frequency of the bus is increased. To meet said requirements, the system visible in FIG. 1 of the citation substantially discloses a fully pipelined fixed-latency communication system based on a computer bus. Said computer bus is shared among various initiator/target subsystems, in which initiators have the capability to act, in turn, like a master (or slave) while targets are always slaves. Because the resources are shared, the problem arises of subdividing the transmission bandwidth among the various initiator and target subsystems. Sonics' invention solves the outlined problem by importing into the computer bus world some solutions well known from ATM (Asynchronous Transfer Mode) networks. Those networks implement a protocol suitable for asynchronously transferring serial packets (cells) to/from the various nodes of a telecommunication network. Packets are made of a fixed number of serial octets of bits queued into the relevant sending/receiving buffers, respectively located at the two sides of a switching matrix that provides for routing. For this purpose packets have a header which identifies them by means of a label, in addition to an information field available for the user's needs. The header also includes further information that pertains to the ATM layer functionality itself, for example for bandwidth control. A communication computer inside each node manages the packet consumption at the various queues by implementing a policy which takes into account the bandwidth requirements of the different users, in terms of bit-rate.
To do so a protocol negotiation phase is foreseen in which the guaranteed bandwidth is set first, then the residual bandwidth is distributed among the various requesters by means of a token mechanism. Communication media supporting serial ATM packets are either physical carriers, like optical fibers or coaxial cables, or radio connections for digitally modulated serial data. As known from techniques concerning serial transmission, a framed timing structure of the bit-stream is needed for synchronization purposes. So in synchronous transport layers for ATM, such as STM-n (Synchronous Transport Module level n) streams belonging to SDH (Synchronous Digital Hierarchy) links, the asynchronous ATM cells are fitted into SDH frames and synchronized therein by exploiting cell multiplexing/de-multiplexing provisions. The framed packet feature is also reproduced in Sonics' invention, as clearly recited in claim 1.
From the above arguments it can be argued that Sonics' on-chip computer bus system is something more than a computer interface: it seems to include all the relevant features of a true communication network interface (ATM, LAN, Token Ring, etc.). The only difference between Sonics' invention and classic communication network interfaces is that in the latter case the interfaces are connected to a physical medium suitable for serial transport (coaxial cable, optical fiber), while in the former case the interface uses a modified computer bus made of parallel metallic paths inside a chip or, at most, extended within the boundary of a board. In conclusion, Sonics' computer bus adopts a distributed architecture managed by mixed criteria pertaining both to token ring and to ATM networks, for the precise aim of promoting re-use of the silicon subsystem designs and reducing the time-to-market of on-chip devices.
In the Applicant's opinion different solutions are possible to reach the same goals (i.e. reuse and lower time-to-market) without forcing a designer to implement the complex features typical of a communication system interface instead of the simpler features of a processor interface, in case only the latter is needed. Framing and packetization of bus cycles, subsequent storing of information concerning the bandwidth requirements of the multiple subsystems, and the respective selection by two arbitration levels all turn out to be additional, demanding features when only a processor interface towards a plurality of target subsystems is actually implemented.
From a logical point of view a boundary should in any case exist between a true communication system interface, more suitable for a computer network, and a simpler processor interface. A processor interface generally deals with transactions between a single microprocessor and a plurality of target devices attached to the processor bus. Target devices interface the processor either directly or, preferably, indirectly by means of a common module connected to a standard processor bus, at one side, and to a proprietary bus and/or point-to-point links towards all the target devices, at the other side. Many examples have already been discussed above in connection with the prior art architectures.
By comparison, communication network interfaces (among which, in the Applicant's opinion, Sonics' computer bus) exploit communication media in order to extend communication facilities to a plurality of processors and devices. In the framework of communication networks the most relevant problem to be solved at the interface level is how to regulate multiple accesses to the common medium by the various contenders, in order both to avoid conflicts and to meet different bandwidth requirements. This problem, although important, is not as pressing in a simpler processor interface and can be solved by means of traditional arbitration methods such as round-robin, possibly modified, as per a variant of the present invention, to improve performance with a distributed architecture. Contrary to Sonics' communication system and to computer network interfaces in general, a processor interface takes great advantage from burst transactions. A burst is a sequence of bus transactions occurring on consecutive bus cycles and implying address increment or decrement. Resorting to burst transactions in processor interfaces having a distributed architecture requires some resulting problems to be solved. The Applicant's invention solves these problems by means of a so-called “PREFETCHABLE FIFO” which extends burst opportunities to the distributed resources embedded in the processor interface. Burst transactions are hardly applicable in communication systems like Sonics' invention, for the reason that the mechanism that supports bursts is inconsistent with mechanisms for distributing bandwidth through a particular access policy. More precisely, the longer the bursts are, the more profitable they are for reducing subsequent latency (this argument will be detailed later); so by using long bursts, or by frequent recourse to shorter bursts, the TDMA (Time Division Multiple Access) method adopted to distribute bandwidth fairly is paralyzed.
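The definition of a burst given above (transactions on consecutive bus cycles with address increment or decrement) can be illustrated by a minimal sketch. The function name `burst_addresses` and the unit step are assumptions made for the example only; real buses usually step by the data width.

```python
from typing import List


def burst_addresses(start: int, length: int, step: int = 1) -> List[int]:
    """List the addresses touched by a burst, one per consecutive bus cycle.

    A positive step models address increment, a negative step models
    address decrement. The step of 1 is an illustrative default.
    """
    if length < 0:
        raise ValueError("burst length must be non-negative")
    return [start + cycle * step for cycle in range(length)]
```

For instance, `burst_addresses(0x100, 4)` lists four consecutive increasing addresses starting at 0x100, while a step of -1 gives the decrementing case.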
It is useful to recall that Sonics' invention implements a two-level arbitration scheme, where the first level of arbitration is a framed time-division-multiplexing arbitration scheme and the second level is a fairly-allocated round-robin scheme implemented using a token-passing mechanism: that is to say, two TDMA methods.
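A two-level scheme of this kind can be sketched as follows. This is a simplified illustration, not the arbitration logic of the cited patent: the class name `TwoLevelArbiter`, the one-cycle slots, and in particular the rule that the second level is consulted only when the pre-allocated owner of the current slot is absent or idle are assumptions of the example.

```python
from typing import List, Optional, Set


class TwoLevelArbiter:
    """Level 1: framed time-division slots. Level 2: round-robin token."""

    def __init__(self, frame_owners: List[Optional[str]], initiators: List[str]):
        self.frame_owners = frame_owners  # pre-allocated owner of each slot
        self.initiators = initiators      # fixed order in which the token moves
        self.token = 0                    # index of the current token holder

    def grant(self, cycle: int, requesting: Set[str]) -> Optional[str]:
        # Level 1: the pre-allocated owner wins its own slot if it requests.
        owner = self.frame_owners[cycle % len(self.frame_owners)]
        if owner is not None and owner in requesting:
            return owner
        # Level 2 (assumed fallback): offer the slot starting from the token
        # holder, then pass the token beyond whoever is granted.
        count = len(self.initiators)
        for offset in range(count):
            index = (self.token + offset) % count
            if self.initiators[index] in requesting:
                self.token = (index + 1) % count
                return self.initiators[index]
        return None  # nobody requested the bus in this cycle
```

The sketch makes visible why both levels allocate access slot by slot, which is the sense in which the text speaks of two TDMA methods.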
The main shortcoming of Sonics' communication system has thus been outlined, namely its inability to improve design reuse and reduce the time-to-market of the relevant on-chip devices without introducing technical features typical of computer network interfaces. In applications addressed to usual microprocessor interfaces those features would appear as additional and excessively binding ones.