The subject matter disclosed herein generally relates to controlling a compute circuit and, more particularly, to controlling post-silicon configurable instruction behavior of a compute circuit.
Currently, core logic development can be completed many months to years in advance of a product launch. For example, some timelines have core logic development complete about three years before the logic is returned in silicon and shipped to customers as a generally available product.
This means that decisions on the exact implementation of instructions are taken very early in the project, even before the software teams designing products to run on the core logic have refined and finalized their requirements for a new instruction. In the past, this has led to adding new instructions in hardware that turned out not to be used by software because they did not fulfill all requirements. Accordingly, a lot of development effort and silicon area is potentially wasted. Also, the new instructions may need to be supported in all subsequent generations of the chip, increasing the burden and potential waste. Further, the long development time can make it difficult to react to changes, e.g., in open source software, when a new technology emerges, a similar instruction is implemented by another party, and the software is tailored for that new instruction.
Currently, in order to take advantage of the most recent instructions, a new machine would be needed to provide a competitive implementation that can handle the new instruction technology. However, a new release of a chip can take years, and even if the timelines are accelerated, metal layer changes are expensive and the management of existing versions of the chip is challenging.
Alternatively, software can be coded to work around the existing limitations of the hardware instruction, resulting in slower code and lower overall system efficiency and speed. Software rewrites can also be cost intensive and are slow to react to changes in open source software. As a further alternative, a field-programmable gate array (FPGA) implementation can be used, but these devices have limited bandwidth (they are “far” from the processor cores) and a high implementation cost.
Accordingly, there is a desire to provide a system and/or method to handle the special cases in instruction coding that evolve between the time a circuit is designed and the time it is brought to market.