The present invention relates to the field of computer systems. More particularly, the present invention relates to instructions used to move data within a string from one location to another.
The IEEE standard, “IEEE 1394 Standard For A High Performance Serial Bus,” Draft ratified in 1995, is an international standard for implementing an inexpensive high-speed serial bus architecture which supports both asynchronous and isochronous format data transfers. Isochronous data transfers are real-time transfers which deliver data on time without guaranteeing the integrity of the data. Each packet of data transferred isochronously is transferred in its own time period. The IEEE 1394-1995 standard bus architecture provides up to sixty-four (64) channels for isochronous data transfer between applications. A six bit channel number is broadcast with the data to ensure reception by the appropriate application. This allows multiple applications to simultaneously transmit isochronous data across the bus structure. Asynchronous transfers are traditional data transfer operations which guarantee the integrity of the data during delivery using an acknowledgement protocol.
The IEEE 1394-1995 standard provides a high-speed serial bus for interconnecting digital devices thereby providing a universal I/O connection. The IEEE 1394-1995 standard defines a digital interface for the applications thereby eliminating the need for an application to convert digital data to analog data before it is transmitted across the bus. Correspondingly, a receiving application will receive digital data from the bus, not analog data, and will therefore not be required to convert analog data to digital data. Devices can be added and removed from an IEEE 1394-1995 bus while the bus is active. If a device is so added or removed the bus will then automatically reconfigure itself for transmitting data between the then existing nodes. A node is considered a logical entity with a unique identification number on the bus structure. Each node provides an identification ROM, a standardized set of control registers and its own address space.
Although the IEEE 1394-1995 standard provides for up to sixty-four different isochronous channels, certain 1394 devices are being built with the capability to transmit and receive isochronous data over only a subset of fewer than sixty-four channels. When data is received on an isochronous channel, that data must be processed by the receiving device. This processing includes any or all of displaying, manipulating, forwarding and storing. Often, data received on different isochronous channels is processed differently, depending on the type of device from which the data is received, the type of data that is received and the desired use of the data. If data received on an isochronous channel is not received and processed efficiently, errors in the display or use of the data can result.
There is a wide variety of computer systems capable of processing digital data. A basic structure of a computer system is shown in FIG. 1A. The heart of the computer system 1 is a central processing unit (“CPU”) 2. Within a computer system 1 the CPU 2 is coupled to firmware 4, data storage devices 5, ports 3, and random access memory (“RAM”) 6 by a bus structure 7. Data storage devices 5 include hard drives, floppy drives, and CD-ROMs. Input/output (“I/O”) devices, such as a display monitor 8 and an IEEE 1394-1995 device 10, are coupled to the bus structure 7 through ports 3. A keyboard 9 is also coupled to the CPU 2 through one of the ports 3. Ports 3, whether serial or parallel, are used to connect the computer system 1 to modems, printers, and other devices, including other computer systems. FIG. 1B illustrates a computer system 1 coupled to a display monitor 8 and networked to an IEEE 1394-1995 device 10, such as a video camera, through an IEEE 1394-1995 serial cable 11.
In a computer system 1, firmware 4 is used to seek out and load an operating system from one of the data storage devices 5 (usually the hard drive) when the computer system 1 is first turned on. Programs and applications used by the computer system 1 are generally stored on the hard drive and moved at least in part to the RAM 6 during use.
Common CPUs 2 included within a computer system 1 are reduced instruction set computation (“RISC”) processors and complex instruction set computation (“CISC”) processors. Examples of RISC processors are the PowerPC™ processor manufactured by International Business Machines Corporation and the G3 processor manufactured by Motorola Corporation for Apple Computer Corporation personal computers. Examples of CISC processors are the model 80×86 processor and the Pentium™ processor, which are both available from Intel Corporation of Santa Clara, Calif.
A CPU 2 stores data in internal memory locations, registers, and memory. Registers are used during program execution to temporarily store intermediate results. The advantage of storing data in a register instead of a memory location is that the data within the register can be accessed much faster. Data that is not used during register operation is stored in memory. Memory associated with a processor (“associated memory”) is typically located within the CPU 2 itself as L1 cache, near the CPU 2 as L2 cache, or in an area separate from the CPU 2.
The location in which data is stored in the registers and memory is identified by an address. A read operation is used to access data found at a specific address. A write operation is used to store data at a specific address. Writing a value to a specific address will erase the value previously found at that address.
Computer systems are controlled by instructions. Instructions are statements specifying an operation to be performed and what data operands are to be processed by the computer system. A queue of pre-selected and sequenced instructions makes up each computer program. Each instruction includes an operation code (“opcode”) and operands. The opcode is the part of the instruction that identifies the operation to be performed. Typical operations are ADD, SUBTRACT, and MOVE.
Operands describe the data to be processed as the operation specified by the opcode is carried out. The instruction's operands may be an address location or actual data. Placing actual data within the instruction typically results in faster execution of the instruction. Limitation in the instruction's size, however, usually dictates that most operands are address locations for data stored in memory or registers.
A collection of instructions to be used by a particular computer system 1 is referred to as an instruction set. In RISC architectures, the instructions are of uniform length. In ×86 CISC processors, the length of instructions varies widely. The minimum instruction consists of a single opcode byte and is 8 bits long. A long instruction that includes a prefix byte is as long as 104 bits. Longer instructions containing more than a single prefix byte are also possible.
One common instruction completed by the CPU 2 is a shift instruction. Shifting is the process of moving data that is stored in a storage device relative to the boundaries of the device, as opposed to moving data in or out of the device. The storage device is often a register designed specifically for shifting (“shift register”). The direction of the shift is either left or right. Vacated bit positions (on the leftmost for shift right operations and on the rightmost for shift left operations) are filled with logical ZEROs. Shift operations are often used in field alignments, packing and unpacking of data items into storage units, and high-speed multiplication and division. Simple shift registers shift data only one space per shift. More advanced shift registers shift data any arbitrary number of spaces per individual shift.
An operation very similar to shifting is rotation. Rotation differs from shifting in that, in a left rotate operation, a bit rotated out from the left is placed back into the vacated rightmost bit position. Similarly, in a right rotate operation, a bit rotated out from the right is placed back into the vacated leftmost bit position. Otherwise shift and rotate operations are identical.
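The difference between shifting and rotating can be illustrated with a minimal sketch. The 16-bit register width and the function names below are assumptions chosen for illustration and do not appear in the description above.

```python
WIDTH = 16                   # assumed register width, for illustration only
FULL = (1 << WIDTH) - 1      # all-ones value used to truncate results to WIDTH bits


def shift_left(value, count):
    """Logical left shift: vacated rightmost positions are filled with ZEROs."""
    return (value << count) & FULL


def shift_right(value, count):
    """Logical right shift: vacated leftmost positions are filled with ZEROs."""
    return (value & FULL) >> count


def rotate_left(value, count):
    """Left rotate: bits leaving on the left re-enter on the right."""
    count %= WIDTH
    value &= FULL
    return ((value << count) | (value >> (WIDTH - count))) & FULL


def rotate_right(value, count):
    """Right rotate: bits leaving on the right re-enter on the left."""
    count %= WIDTH
    value &= FULL
    return ((value >> count) | (value << (WIDTH - count))) & FULL


value = 0b10000000_00000011
print(format(shift_left(value, 1), "016b"))   # the leftmost ONE is lost
print(format(rotate_left(value, 1), "016b"))  # the leftmost ONE wraps around
```

In this sketch the only difference between the two families is whether the bits pushed past the register boundary are discarded or re-inserted at the opposite end.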
Another common instruction completed by the CPU 2 is a mask instruction. Masking is used to extract desired information from a storage unit while suppressing the undesired information. In the example below, only the 8 least significant bits of the 16 bit register bit string are extracted:
01010111 01011100 register bit string
00000000 11111111 mask bit string
00000000 01011100 bit string result
As shown, a bitwise logical AND operation is performed with the register bit string and the mask bit string. Where the value of the mask bit is logical ONE, the corresponding register bit is retained in the bit string result. Where the value of the mask bit is logical ZERO, the register bit is suppressed. The mask bit string is generated during the execution of the instruction from data included within the instruction.
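The masking example above can be reproduced in a few lines. This is only a sketch; the function name `apply_mask` is hypothetical and is used here purely for illustration.

```python
def apply_mask(register_bits, mask_bits):
    """Keep register bits where the mask bit is ONE; suppress them where it is ZERO."""
    return register_bits & mask_bits


register_bits = 0b01010111_01011100   # register bit string from the example
mask_bits = 0b00000000_11111111       # mask bit string from the example

result = apply_mask(register_bits, mask_bits)
print(format(result, "016b"))
```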
A masking operation is used in combination with bit string read operations, shift registers, bitwise logic operations, and bit string write operations to deposit a string of bits into a specific memory or register location. An extract function is a form of a mask operation. For a source bit string S, a destination bit string D, and a mask bit string Mask, an extract function performs a bitwise logical AND operation with the source bit string S and the mask bit string Mask, then places the bit string result into the destination bit string D. In boolean algebra, the equation reads:
D=S AND Mask
A more complex mask operation is the deposit function. In a deposit function, the bits of the destination string D are preserved in the areas masked in the source string S. In boolean algebra, the equation reads:
D=(S AND Mask) OR (D AND ~Mask)
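The extract and deposit functions can be sketched directly from their boolean equations. The 16-bit default width and the function names are assumptions for illustration. Because Python integers are unbounded, the complemented mask is truncated to the assumed width.

```python
def extract(source, mask):
    """Extract function: D = S AND Mask."""
    return source & mask


def deposit(source, dest, mask, width=16):
    """Deposit function: D = (S AND Mask) OR (D AND NOT Mask).

    The width parameter is an assumption of this sketch; it bounds the
    complement of the mask to the size of the destination string.
    """
    full = (1 << width) - 1
    return (source & mask) | (dest & ~mask & full)


# Bits 4..11 come from the source; all other bits of the destination survive.
print(hex(deposit(0xABCD, 0x1234, 0x0FF0)))
```

The extract function discards the previous contents of the destination, whereas the deposit function keeps every destination bit whose mask bit is logical ZERO.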
Mask bit strings usually follow predictable patterns. First, the logical ONEs of the mask are typically grouped together. Second, the mask is typically right justified or left justified. Below are examples of 16 bit mask strings.
00000011 11111111 example one
11111111 11000000 example two
Due to their predictable patterns, mask bit strings can be defined in fewer bits than their full length. Defining the mask in fewer bits allows instruction sets to save space within the masking instruction. The cost of the saved space, however, is that an additional decoding step is required to generate the mask.
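One possible compact encoding, offered purely as an illustration, stores only the number of logical ONEs and a justification flag, then expands them into the full mask. The text above does not specify an encoding, so the function `decode_mask` and its parameters are hypothetical.

```python
def decode_mask(ones, left_justified, width=16):
    """Expand a compact (count, justification) pair into a full mask bit string.

    This encoding is an assumption made for this sketch: 'ones' is the number
    of contiguous logical ONEs and 'left_justified' selects which end of the
    mask they occupy.
    """
    if not 0 <= ones <= width:
        raise ValueError("ones must be between 0 and width")
    run = (1 << ones) - 1                 # 'ones' contiguous ONEs, right justified
    if left_justified:
        return run << (width - ones)      # move the run to the left edge
    return run


print(format(decode_mask(10, False), "016b"))  # right-justified mask with ten ONEs
print(format(decode_mask(10, True), "016b"))   # left-justified mask with ten ONEs
```

Under this assumed scheme a 16 bit mask needs only a five bit count and a one bit flag, at the price of the decoding step performed by `decode_mask`.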
The method of and apparatus for extracting a string of bits from a binary bit string and depositing a string of bits onto a binary bit string of the present invention is an improved implementation of deposit and extract instructions wherein the instruction contains an opcode, a source address, a destination address, a shift number, and a K-bit mask string. The opcode describes the operations to be performed upon a J-bit source string and an N-bit destination string. The source address points to the register in the CPU or the location of the J-bit source string. The destination address points to the register in the CPU or the location of the N-bit destination string. The shift number indicates the number of bits the J-bit source string will be shifted to generate a shifted bit string. The combination of the shifted bit string with the N-bit destination string is conducted under the control of the K-bit mask string. The method of and apparatus for extracting a string of bits from a binary bit string and depositing a string of bits onto a binary bit string of the present invention is particularly useful for high speed digital data processing, such as that required by IEEE 1394-1995 compliant devices.
An instruction includes an opcode, a source address, a destination address, a shift number, and a mask bit string. The opcode describes the operations to be performed upon a particular source bit string and destination bit string. The operations include an extract left instruction, an extract right instruction, a deposit left instruction, and a deposit right instruction. The source address points to the register in the CPU or the location of the source bit string. The destination address points to the register in the CPU or the location of the destination bit string. The shift number indicates the number of bits the source bit string is to be shifted to generate a shifted bit string. The direction of shift is dictated by the shift value or the opcode. The combination of the shifted bit string with the destination bit string is conducted under the control of a mask bit string. The more specific implementations of the present invention are the extract and deposit instructions.
The deposit instruction begins with an instruction comprising an opcode, a source address, a destination address, a shift number, and a K-bit mask string. The CPU first reads a J-bit source string located at the source address and an N-bit destination string located at the destination address. The CPU shifts the J-bit source string as determined by the shift number and the opcode to obtain a shifted bit string. The CPU then combines the shifted bit string and the N-bit destination string under control of the K-bit mask string to obtain an N-bit final string, such that: (i) individual bits of the shifted bit string are included in the N-bit final string where the corresponding individual bits of the K-bit mask string have a value equal to logical ONE; and (ii) individual bits of the N-bit destination string are included in the N-bit final string where the corresponding individual bits of the K-bit mask string have a value equal to logical ZERO. In a final step, the CPU writes the N-bit final string to the destination address.
There are three additional implementations of the deposit instruction. In the first additional implementation, the numeric values of J, K, and N are equal. In the second additional implementation, the combination step is performed by the following steps: (i) performing a logical AND operation of the shifted bit string and the K-bit mask string to obtain a first bit string; (ii) performing a logical AND operation of the N-bit destination string and the logical complement of the K-bit mask string to obtain a second bit string; and (iii) performing a logical OR operation of the first bit string and the second bit string to obtain the N-bit final string. In the third additional implementation, the processing steps are performed by an embedded stream processor and the registers within the embedded stream processor contain the source address and the destination address.
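The deposit steps can be modeled with a short sketch. It assumes J, K, and N are all 16, represents registers as a dictionary keyed by address, and uses the hypothetical mnemonics `DEPOSIT_LEFT` and `DEPOSIT_RIGHT` to stand in for the opcode; none of these names come from the description.

```python
WIDTH = 16                   # assumes J = K = N = 16 for this sketch
FULL = (1 << WIDTH) - 1


def deposit_instruction(storage, opcode, src_addr, dst_addr, shift, mask):
    """Model of the deposit steps: read, shift, combine under mask, write."""
    source = storage[src_addr] & FULL          # read the source string
    dest = storage[dst_addr] & FULL            # read the destination string
    mask &= FULL

    # Shift direction is selected by the (hypothetical) opcode mnemonic.
    if opcode == "DEPOSIT_LEFT":
        shifted = (source << shift) & FULL
    elif opcode == "DEPOSIT_RIGHT":
        shifted = source >> shift
    else:
        raise ValueError("unknown opcode: " + opcode)

    first = shifted & mask                     # (i) shifted AND mask
    second = dest & ~mask & FULL               # (ii) destination AND complement of mask
    final = first | second                     # (iii) OR of the two partial strings

    storage[dst_addr] = final                  # write the final string back
    return final


registers = {"r1": 0x00AB, "r2": 0xFFFF}
deposit_instruction(registers, "DEPOSIT_LEFT", "r1", "r2", 4, 0x0FF0)
print(hex(registers["r2"]))
```

Here the low byte of `r1` is shifted left by four bits and written into bits 4 through 11 of `r2`, while the remaining bits of `r2` are left unchanged.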
Like the deposit instruction, the extract instruction begins with an instruction comprising an opcode, a source address, a destination address, a shift number, and a K-bit mask string. The CPU or equivalent means first reads a J-bit source string located at the source address. The CPU shifts the J-bit source string as determined by the shift number and the opcode to obtain a shifted bit string. The CPU then combines the shifted bit string and the K-bit mask string to obtain an N-bit final string, such that: (i) individual bits of the shifted bit string are included in the N-bit final string where the corresponding individual bits of the K-bit mask string have a value of logical ONE; and (ii) remaining individual bits of the N-bit final string have a value of logical ZERO. In a final step, the CPU writes the N-bit final string to the destination address.
There are three additional implementations of the extract instruction. In the first additional implementation, the numeric values of J, K, and N are equal. In the second additional implementation, the combination step is accomplished by performing a bitwise logical AND operation with the shifted bit string and the K-bit mask string. In the third additional implementation, the processing steps are performed by an embedded stream processor and the registers within the embedded stream processor contain the source address and the destination address.
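The extract steps can be sketched in the same spirit. As assumptions for illustration, J, K, and N are all 16, registers are modeled as a dictionary keyed by address, and `EXTRACT_LEFT` and `EXTRACT_RIGHT` are hypothetical mnemonics standing in for the opcode.

```python
WIDTH = 16                   # assumes J = K = N = 16 for this sketch
FULL = (1 << WIDTH) - 1


def extract_instruction(storage, opcode, src_addr, dst_addr, shift, mask):
    """Model of the extract steps: read, shift, AND with mask, write."""
    source = storage[src_addr] & FULL          # read the source string

    # Shift direction is selected by the (hypothetical) opcode mnemonic.
    if opcode == "EXTRACT_LEFT":
        shifted = (source << shift) & FULL
    elif opcode == "EXTRACT_RIGHT":
        shifted = source >> shift
    else:
        raise ValueError("unknown opcode: " + opcode)

    final = shifted & mask & FULL              # bits outside the mask become ZERO
    storage[dst_addr] = final                  # destination is fully overwritten
    return final


registers = {"r1": 0xABCD, "r2": 0xFFFF}
extract_instruction(registers, "EXTRACT_RIGHT", "r1", "r2", 8, 0x00FF)
print(hex(registers["r2"]))
```

Unlike the deposit operation, the previous contents of the destination play no part in the result: every bit not selected by the mask is written as logical ZERO.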