The instruction set of FR-V series processors reserves certain instruction codes for custom instructions, which allows users to define their own instructions. Using these instruction codes, four types of instructions are defined: CONFIGLOAD, EXEC, LUT, and RSRMOD.
CONFIGLOAD loads configuration information into the configuration memory. When the instruction translator incorporated in the I-unit detects that the next instruction from the instruction fetch block is CONFIGLOAD, it converts the instruction into four double-word load instructions (8-byte load instructions, 32 bytes in total, which matches the 256 bits of one configuration) and issues them to the execution block. The data transferred from the cache is written not to the register file but to the configuration memory. This is how configuration information is loaded into the configuration memory.
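As a rough model of this expansion, the sketch below splits one CONFIGLOAD into four 8-byte loads and collects the resulting 32 bytes into one configuration-memory entry. The function name, the consecutive load addresses, the big-endian byte order, and the dict used as configuration memory are illustrative assumptions, not details given in the text.

```python
DOUBLE_WORD = 8        # bytes moved by one double-word load
LOADS_PER_CONFIG = 4   # four loads = 32 bytes = 256 bits

def config_load(memory: bytes, addr: int, config_mem: dict, entry: int) -> None:
    """Model CONFIGLOAD as four 8-byte loads that target configuration memory."""
    data = b""
    for i in range(LOADS_PER_CONFIG):
        start = addr + i * DOUBLE_WORD           # assumed: consecutive addresses
        data += memory[start:start + DOUBLE_WORD]
    # The loaded data bypasses the register file and lands in the chosen entry.
    config_mem[entry] = int.from_bytes(data, "big")   # assumed byte order

memory = bytes(range(64))    # stand-in for cached data
config_mem = {}
config_load(memory, 16, config_mem, entry=0)
```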
EXEC executes custom instructions. To ensure that there is no restriction on the types of custom instructions that can be defined, custom instructions are not tied to fixed operation codes. Instead, the operation code area includes a field in which a configuration memory entry can be specified, and EXEC executes the custom instruction stored in the entry specified by that field. This means that if two custom instructions are represented by the same operation code but their configuration memory entries hold different information, they are executed as different custom instructions.
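The entry-based dispatch can be pictured with the following minimal sketch. Python callables stand in for the 256-bit configurations, and the two example operations (XOR and AND) are hypothetical; only the idea that the entry contents, not the operation code, decide the behaviour comes from the text.

```python
from typing import Callable, Dict

MASK32 = 0xFFFFFFFF

# Stand-in for configuration memory: entry number -> behaviour.
# Real entries hold 256 bits of configuration; callables are used for brevity.
config_mem: Dict[int, Callable[[int, int], int]] = {}

def exec_custom(entry: int, rs1: int, rs2: int) -> int:
    """Model EXEC: run whatever custom instruction the entry currently holds."""
    return config_mem[entry](rs1, rs2) & MASK32

config_mem[0] = lambda a, b: a ^ b     # hypothetical custom instruction
first = exec_custom(0, 0x0F0F, 0x00FF)

config_mem[0] = lambda a, b: a & b     # same entry, different configuration
second = exec_custom(0, 0x0F0F, 0x00FF)
```

The two calls use the same entry number and operands, yet they behave as different instructions because the entry was reloaded in between.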
The LUT instruction specifies that information from the configuration memory is used not to define the R-pipe configuration but as table memory. The 2n (n = 1, 2, 3) bits in the data stored in the GR specified by the operation code are replaced with the 2n bits included in the 256 bits of configuration information.
RSRMOD operates on the special-purpose registers (SPRs) RSR0 and RSR1, which have been newly provided for the R-unit.
If custom instructions that perform nearly the same processing and differ only in certain quantities, such as shift amounts, were defined as separate instructions, configuration memory area would be wasted. To avoid this waste, parameter information, denoted by sel and pos, is obtained from special-purpose registers. As a consequence, the same custom instruction behaves as a different instruction each time it is given different parameter information. The configuration memory is thus used more efficiently.
The sel and pos fields of the RSR register can be set by the RSRMOD instruction. An automatic updating function is also provided for cases in which the values of these fields need to be updated at regular intervals. This function eliminates the need to set the values each time with the RSRMOD instruction.
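The effect of parameterising one configuration through sel and pos can be sketched as follows. The register layout, the `auto_step` field, and the wrap-around update rule are assumptions made purely for illustration; the text states only that the fields can be set by RSRMOD or updated automatically.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF

@dataclass
class RSR:
    sel: int = 0
    pos: int = 0
    auto_step: int = 0   # hypothetical: amount added to pos after each use

def rsrmod(rsr: RSR, sel: int, pos: int) -> None:
    """Model RSRMOD: set the sel and pos fields explicitly."""
    rsr.sel, rsr.pos = sel, pos

def custom_shift(rsr: RSR, value: int) -> int:
    """One configured instruction whose shift amount comes from RSR.pos."""
    result = (value >> rsr.pos) & MASK32
    rsr.pos = (rsr.pos + rsr.auto_step) % 32   # assumed automatic update rule
    return result

rsr = RSR(auto_step=4)
rsrmod(rsr, sel=0, pos=0)
results = [custom_shift(rsr, 0x12345678) for _ in range(3)]
```

A single configuration serves three different shift amounts here, and no RSRMOD is needed between the calls.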
2.3 Structure of the reconfigurable unit
Figure 39 shows the R-pipe circuit. The bold arrows in the figure indicate configuration information. The permutator shown at the top right is a block that permutes the 32 bits of input data in a specified way. The pattern generator shown at the top centre generates mask data; it can output three types of mask data, selected by the sel signal. The LUT selector is a circuit that extracts data when the LUT instruction uses the configuration information as a table. The other R-pipe components include shifters, multiplexers, AND masks, and ALUs. The internal structure of these component circuits and the connections among them are defined by the configuration information. One configuration is defined by 256 bits.
Figure 39
Internal structure of R-pipe
Custom instruction definition examples will be provided later.
3 Performance improvement of DES application
The performance improvement was evaluated for the case in which an entire application is implemented on this processor. The applications used were DES and Triple DES, which are widely used encryption algorithms.
DES is a block encryption algorithm that encrypts 64 bits of input data and produces 64 bits of encrypted data. The encrypting procedure begins with the initial permutation (IP), which permutes the 64 input bits. The next step, called the F function, is repeated 16 times. Finally, the inverse permutation IP⁻¹ permutes the bits again, which completes the encryption.
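The overall structure can be outlined in a short sketch. The IP table below is the standard DES table (bit 1 is the most significant bit) and the round structure is the usual Feistel arrangement; the round function `toy_f` and the subkeys are placeholders, not the real DES F function or key schedule, so the output is not real DES ciphertext.

```python
MASK32 = 0xFFFFFFFF

# Standard DES initial permutation table (1-based, bit 1 = most significant).
IP = [start - 8 * j for start in (58, 60, 62, 64, 57, 59, 61, 63) for j in range(8)]

# Inverse table: IP moves input bit IP[i] to output bit i + 1, so IP^-1 moves it back.
IP_INV = [0] * 64
for i, src in enumerate(IP):
    IP_INV[src - 1] = i + 1

def permute(block: int, table: list, width: int = 64) -> int:
    """Build the output bit by bit; output bit i takes input bit table[i]."""
    out = 0
    for src in table:
        out = (out << 1) | ((block >> (width - src)) & 1)
    return out

def des_skeleton(block: int, subkeys: list, f) -> int:
    """IP, one Feistel round per subkey, swap of the halves, then IP^-1."""
    x = permute(block, IP)
    left, right = x >> 32, x & MASK32
    for k in subkeys:
        left, right = right, left ^ (f(right, k) & MASK32)
    return permute((right << 32) | left, IP_INV)

def toy_f(r: int, k: int) -> int:
    return (r + k) & MASK32            # placeholder, NOT the DES F function

subkeys = list(range(1, 17))           # placeholder subkeys for 16 rounds
plaintext = 0x0123456789ABCDEF
ciphertext = des_skeleton(plaintext, subkeys, toy_f)
decrypted = des_skeleton(ciphertext, subkeys[::-1], toy_f)
```

Running the same skeleton with the subkeys in reverse order undoes the encryption, which is the property that lets one routine serve both directions.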
Triple DES is an encryption algorithm in which DES is performed three times.
3.1 Initial permutation (IP)
This section explains how to implement the custom instruction that accomplishes initial permutation.
Figure 40
How bits are exchanged by the initial permutation (IP) step
The initial permutation must achieve the bit exchange shown in Fig. 40. The 64 bits of input data are stored in GR1 and GR2, and the permuted result is to be stored in GR3 and GR4. The bit exchange appears random, but in fact all four 16-bit chunks follow the same exchange pattern. Consider the right 16-bit chunk of GR3. Each bit in this chunk, shifted by 32 bits, gives the left 16-bit chunk of GR3. Similarly, each bit in the right 16-bit chunk of GR3, shifted by 1 bit, gives the right 16-bit chunk of GR4. The left 16-bit chunk of GR4 is in turn obtained from the right 16-bit chunk of GR4 by shifting each bit by 1 bit.
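Part of this regularity can be checked directly on the standard DES IP table, which lists for each output bit the 1-based input bit it takes (bit 1 being the most significant). The sketch below only inspects that table; how the table positions map onto the GR1 to GR4 chunks of Fig. 40 is not modelled here.

```python
# Standard DES initial permutation table, written row by row (8 entries per row).
IP = [start - 8 * j for start in (58, 60, 62, 64, 57, 59, 61, 63) for j in range(8)]

upper, lower = IP[:32], IP[32:]          # first and second 32 output bits

# Every source bit of the second half lies exactly one position before the
# corresponding source bit of the first half: the same pattern, shifted by 1 bit.
shift_between_halves = {u - l for u, l in zip(upper, lower)}

# Within the first half, each row is the previous row moved by a constant offset.
rows = [IP[i:i + 8] for i in range(0, 32, 8)]
row_step = {b - a for r0, r1 in zip(rows, rows[1:]) for a, b in zip(r0, r1)}
```

Both sets contain a single value, which is why one permutation pattern combined with ordinary shifts is enough to cover the whole table.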
Figure 41
Permutator design for the initial permutation (IP) custom instruction
Therefore, if the permutator in the R-pipe is designed to achieve the bit exchange shown in Fig. 41, a single IP custom instruction, combined with an appropriate shift instruction, can be used for the whole initial permutation.
Figure 42
R-pipe configuration for the initial permutation (IP) custom instruction
More specifically, the R-pipe is configured as shown in Fig. 42. The input from the first operand (rs1) is shifted right by 4 bits by the right shifter and then ANDed with 0xf0f0f0f0. The input from the second operand (rs2) is fed to the permutator, which performs the bit permutation shown in Fig. 41, and is then ANDed with 0x0f0f0f0f. The OR of the two AND gate outputs is output as the operation result.
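The data path described for this configuration can be written as a small function. The permutation pattern of Fig. 41 is not reproduced in the text, so the table is passed in as a parameter and an identity table is used as a placeholder; the names `permute32` and `ip_custom` and the MSB-first bit numbering are likewise assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def permute32(value: int, table: list) -> int:
    """table[i] is the source bit (0 = most significant) feeding output bit i."""
    out = 0
    for src in table:
        out = (out << 1) | ((value >> (31 - src)) & 1)
    return out

IDENTITY = list(range(32))   # placeholder only; the real pattern is that of Fig. 41

def ip_custom(rs1: int, rs2: int, table: list = IDENTITY) -> int:
    """(rs1 >> 4) AND 0xf0f0f0f0, OR-ed with permute(rs2) AND 0x0f0f0f0f."""
    shifted = ((rs1 & MASK32) >> 4) & 0xF0F0F0F0
    permuted = permute32(rs2 & MASK32, table) & 0x0F0F0F0F
    return shifted | permuted

example = ip_custom(0x12345678, 0x9ABCDEF0)
```

The two masks select disjoint nibbles, so the OR simply merges the shifted rs1 bits with the permuted rs2 bits without any overlap.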
The following six steps (instructions) are performed to achieve initial permutation using a custom instruction that complies with the above specification.
1) Specify GR0 (zero register with all bits set to 0) for rs1 and GR1 for rs2, execute the custom instruction, and store the result in GR3.
2) Specify GR3 for rs1 and GR2 for rs2, execute the custom instruction, and store the result in GR3.
The above two steps store the desired data in GR3.
3) Shift GR1 to the right by 1 bit using a shift instruction, which is a general integer instruction, and store the result in GR1.
4) Similarly, shift GR2 to the right by 1 bit and store the result in GR2.
5) Specify GR0 for rs1 and GR1 for rs2, execute the custom instruction, and store the result in GR4.
6) Specify GR4 for rs1 and GR2 for rs2, execute the custom instruction, and store the result in GR4.
The above four steps store the desired data in GR4.
As described above, six instructions, four of which execute the same custom instruction, accomplish the desired initial permutation.
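The register traffic of the six steps can be traced with the sketch below. It models the custom instruction as described for Fig. 42 but uses an identity function in place of the Fig. 41 permutator, so the values it produces are not the real IP output; it only shows the order of operations. Step 6 is taken here to read GR4, the result of step 5, as rs1, mirroring step 2; treat that as an assumption of the sketch.

```python
MASK32 = 0xFFFFFFFF

def ip_custom(rs1: int, rs2: int, permute) -> int:
    """Custom instruction: shifted and masked rs1 OR-ed with permuted and masked rs2."""
    return (((rs1 & MASK32) >> 4) & 0xF0F0F0F0) | (permute(rs2 & MASK32) & 0x0F0F0F0F)

def initial_permutation(gr1: int, gr2: int, permute):
    gr0 = 0                               # zero register
    gr3 = ip_custom(gr0, gr1, permute)    # step 1
    gr3 = ip_custom(gr3, gr2, permute)    # step 2: GR3 is complete
    gr1 = gr1 >> 1                        # step 3: ordinary shift instruction
    gr2 = gr2 >> 1                        # step 4
    gr4 = ip_custom(gr0, gr1, permute)    # step 5
    gr4 = ip_custom(gr4, gr2, permute)    # step 6: rs1 = step-5 result (assumed)
    return gr3, gr4

def placeholder_permute(x: int) -> int:
    return x                              # NOT the Fig. 41 pattern

gr3, gr4 = initial_permutation(0x01234567, 0x89ABCDEF, placeholder_permute)
```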
Without a custom instruction, the same processing would require dozens of instructions, because individual bits would have to be manipulated to perform the bit permutation.