Kernel functions can use most C control flow constructs. In general, a conditional control flow statement must use a scalar control expression to ensure that all lanes follow the same execution path; this is a limitation of SIMD machine architecture.
if (<scalar_expression>) { ... } is converted to a simple branch with the same control flow in every lane; vector control expressions are not allowed, as control flow must be the same for every lane. If the given block executes only sequential read (spi_read) or sequential write (spi_write) operations, spc generates special code to execute the operations without a branch. This allows the use of code such as:
if (<scalar_expression>) { spi_write(s, vi); }
within a software pipelined inner loop.
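The text above does not say how spc removes the branch. One plausible scheme, shown here as a host-side C model rather than kernel code, is to perform the store unconditionally and advance the stream's record count only when the condition holds. The type seq_stream, the function cond_seq_write and the LANES value are hypothetical names for this sketch, not part of the Kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* hypothetical lane count for this model */

/* Hypothetical host-side model of a sequential output stream. */
typedef struct {
    int *buf;      /* backing storage; needs LANES spare slots at the end */
    size_t cap;    /* total slots in buf */
    size_t count;  /* records committed to the stream so far */
} seq_stream;

/* Models "if (cond) spi_write(s, v);" without a branch: the store always
   happens at the current end of the stream, but the record count only
   advances when cond is nonzero. A skipped write is either overwritten by
   the next write or ignored because it lies past the record count. */
static void cond_seq_write(seq_stream *s, const int v[LANES], int cond)
{
    assert(s->count + LANES <= s->cap);
    for (size_t lane = 0; lane < LANES; ++lane)
        s->buf[s->count + lane] = v[lane];
    s->count += (size_t)(cond != 0) * LANES;
}
```

The cost of this scheme is one unconditional store per iteration plus spare buffer space for one vector, in exchange for a loop body with no branch to disturb software pipelining.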
Looping constructs must only use scalar control expressions:
int i;
vec int v_i, d[10];
...
for (i = 0; i < 10; ++i) spi_read(in_str, d[i]); // Legal
while (v_i > 0) { ... } // Illegal - vector expression
A switch statement may only have a scalar expression as the switch value.
int8x4 value;
vec int16x2 data;
switch (value)
{
case 0:
spi_read(in_str, data);
break;
case 1:
if (data > 12) data = data + 14;
break;
default:
break;
}
goto, break, continue and return are supported, provided they do not appear inside an if-statement whose control expression is a vector expression. A return with a value is allowed only within an inline kernel.
if (i > 10) return; // Legal
if (v_i != 16) v_i += 16;
else return; // Illegal - if with vector expression
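The examples above also use if-statements whose condition is a vector expression, such as if (v_i != 16) v_i += 16;. A common way to execute such a statement on SIMD hardware is if-conversion: every lane computes both the condition and the new value, and a per-lane select keeps one of them. The C model below assumes that scheme (the exact code spc emits is not described here) and uses a hypothetical 8-lane width and a hypothetical function name, vec_if_add16. Because every lane runs the same instructions, there is no branch that a single lane could leave through a return, goto, break or continue.

```c
#include <assert.h>

#define LANES 8  /* hypothetical lane count */

/* Model of "if (v_i != 16) v_i += 16;" when v_i is a vector: every lane
   evaluates the condition and the updated value, then a per-lane select
   keeps either the updated value or the old one. No lane branches. */
static void vec_if_add16(int v_i[LANES])
{
    for (int lane = 0; lane < LANES; ++lane) {
        int cond = (v_i[lane] != 16);
        int updated = v_i[lane] + 16;
        v_i[lane] = cond ? updated : v_i[lane];
    }
}
```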
6.7 Stream access functions
Kernel functions use Kernel API stream access functions to access stream data. Stream processor hardware supports three different types of stream access from kernel functions: sequential, conditional and array. Sequential access is the most efficient access method, conditional access permits a kernel function to read or write data only to or from selected lanes, and array access permits random stream access.
- spi_array_read: Read data from an array stream
- spi_array_write: Write data to an array stream
- spi_cond_read: Read data from a conditional stream
- spi_cond_write: Write data to a conditional stream
- spi_eos: Check for end of stream
- spi_read: Read data from a sequential stream
- spi_write: Write data to a sequential stream
To use stream data inside a kernel function, you must pass the stream as a parameter and use stream access functions: you cannot access a data buffer directly. This allows for very high performance execution of kernel functions, in keeping with the architecture of the DPU.
Stream access functions read or write data records. The number of records that can be read from an input stream is determined either from the length of a substream attribute in the kernel function call or from the count of the stream (that is, the number of records written by spi_load_* or by a previous call to a kernel function that used the stream as an output).
The kernel function declaration specifies the type and direction of each stream parameter. There are limitations on which combination of stream access functions can be used within a single kernel function. The allowed combinations of stream access functions are shown in the table below.
Stream type | Modifier | spi_read | spi_write | spi_cond_read | spi_cond_write | spi_array_read | spi_array_write | spi_eos
Input sequential | in or seq_in | | | | | | |
Output sequential | out or seq_out | | | | | | |
Input conditional | cond_in | | | | | | |
Output conditional | cond_out | | | | | | |
Input array | array_in | | | | | | |
Output array | array_out | | | | | | |
I/O array | array_io | | | | | | |
6.7.1 Sequential streams
Sequential streams have the fastest memory performance. spi_read and spi_write read and write data to and from all lanes in a sequential manner. Reading beyond the end of a stream returns zero.
On SP16, three calls to spi_read would read 48 records from the LRF, 16 at a time. The records are striped across the lanes:
Call | Lane 0 | Lane 1 | ... | Lane 14 | Lane 15
first spi_read call | record 0 | record 1 | ... | record 14 | record 15
second spi_read call | record 16 | record 17 | ... | record 30 | record 31
third spi_read call | record 32 | record 33 | ... | record 46 | record 47
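The striping in the table follows a simple rule: call k (counting from 0) delivers record k * lanes + lane to each lane. The host-side C sketch below, with the hypothetical helpers seq_record_index and model_spi_read (not Kernel API functions), encodes that rule together with the zero result for reads past the end of the stream.

```c
#include <assert.h>
#include <stddef.h>

/* Record index delivered to a lane by the given sequential read call. */
static size_t seq_record_index(size_t call, size_t lane, size_t lanes)
{
    assert(lane < lanes);
    return call * lanes + lane;
}

/* Model of one sequential read: fills every lane with its striped record,
   returning zero for positions past the end of the stream. */
static void model_spi_read(const int *stream, size_t count, size_t call,
                           int *dest, size_t lanes)
{
    for (size_t lane = 0; lane < lanes; ++lane) {
        size_t idx = seq_record_index(call, lane, lanes);
        dest[lane] = (idx < count) ? stream[idx] : 0;
    }
}
```

For example, with 16 lanes, lane 15 of the third call (call index 2) receives record 2 * 16 + 15 = 47, matching the last cell of the table.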
It is possible to conserve space in the LRF by both reading from and writing to the same sequential stream in a kernel function. To do this, pass the same stream to the kernel function twice, once as an input stream and once as an output stream. It is the programmer’s responsibility to make sure that the number of records read exceeds the number written at all times; otherwise, input data may be overwritten, resulting in undefined behavior.
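The in-place rule can be illustrated with a record-at-a-time C model, a simplification that ignores lanes: a hypothetical filter, filter_positive_in_place, that keeps only positive records while reading from and writing to the same buffer. Because each write goes to a slot whose record has already been read, reads always stay ahead of writes and no unread input is overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-place filter: r counts records read, w counts records
   written. Each write targets slot w, which was already read (w < r).
   Returns the number of records left in the stream. */
static size_t filter_positive_in_place(int *records, size_t count)
{
    size_t r = 0, w = 0;
    while (r < count) {
        int x = records[r++];      /* "read" the next record */
        if (x > 0) {
            assert(w < r);         /* reads stay ahead of writes */
            records[w++] = x;      /* "write" it back to the same stream */
        }
    }
    return w;
}
```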
6.7.2 Conditional streams
spi_cond_read reads conditional input stream data into a subset of the lanes, based on the value in each lane of a vector flag variable. Similarly, spi_cond_write writes conditional output stream data from a subset of the lanes, based on the value in each lane of a vector flag variable. As with sequential streams, reading beyond the end of a stream returns zero.
Due to the SIMD structure of the DPU, spi_cond_read overwrites the value of the destination variable in all lanes, regardless of the value of the conditional flag variable in the lane. If the conditional read flag is false for a lane, then the value will be a repeat of the last record read from the stream by the conditional read; if no data has been read, then the value will be zero. It is the programmer’s responsibility to ignore data returned by spi_cond_read in lanes where the read flag is false.
On SP8, three calls to spi_cond_read load 0 to 24 records, depending on the condition flags. For example:
Row | Lane 0 | Lane 1 | Lane 2 | Lane 3 | Lane 4 | Lane 5 | Lane 6 | Lane 7
read flag (first call) | true | true | false | true | false | false | true | false
first spi_cond_read call | r0 | r1 | r1 | r2 | r2 | r2 | r3 | r3
read flag (second call) | false | true | true | false | false | true | false | true
second spi_cond_read call | r3 | r4 | r5 | r5 | r5 | r6 | r6 | r7
read flag (third call) | true | true | true | false | true | true | true | false
third spi_cond_read call | r8 | r9 | r10 | r10 | r11 | r12 | r13 | r13
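The conditional-read behavior described above can be checked against the SP8 table with a small host-side C model. model_cond_read is a hypothetical name, not a Kernel API function; it tracks how many records have been consumed and the most recently read record, which starts at zero. What a false lane receives after a true lane has read past the end of the stream is not specified in the text; the model assumes it repeats that zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of spi_cond_read. *pos is the number of records
   consumed so far; *last is the most recent record read (0 before any
   read). Lanes with a true flag take the next record (zero past the end);
   lanes with a false flag repeat the most recent record. */
static void model_cond_read(const int *stream, size_t count, size_t *pos,
                            int *last, const int *flag, int *dest,
                            size_t lanes)
{
    for (size_t lane = 0; lane < lanes; ++lane) {
        if (flag[lane]) {
            *last = (*pos < count) ? stream[*pos] : 0;
            ++*pos;
        }
        dest[lane] = *last;
    }
}
```

Running the three flag patterns from the table consumes 4, 4 and 6 records, for 14 in total, and reproduces each row of results.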
It is possible to conserve space in the LRF by using the same conditional stream as both an input argument and an output argument in a kernel function. It is the programmer’s responsibility to make sure that the number of records read exceeds the number written at all times; otherwise, input data may be overwritten, resulting in undefined behavior.
6.7.3 Array streams
Array streams have the slowest memory performance. spi_array_read and spi_array_write read and write data to and from all lanes with random access, and stream data can be reread as many times as desired. Even though the stream can be accessed at an arbitrary position, each call to spi_array_read still reads consecutive records into the lanes, one record per lane, as shown below for SP16:
Call | Lane 0 | Lane 1 | ... | Lane 14 | Lane 15
spi_array_read(str, dest, 0) | record 0 | record 1 | ... | record 14 | record 15
spi_array_read(str, dest, 1) | record 16 | record 17 | ... | record 30 | record 31
spi_array_read(str, dest, 2) | record 32 | record 33 | ... | record 46 | record 47
Reading or writing beyond the end of the stream results in undefined behavior.
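Assuming, as in the table, that the third argument of spi_array_read selects a block of consecutive records (index n fills lane k with record n * lanes + k), the access pattern can be modeled in C as below. model_array_read is a hypothetical helper, not a Kernel API function; because reading past the end is undefined on the hardware, the model asserts instead of returning zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an array read: block `index` of the stream is
   copied into the lanes, one record per lane. The whole block must lie
   inside the stream, since out-of-range access is undefined. */
static void model_array_read(const int *stream, size_t count, size_t index,
                             int *dest, size_t lanes)
{
    assert((index + 1) * lanes <= count);  /* stay within the stream */
    for (size_t lane = 0; lane < lanes; ++lane)
        dest[lane] = stream[index * lanes + lane];
}
```

Unlike a sequential read, nothing is consumed, so the same block can be read again in any order.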