6.9 __repeat__
The __repeat__ keyword indicates that a block of code should be repeated; its usage is:
__repeat__ ( [ varname ] ; count ) { block }
Here count must be an integer constant expression and the optional varname must be a scalar variable name. In the expanded code, each instance of varname in block is replaced by the number of the current block copy, which runs from 0 to count - 1.
__repeat__ may be used in any Stream code, including in kernels. It is particularly useful for coding manually unrolled loops within kernel code.
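As an illustration of the expansion described above, the following standard C sketch shows what a four-way __repeat__ would be expected to expand to. The Stream form appears only in a comment because it is not standard C. The function name, the array, and the choice of count are hypothetical, and whether varname needs a separate declaration is not stated here, so treat this as a reading of the description rather than actual spc output.

```c
/* Hypothetical Stream source (not standard C), following the usage
 *     __repeat__ ( [ varname ] ; count ) { block }
 *
 *     __repeat__ (i; 4) { sum += v[i]; }
 *
 * Per the description, the block is copied 4 times and i is replaced
 * by the block number 0..3 in each copy. The function below is the
 * equivalent hand-written expansion in plain C. */
int sum4_unrolled(const int v[4])
{
    int sum = 0;
    { sum += v[0]; }   /* block 0 */
    { sum += v[1]; }   /* block 1 */
    { sum += v[2]; }   /* block 2 */
    { sum += v[3]; }   /* block 3 */
    return sum;
}
```

Because count is a constant expression, the expansion contains no loop counter or branch, which is what makes the construct attractive for unrolled kernel code.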
6.10 #pragma pipeline
Software pipelining (SWP) is a VLIW instruction scheduling technique in which a single iteration of a pipelined loop may execute operations from several different iterations of the original loop. Software pipelining can improve the efficiency of scheduled code.
The pipeline pragma instructs the VLIW scheduler to attempt to apply software pipelining to an inner loop; it should not be used on non-inner loops. The user should insert the pragma after the opening brace of an inner loop, as follows:
for (i = 0; i < count; i++) {
#pragma pipeline
...
}
Software pipelining degrades gracefully: if the scheduler cannot apply software pipelining to the loop, it simply schedules it without pipelining. The use of software pipelining can result in a substantial increase in the amount of time required for spc to compile a Stream program.
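To make the idea of overlapping iterations concrete, here is a minimal plain C sketch that pipelines a two-stage loop by hand. It is only an illustration of the transformation. The VLIW scheduler works on machine operations, not on source code, and the function names and the arithmetic are hypothetical.

```c
#include <stddef.h>

/* Original loop: each iteration runs stage A, then stage B. */
void scale_offset_plain(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int t = in[i] * 3;      /* stage A */
        out[i] = t + 1;         /* stage B */
    }
}

/* Hand-pipelined version: one iteration of the new loop performs
 * stage A of element i and stage B of element i - 1, so work from
 * two original iterations sits side by side in the loop body. */
void scale_offset_pipelined(const int *in, int *out, size_t n)
{
    if (n == 0)
        return;
    int t = in[0] * 3;              /* prologue: stage A of element 0 */
    for (size_t i = 1; i < n; i++) {
        int next = in[i] * 3;       /* stage A of element i */
        out[i - 1] = t + 1;         /* stage B of element i - 1 */
        t = next;
    }
    out[n - 1] = t + 1;             /* epilogue: stage B of last element */
}
```

In the pipelined body the two statements are independent of each other, so a VLIW machine could issue them in the same cycle. The prologue and epilogue are the price paid for filling and draining the pipeline.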
6.11 #pragma local_array_size
By default, spc allocates 256 words (1 Kbyte) per lane of LRF to hold the local arrays for a kernel. A local_array_size pragma preceding a kernel declaration overrides the default value for that kernel. Because spc allocates LRF on a per-pipeline basis, the local_array_size pragma must be visible during compilation of the Stream pipeline; if kernels and pipelines are compiled from separate sources, the pragma can be placed in the header that declares the kernel.
For example:
In foo.h:
#pragma local_array_size(k, 1000 * sizeof(int)); // allocate about 4 Kbytes (4000 bytes) per lane for k
extern void kernel k( ... );
In foo.sc:
void kernel k( ... ) {
vec int x[1000];
...
}
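The arithmetic behind this example can be checked with a short plain C sketch. It uses only the figures given above (256 words equal 1 Kbyte, so one word is 4 bytes) and assumes that int is 32 bits on the target, as the comment in foo.h implies. All names are hypothetical and are not part of any Stream API.

```c
#include <stddef.h>
#include <stdint.h>

/* Figures from the text: 256 words = 1 Kbyte, so a word is 4 bytes.
 * These names are illustrative only. */
enum { LRF_WORD_BYTES = 4, DEFAULT_LOCAL_ARRAY_WORDS = 256 };

/* Bytes per lane reserved for local arrays when no pragma is given. */
size_t default_local_array_bytes(void)
{
    return (size_t)DEFAULT_LOCAL_ARRAY_WORDS * LRF_WORD_BYTES;
}

/* Bytes per lane needed by a local array of elems integers
 * (assumption: int is 32 bits on the target). */
size_t int_array_bytes(size_t elems)
{
    return elems * sizeof(int32_t);
}

/* Nonzero when the array fits in the default allocation. */
int fits_default(size_t elems)
{
    return int_array_bytes(elems) <= default_local_array_bytes();
}
```

Under these assumptions the 1000-element array x needs 4000 bytes per lane, well above the 1024-byte default, which is why kernel k needs the pragma.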
7 Demo Application spm_demo
This chapter uses a concrete programming example to illustrate the basic concepts of Stream programming. Directory demos/spm_demo of the Stream distribution contains source code for the demo example. Code fragments in this chapter may differ from the distribution source.
The demo application removes a background color (“green screen”, though the background color need not be green) from an image. It performs the following steps:
- Read a bitmap file (.bmp) containing an image.
- Find the background color of the image:
  - Subdivide the image into blocks.
  - Compute the average color of each block.
  - Find the most common average block color; this is the background color.
- Replace the background color with a different color.
- Write a bitmap file (.bmp) containing an image.
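The background detection and replacement steps above can be sketched in plain C as follows. This is not the code from the distribution. The packed 0x00RRGGBB pixel layout, the function names, the exact-match color comparison, and the requirement that the image dimensions be multiples of the block size are all assumptions made to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel layout: 0x00RRGGBB packed into 32 bits. */
typedef uint32_t pixel_t;

/* Average color of the bs-by-bs block with top-left corner (bx, by). */
static pixel_t block_average(const pixel_t *img, size_t width,
                             size_t bx, size_t by, size_t bs)
{
    uint32_t r = 0, g = 0, b = 0;
    for (size_t y = by; y < by + bs; y++) {
        for (size_t x = bx; x < bx + bs; x++) {
            pixel_t p = img[y * width + x];
            r += (p >> 16) & 0xFF;
            g += (p >> 8) & 0xFF;
            b += p & 0xFF;
        }
    }
    uint32_t n = (uint32_t)(bs * bs);
    return ((r / n) << 16) | ((g / n) << 8) | (b / n);
}

/* Subdivide the image into blocks, average each block, and return the
 * most common average. avg is caller-provided scratch space with one
 * entry per block. Assumes width and height are multiples of bs. */
pixel_t find_background(const pixel_t *img, size_t width, size_t height,
                        size_t bs, pixel_t *avg)
{
    size_t nblocks = 0;
    for (size_t by = 0; by < height; by += bs)
        for (size_t bx = 0; bx < width; bx += bs)
            avg[nblocks++] = block_average(img, width, bx, by, bs);

    pixel_t best = 0;
    size_t best_count = 0;
    for (size_t i = 0; i < nblocks; i++) {
        size_t count = 0;
        for (size_t j = 0; j < nblocks; j++)
            if (avg[j] == avg[i])
                count++;
        if (count > best_count) {   /* ties keep the earliest color */
            best_count = count;
            best = avg[i];
        }
    }
    return best;
}

/* Replace every pixel that exactly matches from with to. */
void replace_color(pixel_t *img, size_t npixels, pixel_t from, pixel_t to)
{
    for (size_t i = 0; i < npixels; i++)
        if (img[i] == from)
            img[i] = to;
}
```

The per-block averaging and the per-pixel replacement apply the same operation to many independent data elements, which is the kind of data-parallel work that suits kernels. The quadratic vote count is chosen here only for brevity.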
The Stream programming model Component API allows the programmer to define components representing modular pieces of the program. Structuring a program to use the Component API encourages abstraction, modularity and encapsulation, as well as allowing the use of vendor-provided application libraries to perform standard tasks. The component version of spm_demo defines three components, corresponding in obvious fashion to the steps listed above:
- File input component file_in reads a bitmap (.bmp) input file containing an image and produces an output buffer containing image data.
- Green screen removal component gsr takes an image data buffer as input, performs green screen removal, and produces an output buffer containing modified image data.
- File output component file_out reads an image data input buffer and writes bitmap (.bmp) file output.
Alternatively, spm_demo could define four components instead of three, separating background color detection and background color replacement into separate components.
This chapter describes the spm_demo code in some detail, with an emphasis on its use of stream processor resources and the coding of its components. The following chapters describe how to build and run the demo application from the command line and under the Stream integrated development environment spide.
7.1 Testbench main
For program development purposes, it is often helpful to separate the essential work of a program from the stream programming model component framework. This allows you to build a functional version of a program that runs on a host processor and then a version that runs purely on DSP MIPS (either in simulation or on a hardware device) before you build the full component-based application that runs on System MIPS and DSP MIPS. The spm_demo source is structured accordingly.
Source file testbench/spimain.c defines a simple spi_main for a testbench version of the demo program. The testbench version of spm_demo does not use components. Instead, its spi_main function calls functions directly to perform the essential work of the program: read the input file, do the green screen removal, and write the output file. With error checking elided, it just consists of the following steps:
spi_buffer_t buffer;
...
buffer = read_bmp_file(argv[1]); // read from .bmp input file into buffer
buffer = gsr_pipeline(buffer); // process input buffer, return output buffer
write_bmp_file(argv[2], buffer); // write from buffer to .bmp output file
Source file file_io.c defines the functions read_bmp_file and write_bmp_file that read and write bitmap files. Stream source file gsr_pipeline.sc defines the function gsr_pipeline that performs the green screen removal. These functions perform all the work for the testbench version of spm_demo.
You might expect these functions to manipulate data in memory (an array). Instead, they use a Stream programming model buffer (type spi_buffer_t). The green screen removal Stream code uses kernels that perform data-parallel computations efficiently on the DPU, and using a buffer allows the Stream runtime to handle DSP MIPS cache coherency and DSP MIPS / DPU synchronization issues without requiring explicit user code. In the non-testbench version of spm_demo, the file input and file output components run on System MIPS while the gsr component runs on DSP MIPS, so passing data between them using memory allocated directly (e.g., statically allocated or allocated using malloc) would not work.
Once you have debugged the basic functionality of a program, you can create an application that runs on the target device. For spm_demo, you can build the green screen removal component as a program that runs on DSP MIPS and uses the power of the DPU. You also build a System MIPS application that contains the file input component, the file output component, and the application main from components/main.c, as described in subsequent sections of this chapter. The functions described above perform the critical work of each component, greatly simplifying the port from debugged testbench version to complete component-based application running on stream processor device hardware.