Comparing the Performance and Memory Properties
of Vector Interleaving and Loop Fusion
Pavel Zelinsky, Elizabeth Jessup, Ian Karlin, Erik Silkensen
Department of Computer Science, University of Colorado at Boulder
{pavel.zelinsky, Jessup, ian.karlin, erik.silkensen}@colorado.edu
Geoffrey Belter, Jeremy Siek
Department of Electrical, Computer, and Energy Engineering, University of Colorado at Boulder
{geoffrey.belter, jeremy.siek}@colorado.edu
Memory bandwidth limits the performance of many scientific applications. We focus on one source of that limit: successive calls to the Basic Linear Algebra Subprograms (BLAS) are inefficient because each call makes its own pass over its operands in memory, even when consecutive calls share data. To address this problem, we developed a compiler that optimizes linear algebra kernels using loop fusion. In this poster, we compare loop fusion with vector interleaving.
As shown on the right, loop fusion is an optimization that merges multiple loops that access the same data into a single loop. Vector interleaving is an optimization that combines multiple vectors into a single multivector to improve locality of reference.
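The two transformations can be illustrated with a pair of dot products that share the vector x. The following is a minimal sketch in C, not output of the compiler described here; the function names (dots_unfused, dots_fused, interleave2, dots_interleaved) and the choice of kernel are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: two BLAS-style dot products, each making its own pass over x. */
void dots_unfused(size_t n, const double *x, const double *y,
                  const double *z, double *xy, double *xz)
{
    assert(x && y && z && xy && xz);
    double s = 0.0, t = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    for (size_t i = 0; i < n; i++) t += x[i] * z[i];
    *xy = s;
    *xz = t;
}

/* Loop fusion: one loop computes both results, so each x[i] is
 * loaded once and reused while it is still in a register. */
void dots_fused(size_t n, const double *x, const double *y,
                const double *z, double *xy, double *xz)
{
    assert(x && y && z && xy && xz);
    double s = 0.0, t = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i] * y[i];
        t += x[i] * z[i];
    }
    *xy = s;
    *xz = t;
}

/* Vector interleaving, step 1: pack y and z into one multivector
 * yz = (y0, z0, y1, z1, ...). This copy is the extra overhead. */
void interleave2(size_t n, const double *y, const double *z, double *yz)
{
    assert(y && z && yz);
    for (size_t i = 0; i < n; i++) {
        yz[2 * i]     = y[i];
        yz[2 * i + 1] = z[i];
    }
}

/* Vector interleaving, step 2: y[i] and z[i] are now adjacent in
 * memory, so the loop reads two streams (x, yz) instead of three. */
void dots_interleaved(size_t n, const double *x, const double *yz,
                      double *xy, double *xz)
{
    assert(x && yz && xy && xz);
    double s = 0.0, t = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i] * yz[2 * i];
        t += x[i] * yz[2 * i + 1];
    }
    *xy = s;
    *xz = t;
}
```

All three variants compute the same two values; they differ only in how many passes are made over x and in how y and z are laid out in memory. Whether the packing cost in interleave2 pays off depends on how often the interleaved vector is reused.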
Both optimizations produce speedups through memory reuse. Vector interleaving incurs the overhead of creating the additional interleaved vector, but it can improve cache use more than loop fusion can by further reducing conflict misses. For sparse matrix operations, its improved locality also decreases the number of cache lines read from memory. We use hardware counters and timing experiments to determine when vector interleaving is more efficient than loop fusion, and why. We discuss these results in relation to our compiler and assess whether vector interleaving should be added to it as a further optimization.