Comparing the Performance and Memory Properties of Vector Interleaving and Loop Fusion


Date	09.06.2018

Pavel Zelinsky, Elizabeth Jessup, Ian Karlin, Erik Silkensen
Department of Computer Science, University of Colorado at Boulder

{pavel.zelinsky, Jessup, ian.karlin, erik.silkensen}@colorado.edu

Geoffrey Belter, Jeremy Siek

Department of Electrical, Computer, and Energy Engineering, University of Colorado at Boulder

{geoffrey.belter, jeremy.siek}@colorado.edu

Memory bandwidth limits the performance of many scientific applications. We focus on one instance of this problem: the inefficiency of successive calls to the Basic Linear Algebra Subprograms (BLAS). To address it, we developed a compiler that optimizes linear algebra kernels using loop fusion. In this poster, we compare loop fusion with vector interleaving.

As shown on the right, loop fusion is an optimization that combines multiple loops that access the same data into a single loop. Vector interleaving is an optimization that combines multiple vectors into a single multivector to improve locality of reference.
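The two transformations can be sketched on a small example. The following is a minimal illustration in Python, not the output of the compiler described here. The kernel (a scaled vector update followed by a dot product) and all function names are assumptions chosen for the sketch, and plain Python lists only show the loop structure and element ordering; the memory effects themselves would be measured in compiled code.

```python
def unfused(a, x, y):
    """Two separate passes, as with two successive BLAS-style calls."""
    n = len(x)
    # Pass 1: y <- a*x + y. Reads every x[i] and y[i] once.
    for i in range(n):
        y[i] = a * x[i] + y[i]
    # Pass 2: dot <- x . y. Reads every x[i] and y[i] a second time.
    dot = 0.0
    for i in range(n):
        dot += x[i] * y[i]
    return dot


def fused(a, x, y):
    """Loop fusion: one pass, each x[i] and y[i] is read once for both results."""
    dot = 0.0
    for i in range(len(x)):
        yi = a * x[i] + y[i]
        y[i] = yi
        dot += x[i] * yi
    return dot


def interleave(x, y):
    """Vector interleaving: pack x and y so that z[2*i] = x[i], z[2*i+1] = y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    z = [0.0] * (2 * len(x))
    z[0::2] = x  # this copy is the overhead of building the multivector
    z[1::2] = y
    return z


def fused_interleaved(a, z):
    """Fused loop over the multivector: x[i] and y[i] are adjacent in memory."""
    dot = 0.0
    for i in range(0, len(z), 2):
        yi = a * z[i] + z[i + 1]
        z[i + 1] = yi
        dot += z[i] * yi
    return dot
```

All three variants compute the same update and the same dot product; they differ only in how many times the data is traversed and in how the two vectors are laid out.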
Both optimizations produce speedups through memory reuse. Although vector interleaving incurs the overhead of creating the additional interleaved vector, it can improve cache use more than loop fusion can by further reducing conflict misses. Vector interleaving also decreases the number of cache lines read from memory for sparse matrix operations by improving locality. We use hardware counters and timing experiments to determine when vector interleaving is more efficient than loop fusion, and why. We discuss these results in relation to our compiler and assess whether vector interleaving should be added to it as a further optimization.
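To make the sparse case concrete, the sketch below multiplies one sparse matrix by two vectors stored as an interleaved multivector, and uses a toy model to count the distinct cache lines touched when gathering vector entries. Everything here is an assumption made for illustration: the CSR storage format, the two-vector kernel, the function names, and the model itself (64-byte lines holding 8 doubles, line-aligned arrays, vector reads only). It does not reproduce the hardware-counter measurements described above.

```python
DOUBLES_PER_LINE = 8  # assumption: 64-byte cache lines, 8-byte doubles, aligned arrays


def spmm2_interleaved(row_ptr, col_idx, vals, z):
    """Compute A*x1 and A*x2 in one sweep over a CSR matrix A.

    z is the interleaved input: z[2*c] = x1[c], z[2*c+1] = x2[c].
    The result is interleaved the same way.
    """
    n_rows = len(row_ptr) - 1
    out = [0.0] * (2 * n_rows)
    for r in range(n_rows):
        s1 = s2 = 0.0
        for p in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[p]
            # Both operands for column c sit next to each other in z.
            s1 += vals[p] * z[2 * c]
            s2 += vals[p] * z[2 * c + 1]
        out[2 * r] = s1
        out[2 * r + 1] = s2
    return out


def lines_separate(cols, num_vectors=2):
    """Distinct lines touched when each vector lives in its own array."""
    per_vector = {c // DOUBLES_PER_LINE for c in cols}
    return num_vectors * len(per_vector)


def lines_interleaved(cols, num_vectors=2):
    """Distinct lines touched when entry c of vector k is at z[num_vectors*c + k]."""
    lines = set()
    for c in cols:
        for k in range(num_vectors):
            lines.add((num_vectors * c + k) // DOUBLES_PER_LINE)
    return len(lines)
```

Under this model, a row whose nonzeros fall in widely scattered columns, say 0, 100, 200, and 300, touches 8 lines with separate vectors but only 4 with interleaving, because each gathered pair shares a line. For a row with 8 contiguous columns the two layouts touch the same number of lines, so the benefit in this model depends on the sparsity pattern.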
