3.2 Communication and Dwarfs


We define a dwarf as a numerical method that captures a pattern of computation and communication. Whereas Figure 3 describes the computation, Figure 4 describes the persistent patterns of point-to-point communication on parallel computers that are closely tied to the underlying computational methods, or dwarfs. A number of application studies have analyzed the communication requirements of a wide array of scientific applications [Vetter and McCracken 2001] [Vetter and Yoo 2002] [Vetter and Mueller 2002] [Kamil et al 2005] [Shalf et al 2005].

Our observations regarding the application data are as follows:



  • Each dwarf exhibits very consistent communication patterns when comparing different application codes that involve the same dwarf.

  • Point-to-point communication patterns typically underutilize the capabilities of a fully connected network, such as a fat-tree, CLOS, or crossbar, but do not generally map isomorphically to a fixed lower-degree network, such as a mesh, torus, or hypercube.

  • Point-to-point communication is generally bandwidth bound, whereas collectives (barriers, reductions, broadcasts) are typically strongly latency bound given their small payloads.

  • The 3D FFT, the last item in the last row of Figure 4, requires a fully connected communication topology if observed over the entire execution of the code. However, the actual communication takes place in log2(number of processors) discrete phases that can be reordered in myriad ways to reduce contention on a more constrained interconnect topology [Gygi et al 2005], as sketched after this list. The selection of an optimal schedule may benefit from an "autotuning" approach (see Section 5.1).
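As a concrete illustration of such a phase schedule, the sketch below enumerates one common pairwise-exchange pattern, in which the phase-k partner of rank r is r XOR 2^k; the processor count and the schedule itself are illustrative assumptions rather than details taken from the cited studies. Permuting the order of these phases changes which links carry traffic at the same time, which is precisely the space an autotuner could search on a constrained topology.

```c
/* Minimal sketch (assumptions, not from the paper): organize the all-to-all
 * exchange behind a distributed 3D FFT as log2(P) pairwise phases, where the
 * phase-k partner of rank r is r XOR (1 << k). */
#include <stdio.h>

int main(void) {
    const int P = 8;                       /* assumed processor count (power of two) */
    int phases = 0;
    for (int p = P; p > 1; p >>= 1)
        phases++;                          /* log2(P) discrete phases */

    for (int k = 0; k < phases; k++) {
        printf("phase %d:", k);
        for (int r = 0; r < P; r++) {
            int partner = r ^ (1 << k);    /* hypercube partner in phase k */
            if (r < partner)               /* print each pair once */
                printf("  %d<->%d", r, partner);
        }
        printf("\n");
    }
    return 0;
}
```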

3.3 The Next N Dwarfs


The dwarfs present a method for capturing the common requirements of classes of applications while being reasonably divorced from individual implementations. Although the nomenclature of the dwarfs comes from Phil Colella’s discussion of scientific computing applications, we were interested in applying dwarfs to a broader array of computational methods. This led us naturally to the following questions:

  • How well do the Seven Dwarfs of high performance computing capture computation and communication patterns for a broader range of applications?

  • What dwarfs need to be added to cover important areas beyond high performance computing?

If we find an expanded set of dwarfs is broadly applicable, we can use them to guide innovation and evaluation of new prototypes. As long as the final list contains no more than two- or three-dozen dwarfs, architects and programming model designers can use them to measure success. For comparison, SPEC2006 has 29 benchmarks and EEMBC has 41. Ideally, we would like good performance across the set of dwarfs to indicate that new manycore architectures and programming models will perform well on applications of the future.
We do not claim that there have always been seven dwarfs, nor that there will always be just seven. Dwarfs are specified at a high level of abstraction that can group related but quite different computational methods. As inquiry into application kernels proceeds, a single dwarf can evolve to cover such a disparate variety of methods that it is better viewed as multiple distinct dwarfs. As long as we do not end up with too many dwarfs, it seems wiser to err on the side of embracing new ones.
For example, consider the problem of computing a PDE stencil on a uniform grid. One might initially conceive of this problem as a pentadiagonal or heptadiagonal sparse matrix, and thus lump it together with other sparse matrix problems. However, that view ignores the regular indexing patterns, the regular parallelism, and the overall conceptualization of this and similar problems. Thus, it was better to split the sparse matrix dwarf by creating a structured grid dwarf that more closely matches the behavior of these problems. A similar case can be made for unstructured grids: the most straightforward implementation could be interpreted as a sparse matrix problem, but doing so both limits the problem to a single level of indirection and disregards too much additional information about the problem. Another example we encountered was DSP filtering kernels (e.g., FIR or IIR), which could be conceived of as dense matrix problems; once again, however, this eliminates so much knowledge about the problem that it is better to create a new dwarf for them rather than conceptualizing them as dense or banded matrix operations.
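The contrast is easy to see in code. The sketch below, with illustrative names and sizes of our own choosing, expresses the same 5-point Laplacian two ways: once in structured grid form, where neighbors are found by regular index arithmetic, and once in sparse matrix (CSR) form, where every neighbor access goes through an index array. The sparse view is correct but hides the regularity that the structured grid dwarf exploits.

```c
/* A sketch contrasting the two views of a 5-point Laplacian
 * (names and sizes are illustrative, not from the paper). */
#include <stdio.h>

#define N 6   /* assumed grid dimension: N x N points */

/* Structured-grid view: neighbors via regular index arithmetic, no indirection. */
void stencil(const double u[N][N], double v[N][N]) {
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            v[i][j] = 4.0 * u[i][j]
                    - u[i-1][j] - u[i+1][j]
                    - u[i][j-1] - u[i][j+1];
}

/* Sparse-matrix (CSR) view: the same operator, but every neighbor access
 * now goes through the col[] index array (one level of indirection). */
void spmv(int n, const int *rowptr, const int *col,
          const double *val, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i+1]; k++)
            s += val[k] * x[col[k]];
        y[i] = s;
    }
}

int main(void) { return 0; }   /* kernels shown for comparison only */
```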

To investigate the general applicability of the seven dwarfs, we compared the list against other collections of benchmarks: EEMBC from embedded computing, SPEC2006 from desktop and server computing, and an Intel study on future applications (see Section 3.5). All these collections were independent of our study, so they act as validation of whether our small set of computational kernels is a good target for applications of the future.


Figure 5 shows four more dwarfs that were added as a result. Note that we consider the algorithms independent of the data sizes and types (see Section 5.2). In addition, many larger programs will use several dwarfs in different parts of the program so they appear multiple times in the figure (see Section 3.4).
More dwarfs may need to be added to the list. Indeed, we are still investigating machine learning to see whether to expand the list. Nevertheless, we were surprised that we only needed to add four dwarfs to cover so many types of programs.
While 10 of the 11 dwarfs possess some form of parallelism, finite state machines (FSMs) look to be a challenge. Perhaps FSMs will prove to be embarrassingly sequential, just as Monte Carlo is embarrassingly parallel. If the dwarf remains important and does not yield to innovation in parallelism, that will be disappointing, but perhaps the right long-term solution is to change the programs. In the era of multicore and manycore, popular algorithms from the sequential computing era may fade in popularity. For example, if Huffman decoding proves to be embarrassingly sequential, perhaps we should use a different compression algorithm that is parallel.
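To see why an FSM resists parallelization, consider the minimal, hypothetical table-driven machine sketched below (the state set, transition table, and input are invented for illustration). Each iteration reads the state produced by the previous one, a loop-carried dependence with no obvious decomposition, in contrast to a Monte Carlo loop whose iterations are independent.

```c
/* A toy table-driven FSM (illustrative, not from the paper): step i cannot
 * begin until step i-1 has produced its state. */
#include <stdio.h>

enum { STATES = 3, SYMBOLS = 2 };

int main(void) {
    /* next[s][c]: successor of state s on input symbol c (assumed table) */
    static const int next[STATES][SYMBOLS] = {
        {1, 2},
        {0, 2},
        {2, 1}
    };
    const int input[] = {0, 1, 1, 0, 1, 0, 0, 1};
    int state = 0;
    for (size_t i = 0; i < sizeof input / sizeof input[0]; i++)
        state = next[state][input[i]];   /* serial chain: step i needs step i-1 */
    printf("final state: %d\n", state);
    return 0;
}
```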
In any case, the point of this section is not to identify the low-hanging fruit that is highly parallel. The point is to identify the kernels that form the core computation and communication of important applications in the upcoming decade, independent of the amount of parallelism. To develop programming systems and architectures that will run future applications as efficiently as possible, we need to learn the limitations as well as the opportunities.


8. Finite State Machine
   A finite state machine is a system whose behavior is defined by states, transitions defined by inputs and the current state, and events associated with transitions or states.

9. Graph Traversal (e.g., Quicksort, BLAST)
   Applies an ordering to a group of objects, or identifies specific items in a larger group. These applications typically involve many levels of indirection and a relatively small amount of computation.

10. Combinational Logic (e.g., encryption)
   Functions that are implemented with logical operations and stored state.

11. Filter (e.g., IIR or FIR)
   Filters generally have either an infinite or a finite impulse response. An infinite impulse response (IIR) filter has an impulse response that never decays to zero, and therefore requires analysis of all previous inputs to determine the output. The output of a finite impulse response (FIR) filter depends only on a fixed number of previous input values.

Figure 5. Extensions to the original seven dwarfs.
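As a concrete illustration of the filter dwarf, the sketch below computes the impulse response of a two-tap FIR filter and a one-pole IIR filter; the coefficients are made-up values for illustration, not drawn from the text above. The FIR output falls to zero once the impulse leaves its fixed input window, while the IIR output decays geometrically but never reaches zero.

```c
/* A minimal sketch of the filter dwarf (coefficients are illustrative):
 * FIR: y[i] = 0.5*x[i] + 0.5*x[i-1]   (finite window of past inputs)
 * IIR: y[i] = x[i] + 0.9*y[i-1]       (feedback on the filter's own past output) */
#include <stdio.h>

int main(void) {
    const double x[] = {1, 0, 0, 0, 0, 0, 0, 0};   /* unit impulse input */
    const int n = sizeof x / sizeof x[0];

    double x_prev = 0.0, iir_prev = 0.0;
    for (int i = 0; i < n; i++) {
        double fir = 0.5 * x[i] + 0.5 * x_prev;    /* depends on inputs only */
        double iir = x[i] + 0.9 * iir_prev;        /* depends on its own past */
        x_prev = x[i];
        iir_prev = iir;
        printf("i=%d  fir=%.3f  iir=%.4f\n", i, fir, iir);
    }
    return 0;
}
```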

