The Landscape of Seven Questions and Seven Dwarfs for Parallel Computing Research: a view from Berkeley




3.4 Composition of the 7+ Dwarfs


Any significant application, such as an MPEG-4 decoder or an IP forwarder, will contain multiple dwarfs, each of which accounts for a significant percentage of the application's computation. When deciding on a target architecture, designers should consider how well each dwarf suits that target. Just as important is how the dwarfs will be composed together on the platform, so designers should understand the implementation options available.
Analogous to the usage models of reconfigurable fabric [Schaumont et al. 2001], dwarfs can be composed on a multiprocessor platform in three different ways:

    1. Temporally distributed or time-shared on a common processor.

    2. Spatially distributed with each dwarf uniquely occupying one or more processors.

    3. Pipelined: a single dwarf is distributed in both space and time over a group of processors. In a given time slice, a dwarf computation is running on a group of processors. On a given processor, a group of dwarf computations run over time.
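To make the three models concrete, here is a minimal sketch under stated assumptions: two hypothetical dwarfs, `stencil` (a 3-point structured-grid update) and `reduce_sum` (a reduction), stand in for real kernels, and worker threads from Python's `concurrent.futures.ThreadPoolExecutor` stand in for processors. Model 3 follows one reading of the description above: in each time slice the whole group runs one dwarf on disjoint chunks of the data, then the same group switches to the next dwarf. Because of CPython's global interpreter lock, the threads illustrate the schedule rather than deliver real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two dwarfs (not from the paper): an unnormalized
# 3-point stencil (structured-grid style) and a reduction over its output.
def stencil(frame, lo=0, hi=None):
    """3-point stencil over indices [lo, hi), clamping at the frame edges."""
    n = len(frame)
    hi = n if hi is None else hi
    return [frame[max(i - 1, 0)] + frame[i] + frame[min(i + 1, n - 1)]
            for i in range(lo, hi)]

def reduce_sum(values):
    return sum(values)

def temporal(frames):
    """Model 1: both dwarfs time-share a single worker."""
    out = []
    with ThreadPoolExecutor(max_workers=1) as cpu:
        for f in frames:
            filtered = cpu.submit(stencil, f).result()
            out.append(cpu.submit(reduce_sum, filtered).result())
    return out

def spatial(frames):
    """Model 2: each dwarf owns its own worker; frames stream from one to the other."""
    with ThreadPoolExecutor(max_workers=1) as cpu_a, \
         ThreadPoolExecutor(max_workers=1) as cpu_b:
        stage_a = [cpu_a.submit(stencil, f) for f in frames]
        stage_b = [cpu_b.submit(reduce_sum, fut.result()) for fut in stage_a]
        return [fut.result() for fut in stage_b]

def space_time(frames, n_workers=4):
    """Model 3: each dwarf in turn is split across the whole group of workers."""
    out = []
    with ThreadPoolExecutor(max_workers=n_workers) as group:
        for f in frames:
            n = len(f)
            bounds = [(k * n // n_workers, (k + 1) * n // n_workers)
                      for k in range(n_workers)]
            # Time slice 1: every worker runs the stencil on its own chunk.
            chunks = group.map(lambda b: stencil(f, *b), bounds)
            filtered = [x for chunk in chunks for x in chunk]
            # Time slice 2: the same workers switch to the reduction.
            partials = group.map(lambda b: reduce_sum(filtered[b[0]:b[1]]), bounds)
            out.append(sum(partials))
    return out

frames = [[1, 2, 3, 4, 5], [0, 10, 0, 10]]
print(temporal(frames), spatial(frames), space_time(frames))
# prints [45, 60] [45, 60] [45, 60]
```

All three schedules compute the same results; they differ only in which worker runs which dwarf at which time, which is the choice a platform designer has to make.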

This naturally leads to two software issues:

  1. The choice of composition model: how the dwarfs are put together to form a complete application. The scientific software community has recently begun to move to component models [Bernholdt et al. 2002]. In these models, however, individual modules are only loosely coupled, which may reduce the efficiency of the final application.

  2. Data structure translation. Various algorithms may have their own preferred data structures (recursive data layouts for dense matrices, for example). This can conflict with efficient composition, because working sets may have to be translated before other dwarfs can use them.
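As an illustration of data structure translation, the sketch below converts a dense matrix from row-major order into Morton (Z-order), one common recursive layout in which every aligned power-of-two block is stored contiguously. Choosing Morton order, and the helper names `to_morton` and `from_morton`, are assumptions for illustration. Each translation is a full pass over the working set, which is the overhead a composition must absorb when two dwarfs disagree on layout.

```python
def morton_index(row, col, bits):
    """Interleave the bits of (row, col): column bits go to even positions, row bits to odd."""
    z = 0
    for b in range(bits):
        z |= ((col >> b) & 1) << (2 * b)
        z |= ((row >> b) & 1) << (2 * b + 1)
    return z

def to_morton(matrix):
    """Translate a square, power-of-two, row-major matrix (list of rows) to a flat Z-order list."""
    n = len(matrix)
    if n == 0 or (n & (n - 1)) != 0 or any(len(r) != n for r in matrix):
        raise ValueError("expected a non-empty square matrix with power-of-two size")
    bits = n.bit_length() - 1
    flat = [None] * (n * n)
    for r in range(n):
        for c in range(n):
            flat[morton_index(r, c, bits)] = matrix[r][c]
    return flat

def from_morton(flat, n):
    """Inverse translation: flat Z-order list back to an n x n row-major matrix."""
    bits = n.bit_length() - 1
    return [[flat[morton_index(r, c, bits)] for c in range(n)] for r in range(n)]

m = [[4 * r + c for c in range(4)] for r in range(4)]
print(to_morton(m))
# prints [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

The first four entries are the top-left 2x2 block, the next four the top-right block, and so on, which is what makes recursive blocked algorithms cache friendly and row-major consumers pay a translation cost.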

3.5 Intel Study


Intel believes that the increase in demand for computing will come from processing the massive amounts of information that will be available in the “Era of Tera” [Dubey 2005]. Intel classifies the computation into three fundamental types: recognition, mining, and synthesis, abbreviated as RMS. Recognition is a form of machine learning in which computers examine data and construct mathematical models of that data. Once the computers have constructed the models, mining searches the web to find instances of those models. Synthesis refers to the creation of new models, as in graphics. The common computing theme of RMS is “multimodal recognition and synthesis over large and complex data sets” [Dubey 2005]. Intel believes RMS will find important applications in medicine, investment, business, gaming, and the home. Intel’s efforts, summarized in Figure 6, show that Berkeley is not alone in trying to map the new frontier of computation onto underlying computational kernels in order to better guide architectural research.


Figure 6. Intel’s RMS and how it maps down to more primitive functions. Of the five categories at the top of the figure, Computer Vision is classified as Recognition, Data Mining is Mining, and Rendering, Physical Simulation, and Financial Analytics are Synthesis. [Chen 2006]

3.6 Dwarfs Summary


Figure 7 shows the presence of the 7+ dwarfs in a diverse set of application benchmarks including EEMBC, SPEC2006, and RMS. As mentioned above, several of the programs use multiple dwarfs, and so they are listed in multiple categories.



Each dwarf below is followed by its entries in the EEMBC Kernels, SPEC2006, RMS, and Machine Learning columns; columns with no entry for that dwarf are omitted.

1. Structured Grids
    EEMBC Kernels: Automotive: FIR, IIR; Consumer: HP Gray-Scale, JPEG; Digital Entertainment: MP3 Decode, MPEG-2 Decode, MPEG-2 Encode, MPEG-4 Decode, MPEG-4 Encode; Office Automation: Dithering; Telecom: Autocorrelation
    SPEC2006: Fl. Pt.: Quantum chromodynamics (milc), magnetohydrodynamics (zeusmp), general relativity (cactusADM), fluid dynamics (leslie3d-AMR; lbm), finite element methods (dealII-AMR; calculix), Maxwell's E&M equations solver (GemsFDTD), quantum crystallography (tonto), weather modeling (wrf2-AMR)
    RMS: PDE: CFD, PDE: Cloth

2. Unstructured Grids
    RMS: PDE: Face

3. Spectral Methods
    EEMBC Kernels: Automotive: FFT, iFFT, iDCT; Consumer: JPEG; Digital Entertainment: MP3 Decode
    RMS: NLP, Media Synthesis, Body Tracking

4. Dense Linear Algebra
    EEMBC Kernels: Automotive: iDCT, FIR, Matrix Arith; Consumer: JPEG, RGB to CMYK, RGB to YIQ; Digital Entertainment: RSA, MP3 Decode, MPEG-2 Decode, MPEG-2 Encode, MPEG-4 Decode, MPEG-4 Encode; Networking: IP Packet; Office Automation: Image Rotation; Telecom: Convolution Encode
    SPEC2006: Integer: Quantum computer simulation (libquantum), video compression (h264avc); Fl. Pt.: Hidden Markov models (sphinx3)
    RMS: Linear prog., K-means, SVM, QP, PDE: Face, PDE: Cloth*
    Machine Learning: SVM, PCA, ICA

5. Sparse Linear Algebra
    EEMBC Kernels: Automotive: Basic Int + FP, Bit Manip, CAN Remote Data, Table Lookup, Tooth to Spark; Telecom: Bit Allocation
    SPEC2006: Fl. Pt.: Fluid dynamics (bwaves), quantum chemistry (gamess; tonto), linear program solver (soplex)
    RMS: SVM, QP, PDE: Face, PDE: Cloth*, PDE: CFD
    Machine Learning: SVM, PCA, ICA

6. Particle Methods
    SPEC2006: Fl. Pt.: Molecular dynamics (gromacs, 32-bit; namd, 64-bit)
    RMS: Particle Filtering, Body Tracking

7. Monte Carlo
    SPEC2006: Fl. Pt.: Ray tracer (povray)
    RMS: Particle Filtering, Option Pricing

8. Finite State Machine
    EEMBC Kernels: Automotive: Angle To Time, Cache "Buster", CAN Remote Data, PWM, Road Speed, Tooth to Spark; Consumer: JPEG; Digital Entertainment: Huffman Decode, MP3 Decode, MPEG-2 Decode, MPEG-2 Encode, MPEG-4 Decode, MPEG-4 Encode; Networking: QoS, TCP; Office Automation: Text Processing; Telecom: Bit Allocation
    SPEC2006: Integer: Text processing (perlbench), compression (bzip2), compiler (gcc), hidden Markov models (hmmer), video compression (h264avc), network discrete event simulation (omnetpp), 2D path finding library (astar), XML transformation (xalancbmk)
    RMS: NLP

9. Graph Traversal
    EEMBC Kernels: Automotive: Pointer Chasing, Tooth to Spark; Networking: IP NAT, OSPF, Route Lookup; Office Automation: Text Processing; Telecom: Viterbi Decode
    SPEC2006: Integer: go (gobmk), chess (sjeng), network simplex algorithm (mcf)
    RMS: Global Illumination
    Machine Learning: Hidden Markov Models, Bayesian Networks

10. Combinational Logic
    EEMBC Kernels: Digital Entertainment: AES, DES; Networking: IP Packet, IP NAT, Route Lookup; Office Automation: Image Rotation; Telecom: Convolution Encode, Viterbi Decode

11. Filter
    EEMBC Kernels: Automotive: FIR, IIR; Digital Entertainment: MP3 Decode, MPEG-2 Decode, MPEG-4 Decode
    RMS: Body Tracking, Media Synthesis

Figure 7. Mapping of EEMBC, SPEC, and RMS to 7+ dwarfs. *Note that SVM, QP, PDE:Face, and PDE:Cloth may use either dense or sparse matrices, depending on the application.

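Because one program can appear under several dwarfs, it can help to invert the figure into a program-to-dwarfs index. The sketch below does this for a hand-copied excerpt of the EEMBC column (four dwarfs, a few kernels each, taken from Figure 7); the dictionary and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical excerpt of Figure 7 (EEMBC column only, a subset of each row).
FIGURE7_EEMBC_EXCERPT = {
    "Structured Grids": ["FIR", "IIR", "JPEG", "MP3 Decode", "MPEG-2 Decode"],
    "Spectral Methods": ["FFT", "iFFT", "iDCT", "JPEG", "MP3 Decode"],
    "Dense Linear Algebra": ["iDCT", "FIR", "JPEG", "MP3 Decode", "MPEG-2 Decode"],
    "Finite State Machine": ["JPEG", "Huffman Decode", "MP3 Decode", "MPEG-2 Decode"],
}

def dwarfs_per_program(table):
    """Invert a dwarf -> programs mapping into program -> sorted list of dwarfs."""
    index = defaultdict(set)
    for dwarf, programs in table.items():
        for program in programs:
            index[program].add(dwarf)
    return {program: sorted(dwarfs) for program, dwarfs in index.items()}

index = dwarfs_per_program(FIGURE7_EEMBC_EXCERPT)
# Programs that would be listed under more than one dwarf.
multi = {p: d for p, d in index.items() if len(d) > 1}
print(multi["JPEG"])
# prints ['Dense Linear Algebra', 'Finite State Machine', 'Spectral Methods', 'Structured Grids']
```

Kernels such as JPEG and MP3 Decode touch every dwarf in the excerpt, which is why composing dwarfs efficiently matters as much as accelerating any single one.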




