CS267: Applications of Parallel Computers

Assignment 0

Due 9/5/2002

Build a webpage describing an application of parallel computing that interests you, and survey papers or projects that explore this application. The hope is to give you ideas for your final project and to uncover some of the challenges of parallel computing.

 

Mike Barad

Parallel Computing in Environmental Fluid Dynamics

1. Describe the application and provide references / links.

My field of interest is the numerical study of environmental transport in fluid systems. My case study is the numerical simulation of San Francisco Bay, which requires innovative high-performance computational techniques. While coarse meshes can be used for code development, fine meshes and adaptive mesh refinement (AMR) are required to simultaneously capture scales ranging from hundreds of kilometers down to centimeters (e.g. from the entire San Francisco Bay down to the dynamic, wavy air/water interface). Accurately capturing wind-induced waves and wave-current interaction is critical to simulating the complex water motions in the Bay. In addition to fine meshes and/or AMR, embedded boundaries and high-order accurate interface tracking methods are essential to this work. Parallel high-performance computing allows for high spatial and temporal resolution in areas of critical transport complexity (boundaries, fronts, etc.).

Instead of the hydrostatic (or shallow water) approximation typically found in estuarine models, my methodology is based on solving the fully non-hydrostatic, three-dimensional, variable-density Navier-Stokes equations. Including the non-hydrostatic terms is key to understanding complex, fully three-dimensional water motions such as flow over rapidly varying slopes, short waves, internal wave dynamics and baroclinic convergence zones. Simulations at this resolution would not be practical using serial computations.
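To give a feel for why a uniform fine mesh is out of reach and AMR is attractive, here is a back-of-the-envelope sketch. The specific numbers are assumptions chosen only to match the "hundreds of kilometers down to centimeters" range mentioned above (a 100 km horizontal extent and 1 cm cells); they are not measurements of the Bay or parameters of any actual model.

```python
# Rough cell count for a uniform mesh that resolves centimeter scales
# over a domain 100 km across. Both values are illustrative assumptions.
DOMAIN_EXTENT_CM = 100 * 1000 * 100   # assumed 100 km, expressed in cm
FINEST_CELL_CM = 1                    # assumed 1 cm finest cell size

cells_per_dimension = DOMAIN_EXTENT_CM // FINEST_CELL_CM
cells_horizontal_plane = cells_per_dimension ** 2  # one horizontal layer only

print(f"cells per horizontal dimension: {cells_per_dimension:.0e}")
print(f"cells in one horizontal layer:  {cells_horizontal_plane:.0e}")
```

Even before adding the vertical dimension, a uniform mesh at the finest scale would need an impractical number of cells, which is the motivation for refining only where the flow demands it.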

Individually, the research areas of 1) domain decomposition and AMR, 2) incompressible Navier-Stokes and 3) environmental modeling are relatively mature. Yet the combined approach of using AMR to solve the incompressible Navier-Stokes equations for environmental flows on parallel computers is still uncharted terrain. The following models are currently the best parallel oceanography codes.

None of these codes solves the incompressible Navier-Stokes equations with AMR on embedded boundaries to second-order accuracy. Most of them use low-accuracy boundary treatments instead of high-accuracy embedded boundaries, and they typically implement a hydrostatic approximation instead of solving the full Navier-Stokes equations in 3D. More relevant to this class, they typically use 1-D or 2-D domain decomposition without AMR.

For the following questions I will focus on the POP code at NERSC.

2. Describe the platform where the application was run.

The POP code was run on the IBM SP RS/6000 at NERSC (Seaborg). Seaborg is a distributed-memory machine with 2,944 compute processors. According to NERSC, each processor has a peak performance of 1.5 GFlops.

3. Find peak and LINPACK performance for the platform and its rank on the TOP500 list.

The TOP500 list shows that Seaborg's current peak performance is 4,992 GFlops, while the LINPACK performance is 3,052 GFlops.
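As a rough consistency check, the per-processor peak quoted by NERSC (1.5 GFlops) can be compared against the TOP500 aggregate peak. The sketch below uses only figures quoted in this write-up; the variable names are illustrative.

```python
# Figures as quoted in this write-up (GFlops).
PEAK_PER_PROCESSOR_GFLOPS = 1.5   # per NERSC
COMPUTE_PROCESSORS = 2944         # compute processors, per NERSC
TOP500_PEAK_GFLOPS = 4992.0       # aggregate peak, per the TOP500 list

# Aggregate peak if only the compute processors are counted.
peak_from_compute_procs = COMPUTE_PROCESSORS * PEAK_PER_PROCESSOR_GFLOPS

# Processor count implied by the TOP500 aggregate peak.
procs_implied_by_top500 = TOP500_PEAK_GFLOPS / PEAK_PER_PROCESSOR_GFLOPS

print(f"peak from compute processors: {peak_from_compute_procs:.0f} GFlops")
print(f"processors implied by TOP500: {procs_implied_by_top500:.0f}")
```

The two figures do not match exactly, which suggests the TOP500 entry counts more processors than the 2,944 compute processors. That interpretation is my assumption; neither source states it.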

4. Find performance of your selected application.

Absolute performance (e.g. sustained GFlops) for the POP code is not listed on the website, but the NERSC POP website does show speedup information. With a domain size of 680x320x40 the code has a speedup of 94 on 256 processors, or approximately 37% efficiency. For a domain size of 384x288x32 the code has a speedup of ~50 on 128 processors, or approximately 39% efficiency. For a smaller domain of 192x128x20 the code only achieves a speedup of ~6.5 on 128 processors, or approximately 5% efficiency. The poor speedup for the small domain is most likely because each processor holds so few grid points that communication takes up a much larger share of the run time than computation.
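The efficiency figures above follow from dividing the reported speedup by the processor count. A minimal sketch of that arithmetic, using only the speedups quoted above (the function name `parallel_efficiency` is illustrative, and the ~50 and ~6.5 values are approximate readings):

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency = speedup / number of processors."""
    return speedup / processors

# (domain size, reported speedup, processors), as quoted in the text.
runs = [
    ("680x320x40", 94.0, 256),
    ("384x288x32", 50.0, 128),   # speedup is approximate (~50)
    ("192x128x20", 6.5, 128),    # speedup is approximate (~6.5)
]

efficiencies = {
    domain: parallel_efficiency(speedup, procs)
    for domain, speedup, procs in runs
}

for domain, eff in efficiencies.items():
    print(f"{domain}: {eff:.0%} efficiency")
```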

5. What ratio of sustained to peak performance is reported?

Since the sustained-to-peak ratio is not reported for the POP code, I will use LINPACK for comparison. Given a LINPACK score of 3,052 GFlops and a peak performance of 4,992 GFlops, the efficiency is 61%. It is highly unlikely that the POP code performs as well as LINPACK.
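For reference, the 61% figure is simply the LINPACK result divided by the theoretical peak. A small sketch using the TOP500 numbers quoted above (the helper name `sustained_to_peak` is illustrative):

```python
# Seaborg figures as quoted from the TOP500 list (GFlops).
LINPACK_GFLOPS = 3052.0
PEAK_GFLOPS = 4992.0

def sustained_to_peak(sustained_gflops, peak_gflops):
    """Fraction of theoretical peak actually sustained by a benchmark."""
    return sustained_gflops / peak_gflops

linpack_ratio = sustained_to_peak(LINPACK_GFLOPS, PEAK_GFLOPS)
print(f"LINPACK / peak = {linpack_ratio:.0%}")
```

The same function would apply to POP if a sustained GFlops figure were available for it; no such figure is reported, so none is assumed here.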

6. Evaluate the project/papers: How did the application scale? What were the major difficulties in obtaining good performance? What tools and algorithms were used?

The POP code scales to 64 processors for a 192x128x20 domain, and to at least 128 processors for a 384x288x32 domain. For a large domain of 680x320x40, the POP code scaled to 256 processors. Tests were also done using two types of domain decomposition, 1-D and 2-D. The 2-D domain decomposition scaled better than the 1-D decomposition for a 192x128x20 domain. A major difficulty in obtaining good performance is the classic communication problem. I assume that the POP code must solve an elliptic PDE at each timestep (as is the case for most incompressible CFD codes). The numerical solution of elliptic PDEs on parallel platforms is a communication-intensive task.
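One plausible reason the 2-D decomposition scaled better is its smaller surface-to-volume ratio: each processor exchanges fewer boundary (halo) cells relative to the cells it owns. The sketch below compares the two for the 192x128 horizontal grid on 64 processors. It assumes a one-cell-wide halo, decomposition in the horizontal plane only, an interior subdomain with neighbors on every cut side, and an 8x8 processor grid for the 2-D case. These are illustrative assumptions, not details taken from the POP code.

```python
def subdomain_1d(nx, ny, procs):
    """Cut the nx-by-ny grid into strips along x; return (owned, halo) cells."""
    strip_width = nx // procs
    owned = strip_width * ny
    halo = 2 * ny                      # two neighbors, each sharing an edge of length ny
    return owned, halo

def subdomain_2d(nx, ny, procs_x, procs_y):
    """Cut the nx-by-ny grid into blocks; return (owned, halo) cells."""
    block_x, block_y = nx // procs_x, ny // procs_y
    owned = block_x * block_y
    halo = 2 * block_x + 2 * block_y   # four neighbors, one per block edge
    return owned, halo

NX, NY = 192, 128                      # horizontal size of the small test domain

owned_1d, halo_1d = subdomain_1d(NX, NY, 64)
owned_2d, halo_2d = subdomain_2d(NX, NY, 8, 8)   # assumed 8x8 processor grid

print(f"1-D: {owned_1d} owned cells, {halo_1d} halo cells per layer")
print(f"2-D: {owned_2d} owned cells, {halo_2d} halo cells per layer")
```

Under these assumptions both decompositions give each processor the same amount of work per layer, but the 1-D strips exchange roughly three times as many halo cells, which is consistent with the 2-D decomposition scaling better.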

 

Other Interesting Links:

Parallelization of Structured, Hierarchical Adaptive Mesh Refinement Algorithms

Incompressible Navier-Stokes with Parallel AMR

Coastal Ocean Modeling of the U.S. West Coast with Multiblock Grid and Dual-Level Parallelism

Parallel Ocean Model Development at NERSC (also)

Here's a link to the CS267 class webpage

Here's a link to the rest of my web site