## CoHMM Proxy Application Multi-scale proxy app to test and evaluate runtime systems.

An anticipated important use-case for next-generation supercomputing
is *multiscale* modeling, in which continuum equations for
large-scale material deformation are augmented with high-fidelity,
fine-scale simulations that provide constitutive data on demand. Such
multiscale modeling is the aim of heterogeneous multiscale (HMM),
Equation Free, and related methods
[Theodoropoulos (2000), Kevrekidis (2009), Brandt (2002), E (2003), E (2007)]. The image to the left shows HMM simulations of large-scale shock propagation dynamics in two-dimensions. The initial conditions model correspond to an impact at the center of a copper plate. The large localized stress quickly propagates throughout the whole the plate (green surface). The material deformation is visualized in the bottom plane.

**a**) Following a 10% tensile strain, the subsequent shock dynamics unfolds on macroscopic space and time scales. Red dots denote representative

*microscopic*regions of the system, each of which is modeled by $\sim \! 10^3$ coppers atoms in a defect-free fcc crystal. (

**b**) The heterogeneous multiscale method (HMM) integrates conservation laws to determine the dynamical evolution of the macroscopic fields, such as strain. Constitutive data is provided by molecular dynamics simulations. These fine-scale simulations constitute the overwhelming computational cost. To reduce the number of fine-scale simulations, traditional adaptive sampling may be applied: all fine-scale simulations are stored in a global database, and subsequent calculations can be replaced by database look-ups [Knap (2008), Barton (2008)]. We have implemented an adaptive sampling application based upon HMM, which uses CoMD for fine-scale simulations, and kriging to interpolate database entries [Roehm13]. This traditional adaptive sampling approach will encounter difficulties at the Exascale: it is impractical to globally synchronize a database across $10^9$ threads while retaining millisecond latencies. For this reason, we have developed an alternate

*spatial*approach to adaptive sampling, shown in panel (

**c**). We dynamically adapt the location of microscopic simulations to the regions of interest, and reconstruct the constitutive data by interpolating in real-space. Consequently, we avoid the requirement of a global database, and achieve near perfect parallelism [Rouet13].

#### The heterogeneous multiscale method (HMM)

For simplicity, we base our multiscale proxy apps upon HMM models of one and two dimensional elastodynamics. Beginning with fine-scale molecular dynamics (e.g. a CoMD simulation of a copper crystal) one derives a ``coarse grained'' representation consisting of conservation laws for mass, momentum, and energy densities [Li (2005)],

\begin{align} \partial_{t}\rho+\nabla_{\mathbf{x}}\cdot\mathbf{q} & =0\\ \partial_{t}\mathbf{q}+\nabla_{\mathbf{x}}\cdot\tau & =0\\ \partial_{t}e+\nabla_{\mathbf{x}}\cdot\mathbf{j} & =0 \end{align}These conservation laws, by themselves, are incomplete; we require
constitutive data in the form of momentum and energy flux fields
($\mathbf{\tau}$ and $\mathbf{j}$ respectively) as a function of the
macro-scale state. It is through this closure that the micro- and
macroscales are coupled. At each macro-scale integration step, HMM must
spawn CoMD simulations at a large number of representative microscopic
sample points (red dots in Fig. 2**b**). Each CoMD simulation
contains $\sim 10^3$ copper atoms arranged in a perfect face-centered
cubic (fcc) crystal. The macroscopic configuration provides a local
strain, which we apply using periodic boundary conditions that stretch
the fcc lattice. A short molecular dynamics simulation ($\sim
10^3$-$10^4$ time steps) is sufficient to estimate the required fluxes.
Once this fine-scale data is available, the conservation equations can
be integrated over a macro-scale time step. Each CoMD calculation can
be performed in $\sim 10^1$ seconds on a single node and, naively,
$\sim 10^4$ CoMD simulations are required per HMM time step. Because
CoMD simulation is the dominant cost of HMM, we have investigated two
adaptive sampling approaches to improve efficiency.

#### Traditional adaptive sampling with a kriging database

Many of the CoMD simulations spawned by CoHMM are similar and hence the number of computations can be minimize. We do this on 4 different levels, which can be seen in the animating figure below:

- CoMD Point (blue): Task mapping to remove CoMD tasks with identical input data
- CoMD database (turquoise) to reuse points, which have been computed at earlier times
- Kriging (red) to interpolate between point, which have similar input parameters, combined with task mapping (Kriging Point - orange) to remove duplicated Kriging calculations. Failed Kriging attempts will directly execute CoMD (yellow).
- Kriging database (green) to reuse previous calculations

Task mapping algorithm assigns a hash to every CoMD task as well as the Kriging task and checks for collisions to avoid identical executions. The map is then used to start both CoMD and Kriging tasks in parallel using either OpenMP or CNC.

For the database approach, we chose to use one of the most popular NoSQL databases to take advantage of the key-value store approach. More specifically, we chose the Redis database due to its demonstratively high performance and the ability to use its set data structures to create buckets of nearest neighbors. Besides the fact that Redis is currently the most popular key-value store, it provides the use of distributed nodes, which will be important for the large amount of connections on distributed systems (work in progress).

Kriging is a technique borrowed from geo-science. The use of the word “Kriging” in (geo-)statistics stands for “optimally predicting” or “optimal prediction”. The technique is also known as Gaussian process regression, Kolmogorov Wiener prediction, or Best Linear Unbiased Prediction (BLUP). The power of this method lies in its ability to tread not only scalars but also vectors located in a high dimensional space to predict the value of a vector located at a certain position in the high dimensional space by computing a weighted average of the known vectors in the neighborhood of the point. Using matrix computations it predicts not only the value but computes an error of the estimation at the same time, too. Originate in the spacial prediction on a two dimensional surface in geophysics it has recently become a useful tool for adaptive sampling.

#### Spatial adaptive sampling

To address anticipated challenges at the Exascale, we developed an
alternative, *spatial* approach to adaptive sampling [Rouet13].
This algorithm forms the basis of our CoHMM 2.0 proxy app. Spatial
adaptive sampling enables reduction of fine-scale
simulations *without* the need for a database. Rather, at each
macro-scale time step, we reconstruct the required constitutive data by
interpolating fine-scale simulations on the $d \leq 3$-dimensional
simulation domain. We use a predictive approach to determine the
important sample points at the beginning of each macro-scale time step
. Thus, we achieve adaptive sampling
(i.e., reduced fine-scale simulation) while retaining near perfect
parallelism associated with minimal synchronization requirements.

For simplicity, our CoHMM 2.0 proxy app implements spatial adaptive sampling in the context of one-dimensional elastodynamic shock propagation. To the right, we show the speed-ups gained by spatial adaptive sampling, as a function of continuum resolution (the desired accuracy at macro-scale). The dashed curve represents the theoretical speed-up, which we measure as the fraction of fine-scale calls (e.g. spawned CoMD tasks) that are eliminated by using spatial adaptive sampling in place of brute-force HMM. The solid curve measures the actual wall-clock speed-up, which we measure as fraction of computer cycles saved. At the highest macro-scale accuracies (i.e. continuum resolutions of $\sim 10^5$), we observe a wall-clock speed-up factor of $\sim 10^3$, which is about 2/3 the theoretical expected value. Of the remaining 1/3 computational cost, the majority is associated with Akima spline interpolation of the sample points over the continuum mesh.

Our spatial adaptive sampling approach is particularly simple, and independent of the details of the fine-scale model. Thus, our CoHMM 2.0 proxy-app readily generalizes to more realistic materials models, and is representative of the next-generation computational demands at the Exascale.

#### CoHMM Publications

##### Papers

**
B. Rouet-Leduc, K. Barros, E. Cieren, V. Elango, C. Junghans, T. Lookman,
J. Mohd-Yusof, R. S. Pavel, A. Y. Rivera, D. Roehm, A. L. McPherson and T. C. Germann
**,
*“
Spatial adaptive sampling in multiscale simulation
”*
Comp. Phys. Comm. 185, 1857 (2014).