## CoHMM Proxy Application

Multi-scale proxy app to test and evaluate runtime systems.

An important anticipated use case for next-generation supercomputing is multiscale modeling, in which continuum equations for large-scale material deformation are augmented with high-fidelity, fine-scale simulations that provide constitutive data on demand. Such multiscale modeling is the aim of heterogeneous multiscale (HMM), Equation Free, and related methods [Theodoropoulos (2000), Kevrekidis (2009), Brandt (2002), E (2003), E (2007)]. The image to the left shows HMM simulations of large-scale shock-propagation dynamics in two dimensions. The initial conditions correspond to an impact at the center of a copper plate. The large localized stress quickly propagates throughout the whole plate (green surface). The material deformation is visualized in the bottom plane.

Above we illustrate our strategy for multiscale modeling of elastodynamics, for simplicity, in the context of one-dimensional shock propagation. (a) Following a 10% tensile strain, the subsequent shock dynamics unfolds on macroscopic space and time scales. Red dots denote representative microscopic regions of the system, each of which is modeled by $\sim \! 10^3$ copper atoms in a defect-free fcc crystal. (b) The heterogeneous multiscale method (HMM) integrates conservation laws to determine the dynamical evolution of the macroscopic fields, such as strain. Constitutive data is provided by molecular dynamics simulations. These fine-scale simulations constitute the overwhelming computational cost. To reduce the number of fine-scale simulations, traditional adaptive sampling may be applied: all fine-scale simulations are stored in a global database, and subsequent calculations can be replaced by database look-ups [Knap (2008), Barton (2008)]. We have implemented an adaptive sampling application based upon HMM, which uses CoMD for fine-scale simulations and kriging to interpolate database entries [Roehm13]. This traditional adaptive sampling approach will encounter difficulties at the Exascale: it is impractical to globally synchronize a database across $10^9$ threads while retaining millisecond latencies. For this reason, we have developed an alternate, spatial approach to adaptive sampling, shown in panel (c). We dynamically adapt the locations of microscopic simulations to the regions of interest, and reconstruct the constitutive data by interpolating in real space. Consequently, we avoid the requirement of a global database and achieve near-perfect parallelism [Rouet13].

#### The heterogeneous multiscale method (HMM)

For simplicity, we base our multiscale proxy apps upon HMM models of one- and two-dimensional elastodynamics. Beginning with fine-scale molecular dynamics (e.g. a CoMD simulation of a copper crystal), one derives a "coarse-grained" representation consisting of conservation laws for mass, momentum, and energy densities [Li (2005)],

\begin{align} \partial_{t}\rho+\nabla_{\mathbf{x}}\cdot\mathbf{q} & =0\\ \partial_{t}\mathbf{q}+\nabla_{\mathbf{x}}\cdot\tau & =0\\ \partial_{t}e+\nabla_{\mathbf{x}}\cdot\mathbf{j} & =0 \end{align}

These conservation laws, by themselves, are incomplete; we require constitutive data in the form of momentum and energy flux fields ($\mathbf{\tau}$ and $\mathbf{j}$ respectively) as a function of the macro-scale state. It is through this closure that the micro- and macroscales are coupled. At each macro-scale integration step, HMM must spawn CoMD simulations at a large number of representative microscopic sample points (red dots in Fig. 2b). Each CoMD simulation contains $\sim 10^3$ copper atoms arranged in a perfect face-centered cubic (fcc) crystal. The macroscopic configuration provides a local strain, which we apply using periodic boundary conditions that stretch the fcc lattice. A short molecular dynamics simulation ($\sim 10^3$-$10^4$ time steps) is sufficient to estimate the required fluxes. Once this fine-scale data is available, the conservation equations can be integrated over a macro-scale time step. Each CoMD calculation can be performed in $\sim 10^1$ seconds on a single node and, naively, $\sim 10^4$ CoMD simulations are required per HMM time step. Because CoMD simulation is the dominant cost of HMM, we have investigated two adaptive sampling approaches to improve efficiency.
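The macro-scale update described above can be sketched in a few lines. The following is a minimal, illustrative Python fragment (not taken from the CoHMM source): `flux_fn` is a hypothetical stand-in for the fine-scale CoMD flux evaluation, and a simple forward-Euler step with central differences and periodic boundaries stands in for whatever integrator the real app uses.

```python
import numpy as np

def macro_step(rho, q, e, flux_fn, dx, dt):
    """One forward-Euler macro-scale step of the 1D conservation laws.

    flux_fn stands in for the fine-scale (CoMD) flux evaluation: given
    the local state, it returns the momentum flux tau and energy flux j.
    """
    tau, j = flux_fn(rho, q, e)
    # Central differences with periodic boundaries for the divergence terms.
    div = lambda f: (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)
    rho_new = rho - dt * div(q)    # d(rho)/dt + d(q)/dx   = 0
    q_new   = q   - dt * div(tau)  # d(q)/dt   + d(tau)/dx = 0
    e_new   = e   - dt * div(j)    # d(e)/dt   + d(j)/dx   = 0
    return rho_new, q_new, e_new
```

In the real application each call to `flux_fn` would spawn $\sim 10^4$ CoMD tasks, one per mesh point; that call is exactly where adaptive sampling intervenes.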

Many of the CoMD simulations spawned by CoHMM are similar, and hence the number of computations can be minimized. We do this at four different levels, which can be seen in the animated figure below:

• CoMD point (blue): task mapping removes CoMD tasks with identical input data
• CoMD database (turquoise): reuses points that were computed at earlier times
• Kriging (red): interpolates between points with similar input parameters, combined with task mapping (kriging point, orange) to remove duplicated kriging calculations; failed kriging attempts fall back to executing CoMD directly (yellow)
• Kriging database (green): reuses previous kriging calculations

The task-mapping algorithm assigns a hash to every CoMD task and every kriging task, and checks for collisions to avoid identical executions. The resulting map is then used to launch both CoMD and kriging tasks in parallel, using either OpenMP or CNC.
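The idea of hashing task inputs to deduplicate executions can be sketched as follows. This is an illustrative Python fragment, not the CoHMM implementation; the function name `dedupe_tasks` and the rounding tolerance are assumptions, and rounding is one plausible way to decide when two nearly identical strain inputs count as "identical input data".

```python
import hashlib
from collections import defaultdict

def dedupe_tasks(tasks, digits=6):
    """Assign each task a hash of its (rounded) input parameters.

    Tasks whose hashes collide share a single execution; the returned
    map records which original task indices each unique execution serves.
    """
    unique = {}                  # hash -> representative input parameters
    owners = defaultdict(list)   # hash -> indices of tasks it serves
    for i, params in enumerate(tasks):
        rounded = tuple(round(p, digits) for p in params)
        key = hashlib.sha1(repr(rounded).encode()).hexdigest()
        unique.setdefault(key, params)
        owners[key].append(i)
    return unique, dict(owners)
```

The entries of `unique` would then be dispatched in parallel (via OpenMP or CNC in the real app), and each result fanned back out to the task indices listed in `owners`.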

For the database approach, we chose one of the most popular NoSQL databases in order to take advantage of the key-value store model. Specifically, we chose Redis due to its demonstrably high performance and the ability to use its set data structures to create buckets of nearest neighbors. Besides being currently the most popular key-value store, Redis supports distributed deployments, which will be important for handling the large number of connections on distributed systems (work in progress).
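The bucketed nearest-neighbor lookup can be illustrated with a small in-memory sketch. This is not the CoHMM code: a plain Python dict stands in for the Redis client (in the real app, `put` and `neighbors` would be Redis `SET`/`SADD`/`SMEMBERS` calls), and the class and method names are invented for illustration.

```python
class FluxCache:
    """In-memory stand-in for the Redis key-value store.

    Strains are binned into fixed-width buckets; a neighbor query scans
    the query's bucket and its two adjacent buckets, mirroring the use
    of Redis sets as buckets of nearest neighbors.
    """
    def __init__(self, bucket_width=0.01):
        self.kv = {}        # exact key -> stored flux result
        self.buckets = {}   # bucket id -> set of keys in that bucket
        self.w = bucket_width

    def _bucket(self, strain):
        return int(strain / self.w)

    def put(self, strain, flux):
        key = repr(strain)
        self.kv[key] = flux
        self.buckets.setdefault(self._bucket(strain), set()).add(key)

    def neighbors(self, strain):
        """Flux results stored in the same or adjacent buckets
        (candidate inputs for a kriging interpolation)."""
        b = self._bucket(strain)
        found = set()
        for bb in (b - 1, b, b + 1):
            found |= self.buckets.get(bb, set())
        return [self.kv[k] for k in found]
```

A kriging step would first call `neighbors`; only if too few (or too distant) entries come back does a fresh CoMD task need to run.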

Kriging is a technique borrowed from the geosciences. In (geo-)statistics, "kriging" denotes optimal prediction; the technique is also known as Gaussian-process regression, Kolmogorov–Wiener prediction, or best linear unbiased prediction (BLUP). Its power lies in its ability to treat not only scalars but also vectors in a high-dimensional space: the value at a given position is predicted as a weighted average of the known values in the neighborhood of that point. Using matrix computations, it yields not only the prediction but also an error estimate at the same time. Although it originated in spatial prediction on two-dimensional surfaces in geophysics, it has recently become a useful tool for adaptive sampling.
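The prediction-plus-error-estimate property can be shown in a minimal sketch of simple kriging with a squared-exponential covariance. This is a textbook illustration under assumed hyperparameters (`length`, `noise`), not the covariance model used by CoHMM.

```python
import numpy as np

def krige(X, Y, x_star, length=1.0, noise=1e-10):
    """Simple kriging (Gaussian-process regression) in one dimension.

    Returns the predicted value at x_star together with its estimation
    variance, using a squared-exponential covariance of unit prior
    variance and correlation length `length`.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    def cov(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / length) ** 2)

    K = cov(X, X) + noise * np.eye(len(X))      # covariance of the data
    k_star = cov(X, np.atleast_1d(float(x_star))).ravel()
    w = np.linalg.solve(K, k_star)              # kriging weights
    mean = w @ Y                                # weighted average of known values
    var = 1.0 - k_star @ w                      # prior variance is 1 here
    return mean, var
```

At a point already in the database the variance is essentially zero; far from all samples it approaches the prior variance, which is exactly the signal the adaptive-sampling logic uses to decide whether a fresh CoMD run is needed.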

To address anticipated challenges at the Exascale, we developed an alternative, spatial approach to adaptive sampling [Rouet13]. This algorithm forms the basis of our CoHMM 2.0 proxy app. Spatial adaptive sampling reduces the number of fine-scale simulations without the need for a database. Rather, at each macro-scale time step, we reconstruct the required constitutive data by interpolating fine-scale simulations on the $d \leq 3$-dimensional simulation domain. We use a predictive approach to determine the important sample points at the beginning of each macro-scale time step. Thus, we achieve adaptive sampling (i.e., reduced fine-scale simulation) while retaining the near-perfect parallelism associated with minimal synchronization requirements.
For simplicity, our CoHMM 2.0 proxy app implements spatial adaptive sampling in the context of one-dimensional elastodynamic shock propagation. To the right, we show the speed-ups gained by spatial adaptive sampling as a function of continuum resolution (the desired accuracy at the macro-scale). The dashed curve represents the theoretical speed-up, which we measure as the fraction of fine-scale calls (e.g. spawned CoMD tasks) that are eliminated by using spatial adaptive sampling in place of brute-force HMM. The solid curve shows the actual wall-clock speed-up, which we measure as the fraction of computer cycles saved. At the highest macro-scale accuracies (i.e. continuum resolutions of $\sim 10^5$), we observe a wall-clock speed-up factor of $\sim 10^3$, which is about two-thirds of the theoretical expected value. Of the remaining one-third of the computational cost, the majority is associated with Akima spline interpolation of the sample points over the continuum mesh.
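The spatial selection-and-reconstruction step described above can be sketched as follows. This is an illustrative Python fragment, not the CoHMM 2.0 source: a simple gradient threshold stands in for the app's predictive point selection, and linear interpolation (`np.interp`) stands in for the Akima spline it actually uses; all names and the tolerance are assumptions.

```python
import numpy as np

def choose_sample_points(strain, tol=1e-3):
    """Pick the macro-mesh nodes where fine-scale (CoMD) runs are needed:
    everywhere the field varies faster than tol, plus the domain
    endpoints so the interpolation stays well defined."""
    grad = np.abs(np.gradient(strain))
    idx = np.flatnonzero(grad > tol)
    return np.union1d(idx, [0, len(strain) - 1])

def reconstruct(strain, flux_fn, tol=1e-3):
    """Evaluate the expensive flux only at the selected nodes and fill in
    the rest by interpolating in real space.  Returns the reconstructed
    flux field and the number of fine-scale calls actually made."""
    x = np.arange(len(strain))
    idx = choose_sample_points(strain, tol)
    sampled = np.array([flux_fn(strain[i]) for i in idx])
    return np.interp(x, idx, sampled), len(idx)
```

Because the selection depends only on the local macro-scale field, each node decides independently whether to spawn a fine-scale task, which is what preserves the near-perfect parallelism: no global database lookup sits between the selection and the launch.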