## CoSP2 Proxy Application Sparse Linear Algebra for Electronic Structure calculations

CoSP2 represents a sparse linear algebra parallel algorithm for calculating the density matrix in electronic structure theory. The algorithm is based on a recursive second-order Fermi-Operator expansion method (SP2) and is tailored for density functional based tight-binding calculations of non-metallic systems. This SP2 algorithm is part of the Los Alamos Transferable Tight-binding for Energetics (LATTE) code, based on a matrix expansion of the Fermi operator in a recursive series of generalized matrix-matrix multiplications. The computational cost scales linearly for shared memory with the system size, which is achieved by using the ELLPACK-R (ELL) sparse matrix data format and OpenMP multi-threading. The CoSP2 implementation uses hierarchical parallelism, MPI between nodes and OpenMP on node.

We achieve efficient shared memory parallelism for generalized sparse matrix-matrix multiplications by using the ELL data structure. The ELL format allows sparse matrix storage in a regular manner (see Fig. 1a). Each sparse matrix is described by three arrays:

1. a 2-dimensional array containing the numerical values,
2. a 2-dimensional array containing the column indices, and
3. a vector containing the number of non-zero entries per row.

The row-wise data storage makes a parallel implementation of a sparse matrix-matrix multiplication fairly straightforward. Fig. 1b illustrates the calculation of a sparse matrix-matrix multiplication, X2, which is the operation governing the cost of the SP2 algorithm. The matrix elements of each row i of X are multiplied with the corresponding column vector elements. However, the matrix X is symmetric and we therefore multiply the corresponding row vector elements and accumulate into a temporary row buffer (i). The temporary row buffer vector represents the result of a new row i in X2, which is stored in the compressed ELL format. Using a distributed/shared memory architecture this scheme can be parallelized over the rows.

Pseudo code for the Second-order spectral projection (SP2) algorithm is shown below.

Estimate εmax and εmin
X = (εmaxI − H)/(εmax − εmin)
TraceX = Tr[X]
BreakLoop = 0
i=0
while BreakLoop = 0 do
i=i+1
Exchange halo information between MPI ranks
TraceXold = TraceX
Xtmp = X2
TraceXtmp = Tr[Xtmp]
if |TraceXtmp − Nocc| − |2*TraceX − TraceXtmp − Nocc| > IdemTol then
X = 2*X − Xtmp
TraceX = 2*TraceX − TraceXtmp
else
X = Xtmp
TraceX = TraceXtmp
end if
IdemErri = |TraceX − TraceXold|
if IdemErri−2 ≤ IdemErri and i > imin then
BreakLoop = 1
end if
Exchange halo chunks of rows across MPI ranks
end while
P=X

The work decomposition across MPI ranks consists of splitting up the Hamiltonian matrix into chunks of rows, where each chunk is assigned to a different rank. Access to halo rows is needed during the computation, requiring an exchange of relevant halo chunks per iteration of the SP2 algorithm.

Performance timings are shown for a polyethylene chain of 1024 molecules (6144 atoms, 12,288 x 12,288 matrix). Fig. 2 shows timings for 1) OpenMP only with 1-16 threads on a single node; 2) MPI with 1 OpenMP thread, running on 1-16 ranks of one node; and 3) MPI with 4 OpenMP threads, running on 1-16 ranks (as separate nodes). We see similar performance for OpenMP and MPI running on a single node. Multi-node MPI+OpenMP shows improved performance due to hierarchical parallelism.

#### Background Papers

##### Papers
1. M. J. Cawkwell, E. J. Sanville, S. M. Mniszewski, A. M. N. Niklasson, , “ Computing the density matrix in electronic structure theory on graphics processing units, ” J. Chem. Theory Comput., 8, 4094−4101, 2012.

2. E. H. Rubensson, A. M. N. Niklasson, , “ Interior eigenvalues from density matrix expansions in quantum mechanical molecular dynamics, ” SIAM J. Sci. Comput. Vol. 36, No. 2, pp. B147–B170, 2014.

3. P. Souvatzis, A. M. N. Niklasson, , “ First principles molecular dynamics without self-consistent field optimization, ” J. Chem. Physics 140, 044117, 2014.

4. M. J. Cawkwell, A. M. N. Niklasson, , “ Energy conserving, linear scaling Born-Oppenheimer molecular dynamics, ” J. Chem. Physics 137, 134105, 2012.

5. A. M. N. Niklasson, P., Steneteg, N. Bock, , “ Extended Lagrangian free energy molecular dynamics, ” J. Chem. Physics 135, 164111, 2011.

6. A. M. N. Niklasson, , “ Extended Born-Oppenheimer molecular dynamics, ” PRL 100, 123004, 2008.

7. A. M. N. Niklasson, , “ Expansion algorithm for the density matrix, ” Phys. Rev. B 66, 155115, 2002.