CoSP2 Proxy Application Sparse Linear Algebra for Electronic Structure calculations
CoSP2 represents a sparse linear algebra parallel algorithm for calculating the density matrix in electronic structure theory. The algorithm is based on a recursive secondorder FermiOperator expansion method (SP2) and is tailored for density functional based tightbinding calculations of nonmetallic systems. This SP2 algorithm is part of the Los Alamos Transferable Tightbinding for Energetics (LATTE) code, based on a matrix expansion of the Fermi operator in a recursive series of generalized matrixmatrix multiplications. The computational cost scales linearly for shared memory with the system size, which is achieved by using the ELLPACKR (ELL) sparse matrix data format and OpenMP multithreading. The CoSP2 implementation uses hierarchical parallelism, MPI between nodes and OpenMP on node.
We achieve efficient shared memory parallelism for generalized sparse matrixmatrix multiplications by using the ELL data structure. The ELL format allows sparse matrix storage in a regular manner (see Fig. 1a). Each sparse matrix is described by three arrays:
 a 2dimensional array containing the numerical values,
 a 2dimensional array containing the column indices, and
 a vector containing the number of nonzero entries per row.
The rowwise data storage makes a parallel implementation of a sparse
matrixmatrix multiplication fairly straightforward.
Fig. 1b illustrates the calculation of a sparse matrixmatrix multiplication, X^{2},
which is the operation governing the cost of the SP2 algorithm.
The matrix elements of each row i of X are multiplied with the corresponding
column vector elements.
However, the matrix X is symmetric and we therefore multiply the corresponding
row vector elements and accumulate into a temporary row buffer (
Pseudo code for the Secondorder spectral projection (SP2) algorithm is shown below.
Estimate ε_{max} and ε_{min} X = (ε_{max}I − H)/(ε_{max} − ε_{min}) TraceX = Tr[X] BreakLoop = 0 i=0 while BreakLoop = 0 do i=i+1 Exchange halo information between MPI ranks TraceXold = TraceX X_{tmp} = X^{2} TraceXtmp = Tr[X_{tmp}] if TraceXtmp − N_{occ} − 2*TraceX − TraceXtmp − N_{occ} > IdemTol then X = 2*X − X_{tmp} TraceX = 2*TraceX − TraceXtmp else X = X_{tmp} TraceX = TraceXtmp end if IdemErr_{i} = TraceX − TraceXold if IdemErr_{i}−2 ≤ IdemErr_{i} and i > i_{min} then BreakLoop = 1 end if Exchange halo chunks of rows across MPI ranks end while P=XThe work decomposition across MPI ranks consists of splitting up the Hamiltonian matrix into chunks of rows, where each chunk is assigned to a different rank. Access to halo rows is needed during the computation, requiring an exchange of relevant halo chunks per iteration of the SP2 algorithm.
Performance timings are shown for a polyethylene chain of 1024 molecules (6144 atoms, 12,288 x 12,288 matrix). Fig. 2 shows timings for 1) OpenMP only with 116 threads on a single node; 2) MPI with 1 OpenMP thread, running on 116 ranks of one node; and 3) MPI with 4 OpenMP threads, running on 116 ranks (as separate nodes). We see similar performance for OpenMP and MPI running on a single node. Multinode MPI+OpenMP shows improved performance due to hierarchical parallelism.
Background Papers
Papers

M. J. Cawkwell, E. J. Sanville, S. M. Mniszewski, A. M. N. Niklasson, , “ Computing the density matrix in electronic structure theory on graphics processing units, ” J. Chem. Theory Comput., 8, 4094−4101, 2012.

E. H. Rubensson, A. M. N. Niklasson, , “ Interior eigenvalues from density matrix expansions in quantum mechanical molecular dynamics, ” SIAM J. Sci. Comput. Vol. 36, No. 2, pp. B147–B170, 2014.

P. Souvatzis, A. M. N. Niklasson, , “ First principles molecular dynamics without selfconsistent field optimization, ” J. Chem. Physics 140, 044117, 2014.

M. J. Cawkwell, A. M. N. Niklasson, , “ Energy conserving, linear scaling BornOppenheimer molecular dynamics, ” J. Chem. Physics 137, 134105, 2012.

A. M. N. Niklasson, P., Steneteg, N. Bock, , “ Extended Lagrangian free energy molecular dynamics, ” J. Chem. Physics 135, 164111, 2011.

A. M. N. Niklasson, , “ Extended BornOppenheimer molecular dynamics, ” PRL 100, 123004, 2008.

A. M. N. Niklasson, , “ Expansion algorithm for the density matrix, ” Phys. Rev. B 66, 155115, 2002.