CoSP2 Proxy Application Sparse Linear Algebra for Electronic Structure calculations
CoSP2 represents a sparse linear algebra parallel algorithm for calculating the density matrix in electronic structure theory. The algorithm is based on a recursive second-order Fermi-Operator expansion method (SP2) and is tailored for density functional based tight-binding calculations of non-metallic systems. This SP2 algorithm is part of the Los Alamos Transferable Tight-binding for Energetics (LATTE) code, based on a matrix expansion of the Fermi operator in a recursive series of generalized matrix-matrix multiplications. The computational cost scales linearly for shared memory with the system size, which is achieved by using the ELLPACK-R (ELL) sparse matrix data format and OpenMP multi-threading. The CoSP2 implementation uses hierarchical parallelism, MPI between nodes and OpenMP on node.
We achieve efficient shared memory parallelism for generalized sparse matrix-matrix multiplications by using the ELL data structure. The ELL format allows sparse matrix storage in a regular manner (see Fig. 1a). Each sparse matrix is described by three arrays:
- a 2-dimensional array containing the numerical values,
- a 2-dimensional array containing the column indices, and
- a vector containing the number of non-zero entries per row.
The row-wise data storage makes a parallel implementation of a sparse
matrix-matrix multiplication fairly straightforward.
Fig. 1b illustrates the calculation of a sparse matrix-matrix multiplication, X2,
which is the operation governing the cost of the SP2 algorithm.
The matrix elements of each row i of X are multiplied with the corresponding
column vector elements.
However, the matrix X is symmetric and we therefore multiply the corresponding
row vector elements and accumulate into a temporary row buffer (
Pseudo code for the Second-order spectral projection (SP2) algorithm is shown below.
Estimate εmax and εmin X = (εmaxI − H)/(εmax − εmin) TraceX = Tr[X] BreakLoop = 0 i=0 while BreakLoop = 0 do i=i+1 Exchange halo information between MPI ranks TraceXold = TraceX Xtmp = X2 TraceXtmp = Tr[Xtmp] if |TraceXtmp − Nocc| − |2*TraceX − TraceXtmp − Nocc| > IdemTol then X = 2*X − Xtmp TraceX = 2*TraceX − TraceXtmp else X = Xtmp TraceX = TraceXtmp end if IdemErri = |TraceX − TraceXold| if IdemErri−2 ≤ IdemErri and i > imin then BreakLoop = 1 end if Exchange halo chunks of rows across MPI ranks end while P=XThe work decomposition across MPI ranks consists of splitting up the Hamiltonian matrix into chunks of rows, where each chunk is assigned to a different rank. Access to halo rows is needed during the computation, requiring an exchange of relevant halo chunks per iteration of the SP2 algorithm.
Performance timings are shown for a polyethylene chain of 1024 molecules (6144 atoms, 12,288 x 12,288 matrix). Fig. 2 shows timings for 1) OpenMP only with 1-16 threads on a single node; 2) MPI with 1 OpenMP thread, running on 1-16 ranks of one node; and 3) MPI with 4 OpenMP threads, running on 1-16 ranks (as separate nodes). We see similar performance for OpenMP and MPI running on a single node. Multi-node MPI+OpenMP shows improved performance due to hierarchical parallelism.
M. J. Cawkwell, E. J. Sanville, S. M. Mniszewski, A. M. N. Niklasson, , “ Computing the density matrix in electronic structure theory on graphics processing units, ” J. Chem. Theory Comput., 8, 4094−4101, 2012.
E. H. Rubensson, A. M. N. Niklasson, , “ Interior eigenvalues from density matrix expansions in quantum mechanical molecular dynamics, ” SIAM J. Sci. Comput. Vol. 36, No. 2, pp. B147–B170, 2014.
P. Souvatzis, A. M. N. Niklasson, , “ First principles molecular dynamics without self-consistent field optimization, ” J. Chem. Physics 140, 044117, 2014.
M. J. Cawkwell, A. M. N. Niklasson, , “ Energy conserving, linear scaling Born-Oppenheimer molecular dynamics, ” J. Chem. Physics 137, 134105, 2012.
A. M. N. Niklasson, P., Steneteg, N. Bock, , “ Extended Lagrangian free energy molecular dynamics, ” J. Chem. Physics 135, 164111, 2011.
A. M. N. Niklasson, , “ Extended Born-Oppenheimer molecular dynamics, ” PRL 100, 123004, 2008.
A. M. N. Niklasson, , “ Expansion algorithm for the density matrix, ” Phys. Rev. B 66, 155115, 2002.