Xu Guo, Shidong Jiang, Yunfeng Xiong, Jiwei Zhang
Earth introduces strong attenuation and dispersion to propagating waves. The time-fractional wave equation with very small fractional exponent, based on Kjartansson's constant-Q theory, is widely recognized in geophysics as a reliable model for frequency-independent Q anelastic behavior. Nonetheless, the numerical solution of this equation poses considerable challenges because the complete time history of the wavefields must be stored. To address this computational challenge, we present a novel approach: a nearly optimal sum-of-exponentials (SOE) approximation to the Caputo fractional derivative with very small fractional exponent, utilizing the machinery of generalized Gaussian quadrature. This method minimizes the number of memory variables needed to approximate the power attenuation law within a specified error tolerance. We establish a mathematical equivalence between this SOE approximation and the continuous fractional stress-strain relationship, relating it to the generalized Maxwell body model. Furthermore, we prove an improved SOE approximation error bound to thoroughly assess the ability of rheological models to replicate the power attenuation law. Numerical simulations are performed for the constant-Q viscoacoustic equation in 3D homogeneous media and for variable-order P- and S-wave viscoelastic equations in 3D inhomogeneous media. These simulations demonstrate that the proposed technique accurately captures changes in amplitude and phase resulting from material anelasticity. This advancement is a significant step toward the practical use of the time-fractional wave equation in seismic inversion.
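To illustrate why an SOE approximation reduces memory, here is a minimal Python sketch. It assumes the kernel is already given as $K(t)\approx\sum_k w_k e^{-s_k t}$ with hypothetical weights `w` and rates `s` (not the generalized-Gaussian-quadrature nodes of this work). A history convolution then needs only one memory variable per exponential; each is a discrete analogue of $\dot m_k=-s_k m_k+g(t)$, the form of a single relaxation element in a generalized Maxwell body.

```python
import numpy as np


def convolve_direct(g, dt, w, s):
    """O(N^2) reference: c_n = sum_{j<n} K(t_n - t_j) g_j dt, K(t) = sum_k w_k exp(-s_k t)."""
    n_steps = len(g)
    t = dt * np.arange(n_steps)
    c = np.zeros(n_steps)
    for n in range(n_steps):
        lag = t[n] - t[:n]                    # time lags to all earlier steps
        kernel = np.exp(-np.outer(lag, s)) @ w
        c[n] = np.sum(kernel * g[:n]) * dt
    return c


def convolve_memory(g, dt, w, s):
    """Same sum with O(N_exp) storage: one memory variable m_k per exponential."""
    decay = np.exp(-s * dt)                   # per-step decay factor of each mode
    m = np.zeros(len(s))
    c = np.zeros(len(g))
    for n in range(1, len(g)):
        # m_k(t_n) = sum_{j<n} exp(-s_k (t_n - t_j)) g_j dt, updated from m_k(t_{n-1})
        m = decay * (m + g[n - 1] * dt)
        c[n] = w @ m
    return c
```

Both functions return the same values up to rounding. The second stores only `len(s)` numbers between time steps instead of the whole history `g`.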
Charles L. Epstein, Fredrik Fryklund, Shidong Jiang
A new scheme is proposed to construct an n-times differentiable function extension of an n-times differentiable function defined on a smooth domain D in d-dimensions. The extension scheme relies on an explicit formula consisting of a linear combination of n+1 function values in D, which extends the function along directions normal to the boundary. Smoothness tangent to the boundary is automatic. The performance of the scheme is illustrated by using function extension as a step in a numerical solver for the inhomogeneous Poisson equation on multiply connected domains with complex geometry in two and three dimensions. We show that the modest additional work needed to do function extension leads to considerably more accurate solutions of the partial differential equation.
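In one dimension, take $D=\{x\le 0\}$ with the boundary at $x=0$. A linear combination of $n+1$ interior values can then be written as $F(x)=\sum_{j=1}^{n+1} a_j f(-b_j x)$ for $x>0$. Matching derivatives up to order $n$ at $x=0$ gives the Vandermonde conditions $\sum_j a_j(-b_j)^k=1$ for $k=0,\ldots,n$. The Python sketch below implements this classical Hestenes-type reflection with a hypothetical choice $b_j=j$. The sample locations and coefficients used in this work may differ. In $d$ dimensions, $x$ would play the role of the signed distance along the boundary normal.

```python
import numpy as np


def extension_coefficients(n, b=None):
    """Weights a_j with sum_j a_j (-b_j)^k = 1 for k = 0..n (Hestenes-type reflection)."""
    if b is None:
        b = np.arange(1, n + 2, dtype=float)  # hypothetical sample scalings b_j = j
    # Row k of the system is [(-b_1)^k, ..., (-b_{n+1})^k].
    system = np.vander(-b, n + 1, increasing=True).T
    a = np.linalg.solve(system, np.ones(n + 1))
    return a, b


def extend(f, n, x):
    """Evaluate a C^n extension of f (given on x <= 0) at the points x."""
    a, b = extension_coefficients(n)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    inside = x <= 0.0
    out[inside] = f(x[inside])
    xo = x[~inside]
    # Each extended value uses n + 1 values of f at the interior points -b_j * x.
    out[~inside] = sum(aj * f(-bj * xo) for aj, bj in zip(a, b))
    return out
```

Differentiating $F$ $k$ times at $x=0^+$ gives $\sum_j a_j(-b_j)^k f^{(k)}(0)=f^{(k)}(0)$. The extension is therefore $C^n$ across the boundary, and it reproduces polynomials of degree at most $n$ exactly.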
Ludvig af Klinteberg, Leslie Greengard, Shidong Jiang, Anna-Karin Tornberg
Classical Ewald methods for Coulomb and Stokes interactions rely on ``kernel splitting,'' using decompositions based on Gaussians to divide the resulting potential into a near-field and a far-field component. Here, we show that a more efficient splitting for the scalar biharmonic Green's function can be derived using zeroth-order prolate spheroidal wave functions (PSWFs), which in turn yields new efficient splittings for the Stokeslet, stresslet, and elastic kernels, since these Green's tensors can all be derived from the biharmonic kernel. This benefits all fast summation methods based on kernel splitting, including FFT-based Ewald summation methods, which are suitable for uniform point distributions, and DMK-based methods, which allow for nonuniform point distributions. The DMK (dual-space multilevel kernel-splitting) algorithm we develop here is fast, adaptive, and linear-scaling, both in free space and in a periodic cube. We demonstrate its performance with numerical examples in two and three dimensions.
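For reference, the classical Gaussian-based splitting that this work improves on can be sketched in Python for the Coulomb kernel. It reads $1/r=\operatorname{erfc}(r/\sigma)/r+\operatorname{erf}(r/\sigma)/r$. The first term is singular but decays like $e^{-r^2/\sigma^2}$. The second is smooth, and in three dimensions its Fourier transform is $4\pi e^{-k^2\sigma^2/4}/k^2$. This is not the PSWF splitting described above, and the width `sigma` is an illustrative parameter.

```python
import math


def coulomb_split(r, sigma):
    """Classical Gaussian (Ewald) splitting 1/r = near(r) + far(r)."""
    near = math.erfc(r / sigma) / r   # singular at 0, but decays like exp(-r^2/sigma^2)
    far = math.erf(r / sigma) / r     # smooth, tends to 2/(sigma sqrt(pi)) as r -> 0
    return near, far
```

The near part is summed directly over neighbors within a few multiples of `sigma`. The far part is handled in Fourier space, where its Gaussian decay limits the number of modes needed.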
Hai Zhu, Chia-Nan Yeh, Miguel A. Morales, Leslie Greengard, Shidong Jiang, Jason Kaye
We generalize the interpolative separable density fitting (ISDF) method, used for compressing the four-index electron repulsion integral (ERI) tensor, to incorporate adaptive real space grids for potentially highly localized single-particle basis functions. To do so, we employ a fast adaptive algorithm, the recently-introduced dual-space multilevel kernel-splitting method, to solve the Poisson equation for the ISDF auxiliary basis functions. The adaptive grids are generated using a high-order accurate, black-box procedure that satisfies a user-specified error tolerance. Our algorithm relies on the observation, which we prove, that an adaptive grid resolving the pair densities appearing in the ERI tensor can be straightforwardly constructed from one that resolves the single-particle basis functions, with the number of required grid points differing only by a constant factor. We find that the ISDF compression efficiency for the ERI tensor with highly localized basis sets is comparable to that for smoother basis sets compatible with uniform grids. To demonstrate the performance of our procedure, we consider several molecular systems with all-electron basis sets which are intractable using uniform grid-based methods. Our work establishes a pathway for scalable many-body electronic structure simulations with arbitrary smooth basis functions, making simulations of phenomena like core-level excitations feasible on a large scale.
Leslie Greengard, Shidong Jiang, Jun Wang
Two fundamental difficulties are encountered in the numerical evaluation of time-dependent layer potentials. One is the quadratic cost of history dependence, which has been successfully addressed by splitting the potentials into two parts: a local part containing the most recent contributions and a history part containing the contributions from all earlier times. The history part is smooth, easily discretized using high-order quadratures, and straightforward to compute using a variety of fast algorithms. The local part, however, involves complicated singularities in the underlying Green's function. Existing methods, based on exchanging the order of integration in space and time, are able to achieve high-order accuracy, but are limited to the case of stationary boundaries. Here, we present a new quadrature method that leaves the order of integration unchanged, making use of a change of variables that converts the singular integrals with respect to time into smooth ones. We have also derived asymptotic formulas for the local part that lead to fast and accurate hybrid schemes, extending earlier work for scalar heat potentials and applicable to moving boundaries. The performance of the overall scheme is demonstrated via numerical examples.
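As a minimal model of the idea, consider a local time integral with an inverse square-root singularity, $\int_0^\delta g(\tau)\,\tau^{-1/2}\,d\tau$. The substitution $\tau=s^2$ turns it into $\int_0^{\sqrt\delta} 2g(s^2)\,ds$. That integrand is smooth, so standard Gauss-Legendre quadrature converges rapidly. The Python sketch below assumes this model singularity. The singularities of the actual heat-kernel layer potentials, and the change of variables used in this work, are more involved.

```python
import math
import numpy as np


def local_time_integral(g, delta, n=20):
    """Approximate int_0^delta g(tau) tau^(-1/2) dtau via tau = s^2.

    After the substitution the integral is int_0^sqrt(delta) 2 g(s^2) ds,
    whose integrand is smooth, so n-point Gauss-Legendre converges rapidly.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n)
    a = math.sqrt(delta)
    s = 0.5 * a * (nodes + 1.0)              # map [-1, 1] onto [0, sqrt(delta)]
    return 0.5 * a * np.sum(weights * 2.0 * g(s * s))
```

Applying the same Gauss-Legendre rule directly to $g(\tau)\tau^{-1/2}$ converges only algebraically. The change of variables removes exactly this difficulty.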
Johan Helsing, Shidong Jiang
A numerical scheme is presented for solving the Helmholtz equation with Dirichlet or Neumann boundary conditions on piecewise smooth open curves, where the curves may have corners and multiple junctions. Existing integral equation methods for smooth open curves rely on analyzing the exact singularities of the density at the endpoints for the associated integral operators, explicitly extracting these singularities from the densities in the formulation, and using global quadrature to discretize the boundary integral equation. Extending these methods to handle curves with corners and multiple junctions is challenging because the singularity analysis becomes much more complex, and constructing high-order quadrature for discretizing layer potentials with singular and hypersingular kernels and singular densities is nontrivial. The proposed scheme is built upon the following two observations. First, the single-layer potential operator and the normal derivative of the double-layer potential operator serve as effective preconditioners for each other locally. Second, the recursively compressed inverse preconditioning (RCIP) method can be extended to address "implicit" second-kind integral equations. The scheme is high-order, adaptive, and capable of handling corners and multiple junctions without prior knowledge of the density singularity. It is also compatible with fast algorithms, such as the fast multipole method. The performance of the scheme is illustrated with several numerical examples.
Jiuyang Liang, Libin Lu, Shidong Jiang
We present an NPT extension of Ewald summation with prolates (ESP), a spectrally accurate and scalable particle-mesh method for molecular dynamics simulations of periodic, charged systems. Building on the recently introduced ESP framework, this work focuses on rigorous and thermodynamically consistent pressure/stress evaluation in the isothermal--isobaric ensemble. ESP employs prolate spheroidal wave functions as both splitting and spreading kernels, reducing the Fourier grid size needed to reach a prescribed pressure accuracy compared with current widely used mesh-Ewald methods based on Gaussian splitting and B-spline spreading. We derive a unified pressure-tensor formulation applicable to isotropic, semi-isotropic, anisotropic, and fully flexible cells, and show that the long-range pressure can be evaluated with a single forward FFT followed by diagonal scaling, whereas force evaluation requires both forward and inverse transforms. We provide production implementations in LAMMPS and GROMACS and validate pressure and force accuracy on bulk water, LiTFSI ionic liquids, and a transmembrane system. Benchmarks on up to $3\times 10^3$ CPU cores demonstrate strong scaling and reduced communication cost at matched accuracy, particularly for NPT pressure evaluation.
Jiuyang Liang, Libin Lu, Alex Barnett, Leslie Greengard, Shidong Jiang
The evaluation of long-range Coulomb interactions is a significant cost in molecular dynamics (MD), even when using Particle Mesh Ewald (PME) or Particle-Particle-Particle-Mesh (PPPM) methods, which rely on Ewald splitting and the fast Fourier transform to achieve near-linear scaling. We introduce ESP -- Ewald summation with prolate spheroidal wave functions (PSWFs) -- which leads to a more efficient Fourier representation and a reduction in the required grid size, global communication, and particle-grid operations, without loss of accuracy. We have integrated the ESP method into two widely-used open-source MD packages, LAMMPS and GROMACS, enabling rapid comparison and adoption. Relative to PME/PPPM baselines at error tolerances $10^{-3}$ to $10^{-4}$, ESP gives roughly a $3$-fold acceleration of electrostatic interactions, and a $2.5$-fold speed-up in the MD simulation when using about $10^3$ compute cores. At high accuracy ($10^{-5}$), these increase to $10$-fold for the far-field electrostatics and $5$-fold for MD simulation. Furthermore, we show that the accelerated codes have improved strong scaling with core count, and validate them in realistic long-time biological and material simulations. ESP thus offers a practical, drop-in path to reduce the time-to-solution and energy footprint of MD workflows.
Shidong Jiang, Leslie Greengard
We introduce a new class of multilevel, adaptive, dual-space methods for computing fast convolutional transforms. These methods can be applied to a broad class of kernels, from the Green's functions for classical partial differential equations (PDEs) to power functions and radial basis functions such as those used in statistics and machine learning. The DMK (dual-space multilevel kernel-splitting) framework uses a hierarchy of grids, computing a smoothed interaction at the coarsest level, followed by a sequence of corrections at finer and finer scales until the problem is entirely local, at which point direct summation is applied. The main novelty of DMK is that the interaction at each scale is diagonalized by a short Fourier transform, permitting the use of separation of variables, but without requiring the FFT for its asymptotic performance. The DMK framework substantially simplifies the algorithmic structure of the fast multipole method (FMM) and unifies the FMM, Ewald summation, and multilevel summation, achieving speeds comparable to the FFT in work per gridpoint, even in a fully adaptive context. For continuous source distributions, the evaluation of local interactions is further accelerated by approximating the kernel at the finest level as a sum of Gaussians with a highly localized remainder. The Gaussian convolutions are calculated using tensor product transforms, and the remainder term is calculated using asymptotic methods. We illustrate the performance of DMK for both continuous and discrete sources with extensive numerical examples in two and three dimensions.
Zydrunas Gimbutas, Shidong Jiang, Li-Shi Luo
A numerical scheme is developed for the evaluation of Abramowitz functions $J_n$ in the right half of the complex plane. For $n=-1,\, \ldots,\, 2$, the scheme uses series expansions for $|z|<1$; asymptotic expansions for $|z|>R$, with $R$ determined by the required precision; and, on each sub-region of the intermediate region $1\le |z| \le R$, modified Laurent series expansions precomputed via a least squares procedure, which approximate $J_n$ accurately and efficiently. For $n>2$, $J_n$ is evaluated via a recurrence relation. The scheme achieves nearly machine precision for $n=-1, \ldots, 2$, at a cost of about four complex exponential evaluations per function evaluation.
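The recurrence used for $n>2$ follows from integrating $\frac{d}{dt}\big(t^{n-1}e^{-t^2-x/t}\big)$ over $(0,\infty)$. With $J_n(x)=\int_0^\infty t^n e^{-t^2-x/t}\,dt$, this gives $2J_n(x)=(n-1)J_{n-2}(x)+x\,J_{n-3}(x)$. For real $x>0$ all terms are positive, so the upward recurrence involves no cancellation.

The Python sketch below is restricted to real $x>0$. It seeds the recurrence with $J_{-1},\ldots,J_2$ computed by generic adaptive quadrature (`scipy.integrate.quad`). That quadrature is only a stand-in for the series, asymptotic, and Laurent expansions of the actual scheme, so the sketch is far slower and less accurate than the method described above.

```python
import math
from scipy.integrate import quad


def abramowitz_quad(n, x):
    """J_n(x) = int_0^inf t^n exp(-t^2 - x/t) dt for real x > 0, by adaptive quadrature."""
    def integrand(t):
        if t <= 0.0:
            return 0.0
        # Combine all factors in one exponent to avoid overflow of t**n for tiny t.
        return math.exp(n * math.log(t) - t * t - x / t)
    value, _ = quad(integrand, 0.0, math.inf, epsabs=1e-13, epsrel=1e-11, limit=200)
    return value


def abramowitz_table(n_max, x):
    """J_{-1}, ..., J_{n_max}: direct evaluation for n <= 2, upward recurrence for n > 2."""
    J = {n: abramowitz_quad(n, x) for n in range(-1, 3)}
    for n in range(3, n_max + 1):
        J[n] = 0.5 * ((n - 1) * J[n - 2] + x * J[n - 3])
    return J
```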
Shidong Jiang
We present a fast Gauss transform in one dimension using nearly optimal sum-of-exponentials approximations of the Gaussian kernel. For up to about ten-digit accuracy, the approximations are obtained via best rational approximations of the exponential function on the negative real axis. As compared with existing fast Gauss transforms, the algorithm is straightforward to parallelize and very simple to implement, requiring only twenty-four lines of MATLAB code. The dominant cost of the algorithm is the evaluation of complex exponentials, about three to six per point depending on the desired precision. The performance of the algorithm is illustrated via several numerical examples.
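Once the points are sorted, each exponential $e^{-\lambda_k|x-y|}$ can be summed with one left-to-right and one right-to-left recurrence. This is why an SOE approximation yields such a short algorithm, with $O(Np)$ work for $N$ points and $p$ exponentials. The Python sketch below shows only this sweep. It assumes the kernel is given as $K(r)=\sum_k w_k e^{-\lambda_k r}$ with $\operatorname{Re}\lambda_k>0$. It does not reproduce the rational-approximation coefficients for the Gaussian, and the test uses a hypothetical four-term kernel for which the SOE is exact.

```python
import numpy as np


def soe_sweep(x, q, w, lam):
    """u_i = sum_j K(|x_i - x_j|) q_j for K(r) = sum_k w_k exp(-lam_k r), in O(N p) work.

    Assumes Re(lam_k) > 0 so every propagation factor has modulus <= 1.
    """
    order = np.argsort(x)
    xs, qs = x[order], q[order]
    n = len(xs)
    u = np.zeros(n, dtype=complex)
    for wk, lk in zip(w, lam):
        step = np.exp(-lk * np.diff(xs))     # factor between neighboring sorted points
        left = np.zeros(n, dtype=complex)    # contributions from j <= i
        acc = 0.0
        for i in range(n):
            if i > 0:
                acc = acc * step[i - 1]
            acc = acc + qs[i]
            left[i] = acc
        right = np.zeros(n, dtype=complex)   # contributions from j > i
        acc = 0.0
        for i in range(n - 2, -1, -1):
            acc = (acc + qs[i + 1]) * step[i]
            right[i] = acc
        u += wk * (left + right)
    out = np.empty(n, dtype=complex)
    out[order] = u                           # undo the sort
    return out
```

Complex rates in conjugate pairs with conjugate weights give a real kernel. Each mode is propagated independently, so the loop over `w, lam` parallelizes trivially.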
Ruqi Pei, Travis Askham, Leslie Greengard, Shidong Jiang
A new scheme is presented for imposing periodic boundary conditions on unit cells with arbitrary source distributions. We restrict our attention here to the Poisson, modified Helmholtz, Stokes and modified Stokes equations. The approach extends to the oscillatory equations of mathematical physics, including the Helmholtz and Maxwell equations, but we will address these in a companion paper, since the nature of the problem is somewhat different and includes the consideration of quasiperiodic boundary conditions and resonances. Unlike lattice sum-based methods, the scheme is insensitive to the unit cell's aspect ratio and is easily coupled to adaptive fast multipole methods (FMMs). Our analysis relies on classical "plane-wave" representations of the fundamental solution, and yields an explicit low-rank representation of the field due to all image sources beyond the first layer of neighboring unit cells. When the aspect ratio of the unit cell is large, our scheme can be coupled with the nonuniform fast Fourier transform (NUFFT) to accelerate the evaluation of the induced field. Its performance is illustrated with several numerical examples.
Leslie Greengard, Shidong Jiang, Manas Rachh, Jun Wang
We present a new version of the fast Gauss transform (FGT) for discrete and continuous sources. Classical Hermite expansions are avoided entirely, making use only of the plane-wave representation of the Gaussian kernel and a new hierarchical merging scheme. For continuous source distributions sampled on adaptive tensor-product grids, we exploit the separable structure of the Gaussian kernel to accelerate the computation. For discrete sources, the scheme relies on the nonuniform fast Fourier transform (NUFFT) to construct near field plane wave representations. The scheme has been implemented for either free-space or periodic boundary conditions. In many regimes, the speed is comparable to or better than that of the conventional FFT in work per gridpoint, despite being fully adaptive.
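The plane-wave representation follows from the Fourier integral $e^{-x^2/\delta}=\frac{\sqrt{\delta}}{2\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-\delta k^2/4}e^{ikx}\,dk$. Discretizing it with the trapezoidal rule (step $h$, $2M+1$ nodes) gives a short sum of exponentials $e^{ikx}$. This sum is accurate for $|x|$ well below the aliasing period $2\pi/h$. Below is a minimal Python sketch in one dimension; the parameters `h` and `M` are illustrative, not the choices made in this work.

```python
import numpy as np


def gaussian_plane_wave(x, delta, h, M):
    """Approximate exp(-x^2/delta) by a truncated plane-wave (Fourier) expansion."""
    k = h * np.arange(-M, M + 1)                       # trapezoidal nodes in frequency
    coef = np.sqrt(delta) / (2.0 * np.sqrt(np.pi)) * h * np.exp(-delta * k**2 / 4.0)
    # The expansion is symmetric in k, so the sum is real up to rounding.
    return (np.exp(1j * np.outer(x, k)) @ coef).real
```

Because $e^{ik(x-c)}=e^{ikx}e^{-ikc}$, shifting an expansion center is a diagonal operation on the coefficients. In $d$ dimensions the Gaussian factors into one-dimensional expansions of this form.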
Johan Helsing, Shidong Jiang
A numerical scheme is presented for the solution of Fredholm second-kind boundary integral equations with right-hand sides that are singular at a finite set of boundary points. The boundaries themselves may be non-smooth. The scheme, which builds on recursively compressed inverse preconditioning (RCIP), is universal in that it is independent of the nature of the singularities. Strong right-hand-side singularities, such as $1/|r|^{\alpha}$ with $\alpha$ close to $1$, can be treated in full machine precision. Adaptive refinement is used only in the recursive construction of the preconditioner, leading to an optimal number of discretization points and superior stability in the solve phase. The performance of the scheme is illustrated via several numerical examples, including an application to an integral equation derived from the linearized BGKW kinetic equation for the steady Couette flow.
Tristan Goodwill, Shidong Jiang, Manas Rachh, Kosuke Sugita
We analyze and develop numerical methods for time-harmonic wave scattering in metallic waveguide structures of infinite extent. We show that radiation boundary conditions formulated via projectors onto outgoing modes determine the coefficients of propagating modes uniquely, even when the structure supports trapped modes. Building on this, we introduce a fast divide-and-conquer solver that constructs solution operators on subdomains as impedance-to-impedance maps and couples them by enforcing continuity conditions across their interfaces. For Dirichlet waveguides, the computation of impedance-to-impedance maps requires the solution of mixed Dirichlet-Impedance boundary value problems. We construct a second-kind Fredholm integral equation that avoids near-hypersingular operators, requiring only integral operators whose kernels are at most weakly singular. Numerical experiments on large structures with many circuit elements demonstrate substantial efficiency gains: the proposed approach typically outperforms state-of-the-art fast iterative and fast direct solvers by one to two orders of magnitude.
Jun Wang, Leslie Greengard, Shidong Jiang, Shravan Veerapaneni
We present a family of integral equation-based solvers for the linear or semilinear heat equation in complicated moving (or stationary) geometries. This approach has significant advantages over more standard finite element or finite difference methods in terms of accuracy, stability and space-time adaptivity. In order to be practical, however, a number of technical capabilities are required: fast algorithms for the evaluation of heat potentials, high-order accurate quadratures for singular and weakly singular integrals over space-time domains, and robust automatic mesh refinement and coarsening capabilities. We describe all of these components and illustrate the performance of the approach with numerical examples in two space dimensions.
Jun Lai, Shidong Jiang
We present a second kind integral equation (SKIE) formulation for calculating the electromagnetic modes of optical waveguides, where the unknowns are only on material interfaces. The resulting numerical algorithm can handle optical waveguides with a large number of inclusions of arbitrary irregular cross section. It is capable of finding the bound, leaky, and complex modes for optical fibers and waveguides including photonic crystal fibers (PCF), dielectric fibers and waveguides. Most importantly, the formulation is well conditioned even in the case of nonsmooth geometries. Our method is highly accurate and can therefore be used to calculate the propagation loss of the electromagnetic modes accurately, which provides the photonics industry with a reliable tool for designing more compact and efficient photonic devices. We illustrate and validate the performance of our method through extensive numerical studies and by comparison with semi-analytical results and previously published results.
Xuanzhao Gao, Shidong Jiang, Jiuyang Liang, Zhenli Xu, Qi Zhou
Quasi-2D electrostatic systems, which are periodic in two dimensions and free in the third, have garnered significant interest in many fields. We apply the sum-of-Gaussians (SOG) approximation to the Laplace kernel, dividing the interactions into near-field, mid-range, and long-range components. The near-field component, singular but compactly supported in a local domain, is calculated directly. The mid-range component is handled using a procedure similar to nonuniform fast Fourier transforms in three dimensions. The long-range component, which includes Gaussians of large variance, is treated on proxy points with polynomial interpolation/anterpolation in the free dimension and a Fourier spectral solver in the other two dimensions. Unlike the fast Ewald summation, which requires extensive zero padding in the case of high aspect ratios, the separability of Gaussians allows us to handle such cases without any zero padding in the free direction. Furthermore, while NUFFTs typically require some upsampling in each dimension, and the truncated kernel method introduces an additional factor of upsampling due to kernel oscillation, our scheme eliminates the need for upsampling in any direction owing to the smoothness of Gaussians, significantly reducing the computational cost for large-scale problems. Finally, whereas all periodic fast multipole methods require dividing the periodic tiling into a smooth far part and a near part containing the nearest neighboring cells, our scheme operates directly on the fundamental cell, resulting in better performance with a simpler implementation. We provide a rigorous error analysis showing that upsampling is not required in the NUFFT-like steps, achieving $O(N\log N)$ complexity with a small prefactor. The performance of the scheme is demonstrated via extensive numerical experiments.
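One standard way to obtain a sum-of-Gaussians approximation of the Laplace kernel starts from $\frac{1}{r}=\frac{2}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-r^2e^{2s}+s}\,ds$. Applying the trapezoidal rule in $s$ gives $\frac1r\approx\sum_m w_m e^{-a_m r^2}$, with uniform relative accuracy on a range $[r_{\min},r_{\max}]$. The Python sketch below uses this construction with illustrative step and truncation parameters. The SOG actually used in this work may be constructed differently and with fewer terms.

```python
import numpy as np


def sog_coulomb(h=0.2, s_min=-24.0, s_max=7.0):
    """Weights w_m and exponents a_m with 1/r ~ sum_m w_m exp(-a_m r^2).

    Trapezoidal rule with step h applied to
    1/r = (2/sqrt(pi)) * int exp(-r^2 e^{2s} + s) ds over the real line,
    truncated to [s_min, s_max]; the defaults are illustrative and target
    about ten digits of relative accuracy for 1e-2 <= r <= 1.
    """
    s = np.arange(s_min, s_max + 0.5 * h, h)
    w = 2.0 / np.sqrt(np.pi) * h * np.exp(s)
    a = np.exp(2.0 * s)
    return w, a
```

Each term factors as $e^{-a_m x^2}e^{-a_m y^2}e^{-a_m z^2}$. This is the separability that lets the free direction be treated independently of the two periodic ones.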
Bo Wang, Zhiguo Yang, Lilian Wang, Shidong Jiang
In this paper, we present analytic formulas for the temporal convolution kernel functions involved in the time-domain non-reflecting boundary condition (NRBC) for electromagnetic scattering problems. These exact formulas lead directly to accurate and efficient algorithms for computing the NRBC for domain reduction of the time-domain Maxwell's system in $\mathbb R^3$. A second purpose of this paper is to derive a new time-domain model for the electromagnetic invisibility cloak. Unlike existing models, it contains only one unknown field, and the seemingly complicated convolutions can be computed as efficiently as the temporal convolutions in the NRBC. The governing equation in the cloaking layer is valid for general geometry, e.g., a spherical or polygonal layer. Here, we aim at simulating the spherical invisibility cloak. We take advantage of the radially stratified dispersive media and the special geometry, and develop an efficient vector spherical harmonic (VSH)-spectral-element method for its accurate simulation. Compared with the limited results available from FDTD simulations, the proposed method is optimal in both accuracy and computational cost. In particular, the savings in computational time are significant.
Shidong Jiang, Jiwei Zhang, Qian Zhang, Zhimin Zhang
We present an efficient algorithm for the evaluation of the Caputo fractional derivative $_0^C\!D_t^{\alpha}f(t)$ of order $\alpha\in (0,1)$, which can be expressed as a convolution of $f'(t)$ with the kernel $t^{-\alpha}$. The algorithm is based on an efficient sum-of-exponentials approximation for the kernel $t^{-1-\alpha}$ on the interval $[\Delta t, T]$ with a uniform absolute error $\varepsilon$, where the number of exponentials $N_{\text{exp}}$ needed is of the order $O\left(\log\frac{1}{\varepsilon}\left( \log\log\frac{1}{\varepsilon}+\log\frac{T}{\Delta t}\right) +\log\frac{1}{\Delta t}\left( \log\log\frac{1}{\varepsilon}+\log\frac{1}{\Delta t}\right) \right)$. As compared with the direct method, the resulting algorithm reduces the storage requirement from $O(N_T)$ to $O(N_{\text{exp}})$ and the overall computational cost from $O(N_T^2)$ to $O(N_TN_{\text{exp}})$ with $N_T$ the total number of time steps. Furthermore, when the fast evaluation scheme of the Caputo derivative is applied to solve the fractional diffusion equations, the resulting algorithm requires only $O(N_SN_{\text{exp}})$ storage and $O(N_SN_TN_{\text{exp}})$ work with $N_S$ the total number of points in space, whereas the direct methods require $O(N_SN_T)$ storage and $O(N_SN_T^2)$ work. The complexity of both algorithms is nearly optimal since $N_{\text{exp}}$ is of the order $O(\log N_T)$ for $T\gg 1$ or $O(\log^2N_T)$ for $T\approx 1$ for fixed accuracy $\varepsilon$. We also present a detailed stability and error analysis of the new scheme for solving linear fractional diffusion equations. The performance of the new algorithm is illustrated via several numerical examples. Finally, the algorithm can be parallelized in a straightforward manner.
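A simple (not nearly optimal) way to build an SOE approximation of $t^{-\beta}$ on $[\Delta t,T]$, with $\beta=1+\alpha$ for the kernel above, starts from $t^{-\beta}=\frac{1}{\Gamma(\beta)}\int_{-\infty}^{\infty}e^{-te^{u}+\beta u}\,du$. It then applies the trapezoidal rule in $u$, truncated where the integrand is negligible. The Python sketch below follows this route; `soe_power_kernel` and its step and tolerance parameters are hypothetical. The construction in this work, which yields the stated bound on $N_{\text{exp}}$, is different. The truncated $u$-range still grows only logarithmically in $T/\Delta t$, which conveys why the number of exponentials stays small.

```python
import math
import numpy as np


def soe_power_kernel(beta, dt, T, h=0.3, tol=1e-11):
    """Hypothetical helper: rates s_k and weights w_k with t^(-beta) ~ sum_k w_k exp(-s_k t).

    Trapezoidal rule (step h) in u for
    t^(-beta) = (1/Gamma(beta)) * int exp(-t e^u + beta u) du,
    truncated so that the neglected tails are about tol relative to t^(-beta)
    on [dt, T]. Not the nearly optimal construction described in the text.
    """
    g = math.gamma(beta)
    # Small u: integrand <= exp(beta u), so the tail is at most exp(beta u_min) / beta.
    u_min = math.log(tol * beta * g) / beta - math.log(T)
    # Large u: integrand ~ exp(-dt e^u), negligible once dt e^u >= 40.
    u_max = math.log(40.0 / dt)
    u = h * np.arange(math.floor(u_min / h), math.ceil(u_max / h) + 1)
    return np.exp(u), h * np.exp(beta * u) / g
```

The returned rates and weights can feed a recursive (memory-variable) update of the history part. That update is what reduces storage from $O(N_T)$ to $O(N_{\text{exp}})$.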