Esmail Abdul Fattah, Elias Krainski, Janet van Niekerk, Håvard Rue
This paper aims to extend the Besag model, a widely used Bayesian spatial model in disease mapping, to a non-stationary spatial model for irregular lattice-type data. The goal is to improve the model's ability to capture complex spatial dependence patterns and increase interpretability. The proposed model uses multiple precision parameters, accounting for different intensities of spatial dependence in different sub-regions. We derive a joint penalized complexity prior for the flexible local precision parameters to prevent overfitting and ensure contraction to the stationary model at a user-defined rate. The proposed methodology can be used as a basis for the development of various other non-stationary effects over other domains such as time. An accompanying R package 'fbesag' equips the reader with the necessary tools for immediate use and application. We illustrate the novelty of the proposal by modeling the risk of dengue in Brazil, where the stationary spatial assumption fails and interesting risk profiles are estimated when accounting for spatial non-stationarity.
Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes
Selected inversion is essential for applications such as Bayesian inference, electronic structure calculations, and inverse covariance estimation, where computing only specific elements of large sparse matrix inverses significantly reduces computational and memory overhead. We present an efficient implementation of a two-phase parallel algorithm for computing selected elements of the inverse of a sparse symmetric matrix A, which can be expressed as A = LL^T through sparse Cholesky factorization. Our approach leverages a tile-based structure, focusing on selected dense tiles to optimize computational efficiency and parallelism. While the focus is on arrowhead matrices, the method can be extended to handle general structured matrices. Performance evaluations on a dual-socket 26-core Intel Xeon CPU server demonstrate that sTiles outperforms state-of-the-art direct solvers such as Panua-PARDISO, achieving up to 13X speedup on large-scale structured matrices. Additionally, our GPU implementation using an NVIDIA A100 GPU demonstrates substantial acceleration over its CPU counterpart, achieving up to 5X speedup for large, high-bandwidth matrices with high computational intensity. These results underscore the robustness and versatility of sTiles, validating its effectiveness across various densities and problem configurations.
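To make the idea of selected inversion concrete, the following is a minimal dense sketch of the classical Takahashi-style recursion, which recovers the entries of the inverse on the nonzero pattern of L directly from A = LL^T. It is not the tile-based, parallel implementation described in the abstract; the function name `selected_inverse`, the tolerance `tol`, and the small arrowhead test matrix are assumptions made for illustration only.

```python
import numpy as np

def selected_inverse(A, tol=1e-14):
    """Entries of inv(A) on the nonzero pattern of its Cholesky factor L."""
    L = np.linalg.cholesky(A)              # A = L @ L.T, L lower triangular
    n = A.shape[0]
    # Assumption: structural nonzeros of L are not cancelled numerically.
    pattern = np.abs(L) > tol
    S = np.zeros((n, n))
    for j in range(n - 1, -1, -1):         # columns, last to first
        rows = np.nonzero(pattern[:, j])[0]    # ascending, rows[0] == j
        below = rows[rows > j]             # off-diagonal nonzeros of column j
        for i in rows[::-1]:               # bottom row up to the diagonal
            # From inv(A) @ L = inv(L).T, restricted to the pattern of L.
            acc = S[i, below] @ L[below, j]
            rhs = 1.0 / L[j, j] if i == j else 0.0
            S[i, j] = S[j, i] = (rhs - acc) / L[j, j]
    return S, pattern

# Hypothetical arrowhead matrix: a tridiagonal block plus a dense last row/column.
n = 6
A = 4.0 * np.eye(n)
idx = np.arange(n - 2)
A[idx, idx + 1] = A[idx + 1, idx] = -1.0
A[-1, :-1] = A[:-1, -1] = 0.5
A[-1, -1] = 10.0

S, pattern = selected_inverse(A)
```

Only the entries flagged in `pattern` are computed; every other entry of `S` is left at zero, which is where the savings in work and memory come from when L is sparse.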
Esmail Abdul Fattah, Janet Van Niekerk, Haavard Rue
Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is optimization, with techniques such as stochastic gradient descent, Newton's method, and trust-region methods. However, these methods usually require a numerical computation of the gradient at every iteration, which is prone to numerical errors. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient in this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically by extensive experimentation on both test functions and real data applications. The proposed method is implemented in the R package smartGrad and in C++.
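One plausible reading of the two ingredients named above can be sketched as follows: build an orthonormal basis whose leading directions span the most recent descent steps, take central differences along that basis, and map the result back to the original coordinates. This is an illustrative sketch only and not the smartGrad API; the helper names `basis_from_history` and `directional_gradient`, the step size `h`, and the quadratic test function are all assumptions.

```python
import numpy as np

def basis_from_history(steps, dim):
    """Orthonormal basis whose leading columns span the recent steps."""
    # Recent steps first, canonical axes appended to complete the basis.
    M = np.column_stack(list(steps) + [np.eye(dim)])
    Q, _ = np.linalg.qr(M)                 # reduced QR: Q is dim x dim
    return Q

def directional_gradient(f, x, G, h=1e-5):
    """Central differences along the columns of G, mapped back to x-coordinates."""
    d = np.array([(f(x + h * g) - f(x - h * g)) / (2.0 * h) for g in G.T])
    return G @ d                           # valid because G is orthonormal

# Hypothetical quadratic test function with a known gradient H @ x.
H = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])
f = lambda x: 0.5 * x @ H @ x
x = np.array([1.0, -2.0, 0.5])
steps = [np.array([-0.3, 0.1, 0.0]), np.array([-0.1, 0.2, -0.05])]

G = basis_from_history(steps, dim=3)
grad = directional_gradient(f, x, G)
```

The memory cost is limited to the few stored steps, which matches the limited-memory framing of the abstract.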
Esmail Abdul Fattah, Haavard Rue
We present a new approach for fitting spatiotemporal models, with application in disease mapping, using interaction types 1, 2, 3, and 4. When we account for the spatiotemporal interactions in disease-mapping models, inference becomes more useful in revealing unknown patterns in the data. However, when the number of locations and/or the number of time points is large, inference becomes computationally challenging due to the large number of constraints required, and this holds for various inference architectures including Markov chain Monte Carlo (MCMC) and Integrated Nested Laplace Approximations (INLA). We reformulate the INLA approach based on dense matrices to fit the intrinsic spatiotemporal models with the four interaction types and account for the sum-to-zero constraints, and discuss how the new approach can be implemented in a high-performance computing framework. The computing time using the new approach does not depend on the number of constraints and can be up to 40 times faster than INLA in realistic scenarios. This approach is verified by a simulation study and a real data application, and it is implemented in the R package INLAPLUS and the Python header function: inla1234().
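To illustrate why the number of constraints matters, the sketch below applies the standard textbook correction known as conditioning by kriging to a small type 4 interaction (structured time crossed with structured space). It is not the dense-matrix reformulation proposed in the paper. The helper names, the tiny dimensions, and the ridge of 1e-3 added to make the intrinsic precision invertible are assumptions made for illustration.

```python
import numpy as np

def rw1_structure(n):
    """Structure matrix of a first-order random walk on n time points."""
    R = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    R[0, 0] = R[-1, -1] = 1.0
    return R

def condition_by_kriging(x, Q, A):
    """Correct a sample x ~ N(0, inv(Q)) so that A @ x = 0 holds."""
    W = np.linalg.solve(Q, A.T)            # n x k: one solve per constraint
    S = A @ W                              # k x k dense system
    return x - W @ np.linalg.solve(S, A @ x)

T, n_s = 4, 3                              # time points, spatial units
R_t = rw1_structure(T)
W_s = np.ones((n_s, n_s)) - np.eye(n_s)    # hypothetical fully connected map
R_s = np.diag(W_s.sum(axis=1)) - W_s       # Besag structure matrix
Q = np.kron(R_t, R_s) + 1e-3 * np.eye(T * n_s)   # ridge: assumption of this sketch

# Sum-to-zero over space for each time point and over time for each unit.
# One row is redundant, so the last one is dropped: k = T + n_s - 1.
A = np.vstack([np.kron(np.eye(T), np.ones((1, n_s))),
               np.kron(np.ones((1, T)), np.eye(n_s))])[:-1]

rng = np.random.default_rng(0)
L = np.linalg.cholesky(Q)
x = np.linalg.solve(L.T, rng.standard_normal(T * n_s))   # x ~ N(0, inv(Q))
x_c = condition_by_kriging(x, Q, A)
```

The correction needs k extra solves with Q and a dense k x k system, and k grows with both the number of locations and the number of time points; this is the cost that an approach independent of the number of constraints avoids.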
Esmail Abdul Fattah, Elias Krainski, Havard Rue
Bayesian inference often relies on Markov chain Monte Carlo (MCMC) methods, particularly for non-Gaussian data families. When dealing with complex hierarchical models, the MCMC approach can be computationally demanding in workflows that require repeated model fitting or when working with models of large dimensions with limited hardware resources. The integrated nested Laplace approximations (INLA) method is a deterministic alternative for models with non-Gaussian data that belong to the class of latent Gaussian models (LGMs), yielding accurate approximations to posterior marginals in many applied settings. The INLA method was implemented in C as a standalone program, inla, that is widely used in R through the INLA package. This paper introduces PyINLA, a dedicated Python package that provides a Pythonic interface directly to the inla program. PyINLA enables specifying LGMs, running INLA-based inference, and accessing posterior summaries directly from Python while leveraging the established INLA implementation. We describe the package design and illustrate its use on representative models, including generalized linear mixed models, time series forecasting, disease mapping, and geostatistical prediction, demonstrating how deterministic Bayesian inference can be performed in Python using INLA in a way that integrates naturally with common scientific computing workflows.
Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes
This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle challenging arrowhead sparse matrices with variable bandwidths, common in scientific and engineering fields. It minimizes fill-in during Cholesky factorization using permutation techniques and employs a static scheduler to manage tasks on shared-memory systems with GPU accelerators. sTiles balances tile size and parallelism, where larger tiles enhance algorithmic intensity but increase floating-point operations and memory usage, while parallelism is constrained by the arrowhead structure. To expose more parallelism, a left-looking Cholesky variant breaks sequential dependencies in trailing submatrix updates via tree reductions. Evaluations show sTiles achieves speedups of up to 8.41X, 9.34X, 5.07X, and 11.08X compared to CHOLMOD, SymPACK, MUMPS, and PARDISO, respectively, and a 5X speedup on an NVIDIA A100 GPU compared to a 32-core AMD EPYC CPU. Our generic software framework imports well-established concepts from dense matrix computations, but they all require customization when deployed on hybrid architectures to best handle factorizations of sparse matrices with arrowhead structures.
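The left-looking tile variant mentioned above can be sketched on a small dense matrix as follows. This is a sequential illustration under stated assumptions, not the sTiles implementation: it ignores sparsity, scheduling, GPUs, and the tree reductions, and the function name `tiled_left_looking_cholesky`, the tile size `b`, and the random test matrix are hypothetical.

```python
import numpy as np

def tiled_left_looking_cholesky(A, b):
    """Left-looking Cholesky on b x b tiles; returns L with A = L @ L.T."""
    n = A.shape[0]
    assert n % b == 0, "sketch assumes the tile size divides the matrix size"
    nt = n // b
    L = np.zeros_like(A, dtype=float)
    # tile() returns a view, so assigning through [:] writes into L.
    tile = lambda M, i, j: M[i * b:(i + 1) * b, j * b:(j + 1) * b]
    for k in range(nt):                    # tile column being finalized
        for i in range(k, nt):
            T = tile(A, i, k).astype(float)    # astype makes a working copy
            # Left-looking: gather updates from all finished columns j < k.
            # These terms are independent and could be summed by a tree reduction.
            for j in range(k):
                T -= tile(L, i, j) @ tile(L, k, j).T
            if i == k:
                tile(L, k, k)[:] = np.linalg.cholesky(T)            # factor diagonal tile
            else:
                # Solve L_ik @ L_kk.T = T for the off-diagonal tile.
                tile(L, i, k)[:] = np.linalg.solve(tile(L, k, k), T.T).T
    return L

# Hypothetical symmetric positive definite test matrix.
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8.0 * np.eye(8)
L = tiled_left_looking_cholesky(A, b=2)
```

Because each tile of column k reads only already-finished columns, the inner sum over j carries no sequential dependency, which is the property a reduction tree can exploit.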
Esmail Abdul-Fattah, Janet Van Niekerk, Haavard Rue
The integrated nested Laplace approximations (INLA) method has become a widely utilized tool for researchers and practitioners seeking to perform approximate Bayesian inference across various fields of application. To address the growing demand for incorporating more complex models and enhancing the method's capabilities, this paper introduces a novel framework that leverages dense matrices for performing approximate Bayesian inference based on INLA across multiple computing nodes using HPC. When dealing with non-sparse precision or covariance matrices, this new approach scales better than the current INLA method, capitalizing on the computational power offered by multiprocessors in shared and distributed memory architectures available in contemporary computing resources and specialized dense matrix algebra. To validate the efficacy of this approach, we conduct a simulation study and then apply the method to cancer mortality data in Spain, employing a three-way spatio-temporal interaction model.