Martin R. Albrecht, Gregory V. Bard, Clément Pernet
In this work we describe an efficient implementation of a hierarchy of algorithms for Gaussian elimination upon dense matrices over the field with two elements. We discuss both well-known and new algorithms as well as our implementations in the M4RI library, which has been adopted into Sage. The focus of our discussion is a block iterative algorithm for PLE decomposition which is inspired by the M4RI algorithm. The implementation presented in this work provides considerable performance gains in practice when compared to the previously fastest implementation. We provide performance figures on x86_64 CPUs to demonstrate the alacrity of our approach.
Claude-Pierre Jeannerod, Clément Pernet, Arne Storjohann
Transforming a matrix over a field to echelon form, or decomposing the matrix as a product of structured matrices that reveal the rank profile, is a fundamental building block of computational exact linear algebra. This paper surveys the well known variations of such decompositions and transformations that have been proposed in the literature. We present an algorithm to compute the CUP decomposition of a matrix, adapted from the LSP algorithm of Ibarra, Moran and Hui (1982), and show reductions from the other most common Gaussian elimination based matrix transformations and decompositions to the CUP decomposition. We discuss the advantages of the CUP algorithm over other existing algorithms by studying time and space complexities: the asymptotic time complexity is rank sensitive, and comparing the constants of the leading terms, the algorithms for computing matrix invariants based on the CUP decomposition are always at least as good except in one case. We also show that the CUP algorithm, as well as the computation of other invariants such as transformation to reduced column echelon form using the CUP algorithm, all work in place, allowing for example to compute the inverse of a matrix on the same storage as the input matrix.
Erich L. Kaltofen, Clément Pernet
We present algorithms performing sparse univariate polynomial interpolation with errors in the evaluations of the polynomial. Based on the initial work by Comer, Kaltofen and Pernet [Proc. ISSAC 2012], we define the sparse polynomial interpolation codes and state that their minimal distance is precisely the length divided by twice the sparsity. At ISSAC 2012, we have given a decoding algorithm for as much as half the minimal distance and a list decoding algorithm up to the minimal distance. Our new polynomial-time list decoding algorithm uses sub-sequences of the received evaluations indexed by a linear progression, allowing the decoding for a larger radius, that is, more errors in the evaluations while returning a list of candidate sparse polynomials. We quantify this improvement for all typically small values of number of terms and number of errors, and provide a worst case asymptotic analysis of this improvement. For instance, for sparsity T = 5 with up to 10 errors we can list decode in polynomial-time from 74 values of the polynomial with unknown terms, whereas our earlier algorithm required 2T (E + 1) = 110 evaluations. We then propose two variations of these codes in characteristic zero, where appropriate choices of values for the variable yield a much larger minimal distance: the length minus twice the sparsity.
Clément Pernet, Aude Rondepierre, Gilles Villard
We present two algorithms for the computation of the Kalman form of a linear control system. The first one is based on the technique developed by Keller-Gehrig for the computation of the characteristic polynomial. The cost is a logarithmic number of matrix multiplications. To our knowledge, this improves the best previously known algebraic complexity by an order of magnitude. Then we also present a cubic algorithm proven to more efficient in practice.
Jean-Guillaume Dumas, Joris Van Der Hoeven, Clément Pernet, Daniel Roche
We present new algorithms to detect and correct errors in the lower-upper factorization of a matrix, or the triangular linear system solution, over an arbitrary field. Our main algorithms do not require any additional information or encoding other than the original inputs and the erroneous output. Their running time is softly linear in the dimension times the number of errors when there are few errors, smoothly growing to the cost of fast matrix multiplication as the number of errors increases. We also present applications to general linear system solving.
Jean-Guillaume Dumas, Clément Pernet, B. David Saunders
We present algorithms and heuristics to compute the characteristic polynomial of a matrix given its minimal polynomial. The matrix is represented as a black-box, i.e., by a function to compute its matrix-vector product. The methods apply to matrices either over the integers or over a large enough finite field. Experiments show that these methods perform efficiently in practice. Combined in an adaptive strategy, these algorithms reach significant speedups in practice for some integer matrices arising in an application from graph theory.
Morgan Barbier, Clément Pernet, Guillaume Quintin
In this paper we investigate the structure of quasi-BCH codes. In the first part of this paper we show that quasi-BCH codes can be derived from Reed-Solomon codes over square matrices extending the known relation about classical BCH and Reed-Solomon codes. This allows us to adapt the Welch-Berlekamp algorithm to quasi-BCH codes. In the second part of this paper we show that quasi-BCH codes can be seen as subcodes of interleaved Reed-Solomon codes over finite fields. This provides another approach for decoding quasi-BCH codes.
Jean-Guillaume Dumas, Clément Pernet, Ziad Sultan
The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix as well as those of all of its leading sub-matrices, in the same time as state of the art Gaussian elimination algorithms. Here we first study the conditions making a Gaus-sian elimination algorithm reveal this information. Therefore, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base-case to our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row (resp. column) echelon form of a matrix are usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix and of any of its leading sub-matrices thanks to an elementary post-processing algorithm.
Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, Ziad Sultan
We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay as much as possible these reductions while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between size of blocks well suited to those fast variants or to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, block size has then to be dynamic. We propose and compare several block decomposition: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occur. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possbile to maintain a high efficiency.
Jean-Guillaume Dumas, David Lucas, Clément Pernet
In this paper, we give novel certificates for triangular equivalence and rank profiles. These certificates enable to verify the row or column rank profiles or the whole rank profile matrix faster than recomputing them, with a negligible overall overhead. We first provide quadratic time and space non-interactive certificates saving the logarithmic factors of previously known ones. Then we propose interactive certificates for the same problems whose Monte Carlo verification complexity requires a small constant number of matrix-vector multiplications, a linear space, and a linear number of extra field operations. As an application we also give an interactive protocol, certifying the determinant of dense matrices, faster than the best previously known one.
David Lucas, Vincent Neiger, Clément Pernet, Daniel S. Roche, Johan Rosenkilde
We design and analyze new protocols to verify the correctness of various computations on matrices over the ring F[x] of univariate polynomials over a field F. For the sake of efficiency, and because many of the properties we verify are specific to matrices over a principal ideal domain, we cannot simply rely on previously-developed linear algebra protocols for matrices over a field. Our protocols are interactive, often randomized, and feature a constant number of rounds of communication between the Prover and Verifier. We seek to minimize the communication cost so that the amount of data sent during the protocol is significantly smaller than the size of the result being verified, which can be useful when combining protocols or in some multi-party settings. The main tools we use are reductions to existing linear algebra verification protocols and a new protocol to verify that a given vector is in the F[x]-row space of a given matrix.
Jean-Guillaume Dumas, Erich Kaltofen, David Lucas, Clément Pernet
In this paper, we give novel certificates for triangular equivalence and rank profiles. These certificates enable somebody to verify the row or column rank profiles or the whole rank profile matrix faster than recomputing them, with a negligible overall overhead. We first provide quadratic time and space non-interactive certificates saving the logarithmic factors of previously known ones. Then we propose interactive certificates for the same problems whose Monte Carlo verification complexity requires a small constant number of matrix-vector multiplications, a linear space, and a linear number of extra field operations , with a linear number of interactions. As an application we also give an interactive protocol, certifying the determinant or the signature of dense matrices, faster for the Prover than the best previously known one. Finally we give linear space and constant round certificates for the row or column rank profiles.
Gaspard Anthoine, Jean-Guillaume Dumas, Michael Hanling, Mélanie de Jonghe, Aude Maignan, Clément Pernet, Daniel Roche
Proofs of Retrievability (PoRs) are protocols which allow a client to store data remotely and to efficiently ensure, via audits, that the entirety of that data is still intact. A dynamic PoR system also supports efficient retrieval and update of any small portion of the data. We propose new, simple protocols for dynamic PoR that are designed for practical efficiency, trading decreased persistent storage for increased server computation, and show in fact that this tradeoff is inherent via a lower bound proof of time-space for any PoR scheme. Notably, ours is the first dynamic PoR which does not require any special encoding of the data stored on the server, meaning it can be trivially composed with any database service or with existing techniques for encryption or redundancy. Our implementation and deployment on Google Cloud Platform demonstrates our solution is scalable: for example, auditing a 1TB file takes just less than 5 minutes and costs less than $0.08 USD. We also present several further enhancements, reducing the amount of client storage, or the communication bandwidth, or allowing public verifiability, wherein any untrusted third party may conduct an audit.
Jean-Guillaume Dumas, Clément Pernet, Alexandre Sedoglavic
We present a non-commutative algorithm for the multiplication of a 2 x 2 block-matrix by its adjoint, defined by a matrix ring anti-homomorphism. This algorithm uses 5 block products (3 recursive calls and 2 general products)over C or in positive characteristic. The resulting algorithm for arbitrary dimensions is a reduction of multiplication of a matrix by its adjoint to general matrix product, improving by a constant factor previously known reductions. We prove also that there is no algorithm derived from bilinear forms using only four products and the adjoint of one of them. Second we give novel dedicated algorithms for the complex field and the quaternions to alternatively compute the multiplication taking advantage of the structure of the matrix-polynomial arithmetic involved. We then analyze the respective ranges of predominance of the two strategies. Finally we propose schedules with low memory footprint that support a fast and memory efficient practical implementation over a prime field.
Jean-Guillaume Dumas, Pascal Giorgi, Clément Pernet
In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide efficient implementations of such algorithms one need to be careful with the underlying arithmetic. It is well known that modular techniques such as the Chinese remainder algorithm or the p-adic lifting allow very good practical performance, especially when word size arithmetic are used. Therefore, finite field arithmetic becomes an important core for efficient exact linear algebra libraries. In this paper, we study high performance implementations of basic linear algebra routines over word size prime fields: specially the matrix multiplication; our goal being to provide an exact alternate to the numerical BLAS library. We show that this is made possible by a carefull combination of numerical computations and asymptotically faster algorithms. Our kernel has several symbolic linear algebra applications enabled by diverse matrix multiplication reductions: symbolic triangularization, system solving, determinant and matrix inverse implementations are thus studied.
Vincent Neiger, Clément Pernet, Gilles Villard
Krylov methods rely on iterated matrix-vector products $A^k u_j$ for an $n\times n$ matrix $A$ and vectors $u_1,\ldots,u_m$. The space spanned by all iterates $A^k u_j$ admits a particular basis -- the \emph{maximal Krylov basis} -- which consists of iterates of the first vector $u_1, Au_1, A^2u_1,\ldots$, until reaching linear dependency, then iterating similarly the subsequent vectors until a basis is obtained. Finding minimal polynomials and Frobenius normal forms is closely related to computing maximal Krylov bases. The fastest way to produce these bases was, until this paper, Keller-Gehrig's 1985 algorithm whose complexity bound $O(n^ω\log(n))$ comes from repeated squarings of $A$ and logarithmically many Gaussian eliminations. Here $ω>2$ is a feasible exponent for matrix multiplication over the base field. We present an algorithm computing the maximal Krylov basis in $O(n^ω\log\log(n))$ field operations when $m \in O(n)$, and even $O(n^ω)$ as soon as $m\in O(n/\log(n)^c)$ for some fixed real $c>0$. As a consequence, we show that the Frobenius normal form together with a transformation matrix can be computed deterministically in $O(n^ω(\log\log(n))^2)$, and therefore matrix exponentiation~$A^k$ can be performed in the latter complexity if $\log(k) \in O(n^{ω-1-\varepsilon})$ for some fixed $\varepsilon>0$. A key idea for these improvements is to rely on fast algorithms for $m\times m$ polynomial matrices of average degree $n/m$, involving high-order lifting and minimal kernel bases.
Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, B. David Saunders
To maximize efficiency in time and space, allocations and deallocations, in the exact linear algebra library \linbox, must always occur in the founding scope. This provides a simple lightweight allocation model. We present this model and its usage for the rebinding of matrices between different coefficient domains. We also present automatic tools to speed-up the compilation of template libraries and a software abstraction layer for the introduction of transparent parallelism at the algorithmic level.
Brice Boyer, Jean-Guillaume Dumas, Pascal Giorgi, Clément Pernet, B. David Saunders
We describe in this paper new design techniques used in the \cpp exact linear algebra library \linbox, intended to make the library safer and easier to use, while keeping it generic and efficient. First, we review the new simplified structure for containers, based on our \emph{founding scope allocation} model. We explain design choices and their impact on coding: unification of our matrix classes, clearer model for matrices and submatrices, \etc Then we present a variation of the \emph{strategy} design pattern that is comprised of a controller--plugin system: the controller (solution) chooses among plug-ins (algorithms) that always call back the controllers for subtasks. We give examples using the solution \mul. Finally we present a benchmark architecture that serves two purposes: Providing the user with easier ways to produce graphs; Creating a framework for automatically tuning the library and supporting regression testing.
Clément Pernet, Hippolyte Signargout, Pierre Karpman, Gilles Villard
New algorithms are presented for computing annihilating polynomials of Toeplitz, Hankel, and more generally Toeplitz+ Hankel-like matrices over a field. Our approach follows works on Coppersmith's block Wiedemann method with structured projections, which have been recently successfully applied for computing the bivariate resultant. A first baby-step/giant step approach -- directly derived using known techniques on structured matrices -- gives a randomized Monte Carlo algorithm for the minimal polynomial of an $n\times n$ Toeplitz or Hankel-like matrix of displacement rank $α$ using $\tilde O(n^{ω- c(ω)} α^{c(ω)})$ arithmetic operations, where $ω$ is the exponent of matrix multiplication and $c(2.373)\approx 0.523$ for the best known value of $ω$. For generic Toeplitz+Hankel-like matrices a second algorithm computes the characteristic polynomial in $\tilde O(n^{2-1/ω})$ operations when the displacement rank is considered constant. Previous algorithms required $O(n^2)$ operations while the exponents presented here are respectively less than $1.86$ and $1.58$ with the best known estimate for $ω$.
Jean-Guillaume Dumas, Aude Maignan, Clément Pernet, Daniel S. Roche
Proofs of Retrievability are protocols which allow a Client to store data remotely and to efficiently ensure, via audits, that the entirety of that data is still intact. Dynamic Proofs of Retrievability (DPoR) also support efficient retrieval and update of any small portion of the data.We propose a novel protocol for arbitrary outsourced data storage that achieves both low remote storage size and audit complexity.A key ingredient, that can be also of intrinsic interest, reduces to efficiently evaluating a secret polynomial at given public points, when the (encrypted) polynomial is stored on an untrusted Server.The Server performs the evaluations and also returns associated certificates. A Client can check that the evaluations are correct using the certificates and some pre-computed keys, more efficiently than re-evaluating the polynomial.Our protocols support two important features: the polynomial itself can be encrypted on the Server, and it can be dynamically updated by changing individual coefficients cheaply without redoing the entire setup.Our methods rely on linearly homomorphic encryption and pairings, and our implementation shows good performance for polynomial evaluations with millions of coefficients, and efficient DPoR with terabytes of data.For instance, for a 1TB database, compared to the state of art, we can reduce the Client storage by 5000x, communication size by 20x, and client-side audit time by 2x, at the cost of one order of magnitude increase in server-side audit time.