Jean-Guillaume Dumas, Erich Kaltofen
Certificates to a linear algebra computation are additional data structures for each output, which can be used by a (possibly randomized) verification algorithm that proves the correctness of each output. The certificates are essentially optimal if the time (and space) complexity of verification is essentially linear in the input size $N$, meaning $N$ times a factor $N^{o(1)}$, i.e., a factor $N^{η(N)}$ with $\lim_{N\to \infty} η(N) = 0$. We give algorithms that compute essentially optimal certificates for the positive semidefiniteness, Frobenius form, characteristic and minimal polynomial of an $n\times n$ dense integer matrix $A$. Our certificates can be verified in Monte-Carlo bit complexity $(n^2 \log\|A\|)^{1+o(1)}$, where $\log\|A\|$ is the bit size of the integer entries, solving an open problem in [Kaltofen, Nehring, Saunders, Proc. ISSAC 2011] subject to computational hardness assumptions. Second, we give algorithms that compute certificates for the rank of sparse or structured $n\times n$ matrices over an abstract field, whose Monte Carlo verification complexity is $2$ matrix-times-vector products $+$ $n^{1+o(1)}$ arithmetic operations in the field. For example, if the $n\times n$ input matrix is sparse with $n^{1+o(1)}$ non-zero entries, our rank certificate can be verified in $n^{1+o(1)}$ field operations. This extends also to integer matrices with only an extra $\|A\|^{1+o(1)}$ factor. All our certificates are based on interactive verification protocols with the interaction removed by a Fiat-Shamir identification heuristic. The validity of our verification procedure is subject to standard computational hardness assumptions from cryptography.
Jean-Guillaume Dumas, Dominique Duval, Laurent Fousse, Jean-Claude Reynaud
In this short note we study the semantics of two basic computational effects, exceptions and states, from a new point of view. In the handling of exceptions we dissociate the control from the elementary operation which recovers from the exception. In this way it becomes apparent that there is a duality, in the categorical sense, between exceptions and states.
Jean-Guillaume Dumas, Jean-Baptiste Orfila
Specific vectorial boolean functions, such as S-Boxes or APN functions, have many applications, for instance in symmetric ciphers. In cryptography they must satisfy some criteria (balancedness, high nonlinearity, high algebraic degree, avalanche, or transparency) to provide the best possible resistance against attacks. Functions satisfying most criteria are however difficult to find. Indeed, random generation does not work and the S-Boxes used in the AES or Camellia ciphers are actually variations around a single function, the inverse function in F_{2^n}. Should the latter function have an unforeseen weakness (for instance if more practical algebraic attacks are developed), it would be desirable to have some replacement candidates. For that matter, we propose to weaken a little bit the algebraic part of the design of S-Boxes and use finite semifields instead of finite fields to build such S-Boxes. Since it is not even known how many semifields there are of order 256, we propose to build S-Boxes and APN functions via semifield pseudo-extensions of the form S_{2^4}^2, where S_{2^4} is any semifield of order 16. Then, we mimic in this structure the use of functions applied on a finite field, such as the inverse or the cube. We report here the construction of 12781 non-equivalent S-Boxes with maximal nonlinearity, differential invariants, degrees and bit interdependency, and 2684 APN functions.
Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, Ziad Sultan
We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay these reductions as much as possible while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using the Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between block sizes well suited to those fast variants and block sizes well suited to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, the block size then has to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.
Jean-Guillaume Dumas, David Lucas, Clément Pernet
In this paper, we give novel certificates for triangular equivalence and rank profiles. These certificates make it possible to verify the row or column rank profiles or the whole rank profile matrix faster than recomputing them, with a negligible overall overhead. We first provide quadratic time and space non-interactive certificates saving the logarithmic factors of previously known ones. Then we propose interactive certificates for the same problems whose Monte Carlo verification complexity requires a small constant number of matrix-vector multiplications, a linear space, and a linear number of extra field operations. As an application we also give an interactive protocol, certifying the determinant of dense matrices, faster than the best previously known one.
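To illustrate what a Monte Carlo verification based on a few matrix-vector products looks like, here is a minimal sketch of the classical Freivalds check that a claimed factorization $A = LU$ holds over $\mathbb{Z}/p\mathbb{Z}$. It is not the certificate of the abstract above: the function names, the dense list-of-lists representation and the prime $p$ are assumptions made for the example. If $A \neq LU$, each round accepts wrongly with probability at most $1/p$.

```python
import random

def matvec_mod(M, v, p):
    """Matrix-vector product over Z/pZ (hypothetical helper, dense rows)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def probably_equal_product(A, L, U, p, rounds=1, rng=random):
    """Freivalds-style check that A == L*U mod p.

    Each round costs three matrix-vector products and never forms L*U.
    A True answer is only probabilistic; a False answer is always correct.
    """
    n = len(A[0])
    for _ in range(rounds):
        v = [rng.randrange(p) for _ in range(n)]
        if matvec_mod(L, matvec_mod(U, v, p), p) != matvec_mod(A, v, p):
            return False
    return True
```

The verifier never recomputes the factorization: it only compares $L(Uv)$ with $Av$ for random $v$, which is the general pattern behind verification that is cheaper than recomputation.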
Jean-Guillaume Dumas, Aude Maignan, Luiza Soezima
Private Set Multi-Party Computations are protocols that allow parties to jointly and securely compute functions: apart from what is deducible from the output of the function, the input sets are kept private. Then, a Private Set Union (PSU), resp. Intersection (PSI), is a protocol that allows parties to jointly compute the union, resp. the intersection, between their private sets. A structured PSI is a PSI where some structure of the sets can allow for more efficient protocols. For instance in Fuzzy PSI, elements only need to be close enough, instead of equal, to be part of the intersection. In this paper we present Fuzzy PSU (FPSU) protocols, able to efficiently take into account approximations in the union. For this, we introduce a new efficient sub-protocol, called Oblivious Key Homomorphic Encryption Retrieval (OKHER), improving on Oblivious Key-Value Retrieval (OKVR) techniques in our setting. In the fuzzy context, the receiver set $X=\{x_i\}_{1..n}$ is replaced by ${\mathcal B}_δ(X)$, the union of $n$ balls of dimension $d$ with radius $δ$, centered at the $x_i$. The sender set $Y$ is just its $m$ points of dimension $d$. Then the FPSU functionality corresponds to $X \sqcup \{y \in Y, y \notin {\mathcal B}_δ(X)\}$. Thus, we formally define the FPSU functionality and security properties, and propose several protocols tuned to the patterns of the balls using the $l_\infty$ distance. Using our OKHER routine and homomorphic encryption, we are for instance able to obtain FPSU protocols with an asymptotic communication volume bound ranging from $O(dm\log(δ{n}))$ to $O(d^2m\log(δ^2n))$, depending on the receiver data set structure.
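The FPSU functionality $X \sqcup \{y \in Y, y \notin {\mathcal B}_δ(X)\}$ can be written down in the clear, without any privacy, as a reference for what a protocol should output. The sketch below is only such a plaintext specification: the function names are hypothetical, points are tuples of integers, and the balls are assumed to be closed (distance at most $δ$ in the $l_\infty$ metric), which the abstract does not state explicitly.

```python
def linf_distance(u, v):
    """l_infinity distance between two points of the same dimension."""
    return max(abs(a - b) for a, b in zip(u, v))

def fuzzy_union(X, Y, delta):
    """Plaintext FPSU functionality: X plus the points of Y outside every ball.

    Assumption: a point y is in the ball around x when
    linf_distance(x, y) <= delta. No cryptography is involved here.
    """
    far = [y for y in Y if all(linf_distance(x, y) > delta for x in X)]
    return list(X) + far

receiver = [(0, 0), (10, 10)]
sender = [(1, 1), (5, 5), (10, 12)]
print(fuzzy_union(receiver, sender, delta=2))
```

In this toy run, only the sender point $(5, 5)$ lies outside both balls of radius $2$, so it is the only point added to the receiver set.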
Brice Boyer, Jean-Guillaume Dumas, Pascal Giorgi
We propose different implementations of the sparse matrix--dense vector multiplication (SpMV) for finite fields and rings $\mathbb{Z}/m\mathbb{Z}$. We take advantage of graphic card processors (GPU) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and henceforth the speed of its black box algorithms. Besides, we use this and a new parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank implementation over finite fields.
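As a point of reference for the operation being optimized, here is a minimal sequential sketch of a sparse matrix times dense vector product over $\mathbb{Z}/m\mathbb{Z}$. The compressed sparse row (CSR) layout, the function name and the single reduction per row are assumptions chosen for the example; they are not taken from the library discussed above, which targets GPUs and multi-core machines.

```python
def spmv_mod(values, col_idx, row_ptr, x, m):
    """y = A*x over Z/mZ for A stored in CSR form (hypothetical helper).

    values[k] is the k-th stored entry, col_idx[k] its column, and
    row_ptr[i]:row_ptr[i+1] delimits the entries of row i.
    The reduction modulo m is applied once per row.
    """
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc % m)
    return y

# A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] stored in CSR form
values, col_idx, row_ptr = [1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5]
print(spmv_mod(values, col_idx, row_ptr, [1, 2, 3], 7))
```

Black box algorithms such as Wiedemann's only access the matrix through this product, which is why its speed dominates their running time.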
Grégory Nuel, Jean-Guillaume Dumas
We present two novel approaches for the computation of the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem and are two-part algorithms. The first approach relies on a partial recursion after a fast computation of the second largest eigenvalue of the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate the interest of both approaches on a simple toy example and two biological applications: the transcription factors of the Human Chromosome 5 and the PROSITE signatures of functional motifs in proteins. On these examples our methods demonstrate their complementarity and their ability to extend the domain of feasibility for exact computations in pattern problems to a new level.
Jean-Guillaume Dumas, Dominique Duval, Laurent Fousse, Jean-Claude Reynaud
We define a proof system for exceptions which is close to the syntax for exceptions, in the sense that the exceptions do not appear explicitly in the type of any expression. This proof system is sound with respect to the intended denotational semantics of exceptions. With this inference system we prove several properties of exceptions.
Jannik Dreier, Jean-Guillaume Dumas, Pascal Lafourcade
Auctions have a long history, having been recorded as early as 500 B.C. Nowadays, electronic auctions have been a great success and are increasingly used. Many cryptographic protocols have been proposed to address the various security requirements of these electronic transactions, in particular to ensure privacy. Brandt developed a protocol that computes the winner using homomorphic operations on a distributed ElGamal encryption of the bids. He claimed that it ensures full privacy of the bidders, i.e. no information apart from the winner and the winning price is leaked. We first show that this protocol -- when using malleable interactive zero-knowledge proofs -- is vulnerable to attacks by dishonest bidders. Such bidders can manipulate the publicly available data in a way that allows the seller to deduce all participants' bids. Additionally we discuss some issues with verifiability as well as attacks on non-repudiation, fairness and the privacy of individual bidders exploiting authentication problems.
Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, B. David Saunders
To maximize efficiency in time and space, allocations and deallocations, in the exact linear algebra library LinBox, must always occur in the founding scope. This provides a simple lightweight allocation model. We present this model and its usage for the rebinding of matrices between different coefficient domains. We also present automatic tools to speed up the compilation of template libraries and a software abstraction layer for the introduction of transparent parallelism at the algorithmic level.
Brice Boyer, Jean-Guillaume Dumas, Pascal Giorgi, Clément Pernet, B. David Saunders
We describe in this paper new design techniques used in the C++ exact linear algebra library LinBox, intended to make the library safer and easier to use, while keeping it generic and efficient. First, we review the new simplified structure for containers, based on our \emph{founding scope allocation} model. We explain design choices and their impact on coding: unification of our matrix classes, clearer model for matrices and submatrices, etc. Then we present a variation of the \emph{strategy} design pattern that consists of a controller--plugin system: the controller (solution) chooses among plug-ins (algorithms) that always call back the controllers for subtasks. We give examples using the solution mul. Finally we present a benchmark architecture that serves two purposes: providing the user with easier ways to produce graphs; creating a framework for automatically tuning the library and supporting regression testing.
Jannik Dreier, Jean-Guillaume Dumas, Pascal Lafourcade, Léo Robert
In 1968, Liu described the problem of securing documents in a shared secret project. In an example, at least six out of eleven participating scientists need to be present to open the lock securing the secret documents. Shamir proposed a mathematical solution to this physical problem in 1979, by designing an efficient $k$-out-of-$n$ secret sharing scheme based on Lagrange's interpolation. Liu and Shamir also claimed that the minimal solution using physical locks is clearly impractical and exponential in the number of participants. In this paper we relax some implicit assumptions in their claim and propose an optimal physical solution to the problem of Liu that uses physical padlocks, but the number of padlocks is not greater than the number of participants. Then, we show that no device can do better for $k$-out-of-$n$ threshold padlock systems as soon as $k\geq{\sqrt{2n}}$, which holds true in particular for Liu's example. More generally, we derive bounds required to implement any threshold system and prove a lower bound of $\mathcal{O}(\log(n))$ padlocks for any threshold larger than $2$. For instance we propose an optimal scheme reaching that bound for $2$-out-of-$n$ threshold systems and requiring less than $2\log_2(n)$ padlocks. We also discuss more complex access structures, a wrapping technique, and other sublinear realizations like an algorithm to generate $3$-out-of-$n$ systems with $2.5\sqrt{n}$ padlocks. Finally we give an algorithm building $k$-out-of-$n$ threshold padlock systems with only $\mathcal{O}(\log(n)^{k-1})$ padlocks. Apart from the physical world, our results also show that it is possible to implement secret sharing over small fields.
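For readers unfamiliar with the mathematical scheme mentioned above, the following is a minimal sketch of Shamir's $k$-out-of-$n$ secret sharing via Lagrange interpolation, run on Liu's example of six out of eleven. It illustrates the 1979 scheme only, not the padlock constructions of the abstract; the function names and the choice of the prime field of size $2^{61}-1$ are assumptions made for the example, and Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field size is an assumption of this sketch

def make_shares(secret, k, n, rng=random):
    """Split secret into n shares such that any k of them recover it."""
    # Random polynomial of degree k-1 with constant term equal to the secret.
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def recover(shares):
    """Lagrange interpolation at 0 from k distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(123456789, k=6, n=11)   # Liu's example: 6 out of 11
print(recover(shares[:6]) == 123456789)
```

Any six of the eleven shares determine the degree-five polynomial, hence its constant term, while fewer than six shares leave the secret undetermined.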
Jean-Guillaume Dumas, Aude Maignan, Clément Pernet, Daniel S. Roche
Proofs of Retrievability are protocols which allow a Client to store data remotely and to efficiently ensure, via audits, that the entirety of that data is still intact. Dynamic Proofs of Retrievability (DPoR) also support efficient retrieval and update of any small portion of the data. We propose a novel protocol for arbitrary outsourced data storage that achieves both low remote storage size and audit complexity. A key ingredient, that can also be of intrinsic interest, reduces to efficiently evaluating a secret polynomial at given public points, when the (encrypted) polynomial is stored on an untrusted Server. The Server performs the evaluations and also returns associated certificates. A Client can check that the evaluations are correct using the certificates and some pre-computed keys, more efficiently than re-evaluating the polynomial. Our protocols support two important features: the polynomial itself can be encrypted on the Server, and it can be dynamically updated by changing individual coefficients cheaply without redoing the entire setup. Our methods rely on linearly homomorphic encryption and pairings, and our implementation shows good performance for polynomial evaluations with millions of coefficients, and efficient DPoR with terabytes of data. For instance, for a 1TB database, compared to the state of the art, we can reduce the Client storage by 5000x, communication size by 20x, and client-side audit time by 2x, at the cost of one order of magnitude increase in server-side audit time.
Jean-Guillaume Dumas, Dominique Duval, Jean-Claude Reynaud
A new categorical framework is provided for dealing with multiple arguments in a programming language with effects, for example in a language with imperative features. Like related frameworks (Monads, Arrows, Freyd categories), we distinguish two kinds of functions. In addition, we also distinguish two kinds of equations. Then, we are able to define a kind of product, that generalizes the usual categorical product. This yields a powerful tool for deriving many results about languages with effects.
Jean-Guillaume Dumas
We want to achieve efficiency for the exact computation of the dot product of two vectors over word-size finite fields. We therefore compare the practical behaviors of a wide range of implementation techniques using different representations. The techniques used include floating point representations, discrete logarithms, tabulations, Montgomery reduction, and delayed modulus.
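Among the techniques listed above, the delayed modulus is the easiest to sketch: products are accumulated without reduction, and the accumulator is reduced only when the next addition could overflow the machine word. Since Python integers never overflow, the sketch below simulates a word of `word_bits` bits; the function name, that parameter and the returned reduction counter are assumptions made for illustration.

```python
def dot_mod_delayed(a, b, p, word_bits=64):
    """Dot product of a and b modulo p with delayed reductions.

    Returns (result, number_of_intermediate_reductions). The machine word
    is simulated: Python integers themselves are unbounded.
    """
    limit = (1 << word_bits) - 1      # largest value of the simulated word
    max_term = (p - 1) * (p - 1)      # largest product of two reduced entries
    if max_term > limit - (p - 1):
        raise ValueError("p is too large for the simulated word size")
    acc, reductions = 0, 0
    for x, y in zip(a, b):
        if acc > limit - max_term:    # the next addition might overflow
            acc %= p
            reductions += 1
        acc += (x % p) * (y % p)
    return acc % p, reductions

print(dot_mod_delayed([3, 5, 7], [2, 4, 6], 11))
```

With a 64-bit word and a small prime, long vectors go through with very few reductions, whereas a naive loop would pay one reduction per product.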
Jean-Guillaume Dumas, Clément Pernet, Alexandre Sedoglavic
We propose a non-commutative algorithm for multiplying 2x2 matrices using 7 coefficient products. This algorithm simultaneously reaches a better accuracy in practice compared to previously known such fast algorithms, and a time complexity bound with the best currently known leading term (obtained via alternate basis sparsification). To build this algorithm, we consider matrix and tensor norm bounds governing the stability and accuracy of numerical matrix multiplication. First, we reduce those bounds by minimizing a growth factor along the unique orbit of Strassen's 2x2-matrix multiplication tensor decomposition. Second, we develop heuristics for minimizing the number of operations required to realize a given bilinear formula, while further improving its accuracy. Third, we perform an alternate basis sparsification that improves on the time complexity constant and mostly preserves the overall accuracy.
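As background, here is a minimal sketch of the classical Strassen formulas, which multiply two 2x2 matrices with 7 products instead of 8. This is the well-known 1969 variant, not the more accurate algorithm proposed in the abstract above; the function name is hypothetical. Left and right factors are never swapped, so the formulas remain valid when the entries are themselves matrices (non-commutative case).

```python
def strassen_2x2(A, B):
    """Classical Strassen product of two 2x2 matrices with 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Applied recursively to block matrices, the 7 products give the $O(n^{\log_2 7})$ complexity; the additions and subtractions around them are what affect the numerical accuracy and the leading constant.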
Jean-Guillaume Dumas, Clément Pernet, Ziad Sultan
Gaussian elimination with full pivoting generates a PLUQ matrix decomposition. Depending on the strategy used in the search for pivots, the permutation matrices can reveal some information about the row or the column rank profiles of the matrix. We propose a new pivoting strategy that makes it possible to recover at the same time both row and column rank profiles of the input matrix and of any of its leading sub-matrices. We propose a rank-sensitive and quad-recursive algorithm that computes the latter PLUQ triangular decomposition of an $m \times n$ matrix of rank $r$ in $O(mnr^{\omega-2})$ field operations, with $\omega$ the exponent of matrix multiplication. Compared to the LEU decomposition by Malashonock, sharing a similar recursive structure, its time complexity is rank sensitive and has a lower leading constant. Over a word size finite field, this algorithm also improves the practical efficiency of previously known implementations.
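To make the notion of rank profile concrete, here is a minimal sketch that computes the row rank profile (the lexicographically smallest set of linearly independent row indices) of a matrix over $\mathbb{Z}/p\mathbb{Z}$ by plain row-by-row elimination. It is a naive cubic-time illustration, not the quad-recursive PLUQ algorithm of the abstract above; the function name is hypothetical, $p$ is assumed prime, and Python 3.8 or later is assumed for the modular inverse.

```python
def row_rank_profile(A, p):
    """Indices of the first rows of A that form a basis of its row space mod p."""
    basis = []    # (pivot column, row normalized so that its pivot is 1)
    profile = []
    for i, row in enumerate(A):
        r = [a % p for a in row]
        # Reduce against the rows kept so far, in the order they were found.
        for col, b in basis:
            if r[col]:
                f = r[col]
                r = [(x - f * y) % p for x, y in zip(r, b)]
        pivot = next((j for j, x in enumerate(r) if x), None)
        if pivot is not None:          # row i is independent of the earlier rows
            inv = pow(r[pivot], -1, p)
            basis.append((pivot, [(x * inv) % p for x in r]))
            profile.append(i)
    return profile

A = [[1, 2, 3], [2, 4, 6], [0, 1, 1], [1, 3, 4]]
print(row_rank_profile(A, 7))                           # row rank profile
print(row_rank_profile([list(c) for c in zip(*A)], 7))  # column rank profile
```

The column rank profile is obtained as the row rank profile of the transpose; the length of either profile is the rank.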
Jean-Guillaume Dumas, Anna Urbanska
We present an algorithm computing the determinant of an integer matrix A. The algorithm is introspective in the sense that it uses several distinct algorithms that run in a concurrent manner. During the course of the algorithm partial results coming from distinct methods can be combined. Then, depending on the current running time of each method, the algorithm can emphasize a particular variant. With the use of very fast modular routines for linear algebra, our implementation is an order of magnitude faster than other existing implementations. Moreover, we prove that the expected complexity of our algorithm is only O(n^3 log^{2.5}(n ||A||)) bit operations in the dense case and O(Omega n^{1.5} log^2(n ||A||) + n^{2.5}log^3(n||A||)) in the sparse case, where ||A|| is the largest entry in absolute value of the matrix and Omega is the cost of matrix-vector multiplication in the case of a sparse matrix.
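One of the standard building blocks for integer determinants is Chinese remaindering: compute the determinant modulo several primes with fast modular elimination, then reconstruct the integer once the product of the primes exceeds twice Hadamard's bound. The sketch below shows only that building block, under stated assumptions; it is not the introspective algorithm of the abstract above, the function names are hypothetical, the caller is assumed to supply distinct primes, and Python 3.8 or later is assumed.

```python
from math import isqrt, prod

def det_mod(A, p):
    """Determinant of A modulo a prime p by Gaussian elimination."""
    M = [[a % p for a in row] for row in A]
    n, det = len(M), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:                 # a row swap flips the sign
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)
        for r in range(c + 1, n):
            f = M[r][c] * inv % p
            if f:
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return det % p

def det_crt(A, primes):
    """Integer determinant from modular determinants and Hadamard's bound."""
    # Hadamard: |det A| is at most the product of the Euclidean row norms.
    bound = isqrt(prod(sum(a * a for a in row) for row in A)) + 1
    residue, modulus = 0, 1
    for p in primes:
        d = det_mod(A, p)
        t = (d - residue) * pow(modulus, -1, p) % p   # incremental CRT step
        residue += modulus * t
        modulus *= p
        if modulus > 2 * bound:
            break
    else:
        raise ValueError("not enough primes for Hadamard's bound")
    return residue if residue <= modulus // 2 else residue - modulus

print(det_crt([[2, -1, 0], [1, 3, 4], [0, 5, -6]], [101, 103, 107]))  # -82
```

The final step maps the residue to the symmetric range so that negative determinants are recovered correctly.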
Jean-Guillaume Dumas, Clément Pernet, Alexandre Sedoglavic
The quest for non-commutative matrix multiplication algorithms in small dimensions has seen many recent improvements. In particular, the number of scalar multiplications required to multiply two $4\times4$ matrices was first reduced in \cite{Fawzi:2022aa} from 49 (two recursion levels of Strassen's algorithm) to 47, but only in characteristic 2, and more recently to 48 in \cite{alphaevolve}, but over complex numbers. We propose an algorithm in 48 multiplications with only rational coefficients, hence removing the complex number requirement. It was derived from the latter one, under the action of an isotropy which happens to project the algorithm on the field of rational numbers. We also produce a straight line program of this algorithm, reducing the leading constant in the complexity, as well as an alternative basis variant of it, leading to an algorithm running in $7 n^{2+\frac{\log_2 3}{2}} +o\left(n^{2+\frac{\log_2 3}{2}}\right)$ operations over any ring containing an inverse of 2.