cs.DS — arXiv2

Showing 1–20 of 28,508 results

Characterizing Streaming Decidability of CSPs via Non-Redundancy

Amatya Sharma, Santhoshini Velusamy

Apr 23, 2026·cs.DS·PDF

We study the single-pass streaming complexity of deciding satisfiability of Constraint Satisfaction Problems (CSPs). A CSP is specified by a constraint language $Γ$, that is, a finite set of $k$-ary relations over the domain $[q] = \{0, \dots, q-1\}$. An instance of $\mathsf{CSP}(Γ)$ consists of $m$ constraints over $n$ variables $x_1, \ldots, x_n$ taking values in $[q]$. Each constraint $C_i$ is of the form $\{R_i,(x_{i_1} + λ_{i_1}, \ldots, x_{i_k} + λ_{i_k})\}$, where $R_i \in Γ$ and $λ_{i_1}, \ldots, λ_{i_k} \in [q]$ are constants; it is satisfied if and only if $(x_{i_1} + λ_{i_1}, \ldots, x_{i_k} + λ_{i_k}) \in R_i$, where addition is modulo $q$. In the streaming model, constraints arrive one by one, and the goal is to determine, using minimum memory, whether there exists an assignment satisfying all constraints. For $k$-SAT, Vu (TCS 2024) proves an optimal $Ω(n^k)$ space lower bound, while for general CSPs, Chou, Golovnev, Sudan, and Velusamy (JACM 2024) establish an $Ω(n)$ lower bound; a complete characterization has remained open. We close this gap by showing that the single-pass streaming space complexity of $\mathsf{CSP}(Γ)$ is precisely governed by its non-redundancy, a structural parameter introduced by Bessiere, Carbonnel, and Katsirelos (AAAI 2020). The non-redundancy $\mathsf{NRD}_n(Γ)$ is the maximum number of constraints over $n$ variables such that every constraint $C$ is non-redundant, i.e., there exists an assignment satisfying all constraints except $C$. We prove that the single-pass streaming complexity of $\mathsf{CSP}(Γ)$ is characterized, up to a logarithmic factor, by $\mathsf{NRD}_n(Γ)$.
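The definition of non-redundancy above can be checked by exhaustive search on tiny instances. The following is a minimal sketch, not the authors' method: the tuple encoding `(relation, variables, shifts)` and the function names are assumptions made for illustration, and the search is exponential in $n$.

```python
from itertools import product

def satisfies(constraint, assignment, q):
    """Check one constraint under an assignment.

    A constraint is a hypothetical triple (relation, variables, shifts):
    relation is a set of k-tuples over [q], variables are indices into the
    assignment, and shifts are the constants added modulo q.
    """
    relation, variables, shifts = constraint
    shifted = tuple((assignment[v] + s) % q for v, s in zip(variables, shifts))
    return shifted in relation

def is_non_redundant(constraints, n, q):
    """Return True if every constraint C has a witness assignment that
    violates C while satisfying all other constraints."""
    for i, target in enumerate(constraints):
        witness_found = False
        for assignment in product(range(q), repeat=n):
            if satisfies(target, assignment, q):
                continue  # a witness must violate the target constraint
            if all(satisfies(c, assignment, q)
                   for j, c in enumerate(constraints) if j != i):
                witness_found = True
                break
        if not witness_found:
            return False
    return True

# Binary OR over the domain {0, 1}: every pair except (0, 0).
OR2 = {(0, 1), (1, 0), (1, 1)}
c1 = (OR2, (0, 1), (0, 0))  # x0 OR x1
c2 = (OR2, (0, 1), (1, 0))  # (x0 + 1) OR x1, i.e. NOT x0 OR x1
```

With these two constraints the instance is non-redundant, while repeating `c1` makes it redundant, since no assignment can violate one copy and satisfy the other.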

A simple $(2+ε)$-approximation for knapsack interdiction

Noah Weninger

Apr 23, 2026·cs.DS·PDF

In the knapsack interdiction problem, there are $n$ items, each with a non-negative profit, interdiction cost, and packing weight. There is also an interdiction budget and a capacity. The objective is to select a set of items to interdict (delete) subject to the budget which minimizes the maximum profit attainable by packing the remaining items subject to the capacity. We present a $(2+ε)$-approximation running in $O(n^3ε^{-1}\log(ε^{-1}\log\sum_i p_i))$ time. Although a polynomial-time approximation scheme (PTAS) is already known for this problem, our algorithm is considerably simpler and faster. The approach also generalizes naturally to a $(1+t+ε)$-approximation for $t$-dimensional knapsack interdiction with running time $O(n^{t+2}ε^{-1}\log(ε^{-1}\log\sum_i p_i))$.
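To make the min-max objective concrete, here is a brute-force reference for very small instances. It is only a sketch under stated assumptions (integer weights and capacity, hypothetical function names) and is unrelated to the approximation algorithm of the paper: it enumerates every interdiction set within the budget and solves the follower's knapsack exactly.

```python
from itertools import combinations

def max_knapsack_profit(items, capacity):
    """Standard 0/1 knapsack DP; items are (profit, weight) pairs with
    non-negative integer weights."""
    best = [0] * (capacity + 1)
    for profit, weight in items:
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + profit)
    return best[capacity]

def interdict_brute_force(profits, costs, weights, budget, capacity):
    """Try every interdiction set whose cost fits the budget and return
    (smallest attainable follower profit, one interdiction set achieving it)."""
    n = len(profits)
    best_value, best_set = None, None
    for r in range(n + 1):
        for removed in combinations(range(n), r):
            if sum(costs[i] for i in removed) > budget:
                continue
            remaining = [(profits[i], weights[i])
                         for i in range(n) if i not in removed]
            value = max_knapsack_profit(remaining, capacity)
            if best_value is None or value < best_value:
                best_value, best_set = value, set(removed)
    return best_value, best_set

value, removed = interdict_brute_force(
    profits=[10, 6, 5], costs=[2, 1, 1], weights=[4, 3, 2],
    budget=2, capacity=5,
)
```

In this toy instance, deleting nothing lets the packer reach profit 11 (items 1 and 2), while interdicting item 1 alone reduces the best packing to profit 10.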

Sampling from the Hardcore Model on Random Regular Bipartite Graphs above the Uniqueness Threshold

Nicholas Kocurek, Shayan Oveis Gharan, Dante Tjowasi

Apr 23, 2026·cs.DS·PDF

We design an efficient sampling algorithm to generate samples from the hardcore model on random regular bipartite graphs as long as $λ \lesssim \frac{1}{\sqrt{Δ}}$, where $Δ$ is the degree. Combined with recent work of Jenssen, Keevash and Perkins, this implies an FPRAS for the partition function of the hardcore model on random regular bipartite graphs at any fugacity. Our algorithm is obtained by analyzing two new Markov chains that work in complementary regimes. Our proof then proceeds by showing the corresponding simplicial complexes are top-link spectral expanders and appealing to the trickle-down theorem to prove fast mixing.

Kernelization Bounds for Constrained Coloring

Ishay Haviv

Apr 23, 2026·cs.CC·PDF

We study the kernel complexity of constraint satisfaction problems over a finite domain, parameterized by the number of variables, whose constraint language consists of two relations: the non-equality relation and an additional permutation-invariant relation $R$. We establish a conditional lower bound on the kernel size in terms of the largest arity of an OR relation definable from $R$. Building on this, we investigate the kernel complexity of uniformly rainbow free coloring problems. In these problems, for fixed positive integers $d$, $\ell$, and $q \geq d$, we are given a graph $G$ on $n$ vertices and a collection $\cal F$ of $\ell$-tuples of $d$-subsets of its vertex set, and the goal is to decide whether there exists a proper coloring of $G$ with $q$ colors such that no $\ell$-tuple in $\cal F$ is uniformly rainbow, that is, no tuple has all its sets colored with the same $d$ distinct colors. We determine, for all admissible values of $d$, $\ell$, and $q$, the infimum over all values $η$ for which the problem admits a kernel of size $O(n^η)$, under the assumption $\mathsf{NP} \nsubseteq \mathsf{coNP/poly}$. As applications, we obtain nearly tight bounds on the kernel complexity of various coloring problems under diverse settings and parameterizations. This includes graph coloring problems parameterized by the vertex-deletion distance to a disjoint union of cliques, resolving a question of Schalken (2020), as well as uniform hypergraph coloring problems parameterized by the number of vertices, extending results of Jansen and Pieterse (2019) and Beukers (2021).

Efficient generation of expected-degree graphs via edge-arrivals

Gianlorenzo D'Angelo, Riccardo Michielan

Apr 23, 2026·cs.DS·PDF

We study the efficient generation of random graphs with a prescribed expected degree sequence, focusing on rank-1 inhomogeneous models in which vertices are assigned weights and edges are drawn independently with probabilities proportional to the product of endpoint weights. We adopt a temporal viewpoint, adding edges to the graph one at a time up to a fixed time horizon, and allowing for self-loops or duplicate edges in the first stage. Then, the simple projection of the resulting multigraph recovers exactly the simple Norros--Reittu random graph, whose expected degrees match the prescribed targets under mild conditions. Building on this representation, we develop an exact generator based on \textit{edge-arrivals} for expected-degree random graphs with running time $O(n+m)$, where $m$ is the number of generated edges, and hence proportional to the output size. This removes the vertex sorting step typically used by widely used fast generators based on \textit{edge-skipping} for rank-1 expected-degree models, a step that leads to a total running time of $O(n \log n + m)$ for those generators. In addition, our algorithm is simpler than those in the literature, easy to implement, and very flexible, which opens the way to extensions to directed and temporal random graphs, generalization to higher-order structures, and improvements through parallelization.
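The two-stage idea described above (edges arrive one at a time, then the multigraph is projected to a simple graph) can be sketched as follows. This is an illustration under assumptions, not the paper's generator: the number of arrivals is passed in as a hypothetical parameter `num_arrivals` rather than derived from a time horizon, endpoints are drawn independently with probability proportional to the vertex weights, and `random.choices` is used for weighted sampling, which does not give the $O(n+m)$ running time claimed for the exact algorithm.

```python
import random

def edge_arrival_sample(weights, num_arrivals, seed=None):
    """Draw num_arrivals edges with endpoints chosen proportionally to
    weights, then keep the simple projection of the resulting multigraph."""
    rng = random.Random(seed)
    vertices = list(range(len(weights)))
    # Stage 1: each arrival picks two endpoints independently; self-loops
    # and duplicate edges are allowed at this point.
    endpoints = rng.choices(vertices, weights=weights, k=2 * num_arrivals)
    # Stage 2: simple projection, dropping self-loops and merging duplicates.
    simple_edges = set()
    for u, v in zip(endpoints[0::2], endpoints[1::2]):
        if u != v:
            simple_edges.add((min(u, v), max(u, v)))
    return simple_edges

edges = edge_arrival_sample([1.0, 2.0, 3.0, 4.0], num_arrivals=50, seed=1)
```

Storing each edge as an ordered pair `(min, max)` in a set is what performs the projection: repeated arrivals of the same pair collapse to one edge.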

Graph Neural Network-Informed Predictive Flows for Faster Ford-Fulkerson and PAC-Learnability

Eleanor Wiesler, Trace Baxley

Apr 23, 2026·cs.LG·PDF

We propose a learning-augmented framework for accelerating max-flow computation and image segmentation by integrating Graph Neural Networks (GNNs) with the Ford-Fulkerson algorithm. Rather than predicting initial flows, our method learns edge importance probabilities to guide augmenting path selection. We introduce a Message Passing GNN (MPGNN) that jointly learns node and edge embeddings through coupled updates, capturing both global structure and local flow dynamics such as residual capacity and bottlenecks. Given an input image, we propose a method to construct a grid-based flow network with source and sink nodes, extract features, and perform a single GNN inference to assign edge probabilities reflecting their likelihood of belonging to high-capacity cuts. These probabilities are stored in a priority queue and used to guide a modified Ford-Fulkerson procedure, prioritizing augmenting paths via an Edmonds-Karp-style search with bottleneck-aware tie-breaking. This avoids repeated inference over residual graphs while leveraging learned structure throughout optimization. We further introduce a bidirectional path construction strategy centered on high-probability edges and provide a theoretical framework relating prediction quality to efficiency via a weighted permutation distance metric. Our method preserves max-flow/min-cut optimality while reducing the number of augmentations in practice. We also outline a hybrid extension combining flow warm-starting with edge-priority prediction, establishing a foundation for learning-guided combinatorial optimization in image segmentation.

On Time-Memory Tradeoffs for Maximal Palindromes with Wildcards and $k$-Mismatches

Amihood Amir, Ayelet Butman, Michael Itzhaki, Dina Sokol

Apr 22, 2026·cs.DS·PDF

This paper addresses the problem of identifying palindromic factors in texts that include wildcards -- special characters that match all others. These symbols challenge many classical algorithms, as numerous combinatorial properties are not satisfied in their presence. We apply existing wildcard-LCE techniques to obtain a continuous time-memory tradeoff, and present the first non-trivial linear-space algorithm for computing all maximal palindromes with wildcards, improving the best known time-memory product in certain parameter ranges. Our main results are algorithms to find and approximate all maximal palindromes in a given text. We also generalize both methods to the $k$-mismatches setting, with or without wildcards.

A rigorous quasipolynomial-time classical algorithm for SYK thermal expectations

Alexander Zlokapa

Apr 22, 2026·quant-ph·PDF

Estimating local observables in Gibbs states is a central problem in quantum simulation. While this task is BQP-complete at asymptotically low temperatures, the possibility of quantum advantage at constant temperature remains open. The Sachdev-Ye-Kitaev (SYK) model is a natural candidate: at any constant temperature, its Gibbs states have polynomial quantum circuit complexity and are not described by Gaussian states. Rigorous analyses of the SYK model are difficult due to the failure of known techniques using random matrix theory, cluster expansions, and rigorous formulations of the quantum path integral and replica trick. Despite this, we give a rigorous proof of a quasipolynomial-time classical algorithm that estimates SYK local thermal expectations at sufficiently high constant temperature. Our result introduces a new Wick-pair cluster expansion that we expect to be broadly useful for disordered quantum many-body systems.

Dynamic Construction of the Lovász Local Lemma

Bernhard Haeupler, Slobodan Mitrović, Srikkanth Ramachandran, Wen-Horng Sheu, Robert Tarjan

Apr 22, 2026·cs.DS·PDF

This paper proves that a wide class of local search algorithms extend as is to the fully dynamic setting with an adaptive adversary, achieving an amortized $\tilde{O}(1)$ number of local-search steps per update. A breakthrough by Moser (2009) introduced the witness-tree and entropy compression techniques for analyzing local resampling processes for the Lovász Local Lemma. These methods have since been generalized and expanded to analyze a wide variety of local search algorithms that can efficiently find solutions to many important local constraint satisfaction problems. These algorithms either extend a partial valid assignment and backtrack by unassigning variables when constraints become violated, or they iteratively fix violated constraints by resampling their variables. These local resampling or backtracking procedures are incredibly flexible, practical, and simple to specify and implement. Yet, they can be shown to be extremely efficient on static instances, typically performing only a (sub)linear number of fixing steps. The main technical challenge lies in proving conditions that guarantee such rapid convergence. This paper extends these convergence results to fully dynamic settings, where an adaptive adversary may add or remove constraints. We prove that applying the same simple local search procedures to fix old or newly introduced violations leads to a total number of resampling steps near-linear in the number of adversarial updates. Our result is very general and yields several immediate corollaries. For example, letting $Δ$ denote the maximum degree, for a constant $ε$ and $Δ= \text{poly}(\log n)$, we can maintain a $(1+ε) Δ$-edge coloring in $\text{poly}(\log n)$ amortized update time against an adaptive adversary. Prior work for this regime has running time exponential in $\sqrt{\log n}$ [Christiansen, SODA '26].

Formal Primal-Dual Algorithm Analysis

Mohammad Abdulaziz, Thomas Ammer

Apr 22, 2026·cs.LO·PDF

We present an ongoing effort to build a framework and a library in Isabelle/HOL for formalising primal-dual arguments for the analysis of algorithms. We discuss a number of example formalisations from the theory of matching algorithms, covering classical algorithms like the Hungarian Method, widely considered the first primal-dual algorithm, and modern algorithms like the Adwords algorithm, which models the assignment of search queries to advertisers in the context of search engines.

Designing Approximate Binary Trees for Trees

Leon Kellerhals, Mitja Krebs, André Nichterlein, Stefan Schmid

Apr 22, 2026·cs.DS·PDF

We study the following problem that is motivated by demand-aware network design: Given a tree~$G$, the task is to find a binary tree~$H$ on the same vertex set. The objective is to minimize the sum of distances in~$H$ between vertex pairs that are adjacent in~$G$. We present a linear-time factor-4 approximation for this problem.
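The objective above is easy to evaluate for a candidate tree $H$. The sketch below, with hypothetical function names and an edge-list encoding chosen for illustration, computes the sum of $H$-distances over the edges of $G$ by breadth-first search; it does not implement the factor-4 approximation and does not check that $H$ is binary.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as an
    adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def routing_cost(g_edges, h_edges):
    """Sum, over all edges (u, v) of G, of the distance between u and v in H."""
    adj = {}
    for u, v in h_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    cache = {}  # BFS results, computed once per source vertex
    cost = 0
    for u, v in g_edges:
        if u not in cache:
            cache[u] = bfs_distances(adj, u)
        cost += cache[u][v]
    return cost

# G is a star with center 0; a vertex of degree 4 cannot be kept as is
# when every vertex of H may have at most three neighbors.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
h_path = [(1, 2), (2, 0), (0, 3), (3, 4)]   # path 1-2-0-3-4
h_tree = [(0, 1), (0, 2), (0, 3), (1, 4)]   # leaf 4 re-attached below 1
```

For the star, the path layout costs 6 while re-attaching a single leaf costs 5, which illustrates how the choice of $H$ trades degree constraints against stretched edges.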

Fully Dynamic Algorithms for Coloring Triangle-Free Graphs

Sepehr Assadi, Helia Yazdanyar

Apr 22, 2026·cs.DS·PDF

A celebrated result of Johansson in graph theory states that every triangle-free graph of maximum degree $Δ$ can be properly colored with $O(Δ/\ln Δ)$ colors, improving upon the "greedy bound" of $Δ+1$ colors in general graphs. This coloring can also be found in polynomial time. We present an algorithm for maintaining an $O(Δ/\ln Δ)$ coloring of a dynamically changing triangle-free graph that undergoes edge insertions and deletions. The algorithm is randomized and on $n$-vertex graphs has amortized update time of $Δ^{o(1)}\log{n}$ per update with high probability, even against an adaptive adversary. A key to the analysis of our algorithm is an application of the entropy compression method that to our knowledge is new in the context of dynamic algorithms. This technique appears general and is likely to find other applications in dynamic problems, and may thus be of independent interest.

A weighted angle distance on strings

Grant Molnar

Apr 22, 2026·math.MG·PDF

We define a multi-scale metric $d_ρ$ on strings by aggregating angle distances between all $n$-gram count vectors with exponential weights $ρ^n$. We benchmark $d_ρ$ in DBSCAN clustering against edit and $n$-gram baselines, give a linear-time suffix-tree algorithm for evaluation, prove metric and stability properties (including robustness under tandem-repeat stutters), and characterize isometries.

Cluster Vertex Deletion on Chordal Graphs

Yixin Cao, Peng Li

Apr 22, 2026·cs.DS·PDF

We present a polynomial-time algorithm for the cluster vertex deletion problem on chordal graphs, resolving an open question posed in different contexts by Cao et al. [Theoretical Computer Science, 2018], Aprile et al. [Mathematical Programming, 2023], Chakraborty et al. [Discrete Applied Mathematics, 2024], and Hsieh et al. [Algorithmica, 2024]. We use dynamic programming over clique trees and reduce the computation of the optimal subproblem value to the minimization of a submodular set function.

Nearly Optimal Bounds for Computing Decision Tree Splits in Data Streams

Hoang Ta, Hoa T. Vu

Apr 22, 2026·cs.DS·PDF

We establish nearly optimal upper and lower bounds for approximating decision tree splits in data streams. For regression with labels in the range $\{0,1,\ldots,M\}$, we give a one-pass algorithm using $\tilde{O}(M^2/ε)$ space that outputs a split within additive $ε$ error of the optimal split, improving upon the two-pass algorithm of Pham et al. (ISIT 2025). Furthermore, we provide a matching one-pass lower bound showing that $Ω(M^2/ε)$ space is indeed necessary. For classification, we also obtain a one-pass algorithm using $\tilde{O}(1/ε)$ space for approximating the optimal Gini split, improving upon the previous $\tilde{O}(1/ε^2)$-space algorithm. We complement these results with matching space lower bounds: $Ω(1/ε)$ for Gini impurity and $Ω(1/ε)$ for misclassification (which matches the upper bound obtained by sampling). Our algorithms exploit the Lipschitz property of the loss functions and use reservoir sampling along with Count--Min sketches with range queries. Our lower bounds follow from careful reductions from the INDEX problem.

Blossom VI: A Practical Minimum Weight Perfect Matching Algorithm

Pavel Arkhipov, Vladimir Kolmogorov

Apr 22, 2026·cs.DS·PDF

We implement an algorithm for solving the minimum weight perfect matching problem. Our code significantly outperforms the current state-of-the-art Blossom V algorithm on those families of instances where Blossom V takes superlinear time. In practice, our implementation shows almost-linear runtime on every family of instances on which we have tested it. Our algorithm relies on solving maximum-cardinality unweighted matching problems during its primal phase. Following the state-of-the-art cherry blossom algorithm, we use cherry trees instead of traditional alternating trees and cherry blossoms instead of traditional blossoms. We shrink cherry blossoms rather than traditional blossoms into supernodes. This strategy allows us to deal with much shallower supernodes.

Greedy Routing in a Sequentially Grown One-Dimensional Random Graph

Alexander Ponomarenko

Apr 21, 2026·math.CO·PDF

We analyze greedy routing in a random graph G_n constructed on the vertex set V = {1, 2, ..., n} embedded in Z. Vertices are inserted according to a uniform random permutation pi, and each newly inserted vertex connects to its nearest already-inserted neighbors on the left and right (if they exist). This work addresses a conjecture originating from empirical studies (Ponomarenko et al., 2011; Malkov et al., 2012), which observed through simulations that greedy search in sequentially grown graphs exhibits logarithmic routing complexity across various dimensions. While the original claim was based on experiments and geometric intuition, a rigorous mathematical foundation remained open. Here, we formalize and resolve this conjecture for the one-dimensional case. For a greedy walk GW starting at vertex 1 targeting vertex n -- which at each step moves to the neighbor closest to n -- we prove that the number of steps S_n required to reach n satisfies S_n = Theta(log n) with high probability. Precisely, S_n = L_n + R_n - 2, where L_n and R_n are the numbers of left-to-right and right-to-left minima in the insertion-time permutation. Consequently, E[S_n] = 2H_n - 2 ~ 2 log n and P(S_n >= (2+c) log n) <= n^(-h(c/2) + o(1)) for any constant c > 0, with an analogous lower tail bound for 0 < c < 2, where h(u) = (1+u) ln(1+u) - u is the Bennett rate function. Furthermore, we establish that this logarithmic scaling is robust: for arbitrary or uniformly random start-target pairs, the expected routing complexity remains E[S_{s,t}] = 2 log n + O(1), closely mirroring decentralized routing scenarios in real-world networks where endpoints are chosen dynamically rather than fixed a priori.
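The identity S_n = L_n + R_n - 2 stated above can be checked empirically on small inputs. The following sketch simulates the construction and the greedy walk directly and compares the step count with the count of minima; the function names are hypothetical, and the code is a brute-force check rather than anything taken from the paper.

```python
import bisect
from itertools import permutations

def build_graph(insertion_order):
    """Insert vertices in the given order; each new vertex is linked to its
    nearest already-inserted neighbors on the left and right, if any."""
    inserted = []  # sorted list of vertices inserted so far
    adj = {v: set() for v in insertion_order}
    for v in insertion_order:
        pos = bisect.bisect_left(inserted, v)
        if pos > 0:
            adj[v].add(inserted[pos - 1])
            adj[inserted[pos - 1]].add(v)
        if pos < len(inserted):
            adj[v].add(inserted[pos])
            adj[inserted[pos]].add(v)
        inserted.insert(pos, v)
    return adj

def greedy_steps(adj, n):
    """Walk from vertex 1 to vertex n, always moving to the neighbor
    closest to n (the largest label, since no label exceeds n)."""
    current, steps = 1, 0
    while current != n:
        current = max(adj[current])
        steps += 1
    return steps

def count_prefix_minima(seq):
    """Number of entries strictly smaller than everything before them."""
    count, best = 0, float("inf")
    for x in seq:
        if x < best:
            best, count = x, count + 1
    return count

def predicted_steps(insertion_order):
    """L_n + R_n - 2, computed on the sequence of insertion times
    indexed by vertex position."""
    n = len(insertion_order)
    time = [0] * (n + 1)
    for t, v in enumerate(insertion_order, start=1):
        time[v] = t
    times = time[1:]
    return (count_prefix_minima(times)
            + count_prefix_minima(reversed(times)) - 2)

def identity_holds(max_n):
    """Exhaustively compare simulation and formula for all n <= max_n."""
    return all(
        greedy_steps(build_graph(order), n) == predicted_steps(order)
        for n in range(1, max_n + 1)
        for order in permutations(range(1, n + 1))
    )
```

For example, inserting the vertices in the order 1, 3, 2 links 1 directly to 3, so the walk takes a single step, matching L_n + R_n - 2 = 1 + 2 - 2.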

Suffix Random Access via Function Inversion: A Key for Asymmetric Streaming String Algorithms

Panagiotis Charalampopoulos, Taha El Ghazi, Jonas Ellert, Paweł Gawrychowski, Tatiana Starikovskaya

Apr 21, 2026·cs.DS·PDF

Many string processing problems can be phrased in the streaming setting, where the input arrives symbol by symbol and we have sublinear working space. The area of streaming algorithms for string processing has flourished since the seminal work of Porat and Porat [FOCS 2009]. Unfortunately, problems with efficient solutions in the classical setting often do not admit efficient solutions in the streaming setting. As a bridge between these two settings, Saks and Seshadhri [SODA 2013] introduced the asymmetric streaming model. Here, one is given read-only access to a (typically short) reference string $R$ of length $m$, while a text $T$ arrives as a stream. We provide a generic technique to reduce fundamental string problems in the asymmetric streaming model to the online read-only model, lifting several existing algorithms and generally improving upon the state of the art. Most notably, we obtain asymmetric streaming algorithms for exact and approximate pattern matching (under both the Hamming and edit distances), and for relative Lempel-Ziv compression. At the heart of our approach lies a novel tool that facilitates efficient computation in the asymmetric streaming model: the suffix random access data structure. In its simplest variant, it maintains constant-time random access to the longest suffix of (the seen prefix of) $T$ that occurs in $R$. We show a bidirectional reduction between suffix random access and function inversion, a central problem in cryptography. On the way to our upper bound, we propose a variant of the string synchronizing sets ([Kempa and Kociumaka; STOC 2019]) with a local sparsity condition that, as we show, admits an efficient streaming construction algorithm. We believe that our framework and techniques will find broad applications in the development of small-space string algorithms.

Effective Traveling for Metric Instances of the Traveling Thief Problem

Jan Eube, Kelin Luo, Aneta Neumann, Frank Neumann, Heiko Röglin

Apr 21, 2026·cs.DS·PDF

The Traveling Thief Problem (TTP) is a multi-component optimization problem that captures the interplay between routing and packing decisions by combining the classical Traveling Salesperson Problem (TSP) and the Knapsack Problem (KP). The TTP has gained significant attention in the evolutionary computation literature and a wide range of approaches have been developed over the last 10 years. Judging the performance of these algorithms, in particular in terms of how close they get to optimal solutions, is a very challenging task, as effective exact methods are not available due to the highly challenging traveling component. In this paper, we study the tour-optimization component of TTP under a fixed packing plan. We formulate this task as a weighted variant of the TSP, where travel costs depend on the cumulative weight of collected items, and investigate how different distance metrics and cost functions affect computational complexity. We present an $O(n^2)$-time dynamic programming algorithm for the path metric with general cost functions, prove that the problem is NP-hard even on a star metric, and develop constant-factor approximation algorithms for star metrics. Finally, we also develop an approximation algorithm for the problem under a general metric with a linear cost function. We complement our theoretical results with experimental evaluations on standard TTP instances adjusted to a path metric. Our experimental results demonstrate the practical effectiveness of our approaches by comparing them to solutions produced by popular iterative search algorithms. The results show that our methods are able to significantly improve the quality of solutions for some benchmark instances by optimizing the traveling part, while confirming the optimality of the travel component for other solutions obtained by iterative search methods.

Moderately beyond clique-width: reduced component max-leaf and related parameters

Édouard Bonnet, Yeonsu Chang, Julien Duron, Colin Geniet, O-joung Kwon

Apr 21, 2026·cs.DS·PDF

Reduced parameters [BKW, JCTB '26; BKRT, SODA '22] are defined via contraction sequences. Based on this framework, we introduce the reduced component max-leaf, denoted by $\operatorname{cml}^\downarrow$, where component max-leaf is the maximum number of leaves in any spanning tree of any connected component. Reduced component max-leaf is strictly sandwiched between clique-width and reduced bandwidth; it is bounded in unit interval graphs and unbounded in planar graphs. We design polynomial-time algorithms for problems such as \textsc{Maximum Induced $d$-Regular Subgraph} and \textsc{Induced Disjoint Paths} in graphs given with a contraction sequence witnessing low $\operatorname{cml}^\downarrow$, unifying and extending tractability results for classes of bounded clique-width and unit interval graphs. We get the following collapses in sparse classes of bounded $\operatorname{cml}^\downarrow$: bounded maximum degree implies bounded treewidth, whereas $K_{t,t}$-subgraph-freeness implies strongly sublinear treewidth; we show the latter, more generally, for classes of bounded reduced cutwidth. We establish the former result by showing that graphs with bounded $\operatorname{cml}^\downarrow$ admit balanced separators dominated by a bounded number of vertices. We then showcase an application of the reduced parameters to establishing non-transducibility results. We prove that for most reduced parameters $p^\downarrow$ (including reduced bandwidth), the family of classes of bounded $p^\downarrow$ is closed under first-order transductions. We then answer a question of [BKW '26] by showing that the 3-dimensional grids have unbounded reduced bandwidth. As the class of planar graphs (or any class of bounded genus) has bounded reduced bandwidth [BKW '26], this reproves a recent result [GPP, LICS '25] that planar graphs do not first-order transduce the 3-dimensional grids.
