Xiaofeng Zhou, Guangyu Hu, Hongce Zhang, Wei Zhang
The IC3 algorithm represents the state of the art in hardware model checking, owing to its robust performance and scalability. A significant body of research has focused on enhancing the solving efficiency of IC3, with particular attention to the inductive generalization process: a critical phase in which the algorithm seeks to generalize a counterexample to inductiveness (CTI), typically a state that leads to a bad state, into a broader set of states. Inductive generalization is a primary source of clauses in IC3 and thus plays a pivotal role in determining the overall effectiveness of the algorithm. Despite its importance, existing approaches often rely on fixed inductive generalization strategies, overlooking the dynamic and context-sensitive nature of the verification environment in which spurious counterexamples arise. This rigidity can limit the quality of generated clauses and, consequently, the performance of IC3. To address this limitation, we propose a lightweight machine-learning-based framework that dynamically selects appropriate inductive generalization strategies in response to the evolving verification context. Specifically, we employ a multi-armed bandit (MAB) algorithm to adaptively choose inductive generalization strategies based on real-time feedback from the verification process. The agent is updated by evaluating the quality of generalization outcomes, thereby refining its strategy selection over time. Empirical evaluation on a benchmark suite of 914 instances, primarily drawn from the latest HWMCC collection, demonstrates the efficacy of our approach. When implemented on top of the state-of-the-art model checker rIC3, our method solves 26 to 50 more cases than the baselines and improves the PAR-2 score by 194.72 to 389.29.
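The adaptive selection loop described in this abstract can be sketched with a classic UCB1 bandit. Everything below is hypothetical: the strategy names, the reward signal (a made-up generalization-quality score), and the simulation are placeholders standing in for the paper's actual strategies and feedback, not taken from rIC3.

```python
import math
import random

# Hypothetical generalization strategies; the actual strategies used in
# the paper differ -- these names are placeholders for illustration.
STRATEGIES = ["ctg_down", "mic_basic", "xdrop"]

class UCB1:
    """Classic UCB1 bandit over a fixed set of arms (strategies)."""
    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.rewards = {a: 0.0 for a in self.arms}
        self.total = 0

    def select(self):
        # Play each arm once before applying the UCB rule.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        def ucb(a):
            mean = self.rewards[a] / self.counts[a]
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[a])
            return mean + bonus
        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward
        self.total += 1

# Toy simulation: the reward stands in for a generalization-quality
# measure (e.g. how much a clause was shrunk); the values are invented.
random.seed(0)
true_quality = {"ctg_down": 0.6, "mic_basic": 0.4, "xdrop": 0.5}
bandit = UCB1(STRATEGIES)
for _ in range(1000):
    s = bandit.select()
    reward = min(1.0, max(0.0, random.gauss(true_quality[s], 0.1)))
    bandit.update(s, reward)

best = max(STRATEGIES, key=lambda a: bandit.rewards[a] / bandit.counts[a])
print(best)  # the bandit should converge towards the best strategy
```

The same skeleton works for epsilon-greedy or Thompson sampling; only `select` changes, while the feedback loop (choose strategy, generalize, score the result, update) stays the same.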
Meghyn Bienvenu, Camille Bourgaux, Robin Jean, Giuseppe Mazzotta
We explore the use of answer set programming (ASP) and its extension with quantifiers, ASP(Q), for inconsistency-tolerant querying of prioritized data, where a priority relation between conflicting facts is exploited to define three notions of optimal repairs (Pareto-, globally- and completion-optimal). We consider the variants of three well-known semantics (AR, brave and IAR) that use these optimal repairs, and for which query answering is in the first or second level of the polynomial hierarchy for a large class of logical theories. Notably, this paper presents the first implementation of globally-optimal repair-based semantics, as well as the first implementation of the grounded semantics, which is a tractable under-approximation of all these optimal repair-based semantics. Our experimental evaluation sheds light on the feasibility of computing answers under globally-optimal repair semantics and the impact of adopting different semantics, approximations, and encodings.
Marcos Cramer, Tom Friese
ASPIC-style structured argumentation frameworks provide a formal basis for reasoning in artificial intelligence by combining internal argument structure with abstract argumentation semantics. A key challenge in these frameworks is ensuring compliance with five critical rationality postulates: closure, direct consistency, indirect consistency, non-interference, and crash-resistance. Recent approaches, including ASPIC$^{\ominus}$ and Deductive ASPIC$^-$, have made significant progress but fall short of meeting all postulates simultaneously under a credulous semantics (e.g. preferred) in the presence of undercuts. This paper introduces Deductive ASPIC$^{\ominus}$, a novel framework that integrates gen-rebuttals from ASPIC$^{\ominus}$ with the Joint Support Bipolar Argumentation Frameworks (JSBAFs) of Deductive ASPIC$^-$, incorporating preferences. We show that Deductive ASPIC$^{\ominus}$ satisfies all five rationality postulates under a version of preferred semantics. This work opens new avenues for further research on robust and logically sound structured argumentation systems.
Takao Inoué
This paper develops a refined version of TAPO-description logic for the analysis of information behavior. The framework is treated not as a single homogeneous object logic, but as a layered formalism consisting of a static descriptive layer (TBox/ABox), a procedural layer (PBox), and an oracle-sensitive layer (OBox). To make this architecture mathematically explicit, we introduce a metalevel guard-judgment layer governing procedural branching and iteration. On this basis we formulate a core inference system for TAPO-description logic, covering static TBox/ABox reasoning, guarded procedural transition in the PBox, and validated external import in the OBox. We then give a categorical semantics for the resulting framework and indicate its sheaf-theoretic refinement. The theory is illustrated by examples of information-seeking behavior, including simple search behavior and review-sensitive ordering behavior in a curry restaurant. The aim is to treat not only static knowledge representation but also hesitation, external consultation, and action-guiding update within a unified logical setting.
Ömer Şakar, Soham Chakraborty, Marieke Huisman, Anton Wijs
Concurrent programming under weak memory models faces substantial challenges in ensuring correctness, due to program behaviors that cannot be explained by thread interleaving, i.e., sequential consistency. While several program logics have been proposed to reason about weak memory concurrency, their usage has been limited to intricate manual proofs. On the other hand, the VerCors verifier provides a rich toolset for automated deductive verification under sequential consistency. In this paper, we bridge this gap and enable automated deductive verification of weak memory concurrent programs with the VerCors deductive verification tool. We propose an approach to encode weak memory concurrency in VerCors. We develop VerCors-relaxed, in which we extend the VerCors atomics support and bring in concepts from several protocol automata to encode permission-based separation logics for weak memory concurrency models. To demonstrate the effectiveness of our approach, we encode the relaxed fragment of the SLR program logic, a recent state-of-the-art permission-based separation logic for weak memory concurrency, in VerCors-relaxed. We use this SLR encoding to automatically verify several examples from the literature with realistic performance.
Mohammad Abdulaziz, Thomas Ammer
We present an ongoing effort to build a framework and a library in Isabelle/HOL for formalising primal-dual arguments for the analysis of algorithms. We discuss a number of example formalisations from the theory of matching algorithms, covering classical algorithms like the Hungarian Method, widely considered the first primal-dual algorithm, and modern algorithms like the Adwords algorithm, which models the assignment of search queries to advertisers in the context of search engines.
Shqiponja Ahmetaj, Iovka Boneva, Jan Hidders, Maxime Jakubowski, Jose-Emilio Labra-Gayo, Wim Martens, Fabio Mogavero, Filip Murlak, Cem Okulmus, Ognjen Savković, Mantas Šimkus, Dominik Tomaszuk
As schema languages for RDF data become more mature, we are seeing efforts to extend them with recursive semantics, applying diverse ideas from logic programming and description logics. While ShEx has an official recursive semantics based on greatest fixpoints (GFP), the discussion for SHACL is ongoing and seems to be converging towards least fixpoints (LFP). A practical study we perform shows that, indeed, ShEx validators implement GFP, whereas SHACL validators are more heterogeneous. This situation creates tension between ShEx and SHACL, as their semantic commitments appear to diverge, potentially undermining interoperability and predictability. We aim to clarify this design space by comparing the main semantic options in a principled yet accessible way, hoping to engage both theoreticians and practitioners, especially those involved in developing tools and standards. We present a unifying formal semantics that treats LFP, GFP, and supported model semantics (SMS), clarifying their relationships and highlighting a duality between LFP and GFP on stratified fragments. Next, we investigate to which extent the directions taken by SHACL and ShEx are compatible. We show that, although ShEx and SHACL seem to be going in different directions, they include large fragments with identical expressive power. Moreover, there is a strong correspondence between these fragments through the aforementioned principle of duality. Finally, we present a complete picture of the data and combined complexity of ShEx and SHACL validation under LFP, GFP, and SMS, showing that SMS comes at a higher computational cost under standard complexity-theoretic assumptions.
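The divergence between least- and greatest-fixpoint semantics discussed above can be made concrete with a toy monotone operator computed by Kleene iteration. This is only an illustrative sketch, not the actual ShEx or SHACL semantics: the shape, graph, and operator below are invented.

```python
def lfp(step, bottom):
    """Least fixpoint of a monotone operator, by iteration from bottom."""
    cur = bottom
    while True:
        nxt = step(cur)
        if nxt == cur:
            return cur
        cur = nxt

def gfp(step, top):
    """Greatest fixpoint of a monotone operator, by iteration from top."""
    cur = top
    while True:
        nxt = step(cur)
        if nxt == cur:
            return cur
        cur = nxt

# Toy recursive shape S: "a node conforms to S iff it has an edge to a
# node that conforms to S". On a graph with a cycle, LFP and GFP differ.
edges = {"a": {"b"}, "b": {"a"}, "c": {"a"}, "d": set()}
nodes = set(edges)

def step(conforming):
    return {n for n in nodes if edges[n] & conforming}

print(lfp(step, set()))    # set(): no node is ever forced to conform
print(gfp(step, nodes))    # {'a', 'b', 'c'}: nodes that reach the cycle
```

Under LFP nothing conforms (conformance must be derived from a base case), while under GFP the cycle `a -> b -> a` and everything reaching it conforms; this is exactly the kind of disagreement the unifying semantics in the paper is meant to organize.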
Naoki Nishida
It has been shown that, for a terminating right-linear overlay term rewrite system (TRS), any rewrite sequence ending with a normal form can be simulated by innermost reduction. In this paper, using this simulation property, we show that for a right-linear overlay TRS, there is no infinite minimal dependency-pair chain if and only if there is no infinite innermost minimal dependency-pair chain. This implies that a right-linear overlay TRS is terminating if and only if it is innermost terminating.
Matthew Collinson
We display a family of Stone-type dualities linking categories of frames carrying pairs of modal operators to categories of spaces carrying a binary relation. Different notions of morphism used on the relational side lead to significant variations in the point construction. We show how the situation simplifies in the case of semicontinuous relations, allowing for straightforward correspondences between modal axioms and relational properties.
Sylvie Boldo, François Clément, Vincent Martin, Micaela Mayero, Houda Mouhcine
Formalization of mathematics is a major topic that includes, in particular, numerical analysis, working towards proofs of scientific computing programs. The present study is about the finite element method, a popular method to numerically solve partial differential equations. With the long-term goal of proving its correctness, we focus here on the formal definition of what a finite element is. Mathematically, a finite element describes what happens in a cell of a mesh. It notably includes the geometry of the cell, the polynomial approximation space, and a finite set of linear forms that computationally characterizes the polynomials. Formally, we design a finite element as a record in the Rocq proof assistant with both values (such as the vertices of the cell) and proofs of validity (such as the dimension of the approximation space). The decisive validity proof is unisolvence, which makes the previous characterization unique. We then instantiate this record with the most popular and useful instance, the simplicial Lagrange finite elements for evenly distributed nodes, for any dimension and any polynomial degree, including the difficult unisolvence proof. These proofs require many results (definitions, lemmas, canonical structures) about finite families, affine spaces, and multivariate polynomials, in the context of finite- or infinite-dimensional spaces.
Arend Rensink
One of the advantages of LTL over CTL is that the notion of a counterexample is easy to grasp, visualise and process: it is a trace that violates the property at hand. In this paper we propose a notion of evidence for CTL properties on explicit-state models -- which serves equally as a witness for satisfied properties and as a counterexample for violated ones -- and show how to visualise it, with the main aim of (human) comprehension. The main contribution consists of a formal model of evidence, a characterisation of minimal evidence per temporal operator, and a concrete, implemented proposal for its visualisation. This is the extended version of a paper published in SPIN 2026, containing the proofs of all results.
Zhengyu Li, Curtis Bright, Stefan Trandafir, Adán Cabello, Vijay Ganesh
We present a search for small Kochen-Specker (KS) sets in dimension 3, specifically targeting extensions of the 13-ray Yu-Oh set, which has been proven to be the minimal witness to state-independent contextuality. To enable this search, we introduce a novel SAT-based orderly generation framework integrating recursive canonical labeling (RCL) with the graph isomorphism tool NAUTY. We demonstrate that previous SAT approaches relying on lexicographical canonicity suffer from exponential scaling on canonical graphs. This limitation renders them intractable on the large instances (25 to 33 vertices) encountered in our search, whereas our RCL check maintains consistent millisecond-level performance, effectively eliminating the bottleneck. Overcoming this bottleneck allows us to perform the first exhaustive enumeration of all KS sets with up to 33 rays containing the complete 25-ray state-independent contextuality (SI-C) set obtained by rigid extensions of the Yu-Oh set in 1,641 CPU hours. We found and verified that the 33-ray set discovered by Schütte is the smallest three-dimensional KS set containing the complete 25-ray SI-C set. All non-existence results are backed by independently verifiable proof certificates via an extension of the DRAT proof format.
Mateo Sanabria, Carlos Varela, Camilo Rocha, Nicolas Cardozo
In the rewriting logic framework, equational-based specifications are used to define deterministic functional behavior, abstract data types, and canonical representations of data. These specifications include a (possibly order-sorted) signature and equations interpreted modulo structural axioms, such as associativity, commutativity, and identity. While equational rewriting provides a powerful basis for execution and symbolic reasoning, it does not by itself offer native support for inductive or deductive reasoning. This paper presents maude2athena, a framework that systematically translates Maude's equational theories into Athena, a theorem proving language designed to support natural deduction proofs over many-sorted first-order logic specifications, including inductive reasoning, equational chaining, case-based reasoning, and proofs by contradiction. The translation supports induction-based reasoning modulo structural axioms with parametric induction rules; it faithfully encodes membership equational logic in a many-sorted setting without exponential blowup under reasonable conditions. This approach preserves the semantics of the original specification, while ensuring that the translation remains compact and amenable to deductive reasoning. This work helps bridge the gap between model checking and theorem proving, enabling formal verification efforts that can benefit from both of these approaches.
Kyuhee Kim, Auguste Poiroux, Antoine Bosselut
Formal verification guarantees proof validity but not formalization faithfulness. For natural-language logical reasoning, where models construct axiom systems from scratch without library constraints, this gap between valid proofs and faithful translations is especially acute. We investigate whether frontier models exploit this gap when generating Lean 4 proofs, a behavior we term formalization gaming. We evaluate GPT-5 and DeepSeek-R1 on 303 first-order logic problems (203 from FOLIO, 100 from Multi-LogiEval), comparing unified generation against a two-stage pipeline that separates formalization from proving. Despite compilation rates of 87-99%, we find no evidence of systematic gaming in unified generation: models prefer reporting failure over forcing proofs, even under prompting designed to encourage it. However, unfaithfulness that evades our detection signals may still occur. The two-stage pipeline reveals two distinct modes of unfaithfulness: GPT-5 fabricates axioms during proof generation, a reactive fallback detectable via cross-stage comparison, while DeepSeek-R1 mistranslates premises during formalization, producing internally consistent outputs that evade detection entirely. These findings show that high compilation rates or accuracies should not be equated with faithful reasoning. Code and data are available at https://github.com/koreankiwi99/formalization-gaming.
Alessandro G. Buda, Giuseppe Primiero, Leonardo Ceragioli, Melissa Antonelli
Generative AI systems are known to amplify biases present in their training data. While several inference-time mitigation strategies have been proposed, they remain largely empirical and lack formal guarantees. In this paper we introduce CTLF, a branching-time logic designed to reason about bias in a series of generative AI outputs. CTLF adopts a counting worlds semantics, where each world represents a possible output at a given step in the generation process, and introduces modal operators that allow us to verify whether the current output series respects an intended probability distribution over a protected attribute, to predict the likelihood of remaining within acceptable bounds as new outputs are generated, and to determine how many outputs need to be removed in order to restore fairness. We illustrate the framework on a toy example of biased image generation, showing how CTLF formulas can express concrete fairness properties at different points in the output series.
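The counting idea behind these operators can be sketched concretely: given a series of outputs carrying a binary protected attribute, check whether the empirical frequency stays within a tolerance of a target distribution, and count how many outputs must be removed to restore the bound. The functions and the greedy repair below are illustrative assumptions, not CTLF itself, whose operators are defined model-theoretically over counting worlds.

```python
def within_bounds(outputs, target, eps):
    """Check that the empirical frequency of attribute value 1 in the
    output series stays within eps of the target probability."""
    if not outputs:
        return True
    freq = sum(outputs) / len(outputs)
    return abs(freq - target) <= eps

def removals_to_restore(outputs, target, eps):
    """Greedily count how many outputs must be removed so the remaining
    series satisfies the bound. This mirrors the 'how many removals
    restore fairness' question only in spirit, not CTLF's definition."""
    outs = list(outputs)
    removed = 0
    while not within_bounds(outs, target, eps):
        freq = sum(outs) / len(outs)
        # Remove one output of the over-represented attribute value.
        outs.remove(1 if freq > target else 0)
        removed += 1
    return removed

# Toy series: 1 = output exhibits the protected attribute, 0 = it does not.
series = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]              # 60% ones
print(within_bounds(series, target=0.5, eps=0.05))   # False: 0.6 > 0.55
print(removals_to_restore(series, target=0.5, eps=0.05))  # 2
```

Removing two over-represented outputs brings the frequency from 6/10 down to 4/8 = 0.5, back inside the [0.45, 0.55] band.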
Robbe Van den Eede, Marc Denecker
Inductive definitions are an important form of knowledge. The logic FO(ID) is an extension of classical first-order logic FO with general non-monotone inductive definitions. Most existing proof systems for inductive definitions impose syntactic constraints on their definitions, thereby excluding many useful and natural definitions. We extend an existing sequent calculus LKID by Brotherston and Simpson, founded on the principle of mathematical induction, to a sequent calculus SCFO(ID) for FO(ID). The main challenge in this extension is the accommodation of non-monotone inductive definitions. To overcome this challenge, we draw inspiration from the stable semantics, which is a commonly used semantics in logic programming that is closely related to the well-founded semantics behind FO(ID). We corroborate SCFO(ID) by establishing several proof-theoretical properties and through demonstration on various examples. In conclusion, SCFO(ID) is a theoretically substantiated sequent calculus for FO(ID), enabling formal proofs of theorems involving general inductive definitions.
Andrei Bulatov, Xiaoyang Gong, Bakh Khoussainov, Xinyao Wang
We study constraint satisfaction problems (CSPs) where the constraint languages are defined by finite automata, giving rise to automata-based CSPs. The key notion is the concept of Automatic Constraint Satisfaction Problem ($AutCSP$), where constraint languages and instances are specified by finite automata. The $AutCSP$ captures infinite yet finitely describable sets of relations, enabling concise representations of complex constraints. Studying the complexity of the $AutCSP$s illustrates the interplay between classical CSPs, automata, and logic, sharpening the boundary between tractable and intractable constraints. We show that checking whether an operation is a polymorphism of such a language can be done in polynomial time. Building on this, we establish several complexity classification results for the $AutCSP$. In particular, we prove that Schaefer's Dichotomy Theorem extends to the $AutCSP$ over the Boolean domain, and we provide algorithms that decide tractability of some classes of $AutCSP$s over arbitrary finite domains via automatic polymorphisms. An important part of our work is that our polynomial-time algorithms run on $AutCSP$ instances that can be exponentially more succinct than their standard CSP counterparts.
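The polymorphism condition itself is simple to state: an operation is a polymorphism of a relation if applying it coordinatewise to tuples of the relation always yields a tuple of the relation. The brute-force check below is exponential in the relation size, unlike the paper's polynomial-time automaton-based test; it is meant only to make the definition concrete on Boolean examples.

```python
from itertools import product

def is_polymorphism(f, arity, relation):
    """f (an arity-ary operation) is a polymorphism of the relation iff
    applying f coordinatewise to any choice of arity tuples from the
    relation yields a tuple that is again in the relation."""
    rel = set(relation)
    width = len(next(iter(rel)))
    for tuples in product(rel, repeat=arity):
        image = tuple(f(*(t[i] for t in tuples)) for i in range(width))
        if image not in rel:
            return False
    return True

LEQ = {(0, 0), (0, 1), (1, 1)}   # the order relation x <= y on {0, 1}
NEQ = {(0, 1), (1, 0)}           # disequality on {0, 1}

print(is_polymorphism(min, 2, LEQ))  # True: LEQ is closed under meet
print(is_polymorphism(min, 2, NEQ))  # False: min((0,1),(1,0)) = (0,0)
```

In the automatic setting of the paper, both the relation and the operation are given by automata rather than as explicit tuple sets, which is what allows the exponentially more succinct instances mentioned above.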
Florentina Voboril, Martin Gebser, Stefan Szeider, Alice Tarzariol
Streamliner constraints reduce the search space of combinatorial problems by ruling out portions of the solution space. We adapt the StreamLLM approach, which uses Large Language Models (LLMs) to generate streamliners for Constraint Programming, to Answer Set Programming (ASP). Given an ASP encoding and a few small training instances, we prompt multiple LLMs to propose candidate constraints. Candidates that cause syntax errors, render satisfiable instances unsatisfiable, or degrade performance on all training instances are discarded. The surviving streamliners are evaluated together with the original encoding, and we report results for a virtual best encoding (VBE) that, for each instance, selects the fastest among the original encoding and its streamlined variants. On three ASP Competition benchmarks (Partner Units Problem, Sokoban, Towers of Hanoi), the VBE achieves speedups of up to 4--5x over the original encoding. Different LLMs produce semantically diverse constraints, not mere syntactic variations, indicating that the approach captures genuine problem structure.
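The candidate-filtering loop and the virtual best encoding described above can be sketched as follows. The solver interface, stub timings, and discard criteria below are invented for illustration; a real pipeline would call an ASP solver such as clingo on the actual encodings and instances.

```python
# Hypothetical interface: solve(encoding, instance) -> (status, seconds).
# A real pipeline would invoke an ASP solver; here we stub it.
def filter_candidates(solve, base, candidates, train_instances):
    """Keep only streamliners that parse, preserve satisfiability on all
    training instances, and are faster on at least one of them."""
    kept = []
    for c in candidates:
        try:
            runs = [solve(base + "\n" + c, i) for i in train_instances]
        except SyntaxError:
            continue  # discard candidates that do not parse
        base_runs = [solve(base, i) for i in train_instances]
        # Discard if any satisfiable instance becomes unsatisfiable.
        if any(b[0] == "SAT" and r[0] != "SAT"
               for b, r in zip(base_runs, runs)):
            continue
        # Discard if the candidate degrades performance everywhere.
        if all(r[1] >= b[1] for r, b in zip(runs, base_runs)):
            continue
        kept.append(c)
    return kept

def vbe_time(solve, base, streamliners, instance):
    """Virtual best encoding: per-instance best of original + variants."""
    times = [solve(base, instance)[1]]
    times += [solve(base + "\n" + s, instance)[1] for s in streamliners]
    return min(times)

# Stub solver with made-up timings, just to exercise the pipeline.
TIMES = {("base", "i1"): 10.0, ("base#c1", "i1"): 2.5,
         ("base#c2", "i1"): 12.0}
def stub_solve(encoding, instance):
    key = (encoding.replace("\n", "#"), instance)
    return ("SAT", TIMES.get(key, 10.0))

kept = filter_candidates(stub_solve, "base", ["c1", "c2"], ["i1"])
print(kept)                                       # ['c1']: c2 never helps
print(vbe_time(stub_solve, "base", kept, "i1"))   # 2.5
```

The VBE is an oracle measure (it assumes the best variant is picked per instance), which is why the paper reports it separately from any single encoding.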
Amirreza Akbari, Amauri H. Souza, Vikas Garg
Graph neural networks (GNNs) are the standard for learning on graphs, yet they have limited expressive power, often expressed in terms of the Weisfeiler-Leman (WL) hierarchy or within the framework of first-order logic. In this context, topological neural networks (TNNs) have recently emerged as a promising alternative for graph representation learning. By incorporating higher-order relational structures into message-passing schemes, TNNs offer higher representational power than traditional GNNs. However, a fundamental question remains open: what is the logical expressiveness of TNNs? Answering this allows us to characterize precisely which binary classifiers TNNs can represent. In this paper, we address this question by analyzing isomorphism tests derived from the underlying mechanisms of general TNNs. We introduce and investigate the power of higher-order variants of WL-based tests for combinatorial complexes, called the $k$-CCWL test. In addition, we introduce the topological counting logic (TC$_k$), an extension of standard counting logic featuring a novel pairwise counting quantifier $ \exists^{N}(x_i,x_j)\, \varphi(x_i,x_j), $ which explicitly quantifies pairs $(x_i, x_j)$ satisfying property $\varphi$. We rigorously prove the exact equivalence: $ k\text{-CCWL} \equiv \text{TC}_{k+2} \equiv \text{Topological }(k+2)\text{-pebble game}.$ These results establish a logical expressiveness theory for TNNs.
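Under the standard reading of counting quantifiers, the pairwise quantifier would be interpreted as follows; this is an inferred reading consistent with the abstract's description, not a quotation of the paper's definition:

```latex
% Satisfaction of the pairwise counting quantifier in a structure A:
% at least N pairs of elements satisfy \varphi.
A \models \exists^{N}(x_i, x_j)\, \varphi(x_i, x_j)
\quad\text{iff}\quad
\bigl|\{ (a, b) \in A^2 : A \models \varphi[a, b] \}\bigr| \ge N
```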
David Billington
Defeasible statements are statements that are likely, or probable, or usually true, but may occasionally be false. Plausible reasoning draws conclusions from statements that are either facts or defeasible statements, without using numbers; so there are no probabilities or suchlike involved. Seventeen principles for logics that do plausible reasoning are suggested, and several important plausible reasoning examples are considered. There are 14 necessary principles and 3 desirable principles, one of which is not formally stated. A first-order logic, called Plausible Logic (PL), is defined that satisfies all but two of the desirable principles and reasons correctly with all the examples. As far as we are aware, this is the only such logic. PL has 8 reasoning algorithms because, from a given plausible reasoning situation, there are different sensible conclusions. This article is a condensation of my book `Plausible Reasoning and Plausible Logic' (PRPL), which is to be submitted. Each section of this article corresponds to a chapter in PRPL, and vice versa. The proofs of all the results are in PRPL, so they are omitted from this article.