Benedikt Bollig, Matthias Függer, Thomas Nowak, Paul Zeinaty
We compare lightweight automata-based models (n-grams) with neural architectures (LSTM, Transformer) for next-activity prediction in streaming event logs. Experiments on synthetic patterns and five real-world process mining datasets show that n-grams with appropriate context windows achieve comparable accuracy to neural models while requiring substantially fewer resources. Unlike windowed neural architectures, which show unstable performance patterns, n-grams provide stable and consistent accuracy. While we demonstrate that classical ensemble methods like voting improve n-gram performance, they require running many agents in parallel during inference, increasing memory consumption and latency. We propose an ensemble method, the promotion algorithm, that dynamically selects between two active models during inference, reducing overhead compared to classical voting schemes. On real-world datasets, these ensembles match or exceed the accuracy of non-windowed neural models with lower computational cost.
Roland Groz, German Eduardo Vega Baez, Adenilso Simao, Catherine Oriat, Neil Walkinshaw, Michael Foster
Extended finite state machines (EFSMs) model stateful systems with internal data variables and have numerous applications in software engineering. A major advantage of this type of model lies in its ability to model both the data flow and the data-dependent control behaviour. In the absence of such models, it is desirable to reverse-engineer them by observing the system's behaviour. However, existing approaches generally require the ability to reset the system during inference, or can only handle situations where the control flow depends exclusively on the input parameters, and not on the values of the stored data. In this work, we present a black-box active learning algorithm that infers EFSMs with guards and registers, and which significantly relaxes the assumptions that have to be made about the system in comparison to previous attempts.
Arend Rensink
One of the advantages of LTL over CTL is that the notion of a counterexample is easy to grasp, visualise and process: it is a trace that violates the property at hand. In this paper we propose a notion of evidence for CTL properties on explicit-state models -- which equally serves as witness for satisfied properties and counterexample for violated ones -- and how to visualise it, with the main aim of (human) comprehension. The main contribution consists of a formal model of evidence, a characterisation of minimal evidence per temporal operator, and a concrete, implemented proposal for its visualisation. This is the extended version of a paper published in SPIN 2026, containing the proofs of all results.
Henning Fernau, Lakshmanan Kuppusamy, Jana Schulz
In this paper, we consider combining the ideas of forbidden random context grammars as well as of ordered grammars with cooperating distributed grammar systems (CDGS). We focus on investigating their generative capacities. Both ideas can be added to CDGS in two ways: either having (e.g.) a strict order of the rules in each component, or having a strict order on the components. This leads to four different scenarios, only some of them have been addressed in the literature before. While in the area of CDGS, many inclusions among language classes have been %are still open questions for decades, the proposed addition of forbidden random context and ordered regulation variants leads to a clear picture which allows us to get down to only five different classes of languages well known from classical regulated rewriting. This way, we also solve some open problems from the literature.
Henning Fernau, Pamela Fleischmann, Kevin Mann, Silas Cato Sacher
In this work, we introduce a new notion for representing graph classes with formal languages. In contrast to the seminal work by Kitaev and Pyatkin to represent graphs by words, we use formal binary languages in order to have a set of patterns (given by the languages' words) defining the edges in the graph. In particular, we investigate famous languages like the palindromes, copy-words, Lyndon words, and Dyck words to represent all graphs or specific graph classes by restricting these languages.
Andrei Bulatov, Xiaoyang Gong, Bakh Khoussainov, Xinyao Wang
We study constraint satisfaction problems (CSPs) where the constraint languages are defined by finite automata, giving rise to automata-based CSPs. The key notion is the concept of Automatic Constraint Satisfaction Problem ($AutCSP$), where constraint languages and instances are specified by finite automata. The $AutCSP$ captures infinite yet finitely describable sets of relations, enabling concise representations of complex constraints. Studying the complexity of the $AutCSP$s illustrates the interplay between classical CSPs, automata, and logic, sharpening the boundary between tractable and intractable constraints. We show that checking whether an operation is a polymorphism of such a language can be done in polynomial time. Building on this, we establish several complexity classification results for the $AutCSP$. In particular, we prove that Schaefer's Dichotomy Theorem extends to the $AutCSP$ over the Boolean domain, and we provide algorithms that decide tractability of some classes of $AutCSP$s over arbitrary finite domains via automatic polymorphisms. An important part of our work is that our polynomial-time algorithms run on $AutCSP$ instances that can be exponentially more succinct than their standard CSP counterparts.
Giuseppe De Giacomo, Christian Hagemeier, Daniel Hausmann, Nir Piterman
We study synthesis for obligation properties expressed in LTLfp, the extension of LTLf to infinite traces. Obligation properties are positive Boolean combinations of safety and guarantee (co-safety) properties and form the second level of the temporal hierarchy of Manna and Pnueli. Although obligation properties are expressed over infinite traces, they retain most of the simplicity of LTLf. In particular, we show that they admit a translation into symbolically represented deterministic weak automata (DWA) obtained directly from the symbolic deterministic finite automata (DFA) for the underlying LTLf properties on trace prefixes. DWA inherit many of the attractive algorithmic features of DFA, including Boolean closure and polynomial-time minimization. Moreover, we show that synthesis for LTLfp obligation properties is theoretically highly efficient - solvable in linear time once the DWA is constructed. We investigate several symbolic algorithms for solving DWA games that arise in the synthesis of obligation properties and evaluate their effectiveness experimentally. Overall, the results indicate that synthesis for LTLfp obligation properties can be performed with virtually the same effectiveness as LTLf synthesis.
Manfred Droste, Vitaly Nürnberg
We introduce weighted finite finance automata (WFFA), a formal framework for modeling and analyzing quantitative properties of financial systems driven by uncertain economic variables such as stock prices, interest rates, and exchange rates. The model provides a compositional and language-theoretic approach to scenario-based financial analysis, enabling systematic evaluation of financial instruments and trading strategies. To specify such systems, we introduce weighted finance regular expressions, a declarative language for quantitative financial properties. We establish a Kleene-Schützenberger-type correspondence between WFFAs and weighted finance regular expressions, together with effective translation procedures between the two formalisms. On the algorithmic side, we investigate fundamental decision and optimization problems for WFFAs, including the computation of extremal payoffs, and identify expressive yet computationally tractable subclasses. These results provide a foundation for formal, compositional, and efficient analysis of financial systems under multiple market scenarios.
Enrico Formenti, Eric Goles, Kévin Perrot, Martín Ríos-Wilson, Domingo Ruiz-Tala
Fungal automata are a nature-inspired computational model, where a rule is alternatively applied verticaly and horizontaly. In this work we study the computational complexity of predicting the dynamics of all fungal freezing totalistic one-dimentional rules of radius $1$, exhibiting various behaviors. Despite efficiently predictable in most cases (with non-deterministic logspace algorithms), a non-linear rule is left open to characterize. We further explore the freezing majority rule (which is totalistic), and prove that at radius $1.5$ it becomes $\mathbf{P}$-complete to predict.
Attila Egri-Nagy, Chrystopher L. Nehaniv
Computational power can be measured by assigning an algebraic structure to a computational device. Here, we convert a small patch of Conway's Game of Life into a transformation semigroup. The conversion captures not only time evolution but also interactive operations. In this way, the cellular automaton becomes directly programmable. Once this measurement is made, we apply hierarchical decompositions to the resulting algebraic object as a way of understanding it. These decompositions are based on a macro/micro-state division inspired by statistical mechanics. However, cellular automata have a large number of global states. Therefore, we focus on partitioning the state space and creating morphic images approximations that can serve as macro-level descriptions. The methods developed here are not limited to cellular automata; they apply more generally to discrete dynamical systems.
Paul C. Bell, George Kenison, Reino Niskanen, Igor Potapov, Pavel Semukhin
Embeddings of word structures into matrix semigroups provide a natural bridge between combinatorics on words and linear algebra. However, low-dimensional matrix semigroups impose strong structural restrictions on possible embeddings. Certain finitely generated groups admit faithful representations in SL(2, C) and other similar matrix groups. On the other hand, it is known that the product of two free semigroups on two generators cannot be embedded into the 2x2 complex matrices. In this paper we study embeddings of word structures into low-dimensional matrix semigroups over the complex numbers and develop new techniques for constructing word representations of the Euclidean Bianchi groups. These representations provide a symbolic framework and a natural first step towards analysing fundamental decision problems in 2x2 matrix semigroups.
Hugo O'Connor
Agent communication languages (ACLs) enable heterogeneous agents to share knowledge and coordinate across diverse domains. This diversity demands extensibility, but expressive extension mechanisms can push the input language beyond the complexity classes where full validation is tractable. We present CBCL (Common Business Communication Language), an agent communication language that constrains all messages, including runtime language extensions, to the deterministic context-free language (DCFL) class. CBCL allows agents to define, transmit, and adopt domain-specific "dialect" extensions as first-class messages; three safety invariants (R1--R3), machine-checked in Lean 4 and enforced in a Rust reference implementation, prevent unbounded expansion, applying declared resource limits, and preserving core vocabulary. We formalize the language and its safety properties in Lean 4, implement a reference parser and dialect engine in Rust with property-based and differential tests, and extract a verified parser binary. Our results demonstrate that homoiconic protocol design, where extension definitions share the same representation as ordinary messages, can be made provably safe. As autonomous agents increasingly extend their own communication capabilities, formally bounding what they can express to each other is a precondition for oversight.
Laurent Bienvenu, Santiago Cifuentes, Hugo Gimbert
For a fixed alphabet A, an infinite sequence X is said to be normal if every word w over A appears in X with the same frequency as any other word of the same length. A classical result relates normality to finite automata as follows: a sequence X is normal if and only if all gambling strategies implementable with finite deterministic automata lose all their capital when trying to predict the next bit of X after seing the ones before. More precisely, Schnorr and Stimm (1972) proved that the capital goes exponentially fast to zero unless the automaton represents the gambler that never bets, in which case the capital remains constant. In this paper we show that an analogous result holds when considering probabilistic automata: a sequence X is normal if and only if for any gambling strategy implementable with probabilistic finite automaton it holds that the expected value of the capital of the gambler converges exponentially fast to a finite value when playing against X. To obtain this result, we show a more general statement related to the convergence of martingales given by finite sets of non-negative matrices {M a } a$\in$A . In particular, we show that X is normal if and only if ||vM X[1] . . . M X[n] || converges exponentially fast to a finite value for any non-negative starting vector v. Moreover, we distinguish three distinctive behaviours that this sequence can attain, and prove that the problem of recognizing, given a family of matrices, to which case it belongs, is decidable.
Mikołaj Bojańczyk, Grzegorz Fabiański, Rafał Stefański
It is open whether equivalence ( f = g ) is decidable for string-to-string polyregular functions. We consider their higher-order extension based on the λ-calculus definition of polyregular functions from Bojańczyk (2018). In this setting, equivalence is undecidable by reduction from the tiling problem.
Yahia Idriss Benalioua, Nathan Lhote, Pierre-Alain Reynier
In this work, we study minimization of rational functions given as appending streaming string transducers (aSST for short). We rely on an algebraic presentation of these functions, known as bimachines, to address the minimization of both states and registers of aSST. First, we show a bijection between a subclass of aSST and bimachines, which maps the numbers of states and registers of the aSST to two natural parameters of the bimachine. Using known results on the minimization of bimachines, this yields a Ptime (resp. NP) procedure to minimize this subclass of aSST with respect to registers (resp. both states and registers). In a second step, we introduce a new model of bimachines, named asynchronous bimachines, which allows to lift the bijection to the whole class of aSST. Based on this, we prove that register minimization with a fixed underlying automaton is NP-complete.
Matan Tal
In [6], a constraint on invariant measures on bi-permutative cellular autmata has been observed: fixed values at the positive indices determine almost-surely a uniform probability on the set of possible values at the zero index. When the alphabet is a finite group and the automaton is multiplication of two neighbors, that set is in fact a coset of some subgroup. In the present paper, we strengthen the formulations in [6] and investigate further the implications of this constraint. In the finite group case mentioned above, relations between some attributes of the group structure and the invariant measures are examined. We also inspect a factor that this constraint induces, and analyze the special case in which it has zero measure-theoretical entropy, thus observing an interplay between zero entropy invariant measures on that factor and the positive entropy measures corresponding to them on the original system. Then, we leave the setting of bi-permutative cellular automata and generalize our results to a wider class which we named RLP subshifts. The peculiar situation is that although we conjecture that this class is much larger than the class of bi-permutative cellular automata, we were able to prove only for essentially one other example that it belongs to it.
Kaiwen Zhang, Guanjun Liu
Recovering concurrency structure directly from source code is difficult because shared-resource identity and protection relations are often obscured by aliasing, ownership, and API-specific idioms. We therefore study a specification-driven, model-first verification architecture for LLM-assisted concurrent program construction. Instead of verifying arbitrary source code, a large language model first synthesizes a verification-oriented concurrency artifact from a natural-language requirement or system specification. The first formalism, the Concurrency Intermediate Representation (Cir), is a statement-level, alias-free model in which shared resources are globally named, protection relations are explicit, and each statement carries a stable identifier. The second formalism, the Concurrency Verification Net (Cvn), is a weighted place/transition Petri net with a finite global store and three-valued guards for data-dependent branching. A validated Cir artifact is translated mechanically to Cvn, explored exhaustively, and any counterexample is mapped back to statement identifiers to guide targeted repair. To reduce the risk of bug-free but behavior-dropping repairs, acceptance additionally applies a lightweight goal-reachability check over designated critical outcomes. We formalize both representations, prove translation-correspondence results for deadlock and signal-loss analysis, define a two-layer checking architecture with 61 static rules and 5 analysis predicates, and evaluate the pipeline on 9 representative bounded-concurrency patterns. The results show that the method supports iterative bug detection and repair on Cir artifacts and that goal reachability helps filter semantically incomplete repairs. The trust boundary of the present work is the generated Cir artifact rather than arbitrary source code.
Mohamad Louai Shehab, Antoine Aspeel, Necmiye Ozay
Reward machines are automaton-like structures that capture the memory required to accomplish a multi-stage task. When combined with reinforcement learning or optimal control methods, they can be used to synthesize robot policies to achieve such tasks. However, specifying a reward machine by hand, including a labeling function capturing high-level features that the decisions are based on, can be a daunting task. This paper deals with the problem of learning reward machines directly from raw state and policy information. As opposed to existing works, we assume no access to observations of rewards, labels, or machine nodes, and show what trajectory data is sufficient for learning the reward machine in this information-scarce regime. We then extend the result to an active learning setting where we incrementally query trajectory extensions to improve data (and indirectly computational) efficiency. Results are demonstrated with several grid world examples.
Zeyu Chen, Junde Wu
Generalized finite automata (GFAs), probabilistic finite automata (PFAs), and one-way general quantum finite automata (1gQFA) recognize the same strict-cutpoint languages, but the state complexity of exact probabilistic simulation has remained unclear. This paper determines that worst-case cost exactly: every \(n\)-state 1gQFA admits exact strict-cutpoint simulation by a one-way PFA with \(O(n^2)\) states, via the standard \(n^2\)-dimensional mixed-state linearization together with an explicit alphabet-preserving construction that converts each \(k\)-state GFA into a one-way PFA with at most \(2k+6\) states; conversely, for every \(n\ge 2\), there exists an \(n\)-state 1gQFA for which every equivalent one-way PFA requires at least \(n^2-1\) states, obtained from a prepare--test construction and a Vapnik--Chervonenkis dimension argument. Hence the worst-case probabilistic state cost of exact strict-cutpoint simulation is \(Θ(n^2)\).
Azadeh Farzan, Umang Mathur
Predictive runtime monitoring asks whether an execution $σ$ of a concurrent program can be used to \emph{soundly predict} the existence of a reordering $ρ$ of $σ$ that satisfies a property $\varphi$. Its effectiveness and efficiency depend on two factors: (a) the complexity of $\varphi$, and (b) the expressive power of the reorderings considered. At one extreme, allowing all reorderings induced by \emph{reads-from equivalence} makes predictive monitoring intractable, even for simple properties such as data races. At the other extreme, restricting to commutativity-based reorderings (Mazurkiewicz trace equivalence) yields efficient algorithms for simple properties, but remains intractable for general regular specifications and offers limited predictive power. We address this tradeoff via \emph{parametrization}. We introduce \emph{sliced reorderings} and their generalization, \emph{$k$-sliced reorderings}. Informally, $ρ$ is a $k$-sliced reordering of $σ$ if $σ$ can be partitioned into $k+1$ ordered subsequences whose concatenation yields $ρ$, while preserving program order and reads-from constraints. Our results are twofold. First, $k$-sliced reorderings form a strictly increasing hierarchy that converges to reads-from equivalence as $k$ grows. Second, for any fixed $k$, predictive monitoring modulo $k$-sliced reorderings against any regular specification admits a constant-space streaming algorithm. Together, these results establish $k$-sliced reorderings as a principled alternative to existing equivalences, enabling a uniform parametrized framework where expressive power can be systematically traded off against computational cost.