Gabriel Riffo, Leonard Schmitz
We introduce SignatureTensors.jl, a new package for computing signature tensors of paths in Julia. We present its core functionality and demonstrate its use through illustrative examples. The package is compatible with the computer algebra system OSCAR, enabling both exact and numerical computations with signatures.
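For readers new to signatures, the underlying notion is easy to compute directly; the sketch below (plain Python with hypothetical helper names, not the SignatureTensors.jl API) builds the level-1 and level-2 signature tensors of a piecewise-linear path via Chen's relation.

```python
# Level-1/level-2 signature tensors of a piecewise-linear path.
# For a single linear segment with increment v, the level-k signature
# tensor is the k-fold tensor power of v divided by k!.
import numpy as np

def segment_signature(v):
    return v, np.outer(v, v) / 2.0

def chen_concat(sig_a, sig_b):
    # Chen's relation: S1 = S1a + S1b;  S2 = S2a + S2b + S1a (x) S1b.
    s1a, s2a = sig_a
    s1b, s2b = sig_b
    return s1a + s1b, s2a + s2b + np.outer(s1a, s1b)

def path_signature(points):
    # points: (n+1) x d array of vertices of a piecewise-linear path
    d = points.shape[1]
    sig = (np.zeros(d), np.zeros((d, d)))
    for i in range(len(points) - 1):
        sig = chen_concat(sig, segment_signature(points[i + 1] - points[i]))
    return sig

s1, s2 = path_signature(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]))
print(s1)                    # total displacement [1. 1.]
print(s2[0, 1] - s2[1, 0])   # antisymmetric part = twice the signed Levy area
```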
Levi Lucio
Model transformations are central to model-driven engineering (MDE), but formal verification is difficult because mainstream transformation languages are undecidable. DSLTrans was designed to be Turing-incomplete to improve verifiability, yet earlier verification based on path-condition enumeration still suffered exponential blow-up and did not scale to realistic cases. We present a tractable verification workflow for DSLTrans and formalize when it is complete. The method combines three contributions: (i) a Cutoff Theorem proving that bounded model checking is complete for a precise DSLTrans fragment and positive existence/traceability properties, turning an unbounded search into one with a finite, computable bound; (ii) composable, soundness-preserving optimizations (per-class bounds, CEGAR-based fragment verification, and trace-aware dependency analysis) that reduce SMT encoding size; and (iii) a Z3-based implementation evaluated on realistic transformations from the ATL Zoo and related benchmarks. On 29 concrete transformations and 899 properties spanning compiler lowering, schema translation, behavioral modeling, graph mapping, and stress tests, 552 properties are proved, 345 produce concrete counterexamples (including intentional negative and boundary cases), and only 2 remain undecided within the timeout. For properties beyond the tractability budget, we introduce tractability-driven refinement (precondition specialization, postcondition decomposition, and transformation instrumentation), achieving up to 112x speedup while eliminating spurious counterexamples. The workflow is supported by a web IDE and a concrete execution engine for runtime validation.
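The role of the cutoff can be illustrated with a toy bounded-model-checking query (the transition system, property, and bound below are invented for illustration; they are not the paper's DSLTrans encoding):

```python
# Toy BMC: once a cutoff K is known, a positive existence property is
# decided by a single finite SMT query over K unrolled transition steps.
from z3 import Ints, Solver, Or, sat

K = 8  # assume a Cutoff-Theorem-style bound for this system/property

def exists_reachable(target):
    x = Ints(" ".join(f"x{i}" for i in range(K + 1)))
    s = Solver()
    s.add(x[0] == 0)                                    # initial state
    for i in range(K):                                  # unrolled transitions
        s.add(Or(x[i + 1] == x[i] + 2, x[i + 1] == x[i] + 3))
    s.add(Or(*[x[i] == target for i in range(K + 1)]))  # existence property
    return s.check() == sat

print(exists_reachable(7))   # True, e.g. 0 -> 2 -> 4 -> 7
print(exists_reachable(1))   # False; the cutoff makes this "no" conclusive
```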
Xiakun Li, Hao Wu, Bican Xia, Tengshun Yang, Naijun Zhan
Stochastic constraints, which incorporate both deterministic parameters and random variables, extend classical deterministic constraints by explicitly accounting for uncertainty. These constraints are increasingly prevalent in data science, artificial intelligence, and bioinformatics; however, solving them requires addressing quantitative satisfaction problems that remain a significant challenge in computer science. In this paper, we propose a novel framework for finding deterministic parameters that maximize the satisfaction probability. Our approach features a unique synergy between stochastic optimization and symbolic techniques: at the high level, it employs \emph{oracle-based stochastic gradient descent} to identify high-quality parameter candidates, while at the low level, it utilizes \emph{interval arithmetic} to compute rigorously certified lower bounds. This framework produces a sequence of sound and increasingly tight lower bounds for the true maximum satisfaction probability, supported by a high-probability convergence guarantee. We demonstrate the effectiveness and efficiency of our approach through its application to Stochastic Satisfiability Modulo Theories (SSMT) problems and a stochastic trajectory planning task.
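The low-level certification step can be sketched concretely (a minimal, hypothetical example, not the paper's implementation): interval arithmetic proves a constraint over whole cells of the noise space, so the probability mass of certified cells is a sound lower bound on the satisfaction probability at a fixed parameter value.

```python
# Certified lower bound on P( g(theta, w) >= 0 ) via interval arithmetic.
# Toy constraint: g(theta, w) = theta*w + w^2 - 1, with w uniform on [0, 2].
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __mul__(self, o):
        ps = [a * b for a in (self.lo, self.hi) for b in (o.lo, o.hi)]
        return Interval(min(ps), max(ps))

def certified_lower_bound(theta, cells):
    bound = 0.0
    for w, mass in cells:
        t = Interval(theta, theta)
        g = t * w + w * w + Interval(-1.0, -1.0)
        if g.lo >= 0:            # g >= 0 certified on the entire cell
            bound += mass
    return bound

# four cells of probability mass 0.25 each covering [0, 2]
cells = [(Interval(0.5 * i, 0.5 * (i + 1)), 0.25) for i in range(4)]
print(certified_lower_bound(theta=1.0, cells=cells))   # 0.5, a sound bound
```

Refining the cells tightens the bound toward the true probability, while the high-level stochastic gradient loop moves the parameter toward better candidates.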
Karin Yu, Eleni Chatzi, Georgios Kissas
Understanding natural and engineered systems often relies on symbolic formulations, such as differential equations, which provide interpretability and transferability beyond black-box models. We introduce Latent Grammar Flow (LGF), a neuro-symbolic generative framework for discovering ordinary differential equations from data. LGF embeds equations as grammar-based representations in a discrete latent space and, via a behavioural loss, encourages semantically similar equations to lie close together. A discrete flow model then guides the sampling process to recursively generate candidate equations that best fit the observed data. Domain knowledge and constraints, such as stability, can be either embedded into the rules or used as conditional predictors.
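A minimal sketch of what a grammar-based representation looks like (the production rules are invented for illustration and are not LGF's actual grammar):

```python
# Sampling ODE right-hand sides from a tiny context-free grammar.
import random

RULES = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["F", "(", "E", ")"], ["x"], ["c"]],
    "F": [["sin"], ["cos"], ["exp"]],
}

def sample(symbol="E", depth=0, max_depth=4):
    if symbol not in RULES:
        return symbol                       # terminal token
    options = RULES[symbol]
    if depth >= max_depth:                  # force terminals at the depth cap
        options = [o for o in options if all(s not in RULES for s in o)]
    rule = random.choice(options)
    return "".join(sample(s, depth + 1, max_depth) for s in rule)

random.seed(0)
for _ in range(3):
    print("dx/dt =", sample())
```

Each derivation is a discrete object that can be embedded into a latent space; LGF additionally shapes that space so that equations with similar dynamics land nearby.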
Cyrus Zhou, Yufei Jin, Yilin Xu, Yu-Chiang Wang, Chieh-Ju Chao, Monica S. Lam
Clinical trials are central to evidence-based medicine, yet many struggle to meet enrollment targets, despite the availability of over half a million trials listed on ClinicalTrials.gov, which attracts approximately two million users monthly. Existing retrieval techniques, largely based on keyword and embedding-similarity matching between patient profiles and eligibility criteria, often struggle with low recall, low precision, and limited interpretability due to complex constraints. We propose SatIR, a scalable clinical trial retrieval method based on constraint satisfaction, enabling high-precision and interpretable matching of patients to relevant trials. Our approach uses formal methods -- Satisfiability Modulo Theories (SMT) and relational algebra -- to efficiently represent and match key constraints from clinical trials and patient records. Beyond leveraging established medical ontologies and conceptual models, we use Large Language Models (LLMs) to convert informal reasoning regarding ambiguity, implicit clinical assumptions, and incomplete patient records into explicit, precise, controllable, and interpretable formal constraints. Evaluated on 59 patients and 3,621 trials, SatIR outperforms TrialGPT on all three retrieval objectives evaluated. It retrieves 32%-72% more relevant-and-eligible trials per patient, improves recall over the union of useful trials by 22-38 points, and serves more patients with at least one useful trial. Retrieval is fast, requiring 2.95 seconds per patient over 3,621 trials. These results show that SatIR is scalable, effective, and interpretable.
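The constraint-satisfaction view is easy to make concrete (the criteria, field names, and thresholds below are invented for illustration; they are not SatIR's actual encoding):

```python
# Matching a structured patient record against SMT-encoded eligibility.
from z3 import Real, Bool, Solver, And, Not, sat

age, a1c = Real("age"), Real("hba1c")
on_insulin = Bool("on_insulin")

# hypothetical eligibility criteria of a diabetes trial
eligibility = And(age >= 18, age <= 75, a1c >= 7.0, a1c <= 10.5,
                  Not(on_insulin))

def matches(patient):
    s = Solver()
    s.add(eligibility)
    s.add(age == patient["age"], a1c == patient["hba1c"],
          on_insulin == patient["on_insulin"])
    return s.check() == sat

print(matches({"age": 54, "hba1c": 8.2, "on_insulin": False}))  # True
print(matches({"age": 54, "hba1c": 8.2, "on_insulin": True}))   # False
```

In such an encoding, an unsatisfiable core can pinpoint exactly which criterion excluded a patient, one route to the interpretability emphasized above.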
Yi-Shuai Niu, Shing-Tung Yau
Polylab is a MATLAB toolbox for multivariate polynomial scalars and polynomial matrices with a unified symbolic-numeric interface across CPU- and GPU-oriented backends. The software exposes three aligned classes: MPOLY for CPU execution, MPOLY_GPU as a legacy GPU baseline, and MPOLY_HP as an improved GPU-oriented implementation. Across these backends, Polylab supports polynomial construction, algebraic manipulation, simplification, matrix operations, differentiation, Jacobian and Hessian construction, LaTeX export, CPU-side LaTeX reconstruction, backend conversion, and interoperability with YALMIP and SOSTOOLS. Versions 3.0 and 3.1 add two practically important extensions: explicit variable identity and naming for safe mixed-variable expression handling, and affine-normal direction computation via automatic differentiation, MF-logDet-Exact, and MF-logDet-Stochastic. The toolbox has already been used successfully in prior research applications, and Polylab Version 3.1 adds a new geometry-oriented computational layer on top of a mature polynomial modeling core. This article documents the architecture and user-facing interface of the software, organizes its functionality by workflow, presents representative MATLAB sessions with actual outputs, and reports reproducible benchmarks. The results show that MPOLY is the right default for lightweight interactive workloads, whereas MPOLY_HP becomes advantageous for reduction-heavy simplification and medium-to-large affine-normal computation; the stochastic log-determinant variant becomes attractive in larger sparse regimes under approximation-oriented parameter choices.
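For orientation, the kind of sparse multivariate representation such a toolbox is typically built on can be sketched in a few lines (Python here for illustration; Polylab itself is MATLAB, and its internal layout may differ):

```python
# Multivariate polynomials as {exponent tuple: coefficient} dictionaries.
from collections import defaultdict

def poly_mul(p, q):
    r = defaultdict(float)
    for ea, ca in p.items():
        for eb, cb in q.items():
            r[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return dict(r)

def poly_diff(p, var):
    # partial derivative with respect to variable index `var`
    return {tuple(e - (i == var) for i, e in enumerate(exp)): c * exp[var]
            for exp, c in p.items() if exp[var] > 0}

# p = 3*x^2*y + 2*y, with exponent tuples (deg_x, deg_y)
p = {(2, 1): 3.0, (0, 1): 2.0}
print(poly_mul(p, p))    # p^2
print(poly_diff(p, 0))   # dp/dx = 6*x*y -> {(1, 1): 6.0}
```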
Jan A Bergstra, John V Tucker
We contemplate the notion of ambiguity in mathematical discourse. We consider a general method of resolving ambiguity and semantic options for sustaining a resolution. The general discussion is applied to the case of `fraction' which is ill-defined and ambiguous in the literature of elementary arithmetic. In order to clarify the use of `fraction' we introduce several new terms to designate some of its possible meanings. For example, to distinguish structural aspects we use `fracterm', to distinguish purely numerical aspects `fracvalue' and, to distinguish purely textual aspects `fracsign' and `fracsign occurrence'. These interpretations can resolve ambiguity, and we discuss the resolution by using such precise notions in fragments of arithmetical discourse. We propose that fraction does not qualify as a mathematical concept but that the term functions as a collective for several concepts, which we simply call a `category'. This analysis of fraction leads us to consider the notion of number in relation to fracvalue. We introduce a way of specifying number systems, and compare the analytical concepts with those of structuralism.
Kosei Fushimi, Kazunobu Serizawa, Junya Ikemoto, Kazumune Hashimoto
Signal Temporal Logic (STL) is widely used to specify timed and safety-critical tasks for cyber-physical systems, but writing STL formulas directly is difficult for non-expert users. Natural language (NL) provides a convenient interface, yet its inherent structural ambiguity makes one-to-one translation into STL unreliable. In this paper, we propose an \textit{ambiguity-preserving} method for translating NL task descriptions into STL candidate formulas. The key idea is to retain multiple plausible syntactic analyses instead of forcing a single interpretation at the parsing stage. To this end, we develop a three-stage pipeline based on Combinatory Categorial Grammar (CCG): ambiguity-preserving $n$-best parsing, STL-oriented template-based semantic composition, and canonicalization with score aggregation. The proposed method outputs a deduplicated set of STL candidates with plausibility scores, thereby explicitly representing multiple possible formal interpretations of an ambiguous instruction. In contrast to existing one-best NL-to-logic translation methods, the proposed approach is designed to preserve attachment and scope ambiguity. Case studies on representative task descriptions demonstrate that the method generates multiple STL candidates for genuinely ambiguous inputs while collapsing unambiguous or canonically equivalent derivations to a single STL formula.
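The final canonicalization-and-aggregation stage can be sketched as follows (formulas, scores, and the toy normal form are invented for illustration):

```python
# Collapse canonically equivalent STL derivations and aggregate their scores.
from collections import defaultdict

def canonicalize(formula):
    # toy normal form: sort top-level commutative conjuncts; leave
    # parenthesized formulas untouched in this sketch
    if "(" in formula:
        return formula
    return " & ".join(sorted(p.strip() for p in formula.split("&")))

# n-best parses of "visit A and B within 5 seconds" with parser scores;
# the first two derivations differ only in conjunct order
nbest = [("F[0,5]A & F[0,5]B", 0.42),
         ("F[0,5]B & F[0,5]A", 0.31),
         ("F[0,5](A & B)", 0.27)]   # scope-ambiguous reading

scores = defaultdict(float)
for formula, score in nbest:
    scores[canonicalize(formula)] += score

for formula, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {formula}")
# 0.73  F[0,5]A & F[0,5]B     (two derivations collapsed into one)
# 0.27  F[0,5](A & B)         (genuinely distinct interpretation kept)
```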
Juan Xu, Huilong Lai, Yingying Cheng, Wenqiang Yang, Changbo Chen
In this paper, we report on the largest labelled dataset constructed so far for solving zero-dimensional square nonlinear systems with subdivision-based methods. A brief, non-exhaustive survey with emphasis on the literature from the past two decades is also provided to accompany with the dataset. The value of the dataset has been demonstrated through benchmarking several solvers as well as being used for learning to classify the real roots of nonlinear parametric systems.
Devashish Gaikwad, Wil M. P. van der Aalst, Gyunam Park
Process anomaly detection is an important application of process mining for identifying deviations from the normal behavior of a process. Neural network-based methods have recently been applied to this task, learning directly from event logs without requiring a predefined process model. However, because these methods treat anomaly detection as a purely statistical task, they fail to incorporate human domain knowledge. As a result, rare but conformant traces are often misclassified as anomalies due to their low frequency, which limits the effectiveness of the detection process. Recent developments in the field of neuro-symbolic AI have introduced Logic Tensor Networks (LTN) as a means to integrate symbolic knowledge into neural networks using real-valued logic. In this work, we propose a neuro-symbolic approach that integrates domain knowledge into neural anomaly detection using LTN and Declare constraints. Using autoencoder models as a foundation, we encode Declare constraints as soft logical guiderails within the learning process to distinguish between anomalous and rare but conformant behavior. Evaluations on synthetic and real-world datasets demonstrate that our approach improves F1 scores even when as few as 10 conformant traces exist, and that the choice of Declare constraint, and by extension the human domain knowledge, significantly influences performance gains.
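The soft-guiderail idea is concrete enough to sketch (the fuzzy operators and trace encoding below are illustrative, in the spirit of LTN, not the paper's exact model):

```python
# Declare response(a, b) -- "every a is eventually followed by b" -- as a
# real-valued truth degree over a trace of per-position activity probabilities.
import numpy as np

def response_truth(p_a, p_b):
    T = len(p_a)
    truths = []
    for t in range(T):
        eventually_b = p_b[t + 1:].max() if t + 1 < T else 0.0
        # Lukasiewicz implication: a(t) => eventually b
        truths.append(min(1.0, 1.0 - p_a[t] + eventually_b))
    return min(truths)   # universal quantifier as minimum

conformant = response_truth(np.array([0.9, 0.0, 0.0]),
                            np.array([0.0, 0.1, 0.9]))
violating = response_truth(np.array([0.0, 0.0, 0.9]),
                           np.array([0.9, 0.0, 0.0]))
print(conformant, violating)   # 1.0 vs. 0.1
```

Adding `1 - response_truth(...)` to the autoencoder's loss penalizes violations while leaving rare but conformant traces unpunished.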
Zimo Yan, Zheng Xie, Chang Liu, Yiqin Lv, Runfan Duan
Positional encoding has become a standard component in graph learning, especially for graph Transformers and other models that must distinguish structurally similar nodes, yet its fundamental identifiability remains poorly understood. In this work, we study node localization under a hybrid positional encoding that combines anchor-distance profiles with quantized low-frequency spectral features. We cast localization as an observation-map problem whose difficulty is controlled by the number of distinct codes induced by the encoding and establish an information-theoretic converse identifying an impossibility regime jointly governed by the anchor number, spectral dimension, and quantization level. Experiments further support this picture: on random $3$-regular graphs, the empirical crossover is well organized by the predicted scaling, while on two real-world DDI graphs identifiability is strongly graph-dependent, with DrugBank remaining highly redundant under the tested encodings and the Decagon-derived graph becoming nearly injective under sufficiently rich spectral information. Overall, these results suggest that positional encoding should be understood not merely as a heuristic architectural component, but as a graph-dependent structural resolution mechanism.
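The counting quantity behind the converse is simple to probe empirically (a minimal sketch with illustrative parameters):

```python
# Count distinct anchor-distance codes on a random 3-regular graph; if the
# number of codes is below n, the positional encoding cannot localize all
# nodes, whatever the downstream model does.
import random
import networkx as nx

def distinct_codes(n, num_anchors, seed=0):
    random.seed(seed)
    G = nx.random_regular_graph(3, n, seed=seed)
    anchors = random.sample(list(G.nodes), num_anchors)
    dists = [nx.single_source_shortest_path_length(G, a) for a in anchors]
    return len({tuple(d[v] for d in dists) for v in G.nodes})

n = 512
for m in (1, 2, 4, 8):
    print(f"{m} anchors: {distinct_codes(n, m)} distinct codes / {n} nodes")
```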
Alexis Kafantaris
The proposed architecture is a mixture of experts, which allows model entities, such as the causal relationships, to be further parameterized. More specifically, an attempt is made to exploit a neural net, as implementing neurons poses a great challenge for this dataset: a simple and fast linear model based on the Pearson coefficient usually achieves good scores, an aggressive baseline that only a really good model can overcome. Moreover, there are major limitations when it comes to causal discovery from observational data. Unlike the Sachs study, this work did not use interventions but only prior knowledge; the most prohibitive limitation, that of the data itself, is addressed. Thereafter, the method and the model are described, and after that the results are presented.
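The baseline the text refers to is a few lines long (data and threshold below are illustrative):

```python
# Pearson-coefficient baseline for causal discovery from observational data:
# score each variable pair by absolute correlation and threshold.
import numpy as np

def pearson_edges(X, threshold=0.3):
    C = np.abs(np.corrcoef(X, rowvar=False))   # X: samples x variables
    np.fill_diagonal(C, 0.0)
    return [(i, j, round(C[i, j], 2)) for i in range(C.shape[0])
            for j in range(i + 1, C.shape[1]) if C[i, j] > threshold]

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = 0.8 * a + rng.normal(size=1000)   # ground truth: a -> b
c = rng.normal(size=1000)             # independent
print(pearson_edges(np.column_stack([a, b, c])))   # [(0, 1, ...)] only
```

The recovered edge is undirected, which is precisely why observational data alone needs prior knowledge (or interventions) to orient causal relationships.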
Andrzej Odrzywołek
A single two-input gate suffices for all of Boolean logic in digital hardware. No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations. Here I show that a single binary operator, eml(x,y)=exp(x)-ln(y), together with the constant 1, generates the standard repertoire of a scientific calculator. This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation, as well as the usual transcendental and algebraic functions. For example, exp(x)=eml(x,1), ln(x)=eml(1,eml(eml(1,x),1)), and likewise for all other operations. That such an operator exists was not anticipated; I found it by systematic exhaustive search and established constructively that it suffices for the concrete scientific-calculator basis. In EML (Exp-Minus-Log) form, every such expression becomes a binary tree of identical nodes, yielding a grammar as simple as S -> 1 | eml(S,S). This uniform structure also enables gradient-based symbolic regression: using EML trees as trainable circuits with standard optimizers (Adam), I demonstrate the feasibility of exact recovery of closed-form elementary functions from numerical data at shallow tree depths up to 4. The same architecture can fit arbitrary data, but when the generating law is elementary, it may recover the exact formula.
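The stated identities are mechanically checkable; a minimal numerical verification (complex-aware, since intermediate values may leave the reals):

```python
# Verify exp(x) = eml(x, 1) and ln(x) = eml(1, eml(eml(1, x), 1)).
import cmath

def eml(x, y):
    return cmath.exp(x) - cmath.log(y)

def eml_ln(x):
    # eml(1, x) = e - ln x;  eml(e - ln x, 1) = e^e / x;
    # eml(1, e^e / x) = e - ln(e^e / x) = ln x.
    return eml(1, eml(eml(1, x), 1))

for x in (0.5, 2.0, 10.0):
    assert abs(eml(x, 1) - cmath.exp(x)) < 1e-12
    assert abs(eml_ln(x) - cmath.log(x)) < 1e-12
print("identities verified")
```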
Peihan Ye, Alfreds Lapkovskis, Alaa Saleh, Qiyang Zhang, Praveen Kumar Donta
The computational demands of modern AI services are increasingly shifting execution beyond centralized clouds toward a computing continuum spanning edge and end devices. However, the scale, heterogeneity, and cross-layer dependencies of these environments make resilience difficult to maintain. Existing fault-management methods are often too static, fragmented, or heavy to support timely self-healing, especially under noisy logs and edge resource constraints. To address these limitations, this paper presents NeSy-Edge, a neuro-symbolic framework for trustworthy self-healing in the computing continuum. The framework follows an edge-first design, where a resource-constrained edge node performs local perception and reasoning, while a cloud model is invoked only at the final diagnosis stage. Specifically, NeSy-Edge converts raw runtime logs into structured event representations, builds a prior-constrained sparse symbolic causal graph, and integrates causal evidence with historical troubleshooting knowledge for root-cause analysis and recovery recommendation. We evaluate our work on representative Loghub datasets under multiple levels of semantic noise, considering parsing quality, causal reasoning, end-to-end diagnosis, and edge-side resource usage. The results show that NeSy-Edge remains robust even at the highest noise level, achieving up to 75% root-cause analysis accuracy and 65% end-to-end accuracy while operating within about 1500 MB of local memory.
Hao Xu, Yuntian Chen, Chongqing Kang, Dongxiao Zhang
The global shift towards renewable energy necessitates the development of ultrahigh-voltage (UHV) AC transmission to bridge the gap between remote energy sources and urban demand. While UHV grids offer superior capacity and efficiency, their implementation is often hindered by corona-induced audible noise (AN) and radio interference (RI). Since these emissions must meet strict environmental compliance standards, accurate prediction is vital for the large-scale deployment of UHV infrastructure. Existing engineering practices often rely on empirical laws, in which fixed log-linear structures limit accuracy and extrapolation. Herein, we present a monotonicity-constrained graph symbolic discovery framework, Mono-GraphMD, which uncovers compact, interpretable laws for corona-induced AN and RI. The framework provides mechanistic insight into how nonlinear interactions among the surface gradient, bundle number and diameter govern high-field emissions and enables accurate predictions for both corona-cage data and multicountry real UHV lines with up to 16-bundle conductors. Unlike black-box models, the discovered closed-form laws are highly portable and interpretable, allowing for rapid predictions when applied to various scenarios, thereby facilitating the engineering design process.
João Filipe, Gregor Behnke
Classical planning problems are typically defined using lifted first-order representations, which offer compactness and generality. While most planners ground these representations to simplify reasoning, this can cause an exponential blowup in size. Recent approaches instead operate directly on the lifted level to avoid full grounding. We explore a middle ground between fully lifted and fully grounded planning by introducing three SAT encodings that keep actions lifted while partially grounding predicates. Unlike previous SAT encodings, which scale quadratically with plan length, our approach scales linearly, enabling better performance on longer plans. Empirically, our best encoding outperforms the state of the art in length-optimal planning on hard-to-ground domains.
Jean-Guillaume Dumas, Clément Pernet, Alexandre Sedoglavic
We propose a more accurate variant of an algorithm for multiplying $4\times 4$ matrices using 48 multiplications over any ring containing an inverse of 2. This algorithm has an error bound exponent of only $\log_4\gamma_{\infty,2} \approx 2.386$. It also reaches a better accuracy w.r.t. the max-norm in practice, when compared to previously known such fast algorithms. Furthermore, we propose a straight-line program of this algorithm, giving a leading constant in its complexity bound of $\frac{387}{32}\,n^{2+\log_4 3} + o\!\left(n^{2+\log_4 3}\right)$ operations over any ring containing an inverse of 2. Introduction: An algorithm to multiply two $4\times 4$ complex-valued matrices requiring only 48 non-commutative multiplications was introduced in [16] using a pipeline of large language models orchestrated by an evolutionary coding agent. A matrix multiplication algorithm with that many non-commutative multiplications is denoted by $\langle 4\times 4\times 4 : 48\rangle$ in the sequel. An equivalent variant of the associated tensor decomposition defining this algorithm, but over the rationals (more precisely over any ring containing an inverse of 2), was then given in [8]. Most error analyses of sub-cubic time matrix multiplication algorithms [3, 4, 2, 1, 17] are given in the max-norm setting: bounding the largest output error as a function of the max-norm product of the vectors of input matrix coefficients. In this setting, Strassen's algorithm has shown the best accuracy bound (proven minimal under some assumptions in [2]). In [6, 8], the authors relaxed this setting by shifting the focus to the 2-norm for input and/or output; that allowed them to propose a $\langle 2\times 2\times 2 : 7\rangle$ variant with an improved accuracy bound. Experiments show that this variant performs best even when measuring the max-norm of the error. We present in this note a variant of the recent $\langle 4\times 4\times 4 : 48\rangle$ algorithm over the rationals (again in the same orbit under de Groot isotropies [10]) that is more numerically accurate w.r.t. the max-norm in practice. In particular, our new variant improves on the error bound exponent, from $\log_2\gamma_{\infty,2} \approx 2.577$ to $\log_4\gamma_{\infty,2} \approx 2.386$. Consider the product of an $M \times K$ matrix $A$ by a $K \times N$ matrix $B$. It is computed by an $\langle m, k, n\rangle$ algorithm represented by the matrices $L$, $R$, $P$ applied recursively over $\ell$ levels, and the resulting $m_0 \times k_0$ by $k_0 \times n_0$ products are performed using an algorithm $\beta$. Here $M = m_0 m^{\ell}$, $K = k_0 k^{\ell}$ and $N = n_0 n^{\ell}$. The accuracy bound below uses any (possibly different) $p$-norms and $q$-norms for its left-hand side, $\|\cdot\|_p$, and right-hand side, $\|\cdot\|_q$. The associated dual norms are denoted by $\|\cdot\|_{p^\star}$ and $\|\cdot\|_{q^\star}$ respectively. Note that these are vector norms, hence $\|A\|_p$ for a matrix $A \in \mathbb{R}^{m\times n}$ denotes $\|\mathrm{Vect}(A)\|_p$, i.e. the $p$-norm of the $mn$-dimensional vector of its coefficients, and not a matrix norm.
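For readers tracing the exponent: the stated $2+\log_4 3$ comes from the standard recurrence for a $\langle 4\times 4\times 4 : 48\rangle$ algorithm applied recursively, as the following worked derivation shows.

```latex
T(n) = 48\,T(n/4) + O(n^2)
\;\Longrightarrow\;
T(n) = O\bigl(n^{\log_4 48}\bigr)
     = O\bigl(n^{\log_4 (16\cdot 3)}\bigr)
     = O\bigl(n^{2+\log_4 3}\bigr)
     \approx O\bigl(n^{2.7925}\bigr),
```

consistent with the leading term $\frac{387}{32}\,n^{2+\log_4 3}$ stated above, and strictly below Strassen's $n^{\log_2 7} \approx n^{2.8074}$.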
Md Maruf Hossain, Tim Katzke, Simon Klüttermann, Emmanuel Müller
We propose SYRAN, an unsupervised anomaly detection method based on symbolic regression. Instead of encoding normal patterns in an opaque, high-dimensional model, our method learns an ensemble of human-readable equations that describe symbolic invariants: functions that are approximately constant on normal data. Deviations from these invariants yield anomaly scores, so that the detection logic is interpretable by construction, rather than via post-hoc explanation. Experimental results demonstrate that SYRAN is highly interpretable, providing equations that correspond to known scientific or medical relationships, and maintains strong anomaly detection performance comparable to that of state-of-the-art methods.
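A minimal sketch of the scoring scheme (the invariant here is hand-picked from a known physical law; SYRAN discovers such equations automatically via symbolic regression):

```python
# Anomaly scoring with a symbolic invariant: g = P*V / (n*T) is roughly
# constant (the gas constant R) on normal data; deviation from the fitted
# constant is the anomaly score, readable directly off the equation.
import numpy as np

rng = np.random.default_rng(0)
n = rng.uniform(1, 2, 200)
T = rng.uniform(250, 350, 200)
V = rng.uniform(0.01, 0.05, 200)
P = 8.314 * n * T / V * (1 + 0.01 * rng.normal(size=200))  # normal data

g = P * V / (n * T)              # candidate invariant, approx. constant
c = np.median(g)                 # robust fit of the constant
scores = np.abs(g - c)           # anomaly scores on normal data

anomalous_g = 2 * 8.314          # a sample violating the invariant
print(np.abs(anomalous_g - c) > scores.max())   # True: flagged
```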
Timothy Duff, Kisun Lee
We develop a certified numerical algorithm for computing Galois/monodromy groups of parametrized polynomial systems. Our approach employs certified homotopy path tracking to guarantee the correctness of the monodromy action produced by the algorithm, and builds on previous ``homotopy graph'' frameworks. We conduct extensive experiments with an implementation of this algorithm, which we have used to certify properties of several notable Galois/monodromy groups arising in examples drawn from pure and applied mathematics.
Robin Kouba, Vincent Neiger, Mohab Safey El Din
We provide a new complexity bound for the computation of grevlex Gröbner bases in the generic zero-dimensional case, relying on Moreno-Socías' conjecture. We first formalize a property of regular sequences, which we call the increasing degree property, that implies a well-known folklore consequence. We then derive a new understanding of the selection of pairs in the F4 algorithm based on Moreno-Socías' conjecture. Moreover, we obtain an exact formula for the number of elements of a given degree in the grevlex Gröbner basis, for half of the relevant degrees. Combining these results, we derive a precise complexity formula for the F4 Tracer algorithm, together with its asymptotic behavior when the number of variables tends to infinity. These results yield an improvement over the state-of-the-art complexity bounds by a factor which is exponential in the number of variables.