Zeyang Li, Sunbochen Tang, Navid Azizan
Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty in online RL is the lack of direct samples from the target distribution; instead, the target is an unnormalized Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which utilizes a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that effectively reduce importance sampling variance. We show that existing noise-expectation and gradient-expectation methods are two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and enables the principled combination of Q-value and Q-gradient information to derive an optimal, minimum-variance estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL, and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.
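The posterior-mean view can be illustrated in one dimension. The sketch below assumes a variance-exploding Gaussian corruption $x = a + \sigma\varepsilon$ (the abstract's RFM also covers flow interpolations, which this does not), a scalar action, and a target $p(a) \propto \exp(Q(a)/\alpha)$. Both estimates reuse the same self-normalized importance weights. One averages the sampled offsets (noise-expectation style). The other averages $Q$-gradients via the Stein identity $\mathbb{E}[\nabla_a \log p(a \mid x)] = 0$ (gradient-expectation style). The helper name `posterior_mean_estimates` and the fixed 50/50 blend are hypothetical; this is not the paper's minimum-variance estimator.

```python
import math
import random


def posterior_mean_estimates(x, sigma, alpha, q, dq, n=100_000, seed=0):
    """Estimate E[a | x] for p(a) ∝ exp(q(a)/alpha) and x = a + sigma * eps.

    Hypothetical helper: returns (noise-expectation, gradient-expectation) estimates.
    """
    rng = random.Random(seed)
    # Proposal a ~ N(x, sigma^2): the Gaussian likelihood viewed as a density in a,
    # so the self-normalized importance weights are simply exp(q(a)/alpha).
    samples = [rng.gauss(x, sigma) for _ in range(n)]
    log_w = [q(a) / alpha for a in samples]
    shift = max(log_w)  # subtract the max log-weight for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    z = sum(w)

    # Noise-expectation style: weighted average of the sampled offsets (a - x).
    noise_est = x + sum(wi * (a - x) for wi, a in zip(w, samples)) / z

    # Gradient-expectation style: E[grad log p(a | x)] = 0 implies
    # E[a | x] = x + (sigma^2 / alpha) * E[q'(a) | x].
    grad_est = x + (sigma**2 / alpha) * sum(wi * dq(a) for wi, a in zip(w, samples)) / z
    return noise_est, grad_est


def q(a):
    """Hypothetical quadratic Q-function; its posterior is Gaussian."""
    return -0.5 * (a - 1.0) ** 2


def dq(a):
    return -(a - 1.0)


mu, alpha, sigma, x = 1.0, 1.0, 0.5, 0.0
# Closed-form posterior mean for the quadratic case: precision-weighted average.
exact = (mu / alpha + x / sigma**2) / (1.0 / alpha + 1.0 / sigma**2)

noise_est, grad_est = posterior_mean_estimates(x, sigma, alpha, q, dq)
blend = 0.5 * noise_est + 0.5 * grad_est  # hypothetical fixed mixing weight
print(f"exact={exact:.4f} noise={noise_est:.4f} grad={grad_est:.4f} blend={blend:.4f}")
```

Both estimates target the same posterior mean. The abstract's claim is that a principled combination of the two minimizes variance; the fixed weight of 0.5 here is only a placeholder.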
Akio Kawasaki, Boris Braverman, Edwin Pedrozo-Peñafiel, Chi Shu, Simone Colombo, Zeyang Li, Vladan Vuletić
In most experiments with atoms trapped in optical lattices, the transverse size of the lattice beams is on the order of tens of micrometers, and loading many atoms into smaller optical lattices has not been carefully investigated. We report trapping 1500 $^{171}$Yb atoms in a one-dimensional optical lattice generated by a narrow cavity mode at a distance of 0.14 mm from a mirror surface. In the simplest approach, atoms are loaded from a mirror magneto-optical trap overlapped with the cavity mode, and the loading position can be adjusted by tuning a uniform bias magnetic field. The number of atoms trapped in the optical lattice exhibits two local maxima as a function of lattice depth, with the global maximum at the deeper lattice. These results open a path toward the quantum mechanical manipulation of atoms through strong interaction with a tightly focused light field.
Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang
Safety is a primary concern when applying reinforcement learning to real-world control tasks, especially in the presence of external disturbances. However, existing safe reinforcement learning algorithms rarely account for external disturbances, limiting their applicability and robustness in practice. To address this challenge, this paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances. First, this paper presents a policy iteration scheme to solve for the robust invariant set, i.e., the subset of the safe set consisting of the states from which persistent safety is possible. The key idea is to establish a two-player zero-sum game by leveraging the safety value function in Hamilton-Jacobi reachability analysis, in which the protagonist (i.e., the control inputs) aims to maintain safety and the adversary (i.e., the external disturbances) tries to violate it. This paper proves that the proposed policy iteration algorithm converges monotonically to the maximal robust invariant set. Second, this paper integrates the proposed policy iteration scheme into a constrained reinforcement learning algorithm that simultaneously synthesizes the robust invariant set and uses it for constrained policy optimization. This algorithm addresses both optimality and safety, i.e., it learns a policy that attains high rewards while maintaining safety under worst-case disturbances. Experiments on classic control tasks show that the proposed method achieves zero constraint violations under learned worst-case adversarial disturbances, while the baseline algorithms violate the safety constraints substantially. The proposed method also attains performance comparable to the baselines even in the absence of the adversary.
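The zero-sum safety game can be illustrated on a toy integer-state system. The sketch below swaps in plain value iteration (not the paper's policy iteration) for a Hamilton-Jacobi-style safety value $V(x) = \min\{h(x), \max_u \min_d V(f(x,u,d))\}$, in which the control moves first and the disturbance responds. The dynamics $x' = 2x + u + d$, the safety margin $h(x) = 10 - |x|$, the action sets, and the function name are all hypothetical. States with $V(x) \ge 0$ form the robust invariant set.

```python
def robust_safety_value(h, step, states, controls, disturbances, max_iter=1000):
    """Value iteration for V(x) = min(h(x), max_u min_d V(step(x, u, d)))."""
    V = {x: h(x) for x in states}

    def value(x):
        # Successors outside the grid are treated as absorbing with value h(x).
        return V.get(x, h(x))

    for _ in range(max_iter):
        new_V = {
            x: min(h(x), max(min(value(step(x, u, d)) for d in disturbances) for u in controls))
            for x in states
        }
        if new_V == V:  # values are integers and never increase, so this terminates
            break
        V = new_V
    return V


states = range(-10, 11)
V = robust_safety_value(
    h=lambda x: 10 - abs(x),             # safe set: |x| <= 10
    step=lambda x, u, d: 2 * x + u + d,  # unstable toy dynamics
    states=states,
    controls=range(-2, 3),               # protagonist: u in {-2, ..., 2}
    disturbances=range(-1, 2),           # adversary: d in {-1, 0, 1}
)
robust_invariant_set = sorted(x for x in states if V[x] >= 0)
print(robust_invariant_set)  # [-1, 0, 1]
```

From any $|x| \ge 2$ the adversary can push the state outward faster than the bounded control can pull it back, so only $\{-1, 0, 1\}$ remains robustly safe, which is a strict subset of the safe set $|x| \le 10$.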
Zeyang Li, Navid Azizan
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks, demonstrating impressive performance and scalability. However, deploying MARL agents in real-world applications presents critical safety challenges. Current safe MARL algorithms are largely based on the constrained Markov decision process (CMDP) framework, which enforces constraints only on discounted cumulative costs and lacks an all-time safety assurance. Moreover, these methods often overlook the feasibility issue (the system will inevitably violate state constraints within certain regions of the constraint set), resulting in either suboptimal performance or increased constraint violations. To address these challenges, we propose a novel theoretical framework for safe MARL with $\textit{state-wise}$ constraints, where safety requirements are enforced at every state the agents visit. To resolve the feasibility issue, we leverage a control-theoretic notion of the feasible region, the controlled invariant set (CIS), characterized by the safety value function. We develop a multi-agent method for identifying CISs, ensuring convergence to a Nash equilibrium on the safety value function. By incorporating CIS identification into the learning process, we introduce a multi-agent dual policy iteration algorithm that guarantees convergence to a generalized Nash equilibrium in state-wise constrained cooperative Markov games, achieving an optimal balance between feasibility and performance. Furthermore, for practical deployment in complex high-dimensional systems, we propose $\textit{Multi-Agent Dual Actor-Critic}$ (MADAC), a safe MARL algorithm that approximates the proposed iteration scheme within the deep RL paradigm. Empirical evaluations on safe MARL benchmarks demonstrate that MADAC consistently outperforms existing methods, delivering much higher rewards while reducing constraint violations.
Xin Wei, Zeyang Li, Abhishek V. Karve, Adam L. Shaw, David I. Schuster, Jonathan Simon
Jan 13, 2026 · quant-ph
Rapid and programmable shaping of light fields is central to modern microscopy, display technologies, optical communications and sensing, quantum engineering, and quantum information processing. Current wavefront shaping technologies face a fundamental dichotomy: spatial light modulators (SLMs) offer high pixel count but suffer from low refresh rates, while acousto-optic deflectors (AODs) provide moderate speed with restricted optical beam geometries. Though recent advances in photonic integrated circuits achieve fast switching, there is currently no tool that provides MHz-rate, continuous motion, and arbitrarily reconfigurable control over a set of diffraction-limited spots. Here we introduce a new class of spatial light modulator that provides both 2D pixel geometry and high speed. The device operates by encoding spatial information in frequency bins via a broadband optical phase modulator, and decoding them via a first-of-its-kind, high-resolution 2D spectrometer. The spectrometer, based on the architecture which we call the Re-Imaging Phased Array (RIPA), achieves its sensitivity through long path-lengths, enabled by intra-spectrometer re-imaging lens-guides. We demonstrate site-resolved optical pulsing with a 44(1) ns rise time, corresponding to frame rates exceeding 10 million frames per second, as well as arbitrary, reconfigurable 2D addressing and multi-site operations, including asynchronous, independent beam motion, splitting, and recombination. Leveraging these tools opens new horizons in rapid optical manipulation of matter across science, from fast, scalable control that approaches the inertial and radiation limits of atoms in quantum processors, to dynamically programmable, microsecond-resolved illumination in microscopy and neuro-biological imaging.
Zeyang Li, Kaveh Alim, Navid Azizan
Diffusion and flow-matching have emerged as powerful methodologies for generative modeling, with remarkable success in capturing complex data distributions and enabling flexible guidance at inference time. Many downstream applications, however, demand enforcing hard constraints on generated samples (for example, robot trajectories must avoid obstacles), a requirement that goes beyond simple guidance. Prevailing projection-based approaches constrain the entire sampling path to the constraint manifold, which is overly restrictive and degrades sample quality. In this paper, we introduce a novel framework that reformulates hard-constrained sampling as a trajectory optimization problem. Our key insight is to leverage numerical optimal control to steer the sampling trajectory so that constraints are satisfied precisely at the terminal time. By exploiting the underlying structure of flow-matching models and adopting techniques from model predictive control, we transform this otherwise complex constrained optimization problem into a tractable surrogate that can be solved efficiently and effectively. Furthermore, this trajectory optimization perspective offers significant flexibility beyond mere constraint satisfaction, allowing for the inclusion of integral costs to minimize distribution shift and terminal objectives to further enhance sample quality, all within a unified framework. We provide a control-theoretic analysis of our method, establishing bounds on the approximation error between our tractable surrogate and the ideal formulation. Extensive experiments across diverse domains, including robotics (planning), partial differential equations (boundary control), and vision (text-guided image editing), demonstrate that our algorithm, which we name $\textit{HardFlow}$, substantially outperforms existing methods in both constraint satisfaction and sample quality.
Pai Peng, Zeyang Li, Haoxiong Yan, Ken Xuan Wei, Paola Cappellaro
Many-body localization (MBL), characterized by the absence of thermalization and the violation of conventional thermodynamics, has elicited much interest both as a fundamental physical phenomenon and for practical applications in quantum information. A phenomenological model, which describes the system using a complete set of local integrals of motion (LIOMs), provides a powerful tool to understand MBL, but can usually only be computed approximately. Here we explicitly compute a complete set of LIOMs with a non-perturbative approach, by maximizing the overlap between LIOMs and physical spin operators in real space. The resulting LIOMs exhibit the desired exponential decay of their weight in real space. This construction provides a direct mapping from the real-space Hamiltonian to the phenomenological model, and thus enables the study of the localized Hamiltonian and the system dynamics. This allows us to study and compare the localization lengths extracted from the LIOM weights, their interactions, and dephasing dynamics, revealing interesting aspects of many-body localization. Our scheme is immune to accidental resonances and can be applied even at the phase transition point, providing a novel tool to study the microscopic features of the phenomenological model of MBL.
Edwin Pedrozo-Peñafiel, Simone Colombo, Chi Shu, Albert F. Adiyatullin, Zeyang Li, Enrique Mendez, Boris Braverman, Akio Kawasaki, Daisuke Akamatsu, Yanhong Xiao, Vladan Vuletić
Jun 12, 2020 · quant-ph
State-of-the-art atomic clocks are based on the precise detection of the energy difference between two atomic levels, measured as a quantum phase accumulated in a given time interval. Optical-lattice clocks (OLCs) now operate at or near the standard quantum limit (SQL) that arises from the quantum noise associated with discrete measurement outcomes. While performance beyond the SQL has been achieved in microwave clocks and other atomic sensors by engineering quantum correlations (entanglement) between the atoms, the generation of entanglement on an optical-clock transition and the operation of such a clock beyond the SQL are major goals in quantum metrology that had not been demonstrated before. Here we report the creation of a many-atom entangled state on an optical transition, and demonstrate an OLC with an Allan deviation below the SQL. We report a metrological gain of $4.4^{+0.6}_{-0.4}$ dB over the SQL using an ensemble of a few hundred $^{171}$Yb atoms, allowing us to reach a given stability $2.8 \pm 0.3$ times faster than the same clock operated at the SQL. Our results should be readily applicable to other systems, thus enabling further advances in timekeeping precision and accuracy. Entanglement-enhanced OLCs will have many scientific and technological applications, including precision tests of the fundamental laws of physics, geodesy, and gravitational-wave detection.
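The quoted speed-up follows directly from the metrological gain. Assuming the gain is a variance ratio expressed in decibels and that the Allan variance averages down as $1/\tau$, the averaging time needed for a given stability shrinks by the linear factor $10^{G/10}$. For $G = 4.4$ dB this gives about 2.75, consistent with the reported $2.8 \pm 0.3$. The helper name `db_to_linear` is hypothetical.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a variance ratio quoted in dB to a linear factor."""
    return 10 ** (gain_db / 10)


# Averaging-time reduction for a fixed target stability, assuming sigma_y^2 ∝ 1/tau.
speedup = db_to_linear(4.4)
print(f"{speedup:.2f}")  # 2.75
```

The same conversion maps an 11.8 dB gain to roughly a 15-fold variance reduction.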
Simone Colombo, Edwin Pedrozo-Peñafiel, Albert F. Adiyatullin, Zeyang Li, Enrique Mendez, Chi Shu, Vladan Vuletic
In quantum metrology, entanglement represents a valuable resource that can be used to overcome the Standard Quantum Limit (SQL) that bounds the precision of sensors that operate with independent particles. Measurements beyond the SQL are typically enabled by relatively simple entangled states (squeezed states with Gaussian probability distributions), where quantum noise is redistributed between different quadratures. However, due to both fundamental limitations and the finite measurement resolution achieved in practice, sensors based on squeezed states typically operate far from the true fundamental limit of quantum metrology, the Heisenberg Limit. Here, by implementing an effective time-reversal protocol through a controlled sign change in an optically engineered many-body Hamiltonian, we demonstrate atomic-sensor performance with non-Gaussian states beyond the limitations of spin squeezing, and without the requirement of extreme measurement resolution. Using a system of 350 neutral $^{171}$Yb atoms, this signal amplification through time-reversed interaction (SATIN) protocol achieves the largest sensitivity improvement beyond the SQL ($11.8 \pm 0.5$ dB) demonstrated in any interferometer to date. Furthermore, we demonstrate a precision that improves in proportion to the particle number (Heisenberg scaling), at a fixed distance of 12.6 dB from the Heisenberg Limit. These results pave the way for quantum metrology using complex entangled states, with potential broad impact in science and technology. Potential applications include searches for dark matter and for physics beyond the standard model, tests of the fundamental laws of physics, timekeeping, and geodesy.
Sebastian C. Carrasco, Michael H. Goerz, Zeyang Li, Simone Colombo, Vladan Vuletic, Vladimir S. Malinovsky
We propose a novel scheme for the generation of optimal squeezed states for Ramsey interferometry. The scheme consists of an alternating series of one-axis twisting pulses and rotations, both of which are straightforward to implement experimentally. The resulting states show a metrological gain proportional to the Heisenberg limit. We demonstrate that the Heisenberg scaling is maintained even when placing constraints on the amplitude of the pulses implementing the one-axis twisting and when taking into account realistic losses due to photon scattering.
Sebastián C. Carrasco, Michael H. Goerz, Zeyang Li, Simone Colombo, Vladan Vuletic, Wolfgang P. Schleich, Vladimir S. Malinovsky
We propose a novel method for generating Schrödinger-cat states, defined as equal superpositions of arbitrary coherent states, using a concise sequence of rapid twist-and-turn pulses. We demonstrate that the required shearing strength for the protocol, which scales linearly with time, decreases with increasing number of atoms ($N$) in proportion to $1/\sqrt{N}$. The resulting states exhibit optimal quantum Fisher information, making them ideal for surpassing the classical limit of phase sensitivity in quantum metrology applications. Notably, our protocol is compatible with a time-reversal strategy for quantum metrology, ensuring its practical viability. Furthermore, we demonstrate that Heisenberg-limit scaling remains intact even when the twisting strength is reduced in tandem with increasing atom number, thereby mitigating realistic losses such as photon scattering.
Jun Wang, Zhao-Yu Han, Song-Bo Wang, Zeyang Li, Liang-Zhu Mu, Heng Fan, Lei Wang
We propose a quantum tomography scheme for pure qudit systems that adopts random-basis measurements and generative learning methods, along with a built-in fidelity estimation approach to assess the reliability of the tomographic states. We prove the validity of the scheme theoretically, and we perform numerically simulated experiments on several target states, including three typical quantum information states and randomly initialized states, demonstrating its efficiency and robustness. The number of replicas required to meet a given convergence criterion grows as a low-degree polynomial in the system size, so the scheme achieves the high scalability that is crucial for practical quantum state tomography.
Zeyang Li, Abhishek V. Karve, Xin Wei, Jonathan Simon
Filters with flat-top pass-bands are a key enabling technology for signal processing. From communication to sensing, the ability to choose a pass band, rather than a single pass frequency, while still efficiently suppressing backgrounds at other frequencies, is a critical capability for ensuring both detection sensitivity and power efficiency. Efficient transmission of a single frequency can be achieved by a single-pole resonator, which in optics is a Fabry-Pérot cavity offering linewidths from kHz to GHz and beyond. Coupling multiple resonators allows for the construction of flat-top multi-pole filters. These are straightforward to build from RF to THz, where resonators are macroscopic and tunable, but are more difficult to control in the optical band, where they are typically realized with dielectric stacks whose passband widths exceed 100 GHz. Here, we bridge the gap to narrower-bandwidth flat-top filters by proposing and implementing a second-order Butterworth-type optical filter in a single two-mirror Fabry-Pérot cavity, by coupling its two polarization modes. We demonstrate a pass-band width of 2.68(1) GHz, a maximum stop-band suppression of 43 dB, and a pass-band insertion loss of 2.2(1) dB, with out-of-band transmitted power falling off as the inverse fourth power of the detuning. This approach is viable down to much narrower filters, and has the potential to improve the high-frequency phase noise performance of lasers, enhance the sensitivity of LIDARs, and provide higher-quality narrowband filtering, for example, for Raman spectroscopy.
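The fourth-power roll-off is the signature of a second-order Butterworth response. A minimal sketch of the idealized power transmission $|T(\Delta)|^2 = 1/[1 + (\Delta/\Delta_c)^4]$ follows. It assumes the quoted 2.68 GHz is the full width at half power (so $\Delta_c = 1.34$ GHz) and ignores insertion loss and the finite stop-band floor; it is a textbook model, not the measured transfer function, and the function names are hypothetical.

```python
import math


def butterworth2_transmission(detuning_ghz: float, half_width_ghz: float = 1.34) -> float:
    """Idealized second-order Butterworth power transmission |T|^2 (lossless)."""
    return 1.0 / (1.0 + (detuning_ghz / half_width_ghz) ** 4)


def suppression_db(detuning_ghz: float, half_width_ghz: float = 1.34) -> float:
    """Out-of-band power suppression in dB relative to the passband peak."""
    return -10.0 * math.log10(butterworth2_transmission(detuning_ghz, half_width_ghz))


print(butterworth2_transmission(1.34))  # 0.5 at the band edge
# Far outside the passband, doubling the detuning adds about 40*log10(2) ≈ 12 dB.
print(round(suppression_db(20.0) - suppression_db(10.0), 1))
```

A single-pole (Lorentzian) resonator would gain only about 6 dB per doubling of the detuning, which is why the second-order response suppresses out-of-band light much faster for the same pass-band width.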
Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li
Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can degrade task performance or compromise safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or focus only on robustness against performance adversaries (e.g., robust RL). Learning a single policy that is both safe and robust remains a challenging open problem. The difficulty lies in tackling two intertwined aspects in the worst case: feasibility and optimality. Optimality is only valid inside a feasible region, while identifying the maximal feasible region relies on learning the optimal policy. To address this issue, we propose a systematic framework that unifies safe RL and robust RL, including problem formulation, iteration scheme, convergence analysis, and practical algorithm design. This unification is built upon constrained two-player zero-sum Markov games. We propose a dual policy iteration scheme that simultaneously optimizes a task policy and a safety policy, and we prove its convergence. Furthermore, we design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC). Evaluations on safety-critical benchmarks demonstrate that DRAC achieves high performance and persistent safety in all scenarios (no adversary, safety adversary, performance adversary), significantly outperforming all baselines.
Chi Shu, Simone Colombo, Zeyang Li, Albert Adiyatullin, Enrique Mendez, Edwin Pedrozo-Peñafiel, Vladan Vuletić
The strong coupling of atoms to optical cavities can improve optical lattice clocks, as the cavity enables metrologically useful collective atomic entanglement and high-fidelity measurement. To this end, it is necessary to cool the ensemble to suppress motional broadening, and advantageous to maximize and homogenize the atom-cavity coupling. We demonstrate resolved Raman sideband cooling via the cavity as a method that can achieve both goals simultaneously. In 200 ms, we cool $^{171}$Yb atoms to a mean vibrational quantum number $\langle n_x \rangle = 0.23(7)$ in the tightly binding direction, resulting in 93% optical π-pulse fidelity on the $^1S_0 \rightarrow {}^3P_0$ clock transition. During cooling, the atoms self-organize into locations with maximal atom-cavity coupling, which will improve quantum metrology applications.
Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li
Regularization is one of the most important techniques in reinforcement learning algorithms. The well-known soft actor-critic algorithm is a special case of regularized policy iteration where the regularizer is chosen as Shannon entropy. Despite some empirical success of regularized policy iteration, its theoretical underpinnings remain unclear. This paper proves that regularized policy iteration is strictly equivalent to the standard Newton-Raphson method when the Bellman equation is smoothed with strongly convex functions. This equivalence lays the foundation for a unified analysis of both the global and local convergence behavior of regularized policy iteration. We prove that regularized policy iteration converges globally at a linear rate equal to $\gamma$ (the discount factor). Furthermore, the algorithm converges quadratically once it enters a local region around the optimal value. We also show that a modified version of regularized policy iteration, i.e., with finite-step policy evaluation, is equivalent to the inexact Newton method, in which the Newton iteration formula is solved with truncated iterations. We prove that the associated algorithm achieves an asymptotic linear convergence rate of $\gamma^M$, where $M$ denotes the number of steps carried out in policy evaluation. Our results take a solid step towards a better understanding of the convergence properties of regularized policy iteration algorithms.
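Entropy-regularized policy iteration is short enough to run directly. The sketch below uses a hypothetical 3-state, 2-action MDP with temperature $\tau$. Each iteration evaluates the current softmax policy (approximated here by many Bellman backups rather than an exact linear solve), then improves it as $\pi(a \mid s) \propto \exp(Q(s,a)/\tau)$. The printed residual of the soft Bellman optimality equation $V(s) = \tau \log \sum_a \exp(Q(s,a)/\tau)$ is expected to shrink rapidly, in line with the local quadratic convergence described above. The MDP data, constants, and function names are illustrative assumptions.

```python
import math

GAMMA, TAU = 0.9, 0.5
# Hypothetical MDP: P[s][a] is a next-state distribution, R[s][a] a reward.
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.0, 0.9]],
    [[0.0, 0.9, 0.1], [0.5, 0.5, 0.0]],
    [[0.3, 0.0, 0.7], [0.0, 0.2, 0.8]],
]
R = [[1.0, 0.0], [0.0, 2.0], [0.5, -1.0]]
S, A = len(R), len(R[0])


def q_values(V):
    """One-step lookahead Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')."""
    return [[R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V)) for a in range(A)]
            for s in range(S)]


def evaluate(pi, sweeps=1000):
    """Entropy-regularized policy evaluation by repeated Bellman backups."""
    V = [0.0] * S
    for _ in range(sweeps):
        Q = q_values(V)
        V = [sum(pi[s][a] * (Q[s][a] - TAU * math.log(pi[s][a])) for a in range(A))
             for s in range(S)]
    return V


def improve(V):
    """Soft-greedy improvement: pi(a|s) ∝ exp(Q(s, a) / tau)."""
    pi = []
    for row in q_values(V):
        m = max(row)  # shift by the max for numerical stability
        w = [math.exp((q - m) / TAU) for q in row]
        z = sum(w)
        pi.append([wi / z for wi in w])
    return pi


def soft_bellman_residual(V):
    """max_s |V(s) - tau * log sum_a exp(Q(s, a) / tau)|."""
    res = 0.0
    for s, row in enumerate(q_values(V)):
        m = max(row)
        target = m + TAU * math.log(sum(math.exp((q - m) / TAU) for q in row))
        res = max(res, abs(V[s] - target))
    return res


pi = [[1.0 / A] * A for _ in range(S)]  # start from the uniform policy
history = []
for k in range(15):
    V = evaluate(pi)                       # policy evaluation
    history.append(soft_bellman_residual(V))
    print(k, f"{history[-1]:.2e}")
    pi = improve(V)                        # soft-greedy policy improvement
```

Replacing `evaluate` with a fixed, small number of sweeps gives the finite-step variant that the abstract relates to the inexact Newton method.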
Akio Kawasaki, Boris Braverman, Edwin Pedrozo-Peñafiel, Chi Shu, Simone Colombo, Zeyang Li, Özge Özel, Wenlan Chen, Leonardo Salvi, André Heinz, David Levonian, Daisuke Akamatsu, Yanhong Xiao, Vladan Vuletić
Optical cavities are widely used to enhance the interaction between atoms and light. Typical designs using a geometrically symmetric structure in the near-concentric regime face a tradeoff between mechanical stability and high single-atom cooperativity. To overcome this limitation, we design and implement a geometrically asymmetric standing-wave cavity. This structure, with mirrors of very different radii of curvature, allows strong atom-light coupling while exhibiting good stability against misalignment. We observe effective cooperativities ranging from $\eta_{\rm eff}=10$ to $\eta_{\rm eff}=0.2$ by shifting the location of the atoms in the cavity mode. By loading $^{171}$Yb atoms directly from a mirror magneto-optical trap into a one-dimensional optical lattice along the cavity mode, we produce atomic ensembles with collective cooperativities up to $N\eta=2\times 10^4$. This system opens a path toward spin squeezing for an optical lattice clock and toward a range of nonclassical collective states.
Zeyang Li, Boris Braverman, Simone Colombo, Chi Shu, Akio Kawasaki, Albert Adiyatullin, Edwin Pedrozo-Peñafiel, Enrique Mendez, Vladan Vuletić
Jun 22, 2021 · quant-ph
The interaction between an atomic ensemble and a light mode in a high-finesse optical cavity can easily reach the strong-coupling regime, where quantum effects dominate. In this regime, the interaction can be used to generate both atom-light and atom-atom entanglement. We analyze the dominant effects on the collective atomic state and the light field, and derive a unified approach that can account for atomic entanglement induced both by measurements on the light field, and by ignoring the state of the light field altogether. We present analytical expressions for the entanglement induced by the interaction, and determine the conditions that maximize the entanglement-induced gain over the standard quantum limit in quantum sensors and atomic clocks.
Zeyang Li, Chen Chen, Carlo Fischione
Over-the-air (OTA) computation has emerged as a promising technique for efficiently aggregating data from massive numbers of wireless devices. OTA computation can be performed with analog or digital communications. Analog OTA systems are often constrained by limited function adaptability and their reliance on analog amplitude modulation. Digital OTA systems, on the other hand, may face limitations such as high computational complexity and limited adaptability to varying network configurations. To address these challenges, this paper proposes a novel digital OTA computation system with a channel-aware constellation design for demodulation mappers. The proposed system dynamically adjusts the constellation based on the channel conditions of participating nodes, enabling reliable computation of various functions. By incorporating channel randomness into the constellation design, the system prevents overlap of constellation points, reduces computational complexity, and mitigates excessive transmit power consumption under poor channel conditions. Numerical results demonstrate that the system achieves reliable normalized mean squared error (NMSE) performance across a range of scenarios, offering valuable insights into the choice of signal processing methods and weighting strategies under varying computation point configurations, node counts, and quantization levels. This work advances the state of digital OTA computation by addressing critical challenges in scalability, transmit power consumption, and function adaptability.
Zeyang Li, Simone Colombo, Chi Shu, Gustavo Velez, Saúl Pilatowsky-Cameo, Roman Schmied, Soonwon Choi, Mikhail Lukin, Edwin Pedrozo-Peñafiel, Vladan Vuletić
Dec 24, 2022 · quant-ph
Quantum scrambling describes the spreading of local information into many degrees of freedom in quantum systems. This provides the conceptual connection among diverse phenomena ranging from thermalizing quantum dynamics to models of black holes. Here we experimentally probe the exponential scrambling of a multi-particle system near a bistable point in phase space and utilize it for entanglement-enhanced metrology. We use a time-reversal protocol to observe a simultaneous exponential growth of both the metrological gain and the out-of-time-order correlator, thereby experimentally verifying the relation between quantum metrology and quantum information scrambling. Our experiments demonstrate that fast-scrambling dynamics capable of exponentially fast entanglement generation are useful for practical metrology, resulting in 6.8(4) dB gain beyond the Standard Quantum Limit.