"au:"Jun Zhan"" — arXiv2 Search

Showing 1–20 of 25 results

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yu-Gang Jiang, Xipeng Qiu

Feb 19, 2024·cs.CL·PDF

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the seamless integration of new modalities into LLMs, akin to the incorporation of new languages. We build a multimodal text-centric dataset for multimodal alignment pre-training. Utilizing generative models, we synthesize the first large-scale any-to-any multimodal instruction dataset. It consists of 108k samples of multi-turn conversations that intricately interweave various modalities, thus equipping the model to handle arbitrary combinations of multimodal inputs and outputs. Experimental results demonstrate that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities, proving that discrete representations can effectively and conveniently unify multiple modalities within a language model. Demos are shown in https://junzhan2000.github.io/AnyGPT.github.io/

HFN: Heterogeneous Feature Network for Multivariate Time Series Anomaly Detection

Jun Zhan, Chengkun Wu, Canqun Yang, Qiucheng Miao, Xiandong Ma

Nov 1, 2022·cs.LG·PDF

Network or physical attacks on industrial equipment or computer systems may cause massive losses. Therefore, a quick and accurate anomaly detection (AD) based on monitoring data, especially the multivariate time-series (MTS) data, is of great significance. As the key step of anomaly detection for MTS data, learning the relations among different variables has been explored by many approaches. However, most of the existing approaches do not consider the heterogeneity between variables, that is, different types of variables (continuous numerical variables, discrete categorical variables or hybrid variables) may have different and distinctive edge distributions. In this paper, we propose a novel semi-supervised anomaly detection framework based on a heterogeneous feature network (HFN) for MTS, learning heterogeneous structure information from a mass of unlabeled time-series data to improve the accuracy of anomaly detection, and using attention coefficient to provide an explanation for the detected anomalies. Specifically, we first combine the embedding similarity subgraph generated by sensor embedding and feature value similarity subgraph generated by sensor values to construct a time-series heterogeneous graph, which fully utilizes the rich heterogeneous mutual information among variables. Then, a prediction model containing nodes and channel attentions is jointly optimized to obtain better time-series representations. This approach fuses the state-of-the-art technologies of heterogeneous graph structure learning (HGSL) and representation learning. The experiments on four sensor datasets from real-world applications demonstrate that our approach detects the anomalies more accurately than those baseline approaches, thus providing a basis for the rapid positioning of anomalies.

Loop Current Order on the Kagome Lattice

Jun Zhan, Hendrik Hohmann, Matteo Dürrnagel, Ruiqing Fu, Sen Zhou, Ziqiang Wang, Ronny Thomale, Xianxin Wu, Jiangping Hu

Jun 2, 2025·cond-mat.str-el·PDF

Recent discoveries in kagome materials have unveiled their capacity to harbor exotic quantum states, including intriguing charge density wave (CDW) and superconductivity. Notably, accumulating experimental evidence suggests time-reversal symmetry breaking within the CDW, hinting at the long-pursued loop current order (LCO). Despite extensive research efforts, achieving its model realization and understanding the mechanism through unbiased many-body simulations have remained both elusive and challenging. In this Letter, we develop a microscopic model for LCO on the spinless kagome lattice with nonlocal interactions, utilizing unbiased functional renormalization group calculations to explore ordering tendencies across all two-particle scattering channels. At the Van Hove filling, we identify sublattice interference to suppress onsite CDW order, leaving LCO, charge bond order, and nematic CDW state as the main competitors. Remarkably, a $2\times2$ LCO emerges as the many-body ground state over a significant parameter space with strong second nearest-neighbor repulsion, stemming from the unique interplay between sublattice characters and lattice geometry. The resulting electronic model with LCO bears similarities to the Haldane model and culminates in a quantum anomalous Hall state. We also discuss potential experimental implications for kagome metals.

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions

Jun Zhan, Mingyang Han, Yuxuan Xie, Chen Wang, Dong Zhang, Kexin Huang, Haoxiang Shi, DongXiao Wang, Tengtao Song, Qinyuan Cheng, Shimin Li, Jun Song, Xipeng Qiu, Bo Zheng

Sep 9, 2025·cs.SD·PDF

Spoken language models (SLMs) have emerged as a unified paradigm for speech understanding and generation, enabling natural human-machine interaction. However, while most progress has focused on semantic accuracy and instruction following, the ability of SLMs to adapt their speaking style based on spoken instructions has received limited attention. We introduce Voice Style Adaptation (VSA), a new task that examines whether SLMs can modify their speaking style, such as timbre, prosody, or persona, following natural language spoken commands. To study this task, we present VStyle, a bilingual (Chinese & English) benchmark covering four categories of speech generation: acoustic attributes, natural language instruction, role play, and implicit empathy. We also introduce the Large Audio Language Model as a Judge (LALM as a Judge) framework, which progressively evaluates outputs along textual faithfulness, style adherence, and naturalness, ensuring reproducible and objective assessment. Experiments on commercial systems and open-source SLMs demonstrate that current models face clear limitations in controllable style adaptation, highlighting both the novelty and challenge of this task. By releasing VStyle and its evaluation toolkit, we aim to provide the community with a foundation for advancing human-centered spoken interaction. The dataset and code are publicly available at the project's homepage: https://junzhan2000.github.io/VStyle.github.io/

Detecting pairing symmetry of bilayer nickelates using electronic Raman scattering

Jun Zhan, Matías Bejas, Andreas P. Schnyder, Andrés Greco, Xianxin Wu, Jiangping Hu

Apr 1, 2026·cond-mat.supr-con·PDF

The recent discovery of high-temperature superconductivity in both bulk and thin-film bilayer nickelates La$_3$Ni$_2$O$_7$ has garnered significant attention. However, the corresponding pairing symmetry remains debated in both experiments and theoretical studies due to conflicting experimental evidence from bulk and thin-film materials. In this work, we examine the electronic Raman response across different channels for various pairing symmetries within a two-orbital bilayer model. By comparing Raman susceptibilities obtained from multiorbital and band-additive approaches, we demonstrate that Raman response can distinguish between different pairing symmetries and identify pocket-dependent gap amplitudes for both fully gapped and nodal superconducting states. Specifically, the nodal $d_{x^2-y^2}/d_{xy}$-wave pairing exhibits robust low-energy power-law behavior, distinct from a fully gapped pairing. Additionally, for the $s_{\pm}$-wave pairing, the detailed gap anisotropy on the $β$ pocket can be determined. Possible experimental implications are also discussed. Our results highlight the crucial role of multiorbital effects in shaping the Raman spectra and establish electronic Raman scattering as a powerful and symmetry-resolved probe for determining the superconducting gap in unconventional superconductors.

Aerodynamic Data Predictions Based on Multi-task Learning

Liwei Hu, Yu Xiang, Jun Zhan, Zifang Shi, Wenzheng Wang

Oct 15, 2020·cs.LG·PDF

The quality of datasets is one of the key factors that affect the accuracy of aerodynamic data models. For example, in the uniformly sampled Burgers' dataset, the insufficient high-speed data is overwhelmed by massive low-speed data. Predicting high-speed data is more difficult than predicting low-speed data because the amount of high-speed data is limited, i.e. the quality of the Burgers' dataset is not satisfactory. To improve the quality of datasets, traditional methods usually employ data resampling to produce enough data for the insufficient parts of the original datasets before modeling, which increases computational costs. Recently, mixtures of experts have been used in natural language processing to deal with different parts of sentences, which provides a solution for eliminating the need for data resampling in aerodynamic data modeling. Motivated by this, we propose multi-task learning (MTL), a dataset quality-adaptive learning scheme, which combines task allocation and aerodynamic characteristics learning to disperse the pressure of the entire learning task. The task allocation divides a whole learning task into several independent subtasks, while the aerodynamic characteristics learning learns these subtasks simultaneously to achieve better precision. Two experiments with poor quality datasets are conducted to verify the data quality-adaptivity of the MTL to datasets. The results show that the MTL is more accurate than FCNs and GANs on poor quality datasets.

The $s\pm$ pairing symmetry in the pressured La$_3$Ni$_2$O$_7$ from electron-phonon coupling

Yucong Yin, Jun Zhan, Boyang Liu, Xinloong Han

Feb 28, 2025·cond-mat.supr-con·PDF

The recently discovered bilayer Ruddlesden-Popper nickelate La$_3$Ni$_2$O$_7$ exhibits superconductivity with a remarkable transition temperature $T_c\approx 80$ K under applied pressures above 14.0 GPa. The discovery of this new family of high-temperature superconductors has garnered significant attention in the condensed matter physics community. In this work, we assume that this high-temperature superconductivity is mediated by phonons and investigate the pairing symmetry in two distinct models: (i) the full-coupling case, where the Ni-$d_{x^2-y^2}$ and Ni-$d_{3z^2-r^2}$ orbitals are treated equally in both interlayer and intralayer coupling interactions, and (ii) the half-coupling case, where the intralayer coupling involves only the $d_{x^2-y^2}$ orbital, while the interlayer coupling is restricted to the $d_{3z^2-r^2}$ orbital. Our calculations reveal that the interlayer coupling favors an $s\pm$-wave superconducting state, whereas the intralayer coupling promotes an $s++$-wave symmetry. Additionally, we discuss the implications of pair-hopping interactions on the superconducting properties. These findings provide valuable insights into the pairing mechanisms and symmetry of this newly discovered high-temperature superconductor.

Robust topological superconductivity in spin-orbit coupled systems at higher-order van Hove filling

Xinloong Han, Jun Zhan, Fu-chun Zhang, Jiangping Hu, Xianxin Wu

Feb 9, 2023·cond-mat.supr-con·PDF

Van Hove singularities (VHSs) in proximity to the Fermi level promote electronic interactions and generate diverse competing instabilities. It is also known that a nontrivial Berry phase derived from spin-orbit coupling (SOC) can introduce an intriguing decoration into the interactions and thus alter correlated phenomena. However, it is unclear how and what type of new physics can emerge in a system featured by the interplay between VHSs and the Berry phase. Here, based on a general Rashba model on the square lattice, we comprehensively explore such an interplay and its significant influence on the competing electronic instabilities by performing a parquet renormalization group analysis. Despite the existence of a variety of comparable fluctuations in the particle-particle and particle-hole channels associated with higher-order VHSs, we find that the chiral $p \pm ip$ pairings emerge as two stable fixed trajectories within the generic interaction parameter space, namely the system becomes a robust topological superconductor. The chiral pairings stem from the hopping interaction induced by the nontrivial Berry phase. The possible experimental realization and implications are discussed. Our work sheds new light on the correlated states in quantum materials with strong SOC and offers fresh insights into the exploration of topological superconductivity.

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance

Jiasheng Ye, Peiju Liu, Tianxiang Sun, Jun Zhan, Yunhua Zhou, Xipeng Qiu

Mar 25, 2024·cs.CL·PDF

Pretraining data of large language models comprises multiple domains (e.g., web texts, academic papers, code), whose mixture proportions crucially impact the competence of the resulting models. While existing endeavors rely on heuristics or qualitative strategies to tune the proportions, we discover the quantitative predictability of model performance regarding the mixture proportions in function forms, which we refer to as the data mixing laws. Fitting such functions on sample mixtures unveils model performance on unseen mixtures before actual runs, thus guiding the selection of an ideal data mixture. Furthermore, we propose nested use of the scaling laws of training steps, model sizes, and our data mixing law to enable predicting the performance of large models trained on massive data under various mixtures with only small-scale training. Moreover, experimental results verify that our method effectively optimizes the training mixture of a 1B model trained for 100B tokens in RedPajama, reaching a performance comparable to the one trained for 48% more steps on the default mixture. Extending the application of data mixing laws to continual training accurately predicts the critical mixture proportion that avoids catastrophic forgetting and points to the potential for dynamic data schedules.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu

Dec 8, 2015·cs.CL·PDF

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

RIS-Enhanced Information-Decoupled Symbiotic Radio Over Broadcasting Signals

Shu Cai, Ya-Feng Liu, Jun Zhan, Qi Zhang

Jan 18, 2026·eess.SP·PDF

This paper studies a reconfigurable intelligent surface (RIS)-enhanced decoupled symbiotic radio (SR) system in which a primary transmitter delivers common data to multiple primary receivers (PRs), while a RIS-based backscatter device sends secondary data to a backscatter receiver (BRx). Unlike conventional SR, the BRx performs energy detection and never decodes the primary signal, thereby removing ambiguity and preventing exposure of the primary payload to unintended receivers. In this paper, we formulate the problem as the minimization of the transmit power subject to a common broadcast rate constraint across all PRs and a bit error rate (BER) constraint at the BRx. The problem is nonconvex due to the unit-modulus RIS constraint and coupled quadratic forms. Leveraging a rate-balanced reformulation and a monotonic BER ratio characterization, we develop a low-complexity penalty-based block coordinate descent algorithm with closed-form updates. Numerical results show fast convergence of the proposed algorithm and reduced power consumption of the considered RIS-enhanced information-decoupled SR system over conventional SR baselines.

Cooperation between electron-phonon coupling and electronic interaction in bilayer nickelates La$_3$Ni$_2$O$_7$

Jun Zhan, Yuhao Gu, Xianxin Wu, Jiangping Hu

Apr 4, 2024·cond-mat.supr-con·PDF

The recent observation of high-T$_c$ superconductivity in the bilayer nickelate La$_3$Ni$_2$O$_7$ under pressure has garnered significant interest. While research has predominantly focused on the role of electron-electron interactions in the superconducting mechanism, the impact of electron-phonon coupling (EPC) has remained elusive. In this work, we perform first-principles calculations to study the phonon spectrum and electron-phonon coupling within La$_3$Ni$_2$O$_7$ under pressure and explore the impact of the interplay between EPC and electronic interactions on superconductivity by employing the functional renormalization group approach. Our calculations reveal that EPC alone is insufficient to trigger superconductivity in La$_3$Ni$_2$O$_7$ under pressure. We identify unique out-of-plane and in-plane breathing phonon modes which selectively couple with the Ni $d_{z^2}$ and $d_{x^2-y^2}$ orbitals, showcasing an orbital-selective EPC. Within the bilayer two-orbital model, it is revealed that solely electronic interactions foster $s_{\pm}$-wave pairing characterized by notable frustration in the band space, leading to a low transition temperature. Remarkably, we find that this out-of-plane EPC can act in concert with electronic interactions to promote the onsite and interlayer pairing in the $d_{z^2}$ orbital, partially releasing the pairing frustration and thus elevating T$_c$. In contrast, the inclusion of in-plane EPC only marginally affects the superconductivity, distinct from the cuprates. Potential experimental implications in La$_3$Ni$_2$O$_7$ are also discussed.

Time-Reversal Symmetry Breaking Superconducting State and Collective Modes in Kagome Superconductors

Xinloong Han, Jun Zhan, Jiangping Hu, Fu-chun Zhang, Xianxin Wu

Dec 31, 2025·cond-mat.supr-con·PDF

We comprehensively study the unconventional pairing and collective modes in the multiband kagome superconductors AV$_3$Sb$_5$ (A=$\mathrm{K},\mathrm{Cs},\mathrm{Rb}$). By solving gap equations at zero temperature, we identify a transition from normal $s++/s\pm$-wave pairing to time-reversal symmetry (TRS) breaking pairing with a variation of inter-pocket interactions or density of states. This TRS breaking pairing originates from the superconducting phase frustration of different Fermi pockets and can account for the experimental TRS breaking signal in kagome superconductors. Moreover, we investigate collective modes, including the Higgs, Leggett, and Bogoliubov-Anderson-Goldstone modes, arising from fluctuations of the amplitude, relative phase, and overall phase of the superconducting order parameters, respectively. Remarkably, due to the presence of multibands, one branch of the Leggett modes becomes nearly massless near the TRS breaking transition, providing a compelling smoking-gun signature of TRS-breaking superconductivity, in clear contrast to TRS-breaking charge orders. Our results elucidate the rich superconducting physics and its associated collective modes in kagome metals, and suggest feasible experimental detection of TRS breaking pairing.

Impact of Nonlocal Coulomb Repulsion on Superconductivity and Density-Wave Orders in Bilayer Nickelates

Jun Zhan, Congcong Le, Xianxin Wu, Jiangping Hu

Mar 24, 2025·cond-mat.supr-con·PDF

The recent discovery of high-temperature superconductivity in pressurized bilayer nickelate La$_3$Ni$_2$O$_7$ and its thin films has generated significant interest in uncovering the underlying pairing mechanisms and correlated electronic states. While earlier theoretical studies have mainly focused on onsite Coulomb interactions, the role of nonlocal Coulomb repulsion has remained largely unexplored. In this work, we systematically investigate the effects of nonlocal Coulomb interactions, in the presence of onsite interactions, on both superconducting and density-wave instabilities using the functional renormalization group (FRG) approach. We find that the interlayer intraorbital repulsion suppresses the interlayer intraorbital $s_{\pm}$-wave pairing and spin-density-wave (SDW) order, while promoting a transition to an interlayer interorbital $d_{x^2-y^2}$-wave pairing state and a mirror-symmetry-breaking charge order. Remarkably, the critical scale of the interorbital $d_{x^2-y^2}$-wave superconductivity is significantly lower than that of the intraorbital $s_{\pm}$-wave superconductivity, indicating that the former is unlikely to account for the observed high-$T_c$ superconductivity. Moreover, the interlayer interorbital repulsion suppresses this $d_{x^2-y^2}$-wave pairing but enhances the $s_{\pm}$-wave pairing through strengthened interlayer charge fluctuations. In addition, the intralayer nearest-neighbor repulsion favors an in-plane charge-density-wave (CDW) order with wave vector $(π,π)$. Our findings reveal the profound impact of nonlocal Coulomb repulsion and underscore the robustness of interlayer pairing rooted in the bilayer structure and multi-orbital nature, thereby advancing the understanding of the intricate correlation effects in bilayer nickelates.

Raman response in superconducting multiorbital systems with application to nickelates

Matías Bejas, Jun Zhan, Xianxin Wu, Andreas P. Schnyder, Andrés Greco

Apr 13, 2026·cond-mat.supr-con·PDF

The recent discovery of high-$T_c$ superconductivity in pressurized and thin film nickelates is nowadays one of the most relevant and active topics in solid-state physics. The origin of superconductivity together with the relevance of multiorbital physics are highly discussed issues in this field. Knowledge of the size of the gap and its symmetry is of fundamental interest to uncover the superconducting mechanism at play in the nickelates. Electronic Raman scattering is a powerful tool to investigate the main characteristics of the gap. Here, we investigate the Raman response in the superconducting phase for three different models: Two-orbital models, including $d_{x^2-y^2}$ and $d_{z^2}$ orbitals, with one and two layers; as well as a bilayer model with the $d_{x^2-y^2}$ orbital as the only active one. For each of these models, we consider different pairing symmetries and determine their characteristic fingerprints in the Raman response. For the two-orbital models, we perform full multiorbital calculations including interorbital and intraorbital scattering, and compare the results with those obtained using the additive Raman response where each band is considered separately. Our results should be useful for discussing the minimal model for superconductivity and its pairing symmetry in nickelates. The obtained results and discussions, as well as the presented formalism, are also of general interest for other multiorbital systems.

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

Dong Zhang, Xin Zhang, Jun Zhan, Shimin Li, Yaqian Zhou, Xipeng Qiu

Jan 24, 2024·cs.CL·PDF

Benefiting from effective speech modeling, current Speech Large Language Models (SLLMs) have demonstrated exceptional capabilities in in-context speech generation and efficient generalization to unseen speakers. However, the prevailing information modeling process is encumbered by certain redundancies, leading to inefficiencies in speech generation. We propose Chain-of-Information Generation (CoIG), a method for decoupling semantic and perceptual information in large-scale speech generation. Building on this, we develop SpeechGPT-Gen, an 8-billion-parameter SLLM efficient in semantic and perceptual information modeling. It comprises an autoregressive model based on LLM for semantic information modeling and a non-autoregressive model employing flow matching for perceptual information modeling. Additionally, we introduce the novel approach of infusing semantic information into the prior distribution to enhance the efficiency of flow matching. Extensive experimental results demonstrate that SpeechGPT-Gen markedly excels in zero-shot text-to-speech, zero-shot voice conversion, and speech-to-speech dialogue, underscoring CoIG's remarkable proficiency in capturing and modeling speech's semantic and perceptual dimensions. Code and models are available at https://github.com/0nutation/SpeechGPT.

Electronic Origin of Density Wave Orders in a Trilayer Nickelate

Jiangang Yang, Jun Zhan, Taimin Miao, Mengwu Huo, Qichen Xu, Yinghao Li, Yuyang Xie, Bo Liang, Neng Cai, Hao Chen, Wenpei Zhu, Mingkai Xu, Shenjin Zhang, Fengfeng Zhang, Feng Yang, Zhimin Wang, Qinjun Peng, Hanqing Mao, Xintong Li, Zhihai Zhu, Guodong Liu, Zuyan Xu, Jiangping Hu, Xianxin Wu, Meng Wang, Lin Zhao, X. J. Zhou

Jan 30, 2026·cond-mat.supr-con·PDF

The discovery of superconductivity in Ruddlesden-Popper nickelates has established a new frontier in the study of high-temperature superconductors. However, the underlying pairing mechanism and its relationship to the material's electronic and magnetic ground states remain elusive. Since unconventional superconductivity often emerges from a complex interplay of magnetic correlations, elucidating the magnetic ground state of the nickelates at ambient pressure is crucial for understanding the emergence of superconductivity under high pressure. Here, we combine high-resolution angle-resolved photoemission spectroscopy with tight-binding model simulation to investigate the electronic structure of the representative trilayer Ruddlesden-Popper nickelate La$_4$Ni$_3$O$_{10}$. We provide the first experimental evidence of band splitting induced by interlayer coupling and further resolve the momentum-dependent density wave gap structures along all the Fermi surfaces. Our findings identify the mirror-selective Fermi surface nesting as the origin of the interlayer antiferromagnetic spin density wave and demonstrate the dominant role of Ni-3d$_{z^2}$ orbitals in the low-energy physics of La$_4$Ni$_3$O$_{10}$. These results provide a fundamental framework for understanding the magnetic interactions and high-temperature superconductivity mechanism in the Ruddlesden-Popper nickelate family.

MOSS-TTSD: Text to Spoken Dialogue Generation

Yuqian Zhang, Donghua Yu, Zhengyuan Lin, Botian Jiang, Mingshu Chen, Yaozhou Jiang, Yiwei Zhao, Yiyang Zhang, Yucheng Yuan, Hanfu Chen, Kexin Huang, Jun Zhan, Cheng Chang, Zhaoye Fei, Shimin Li, Xiaogui Yang, Qinyuan Cheng, Xipeng Qiu

Mar 20, 2026·cs.SD·PDF

Spoken dialogue generation is crucial for applications like podcasts, dynamic commentary, and entertainment content, but poses significant challenges compared to single-utterance text-to-speech (TTS). Key requirements include accurate turn-taking, cross-turn acoustic consistency, and long-form stability, which current models often fail to address due to a lack of dialogue context modeling. To bridge this gap, we present MOSS-TTSD, a spoken dialogue synthesis model designed for expressive, multi-party conversational speech across multiple languages. With enhanced long-context modeling, MOSS-TTSD generates long-form spoken conversations from dialogue scripts with explicit speaker tags, supporting up to 60 minutes of single-pass synthesis, multi-party dialogue with up to 5 speakers, and zero-shot voice cloning from a short reference audio clip. The model supports various mainstream languages, including English and Chinese, and is adapted to several long-form scenarios. Additionally, to address limitations of existing evaluation methods, we propose TTSD-eval, an objective evaluation framework based on forced alignment that measures speaker attribution accuracy and speaker similarity without relying on speaker diarization tools. Both objective and subjective evaluation results show that MOSS-TTSD surpasses strong open-source and proprietary baselines in dialogue synthesis.

Visualizing the interplay of dual electronic nematicities in kagome superconductors

Yunmei Zhang, Jun Zhan, Ping Wu, Yun-Peng Huang, Qixiao Yuan, Hongyu Li, Zhuying Wang, Wanru Ma, Shuikang Yu, Kunming Zhang, Wanlin Cheng, Deshu Chen, Minrui Chen, Tao Wu, Ziji Xiang, Xianxin Wu, Zhenyu Wang, Xianhui Chen

Apr 7, 2026·cond-mat.supr-con·PDF

Kagome superconductor AV$_3$Sb$_5$ (A stands for K, Rb, and Cs) hosts a wealth of intertwined electronic orders driven by geometric frustration and electron correlations. Among them, the breaking of rotational and/or time-reversal symmetry, observed within the triple-$Q$ charge density wave (CDW) phase yet exhibiting a more complex temperature dependence, remains a central puzzle. Here, by using scanning tunneling microscopy to study the electronic structures of CsV$_3$Sb$_5$ as a function of temperature and Ti doping, we disentangle the interrelation between two distinct nematic order parameters, one associated with the CDW and the other manifested as $C_2$ distortion of the V-$d_{x^{2}-y^{2}}$ Fermi pockets without breaking translational symmetry. The latter persists to high doping levels and high temperatures where the long-range CDW is fully suppressed. Moreover, its nematic director is oriented in a lattice direction distinct from that of the CDW-induced nematicity at intermediate doping, and eventually aligns with the strong nematic CDW order in the pristine compound where the quasiparticles of vanadium orbitals become coherent below a lower characteristic temperature. These observations, combined with Ginzburg-Landau analysis, reveal a rich interplay between two nematic orders that can be assigned to distinct kagome-lattice orbitals. Our results shed new light on the enigmatic intertwined orders in this family and establish a rare material platform in which dual nematic orders coexist and couple to give rise to unusual correlated phenomena.

Mirror-Selective Quasiparticle Interference in Bilayer Nickelate Superconductor

Zhongyi Zhang, Jun Zhan, Congcong Le, Hoi Chun Po, Jiangping Hu, Xianxin Wu

Dec 16, 2025·cond-mat.supr-con·PDF

The recent discovery of high-temperature superconductivity in both bulk and thin-film bilayer nickelates has garnered significant attention. In this study, inspired by recent STM experiments on thin films, we investigate the quasiparticle interference (QPI) characteristics of bilayer nickelates in both normal and superconducting states to identify their Fermiology and pairing symmetry. We demonstrate that the mirror symmetry inherent in the bilayer structure induces mirror-selective quasiparticle scattering by establishing selection rules based on the mirror properties of impurities and the mirror eigenvalues of electronic wavefunctions. This mirror-selective scattering allows for the differentiation of distinct Fermiologies, as QPI patterns vary markedly between scenarios with and without the $d_{z^2}$-bonding Fermi surface (FS). Furthermore, it enables the separate detection of sign changes in superconducting gaps both within the same FS and between different FSs. Crucially, if the mirror-symmetry-enforced selection rules are ignored, the QPI response of an $s_\pm$-wave state can masquerade as that of a conventional $s$-wave state, leading to a misidentification of the pairing symmetry. When combined with field-dependent and reference QPI measurements, this approach facilitates the precise determination of pairing symmetry, even in the presence of FS-dependent gaps and gap anisotropy. Additionally, we discuss practical considerations for STM measurements to effectively identify the pairing symmetry. Our findings demonstrate that mirror-selective QPI is a powerful tool for distinguishing between different Fermiologies and pairing states, which is helpful in pinning down pairing symmetry and revealing the pairing mechanism in bilayer nickelates.
