"au:"Guang Shi"" — arXiv2 Search

Showing 1–20 of 43 results

From Hi-C Contact Map to Three-dimensional Organization of Interphase Human Chromosomes

Guang Shi, D. Thirumalai

Jun 3, 2020·cond-mat.soft·PDF

The probability of two loci, separated by a certain genome length, being in contact can be inferred using the Chromosome Conformation Capture (3C) method and related Hi-C experiments. How to go from the contact map, a matrix listing the mean contact probabilities between a large number of pairs of loci, to an ensemble of three-dimensional structures is an open problem. A solution to this problem, without assuming an energy function, would be the first step in understanding the way nature has solved the packaging of chromosomes in tight cellular spaces. We created a theory, based on polymer physics characteristics of chromosomes and the maximum entropy principle, referred to as the HIPPS (Hi-C-Polymer-Physics-Structures) method, that allows us to calculate the 3D structures solely from Hi-C contact maps. We created an ensemble of 3D structures for the 23 chromosomes from lymphoblastoid cells using the measured contact maps as inputs. The HIPPS method shows that conformations of chromosomes are heterogeneous even in a single cell type. The differences in the conformational heterogeneity of the same chromosome in different cell types (normal as well as cancerous cells) can also be quantitatively discerned using our theory. We validate the method by showing that the calculated volumes of the 23 chromosomes from the predicted 3D structures are in good agreement with experimental estimates. Because the method is general, the 3D structures for any species may be calculated directly from the contact map without the need to assume a specific polymer model, as is customarily done.

Conformational Heterogeneity in Human Interphase Chromosome Organization Reconciles the FISH and Hi-C Paradox

Guang Shi, D. Thirumalai

Apr 20, 2019·cond-mat.soft·PDF

Hi-C experiments are used to infer the contact probabilities between loci separated by varying genome lengths. Contact probability should decrease as the spatial distance between two loci increases. However, studies comparing Hi-C and FISH data show that in some cases the distance between one pair of loci, with larger Hi-C readout, is paradoxically larger compared to another pair with a smaller value of the contact probability. Here, we show that the FISH-Hi-C paradox can be resolved using a theory based on a Generalized Rouse Model for Chromosomes (GRMC). The FISH-Hi-C paradox arises because the cell population is highly heterogeneous, which means that a given contact is present in only a fraction of cells. Insights from the GRMC are used to construct a theory, without any adjustable parameters, to extract the distribution of subpopulations from the FISH data, which quantitatively reproduces the Hi-C data. Our results show that heterogeneity is pervasive in genome organization at all length scales, reflecting large cell-to-cell variations.
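The heterogeneity argument in this abstract can be illustrated with a toy calculation. The sketch below is not the GRMC theory; it only assumes that each locus pair is described by a few subpopulations of cells, each with a single representative distance. The threshold, fractions, and distances are invented for illustration.

```python
# Toy illustration of the FISH-Hi-C paradox; all numbers are hypothetical.
CONTACT_THRESHOLD = 1.0  # arbitrary distance units (assumed)


def summarize(subpopulations):
    """Return (contact probability, mean distance) for one locus pair.

    `subpopulations` is a list of (fraction_of_cells, distance) tuples
    whose fractions sum to 1.
    """
    contact_prob = sum(f for f, d in subpopulations if d <= CONTACT_THRESHOLD)
    mean_distance = sum(f * d for f, d in subpopulations)
    return contact_prob, mean_distance


# Pair A: in contact in 30% of cells, far apart in the remaining 70%.
pair_a = [(0.3, 0.5), (0.7, 10.0)]
# Pair B: in contact in only 10% of cells, but never very far apart.
pair_b = [(0.1, 0.5), (0.9, 2.0)]

a_contact, a_mean = summarize(pair_a)
b_contact, b_mean = summarize(pair_b)

# Pair A has the larger contact probability (Hi-C-like readout) and also
# the larger mean distance (FISH-like readout).
print(f"pair A: P(contact)={a_contact:.2f}, mean distance={a_mean:.2f}")
print(f"pair B: P(contact)={b_contact:.2f}, mean distance={b_mean:.2f}")
```

In this toy population, a population-averaged distance and a contact probability rank the two pairs in opposite ways once a contact is present in only a fraction of cells.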

The Goldstine-Weston theorem in random normed modules

Guang Shi

Oct 21, 2010·math.FA·PDF

This article generalizes the classical Goldstine-Weston theorem on normed spaces to one on random normed modules: the image of a random normed module $(E,\|\cdot\|)$ under the random natural embedding $J$ is dense in its double random conjugate space $E^{**}$ with respect to the $(ε,λ)$ weak star topology; and $J(E)$ is also dense in $E^{**}$ with respect to the locally $L^{0}$-convex weak star topology if $E$ has the countable concatenation property.

Quasi--bases for Modules over a Commutative Ring

Guang Shi

Jan 28, 2012·math.RA·PDF

In this paper we present the definition of quasi-bases for modules over a ring that is commutative but not necessarily a division ring and discuss properties that guarantee the existence of quasi-bases. Based on this result we further prove that every finitely generated module over $L^{0}(\mathcal{F},K)$ has a quasi-basis, where $K$ is the scalar field of real numbers or complex numbers and $L^{0}(\mathcal{F},K)$ is the algebra of equivalence classes of $K$-valued random variables defined on a probability space $(Ω,\mathcal{F},P)$.

Theory of the center-of-mass diffusion and viscosity of microstructured and variable sequence copolymer liquids

Guang Shi, Kenneth S. Schweizer

Oct 6, 2023·cond-mat.soft·PDF

Biomolecular condensates formed through the phase separation of proteins and nucleic acids are widely observed, offering a fundamental means of organizing intracellular materials in a membrane-less fashion. Traditionally, these condensates have been regarded as homogeneous isotropic liquids. However, in analogy with some synthetic copolymer systems, our recent theoretical research has demonstrated that model biomolecular condensates can exhibit a microemulsion-like internal structure, contingent upon the specific sequence, inter-chain site-site interactions, and concentrated phase polymer density. In this study, we present a microscopic dynamical theory for the self-diffusion constant and viscosity of concentrated unentangled A/B regular multiblock copolymer solutions. Our approach integrates static equilibrium local and microdomain scale structural information obtained from PRISM integral equation theory and the time evolution of the autocorrelation function of monomer scale forces at the center-of-mass level that determine the polymer diffusion constant and viscosity in a weak caging regime far from a glass or gel transition. We focus on regular multi-block systems both for simplicity and for their relevance to synthetic macromolecular science. The impact of sequence and inter-chain attraction strength on the slowing down of copolymer mass transport and flow due to local clustering enhanced collisional friction and retardation of motion due to emergent microdomain scale ordering is established. Analytic analysis and metrics used in the study of biomolecular condensates are employed to identify key order parameters that quantify how attractive forces, packing structure, multiblock sequence, and copolymer density determine dynamical slowing down above and below the crossover to a fluctuating polymeric microemulsion state.

The confirmation and revision on the orbital period change of the possible type Ia supernova progenitor V617 Sagittarii

Guang Shi, Sheng-Bang Qian, Eduardo Fernández Lajús

Oct 18, 2013·astro-ph.SR·PDF

This work reports new photometric results of the eclipsing cataclysmic variable V617 Sagittarii (V617 Sgr). We analyzed the orbital period change of V617 Sgr by employing three new CCD eclipse timings since 2010 along with all the available data from the literature. It was found that the orbital period of V617 Sgr undergoes an obvious long-term increase, which confirms the result revealed by Steiner et al. (2006). The rate of orbital period increase was calculated to be ${\dot{P}}$ = +2.14(0.05) $\times$ 10$^{-7}$ day/year. This suggests that the secondary star will reach the end of its lifetime on a timescale of 0.97 $\times$ 10$^6$ years, faster than that predicted previously. In particular, a cyclic variation with a period of 4.5 years and an amplitude of 2.3 minutes may be present in the O-C diagram. Dominated by the wind-accretion mechanism, high mass transfer from the low mass secondary to the white dwarf is expected to be sustained in the V Sge-type star V617 Sgr during its long-term evolution. The mass transfer rate $|\dot{M}_{tr}|$ was estimated to be in the range of about 2.2 $\times$ 10$^{-7}$ to 5.2 $\times$ 10$^{-7}$ M$_{\odot}$ yr$^{-1}$. Accordingly, the already massive ($\geq$ 1.2 M$_{\odot}$) white dwarf primary will undergo stable nuclear burning, accrete a fraction of mass from its companion to reach the standard Chandrasekhar mass limit ($\simeq$ 1.38 M$_{\odot}$), and ultimately produce a type Ia supernova (SN Ia) within about 4 $\sim$ 8 $\times$ 10$^{5}$ years or earlier.
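For readers unfamiliar with O-C analysis, the relation between a long-term period change and the shape of the O-C diagram can be written with the standard quadratic ephemeris. This is the textbook form, given here as an assumption; the abstract does not state the exact fitting function used. Here $E$ is the cycle number, $P$ the orbital period in days, and $\beta$ the fitted quadratic coefficient.

```latex
% Standard quadratic ephemeris (assumed form; not quoted from the abstract).
% E: cycle number, P: orbital period in days,
% \beta: quadratic coefficient in days per cycle^2.
\begin{align}
  (O - C)(E) &= \Delta T_0 + \Delta P \, E + \beta E^{2}, \\
  \beta &= \tfrac{1}{2} \, P \, \dot{P}_{\mathrm{d/d}}, \\
  \dot{P}_{\mathrm{d/yr}} &= 365.25 \, \dot{P}_{\mathrm{d/d}}
                           = 365.25 \, \frac{2\beta}{P}.
\end{align}
```

A positive $\beta$, that is an upward-opening parabola in the O-C diagram, corresponds to an increasing period, which matches the sign of the reported ${\dot{P}}$.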

Epigenetic state encodes locus-specific chromatin mechanics

Guang Shi, D. Thirumalai

Dec 28, 2025·cond-mat.soft·PDF

Chromatin is repeatedly deformed in vivo during transcription, nuclear remodeling, and confined migration - yet how mechanical response varies from locus to locus, and how it relates to epigenetic state, remains unclear. We develop a theory to infer locus-specific viscoelasticity from three-dimensional genome organization. Using chromatin structures derived from contact maps, we calculate frequency-dependent storage and loss moduli for individual loci and establish that the mechanical properties are determined both by chromatin epigenetic marks and organization. On large length scales, chromatin exhibits Rouse-like viscoelastic scaling, but this coarse behavior masks extensive heterogeneity at the single-locus level. Loci segregate into two mechanical subpopulations with distinct longest relaxation times: one characterized by single-timescale and another by multi-timescale relaxation. The multi-timescale loci are strongly enriched in active marks, and the longest relaxation time for individual loci correlates inversely with effective local stiffness. Pull-release simulations further predict a time-dependent susceptibility: H3K27ac-rich loci deform more under sustained forcing yet can resist brief, large impulses. At finer genomic scales, promoters, enhancers, and gene bodies emerge as "viscoelastic islands" aligned with their focal interactions. Together, these results suggest that chromatin viscoelasticity is an organized, epigenetically coupled property of the 3D genome, providing a mechanistic layer that may influence enhancer-promoter communication, condensate-mediated organization, and response to cellular mechanical stress. The prediction that locus-specific mechanics in chromatin are controlled by 3D structures as well as the epigenetic states is amenable to experimental test.

A maximum-entropy model to predict 3D structural ensembles of chromatins from pairwise distances: Applications to Interphase Chromosomes and Structural Variants

Guang Shi, D. Thirumalai

Mar 15, 2022·cond-mat.soft·PDF

The principles that govern the organization of genomes, which are needed for a deeper understanding of how chromosomes are packaged and function in eukaryotic cells, could be deciphered if the three-dimensional (3D) structures are known. Recently, single-cell imaging experiments have determined the 3D coordinates of a number of loci in a chromosome. Here, we introduce a computational method (Distance Matrix to Ensemble of Structures, DIMES), based on the maximum entropy principle, with experimental pair-wise distances between loci as constraints, to generate a unique ensemble of 3D chromatin structures. Using the ensemble of structures, we quantitatively account for the distribution of pair-wise distances, three-body co-localization and higher-order interactions. We demonstrate that the DIMES method can be applied to both small length-scale and chromosome-scale imaging data to quantify the extent of heterogeneity and fluctuations in the shapes on various length scales. We develop a perturbation method that is used in conjunction with DIMES to predict the changes in 3D structures from structural variations. Our method also reveals quantitative differences between the 3D structures inferred from Hi-C and the ones measured in imaging experiments. Finally, the physical interpretation of the parameters extracted from DIMES provides insights into the origin of phase separation between euchromatin and heterochromatin domains.

Frequency-modulated continuous-wave laser distance measurement system using Fabry-Perot cavity as measuring reference

Guang Shi, Kefei Hei, Wen Wang, Nandini Bhattacharya

Jan 2, 2019·physics.ins-det·PDF

Frequency-modulated continuous-wave (FMCW) is a ranging technique that allows for high precision distance measurement over long distances. Scanning nonlinearity and range of the tunable laser are the main factors affecting the measurement accuracy. The frequency-sampling method is a recognized post-processing scheme to compensate for the scanning nonlinearity. In this work, an FMCW laser distance measurement system using a high-finesse Fabry-Perot (F-P) cavity as a sampling reference is demonstrated. The frequency of the resampled signal is calculated with a Hilbert transform. The high stability of the F-P cavity and the advantages of the Hilbert transform lead to a high measurement precision when an external cavity diode laser (ECDL) with a scanning range of tens of GHz is available. In this experiment, the scanning range of the ECDL is only 88 GHz, and a measurement uncertainty of 76.8 μm (with coverage factor of k = 2) within a distance of 6.7 m is demonstrated.
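To see why the scanning range matters, the textbook FMCW relations for an ideal linear sweep can be sketched as follows. This is not the authors' F-P resampling or Hilbert-transform processing. The sweep duration and the refractive index of 1 are assumptions made for illustration; only the 88 GHz range and the 6.7 m distance come from the abstract.

```python
# Textbook FMCW relations for an ideal linear frequency sweep (a sketch).
C = 299_792_458.0  # speed of light in vacuum, m/s


def range_resolution(bandwidth_hz, refractive_index=1.0):
    """Fourier-limited range resolution c / (2 n B)."""
    return C / (2.0 * refractive_index * bandwidth_hz)


def beat_from_distance(distance_m, bandwidth_hz, sweep_s, refractive_index=1.0):
    """Beat frequency for a target at `distance_m` (round-trip delay times chirp rate)."""
    chirp_rate = bandwidth_hz / sweep_s  # Hz per second
    round_trip_delay = 2.0 * refractive_index * distance_m / C
    return chirp_rate * round_trip_delay


def distance_from_beat(beat_hz, bandwidth_hz, sweep_s, refractive_index=1.0):
    """Invert the beat-frequency relation to recover the distance."""
    chirp_rate = bandwidth_hz / sweep_s
    return C * beat_hz / (2.0 * refractive_index * chirp_rate)


BANDWIDTH_HZ = 88e9  # scanning range quoted in the abstract
SWEEP_S = 1.0        # hypothetical sweep duration, not given in the abstract

resolution_m = range_resolution(BANDWIDTH_HZ)
beat_hz = beat_from_distance(6.7, BANDWIDTH_HZ, SWEEP_S)
print(f"Fourier-limited resolution: {resolution_m * 1e3:.2f} mm")
print(f"Beat frequency at 6.7 m (assumed 1 s sweep): {beat_hz:.1f} Hz")
```

Under these assumptions the Fourier-limited resolution for an 88 GHz sweep is on the order of millimetres. The reported uncertainty is much smaller, which fits the abstract's emphasis on precise frequency estimation of the resampled signal rather than on a wide scanning range.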

Organization and Dynamics of Chromosomes

D. Thirumalai, Guang Shi, Sucheol Shin, Changbong Hyeon

Oct 2, 2024·cond-mat.soft·PDF

How long threadlike eukaryotic chromosomes fit tidily in the small volume of the nucleus without significant entanglement is just beginning to be understood, thanks to major advances in experimental techniques. Several polymer models, which reproduce contact maps that measure the probabilities that two loci are in spatial contact, have predicted the three-dimensional structures of interphase chromosomes. Data-driven approaches, using contact maps as input, predict that mitotic helical chromosomes are characterized by a switch in handedness, referred to as "perversion". By using experimentally derived effective interactions between chromatin loci in simulations, structures of conventional and inverted nuclei have been accurately predicted. Polymer theory and simulations show that the dynamics of individual loci in chromatin exhibit subdiffusive behavior but the diffusion exponents are broadly distributed, which accords well with experiments. Although coarse-grained models are successful, many challenging problems remain, which require the creation of new experimental and computational tools to understand genome biology.

The Algebraic Structure of Finitely Generated $L^{0}(\mathcal{F},K)$-Modules and the Helly Theorem in Random Normed Modules

Tiexin Guo, Guang Shi

Sep 27, 2010·math.FA·PDF

Let $K$ be the scalar field of real numbers or complex numbers and $L^{0}(\mathcal{F},K)$ the algebra of equivalence classes of $K$-valued random variables defined on a probability space $(Ω,\mathcal{F},P)$. In this paper, we first characterize the algebraic structure of finitely generated $L^{0}(\mathcal{F},K)$-modules and then, combining this with the recently developed separation theorem in random locally convex modules, we prove the Helly theorem in random normed modules with the countable concatenation property under the framework of random conjugate spaces. At the same time, a simple counterexample shows that it is necessary to require the countable concatenation property. We also give an application to the existence problem of the random solution of a system of random linear functional equations.

Static Three-Dimensional Structures Determine Fast Dynamics Between Distal Loci Pairs in Interphase Chromosomes

Guang Shi, Sucheol Shin, D. Thirumalai

Jan 17, 2025·physics.bio-ph·PDF

Live-cell imaging experiments have shown that the distal dynamics between enhancers and promoters are unexpectedly rapid and incompatible with standard polymer models. The discordance between the compact static chromatin organization and dynamics is a conundrum that violates the expected structure-function relationship. We developed a theory to predict chromatin dynamics by accurately determining three-dimensional (3D) structures from static Hi-C contact maps or fixed-cell imaging data. Using the calculated 3D coordinates, the theory accurately forecasts experimentally observed two-point chromatin dynamics. It predicts rapid enhancer-promoter interactions and uncovers a scaling relationship between two-point relaxation time and genomic separation, closely matching recent measurements. The theory predicts that cohesin depletion accelerates single-locus diffusion while significantly slowing relaxation dynamics within topologically associating domains (TADs). Our results demonstrate that chromatin dynamics can be reliably inferred from static structural data, reinforcing the notion that 3D chromatin structure governs dynamic behavior. This general framework offers powerful tools for exploring chromatin dynamics across diverse biological contexts.

Different Electronic Charges in Two-Component Superconductor by Coherent State

Xu Guang Shi

Jul 14, 2016·cond-mat.supr-con·PDF

Recently, the different electronic charges, which are related to the different coupling constants with the magnetic field, in the two-component superconductor have been studied in the framework of Ginzburg-Landau theory. In order to study the electronic charges in detail, we suggest that the wave function in the two-component superconductor is a coherent state. We find that the different electronic charges exist not only in the coherent state but also in the incoherent state. However, the ratio of the different charges in the coherent state is different from the ratio in the incoherent state. The expressions of the coupling constants are given directly based on the coherence effects. We also discuss the winding number in such a system.

Theory and Simulations of condensin mediated loop extrusion in DNA

Ryota Takaki, Atreya Dey, Guang Shi, Dave Thirumalai

May 29, 2020·cond-mat.soft·PDF

Condensation of hundreds of mega-base-pair-long human chromosomes in a small nuclear volume is a spectacular phenomenon. This process is driven by the formation of chromosome loops. An ATP-consuming motor, condensin, interacts with chromatin segments to actively extrude loops. Motivated by real-time imaging of loop extrusion (LE) and measurements using magnetic tweezer experiments, we created an analytically solvable model, predicting the LE velocity and step size distribution as a function of external load. The theory fits the experimental data quantitatively, and suggests that condensin must undergo a large conformational change, induced by ATP binding, bringing distant parts of the motor to proximity. Simulations using a simple model confirm that the motor transitions between an open and a closed state in order to extrude loops by a scrunching mechanism, similar to that proposed in DNA bubble formation during bacterial transcription. Changes in the orientation of the motor domains are transmitted over $\sim$ 50 nm, connecting the motor head and the hinge, thus providing an allosteric basis for LE.

RFOT theory for glassy dynamics in a single condensed polymer

Hyun Woo Cho, Guang Shi, T. R. Kirkpatrick, D. Thirumalai

Sep 11, 2020·cond-mat.soft·PDF

The number of compact structures of a single condensed polymer (SCP), with similar free energies, grows exponentially with the degree of polymerization. In analogy with structural glasses (SGs), we expect that at low temperatures chain relaxation should occur by activated transitions between the compact metastable states. By evolving the states of the SCP that is linearly coupled to a reference state, we show that, below a dynamical transition temperature ($T_d$), the SCP is trapped in a metastable state leading to slow dynamics. At a lower temperature, $T_K \ne 0$, the configurational entropy vanishes, resulting in a thermodynamic random first order ideal glass transition. The relaxation time obeys the Vogel-Fulcher-Tammann law, diverging at $T=T_0 \approx T_K$. These findings accord well with the random first order transition theory, establishing that SCP and SG exhibit similar universal characteristics.
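The divergence of the relaxation time described in this abstract can be made concrete with a short numerical sketch of the Vogel-Fulcher-Tammann form. One common parametrization, $\tau = \tau_0 \exp[D T_0 / (T - T_0)]$, is assumed here; the parameter values are hypothetical and are not taken from the paper.

```python
import math


def vft_relaxation_time(temperature, tau0, strength, t0):
    """Vogel-Fulcher-Tammann relaxation time tau0 * exp(D * T0 / (T - T0)).

    Valid only for temperature > t0; the expression diverges as T -> T0.
    """
    if temperature <= t0:
        raise ValueError("VFT form is defined only for temperature > t0")
    return tau0 * math.exp(strength * t0 / (temperature - t0))


# Hypothetical parameters in reduced units (assumptions, not fitted values).
TAU0 = 1.0
STRENGTH = 5.0  # dimensionless strength parameter D
T0 = 0.5

for temperature in (2.0, 1.0, 0.7, 0.6, 0.55):
    tau = vft_relaxation_time(temperature, TAU0, STRENGTH, T0)
    print(f"T = {temperature:.2f}  tau = {tau:.3e}")
```

The printed times grow faster than a simple Arrhenius law as the temperature approaches `T0`, which is the behavior the abstract identifies with the ideal glass transition at $T_0 \approx T_K$.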

Cancer Subtyping by Improved Transcriptomic Features Using Vector Quantized Variational Autoencoder

Zheng Chen, Ziwei Yang, Lingwei Zhu, Guang Shi, Kun Yue, Takashi Matsubara, Shigehiko Kanaya, MD Altaf-Ul-Amin

Jul 20, 2022·cs.LG·PDF

Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data such as transcriptomics that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data: sample scarcity and high dimensionality. As such, existing methods often impose unrealistic assumptions to extract useful features from the data while avoiding overfitting to spurious correlations. In this paper, we propose to leverage a recent strong generative model, Vector Quantized Variational AutoEncoder (VQ-VAE), to tackle the data issues and extract informative latent features that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. VQ-VAE does not impose strict assumptions and hence its latent features are better representations of the input, capable of yielding superior clustering performance with any mainstream clustering method. Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate the VQ-VAE clustering results can significantly and robustly improve prognosis over prevalent subtyping systems.
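The vector quantization step that gives VQ-VAE its name can be sketched in a few lines. This is only the nearest-codebook lookup applied to an encoder output, not the authors' model or training procedure. The codebook entries and the latent vector below are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent, codebook):
    """Return (index, code vector) of the codebook entry nearest to `latent`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(latent, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hypothetical encoder output for one sample.
latent = [0.9, 0.1]

index, code = quantize(latent, codebook)
print(index, code)
```

In a clustering pipeline of the kind the abstract describes, the discrete indices or the quantized vectors would then be passed to a downstream clustering method.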

Adaptive Spike-Like Representation of EEG Signals for Sleep Stages Scoring

Lingwei Zhu, Koki Odani, Ziwei Yang, Guang Shi, Yirong Kan, Zheng Chen, Renyuan Zhang

Apr 2, 2022·eess.SP·PDF

Recently there have been promising results on automatic stage scoring by extracting spatio-temporal features from electroencephalogram (EEG). Such methods entail laborious manual feature engineering and domain knowledge. In this study, we propose an adaptive scheme to probabilistically encode, filter and accumulate the input signals and weight the resultant features by the half-Gaussian probabilities of signal intensities. The adaptive representations are subsequently fed into a transformer model to automatically mine the relevance between features and corresponding stages. Extensive experiments on the largest public dataset against state-of-the-art methods validate the effectiveness of our proposed method and reveal promising future directions.

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi

Jan 21, 2025·cs.AI·PDF

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA performance in 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, in the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively). In AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o (34.5). UI-TARS incorporates several key innovations: (1) Enhanced Perception: leveraging a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, milestone recognition, etc. (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Team Seedream: Yunpeng Chen, Yu Gao, Lixue Gong, Meng Guo, Qiushan Guo, Zhiyao Guo, Xiaoxia Hou, Weilin Huang, Yixuan Huang, Xiaowen Jian, Huafeng Kuang, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yanzuo Lu, Zhengxiong Luo, Tongtong Ou, Guang Shi, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Wenxu Wu, Yonghui Wu, Xin Xia, Xuefeng Xiao, Shuang Xu, Xin Yan, Ceyuan Yang, Jianchao Yang, Zhonghua Zhai, Chenlin Zhang, Heng Zhang, Qi Zhang, Xinyu Zhang, Yuwei Zhang, Shijia Zhao, Wenliang Zhao, Wenjia Zhu

Sep 24, 2025·cs.CV·PDF

We introduce Seedream 4.0, an efficient and high-performance multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition within a single framework. We develop a highly efficient diffusion transformer with a powerful VAE that can also reduce the number of image tokens considerably. This allows for efficient training of our model, and enables it to quickly generate native high-resolution images (e.g., 1K-4K). Seedream 4.0 is pretrained on billions of text-image pairs spanning diverse taxonomies and knowledge-centric concepts. Comprehensive data collection across hundreds of vertical scenarios, coupled with optimized strategies, ensures stable and large-scale training, with strong generalization. By incorporating a carefully fine-tuned VLM model, we perform multi-modal post-training for training both T2I and image editing tasks jointly. For inference acceleration, we integrate adversarial distillation, distribution matching, and quantization, as well as speculative decoding. It achieves an inference time of up to 1.8 seconds for generating a 2K image (without an LLM/VLM as PE model). Comprehensive evaluations reveal that Seedream 4.0 can achieve state-of-the-art results on both T2I and multimodal image editing. In particular, it demonstrates exceptional multimodal capabilities in complex tasks, including precise image editing and in-context reasoning, and also allows for multi-image reference, and can generate multiple output images. This extends traditional T2I systems into a more interactive and multidimensional creative tool, pushing the boundary of generative AI for both creativity and professional applications. We further scale our model and data as Seedream 4.5. Seedream 4.0 and Seedream 4.5 are accessible on Volcano Engine https://www.volcengine.com/experience/ark?launch=seedream.

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Haoming Wang, Haoyang Zou, Huatong Song, Jiazhan Feng, Junjie Fang, Junting Lu, Longxiang Liu, Qinyu Luo, Shihao Liang, Shijue Huang, Wanjun Zhong, Yining Ye, Yujia Qin, Yuwen Xiong, Yuxin Song, Zhiyong Wu, Aoyan Li, Bo Li, Chen Dun, Chong Liu, Daoguang Zan, Fuxing Leng, Hanbin Wang, Hao Yu, Haobin Chen, Hongyi Guo, Jing Su, Jingjia Huang, Kai Shen, Kaiyu Shi, Lin Yan, Peiyao Zhao, Pengfei Liu, Qinghao Ye, Renjie Zheng, Shulin Xin, Wayne Xin Zhao, Wen Heng, Wenhao Huang, Wenqian Wang, Xiaobo Qin, Yi Lin, Youbin Wu, Zehui Chen, Zihao Wang, Baoquan Zhong, Xinchun Zhang, Xujing Li, Yuanfan Li, Zhongkai Zhao, Chengquan Jiang, Faming Wu, Haotian Zhou, Jinlin Pang, Li Han, Qi Liu, Qianli Ma, Siyao Liu, Songhua Cai, Wenqi Fu, Xin Liu, Yaohui Wang, Zhi Zhang, Bo Zhou, Guoliang Li, Jiajun Shi, Jiale Yang, Jie Tang, Li Li, Qihua Han, Taoran Lu, Woyu Lin, Xiaokang Tong, Xinyao Li, Yichi Zhang, Yu Miao, Zhengxuan Jiang, Zili Li, Ziyuan Zhao, Chenxin Li, Dehua Ma, Feng Lin, Ge Zhang, Haihua Yang, Hangyu Guo, Hongda Zhu, Jiaheng Liu, Junda Du, Kai Cai, Kuanye Li, Lichen Yuan, Meilan Han, Minchao Wang, Shuyue Guo, Tianhao Cheng, Xiaobo Ma, Xiaojun Xiao, Xiaolong Huang, Xinjie Chen, Yidi Du, Yilin Chen, Yiwen Wang, Zhaojian Li, Zhenzhu Yang, Zhiyuan Zeng, Chaolin Jin, Chen Li, Hao Chen, Haoli Chen, Jian Chen, Qinghao Zhao, Guang Shi

Sep 2, 2025·cs.AI·PDF

The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and environment stability. In this technical report, we present UI-TARS-2, a native GUI-centered agent model that addresses these challenges through a systematic training methodology: a data flywheel for scalable data generation, a stabilized multi-turn RL framework, a hybrid GUI environment that integrates file systems and terminals, and a unified sandbox platform for large-scale rollouts. Empirical evaluation demonstrates that UI-TARS-2 achieves significant improvements over its predecessor UI-TARS-1.5. On GUI benchmarks, it reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld, outperforming strong baselines such as Claude and OpenAI agents. In game environments, it attains a mean normalized score of 59.8 across a 15-game suite (roughly 60% of human-level performance) and remains competitive with frontier proprietary models (e.g., OpenAI o3) on LMGame-Bench. Additionally, the model can generalize to long-horizon information-seeking tasks and software engineering benchmarks, highlighting its robustness across diverse agent tasks. Detailed analyses of training dynamics further provide insights into achieving stability and efficiency in large-scale agent RL. These results underscore UI-TARS-2's potential to advance the state of GUI agents and exhibit strong generalization to real-world interactive scenarios.
