Fengxiang Wang, Mingshuo Chen, Yueying Li, Di Wang, Haotian Wang, Zonghao Guo, Zefan Wang, Boqi Shan, Long Lan, Yulin Wang, Hongzhen Wang, Wenjing Yang, Bo Du, Jing Zhang
Ultra-high-resolution (UHR) remote sensing (RS) imagery offers valuable data for Earth observation but poses challenges for existing multimodal foundation models due to two key bottlenecks: (1) limited availability of UHR training data, and (2) token explosion caused by the large image size. To address data scarcity, we introduce SuperRS-VQA (avg. 8,376$\times$8,376) and HighRS-VQA (avg. 2,000$\times$1,912), the highest-resolution vision-language datasets in RS to date, covering 22 real-world dialogue tasks. To mitigate token explosion, our pilot studies reveal significant redundancy in RS images: crucial information is concentrated in a small subset of object-centric tokens, while pruning background tokens (e.g., ocean or forest) can even improve performance. Motivated by these findings, we propose two strategies, Background Token Pruning and Anchored Token Selection, to reduce the memory footprint while preserving key semantics. Integrating these techniques, we introduce GeoLLaVA-8K, the first RS-focused multimodal large language model capable of handling inputs up to 8K$\times$8K resolution, built on the LLaVA framework. Trained on SuperRS-VQA and HighRS-VQA, GeoLLaVA-8K sets a new state of the art on XLRS-Bench.
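As a toy sketch of the pruning idea (the scores, keep ratio, and function names are illustrative assumptions, not the paper's actual Background Token Pruning implementation), one can rank visual tokens by a saliency score and keep only the top fraction while preserving spatial order:

```python
import numpy as np

def prune_background_tokens(tokens, scores, keep_ratio=0.25):
    """Keep only the highest-scoring (most object-centric) visual tokens.

    tokens: (N, D) array of token embeddings
    scores: (N,) saliency scores (e.g., attention mass per token; illustrative)
    keep_ratio: fraction of tokens to retain
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.argsort(scores)[::-1][:n_keep]   # top-k by saliency
    return tokens[np.sort(keep_idx)]               # preserve spatial order

# toy example: 8 tokens of dimension 4; low-score "background" tokens are dropped
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
scores = np.array([0.9, 0.1, 0.05, 0.8, 0.02, 0.03, 0.7, 0.01])
pruned = prune_background_tokens(tokens, scores, keep_ratio=0.5)
print(pruned.shape)  # (4, 4)
```

The point of the sketch is only the memory arithmetic: halving the token count halves the sequence fed to the language model, which matters at 8K$\times$8K inputs.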
Haoyang Chen, Jing Zhang, Hebaixu Wang, Shiqin Wang, Pohsun Huang, Jiayuan Li, Haonan Guo, Di Wang, Zheng Wang, Bo Du
Multi-modal remote sensing imagery provides complementary observations of the same geographic scene, yet such observations are frequently incomplete in practice. Existing cross-modal translation methods treat each modality pair as an independent task, resulting in quadratic complexity and limited generalization to unseen modality combinations. We formulate any-to-any translation as inference over a shared latent representation of the scene, where different modalities correspond to partial observations of the same underlying semantics. Based on this formulation, we propose Any2Any, a unified latent diffusion framework that projects heterogeneous inputs into a geometrically aligned latent space. Within this structure, anchored latent regression is performed with a shared backbone, decoupling modality-specific representation learning from semantic mapping. Moreover, lightweight target-specific residual adapters are used to correct systematic latent mismatches without increasing inference complexity. To support learning under sparse but connected supervision, we introduce RST-1M, the first million-scale remote sensing dataset with paired observations across five sensing modalities, providing supervision anchors for any-to-any translation. Experiments across 14 translation tasks show that Any2Any consistently outperforms pairwise translation methods and exhibits strong zero-shot generalization to unseen modality pairs. Code and models will be available at https://github.com/MiliLab/Any2Any.
Goran Zuzic, Di Wang, Aranyak Mehta, D. Sivakumar
We address the challenge of finding algorithms for online allocation (i.e., bipartite matching) using a machine learning approach. In this paper, we focus on the AdWords problem, a classical online budgeted matching problem of both theoretical and practical significance. In contrast to existing work, our goal is to accomplish algorithm design {\em tabula rasa}, i.e., without any human-provided insights or expert-tuned training data beyond specifying the objective and constraints of the optimization problem. We construct a framework based on insights and ideas from game theory, adversarial training, and GANs. Key to our approach is generating adversarial examples that expose the weaknesses of any given algorithm. A unique challenge in our context is generating complete examples from scratch rather than perturbing given examples, and we demonstrate this can be accomplished for the AdWords problem. We use this framework to co-train an algorithm network and an adversarial network against each other until they converge to an equilibrium. This approach finds algorithms and adversarial examples that are consistent with known optimal results. Secondly, we address the question of robustness: can we design algorithms that are both strong under practical distributions and robust against adversarial instances? To accomplish this, we train algorithm networks using a mixture of adversarial and practical distributions such as power laws; the resulting networks exhibit a smooth trade-off between the two input regimes.
Di Wang, Feiqing Huang, Jingyu Zhao, Guodong Li, Guangjian Tian
Autoregressive networks can achieve promising performance in many sequence modeling tasks with short-range dependence. However, when handling high-dimensional inputs and outputs, the huge number of parameters in the network leads to expensive computational cost and low learning efficiency. The problem can be alleviated slightly by introducing an additional narrow hidden layer to the network, but the sample size required to achieve a certain training error is still large. To address this challenge, we rearrange the weight matrices of a linear autoregressive network into a tensor form and then make use of Tucker decomposition to represent low-rank structures. This leads to a novel compact autoregressive network, called the Tucker AutoRegressive (TAR) net. Interestingly, the TAR net can be applied to sequences with long-range dependence since the dimension along the sequential order is reduced. Theoretical studies show that the TAR net improves learning efficiency and requires far fewer samples for model training. Experiments on synthetic and real-world datasets demonstrate the promising performance of the proposed compact network.
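The core construction, stacking the lag matrices into a third-order tensor and compressing it with a Tucker decomposition, can be sketched with a truncated HOSVD (a standard way to compute a Tucker approximation; the paper's actual training procedure is not reproduced here, and the dimensions and ranks are illustrative):

```python
import numpy as np

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: a simple Tucker decomposition of a 3-way tensor."""
    factors = []
    for mode in range(3):
        # mode-n unfolding: bring `mode` to the front and flatten the rest
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :ranks[mode]])
    core = T
    for mode, U in enumerate(factors):
        # contract each factor against the corresponding mode of the core
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# stack P lag matrices A_1..A_P (each N x N) into an N x N x P tensor
N, P = 6, 4
rng = np.random.default_rng(1)
A = rng.normal(size=(N, N, P))
core, factors = tucker_hosvd(A, ranks=(3, 3, 2))

# parameter count of the compact form vs. the full weight tensor
full = N * N * P
compact = core.size + sum(f.size for f in factors)
print(full, compact)
```

The compression ratio (here 144 parameters down to 62) is where the claimed sample-efficiency gain comes from: fewer free parameters to estimate from the same sequence.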
Warren Siegel, Di Wang
The exceptional symmetries of supergravity have been reproduced from the Hamiltonian formulation of the classical mechanics of F-theory. We now find the Lagrangian formalism has even larger exceptional symmetries, simplifying its derivation: We discuss D = 5 as an example.
Yu Bai, Haixin Zhang, Mingjing Zhang, Di Wang, Hui Zeng, Hao Xue, Guozheng Wu, Ying Xie, Yuxia Zhang, Hao Jing, Jing Su, Haohai Yu, Zhanggui Hu, Ruwen Peng, Mu Wang, Yicheng Wu
The hybrid perovskite CH3NH3PbX3 (X = Cl, Br, I) is a promising material for developing novel optoelectronic devices. Owing to its intrinsically non-layered structure, it remains challenging to synthesize molecularly thin CH3NH3PbX3 with large size. Here, we report a low-cost and highly efficient fabrication route to obtain large-scale single-crystalline 2D CH3NH3PbX3 perovskites on a mica substrate via liquid epitaxy. The 2D perovskite is characterized as 8 nm in thickness and hundreds of micrometers in lateral size. First-principles calculations suggest that strong potassium-halogen interactions at the perovskite/mica interface lower the interface energy of the perovskites, driving their fast in-plane growth. Spectroscopic investigations reveal that 2D CH3NH3PbBr3 possesses a small exciton binding energy of 30 meV, enabling a superior visible-light photodetector with a photoresponsivity of 126 A/W and a bandwidth exceeding 80 kHz. These features demonstrate that liquid epitaxy is a bottom-up approach to fabricating non-layer-structured 2D perovskites, which offer a new material platform for device applications and fundamental investigations.
Di Wang, Lijie Hu, Huanyu Zhang, Marco Gaboardi, Jinhui Xu
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(ε, δ)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $α$ (with high probability) is ${O}(p α^{-2})$ and $\tilde{O}(p^3α^{-2}ε^{-2})$ respectively, where $p$ is the dimension of the feature vector. This significantly improves over the previously known sample complexities for GLMs with no public data, which are exponential or quasi-polynomial in $α^{-1}$, or exponential in $p$. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(ε, δ)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $α$ is ${O}(p^2α^{-2})$ and $\tilde{O}(p^2α^{-2}ε^{-2})$ respectively, under some mild assumptions and if $α$ is not too small ({\em i.e.,} $α\geq Ω(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets.
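The Stein's-lemma idea behind the Gaussian case can be illustrated non-privately: for standard Gaussian features, the sample mean of $y_i x_i$ recovers the GLM parameter up to a positive scalar, since $E[y\,x] = E[f'(\langle w, x\rangle)]\, w$. The sketch below omits the local-privacy randomization (each user would add calibrated noise to its report before sending), and the link function and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 5, 20000
w_true = np.array([1.0, -0.5, 0.3, 0.0, 0.8])
X = rng.normal(size=(n, p))                      # zero-mean Gaussian features
y = np.tanh(X @ w_true) + 0.1 * rng.normal(size=n)

# Stein's lemma: E[y * x] = E[f'(<w, x>)] * w for standard Gaussian x,
# so the sample average of y_i * x_i recovers w up to a scalar.
w_hat = (y[:, None] * X).mean(axis=0)

# In the NLDP setting each user would add calibrated Gaussian noise
# to y_i * x_i before reporting; that step is omitted here for clarity.
cos = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
print(round(cos, 3))
```

The direction is recovered essentially exactly; the unknown scalar $E[f'(\langle w, x\rangle)]$ is why the public unlabeled data is useful in the actual algorithm.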
Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, Dacheng Tao
Deep learning has largely reshaped remote sensing (RS) research for aerial image understanding and achieved great success. Nevertheless, most existing deep models are initialized with ImageNet pretrained weights, and natural images inevitably present a large domain gap relative to aerial images, which probably limits the finetuning performance on downstream aerial scene tasks. This issue motivates us to conduct an empirical study of remote sensing pretraining (RSP) on aerial images. To this end, we train different networks from scratch with the help of the largest RS scene recognition dataset to date -- MillionAID -- to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. Then, we investigate the impact of RSP on representative downstream tasks, including scene recognition, semantic segmentation, object detection, and change detection, using these CNN and vision transformer backbones. The empirical study shows that RSP can help deliver distinctive performance in scene recognition tasks and in perceiving RS-related semantics such as "Bridge" and "Airplane". We also find that, although RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, it may still suffer from task discrepancies, where downstream tasks require different representations from scene recognition tasks. These findings call for further research efforts on both large-scale pretraining datasets and effective pretraining methods. The codes and pretrained models will be released at https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing.
Di Wang
Inspired by the recent measurement of $CP$ asymmetry in individual modes at LHCb, we study the $CP$ asymmetry of the $D\to ππ$ system in the isospin and topological analysis. The ratio between the penguin and tree amplitudes, $P/(T+C)$, in the $D\to ππ$ system is found to be greater than two for most values of the relative strong phase. Moreover, $D^0\to π^0π^0$ is a potential mode for revealing a $CP$ asymmetry of order $10^{-3}$, which could be observed by Belle II in the future. The large $CP$ asymmetry in the $D\to ππ$ system might be understood via the $t$-channel final-state interaction.
Di Wang, Chenkun Zhou, Alexander S. Filatov, Wooje Cho, Francisco Lagunas, Mingzhan Wang, Suriyanarayanan Vaikuntanathan, Chong Liu, Robert F. Klie, Dmitri V. Talapin
Two-dimensional (2D) transition metal carbides and nitrides (MXenes) are a large family of materials actively studied for various applications, especially in the field of energy storage. To date, MXenes are commonly synthesized by etching the layered ternary compounds, MAX phases. Here we demonstrate a direct synthetic route for scalable and atom-economic synthesis of MXenes, including phases that have not been synthesized from MAX phases, by the reactions of metals and metal halides with graphite, methane or nitrogen. These directly synthesized MXenes showed excellent energy storage capacity for Li-ion intercalation. The direct synthesis enables chemical vapor deposition (CVD) growth of MXene carpets and complex spherulite-like morphologies. The latter form in a process resembling the evolution of cellular membranes during endocytosis.
Di Wang, Xiaoyu Zhang, Guodong Li, Ruey Tsay
The reduced-rank vector autoregressive (VAR) model can be interpreted as a supervised factor model, where two factor modelings are simultaneously applied to response and predictor spaces. This article introduces a new model, called vector autoregression with common response and predictor factors, to explore further the common structure between the response and predictors in the VAR framework. The new model can provide better physical interpretations and improve estimation efficiency. In conjunction with the tensor operation, the model can easily be extended to any finite-order VAR model. A regularization-based method is considered for the high-dimensional estimation with the gradient descent algorithm, and its computational and statistical convergence guarantees are established. For data with pervasive cross-sectional dependence, a transformation for responses is developed to alleviate the diverging eigenvalue effect. Moreover, we consider additional sparsity structure in factor loading for the case of ultra-high dimension. Simulation experiments confirm our theoretical findings and a macroeconomic application showcases the appealing properties of the proposed model in structural analysis and forecasting.
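The supervised-factor view of the reduced-rank VAR can be illustrated with a two-step toy estimator: simulate a low-rank VAR(1), fit ordinary least squares, then truncate the coefficient matrix via SVD. The paper's actual regularized gradient-descent estimator and common response/predictor factor structure are not reproduced here; dimensions and ranks are illustrative:

```python
import numpy as np

# Simulate a VAR(1) y_t = A y_{t-1} + e_t whose transition matrix has rank r
rng = np.random.default_rng(4)
N, r, T = 8, 2, 2000
M = rng.normal(size=(N, r)) @ rng.normal(size=(r, N))
A = 0.5 * M / np.linalg.norm(M, 2)     # rank-r and spectrally stable
y = np.zeros((T, N))
for t in range(1, T):
    y[t] = y[t - 1] @ A.T + 0.1 * rng.normal(size=N)

# Two-step reduced-rank estimate: OLS, then best rank-r approximation by SVD
Y, X = y[1:], y[:-1]
A_ols = np.linalg.lstsq(X, Y, rcond=None)[0].T
Uh, s, Vh = np.linalg.svd(A_ols)
A_rr = (Uh[:, :r] * s[:r]) @ Vh[:r]    # factor loadings for responses/predictors

print(round(np.linalg.norm(A_ols - A), 3), round(np.linalg.norm(A_rr - A), 3))
```

The left and right singular subspaces of the truncated estimate play the role of the response and predictor factor loadings; imposing a common structure on them is the refinement the model introduces.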
Di Wang, Junzhi Shi, Pingping Wang, Shuo Zhuang, Hongyue Li
We propose a learning framework for calibrating predictive models to make loss-controlling predictions for exchangeable data, which extends our recently proposed conformal loss-controlling prediction to more general cases. By comparison, the predictors built by the proposed loss-controlling approach are not limited to set predictors, and the loss function can be any measurable function without the monotone assumption. To control the loss values in an efficient way, we introduce transformations preserving exchangeability to prove a finite-sample controlling guarantee when the test label is obtained, and then develop an approximation approach to construct predictors. The transformations can be built on any predefined function, including using optimization algorithms for parameter searching. This approach is a natural extension of conformal loss-controlling prediction, since it reduces to the latter when the set predictors have the nesting property and the loss functions are monotone. Our proposed method is applied to selective regression and high-impact weather forecasting problems, demonstrating its effectiveness for general loss-controlling prediction.
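The calibration step underlying conformal loss control can be sketched for the simple monotone case to which the proposed method reduces: pick a threshold from an order statistic of calibration losses so that, under exchangeability, the test loss exceeds it with probability at most delta. This is a toy illustration (exponential losses, names illustrative), not the paper's general non-monotone construction:

```python
import numpy as np

def calibrate_threshold(cal_losses, delta=0.1):
    """Conformal-style calibration: return the ceil((n+1)(1-delta))-th smallest
    calibration loss, so that a fresh exchangeable loss exceeds it with
    probability at most delta (finite-sample guarantee)."""
    n = len(cal_losses)
    k = int(np.ceil((n + 1) * (1 - delta)))   # order-statistic index
    return np.sort(cal_losses)[min(k, n) - 1]

rng = np.random.default_rng(3)
cal_losses = rng.exponential(size=999)
lam = calibrate_threshold(cal_losses, delta=0.1)
test_losses = rng.exponential(size=10000)
print(round(float((test_losses <= lam).mean()), 3))  # close to 1 - delta
```

The general framework replaces this single monotone threshold with exchangeability-preserving transformations, which is what lifts the construction beyond set predictors and monotone losses.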
Di Wang, Michael Mahoney, Nishanth Mohan, Satish Rao
We provide improved parallel approximation algorithms for the important class of packing and covering linear programs. In particular, we present new parallel $ε$-approximate packing and covering solvers which run in $\tilde{O}(1/ε^2)$ expected time, i.e., in expectation they take $\tilde{O}(1/ε^2)$ iterations and they do $\tilde{O}(N/ε^2)$ total work, where $N$ is the size of the constraint matrix and $ε$ is the error parameter, and where the $\tilde{O}$ hides logarithmic factors. To achieve our improvement, we introduce an algorithmic technique of broader interest: dynamically-bucketed selective coordinate descent (DB-SCD). At each step of the iterative optimization algorithm, the DB-SCD method dynamically buckets the coordinates of the gradient into those of roughly equal magnitude, and it updates all the coordinates in one of the buckets. This dynamically-bucketed updating permits us to take steps along several coordinates with similar-sized gradients, thereby permitting more appropriate step sizes at each step of the algorithm. In particular, this technique allows us to use in a straightforward manner the recent analysis from the breakthrough results of Allen-Zhu and Orecchia [2] to achieve our still-further improved bounds. More generally, this method addresses "interference" among coordinates, by which we mean the impact of the update of one coordinate on the gradients of other coordinates. Such interference is a core issue in parallelizing optimization routines that rely on smoothness properties. Since our DB-SCD method reduces interference via updating a selective subset of variables at each iteration, we expect it may also have more general applicability in optimization.
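A minimal sketch of the DB-SCD idea follows: bucket the gradient coordinates by magnitude on a log scale and update only one bucket per step, so that the updated coordinates have similar-sized gradients and a common step size. The bucketing rule, step size, and bucket-selection heuristic here are illustrative assumptions, not the paper's exact scheme for packing/covering LPs:

```python
import numpy as np

def dbscd_step(x, grad_fn, step=0.05, n_buckets=4):
    """One dynamically-bucketed selective coordinate descent step:
    group gradient coordinates into log-scale magnitude buckets and
    update only the bucket with the largest total squared gradient."""
    g = grad_fn(x)
    mags = np.abs(g)
    edges = np.logspace(np.log10(mags[mags > 0].min() + 1e-12),
                        np.log10(mags.max() + 1e-12), n_buckets + 1)
    bucket = np.clip(np.searchsorted(edges, mags) - 1, 0, n_buckets - 1)
    gains = np.array([np.sum(g[bucket == b] ** 2) for b in range(n_buckets)])
    chosen = gains.argmax()
    x_new = x.copy()
    mask = bucket == chosen
    x_new[mask] -= step * g[mask]           # update one bucket only
    return x_new

# toy separable quadratic f(x) = 0.5 * sum_i D_i x_i^2 with mixed curvatures
D = np.array([10.0, 9.0, 1.0, 0.1])
grad = lambda x: D * x
x = np.ones(4)
for _ in range(50):
    x = dbscd_step(x, grad)
print(np.round(x, 3))
```

Updating only similar-magnitude coordinates is what limits the "interference" the abstract describes: coordinates whose gradients would be badly scaled by a shared step size simply wait for a later bucket.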
Monika Henzinger, Satish Rao, Di Wang
We study the problem of computing a minimum cut in a simple, undirected graph and give a deterministic $O(m \log^2 n \log\log^2 n)$ time algorithm. This improves both on the best previously known deterministic running time of $O(m \log^{12} n)$ (Kawarabayashi and Thorup, STOC 2015) and the best previously known randomized running time of $O(m \log^{3} n)$ (Karger, J.ACM 2000) for this problem, though Karger's algorithm can be further applied to weighted graphs. Moreover, our result extends to balanced directed graphs, where the balance of a directed graph captures how close the graph is to being Eulerian. Our approach is using the Kawarabayashi and Thorup graph compression technique, which repeatedly finds low-conductance cuts. To find these cuts they use a diffusion-based local algorithm. We use instead a flow-based local algorithm and suitably adjust their framework to work with our flow-based subroutine. Both flow and diffusion based methods have a long history of being applied to finding low conductance cuts. Diffusion algorithms have several variants that are naturally local while it is more complicated to make flow methods local. Some prior work has proven nice properties for local flow based algorithms with respect to improving or cleaning up low conductance cuts. Our flow subroutine, however, is the first that is both local and produces low conductance cuts. Thus, it may be of independent interest.
Di Wang, Kimon Fountoulakis, Monika Henzinger, Michael W. Mahoney, Satish Rao
Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass "too aggressively," thereby failing to find the "right" clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an $O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good---but not very good---clusters.
Di Wang, Jan Hoffmann, Thomas Reps
In this article, we present a semantics-level adaptation of the Optional Stopping Theorem, sketch an expected-cost analysis as its application, and survey different variants of the Optional Stopping Theorem that have been used in the static analysis of probabilistic programs.
Di Wang
Feb 11, 2024 · astro-ph.HE
The origin of quasi-periodic eruptions (QPEs) is possibly mass loss at the periastron of a body moving around a supermassive black hole (SMBH) in a highly eccentric orbit. Such a tidally stripped star is expected to radiate gravitational waves, leading to shrinkage of the periastron distance, so in previous studies it would eventually be disrupted by the SMBH. This scenario predicts gradually increasing mass transfer, contradicting the long-term evolution of the observed intensity of the QPEs in GSN 069. In this paper, we first revisit the orbital evolution of the stripped star, and then suggest a model of a tidally stripped white dwarf (WD) moving inside an accretion disk for the QPEs in GSN 069. We find that the effect of mass transfer eventually dominates the orbital evolution, so the stripped star finally escapes the SMBH rather than being disrupted by it. The drag force from the disk can effectively reduce the mass transfer and thus explain the observed long-term evolution of the intensity of the QPEs in GSN 069. The disk is likely a fallback disk of the tidal disruption event in GSN 069; considering the evolution of its accretion rate, the increase in the intensity of the latest eruption can also be explained.
Jin-Feng Luo, Di Wang
Isospin symmetry is the most precise flavor symmetry. The effective Hamiltonian of charm quark weak decay is zero under the isospin lowering operators $I_-^n$, which permits us to generate isospin sum rules through several master formulas. In this work, we derive the master formulas of isospin sum rules for the two- and three-body non-leptonic decays of singly and doubly charmed baryons. Hundreds of isospin sum rules are derived to test isospin symmetry and provide hints for new decay modes. The isospin sum rules for multi-body decays are not broken by intermediate resonances and hence can be used to study the isospin partners of exotic hadrons.
Xiaolei Qin, Di Wang, Jing Zhang, Fengxiang Wang, Xin Su, Bo Du, Liangpei Zhang
Satellite image time series (SITS) provide continuous observations of the Earth's surface, making them essential for applications such as environmental management and disaster assessment. However, existing spatiotemporal foundation models rely on plain vision transformers, which encode entire temporal sequences without explicitly capturing multiscale spatiotemporal relationships between land objects. This limitation hinders their effectiveness in downstream tasks. To overcome this challenge, we propose TiMo, a novel hierarchical vision transformer foundation model tailored for SITS analysis. At its core, we introduce a spatiotemporal gyroscope attention mechanism that dynamically captures evolving multiscale patterns across both time and space. For pre-training, we curate MillionST, a large-scale dataset of one million images from 100,000 geographic locations, each captured across 10 temporal phases over five years, encompassing diverse geospatial changes and seasonal variations. Leveraging this dataset, we adapt masked image modeling to pre-train TiMo, enabling it to effectively learn and encode generalizable spatiotemporal representations. Extensive experiments across multiple spatiotemporal tasks, including deforestation monitoring, land cover segmentation, crop type classification, and flood detection, demonstrate TiMo's superiority over state-of-the-art methods. Code, model, and dataset will be released at https://github.com/MiliLab/TiMo.
Si-Hong Liu, Ying-Xin Lai, Di Wang
The doubly charmed baryon was first observed by LHCb via the non-leptonic decay $Ξ_{cc}^{++}\to Λ^+_cK^-π^+π^+$ in 2017. Subsequently, ongoing efforts have been made to identify other doubly charmed baryons. However, there is no systematic analysis of the topological decomposition for non-leptonic decays of doubly charmed baryons. In this work, we study the topological amplitudes of doubly charmed baryon decays in the $SU(3)_F$ limit. Tree- and penguin-induced topological diagrams for the $\mathcal{B}_{cc}\to \mathcal{B}_c M$ and $\mathcal{B}_{cc}\to \mathcal{B} D$ decays are presented. The linear relations between the topological amplitudes and the $SU(3)$ irreducible amplitudes are derived through tensor contraction and $SU(3)$ decomposition. The magnitude pattern of the topological diagrams is analyzed in the rescattering dynamics and the large $N_c$ expansion. In addition, some amplitude relations are derived to test the Körner-Pati-Woo theorem in the isospin limit.