Xue Liu, Wei Wei, Xiangnan Feng, Xiaobo Cao, Dan Sun
Most existing popular methods for learning graph embedding only consider fixed-order global structural features and lack structures hierarchical representation. To address this weakness, we propose a novel graph embedding algorithm named GraphCSC that realizes classification based on skeleton information using fixed-order structures learned in anonymous random walks manner, and component information using different size subgraphs. Two graphs are similar if their skeletons and components are both similar, thus in our model, we integrate both of them together into embeddings as graph homogeneity characterization. We demonstrate our model on different datasets in comparison with a comprehensive list of up-to-date state-of-the-art baselines, and experiments show that our work is superior in real-world graph classification tasks.
Xu Liu, Hongbo Zhu, Yan Bao, Dai Zhou, Zhaolong Han
Direct numerical simulations (DNSs) of turbulent pipe flow subjected to streamwise-varying wall rotation are performed. This control method is able to achieve drag reduction and even relaminarize the flow under certain control parameters at low Reynolds number. Two control parameters, which are velocity amplitude and wavelength, are considered. An annular boundary layer, called Spatial Stokes Layer (SSL), is formed by the wall rotation. Based on the thickness of SSL, two types of drag reduction scenarios are identified. When the thickness is low, the SSL acts as a spacer layer, obstructing the impact of fluids against the wall and hence reducing the shear stress. The flow structures in the outer layer are less affected and are stretched in streamwise direction due to the increased velocity gradient. Within the SSL, the turbulence intensity diminishes dramatically. When the thickness is large, a streamwise wavy pattern of near-wall low-speed streaks is formed. The orientation of streaks is found to lag behind the direction of local mean velocity in phase and conform to the direction of local resultant shear stress. The streamwise scale of near-wall flow structures is significantly reduced, resulting in the disruption of downstream development of flow structures and hence leading to the drag reduction. Besides, it is found that it requires both large enough thickness of SSL and velocity amplitude to relaminarize the turbulence. The relaminarization mechanism is that the annular SSL can continuously absorb energy from wall-normal stress due to the rotation effect, thereby the turbulence self-sustaining process could not be maintained. For the relaminarization cases, the laminar state is stable to even extremely large perturbations, which possibly makes laminar state the only fixed point for the whole system.
Xu Liu, Licheng Jiao, Dan Zhang, Fang Liu
POLSAR image has an advantage over optical image because it can be acquired independently of cloud cover and solar illumination. PolSAR image classification is a hot and valuable topic for the interpretation of POLSAR image. In this paper, a novel POLSAR image classification method is proposed based on polarimetric scattering coding and sparse support matrix machine. First, we transform the original POLSAR data to get a real value matrix by the polarimetric scattering coding, which is called polarimetric scattering matrix and is a sparse matrix. Second, the sparse support matrix machine is used to classify the sparse polarimetric scattering matrix and get the classification map. The combination of these two steps takes full account of the characteristics of POLSAR. The experimental results show that the proposed method can get better results and is an effective classification method.
Xu Liu, Jianing Li, Xiaopeng Fan, Yonghong Tian
Event cameras, offering high temporal resolutions and high dynamic ranges, have brought a new perspective to address common challenges (e.g., motion blur and low light) in monocular depth estimation. However, how to effectively exploit the sparse spatial information and rich temporal cues from asynchronous events remains a challenging endeavor. To this end, we propose a novel event-based monocular depth estimator with recurrent transformers, namely EReFormer, which is the first pure transformer with a recursive mechanism to process continuous event streams. Technically, for spatial modeling, a novel transformer-based encoder-decoder with a spatial transformer fusion module is presented, having better global context information modeling capabilities than CNN-based methods. For temporal modeling, we design a gate recurrent vision transformer unit that introduces a recursive mechanism into transformers, improving temporal modeling capabilities while alleviating the expensive GPU memory cost. The experimental results show that our EReFormer outperforms state-of-the-art methods by a margin on both synthetic and real-world datasets. We hope that our work will attract further research to develop stunning transformers in the event-based vision community. Our open-source code can be found in the supplemental material.
Chengyang He, Gadiel Sznaier Camps, Xu Liu, Mac Schwager, Guillaume Sartoretti
We present Latent Theory of Mind (LatentToM), a decentralized diffusion policy architecture for collaborative robot manipulation. Our policy allows multiple manipulators with their own perception and computation to collaborate with each other towards a common task goal with or without explicit communication. Our key innovation lies in allowing each agent to maintain two latent representations: an ego embedding specific to the robot, and a consensus embedding trained to be common to both robots, despite their different sensor streams and poses. We further let each robot train a decoder to infer the other robot's ego embedding from their consensus embedding, akin to theory of mind in latent space. Training occurs centrally, with all the policies' consensus encoders supervised by a loss inspired by sheaf theory, a mathematical theory for clustering data on a topological manifold. Specifically, we introduce a first-order cohomology loss to enforce sheaf-consistent alignment of the consensus embeddings. To preserve the expressiveness of the consensus embedding, we further propose structural constraints based on theory of mind and a directional consensus mechanism. Execution can be fully distributed, requiring no explicit communication between policies. In which case, the information is exchanged implicitly through each robot's sensor stream by observing the actions of the other robots and their effects on the scene. Alternatively, execution can leverage direct communication to share the robots' consensus embeddings, where the embeddings are shared once during each inference step and are aligned using the sheaf Laplacian. In our hardware experiments, LatentToM outperforms a naive decentralized diffusion baseline, and shows comparable performance with a state-of-the-art centralized diffusion policy for bi-manual manipulation. Project website: https://stanfordmsl.github.io/LatentToM/.
Xu Liu
Most of researchers on testing a significance of coefficient $\ubeta$ in high-dimensional linear regression models consider the classical hypothesis testing problem $H_0^{c}: \ubeta=\uzero \mbox{ versus } H_1^{c}: \ubeta \neq \uzero$. We take a different perspective and study the testing problem with the null hypothesis of no relevant difference between $\ubeta$ and $\uzero$, that is, $H_0: \|\ubeta\|\leq δ_0 \mbox{ versus } H_1: \|\ubeta\|> δ_0$, where $δ_0$ is a prespecified small constant. This testing problem is motivated by the urgent requirement to detect the transferability of source data in the transfer learning framework. We propose a novel test procedure incorporating the estimation of the largest eigenvalue of a high-dimensional covariance matrix with the assistance of the random matrix theory. In the more challenging setting in the presence of high-dimensional nuisance parameters, we establish the asymptotic normality for the proposed test statistics under both the null and alternative hypotheses. By applying the proposed test approaches to detect the transferability of source data, the unified transfer learning models simultaneously achieve lower estimation and prediction errors with comparison to existing methods. We study the finite-sample properties of the new test by means of simulation studies and illustrate its performance by analyzing the GTEx data.
Nian Liu, Xue Liu
In this paper, we study the irregular set of any continuous observable for a class of skew product transformations, which is driven by a uniquely ergodic homeomorphism system $(Ω,\mathbb{P},θ)$ and satisfies Anosov and toplogical mixing on fibers property. We prove that if the irregular set of any continuous observable is nonempty on a fiber, then the irregular set must be nonempty, residual, and carry full fiber topological entropy on $\mathbb{P}$-a.e. fibers. The orbit-gluing technique provided by the fiber specification property are utilized.
Xu Liu, Yibo Lu, Xinxian Wang, Xinyu Wu
We propose Adaptive Multi-Style Fusion (AMSF), a reference-based training-free framework that enables controllable fusion of multiple reference styles in diffusion models. Most of the existing reference-based methods are limited by (a) acceptance of only one style image, thus prohibiting hybrid aesthetics and scalability to more styles, and (b) lack of a principled mechanism to balance several stylistic influences. AMSF mitigates these challenges by encoding all style images and textual hints with a semantic token decomposition module that is adaptively injected into every cross-attention layer of an frozen diffusion model. A similarity-aware re-weighting module then recalibrates, at each denoising step, the attention allocated to every style component, yielding balanced and user-controllable blends without any fine-tuning or external adapters. Both qualitative and quantitative evaluations show that AMSF produces multi-style fusion results that consistently outperform the state-of-the-art approaches, while its fusion design scales seamlessly to two or more styles. These capabilities position AMSF as a practical step toward expressive multi-style generation in diffusion models.
Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
Despite extensive safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. However, existing methods generally lack the capability for continuous learning and self-evolution from interactions, limiting the diversity and adaptability of attack strategies. To address this, we propose ASTRA, an automated framework capable of autonomously discovering, retrieving, and evolving attack strategies. ASTRA operates on a closed-loop ``attack-evaluate-distill-reuse'' mechanism, which not only generates attack prompts but also automatically distills reusable strategies from every interaction. To systematically manage these strategies, we introduce a dynamic three-tier strategy library (Effective, Promising, and Ineffective) that categorizes strategies based on performance. This hierarchical memory mechanism enables the framework to enhance efficiency by leveraging successful patterns while optimizing the exploration space by avoiding known failures. Extensive experiments in a black-box setting demonstrate that ASTRA significantly outperforms existing baselines.
Xue Liu, Dan Sun, Wei Wei
The Graph Convolutional Networks (GCN) proposed by Kipf and Welling is an effective model for semi-supervised learning, but faces the obstacle of over-smoothing, which will weaken the representation ability of GCN. Recently some works are proposed to tackle above limitation by randomly perturbing graph topology or feature matrix to generate data augmentations as input for training. However, these operations inevitably do damage to the integrity of information structures and have to sacrifice the smoothness of feature manifold. In this paper, we first introduce a novel graph entropy definition as a measure to quantitatively evaluate the smoothness of a data manifold and then point out that this graph entropy is controlled by triangle motif-based information structures. Considering the preservation of graph entropy, we propose an effective strategy to generate randomly perturbed training data but maintain both graph topology and graph entropy. Extensive experiments have been conducted on real-world datasets and the results verify the effectiveness of our proposed method in improving semi-supervised node classification accuracy compared with a surge of baselines. Beyond that, our proposed approach could significantly enhance the robustness of training process for GCN.
Xu Liu, Ankit Prabhu, Fernando Cladera, Ian D. Miller, Lifeng Zhou, Camillo J. Taylor, Vijay Kumar
Traditional approaches for active mapping focus on building geometric maps. For most real-world applications, however, actionable information is related to semantically meaningful objects in the environment. We propose an approach to the active metric-semantic mapping problem that enables multiple heterogeneous robots to collaboratively build a map of the environment. The robots actively explore to minimize the uncertainties in both semantic (object classification) and geometric (object modeling) information. We represent the environment using informative but sparse object models, each consisting of a basic shape and a semantic class label, and characterize uncertainties empirically using a large amount of real-world data. Given a prior map, we use this model to select actions for each robot to minimize uncertainties. The performance of our algorithm is demonstrated through multi-robot experiments in diverse real-world environments. The proposed framework is applicable to a wide range of real-world problems, such as precision agriculture, infrastructure inspection, and asset mapping in factories. A demo video can be found at https://youtu.be/S86SgXi54oU.
Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, Roger Zimmermann
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total number of 8,600 sensors in California with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.
Honam Yum, Youngjoon Jang, May Kim, Xue Liu, Selim Shahriar
Previously, we proposed a data buffering system that makes use of a pair of white light cavities. For application to telecommunication systems, it would be convenient to realize such a device using fiber optic resonators. In this paper, we present the design of such a system, where the white light cavity effect is produced by using stimulated Brillouin scattering. The system consists of a pair of fiber optic white light cavities placed in series. As in the original proposal, the delay time can be controlled independently of the bandwidth of the data pulses. Furthermore, we show how the bandwidth of the system can be made as large as several times the Brillouin frequency shift. We also show that the net delay achievable in such a buffer can be significantly larger than what can be achieved using a conventional recirculating loop buffer.
Xue Liu, Jin Hu, Chunlei Yue, Nicholas D. Della Fera, Yun Ling, Zhiqiang Mao, Jiang Wei
Semiconducting two-dimensional transition metal chalcogenide crystals have been regarded as the promising candidate for the future generation of transistor in modern electronics. However, how to fabricate those crystals into practical devices with acceptable performance still remains as a challenge. Employing tungsten disulfide multilayer thin crystals, we demonstrate that using gold as the only contact metal and choosing appropriate thickness of the crystal, high performance transistor with on/off ratio of $10^{8}$ and mobility up to $234\:cm^{2}V^{-1}s^{-1}$ at room temperature can be realized in a simple device structure. Further low temperature study revealed that the high performance of our device is caused by the minimized Schottky barrier at the contact and the existence of a shallow impurity level around 80 meV right below the conduction band edge. From the analysis on temperature dependence of field-effect mobility, we conclude that strongly suppressed phonon scattering and relatively low charge impurity density are the key factors leading to the high mobility of our tungsten disulfide devices.
Xu Liu
Like the classical potential theory, it was conjectured that there exists equivalence between locally and globally pluripolar and complete pluripolar sets, namely, Problem I of Lelong, and was solved by Josefson, Bedford - Taylor and Colţoiu. In this article, we consider complements of complete Kähler domains as the generalization of closed complete pluripolar sets and prove that there exists an equivalence between local and global existence of these sets.
Chengyang He, Xu Liu, Gadiel Sznaier Camps, Guillaume Sartoretti, Mac Schwager
Diffusion policies have demonstrated remarkable dexterity and robustness in intricate, high-dimensional robot manipulation tasks, while training from a small number of demonstrations. However, the reason for this performance remains a mystery. In this paper, we offer a surprising hypothesis: diffusion policies essentially memorize an action lookup table -- and this is beneficial. We posit that, at runtime, diffusion policies find the closest training image to the test image in a latent space, and recall the associated training action sequence, offering reactivity without the need for action generalization. This is effective in the sparse data regime, where there is not enough data density for the model to learn action generalization. We support this claim with systematic empirical evidence. Even when conditioned on wildly out of distribution (OOD) images of cats and dogs, the Diffusion Policy still outputs an action sequence from the training data. With this insight, we propose a simple policy, the Action Lookup Table (ALT), as a lightweight alternative to the Diffusion Policy. Our ALT policy uses a contrastive image encoder as a hash function to index the closest corresponding training action sequence, explicitly performing the computation that the Diffusion Policy implicitly learns. We show empirically that for relatively small datasets, ALT matches the performance of a diffusion model, while requiring only 0.0034 of the inference time and 0.0085 of the memory footprint, allowing for much faster closed-loop inference with resource constrained robots. We also train our ALT policy to give an explicit OOD flag when the distance between the runtime image is too far in the latent space from the training images, giving a simple but effective runtime monitor. More information can be found at: https://stanfordmsl.github.io/alt/.
Xu Liu, Na Xia, Jinxing Zhou, Jingyuan Xu, Dan Guo
Spiking Neural Networks (SNNs) become popular due to excellent energy efficiency, yet facing challenges for effective model training. Recent works improve this by introducing knowledge distillation (KD) techniques, with the pre-trained artificial neural networks (ANNs) used as teachers and the target SNNs as students. This is commonly accomplished through a straightforward element-wise alignment of intermediate features and prediction logits from ANNs and SNNs, often neglecting the intrinsic differences between their architectures. Specifically, ANN's outputs exhibit a continuous distribution, whereas SNN's outputs are characterized by sparsity and discreteness. To mitigate this issue, we introduce two innovative KD strategies. Firstly, we propose the Saliency-scaled Activation Map Distillation (SAMD), which aligns the spike activation map of the student SNN with the class-aware activation map of the teacher ANN. Rather than performing KD directly on the raw %and distinct features of ANN and SNN, our SAMD directs the student to learn from saliency activation maps that exhibit greater semantic and distribution consistency. Additionally, we propose a Noise-smoothed Logits Distillation (NLD), which utilizes Gaussian noise to smooth the sparse logits of student SNN, facilitating the alignment with continuous logits from teacher ANN. Extensive experiments on multiple datasets demonstrate the effectiveness of our methods. Code is available~\footnote{https://github.com/SinoLeu/CKDSNN.git}.
Xu Liu, Yu Liu, Hanshuo Qiu, Yang Qirong, Zhouhui Lian
Vision-Language Navigation (VLN) enables agents to navigate in complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground-based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV-based VLN remains underexplored, despite its relevance to real-world applications such as inspection, delivery, and search-and-rescue in confined spaces. To bridge this gap, we introduce \textbf{IndoorUAV}, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics to collect diverse 3D navigation trajectories manually, further enriched through data augmentation techniques. Furthermore, we design an automated annotation pipeline to generate natural language instructions of varying granularity for each trajectory. This process yields over 16,000 high-quality trajectories, comprising the \textbf{IndoorUAV-VLN} subset, which focuses on long-horizon VLN. To support short-horizon planning, we segment long trajectories into sub-trajectories by selecting semantically salient keyframes and regenerating concise instructions, forming the \textbf{IndoorUAV-VLA} subset. Finally, we introduce \textbf{IndoorUAV-Agent}, a novel navigation model designed for our benchmark, leveraging task decomposition and multimodal reasoning. We hope IndoorUAV serves as a valuable resource to advance research on vision-language embodied AI in the indoor aerial navigation domain.
Xu Liu, Guikun Chen, Wenguan Wang
Large language models (LLMs) suffer from hallucination and context forgetting. Prior studies suggest that attention drift is a primary cause of these problems, where LLMs' focus shifts towards newly generated tokens and away from the initial input context. To counteract this, we make use of a related, intrinsic characteristic of LLMs: attention sink -- the tendency to consistently allocate high attention to the very first token (i.e., <BOS>) of a sequence. Concretely, we propose an advanced context anchoring method, SinkTrack, which treats <BOS> as an information anchor and injects key contextual features (such as those derived from the input image or instruction) into its representation. As such, LLM remains anchored to the initial input context throughout the entire generation process. SinkTrack is training-free, plug-and-play, and introduces negligible inference overhead. Experiments demonstrate that SinkTrack mitigates hallucination and context forgetting across both textual (e.g., +21.6% on SQuAD2.0 with Llama3.1-8B-Instruct) and multi-modal (e.g., +22.8% on M3CoT with Qwen2.5-VL-7B-Instruct) tasks. Its consistent gains across different architectures and scales underscore the robustness and generalizability. We also analyze its underlying working mechanism from the perspective of information delivery. Our source code is available at https://github.com/67L1/SinkTrack.
Lin Cheng, Xu Liu, Lingling Li, Licheng Jiao, Xu Tang
Object detection is a fundamental and challenging problem in aerial and satellite image analysis. More recently, a two-stage detector Faster R-CNN is proposed and demonstrated to be a promising tool for object detection in optical remote sensing images, while the sparse and dense characteristic of objects in remote sensing images is complexity. It is unreasonable to treat all images with the same region proposal strategy, and this treatment limits the performance of two-stage detectors. In this paper, we propose a novel and effective approach, named deep adaptive proposal network (DAPNet), address this complexity characteristic of object by learning a new category prior network (CPN) on the basis of the existing Faster R-CNN architecture. Moreover, the candidate regions produced by DAPNet model are different from the traditional region proposal network (RPN), DAPNet predicts the detail category of each candidate region. And these candidate regions combine the object number, which generated by the category prior network to achieve a suitable number of candidate boxes for each image. These candidate boxes can satisfy detection tasks in sparse and dense scenes. The performance of the proposed framework has been evaluated on the challenging NWPU VHR-10 data set. Experimental results demonstrate the superiority of the proposed framework to the state-of-the-art.