Hamid Hezari, Hang Xu
We give upper bounds for the Bergman kernels associated to tensor powers of a smooth positive line bundle in terms of the rate of growth of the Taylor coefficients of the Kähler potential. As applications, we obtain improved off-diagonal rates of decay for the classes of analytic, quasi-analytic, and, more generally, Gevrey potentials.
Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, Hang Xu
Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success heavily relies on the scale of pre-trained cross-modal datasets. However, the lack of large-scale datasets and benchmarks in Chinese hinders the development of Chinese VLP models and broader multilingual applications. In this work, we release a large-scale Chinese cross-modal dataset named Wukong, which contains 100 million Chinese image-text pairs collected from the web. Wukong aims to benchmark different multi-modal pre-training methods and to facilitate VLP research and community development. Furthermore, we release a group of models pre-trained with various image encoders (ViT-B/ViT-L/SwinT) and also apply advanced pre-training techniques to VLP, such as locked-image text tuning, token-wise similarity in contrastive learning, and reduced-token interaction. Extensive experiments and a benchmarking of different downstream tasks, including the largest human-verified image-text test set to date, are also provided. Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods. For the zero-shot image classification task on 10 datasets, $Wukong_{ViT-L}$ achieves an average accuracy of 73.03%. For the image-text retrieval task, it achieves a mean recall of 71.6% on AIC-ICC, which is 12.9% higher than WenLan 2.0. Our Wukong models are also benchmarked against other variants on multiple downstream datasets, e.g., Flickr8K-CN, Flickr-30K-CN, and COCO-CN. More information is available at: https://wukong-dataset.github.io/wukong-dataset/.
Hamid Hezari, Zhiqin Lu, Hang Xu
We prove a new off-diagonal asymptotic of the Bergman kernels associated to tensor powers of a positive line bundle on a compact Kähler manifold. We show that if the Kähler potential is real analytic, then the Bergman kernel admits a complete asymptotic expansion in a neighborhood of the diagonal of shrinking size $k^{-\frac14}$. This improves earlier results in the subject for smooth potentials, where an expansion exists in a $k^{-\frac12}$ neighborhood of the diagonal. We obtain our results by finding upper bounds of the form $C^m m!^{2}$ for the Bergman coefficients $b_m(x, \bar y)$, which is an interesting problem in its own right. We find such upper bounds using the method of Berman-Berndtsson-Sjöstrand. We also show that sharpening these upper bounds would improve the rate of shrinking of the neighborhoods of the diagonal $x=y$ in our results. In the special case of metrics with locally constant holomorphic sectional curvatures, we obtain off-diagonal asymptotics in a fixed (as $k \to \infty$) neighborhood of the diagonal, which recovers a result of Berman [Ber] (see Remark 3.5 of [Ber] for higher dimensions). In this case, we also find an explicit formula for the Bergman kernel modulo $O(e^{-k\delta})$.
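In this notation, the expansion in question has the following schematic shape, with $\psi(x,\bar y)$ denoting the polarization (almost-analytic extension) of the Kähler potential and $n$ the complex dimension; normalizations vary across the literature, so this is only a sketch of the form of the statement:

```latex
K_k(x, \bar y) \sim \frac{k^n}{\pi^n}\, e^{k\psi(x,\bar y)}
  \sum_{m=0}^{\infty} b_m(x, \bar y)\, k^{-m},
\qquad |b_m(x, \bar y)| \le C^{m}\, m!^{2},
```

where the factorial-type bound on the Bergman coefficients $b_m$ is the kind of estimate the abstract refers to.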
Hang Xu, Song Li, Junhong Lin
We study the deterministic matrix completion problem, i.e., recovering a low-rank matrix from a few observed entries where the sampling set is chosen as the edge set of a Ramanujan graph. We first investigate projected gradient descent (PGD) applied to a Burer-Monteiro least-squares problem and show that it converges linearly to the incoherent ground truth, at a rate depending on the condition number $\kappa$ of the ground truth, under a benign initialization and sufficiently many samples. We next apply the scaled variant of PGD to handle the ill-conditioned case when $\kappa$ is large, and we show that the algorithm converges at a linear rate independent of the condition number $\kappa$ under similar conditions. Finally, we provide numerical experiments to corroborate our results.
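As a toy illustration of the first algorithm discussed above, here is a minimal sketch of gradient descent on a Burer-Monteiro least-squares objective over observed entries. It is illustrative only: the paper's PGD additionally includes a projection step (enforcing incoherence), a specific initialization, and a sampling set given by a Ramanujan graph; the function name and all parameters here are assumptions.

```python
import numpy as np

def burer_monteiro_gd(M_obs, mask, rank, lr=0.05, iters=2000, seed=0):
    """Gradient descent on the factored least-squares objective
        f(U, V) = 0.5 * || mask * (U V^T - M_obs) ||_F^2.
    Sketch only: the paper's PGD adds a projection step and a
    tailored initialization, which are omitted here."""
    rng = np.random.default_rng(seed)
    n1, n2 = M_obs.shape
    U = 0.1 * rng.standard_normal((n1, rank))
    V = 0.1 * rng.standard_normal((n2, rank))
    for _ in range(iters):
        R = mask * (U @ V.T - M_obs)            # residual on observed entries
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
    return U @ V.T

# demo: fit a rank-1 matrix from ~70% of its entries
rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(8), rng.standard_normal(8))
mask = (rng.random(M.shape) < 0.7).astype(float)
M_hat = burer_monteiro_gd(mask * M, mask, rank=1)
obs_err = np.linalg.norm(mask * (M_hat - M)) / np.linalg.norm(mask * M)
```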
Peter Ebenfelt, Ming Xiao, Hang Xu
Let $M$ be a complete Kähler manifold, and let $(L, h) \to M$ be a positive line bundle inducing a Kähler metric $g$ on $M$. We study two Bergman kernels in this setting: the Bergman kernel of the disk bundle of the dual line bundle $(L^*, h^*)$, and the Bergman kernel of the line bundle $(L^k, h^k)$, $k\geq 1$, twisted by the canonical line bundle of $(M, g)$. We first prove a localization result for the former Bergman kernel. Then we establish a necessary and sufficient condition for this Bergman kernel to have no logarithmic singularity, expressed in terms of the Tian-Yau-Zelditch-Catlin type expansion of the latter Bergman kernel. This result, in particular, answers a question posed by Lu and Tian. As an application, we show that if $(M, g)$ is compact and locally homogeneous, then the circle bundle of $(L^*, h^*)$ is necessarily Bergman logarithmically flat.
Hang Xu, Xinyuan Liu, Haonan Xu, Yike Ma, Zunjie Zhu, Chenggang Yan, Feng Dai
Oriented object detection has developed rapidly in the past few years, where rotation equivariance is crucial for detectors to predict rotated boxes. The prediction is expected to maintain the corresponding rotation when objects rotate, but severe mutations in the angular prediction are sometimes observed when objects rotate near the boundary angle, which is the well-known boundary discontinuity problem. The problem has long been believed to be caused by the sharp loss increase at the angular boundary, and the widely used joint-optim IoU-like methods deal with it by loss smoothing. However, we experimentally find that even state-of-the-art IoU-like methods actually fail to solve the problem. On further analysis, we find that the key to the solution lies in the encoding mode of the smoothing function rather than in joint or independent optimization. In existing IoU-like methods, the model essentially attempts to fit the angular relationship between box and object, where the break point at the angular boundary makes the predictions highly unstable. To deal with this issue, we propose a dual-optimization paradigm for angles. We decouple reversibility and joint-optim from a single smoothing function into two distinct entities, which for the first time achieves the objectives of both correcting the angular boundary and blending the angle with other parameters. Extensive experiments on multiple datasets show that the boundary discontinuity problem is well addressed. Moreover, typical IoU-like methods are improved to the same level without an obvious performance gap. The code is available at https://github.com/hangxu-cv/cvpr24acm.
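The boundary discontinuity described above is easy to reproduce in isolation. The toy sketch below contrasts a raw angular residual with one routed through sin/cos, showing why the encoding mode (rather than loss smoothing alone) matters; it only illustrates the phenomenon and is not the paper's dual-optimization method.

```python
import math

def raw_angle_residual(pred, target):
    """Raw angular difference: jumps by ~2*pi when two nearly identical
    orientations sit on opposite sides of the angular boundary."""
    return pred - target

def wrapped_angle_residual(pred, target):
    """Encoding the difference through sin/cos gives a residual that is
    continuous (and small) across the boundary."""
    d = pred - target
    return math.atan2(math.sin(d), math.cos(d))

# two almost-identical orientations straddling the boundary at +/- pi
a, b = 0.999 * math.pi, -0.999 * math.pi
jump = raw_angle_residual(a, b)        # ~ 1.998 * pi: a severe mutation
smooth = wrapped_angle_residual(a, b)  # ~ -0.002 * pi: small, as expected
```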
Wenshuo Ma, Tingzhong Tian, Hang Xu, Yimin Huang, Zhenguo Li
Most state-of-the-art object detection systems follow an anchor-based paradigm: anchor boxes are densely proposed over the images, and the network is trained to predict the box position offsets as well as the classification confidence. Existing systems pre-define anchor box shapes and sizes, and ad-hoc heuristic adjustments are used to define the anchor configurations. However, this might be sub-optimal or even wrong when a new dataset or a new model is adopted. In this paper, we study the problem of automatically optimizing anchor boxes for object detection. We first demonstrate that the number of anchors, anchor scales, and ratios are crucial factors for a reliable object detection system. By carefully analyzing the existing bounding box patterns on the feature hierarchy, we design a flexible and tight hyper-parameter space for anchor configurations. Then we propose a novel hyper-parameter optimization method named AABO to determine more appropriate anchor boxes for a certain dataset, in which Bayesian Optimization and a subsampling method are combined to achieve precise and efficient anchor configuration optimization. Experiments demonstrate the effectiveness of our proposed method on different detectors and datasets, e.g., achieving around 2.4% mAP improvement on COCO, 1.6% on ADE, and 1.5% on VG; the optimal anchors bring 1.4% to 2.4% mAP improvement on SOTA detectors by only optimizing anchor configurations, e.g., boosting Mask RCNN from 40.3% to 42.3% and the HTC detector from 46.8% to 48.2%.
Hang Xu, Shaoju Wang, Xinyue Cai, Wei Zhang, Xiaodan Liang, Zhenguo Li
We address the curve lane detection problem, which poses more realistic challenges than conventional lane detection for better facilitating modern assisted/autonomous driving systems. Current hand-designed lane detection methods are not robust enough to capture curve lanes, especially their remote parts, due to the lack of modeling of both long-range contextual information and detailed curve trajectories. In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-range coherent and accurate short-range curve information while unifying architecture search and post-processing on curve lane predictions via point blending. It consists of three search modules: a) a feature fusion search module to find a better fusion of the local and global context for multi-level hierarchy features; b) an elastic backbone search module to explore an efficient feature extractor with good semantics and latency; c) an adaptive point blending module to search a multi-level post-processing refinement strategy to combine multi-scale head predictions. The unified framework ensures lane-sensitive predictions via the mutual guidance between NAS and adaptive point blending. Furthermore, we release a more challenging benchmark named CurveLanes for addressing the most difficult curve lanes. It consists of 150K images with 680K labels. The new dataset can be downloaded at github.com/xbjxh/CurveLanes (already anonymized for this submission). Experiments on the new CurveLanes show that SOTA lane detection methods suffer a substantial performance drop, while our model can still reach an 80+% F1-score. Extensive experiments on traditional lane benchmarks such as CULane also demonstrate the superiority of our CurveLane-NAS, e.g., achieving a new SOTA 74.8% F1-score on CULane.
Hamid Hezari, Hang Xu
We provide a simple proof of a result of Rouby-Sjöstrand-Ngoc \cite{RSN} and Deleporte \cite{Deleporte}, which asserts that if the Kähler potential is real analytic then the Bergman kernel is an \textit{analytic kernel}, meaning that its amplitude is an \textit{analytic symbol} and its phase is given by the polarization of the Kähler potential. This in particular shows that in the analytic case the Bergman kernel admits an asymptotic expansion in a fixed neighborhood of the diagonal with an exponentially small remainder. The proof we provide is based on a linear recursive formula of L. Charles \cite{Cha03} for the Bergman kernel coefficients, which is similar to, but simpler than, the ones found in \cite{BBS}.
Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang
In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for the View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with the help of a single-view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from the two modalities by leveraging the geometric nature of the completion task. Specifically, we propose an explicitly guided information interaction strategy supported by modal alignment for point cloud completion. First, in contrast to previous methods that simply use 2D and 3D backbones to encode features separately, we unify the encoding process to promote modal alignment. Second, we propose a novel explicitly guided information interaction strategy that helps the network identify critical information within images, thus achieving better guidance for completion. Extensive experiments demonstrate the effectiveness of our framework, and we achieve a new state-of-the-art (+16% CD over XMFnet) on benchmark datasets despite using fewer parameters than previous methods. The pre-trained model and code are available at https://github.com/WHU-USI3DV/EGIInet.
Hang Xu, Linjiang Huang, Feng Zhao
Test-time scaling (TTS) aims to achieve better results by increasing random sampling and evaluating samples based on rules and metrics. However, in text-to-image (T2I) diffusion models, most related works focus on search strategies and reward models, while the impact of the stochastic characteristics of noise in T2I diffusion models on the method's performance remains unexplored. In this work, we analyze the effects of randomness in T2I diffusion models and explore a new form of randomness for TTS: text embedding perturbation, which couples with existing randomness such as SDE-injected noise to enhance generative diversity and quality. We start with a frequency-domain analysis of these forms of randomness and their impact on generation, and find that the two sources of randomness exhibit complementary behavior in the frequency domain: spatial noise favors low-frequency components (early steps), while text embedding perturbation enhances high-frequency details (later steps), thereby compensating for the potential limitations of spatial noise randomness in high-frequency manipulation. Concurrently, the text embedding demonstrates varying levels of tolerance to perturbation across different dimensions of the generation process. Specifically, our method consists of two key designs: (1) introducing step-based text embedding perturbation, combining frequency-guided noise schedules with spatial noise perturbation; (2) adapting the perturbation intensity selectively based on its frequency-specific contribution to generation and tolerance to perturbation. Our approach can be seamlessly integrated into existing TTS methods and demonstrates significant improvements on multiple benchmarks with almost no additional computation. Code is available at \href{https://github.com/xuhang07/TEP-Diffusion}{https://github.com/xuhang07/TEP-Diffusion}.
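The step-based perturbation idea can be sketched as follows. The function name, linear schedule, and noise scale below are all assumptions chosen for illustration, not the paper's implementation; the only grounding is the abstract's observation that embedding perturbation matters mainly for high-frequency detail in later steps.

```python
import numpy as np

def perturb_text_embedding(emb, step, total_steps, base_scale=0.1, seed=0):
    """Hypothetical sketch of step-based text-embedding perturbation:
    Gaussian noise whose scale grows over the denoising trajectory,
    so early steps are left nearly untouched and later steps (which
    shape high-frequency detail) receive the largest perturbation."""
    rng = np.random.default_rng(seed)
    scale = base_scale * step / max(total_steps - 1, 1)   # 0 early, max late
    return emb + scale * rng.standard_normal(emb.shape)

emb = np.zeros((4, 8))  # stand-in for a (tokens, dim) text embedding
early = perturb_text_embedding(emb, step=0, total_steps=50)   # unperturbed
late = perturb_text_embedding(emb, step=49, total_steps=50)   # perturbed
```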
Hang Xu, Xinghua Qu, Zinovi Rabinovich
This paper investigates policy resilience to training-environment poisoning attacks on reinforcement learning (RL) policies, with the goal of recovering the deployment performance of a poisoned RL policy. Because policy resilience is an add-on concern for RL algorithms, it should be resource-efficient, time-conserving, and widely applicable without compromising the performance of the underlying RL algorithms. This paper proposes such a policy-resilience mechanism based on the idea of knowledge sharing. We summarize policy resilience in three stages: preparation, diagnosis, and recovery. Specifically, we design the mechanism as a federated architecture coupled with meta-learning, pursuing an efficient extraction and sharing of environment knowledge. With the shared knowledge, a poisoned agent can quickly identify the deployment condition and accordingly recover its policy performance. We empirically evaluate the resilience mechanism for both model-based and model-free RL algorithms, showing its effectiveness and efficiency in restoring the deployment performance of a poisoned policy.
Yannick Sire, Hang Xu
In this paper, we introduce a new functional for the conformal spectrum of the conformal Laplacian on a closed manifold $M$ of dimension at least 3. For this new functional we provide a Korevaar-type result. The main body of the paper deals with the case of the sphere, but a section is devoted to more general closed manifolds.
Peter Ebenfelt, Ming Xiao, Hang Xu
In this paper, we investigate analytic and geometric properties of obstruction flatness of strongly pseudoconvex CR hypersurfaces of dimension $2n-1$. Our first two results concern local aspects. Theorem 3.2 asserts that any strongly pseudoconvex CR hypersurface $M\subset \mathbb{C}^n$ can be osculated at a given point $p\in M$ by an obstruction flat one to order $2n+4$ in general, and to order $2n+5$ if and only if $p$ is an obstruction flat point. In Theorem 4.1, we show that locally there are non-spherical but obstruction flat CR hypersurfaces with transverse symmetry for $n=2$. The final main result in this paper concerns the existence of obstruction flat points on compact, strongly pseudoconvex, 3-dimensional CR hypersurfaces. Theorem 5.1 asserts that the unit sphere in a negative line bundle over a Riemann surface $X$ always has at least one circle of obstruction flat points.
Lewei Yao, Hang Xu, Wei Zhang, Xiaodan Liang, Zhenguo Li
State-of-the-art object detection methods are complicated pipelines with various modules such as the backbone, feature fusion neck, RPN, and RCNN head, where each module may have different designs and structures. How can one balance the trade-off between computational cost and accuracy in both the structural combination and the selection of these multiple modules? Neural architecture search (NAS) has shown great potential in finding an optimal solution. Existing NAS works for object detection only focus on searching for a better design of a single module, such as the backbone or feature fusion neck, while neglecting the balance of the whole system. In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architectures for object detection. Specifically, the structural-level searching stage first aims to find an efficient combination of different modules; the modular-level searching stage then evolves each specific module and pushes the Pareto front forward to a faster task-specific network. We consider a multi-objective search where the search space covers many popular designs of detection methods. We directly search a detection backbone without pre-trained models or any proxy task by exploring a fast training-from-scratch strategy. The resulting architectures dominate state-of-the-art object detection systems in both inference time and accuracy and demonstrate their effectiveness on multiple detection datasets, e.g., halving the inference time with an additional 1% mAP improvement compared to FPN and reaching 46% mAP with an inference time similar to that of MaskRCNN.
Kelly Kostopoulou, Hang Xu, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis
Sparse tensors appear frequently in distributed deep learning, either as a direct artifact of the deep neural network's gradients or as a result of an explicit sparsification process. Existing communication primitives are agnostic to the peculiarities of deep learning; consequently, they impose unnecessary communication overhead. This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored for distributed deep learning. DeepReduce decomposes sparse tensors into two sets, values and indices, and allows both independent and combined compression of these sets. We support a variety of common compressors, such as Deflate for values or run-length encoding for indices. We also propose two novel compression schemes that achieve superior results: a curve-fitting-based scheme for values and a Bloom-filter-based scheme for indices. DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end-user, to significantly lower the communication overhead. As a proof of concept, we implement our approach on TensorFlow and PyTorch. Our experiments with large real models demonstrate that DeepReduce transmits less data and imposes lower computational overhead than existing methods, without affecting the training accuracy.
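As a concrete example of the index-compression side, run-length encoding (one of the common compressors the abstract mentions) can be sketched in a few lines. The `(start, length)` pair format here is illustrative, not DeepReduce's exact wire format.

```python
def rle_encode(indices):
    """Run-length encode a sorted list of nonzero indices as
    (start, run_length) pairs: consecutive indices collapse into one pair,
    which pays off when sparsified gradients have clustered nonzeros."""
    runs = []
    i = 0
    while i < len(indices):
        start, length = indices[i], 1
        while i + length < len(indices) and indices[i + length] == start + length:
            length += 1
        runs.append((start, length))
        i += length
    return runs

def rle_decode(runs):
    """Invert rle_encode, expanding each (start, length) pair."""
    out = []
    for start, length in runs:
        out.extend(range(start, start + length))
    return out

# indices of the nonzeros of a sparsified gradient
idx = [3, 4, 5, 6, 40, 41, 99]
runs = rle_encode(idx)  # [(3, 4), (40, 2), (99, 1)]
assert rle_decode(runs) == idx
```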
Hang Xu, Jie Huang, Linjiang Huang, Dong Li, Yidi Liu, Feng Zhao
Domain Adaptation (DA) for dense prediction tasks is an important topic: it enhances a dense prediction model's performance when the model is tested on an unseen domain. Recently, with the development of Diffusion-based Dense Prediction (DDP) models, DA designs tailored to this framework are worth exploring, since the diffusion model is effective in modeling the distribution transformation that comprises domain information. In this work, we propose a training-free mechanism for DDP frameworks, endowing them with DA capabilities. Our motivation arises from the observation that the exposure bias (e.g., noise statistics bias) in diffusion brings domain shift, and that different domains in the conditions of DDP models can also be effectively captured by the noise prediction statistics. Based on this, we propose a training-free Domain Noise Alignment (DNA) approach, which alleviates the variations of noise statistics under domain changes during the diffusion sampling process, thereby achieving domain adaptation. Specifically, when the source domain is available, we directly adopt the DNA method to achieve domain adaptation by aligning the noise statistics of the target domain with those of the source domain. For the more challenging source-free DA, inspired by the observation that regions closer to the source domain exhibit higher confidence under variations of the sampling noise, we progressively utilize the statistics from the high-confidence regions to guide the noise statistic adjustment during the sampling process. Notably, our method effectively enhances the DA capability of DDP models across four common dense prediction tasks. Code is available at \href{https://github.com/xuhang07/FreeDNA}{https://github.com/xuhang07/FreeDNA}.
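The core alignment idea can be sketched with a simple moment-matching step. The update rule, names, and `strength` knob below are assumptions chosen for illustration, not the paper's DNA algorithm; the only grounding is the abstract's description of aligning target-domain noise statistics with source-domain ones.

```python
import numpy as np

def align_noise_statistics(eps_target, src_mean, src_std, strength=1.0):
    """Hypothetical sketch of noise-statistic alignment: standardize the
    target-domain noise prediction, rescale it to source-domain statistics,
    then blend with the original prediction by `strength`."""
    mu, sigma = eps_target.mean(), eps_target.std()
    aligned = (eps_target - mu) / (sigma + 1e-8) * src_std + src_mean
    return (1.0 - strength) * eps_target + strength * aligned

# a noise prediction whose statistics have drifted from the source domain
rng = np.random.default_rng(0)
eps = 5.0 + 3.0 * rng.standard_normal((16, 16))
eps_aligned = align_noise_statistics(eps, src_mean=0.0, src_std=1.0)
```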
Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets, utilizing local regret minimization algorithms, such as Regret Matching (RM) or RM+, to minimize them. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimistic variant PRM+ and its extension PCFR+. However, PCFR+ assigns uniform weights to each iteration when determining regrets, leading to substantial regrets when facing dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant, PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions and consistently leveraging predictions to accelerate convergence. Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. The code is available at https://github.com/rpSebastian/PDCFRPlus.
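The local regret minimizer at the heart of CFR can be illustrated in isolation. Below is a minimal RM+ loop against a fixed payoff vector, i.e., a stationary bandit-style setting; this shows only the basic regret-matching update, not the game-tree decomposition or the weighted/optimistic variants the paper studies.

```python
import numpy as np

def regret_matching_plus(payoffs, iters=1000):
    """RM+ against a fixed payoff vector: maintain clipped cumulative
    regrets Q, play proportionally to Q, and return the average strategy."""
    n = len(payoffs)
    Q = np.zeros(n)    # cumulative regrets, clipped at zero (the RM+ twist)
    avg = np.zeros(n)
    for _ in range(iters):
        s = Q.sum()
        strat = Q / s if s > 0 else np.full(n, 1.0 / n)
        u = strat @ payoffs                      # expected payoff
        Q = np.maximum(Q + (payoffs - u), 0.0)   # instantaneous regret update
        avg += strat
    return avg / iters

# against fixed payoffs, the average strategy concentrates on the best action
p = regret_matching_plus(np.array([1.0, 0.0, 0.2]))
```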
Hang Xu
The Homotopy Analysis Method (HAM) is a widely used analytical approach for solving nonlinear problems, yet its theoretical foundation lacks rigorous justification, and its intrinsic correlation with perturbation theory remains ambiguous, leading to prevalent confusion in the existing literature. This study demonstrates that the fundamental homotopy deformation equation of HAM can be naturally derived from weak-nonlinearity perturbation theory. We construct a specific analytical expression and optimize the core parameters (the optimal auxiliary linear operator, the convergence-control parameter, and the auxiliary function) to mitigate the inherent strong nonlinearity of the nonlinear operator. Extending the small parameter $\varepsilon$ of perturbation theory to the interval $[0,1]$ enables a systematic homotopy deformation process, which connects the linear auxiliary system (at $\varepsilon=0$) with the original nonlinear problem (at $\varepsilon=1$) and confirms HAM as a structured, adaptive generalization of classical perturbation theory. Furthermore, this work provides a rigorous proof that the Homotopy Perturbation Method (HPM) is a special case of HAM: HPM can be directly derived by fixing the optimal auxiliary linear operator as the linear component of the nonlinear system and setting the convergence-control parameter and auxiliary function to specific values, thus making HPM a degenerate form of HAM. This study clarifies the perturbation-theoretic origin of HAM, defines the hierarchical subordination of HPM to HAM, unifies the theoretical framework of homotopy-based nonlinear analytical methods, rectifies common misconceptions in the existing literature, and offers valuable guidance for the rational application, comparative analysis, and further development of such methods.
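Concretely, the homotopy deformation equation discussed above has the standard zeroth-order form from the HAM literature, with $\mathcal{L}$ the auxiliary linear operator, $c_0$ the convergence-control parameter, $H$ the auxiliary function, $u_0$ the initial guess, and $\mathcal{N}$ the nonlinear operator:

```latex
(1-\varepsilon)\,\mathcal{L}\bigl[\phi(x;\varepsilon) - u_0(x)\bigr]
  = \varepsilon\, c_0\, H(x)\, \mathcal{N}\bigl[\phi(x;\varepsilon)\bigr],
  \qquad \varepsilon \in [0,1],
```

so that $\phi(x;0)=u_0(x)$ solves the auxiliary linear problem and $\phi(x;1)$ solves the original equation $\mathcal{N}[u]=0$. The HPM specialization described above then corresponds to taking $\mathcal{L}$ as the linear part of $\mathcal{N}$, with the commonly cited choices $c_0=-1$ and $H\equiv 1$.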
Peter Ebenfelt, Ming Xiao, Hang Xu
Obstruction flatness of a strongly pseudoconvex hypersurface $\Sigma$ in a complex manifold refers to the property that any (local) Kähler-Einstein metric on the pseudoconvex side of $\Sigma$, complete up to $\Sigma$, has a potential $-\log u$ such that $u$ is $C^\infty$-smooth up to $\Sigma$. In general, $u$ has only a finite degree of smoothness up to $\Sigma$. In this paper, we study obstruction flatness of hypersurfaces $\Sigma$ that arise as unit circle bundles $S(L)$ of negative Hermitian line bundles $(L, h)$ over Kähler manifolds $(M, g)$. We prove that if $(M,g)$ has constant Ricci eigenvalues, then $S(L)$ is obstruction flat. If, in addition, all these eigenvalues are strictly less than one and $(M,g)$ is complete, then we show that the corresponding disk bundle admits a complete Kähler-Einstein metric. Finally, we give a necessary and sufficient condition for obstruction flatness of $S(L)$ when $(M, g)$ is a Kähler surface ($\dim M = 2$) with constant scalar curvature.