Chong Zhang, Qi Wu, Liqian Ma, Hongyuan Su
Only recently has robust robot locomotion been achieved by deep reinforcement learning (DRL). However, efficient learning of parametrized bipedal walking usually requires carefully developed references, which limits performance to that of the references. In this paper, we propose an adaptive reward function for imitation learning from such references. The agent is encouraged to mimic the references when its performance is low, and to pursue high performance once it reaches the limit of the references. We further demonstrate that the developed references can be replaced by low-quality references that are generated without laborious tuning and are infeasible to deploy by themselves, as long as they provide a priori knowledge that expedites the learning process.
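The blending described above can be sketched as a reward that interpolates between an imitation term and a task term, with a weight that decays as the agent approaches the reference's performance limit. This is a minimal illustration of the idea only; the weighting scheme, the `threshold` parameter, and the exponential imitation term are assumptions, not the paper's exact formulation.

```python
import numpy as np

def adaptive_reward(task_reward, imitation_error, performance, threshold=0.8):
    """Hypothetical adaptive blend of imitation and task rewards.

    Below `threshold`, the agent is mainly pushed to mimic the reference;
    near the reference's limit, the task term dominates.
    """
    # Imitation weight decays as performance approaches the reference's limit.
    w = np.clip(1.0 - performance / threshold, 0.0, 1.0)
    imitation_reward = np.exp(-imitation_error)  # higher when closer to the reference
    return w * imitation_reward + (1.0 - w) * task_reward
```

With `performance = 0` the return is pure imitation reward; once `performance >= threshold` the return is the task reward alone.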
Qi Wu, Yixiao Zhu, Hexun Jiang, Mengfan Fu, Yikun Zhang, Qunbi Zhuge, Weisheng Hu
Data centers, the engines of the global Internet, are supported by massive high-speed optical interconnects. In optical fiber communication, classic direct detection obtains only the intensity of the optical field, while its coherent detection counterpart utilizes both phase and polarization diversities at the expense of beating with a narrow-linewidth, highly stable local oscillator (LO). Herein, we propose and demonstrate a four-dimensional Jones space optical field recovery (4-D JSFR) scheme without an LO. The information encoded on the intensity and phase of both polarizations can be captured by the polarization-diversity full-field receiver structure and subsequently extracted through deep neural network-aided field recovery. The scheme achieves an electrical spectral efficiency similar to that of standard intradyne coherent detection. The fully recovered optical field can extend the transmission distance beyond the power fading limitation induced by fiber chromatic dispersion. Furthermore, the LO-free advantage makes 4-D JSFR suitable for monolithic photonic integration, offering a spectrally efficient and cost-effective candidate for large-scale data center applications. Our results could motivate a fundamental paradigm shift in optical field recovery theory and future optical transceiver design.
Qi Wu, Yan-Ke Chen, Gang Li, Shi-Dong Liu, Dian-Yong Chen
In the present work, we investigate the production of molecular states composed of $D^{(*)}_s \bar{D}^{(*)}$ and $D^{(*)}_s \bar{D}^{(*)}_s$ in $B$ and $B_s$ decays using an effective Lagrangian approach. The branching ratios are estimated in terms of the model parameter $\alpha$ and the binding energy $\Delta E$. Our estimations indicate that the branching fractions are of the order of $10^{-4}$ and that the relative ratios depend only weakly on the model parameter $\alpha$ and the binding energy $\Delta E$. The predicted ratios are helpful for searching for hidden-charm molecular states with a strange quark in future experiments at Belle II and LHCb.
Di Wang, Qi Wu, Wen Zhang
This paper takes a deep learning approach to understanding consumer credit risk when e-commerce platforms issue unsecured credit to finance customers' purchases. The "NeuCredit" model captures serial dependences in multi-dimensional time series data even when event frequencies differ across dimensions, as well as nonlinear cross-sectional interactions among different time-evolving features. Moreover, the predicted default probability is designed to be interpretable, so that risk can be decomposed into three components: the subjective risk, indicating consumers' willingness to repay; the objective risk, indicating their ability to repay; and the behavioral risk, capturing consumers' behavioral differences. Using a unique dataset from one of the largest global e-commerce platforms, we show that including shopping behavioral data alongside conventional payment records requires a deep learning approach to extract the information content of these data, and that doing so significantly improves forecasting performance over traditional machine learning methods.
Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, Anton van den Hengel
The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It represents an extension of the Visual Question Answering task in that the agent needs to answer a question about an image, but it needs to do so in light of the previous dialogue that has taken place. The key challenge in Visual Dialogue is thus maintaining a consistent and natural dialogue while continuing to answer questions correctly. We present a novel approach that combines Reinforcement Learning and Generative Adversarial Networks (GANs) to generate more human-like responses to questions. The GAN helps overcome the relative paucity of training data and the tendency of the typical MLE-based approach to generate overly terse answers. Critically, the GAN is tightly integrated into the attention mechanism that generates human-interpretable reasons for each answer. This means that the discriminative model of the GAN has the task of assessing whether a candidate answer is generated by a human or not, given the provided reason. This is significant because it drives the generative model to produce high-quality answers that are well supported by the associated reasoning. The method also achieves state-of-the-art results on the primary benchmark.
Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton van den Hengel
Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be a difficult challenge. Towards this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive, and informativeness, that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. By directly optimizing for questions that work quickly towards fulfilling the overall goal, we avoid the tendency of existing methods to generate long series of inane queries that add little value. We evaluate our model on the GuessWhat?! dataset and show that the resulting questions can help a standard Guesser identify a specific object in an image at a much higher success rate.
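A combination of three such intermediate rewards could look like the sketch below. The weights and the exact form of each term are made-up illustrations of the abstract's description (goal-achieved, progressive, informativeness), not the paper's actual reward design.

```python
def question_reward(goal_achieved, prob_before, prob_after, answer_changed_belief,
                    w_goal=1.0, w_prog=0.5, w_info=0.2):
    """Hypothetical sum of the three intermediate rewards named in the abstract."""
    # Goal-achieved: a terminal bonus when the Guesser finds the target object.
    r_goal = w_goal if goal_achieved else 0.0
    # Progressive: reward increases in the Guesser's belief about the target.
    r_prog = w_prog * (prob_after - prob_before)
    # Informativeness: reward answers that actually changed the belief state.
    r_info = w_info if answer_changed_belief else 0.0
    return r_goal + r_prog + r_info
```

A question that raises the Guesser's belief and elicits an informative answer scores higher than one that leaves the belief state unchanged.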
Qi Wu, Yingguang Yang, Buyun He, Hao Liu, Renyu Yang, Yong Liao
Detecting ever-evolving social bots has become increasingly challenging. Advanced bots tend to interact more with humans as camouflage to evade detection. While graph-based detection methods can exploit various relations in social networks to model node behaviors, the information aggregated from neighbors largely ignores the inherent heterophily, i.e., connections between different classes of accounts. Message passing over heterophilic edges can mix the features of bots and normal users, resulting in more false negatives. In this paper, we present BotSCL, a heterophily-aware contrastive learning framework that can adaptively differentiate neighbor representations across heterophilic relations while assimilating the representations of homophilic neighbors. Specifically, we employ two graph augmentation methods to generate different graph views and design a channel-wise, attention-free encoder to overcome the limitations of simply summing neighbor information. Supervised contrastive learning is used to guide the encoder to aggregate class-specific information. Extensive experiments on two social bot detection benchmarks demonstrate that BotSCL outperforms baseline approaches, including state-of-the-art bot detection approaches, partially heterophilic GNNs, and self-supervised contrastive learning methods.
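The supervised contrastive objective mentioned above is a standard building block and can be sketched directly: for each anchor, embeddings of same-class nodes are pulled together and all others pushed apart. This is the generic loss (in the style of Khosla et al.), shown here only to illustrate how class-specific aggregation is supervised; it is not BotSCL's full objective.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.5):
    """Generic supervised contrastive loss over an embedding matrix z (n x d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise embeddings
    sim = z @ z.T / tau                               # pairwise cosine similarities
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchors without same-class partners contribute nothing
        denom = sum(np.exp(sim[i, a]) for a in range(n) if a != i)
        loss += -np.mean([np.log(np.exp(sim[i, p]) / denom) for p in pos])
        count += 1
    return loss / max(count, 1)
```

Embeddings clustered consistently with their labels yield a lower loss than the same embeddings with mismatched labels, which is exactly the pressure that drives class-specific aggregation.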
Qi Wu, Joseph A. Insley, Victor A. Mateevitsi, Silvio Rizzi, Michael E. Papka, Kwan-Liu Ma
Implicit neural representations (INRs) have emerged as a powerful tool for compressing large-scale volume data, opening up new possibilities for in situ visualization. However, the efficient application of INRs to distributed data remains an underexplored area. In this work, we develop a distributed volumetric neural representation and optimize it for in situ visualization. Our technique eliminates data exchanges between processes, achieving state-of-the-art compression speed, quality, and ratio. It also enables an efficient strategy for caching large-scale simulation data at high temporal frequencies, further facilitating the use of reactive in situ visualization in a wider range of scientific problems. We integrate this system with the Ascent infrastructure and evaluate its performance and usability using real-world simulations.
Qi Wu, Mingyan Han, Ting Jiang, Chengzhi Jiang, Jinting Luo, Man Jiang, Haoqiang Fan, Shuaicheng Liu
Deep denoising models require extensive real-world training data, which is challenging to acquire. Current noise synthesis techniques struggle to accurately model complex noise distributions. We propose a novel Realistic Noise Synthesis Diffusor (RNSD) method using diffusion models to address these challenges. By encoding camera settings into a time-aware camera-conditioned affine modulation (TCCAM), RNSD generates more realistic noise distributions under various camera conditions. Additionally, RNSD integrates a multi-scale content-aware module (MCAM), enabling the generation of structured noise with spatial correlations across multiple frequencies. We also introduce Deep Image Prior Sampling (DIPS), a learnable sampling sequence based on the deep image prior, which significantly accelerates sampling while maintaining the high quality of the synthesized noise. Extensive experiments demonstrate that our RNSD method significantly outperforms existing techniques in synthesizing realistic noise under multiple metrics and in improving image denoising performance.
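Conditioned affine modulation of the kind TCCAM describes is commonly realized in FiLM style: a conditioning vector (here, camera settings plus a timestep embedding) produces a per-channel scale and shift applied to intermediate features. The sketch below illustrates only that mechanism; the random linear maps stand in for learned layers and are not RNSD's architecture.

```python
import numpy as np

def film_modulate(features, cond):
    """FiLM-style affine modulation: cond -> per-channel (scale, shift).

    `features` is (batch, channels); `cond` is a conditioning vector.
    The weight matrices are random stand-ins for learned projections.
    """
    rng = np.random.default_rng(0)
    c = features.shape[-1]
    W_scale = rng.standard_normal((cond.shape[-1], c)) * 0.1
    W_shift = rng.standard_normal((cond.shape[-1], c)) * 0.1
    scale = 1.0 + cond @ W_scale  # centred at identity so zero cond is a no-op
    shift = cond @ W_shift
    return features * scale + shift
```

With an all-zero conditioning vector the modulation reduces to the identity, which is the usual initialization-friendly property of this design.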
Arisa Cowe, Tyson Neuroth, Qi Wu, Martin Rieth, Jacqueline Chen, Myoungkyu Lee, Kwan-Liu Ma
Many scientific and engineering problems involving multi-physics span a wide range of scales. Understanding the interactions across these scales is essential for fully comprehending such complex problems. However, visualizing multivariate, multiscale data within an integrated view where correlations across space, scales, and fields are easily perceived remains challenging. To address this, we introduce a novel local spatial statistical visualization of flow fields across multiple fields and turbulence scales. Our method leverages the curvelet transform for scale decomposition of fields of interest, a level-set-restricted centroidal Voronoi tessellation to partition the spatial domain into local regions for statistical aggregation, and a set of glyph designs that combine information across scales and fields into a single perceivable visual representation, or a reduced set of them. Each glyph represents data aggregated within a Voronoi region and is positioned at the Voronoi site for direct visualization in a 3D view centered around flow features of interest. We implement and integrate our method into an interactive visualization system where the glyph-based technique operates in tandem with linked 3D spatial views and 2D statistical views, supporting a holistic analysis. We demonstrate with case studies visualizing turbulent combustion data--multi-scalar compressible flows--and turbulent incompressible channel flow data. This new capability enables scientists to better understand the interactions between multiple fields and length scales in turbulent flows.
Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, Qi Wu
Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of models, mainly because 1) their expressions typically describe only a few simple distinctive properties of the object and 2) their images contain limited distracting information. To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. First, we design a novel expression engine rendering various reasoning logics that can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning chain embodied in an expression, we propose a new test setting that adds distracting images containing objects sharing similar properties with the referent, thus minimising the success rate of reasoning-free cross-domain alignment. We evaluate several state-of-the-art REF models, but find that none achieves promising performance. A proposed modular hard mining strategy performs best but still leaves substantial room for improvement. We hope this new dataset and task can serve as a benchmark for deeper visual reasoning analysis and foster research on referring expression comprehension.
Hongping Cai, Qi Wu, Tadeo Corradi, Peter Hall
The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc. It is a potentially significant yet under-researched problem. Emulating the remarkable human ability to recognise objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of Computer Vision. In this paper we benchmark classification, domain adaptation, and deep learning methods, demonstrating that none performs consistently well on the cross-depiction problem. Despite the current interest in deep learning, such methods exhibit the same behaviour as all but one of the other methods: a significant fall in performance on inhomogeneous databases compared to their peak performance, which is always on data comprising photographs only. Instead, we find that methods with strong models of the spatial relations between parts tend to be more robust, and we therefore conclude that such information is important in modelling object classes regardless of appearance details.
Ming-Zhu Liu, Qi Wu
Vector charmonium states can be directly produced in the $e^+e^{-}$ annihilation process. Among them, $Y(4230)$ and $Y(4360)$, split from the previously discovered $Y(4260)$, are not easily accommodated in the conventional charmonium spectrum, while recent studies have indicated that they couple strongly to $D\bar{D}_1$ and $D^*\bar{D}_1$. In this work, we investigate the production of $Y(4230)$ and $Y(4360)$ as the heavy-quark spin symmetry doublet hadronic molecules of $D\bar{D}_1$ and $D^*\bar{D}_1$ in $B$ decays via the triangle diagram mechanism. In particular, we propose that the decay constants of $Y(4230)$ and $Y(4360)$ extracted from $B$ decays are useful for clarifying their nature.
Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, Zan Gojcic
3D Gaussian Splatting (3DGS) enables efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware. However, due to its rasterization-based formulation, 3DGS is constrained to ideal pinhole cameras and lacks support for secondary lighting effects. Recent methods address these limitations by tracing the particles instead, but this comes at the cost of significantly slower rendering. In this work, we propose 3D Gaussian Unscented Transform (3DGUT), replacing the EWA splatting formulation with the Unscented Transform, which approximates the particles through sigma points that can be projected exactly under any nonlinear projection function. This modification enables trivial support of distorted cameras with time-dependent effects such as rolling shutter, while retaining the efficiency of rasterization. Additionally, we align our rendering formulation with that of tracing-based methods, enabling the secondary ray tracing required to represent phenomena such as reflection and refraction within the same 3D representation. The source code is available at: https://github.com/nv-tlabs/3dgrut.
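The Unscented Transform at the heart of this idea can be shown in a few lines: draw sigma points from a Gaussian, push each one through the (arbitrary, nonlinear) projection exactly, and refit a Gaussian to the images. This is a generic textbook sketch of the transform, not 3DGUT's renderer; `kappa` and the weight convention follow the classic Julier-Uhlmann formulation.

```python
import numpy as np

def unscented_project(mean, cov, project, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through `project` via sigma points."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)  # scaled matrix square root
    pts = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
    w = np.array([kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n))
    y = np.array([project(p) for p in pts])    # exact nonlinear mapping of each sigma point
    mu = w @ y                                 # refit: weighted mean of the images
    d = y - mu
    sig = (w[:, None, None] * np.einsum('ni,nj->nij', d, d)).sum(axis=0)
    return mu, sig
```

For a linear projection the transform is exact, which is a convenient sanity check; for fisheye or rolling-shutter projections it gives the Gaussian approximation that splatting needs without requiring an analytic Jacobian.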
Chenfeng Wei, Qi Wu, Si Zuo, Jiahua Xu, Boyang Zhao, Zeyu Yang, Guotao Xie, Shenhong Wang
Autonomous driving datasets are essential for validating the progress of intelligent vehicle algorithms, which include localization, perception, and prediction. However, existing datasets are predominantly focused on structured urban environments, which limits the exploration of unstructured and specialized scenarios, particularly those characterized by significant dust levels. This paper introduces the LiDARDustX dataset, which is specifically designed for perception tasks under high-dust conditions, such as those encountered in mining areas. The LiDARDustX dataset consists of 30,000 LiDAR frames captured by six different LiDAR sensors, each accompanied by 3D bounding box annotations and point cloud semantic segmentation. Notably, over 80% of the dataset comprises dust-affected scenes. By utilizing this dataset, we have established a benchmark for evaluating the performance of state-of-the-art 3D detection and segmentation algorithms. Additionally, we have analyzed the impact of dust on perception accuracy and delved into the causes of these effects. The data and further information can be accessed at: https://github.com/vincentweikey/LiDARDustX.
Qi Wu, Dian-Yong Chen, Wen-Hua Qin, Gang Li
In the present work, we investigate the production of $Z_{cs}^+$ in $B^+$ and $B_s^0$ decays, where $Z_{cs}^+$ is assigned as a $D_s^{+} \bar{D}^{\ast0} + D_s^{\ast +}\bar{D}^0$ molecular state. Using an effective Lagrangian approach, we evaluate the branching ratios of $B^0_s\rightarrow K^- Z^+_{cs}$ and $B^+\rightarrow \phi Z^{+}_{cs}$ via the triangle loop mechanism. The estimated branching fractions of $B^0_s\rightarrow K^- Z^+_{cs}$ and $B^+\rightarrow \phi Z^{+}_{cs}$ are of the order of $10^{-4}$ and $10^{-5}$, respectively. The ratio of these two branching fractions is estimated to be about 5, which indicates that $B_s^0 \to K^\pm Z^\mp_{cs} \to K^+ K^- J/\psi$ may be a better process for searching for $Z_{cs}$, accessible to further experimental measurements by the Belle II and LHCb collaborations.
Qi Wu, Zhongqi Lu
Recent advances in large language models (LLMs) have significantly improved the performance of dialog systems, yet current approaches often fail to provide accurate topical guidance because they cannot discern user confusion among related concepts. To address this, we introduce the Ask-Good-Question (AGQ) framework, which features an improved Concept-Enhanced Item Response Theory (CEIRT) model to better identify users' knowledge levels. Our contributions include applying the CEIRT model together with LLMs to directly generate guiding questions based on the inspiring text, greatly improving information retrieval efficiency during the question-and-answer process. In comparisons with baseline methods, our approach significantly enhances users' information retrieval experience.
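For readers unfamiliar with Item Response Theory, its standard building block is the two-parameter logistic (2PL) model, which relates a user's latent ability to the probability of answering an item correctly. The sketch below shows only this classical model; the concept-enhanced extension (CEIRT) is not reproduced here.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic IRT model.

    theta: latent user ability; a: item discrimination; b: item difficulty.
    Returns the probability the user answers the item correctly.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty the probability is exactly 0.5, and higher-ability users answer any given item correctly with higher probability, which is what lets observed answers localize a user's knowledge level.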
Qi Wu, Chong Zhang, Yanchen Liu
Only recently has robust bipedal locomotion been achieved through reinforcement learning. However, existing implementations rely heavily on insights and effort from human experts, which is costly for the iterative design of robot systems. Moreover, the style of the learned motion is strictly limited to that of the reference. In this paper, we propose a new way to learn bipedal locomotion using a simple sine wave as the reference for foot heights. Guided only by the naive human insight that the two feet should be lifted alternately and periodically, we experimentally demonstrate on the Cassie robot that a simple reward function enables the robot to learn to walk end-to-end and efficiently, without any explicit knowledge of the model. With custom sine waves, the learned gait pattern can also take on customized styles. Code is released at github.com/WooQi57/sin-cassie-rl.
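The reference described above, alternating sine waves for the two foot heights plus a tracking reward, can be sketched in a few lines. The period, amplitude, rectified-sine shape, and exponential error scale below are illustrative choices, not the values or reward used on Cassie.

```python
import numpy as np

def foot_height_reference(t, period=0.8, amplitude=0.15):
    """Alternating periodic foot-height references (made-up period/amplitude)."""
    phase = 2 * np.pi * t / period
    left = amplitude * max(np.sin(phase), 0.0)           # left foot lifts on the first half-cycle
    right = amplitude * max(np.sin(phase + np.pi), 0.0)  # right foot lifts on the opposite half-cycle
    return left, right

def imitation_reward(measured, reference, scale=20.0):
    """Reward that decays exponentially with the foot-height tracking error."""
    err = sum((m - r) ** 2 for m, r in zip(measured, reference))
    return float(np.exp(-scale * err))
```

Changing the wave (e.g., amplitude, duty cycle, or phase offset) is what would give the learned gait a customized style.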
Qi Wu, Yixiao Zhu, Hexun Jiang, Qunbi Zhuge, Weisheng Hu
Optical full-field recovery makes it possible to compensate for fiber impairments such as chromatic dispersion and polarization mode dispersion (PMD) in the digital signal processing. For cost-sensitive short-reach optical networks, several advanced single-polarization (SP) optical field recovery schemes have recently been proposed to avoid the chromatic dispersion-induced power fading effect and to improve the spectral efficiency for larger potential capacity. Polarization division multiplexing (PDM) can further double both the spectral efficiency and the system capacity of these SP carrier-assisted direct detection (DD) schemes. However, the so-called polarization fading phenomenon induced by random polarization rotation is a fundamental obstacle that prevents SP carrier-assisted DD systems from achieving polarization diversity. In this paper, we propose a receiver for Jones-space field recovery (JSFR) to realize polarization diversity with SP carrier-assisted DD schemes in Jones space. Different receiver structures and simplified recovery procedures for JSFR are explored theoretically. The proposed JSFR pushes SP DD schemes toward PDM without extra optical signal-to-noise ratio (OSNR) penalty. In addition, JSFR shows good tolerance to PMD, since the optical field recovery is conducted before polarization recovery. In a proof-of-concept experiment, we demonstrate 448-Gb/s reception over 80-km single-mode fiber using the proposed JSFR based on 2×2 couplers. Furthermore, we qualitatively compare optical field recovery in Jones space and Stokes space from the perspective of the modulation dimension.
Xing Yan, Qi Wu, Wen Zhang
We propose a novel probabilistic model to facilitate the learning of multivariate tail dependence of multiple financial assets. Our method allows one to construct, from known random vectors, e.g., standard normal, sophisticated joint heavy-tailed random vectors featuring not only distinct marginal tail heaviness but also a flexible tail dependence structure. The novelty lies in that the pairwise tail dependence between any two dimensions is modeled separately from their correlation and can vary according to its own parameter rather than the correlation parameter, an essential advantage over many commonly used methods such as the multivariate $t$ or elliptical distributions. It is also intuitive to interpret, easy to track, and simple to sample compared to the copula approach. We show its flexible tail dependence structure through simulation. Coupled with a GARCH model to eliminate the serial dependence of each individual asset return series, we use this novel method to model and forecast the multivariate conditional distribution of stock returns and obtain notable performance improvements in multi-dimensional coverage tests. Moreover, our empirical finding on the asymmetry of the tails of both the idiosyncratic and market components is interesting and merits further study.
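The construction described above, starting from a correlated standard-normal vector and thickening each marginal tail with its own parameter independent of the correlation, can be sketched as follows. The specific transform `g(z) = z * (1 + |z|)**tail` is a made-up illustration of the separation between correlation and tail parameters, not the paper's exact parameterisation.

```python
import numpy as np

def heavy_tailed_vector(rng, corr, tail, n_samples=1):
    """Sample heavy-tailed vectors from correlated normals (hypothetical transform).

    corr: correlation matrix controlling dependence; tail: per-dimension tail
    parameters (tail=0 recovers the Gaussian; larger values thicken that tail).
    """
    L = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_samples, len(corr))) @ L.T  # correlated normals
    return z * (1.0 + np.abs(z)) ** np.asarray(tail)       # per-dimension tail thickening
```

Because `corr` and `tail` enter separately, the tail heaviness of each dimension can be tuned without touching the correlation structure, which is the modeling freedom the abstract emphasizes over multivariate $t$ or elliptical families.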