Qi Wu, David Bauer, Michael J. Doyle, Kwan-Liu Ma
Neural networks have shown great potential in compressing volume data for visualization. However, due to the high cost of training and inference, such volumetric neural representations have thus far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framework, and a well-designed rendering algorithm with macro-cell acceleration, we can interactively ray trace volumetric neural representations (10-60 fps). Our neural representations are also high-fidelity (PSNR > 30dB) and compact (10-1000x smaller). Additionally, we show that it is possible to fit the entire training step inside a rendering loop and skip the pre-training process completely. To support extreme-scale volume data, we also develop an efficient out-of-core training strategy, which allows our volumetric neural representation training to potentially scale up to terascale using only an NVIDIA RTX 3090 workstation.
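The macro-cell acceleration mentioned above amounts to empty-space skipping: a coarse grid stores each cell's value range so that rays only query the expensive network inside non-empty cells. The sketch below is a minimal CPU illustration of that idea under assumed parameters (grid resolution, threshold, data layout); it is not the authors' CUDA implementation.

```python
# Minimal CPU sketch of macro-cell empty-space skipping for ray marching.
# A coarse grid stores the min/max value of each macro cell; a ray only
# samples the (expensive) neural representation inside non-empty cells.

def build_macrocells(volume, cell=4):
    """volume: nested list [z][y][x] of scalars -> dict (cz,cy,cx) -> (lo, hi)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    cells = {}
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                key = (z // cell, y // cell, x // cell)
                v = volume[z][y][x]
                lo, hi = cells.get(key, (v, v))
                cells[key] = (min(lo, v), max(hi, v))
    return cells

def march(cells, ray_cells, sample_fn, iso_lo=0.1):
    """Skip cells whose max value is below iso_lo; sample_fn plays the role
    of the costly neural-network inference."""
    hits = []
    for key in ray_cells:                 # cells pierced by the ray, in order
        lo, hi = cells.get(key, (0.0, 0.0))
        if hi < iso_lo:                   # empty macro cell: skip inference
            continue
        hits.append(sample_fn(key))       # only query the network here
    return hits
```

In the paper's setting the range query and the network inference would both run on the GPU; the payoff is the same in either case: inference calls scale with occupied cells rather than with total samples.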
Qi Wu, Jun-Zhang Wang, Shi-Lin Zhu
We study the $ρ$ and $ω$ meson contribution to the radiative decays $X(3872)\rightarrow J/ψπγ$ and $X(3872)\rightarrow J/ψππγ$. The $X(3872)\rightarrow J/ψπγ$ is dominated by the $ω$ meson. As for the $X(3872)\rightarrow J/ψππγ$, the contributions of the cascade decays through the $ρ$ and $ω$ mesons are strongly suppressed with respect to the diagrams which proceed either through the $ψ(2S)$ or the three body decay of $ρ$. The branching ratios of $X(3872)\rightarrow J/ψπγ$ and $X(3872)\rightarrow J/ψππγ$ are $(8.10^{+3.50}_{-2.88})\times10^{-3}$ and $(2.38\pm1.06)\%$, which may be accessible to the BESIII and LHCb Collaborations. In particular, the $X(3872)\rightarrow J/ψπγ$ and $X(3872)\rightarrow J/ψπ^+π^- γ$ decays can be employed to extract the couplings $g_{Xψω}$ and $g_{Xψρ}$, which probe the isoscalar and isovector components of the $X(3872)$ wave function, respectively.
Qi Wu, Yuanxin Zheng, Shidong Liu, Gang Li
The light hadron decay processes of $Z_b(10610)/Z_b(10650)$ provide us a way to study their nature and decay mechanism. In this work, we evaluate the branching ratios of $Z_b(10610)/Z_b(10650) \to VP$ ($V$ and $P$ stand for light vector and pseudoscalar mesons, respectively) using an effective Lagrangian approach, in which the contributions of intermediate bottomed meson triangle loops are considered. In our calculations, the $Z_b(10610)$ and $Z_b(10650)$ are regarded as $B\bar{B}^*+c.c.$ and $B^*\bar{B}^*$ molecular states, respectively. The predicted branching ratios of $Z_b(10610)\rightarrow VP$ are of the order of $10^{-2}$, while those of $Z_b(10650)\rightarrow VP$ are of the order of $10^{-3}$. Furthermore, the dependence of the ratios between different decay modes of $Z_b(10610)/Z_b(10650)$ on the $η$-$η^\prime$ mixing angle $θ_P$ is investigated, which may be a useful quantity for experimental tests. It is hoped that the calculations here could be tested by future experiments.
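For reference, a standard convention for the $η$-$η^\prime$ mixing angle $θ_P$ in the octet-singlet basis is the following (the paper's exact convention may differ):

```latex
\begin{aligned}
\eta  &= \cos\theta_P\,\eta_8 - \sin\theta_P\,\eta_0,\\
\eta' &= \sin\theta_P\,\eta_8 + \cos\theta_P\,\eta_0,
\end{aligned}
```

so that ratios between decay modes involving $\eta$ and $\eta'$ in the final state depend on $\theta_P$ through these mixing coefficients.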
Qi Wu, Quanlong Zheng, Yanhao Zhang, Junlin Xie, Jinguo Luo, Kuo Wang, Peng Liu, Qingsong Xie, Ru Zhen, Zhenyu Yang, Haonan Lu
With the rapid development of multimodal models, the demand for assessing video understanding capabilities has been steadily increasing. However, existing benchmarks for evaluating video understanding exhibit significant limitations in coverage, task diversity, and scene adaptability. These shortcomings hinder the accurate assessment of models' comprehensive video understanding capabilities. To tackle this challenge, we propose a hierarchical and holistic video understanding (H2VU) benchmark designed to evaluate both general video and online streaming video comprehension. This benchmark contributes three key features. (1) Extended video duration: spanning videos from brief 3-second clips to comprehensive 1.5-hour recordings, thereby bridging the temporal gaps found in current benchmarks. (2) Comprehensive assessment tasks: beyond traditional perceptual and reasoning tasks, we have introduced modules for counter-commonsense comprehension and trajectory state tracking, which test models' deep understanding capabilities beyond mere prior knowledge. (3) Enriched video data: to keep pace with the rapid evolution of current AI agents, we have expanded first-person streaming video datasets, allowing for the exploration of multimodal models' performance in understanding streaming videos from a first-person perspective. Extensive results from H2VU reveal that existing multimodal large language models (MLLMs) possess substantial potential for improvement in our newly proposed evaluation tasks. We expect that H2VU will facilitate advancements in video understanding research by offering a comprehensive and in-depth analysis of MLLMs.
Qi Wu, Khiem Vuong, Minsik Jeon, Srinivasa Narasimhan, Deva Ramanan
We tackle the problem of sparse novel view synthesis (NVS) using video diffusion models; given $K$ ($\approx 5$) multi-view images of a scene and their camera poses, we predict the view from a target camera pose. Many prior approaches leverage generative image priors encoded via diffusion models. However, models trained on single images lack multi-view knowledge. We instead argue that video models already contain implicit multi-view knowledge and so should be easier to adapt for NVS. Our key insight is to formulate sparse NVS as a low frame-rate video completion task. However, one challenge is that sparse NVS is defined over an unordered set of inputs, often too sparse to admit a meaningful order, so the models should be $\textit{invariant}$ to permutations of that input set. To this end, we present FrameCrafter, which adapts video models (naturally trained with coherent frame orderings) to permutation-invariant NVS through several architectural modifications, including per-frame latent encodings and removal of temporal positional embeddings. Our results suggest that video models can be easily trained to "forget" about time with minimal supervision, producing competitive performance on sparse-view NVS benchmarks. Project page: https://frame-crafter.github.io/
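The role of removing temporal positional embeddings can be illustrated with a toy set model: with symmetric pooling and no frame-index information, the output is invariant to frame order, while reintroducing an index-dependent embedding breaks that invariance. This is an illustration of the principle only, not the FrameCrafter architecture; the multiplicative embedding below is an assumption chosen to make the effect visible.

```python
# Toy illustration: a "model" that symmetrically pools per-frame encodings.
# Without positional embeddings, the pooled output is permutation-invariant;
# with a frame-index-dependent embedding, it is not.

def encode(frame, pos_emb=None, idx=0):
    """Per-frame encoding; a temporal positional embedding (if given)
    scales features by a frame-index-dependent factor."""
    if pos_emb is None:
        return list(frame)
    return [f * (1.0 + pos_emb[idx]) for f in frame]

def pool(frames, pos_emb=None):
    """Symmetric (sum) pooling over encoded frames."""
    total = [0.0] * len(frames[0])
    for i, fr in enumerate(frames):
        total = [a + b for a, b in zip(total, encode(fr, pos_emb, i))]
    return total
```

Reordering the input frames leaves `pool(frames)` unchanged, but once a per-index embedding is applied, the same reordering changes which frame receives which embedding and the output differs.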
Yanyuan Qiao, Chaorui Deng, Qi Wu
Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language. Unlike the object detection task, in which queried object labels are pre-defined, the REC problem can only observe the queries at test time. It is thus more challenging than a conventional computer vision problem. This task has attracted a lot of attention from both the computer vision and natural language processing communities, and several lines of work have been proposed, from CNN-RNN models and modular networks to complex graph-based models. In this survey, we first examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism for encoding the visual and textual modalities. In particular, we examine the common approach of jointly embedding images and expressions into a common feature space. We also discuss modular architectures and graph-based models that interface with structured graph representations. In the second part of this survey, we review the datasets available for training and evaluating REC systems. We then group results according to datasets, backbone models, and settings so that they can be fairly compared. Finally, we discuss promising future directions for the field, in particular compositional referring expression comprehension, which requires longer reasoning chains to address.
Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, Qi Wu
We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. Our model captures well both the time-varying characteristic and the asymmetrical heavy-tail property of financial time series. It combines the merits of a popular sequential neural network model, i.e., LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures individually the serial dependences of higher moments, rather than just the volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles or VaR of our model outperform the GARCH family. Further, the proposed approach does not suffer from the issue of quantile crossing, nor is it exposed to the ill-posedness of the parametric probability density function approach.
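The crossing-free property can be obtained by construction when the conditional quantile is a location-scale transform of a base quantile function that is strictly increasing in the quantile level $τ$. The particular base function below ($z(τ)=u\ln τ - v\ln(1-τ)$ with $u,v>0$ controlling the two tails) is an illustrative assumption, not the paper's exact parameterization; in the paper's setting, a sequential model such as an LSTM would output the parameters per time step.

```python
import math

def quantile(tau, mu=0.0, sigma=1.0, u=1.0, v=1.0):
    """Location-scale quantile Q(tau) = mu + sigma * z(tau).
    z(tau) = u*ln(tau) - v*ln(1-tau) is strictly increasing in tau
    (dz/dtau = u/tau + v/(1-tau) > 0), so fitted quantiles never cross.
    u and v separately control lower- and upper-tail heaviness."""
    assert 0.0 < tau < 1.0 and sigma > 0.0 and u > 0.0 and v > 0.0
    z = u * math.log(tau) - v * math.log(1.0 - tau)
    return mu + sigma * z
```

Because monotonicity in `tau` holds for any admissible parameter values, quantile crossing is ruled out structurally rather than through post-hoc constraints during training.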
Qi Wu, Dian-Yong Chen, Ran Ji
Inspired by the $P_{cs}(4459)$ reported by the LHCb collaboration recently, we investigate the $P_{cs}(4459)$ production from $Ξ_b$ decay in a molecular scenario using an effective Lagrangian approach. With different $J^P$ assignments to $P_{cs}(4459)$, the magnitude of the branching fractions of $Ξ_b \to P_{cs}(4459) K$ is estimated, which is of the order of $10^{-4}$. Together with the decay properties of $P_{cs}(4459)$, the present estimations could be further tested by precise measurements and contribute to a better understanding of the molecular interpretations and the exploration of the $J^P$ quantum numbers of $P_{cs}(4459)$.
Qi Wu, Peng Wang, Chenghao Huang
Natural language processing (NLP) has been applied to various fields, including text classification and sentiment analysis. For the shared task on sentiment analysis of code-mixed tweets, part of the SemEval-2020 competition~\cite{patwa2020sentimix}, we preprocess the datasets by replacing emoji, deleting uncommon characters, and so on, and then fine-tune Bidirectional Encoder Representations from Transformers (BERT) for the best performance. With our top-3 submissions exhausted, our team MeisterMorxrc achieves an averaged F1 score of 0.730 in this task, and our CodaLab username is MeisterMorxrc.
Qi Wu, Tyson Neuroth, Oleg Igouchkine, Konduri Aditya, Jacqueline H. Chen, Kwan-Liu Ma
The use of adaptive workflow management for in situ visualization and analysis has been a growing trend in large-scale scientific simulations. However, coordinating adaptive workflows with traditional procedural programming languages can be difficult because system flow is determined by unpredictable scientific phenomena, which often appear in an unknown order and can evade event handling. This makes the implementation of adaptive workflows tedious and error-prone. Recently, reactive and declarative programming paradigms have been recognized as well-suited solutions to similar problems in other domains. However, there is a dearth of research on adapting these approaches to in situ visualization and analysis. With this paper, we present a language design and runtime system for developing adaptive systems through a declarative and reactive programming paradigm. We illustrate how an adaptive workflow programming system is implemented using our approach and demonstrate it with a use case from a combustion simulation.
Qi Wu, Shumin Ma, Cheuk Hang Leung, Wei Liu, Nanbo Peng
This paper provides a non-robust interpretation of the distributionally robust optimization (DRO) problem by relating the distributional uncertainties to the chance probabilities. Our analysis allows a decision-maker to interpret the size of the ambiguity set, which often lacks business meaning, through the chance parameters constraining the objective function. We first show that, for general $φ$-divergences, a DRO problem is asymptotically equivalent to a class of mean-deviation problems. These mean-deviation problems are not subject to uncertain distributions, and the ambiguity radius in the original DRO problem now plays the role of controlling the risk preference of the decision-maker. We then demonstrate that a DRO problem can be cast as a chance-constrained optimization (CCO) problem when a boundedness constraint is added to the decision variables. Without the boundedness constraint, the CCO problem is shown to perform uniformly better than the DRO problem, irrespective of the radius of the ambiguity set, the choice of the divergence measure, or the tail heaviness of the center distribution. Thanks to our high-order expansion result, a notable feature of our analysis is that it applies to divergence measures that accommodate heavy-tailed distributions well, such as the Student's $t$-distribution and the lognormal distribution, besides the widely-used Kullback-Leibler (KL) divergence, which requires the distribution of the objective function to be exponentially bounded. Using the portfolio selection problem as an example, our comprehensive tests on multivariate heavy-tailed datasets, both synthetic and real-world, show that this business-interpretation approach is indeed useful and insightful.
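The asymptotic DRO/mean-deviation equivalence can be checked numerically in the KL case: the worst-case mean over a KL ball admits the Donsker-Varadhan dual, and for a small radius $r$ it is close to mean $+ \sqrt{2r}\times$ standard deviation. The sketch below is a generic numerical illustration of that first-order expansion, not the paper's derivation; the crude grid search over the dual variable is an implementation shortcut.

```python
import math
import statistics

def kl_worst_case_mean(xs, r, grid=None):
    """Worst-case mean over a KL ball of radius r around the empirical
    distribution of xs, via the Donsker-Varadhan dual:
        sup_{KL(Q||P)<=r} E_Q[X] = inf_{lam>0} lam*r + lam*log E_P[exp(X/lam)].
    The 1-D infimum is approximated by a coarse grid search over lam."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 2001)]
    n = len(xs)
    best = float("inf")
    for lam in grid:
        val = lam * r + lam * math.log(sum(math.exp(x / lam) for x in xs) / n)
        best = min(best, val)
    return best

def mean_deviation_approx(xs, r):
    """First-order mean-deviation surrogate: mean + sqrt(2r) * std."""
    return statistics.fmean(xs) + math.sqrt(2.0 * r) * statistics.pstdev(xs)
```

For small `r` the two quantities nearly coincide, which is the sense in which the ambiguity radius acts as a deviation (risk-preference) weight rather than a distributional object.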
Qi Wu, David Bauer, Yuyang Chen, Kwan-Liu Ma
Implicit Neural Representations (INRs) have recently exhibited immense potential in the field of scientific visualization for both data generation and visualization tasks. However, these representations often consist of large multi-layer perceptrons (MLPs), necessitating millions of operations for a single forward pass, consequently hindering interactive visual exploration. While reducing the size of the MLPs and employing efficient parametric encoding schemes can alleviate this issue, it compromises generalizability for unseen parameters, rendering it unsuitable for tasks such as temporal super-resolution. In this paper, we introduce HyperINR, a novel hypernetwork architecture capable of directly predicting the weights for a compact INR. By harnessing an ensemble of multiresolution hash encoding units in unison, the resulting INR attains state-of-the-art inference performance (up to 100x higher inference bandwidth) and can support interactive photo-realistic volume visualization. Additionally, by incorporating knowledge distillation, exceptional data and visualization generation quality is achieved, making our method valuable for real-time parameter exploration. We validate the effectiveness of the HyperINR architecture through a comprehensive ablation study. We showcase the versatility of HyperINR across three distinct scientific domains: novel view synthesis, temporal super-resolution of volume data, and volume rendering with dynamic global shadows. By simultaneously achieving efficiency and generalizability, HyperINR paves the way for applying INR in a wider array of scientific visualization applications.
Qi Wu, Dian-Yong Chen
In the present work, we investigate the production of the newly observed $P^Λ_{ψs}(4338)$ state in $Ξ_b^-$ decay, where the $P^Λ_{ψs}(4338)$ is assigned as a $Ξ_c \bar{D}$ molecular state. By using an effective Lagrangian approach, we evaluate the branching fractions of $Ξ_b^-\rightarrow P^Λ_{ψs}(4338)K^-$ via the triangle loop mechanism. The branching fractions of $Ξ_b^-\rightarrow P^Λ_{ψs}(4338)K^-$ are of the order of $10^{-4}$; the result is compared with that of our previous work on $Ξ_b^-\rightarrow P^Λ_{ψs}(4459)K^-$. We also predict the ratio of $P^Λ_{ψs}(4459)$ and $P^Λ_{ψs}(4338)$ productions in the decay $Ξ_b^- \to P^Λ_{ψs} K^- \to J/ψΛK^-$. The predicted branching fractions and their ratios could be tested experimentally, which may be helpful for understanding the molecular picture of $P^Λ_{ψs}(4338)$ as well as other hidden-charm pentaquark states with strangeness. Moreover, the experimental potential of observing $P^Λ_{ψs}(4338)$ in the $Ξ^-_b\to K^- J/ψΛ$ decay is discussed.
Qi Wu, Yubo Zhao, Yifan Wang, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
While previous approaches to 3D human motion generation have achieved notable success, they often rely on extensive training and are limited to specific tasks. To address these challenges, we introduce Motion-Agent, an efficient conversational framework designed for general human motion generation, editing, and understanding. Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text. This is accomplished by encoding and quantizing motions into discrete tokens that align with the language model's vocabulary. With only 1--3\% of the model's parameters fine-tuned using adapters, MotionLLM delivers performance on par with diffusion models and other transformer-based methods trained from scratch. By integrating MotionLLM with GPT-4 without additional training, Motion-Agent is able to generate highly complex motion sequences through multi-turn conversations, a capability that previous models have struggled to achieve. Motion-Agent supports a wide range of motion-language tasks, offering versatile capabilities for generating and customizing human motion through interactive conversational exchanges. Project page: https://knoxzhao.github.io/Motion-Agent
Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel
One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and counting, to segmentation and reconstruction. To train a method to perform even one of these operations accurately from {image,question,answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. We propose here instead a more general and scalable approach which exploits the fact that very good methods to achieve these operations already exist, and thus do not need to be trained. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decision, and can still be trained end-to-end without ground truth reasons being given. We demonstrate the effectiveness of our approach on two publicly available datasets, Visual Genome and VQA, and show that it produces state-of-the-art results in both cases.
Qi Wu, Yingguang Yang, Hao Liu, Hao Peng, Buyun He, Yutong Xia, Yong Liao
Social bot detection is crucial for mitigating misinformation, online manipulation, and coordinated inauthentic behavior. While existing neural network-based detectors perform well on benchmarks, they struggle with generalization due to distribution shifts across datasets and frequently produce overconfident predictions for out-of-distribution accounts beyond the training data. To address this, we introduce a novel Uncertainty Estimation for Social Bot Detection (UESBD) framework, which quantifies the predictive uncertainty of detectors beyond mere classification. For this task, we propose Robust Multi-modal Neural Processes (RMNP), which aims to enhance the robustness of multi-modal neural processes to modality inconsistencies caused by social bot camouflage. RMNP first learns unimodal representations through modality-specific encoders. Then, unimodal attentive neural processes are employed to encode the Gaussian distribution of unimodal latent variables. Furthermore, because social bots may imitate human features to camouflage themselves, causing certain modalities to provide conflicting information, we introduce an evidential gating network to explicitly model the reliability of modalities. The joint latent distribution is learned through the generalized product of experts, which takes the reliability of each modality into consideration during fusion. The final prediction is obtained through Monte Carlo sampling of the joint latent distribution followed by a decoder. Experiments on three real-world benchmarks show the effectiveness of RMNP in classification and uncertainty estimation, as well as its robustness to modality conflicts.
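A product of Gaussian experts reduces, in one dimension, to precision-weighted pooling, and a reliability weight per expert generalizes it by scaling each expert's precision. The sketch below illustrates only this fusion rule; the scalar reliability weights stand in for the evidential gating network's outputs, which is an assumption for illustration.

```python
def poe_fuse(means, variances, reliabilities):
    """Reliability-weighted product of 1-D Gaussian experts.
    Each expert i contributes precision w_i / var_i; the fused Gaussian has
        var  = 1 / sum_i(w_i / var_i)
        mean = var * sum_i(w_i * mu_i / var_i)
    A reliability weight w_i = 0 removes that modality from the fusion."""
    prec = sum(w / v for w, v in zip(reliabilities, variances))
    num = sum(w * m / v for w, m, v in zip(reliabilities, means, variances))
    return num / prec, 1.0 / prec
```

With equal reliabilities this is the ordinary product of experts; driving a conflicting modality's reliability toward zero makes the fused posterior fall back on the remaining modalities.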
Qi Wu, Zhong-Quan Sun, Dian-Yong Chen, Shi-Dong Liu, Gang Li
In this work, we investigate the dipion transition processes $X(3872)\to ππχ_{cJ} (J=0,1,2)$ within the framework of heavy hadron chiral perturbation theory, treating $X(3872)$ as a molecular state composed of $D\bar{D}^* + \mathrm{H.c.}$ components. By analyzing the box and triangle loop diagrams with the nonrelativistic effective field theory power-counting rule, we demonstrate that box diagrams dominate these dipion transition processes. Branching ratios are calculated as functions of the mixing angle $θ$, which parametrizes the neutral and charged meson compositions of the $X(3872)$. Our results indicate that the branching fractions for $X(3872)\toππχ_{c0}$, $X(3872)\to ππχ_{c1}$, and $X(3872)\to ππχ_{c2}$ are of the orders of $10^{-4}$, $10^{-3}$, and $10^{-5}$, respectively. We also predict the ratios ${\mathcal{B}[X(3872)\rightarrow ππχ_{c0/2}]}/{\mathcal{B}[X(3872)\rightarrow ππχ_{c1}]}$ and ${\mathcal{B}[X(3872)\rightarrow π^+π^-χ_{cJ}]}/{\mathcal{B}[X(3872)\rightarrow π^0π^0χ_{cJ}]}$. The latter deviates from isospin-symmetry expectations, revealing various degrees of isospin violation. By studying the $π^+π^-$ and $π^+χ_{cJ}$ invariant mass spectra, we find a double-bump structure in the $π^+π^-$ invariant mass distributions of the process $X(3872)\to π^+π^-χ_{c1}$ and $π^+χ_{c0}$ invariant mass distribution of the process $X(3872)\to π^+π^-χ_{c0}$, which could be tested by future experimental measurements.
Qi Wu, Chao Fang, Jiayuan Chen, Ye Lin, Yueqi Zhang, Yichuan Bai, Yuan Du, Li Du
Mixture-of-Experts (MoE) models facilitate edge deployment by decoupling model capacity from active computation, yet their large memory footprint drives the need for GPU systems with near-data processing (NDP) capabilities that offload experts to dedicated processing units. However, deploying MoE models on such edge-based GPU-NDP systems faces three critical challenges: 1) severe load imbalance across NDP units due to non-uniform expert selection and expert parallelism, 2) insufficient GPU utilization during expert computation within NDP units, and 3) extensive data pre-profiling necessitated by unpredictable expert activation patterns for pre-fetching. To address these challenges, this paper proposes an efficient inference framework featuring three key optimizations. First, the underexplored tensor parallelism in MoE inference is exploited to partition and compute large expert parameters across multiple NDP units simultaneously towards edge low-batch scenarios. Second, a load-balancing-aware scheduling algorithm distributes expert computations across NDP units and GPU to maximize resource utilization. Third, a dataset-free pre-fetching strategy proactively loads frequently accessed experts to minimize activation delays. Experimental results show that our framework enables GPU-NDP systems to achieve 2.41x on average and up to 2.56x speedup in end-to-end latency compared to state-of-the-art approaches, significantly enhancing MoE inference efficiency in resource-constrained environments.
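A load-balancing-aware schedule of the kind described can be approximated with the classic longest-processing-time (LPT) greedy heuristic: assign experts in descending cost order to the currently least-loaded unit. This sketch is a generic illustration under the assumption that per-expert costs and the unit count are given; it is not the paper's scheduling algorithm, which also coordinates GPU and NDP resources.

```python
import heapq

def balance_experts(costs, n_units):
    """Greedy LPT scheduling: assign each expert (by descending cost) to the
    currently least-loaded processing unit. Returns the per-unit expert
    lists and the resulting per-unit loads."""
    heap = [(0.0, u) for u in range(n_units)]      # (load, unit id)
    heapq.heapify(heap)
    assign = {u: [] for u in range(n_units)}
    for expert, cost in sorted(enumerate(costs), key=lambda kv: -kv[1]):
        load, u = heapq.heappop(heap)              # least-loaded unit
        assign[u].append(expert)
        heapq.heappush(heap, (load + cost, u))
    loads = {u: sum(costs[e] for e in assign[u]) for u in assign}
    return assign, loads
```

LPT is a 4/3-approximation to the optimal makespan, which is usually enough to keep heterogeneous expert activations from idling most units while one unit serves a hot expert.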
Daniel Zavorotny, Qi Wu, David Bauer, Kwan-Liu Ma
Machine learning has enabled the use of implicit neural representations (INRs) to efficiently compress and reconstruct massive scientific datasets. However, despite advances in fast INR rendering algorithms, INR-based rendering remains computationally expensive, as computing data values from an INR is significantly slower than reading them from GPU memory. This bottleneck currently restricts interactive INR visualization to professional workstations. To address this challenge, we introduce an INR rendering framework accelerated by a scalable, multi-resolution GPU cache capable of efficiently representing tera-scale datasets. By minimizing redundant data queries and prioritizing novel volume regions, our method reduces the number of INR computations per frame, achieving an average 5x speedup over the state-of-the-art INR rendering method while still maintaining high visualization quality. Coupled with existing hardware-accelerated INR compressors, our framework enables scientists to generate and compress massive datasets in situ on high-performance computing platforms and then interactively explore them on consumer-grade hardware post hoc.
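The cache's effect can be sketched in miniature: quantize each sample position to a cache key and run the expensive network only on a miss. A single-level dictionary here stands in for the paper's multi-resolution GPU cache; the class name, resolution, and miss counter are illustrative assumptions.

```python
# Minimal sketch of caching decoded samples so that repeated queries of
# nearby sample positions skip network inference entirely.

class SampleCache:
    def __init__(self, infer_fn, resolution=64):
        self.infer_fn = infer_fn        # expensive network evaluation
        self.resolution = resolution    # cache grid resolution
        self.store = {}                 # key -> cached value
        self.misses = 0                 # inference calls actually made

    def query(self, x, y, z):
        """Quantize (x, y, z) in [0,1)^3 to a cache cell; infer on miss."""
        key = tuple(int(c * self.resolution) for c in (x, y, z))
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.infer_fn(*key)
        return self.store[key]
```

During interactive exploration, successive frames revisit mostly the same regions, so the per-frame miss count (and hence the number of network evaluations) drops sharply after the first frame.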
Ingo Wald, Stefan Zellmann, Jefferson Amstutz, Qi Wu, Kevin Griffin, Milan Jaros, Stefan Wesner
We propose and discuss a paradigm that allows for expressing \emph{data-parallel} rendering with the classically non-parallel ANARI API. We propose this as a new standard for data-parallel sci-vis rendering, describe two different implementations of this paradigm, and use multiple sample integrations into existing apps to show how easy it is to adopt this paradigm, and what can be gained from doing so.