Jun Wang, Jiamu Zhou, Muning Wen, Xiaoyun Mo, Haoyu Zhang, Qiqiang Lin, Cheng Jin, Xihuai Wang, Weinan Zhang, Qiuying Peng, Jun Wang
Evaluating the performance of LLMs in multi-turn human-agent interactions presents significant challenges, particularly due to the complexity and variability of user behavior. In this paper, we introduce HammerBench, a novel benchmark framework for assessing LLMs' function-calling capabilities in real-world, multi-turn dialogues. HammerBench simulates diverse mobile assistant use cases, incorporating imperfect instructions, dynamic question-answer trajectories, intent and argument shifts, and the indirect use of external information through pronouns. To construct this benchmark, we curate a comprehensive dataset derived from popular mobile app functionalities and anonymized user logs, complemented by a cost-effective data generation pipeline leveraging open-source models. HammerBench is further augmented with fine-grained interaction snapshots and metrics, enabling detailed evaluation of function-calling performance across individual conversational turns. We demonstrate the effectiveness of HammerBench by evaluating several leading LLMs and uncovering key performance trends. Our experiments reveal that different types of parameter name errors are a significant source of failure across different interaction scenarios, highlighting critical areas for further improvement in LLM robustness for mobile assistant applications.
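To make the idea of turn-level function-calling metrics concrete, the following is a minimal sketch of how one predicted call might be compared against a reference call for a single turn. The `Call` structure, the `score_turn` helper, and the `set_alarm` example are hypothetical and are not taken from HammerBench itself; the sketch only separates parameter-name errors (invented or missing names) from value errors.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """A function call; hypothetical structure, not HammerBench's schema."""
    name: str
    args: dict = field(default_factory=dict)


def score_turn(pred: Call, gold: Call) -> dict:
    """Compare one predicted call against the reference for a single turn."""
    pred_keys, gold_keys = set(pred.args), set(gold.args)
    shared = pred_keys & gold_keys
    return {
        "function_ok": pred.name == gold.name,
        # Parameter-name errors: names the model invented or left out.
        "hallucinated_params": sorted(pred_keys - gold_keys),
        "missing_params": sorted(gold_keys - pred_keys),
        # Value errors are only meaningful for correctly named parameters.
        "wrong_values": sorted(k for k in shared if pred.args[k] != gold.args[k]),
    }


# Hypothetical turn: the model drops "repeat" and invents "label".
gold = Call("set_alarm", {"time": "07:00", "repeat": "weekdays"})
pred = Call("set_alarm", {"time": "07:00", "label": "gym"})
report = score_turn(pred, gold)
```

Scoring each turn separately, rather than only the final call, is what allows errors to be attributed to a specific point in the conversation.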
Leslie Greengard, Shidong Jiang, Jun Wang
Two fundamental difficulties are encountered in the numerical evaluation of time-dependent layer potentials. One is the quadratic cost of history dependence, which has been successfully addressed by splitting the potentials into two parts - a local part that contains the most recent contributions and a history part that contains the contributions from all earlier times. The history part is smooth, easily discretized using high-order quadratures, and straightforward to compute using a variety of fast algorithms. The local part, however, involves complicated singularities in the underlying Green's function. Existing methods, based on exchanging the order of integration in space and time, are able to achieve high order accuracy, but are limited to the case of stationary boundaries. Here, we present a new quadrature method that leaves the order of integration unchanged, making use of a change of variables that converts the singular integrals with respect to time into smooth ones. We have also derived asymptotic formulas for the local part that lead to fast and accurate hybrid schemes, extending earlier work for scalar heat potentials and applicable to moving boundaries. The performance of the overall scheme is demonstrated via numerical examples.
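The abstract does not state the specific change of variables used, so the sketch below illustrates only the general principle on a generic inverse-square-root endpoint singularity, which is an assumption chosen for illustration. Substituting $t = u^2$ turns $\int_0^{\delta} g(t)/\sqrt{t}\,dt$ into $2\int_0^{\sqrt{\delta}} g(u^2)\,du$, whose integrand is smooth, so a standard Gauss-Legendre rule converges rapidly. The density `g` and the helper `gauss_legendre` are illustrative names.

```python
import math
import numpy as np


def gauss_legendre(f, a, b, n):
    """Integrate f over [a, b] with an n-point Gauss-Legendre rule."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(weights * f(x))


delta = 1.0


def g(t):
    """Smooth density (illustrative choice)."""
    return np.exp(-t)


# Direct quadrature of the singular integrand g(t)/sqrt(t) converges slowly.
direct = gauss_legendre(lambda t: g(t) / np.sqrt(t), 0.0, delta, 16)

# After t = u**2 the integrand 2*g(u**2) is smooth on [0, sqrt(delta)].
substituted = gauss_legendre(lambda u: 2.0 * g(u**2), 0.0, math.sqrt(delta), 16)

# Closed form for this particular g: sqrt(pi) * erf(1).
exact = math.sqrt(math.pi) * math.erf(1.0)
err_direct = abs(direct - exact)
err_substituted = abs(substituted - exact)
```

With the same number of nodes, the substituted integral is accurate to near machine precision while the direct rule is not, which is the motivation for removing the singularity before discretizing.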
Jun Wang, Yinglu Liu, Yibo Hu, Hailin Shi, Tao Mei
Deep learning based face recognition has achieved significant progress in recent years. Yet practical model production and further research on deep face recognition are in great need of corresponding public support. For example, producing a face representation network calls for a modular training scheme that allows a proper choice among the various state-of-the-art backbones and training supervision candidates, subject to real-world face recognition demands; for performance analysis and comparison, standard and automatic evaluation of multiple models on multiple benchmarks is a desirable tool as well; in addition, a public groundwork for deploying face recognition as a holistic pipeline would be welcome. Furthermore, there are newly emerged challenges, such as masked face recognition caused by the recent worldwide COVID-19 pandemic, which is drawing increasing attention in practical applications. A feasible and elegant solution is to build an easy-to-use unified framework that meets the above demands. To this end, we introduce a novel open-source framework, named FaceX-Zoo, which is oriented to the research and development community of face recognition. Thanks to its highly modular and scalable design, FaceX-Zoo provides a training module with various supervisory heads and backbones for state-of-the-art face recognition, as well as a standardized evaluation module that makes it possible to evaluate models on most of the popular benchmarks just by editing a simple configuration. Also, a simple yet fully functional face SDK is provided for the validation and primary application of the trained models. Rather than including as many prior techniques as possible, we enable FaceX-Zoo to be easily upgraded and extended along with the development of face-related domains. The source code and models are available at https://github.com/JDAI-CV/FaceX-Zoo.
Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference. It first learns two separate spaces of speaker-knowledge and speech-stimuli based on a shared feature space, where a new block structure is designed as the building block for all spaces, and then cooperatively solves different tasks. Between the two spaces, information is cast towards each other via a novel cross- and dual-attention mechanism, mimicking the bottom-up and top-down processes of a human's cocktail party effect. It turns out that substantially discriminative and generalizable speaker representations can be learnt under severe interference via our self-supervised training. The experimental results verify this seeming paradox. The learnt speaker embedding has greater discriminative power than that of a standard speaker verification method; meanwhile, Tune-In achieves remarkably better speech separation performance in terms of SI-SNRi and SDRi, consistently across all test modes and at lower memory and computational cost, than state-of-the-art benchmark systems.
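For readers unfamiliar with the SI-SNRi metric mentioned above, here is a minimal sketch of the commonly used definition of scale-invariant signal-to-noise ratio and its improvement over the unprocessed mixture. The test signals and the "separator output" are synthetic and hypothetical; nothing here reproduces Tune-In or its evaluation setup.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (commonly used definition)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
interference = rng.standard_normal(16000)
mixture = speech + interference          # roughly -3 dB input SNR
estimate = speech + 0.1 * interference   # hypothetical separator output
gain_db = si_snr_improvement(estimate, mixture, speech)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is preferred when separation systems do not preserve absolute signal level.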
Jun Wang, Xuezhi Zhao
Let $M$ be a subset of a vector space or projective space. The authors define the \emph{generalized configuration space} of $M$, which is formed by $n$-tuples of elements of $M$ where any $k$ elements of each $n$-tuple are linearly independent. The \emph{generalized configuration space} generalizes the classical configuration space defined by E. Fadell. Denote the \emph{generalized configuration space} of $M$ by $W_{k,n}(M)$. The authors are mainly interested in calculating the homotopy groups of generalized configuration spaces. This article gives the fundamental groups of generalized configuration spaces of $\mathbb{R}P^m$ for some special cases, and the connections between the homotopy groups of generalized configuration spaces of $S^m$ and the homotopy groups of Stiefel manifolds. It is also proved that the higher homotopy groups of generalized configuration spaces $W_{k,n}(S^m)$ and $W_{k,n}(\mathbb{R}P^m)$ are isomorphic.
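The defining condition of $W_{k,n}(M)$ can be checked numerically for a concrete tuple of vectors. The sketch below is only an illustration of the definition, with a hypothetical function name and a numerical rank tolerance that are not part of the paper; points of $\mathbb{R}P^m$ would be represented by any nonzero vector on the corresponding line.

```python
from itertools import combinations

import numpy as np


def in_generalized_configuration_space(points, k, tol=1e-9):
    """Return True if every k of the given vectors are linearly independent."""
    vectors = [np.asarray(p, dtype=float) for p in points]
    for subset in combinations(vectors, k):
        # k vectors are independent exactly when the stacked matrix has rank k.
        if np.linalg.matrix_rank(np.stack(subset), tol=tol) < k:
            return False
    return True


s = 1.0 / np.sqrt(2.0)
# Three coplanar points on S^2: pairwise independent, but not as a triple.
coplanar = [(1, 0, 0), (0, 1, 0), (s, s, 0)]
# Four points on S^2 in general position: every triple is independent.
r = 1.0 / np.sqrt(3.0)
general = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (r, r, r)]
```

For $k = 2$ the condition says that no two vectors are parallel, so for points of projective space it reduces to the points being pairwise distinct, as in the classical configuration space.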
Jun Wang
Let $X$ be any smooth Deligne-Mumford stack with projective coarse moduli, and $Y$ be a smooth complete intersection in $X$ associated with a direct sum of semi-positive line bundles. We introduce a useful and broad class, known as admissible series, for discussing the quantum Lefschetz theorem. For any admissible series on Givental's Lagrangian cone of $X$, we show that a hypergeometric modification of the series lies on the Lagrangian cone of $Y$. This confirms a prediction from Coates-Corti-Iritani-Tseng about the genus zero quantum Lefschetz theorem beyond convexity. In our quantum Lefschetz theorem, we use extended variables to formulate the hypergeometric modification, which may be of independent interest.
Wenlin Li, Yucheng Xu, Xiaoqing Zheng, Suoya Han, Jun Wang, Xiaobo Sun
Sparse and noisy images (SNIs), like those in spatial gene expression data, pose significant challenges for effective representation learning and clustering, which are essential for thorough data analysis and interpretation. In response to these challenges, we propose Dual Advancement of Representation Learning and Clustering (DARLC), an innovative framework that leverages contrastive learning to enhance the representations derived from masked image modeling. Simultaneously, DARLC integrates cluster assignments in a cohesive, end-to-end approach. This integrated clustering strategy addresses the "class collision problem" inherent in contrastive learning, thus improving the quality of the resulting representations. To generate more plausible positive views for contrastive learning, we employ a graph attention network-based technique that produces denoised images as augmented data. As such, our framework offers a comprehensive approach that improves the learning of representations by enhancing their local perceptibility, distinctiveness, and the understanding of relational semantics. Furthermore, we utilize a Student's t mixture model to achieve more robust and adaptable clustering of SNIs. Extensive experiments, conducted across 12 different types of datasets consisting of SNIs, demonstrate that DARLC surpasses the state-of-the-art methods in both image clustering and generating image representations that accurately capture gene interactions. Code is available at https://github.com/zipging/DARLC.
Jun Wang
OpenAI o1 has shown that applying reinforcement learning to integrate reasoning steps directly during inference can significantly improve a model's reasoning capabilities. This result is exciting as the field transitions from the conventional autoregressive method of generating answers to a more deliberate approach that models the slow-thinking process through step-by-step reasoning training. Reinforcement learning plays a key role in both the model's training and decoding processes. In this article, we present a comprehensive formulation of reasoning problems and investigate the use of both model-based and model-free approaches to better support this slow-thinking framework.
Yiannis Kantaros, Jun Wang
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics and high-level control objectives specified as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace structure and the outcomes of control decisions, giving rise to an unknown Markov Decision Process (MDP). Existing reinforcement learning (RL) algorithms for LTL tasks typically rely on exploring a product MDP state-space uniformly (e.g., using an $ε$-greedy policy), compromising sample efficiency. This issue becomes more pronounced as the rewards get sparser and the MDP size or the task complexity increases. In this paper, we propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches. Its sample efficiency relies on a novel task-driven exploration strategy that biases exploration towards directions that may contribute to task satisfaction. We provide theoretical analysis and extensive comparative experiments demonstrating the sample efficiency of the proposed method. The benefit of our method becomes more evident as the task complexity or the MDP size increases.
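The contrast between uniform and biased exploration can be sketched in a few lines. This is not the paper's algorithm: the function names, the `bias_prob` parameter, and the way `biased_action` is supplied are all assumptions made for illustration. In practice the biased action would have to be derived from the task specification and whatever model of the environment is available.

```python
import random


def greedy_action(q_values):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng):
    """Uniform exploration: with probability epsilon, pick any action at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def task_biased_epsilon_greedy(q_values, epsilon, bias_prob, biased_action, rng):
    """Biased exploration: exploratory steps favor `biased_action`.

    `biased_action` is a hypothetical input: the action currently believed
    to make progress toward satisfying the task.
    """
    if rng.random() < epsilon:
        if rng.random() < bias_prob:
            return biased_action
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


rng = random.Random(0)
q = [0.1, 0.5, 0.2, 0.0]
picks = [task_biased_epsilon_greedy(q, 0.3, 0.8, 3, rng) for _ in range(10_000)]
share_of_biased = picks.count(3) / len(picks)
```

With these settings the expected share of steps spent on action 3 is 0.3 × (0.8 + 0.2/4) = 0.255, compared with 0.3/4 = 0.075 under uniform $ε$-greedy exploration, while the residual uniform component keeps every action reachable.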
Jun Wang, Zhi Qiao, Wenlong Zhang, Suyi Li
Over the past decades, we have witnessed a rapid emergence of soft and reconfigurable robots thanks to their capability to interact safely with humans and adapt to complex environments. However, their softness makes accurate control very challenging. High-fidelity sensing is critical in improving control performance, especially posture and contact estimation. To this end, traditional camera-based sensors and load cells have limited portability and accuracy, and they will inevitably increase the robot's cost and weight. In this study, instead of using specialized sensors, we only collect distributed pressure data inside a pneumatics-driven soft arm and apply the physical reservoir computing principle to simultaneously predict its kinematic posture (i.e., bending angle) and payload status (i.e., payload mass). Our results show that, with careful readout training, one can obtain accurate bending angle and payload mass predictions via simple, weighted linear summations of pressure readings. In addition, our comparative analysis shows that, to guarantee low prediction errors within 10\%, bending angle prediction requires less training data than payload prediction. This result reveals that balanced linear and nonlinear body dynamics are critical for the physical reservoir to accomplish complex proprioceptive and exteroceptive information perception tasks. Finally, the method of exploring the most efficient readout training methods presented in this paper could be extended to other soft robotic systems to maximize their perception capabilities.
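The readout stage described above, a weighted linear sum of pressure readings, can be sketched as a ridge-regularized least-squares fit. Everything concrete here is an assumption: the data are synthetic stand-ins for sensor recordings, the channel count and the ridge term are arbitrary, and the synthetic targets are linear by construction, so the sketch shows the training mechanics only and not the nonlinear body dynamics that the physical reservoir contributes.

```python
import numpy as np


def add_bias(pressures):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([pressures, np.ones((pressures.shape[0], 1))])


def train_readout(pressures, targets, ridge=1e-6):
    """Ridge-regularized least squares for the linear readout weights."""
    X = add_bias(pressures)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ targets)


def predict(pressures, weights):
    """Weighted linear sum of pressure readings plus bias."""
    return add_bias(pressures) @ weights


# Synthetic stand-in for sensor data: 4 pressure channels, 2 targets
# (bending angle, payload mass). Real data would come from the soft arm.
rng = np.random.default_rng(0)
true_weights = rng.normal(size=(5, 2))
train_p = rng.uniform(size=(200, 4))
test_p = rng.uniform(size=(50, 4))
train_y = add_bias(train_p) @ true_weights + 0.01 * rng.normal(size=(200, 2))
test_y = add_bias(test_p) @ true_weights

weights = train_readout(train_p, train_y)
rmse = np.sqrt(np.mean((predict(test_p, weights) - test_y) ** 2))
```

Only the readout weights are trained; the appeal of physical reservoir computing is that the body itself supplies the nonlinear transformation, so the digital side stays this simple.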
Jun Wang, Zhaoheng Guo, Erik Isele, Philip H. Bucksbaum, Agostino Marinelli, James P. Cryan, Taran Driver
We present a comprehensive framework for modeling covariance in angular streaking experiments. Within the impulsive streaking regime, the displacement of the electron momentum distribution (MD) provides a tight connection between the dressing-free MD and the dressed MD. This connection establishes universal structures in the composition of streaking covariance that are common across different MDs, regardless of their exact shape. Building on this robust framework, we have developed methods for retrieving temporal information from angular streaking measurements. By providing a detailed understanding of the covariance structure in angular streaking experiments, our work enables more accurate and robust temporal measurements in a wide range of experimental scenarios.
Jun Wang
To determine a sparse approximation function that has a direct metric relationship with the $\ell_{0}$ quasi-norm, we introduce, by delving into the iterative soft-thresholding operator, a wonderful triangle whose sides are composed of $\Vert \mathbf{x} \Vert_{0}$, $\Vert \mathbf{x} \Vert_{1}$ and $\Vert \mathbf{x} \Vert_{\infty}$ for any non-zero vector $\mathbf{x} \in \mathbb{R}^{n}$. Based on this triangle, we deduce the ratio of the $\ell_{1}$ and $\ell_{\infty}$ norms as a sparsity-promoting objective function for sparse signal reconstruction and also attempt to give the sparsity interval of the signal. Considering the $\ell_{1}/\ell_{\infty}$ minimization from an angle $β$ of the triangle corresponding to the side whose length is $\Vert \mathbf{x} \Vert_{\infty} - \Vert \mathbf{x} \Vert_{1}/\Vert \mathbf{x} \Vert_{0}$, we finally demonstrate the performance of an existing $\ell_{1}/\ell_{\infty}$ algorithm by comparing it with an $\ell_{1}/\ell_{2}$ algorithm.
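The link between the $\ell_{1}/\ell_{\infty}$ ratio and the $\ell_{0}$ quasi-norm can be seen from standard norm inequalities. The following is not the paper's derivation, only an elementary check that holds for any non-zero $\mathbf{x} \in \mathbb{R}^{n}$:

```latex
\[
\Vert \mathbf{x} \Vert_{\infty}
  \;\le\; \Vert \mathbf{x} \Vert_{1}
  = \sum_{i:\,x_i \neq 0} |x_i|
  \;\le\; \Vert \mathbf{x} \Vert_{0}\,\Vert \mathbf{x} \Vert_{\infty}
\quad\Longrightarrow\quad
1 \;\le\; \frac{\Vert \mathbf{x} \Vert_{1}}{\Vert \mathbf{x} \Vert_{\infty}}
  \;\le\; \Vert \mathbf{x} \Vert_{0},
\qquad
\Vert \mathbf{x} \Vert_{\infty} - \frac{\Vert \mathbf{x} \Vert_{1}}{\Vert \mathbf{x} \Vert_{0}} \;\ge\; 0 .
\]
```

The upper bound is attained when all non-zero entries have equal magnitude, so the ratio is a lower bound on the sparsity level. For example, $\mathbf{x} = (3, 0, -1, 0)$ has $\Vert \mathbf{x} \Vert_{0} = 2$, $\Vert \mathbf{x} \Vert_{1} = 4$ and $\Vert \mathbf{x} \Vert_{\infty} = 3$, giving a ratio of $4/3 \in [1, 2]$ and a side length of $3 - 4/2 = 1$.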
Jun Wang
Building damage detection after natural disasters like earthquakes is crucial for initiating effective emergency response actions. Remotely sensed very high spatial resolution (VHR) imagery can provide vital information due to its ability to map the affected buildings with high geometric precision. However, the detection of earthquake-damaged buildings still suffers from suboptimal performance. This paper presents a novel superpixel-based approach that incorporates Deep Neural Networks (DNN) with a modified segmentation method for more precise building damage detection from VHR imagery. First, a modified Fast Scanning and Adaptive Merging method is extended to create an initial over-segmentation. Second, the segments are merged based on the Region Adjacency Graph (RAG). Third, a pre-trained DNN using Stacked Denoising Auto-Encoders (SDAE-DNN) is presented to exploit the rich semantic features for building damage detection. Experimental results on WorldView-2 imagery from the 2015 Nepal earthquake demonstrate the feasibility and effectiveness of our method, which can boost detection accuracy by learning more intrinsic and discriminative features and which outperforms other methods that use alternative classifiers.
Jun Wang, Suyi Li
Soft robots have become increasingly popular for complex manipulation tasks requiring gentle and safe contact. However, their softness makes accurate control challenging, and high-fidelity sensing is a prerequisite to adequate control performance. To this end, many flexible and embedded sensors have been created over the past decade, but they inevitably increase the robot's complexity and stiffness. This study demonstrates a novel approach that uses simple bending strain gauges embedded inside a modular arm to extract complex information regarding its deformation and working conditions. The core idea is based on physical reservoir computing (PRC): A soft body's rich nonlinear dynamic responses, captured by the inter-connected bending sensor network, could be utilized for complex multi-modal sensing with a simple linear regression algorithm. Our results show that the soft modular arm reservoir can accurately predict body posture (bending angle), estimate payload weight, determine payload orientation, and even differentiate two payloads with only minimal difference in weight -- all using minimal digital computing power.
Jun Wang, M. J. Norden, P. Donker
LOFAR is a low-frequency array distributed across several European countries. Each LOFAR station contains thousands of antennas and associated electronics, making monitoring and thorough testing of those components essential to ensuring station reliability. This paper discusses various anomalies that may arise in LOFAR antennas, tile elements, modems, and summators. We also introduce two diagnostic pipelines designed to detect these anomalies: a real-time station monitoring system and an offline stationtest system. These pipelines provide valuable insights into the operational status of each antenna, issuing alerts to minimize observational disruptions while maximizing station uptime, reliability, and sensitivity. By enhancing the efficiency and stability of LOFAR stations, they also serve as a foundation for future large-scale arrays like SKA-Low. The experience gained from their development and deployment will contribute to the construction and maintenance of SKA-Low, improving monitoring and diagnostic capabilities for large-scale antenna networks. Ultimately, these systems play a crucial role in ensuring continuous observations and maintaining data integrity.
Jun Wang, Qiang Zhao
We carry out a combined study of the isospin-violating decay $D_{s}^{*} \to D_{s} π^{0}$ and the radiative decay $D^*_s\to D_sγ$ in an effective Lagrangian approach by taking into account the corrections from the one-loop transitions. By distinguishing the transition mechanisms of the long-distance interactions through the intermediate meson loops from the short-distance interactions through the $η$-$π^{0}$ mixing at the tree level, the isospin-violating decay $D_{s}^{*} \to D_{s} π^{0}$ can be well constrained. In our approach, the higher-order corrections to the isospin-violating effects can involve the intermediate $D^{(*)}$ and $K^{(*)}$ scatterings. We find that the contributions from the destructive interference of intermediate meson loops via $D^{(*)0}(c\bar{u}){K}^{(*)+}(u\bar{s})$ and $D^{(*)+}(c\bar{d}){K}^{(*)0}(d\bar{s})$ rescatterings are significant. Within the commonly accepted ultraviolet (UV) cutoff range, we obtain the partial decay width $Γ[D_{s}^{*} \to D_{s} π^{0}] = 9.92^{+0.76}_{-0.66}\,\mathrm{eV}$. This approach allows us to describe the $D_s^*$ radiative decay in the same framework via the vector meson dominance (VMD) model. We demonstrate that both the tree-level and one-loop transitions can be self-consistently determined if we adopt the experimental data for the ratio of the branching fractions of $D_{s}^{*} \to D_{s} π^{0}$ to $D^*_s\to D_sγ$. This then leads to a reliable estimate of the total decay width of $D_s^*$, i.e. $Γ_{\text{total}}(D_s^{*+})=170^{+ 13}_{-12}\,\mathrm{eV}$.
Jun Wang, Hosein Hasanbeig, Kaiyuan Tan, Zihe Sun, Yiannis Kantaros
This paper addresses the problem of designing control policies for agents with unknown stochastic dynamics and control objectives specified using Linear Temporal Logic (LTL). Recent Deep Reinforcement Learning (DRL) algorithms have aimed to compute policies that maximize the satisfaction probability of LTL formulas, but they often suffer from slow learning performance. To address this, we introduce a novel Deep Q-learning algorithm that significantly improves learning speed. The enhanced sample efficiency stems from a mission-driven exploration strategy that prioritizes exploration towards directions likely to contribute to mission success. Identifying these directions relies on an automaton representation of the LTL task as well as a learned neural network that partially models the agent-environment interaction. We provide comparative experiments demonstrating the efficiency of our algorithm on robot navigation tasks in unseen environments.
Jun Wang, Xuezhi Zhao
In this paper, we obtain some sufficient conditions to guarantee the existence of multiple points of maps from $S^m$ to $\mathbb{R}^d$. Our main tool is the ideal-valued index of a $G$-space defined by E. Fadell and S. Husseini. We obtain more detailed information on the relative positions of the multiple points. It is proved that for a continuous real-valued function $f: S^m\rightarrow \mathbb{R}$ such that $f(-p)=-f(p)$, if $m+1$ is a power of $2$, then there are $m+1$ points $p_1, \ldots, p_{m+1}$ in $S^m$ such that $f(p_1)=\cdots=f(p_{m+1})$, where $p_1, \ldots, p_{m+1}$ are linearly dependent and any $m$ points of $p_1, \ldots, p_{m+1}$ are linearly independent. As a generalization of Hopf's theorem, we also prove that for any continuous map $f: S^m\rightarrow \mathbb{R}^d$, if $m> d$, then there exists a pair of mutually orthogonal points having the same image in addition to the antipodal points.
Jun Wang, Lixing Zhu, Xiaohan Yu, Abhir Bhalerao, Yulan He
Learning medical visual representations from image-report pairs through joint learning has garnered increasing research attention due to its potential to alleviate the data scarcity problem in the medical domain. The primary challenges stem from the lengthy reports that feature complex discourse relations and semantic pathologies. Previous works have predominantly focused on instance-wise or token-wise cross-modal alignment, often neglecting the importance of pathological-level consistency. This paper presents a novel framework, PLACE, that promotes Pathological-Level Alignment and enriches fine-grained details via Correlation Exploration without additional human annotations. Specifically, we propose a novel pathological-level cross-modal alignment (PCMA) approach to maximize the consistency of pathology observations from both images and reports. To facilitate this, a Visual Pathology Observation Extractor is introduced to extract visual pathological observation representations from localized tokens. The PCMA module operates independently of any external disease annotations, enhancing the generalizability and robustness of our methods. Furthermore, we design a proxy task that forces the model to identify correlations among image patches, thereby enriching the fine-grained details crucial for various downstream tasks. Experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on multiple downstream tasks, including classification, image-to-text retrieval, semantic segmentation, object detection and report generation. Code is available at https://github.com/Markin-Wang/PLACE.
Jun Wang, Ninglun Gu, Kailai Zhang, Zijiao Zhang, Yelun Bao, Jin Yang, Xu Yin, Liwei Liu, Yihuan Liu, Pengyong Li, Gary G. Yen, Junchi Yan
For Large Language Models (LLMs), a disconnect persists between benchmark performance and real-world utility. Current evaluation frameworks remain fragmented, prioritizing technical metrics while neglecting holistic assessment for deployment. This survey introduces an anthropomorphic evaluation paradigm through the lens of human intelligence, proposing a novel three-dimensional taxonomy: Intelligence Quotient (IQ)-General Intelligence for foundational capacity, Emotional Quotient (EQ)-Alignment Ability for value-based interactions, and Professional Quotient (PQ)-Professional Expertise for specialized proficiency. For practical value, we pioneer a Value-oriented Evaluation (VQ) framework assessing economic viability, social impact, ethical alignment, and environmental sustainability. Our modular architecture integrates six components with an implementation roadmap. Through analysis of 200+ benchmarks, we identify key challenges including dynamic assessment needs and interpretability gaps. The survey provides actionable guidance for developing LLMs that are technically proficient, contextually relevant, and ethically sound. We maintain a curated repository of open-source evaluation resources at: https://github.com/onejune2018/Awesome-LLM-Eval.