Jun Wang, Zhao-Yu Han, Song-Bo Wang, Zeyang Li, Liang-Zhu Mu, Heng Fan, Lei Wang
We propose a quantum tomography scheme for pure qudit systems that adopts random-basis measurements and generative learning methods, along with a built-in fidelity estimation approach to assess the reliability of the tomographic states. We prove the validity of the scheme theoretically and perform numerically simulated experiments on several target states, including three typical quantum information states and randomly initialized states, demonstrating its efficiency and robustness. The number of state replicas required to meet a given convergence criterion grows as a low-degree polynomial as the system scales, so the scheme achieves the high scalability that is crucial for practical quantum state tomography.
Yixuan Qiao, Shanshan Zhao, Jun Wang, Hao Chen, Tuozhen Liu, Xianbin Ye, Xin Tang, Rui Fang, Peng Gao, Wenfeng Xie, Guotong Xie
This paper describes the PASH participation in the TREC 2021 Deep Learning Track. In the recall stage, we adopt a scheme combining sparse and dense retrieval methods. In the multi-stage ranking phase, point-wise and pair-wise ranking strategies are applied one after another, based on models continually pre-trained on general knowledge and document-level data. Compared to the TREC 2020 Deep Learning Track, we additionally introduce the generative model T5 to further enhance performance.
Shaoliang Yang, Jun Wang, Yunsheng Wang
We present AutoSiMP, an autonomous pipeline that transforms a natural-language structural problem description into a validated, binary topology without manual configuration. The pipeline comprises five modules: (1) an LLM-based configurator that parses a plain-English prompt into a validated specification of geometry, supports, loads, passive regions, and mesh parameters; (2) a boundary-condition generator producing solver-ready DOF arrays, force vectors, and passive-element masks; (3) a three-field SIMP solver with Heaviside projection and pluggable continuation control; (4) an eight-check structural evaluator (connectivity, compliance, grayness, volume fraction, convergence, plus three informational quality metrics); and (5) a closed-loop retry mechanism. We evaluate on three axes. Configuration accuracy: across 10 diverse problems the configurator produces valid specifications on all cases with a median compliance penalty of $+0.3\%$ versus expert ground truth. Controller comparison: on 17 benchmarks with six controllers sharing an identical sharpening tail, the LLM controller achieves the lowest median compliance but $76.5\%$ pass rate, while the deterministic schedule achieves $100\%$ pass rate at only $+1.5\%$ higher compliance. End-to-end reliability: with the schedule controller, all LLM-configured problems pass every quality check on the first attempt, with no retries needed. Among the systems surveyed in this work (Table 1), AutoSiMP is the first to close the full loop from natural-language problem description to validated structural topology. The complete codebase, all specifications, and an interactive web demo will be released upon journal acceptance.
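The closed-loop retry mechanism described above can be sketched as a configure-solve-evaluate loop. This is an illustrative outline only; the function names, the report dictionary, and the feedback interface are hypothetical placeholders, not AutoSiMP's actual API:

```python
def run_with_retries(configure, solve, evaluate, prompt, max_retries=2):
    """Hypothetical sketch of a closed-loop pipeline: parse the prompt
    into a spec, run the solver, check the result, and on failure
    re-configure with the evaluator's feedback."""
    spec = configure(prompt, feedback=None)
    for attempt in range(max_retries + 1):
        topology = solve(spec)
        report = evaluate(topology)
        if report["passed"]:
            return topology, attempt
        spec = configure(prompt, feedback=report)  # retry with feedback
    raise RuntimeError("all retries exhausted")


# Toy stubs: the first configuration fails the checks, the second passes.
def configure(prompt, feedback=None):
    return {"vf": 0.4 if feedback else 0.3}

def solve(spec):
    return spec["vf"]

def evaluate(topology):
    return {"passed": topology >= 0.4}

topo, attempts = run_with_retries(configure, solve, evaluate, "cantilever")
```

The same skeleton covers the reported first-attempt behavior: with a controller that always passes the checks, the loop returns after attempt 0.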
Xiaowei Pang, Jun Wang
The author is mainly interested in the Gröbner-Shirshov bases of finite Coxeter groups. It is known that the finite Coxeter groups are classified in terms of Coxeter-Dynkin diagrams. It is worth mentioning that, under a fixed order, the presentation of a group determines its Gröbner-Shirshov basis. In this paper, the author rearranges the generators, marks them on the Coxeter-Dynkin diagrams, and obtains a simple presentation of the Gröbner-Shirshov bases for Coxeter groups of types $G_2, F_4, E_6$ and $E_7$. This article also gives the Gröbner-Shirshov basis of the Coxeter group of type $E_8$.
Jun Wang, Xiaohan Yu, Yongsheng Gao
The core of tackling fine-grained visual categorization (FGVC) is to learn subtle yet discriminative features. Most previous works achieve this by explicitly selecting discriminative parts or integrating an attention mechanism via CNN-based approaches. However, these methods increase the computational complexity and make the model dominated by the regions containing most of the objects. Recently, the vision transformer (ViT) has achieved SOTA performance on general image recognition tasks. Its self-attention mechanism aggregates and weights the information from all patches into the classification token, making it naturally suitable for FGVC. Nonetheless, the classification token in the deep layers attends mostly to global information, lacking the local and low-level features that are essential for FGVC. In this work, we propose a novel pure transformer-based framework, Feature Fusion Vision Transformer (FFVT), in which we aggregate the important tokens from each transformer layer to compensate for the missing local, low-level and middle-level information. We design a novel token selection module called mutual attention weight selection (MAWS) to guide the network effectively and efficiently toward selecting discriminative tokens without introducing extra parameters. We verify the effectiveness of FFVT on three benchmarks, where FFVT achieves state-of-the-art performance.
Jun Wang, Yang Zhao, Linglong Qian, Xiaohan Yu, Yongsheng Gao
The precise detection of blood vessels in retinal images is crucial to the early diagnosis of retinal vascular diseases, e.g., diabetic, hypertensive and solar retinopathies. Existing works often fail to predict abnormal areas, e.g., suddenly brighter or darker regions, and are inclined to classify a pixel as background owing to the significant class imbalance, leading to high accuracy and specificity but low sensitivity. To that end, we propose a novel error attention refining network (ERA-Net) that learns and predicts potential false predictions in a two-stage manner for effective retinal vessel segmentation. In the refining stage, the proposed ERA-Net drives the model to focus on and refine the segmentation errors produced in the initial training stage. To achieve this, unlike most previous attention approaches that operate in an unsupervised manner, we introduce a novel error attention mechanism that takes the differences between the ground truth and the initial segmentation masks as the ground truth to supervise the attention map learning. Experimental results demonstrate that our method achieves state-of-the-art performance on two common retinal blood vessel datasets.
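The supervision signal for the error attention map can be illustrated with a minimal numpy sketch. This is not the authors' implementation; it only shows the stated idea that the attention "ground truth" is the per-pixel disagreement between the true mask and the first-stage prediction:

```python
import numpy as np

def error_attention_target(ground_truth, initial_mask):
    """Illustrative sketch: the attention map is supervised by the
    per-pixel difference between the true vessel mask and the
    first-stage segmentation (1 where the model erred, 0 elsewhere)."""
    return np.abs(ground_truth.astype(float) - initial_mask.astype(float))

gt = np.array([[1, 0], [1, 1]])            # toy 2x2 vessel mask
first_stage = np.array([[1, 1], [0, 1]])   # toy first-stage prediction
err = error_attention_target(gt, first_stage)
```

A second-stage network would then be trained so that its attention map matches `err`, concentrating capacity on the pixels the first stage got wrong.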
Jun Wang
Weakly Supervised Object Detection (WSOD), which aims to train detectors with only image-level annotations, has attracted increasing attention from researchers. In this project, we focus on a two-phase WSOD architecture that integrates a powerful detector with a pure WSOD model. We explore the effectiveness of several representative detectors used as the second-phase detector in two-phase WSOD and propose a two-phase WSOD architecture. In addition, we present a strategy for establishing the pseudo ground truth (PGT) used to train the second-phase detector. Unlike previous works that regard only the top-one bounding box as PGT, we consider more bounding boxes when establishing the PGT annotations. This alleviates the insufficient learning problem caused by the low recall of PGT. We also propose strategies to refine the PGT during the training of the second detector. Our strategies suspend training at specific epochs, refine the PGT using the outputs of the second-phase detector, and then resume training with the same gradients and weights as before the suspension. Elaborate experiments are conducted on the PASCAL VOC 2007 dataset to verify the effectiveness of our methods. As the results demonstrate, our two-phase architecture improves the mAP from 49.17% to 53.21% compared with the single PCL model. Additionally, the best PGT generation strategy obtains a 0.7% mAP increment, and the best refinement strategy boosts performance by 1.74% mAP. The best result, adopting all of our methods, achieves 55.231% mAP, which is state-of-the-art performance.
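One way a multi-box PGT rule could look is the following sketch. The abstract does not specify the exact selection rule, so the ratio threshold and names here are hypothetical, only illustrating "more bounding boxes than top-1":

```python
def build_pgt(boxes, score_ratio=0.9):
    """Hypothetical PGT sketch (not the paper's exact rule): keep every
    candidate box whose confidence is within `score_ratio` of the top
    score, instead of only the single highest-scoring box."""
    top = max(score for _, score in boxes)
    return [box for box, score in boxes if score >= score_ratio * top]

# Toy candidates: (box id, confidence) pairs from the first-phase model.
candidates = [("box_a", 0.95), ("box_b", 0.90), ("box_c", 0.40)]
pgt = build_pgt(candidates)
```

Keeping near-top boxes raises PGT recall, which is the motivation the abstract gives for moving beyond top-1 selection.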
Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Deep-learning-based speech separation models suffer from a poor generalization problem: even state-of-the-art models can abruptly fail when evaluated under mismatched conditions. To address this problem, we propose an easy-to-implement yet effective consistency-based semi-supervised learning (SSL) approach, namely Mixup-Breakdown training (MBT). It learns a teacher model to "break down" unlabeled inputs, and the estimated separations are interpolated to produce more useful pseudo "mixup" input-output pairs, on which consistency regularization is applied to learn a student model. In our experiments, we evaluate MBT under various conditions with ascending degrees of mismatch, including unseen interfering speech, noise, and music, and compare MBT's generalization capability against state-of-the-art supervised learning and SSL approaches. The results indicate that MBT significantly outperforms several strong baselines, with up to 13.77% relative SI-SNRi improvement. Moreover, MBT adds only negligible computational overhead to standard training schemes.
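The breakdown-then-mixup data step can be sketched as follows. This is an illustrative outline of the stated idea, not the authors' code; the toy teacher and the uniform mixing weight are assumptions:

```python
import numpy as np

def mixup_breakdown_pairs(teacher_separate, mixture, rng):
    """Illustrative sketch of MBT's pseudo-pair construction: a teacher
    'breaks down' an unlabeled mixture into estimated sources, which are
    re-interpolated into a pseudo 'mixup' input whose training targets
    are the correspondingly scaled estimates."""
    est_sources = teacher_separate(mixture)          # breakdown step
    lam = rng.uniform(0.0, 1.0)                      # mixup weight (assumed uniform)
    weights = [lam, 1.0 - lam]
    pseudo_targets = [w * s for w, s in zip(weights, est_sources)]
    pseudo_input = sum(pseudo_targets)               # remixed pseudo input
    return pseudo_input, pseudo_targets

# Toy teacher that naively halves the mixture (placeholder for a trained
# separation model).
toy_teacher = lambda m: [0.5 * m, 0.5 * m]
mix = np.ones(4)
pseudo_in, pseudo_out = mixup_breakdown_pairs(toy_teacher, mix, np.random.default_rng(0))
```

A student model would then be trained so its separation of `pseudo_in` matches `pseudo_out`, which is the consistency regularization the abstract describes.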
Jun Wang, Zhaoyang Yin
In this paper, we study the following nonlinear magnetic Schrödinger equation with logarithmic nonlinearity \begin{equation*} -(\nabla+iA(x))^2u+\lambda V(x)u =|u|^{q-2}u+u\log |u|^2,\quad u\in H^1(\mathbb{R}^N,\mathbb{C}), \end{equation*} where the magnetic potential $A \in L_{loc}^2\left(\mathbb{R}^N, \mathbb{R}^N\right)$, $2<q<2^*$, $\lambda>0$ is a parameter, and the nonnegative continuous function $V: \mathbb{R}^N \rightarrow \mathbb{R}$ has a deepening potential well. Using variational methods, we show that the equation has at least $2^k-1$ multi-bump solutions when $\lambda>0$ is large enough.
Jun Wang, Patrick Ng, Alexander Hanbo Li, Jiarong Jiang, Zhiguo Wang, Ramesh Nallapati, Bing Xiang, Sudipta Sengupta
Most recent research on Text-to-SQL semantic parsing relies on either the parser itself or simple heuristic-based approaches to understand the natural language query (NLQ). When synthesizing a SQL query, no explicit semantic information about the NLQ is available to the parser, which leads to undesirable generalization performance. In addition, without lexical-level fine-grained query understanding, linking between the query and the database can rely only on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL) and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intents and identifies entities in the query. The NEL model links typed entities to the schema and cell values in the database. The parser model leverages the available semantic information and linking results and synthesizes tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we can achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, which outperforms the state-of-the-art model by 2.7%.
Wei Liu, Haozhao Wang, Jun Wang, Ruixuan Li, Xinyang Li, Yuankai Zhang, Yang Qiu
Rationalization employs a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text and passes it to the following predictor. However, rationalization suffers from two key challenges, i.e., spurious correlation and degeneration, where the predictor overfits the spurious or meaningless pieces solely selected by the not-yet-well-trained generator and in turn deteriorates the generator. Although many studies have been proposed to address the two challenges, they are usually designed separately and do not take both into account. In this paper, we propose a simple yet effective method named MGR to solve the two problems simultaneously. The key idea of MGR is to employ multiple generators such that the occurrence stability of real pieces is improved and more meaningful pieces are delivered to the predictor. Empirically, we show that MGR improves the F1 score by up to 20.9% compared to state-of-the-art methods. Code is available at https://github.com/jugechengzi/Rationalization-MGR .
Jun Wang, Suyi Li
In this paper, we experimentally examine the cognitive capability of a simple, paper-based Miura-ori -- using the physical reservoir computing framework -- to achieve different information perception tasks. The body dynamics of Miura-ori (i.e., its vertex displacements), excited by a simple harmonic base excitation, can be exploited as the reservoir computing resource. By recording these dynamics with a high-resolution camera and an image processing program, and then using linear regression for training, we show that the origami reservoir has sufficient computing capacity to estimate the weight and position of a payload. It can also recognize input frequency and magnitude patterns. Furthermore, multitasking is achievable by simultaneously applying two targeted functions to the same reservoir state matrix. Therefore, we demonstrate that Miura-ori can assess the dynamic interactions between its body and the ambient environment to extract meaningful information -- an intelligent behavior in the mechanical domain. Given that Miura-ori has been widely used to construct deployable structures, lightweight materials, and compliant robots, enabling such information perception tasks can add a new dimension to the functionality of this versatile structure.
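The linear-regression readout step of physical reservoir computing can be sketched in a few lines. The shapes and synthetic data below are illustrative assumptions, not the paper's experimental recordings:

```python
import numpy as np

# Minimal readout-training sketch for physical reservoir computing:
# each row of X is a snapshot of the recorded vertex displacements, and
# a linear readout W is fitted by ridge-regularized least squares to map
# reservoir states to a target signal (e.g. payload weight over time).
rng = np.random.default_rng(0)
T, n_vertices = 200, 12
X = rng.standard_normal((T, n_vertices))   # stand-in for body dynamics
true_w = rng.standard_normal(n_vertices)
y = X @ true_w                             # synthetic target trace

ridge = 1e-8                               # small regularizer for stability
W = np.linalg.solve(X.T @ X + ridge * np.eye(n_vertices), X.T @ y)
```

Multitasking, as described above, amounts to fitting a second readout vector against a different target while reusing the same state matrix `X`; only the cheap linear solve is repeated.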
Jun Wang, Yue Song, David John Hill, Yunhe Hou
In this letter, we analytically investigate the sensitivity of a stability index to its dependent variables in general power systems. First, given a small-signal model, the stability index is defined as the solution to a semidefinite program (SDP) based on the associated Lyapunov equation. When the system is stable, the stability index also characterizes the convergence rate of the system after disturbances. Then, by leveraging the duality of the SDP, we derive an analytical formula for the sensitivity of the stability index to any entry of the system Jacobian matrix in terms of the SDP primal and dual variables. Unlike the traditional numerical perturbation method, the proposed sensitivity evaluation method is more accurate with a much lower computational burden. This letter uses a modified microgrid model for comparative case studies. The results reveal significant improvements in the accuracy and computational efficiency of stability sensitivity evaluation.
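The primal-dual idea can be illustrated on the equality-constrained special case of the letter's SDP, namely a plain Lyapunov equation. In that case, if $P$ solves $J^\top P + P J = -I$ (index $\operatorname{tr}(P)$) and $Q$ solves the adjoint equation $J Q + Q J^\top = -I$, a standard calculation gives $\partial \operatorname{tr}(P)/\partial J_{ij} = 2(PQ)_{ij}$, with no re-solve per entry. The 2x2 Jacobian below is a toy illustration, not the letter's microgrid model:

```python
import numpy as np

def lyap(A, C):
    """Solve A X + X A^T = -C by vectorization:
    (I kron A + A kron I) vec(X) = -vec(C), column-major vec."""
    n = A.shape[0]
    K = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    return np.linalg.solve(K, -C.flatten(order="F")).reshape((n, n), order="F")

# Toy stable 2x2 Jacobian (illustrative numbers only).
J = np.array([[-1.0, 0.2],
              [0.1, -2.0]])
P = lyap(J.T, np.eye(2))   # primal: J^T P + P J = -I, index = tr(P)
Q = lyap(J, np.eye(2))     # adjoint (dual-like) Lyapunov solution
grad = 2.0 * P @ Q         # d tr(P) / d J_ij = 2 (P Q)_ij, all entries at once
```

Compared with perturbing each Jacobian entry and re-solving, the closed-form gradient needs just one extra Lyapunov solve, which mirrors the computational advantage claimed in the abstract.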
Jun Wang
In this paper, we propose ESSumm, a novel architecture for direct extractive speech-to-speech summarization, which is an unsupervised model that does not depend on intermediate transcribed text. Unlike previous methods that rely on text representations, we aim to generate a summary directly from speech without transcription. First, a set of smaller speech segments is extracted based on the speech signal's acoustic features. For each candidate speech segment, a distance-based summarization confidence score is designed to measure its latent speech representation. Specifically, we leverage an off-the-shelf self-supervised convolutional neural network to extract deep speech features from raw audio. Our approach automatically predicts the optimal sequence of speech segments that captures the key information within a target summary length. Extensive results on two well-known meeting datasets (the AMI and ICSI corpora) show the effectiveness of our direct speech-based method in improving summarization quality with untranscribed data. We also observe that our unsupervised speech-based method performs on par with recent transcript-based summarization approaches, which require extra speech recognition.
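A distance-based extractive selection of this kind could look like the sketch below. The abstract does not give the exact score, so the centroid-distance criterion and greedy length budget here are assumptions for illustration only:

```python
import numpy as np

def select_segments(embeddings, durations, target_len):
    """Hypothetical sketch: rank segments by closeness of their latent
    embedding to the recording-level centroid, then greedily keep
    segments until the target summary length is reached."""
    centroid = embeddings.mean(axis=0)
    order = np.argsort(np.linalg.norm(embeddings - centroid, axis=1))
    chosen, total = [], 0.0
    for i in order:
        if total + durations[i] > target_len:
            break
        chosen.append(int(i))
        total += durations[i]
    return sorted(chosen)

# Toy example: 4 segments with 2-D embeddings and 10 s durations each.
embeddings = np.array([[0.0, 0.0], [10.0, 10.0], [1.0, 1.0], [0.5, 0.5]])
durations = [10.0, 10.0, 10.0, 10.0]
summary = select_segments(embeddings, durations, target_len=20.0)
```

In the actual system the embeddings would come from the self-supervised CNN over raw audio, and the selected indices would be stitched back into a speech summary.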
Jun Wang, Jing-Yu Pan, Ya-Bo Zhao, Jun Xiong, Hai-Bo Wang
We present a novel cavity opto-magno-mechanical hybrid system to generate entanglement among multiple quantum carriers, such as magnons, mechanical resonators, and cavity photons in both the optical and microwave domains. Two yttrium iron garnet (YIG) spheres are embedded in two separate microwave cavities joined by a communal mechanical resonator. Because the microwave cavities are separate, the ferromagnetic resonance frequencies of the two YIG spheres, as well as the cavity frequencies, can be tuned independently. We show that entanglement can be achieved with experimentally reachable parameters. The entanglement is robust against environmental thermal noise, owing to the mechanical cooling process achieved by the optical cavity. The maximum entanglement among different carriers is achieved by optimizing the parameters of the system. The individual tunability of the separated cavities allows us to independently control the entanglement properties of different subsystems and establish quantum channels with different entanglement properties in one system. This work could provide promising applications in quantum metrology and quantum information tasks.
Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Yang Qiu, YuanKai Zhang, Jie Han, Yixiong Zou
A self-explaining rationalization model is generally constructed by a cooperative game where a generator selects the most human-intelligible pieces from the input text as rationales, followed by a predictor that makes predictions based on the selected rationales. However, such a cooperative game may incur the degeneration problem where the predictor overfits to the uninformative pieces generated by a not yet well-trained generator and in turn, leads the generator to converge to a sub-optimal model that tends to select senseless pieces. In this paper, we theoretically bridge degeneration with the predictor's Lipschitz continuity. Then, we empirically propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor, to address the problem of degeneration. The main idea of DR is to decouple the generator and predictor to allocate them with asymmetric learning rates. A series of experiments conducted on two widely used benchmarks have verified the effectiveness of the proposed method. Codes: \href{https://github.com/jugechengzi/Rationalization-DR}{https://github.com/jugechengzi/Rationalization-DR}.
Jun Wang, Ying-Chang Liang, Sumei Sun
Symbiotic radio (SR) is a promising technique to support cellular Internet-of-Things (IoT) by forming a mutualistic relationship between IoT and cellular transmissions. In this paper, we propose a novel multi-user multi-IoT-device SR system to enable massive access in cellular IoT. In the considered system, the base station (BS) transmits information to multiple cellular users, and a number of IoT devices simultaneously backscatter their information to these users via the cellular signal. The cellular users jointly decode the information from the BS and IoT devices. Noting that the reflective links from the IoT devices can be regarded as the channel uncertainty of the direct links, we apply the robust design method to design the beamforming vectors at the BS. Specifically, the transmit power is minimized under the cellular transmission outage probability constraints and IoT transmission sum rate constraints. The algorithm based on semi-definite programming and difference-of-convex programming is proposed to solve the power minimization problem. Moreover, we consider a special case where each cellular user is associated with several adjacent IoT devices and propose a direction of arrival (DoA)-based transmit beamforming design approach. The DoA-based approach requires only the DoA and angular spread (AS) of the direct links instead of the instantaneous channel state information (CSI) of the reflective link channels, leading to a significant reduction in the channel feedback overhead. Simulation results have substantiated the multi-user multi-IoT-device SR system and the effectiveness of the proposed beamforming approaches. It is shown that the DoA-based beamforming approach achieves comparable performance as the CSI-based approach in the special case when the ASs are small.
Jun Wang, Gang Li, Hao Zhang, Xiqin Wang
Orthogonal matching pursuit (OMP) is a canonical greedy algorithm for sparse signal reconstruction. When the signal of interest is block sparse, i.e., it has nonzero coefficients occurring in clusters, the block version of OMP algorithm (i.e., Block OMP) outperforms the conventional OMP. In this paper, we demonstrate that a new notion of block restricted isometry property (Block RIP), which is less stringent than standard restricted isometry property (RIP), can be used for a very straightforward analysis of Block OMP. It is demonstrated that Block OMP can exactly recover any block K-sparse signal in no more than K steps if the Block RIP of order K+1 with a sufficiently small isometry constant is satisfied. Using this result it can be proved that Block OMP can yield better reconstruction properties than the conventional OMP when the signal is block sparse.
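The Block OMP procedure analyzed above has a compact textbook form: at each step, pick the block of dictionary columns most correlated with the residual, then refit by least squares over all selected blocks. The sketch below is a generic implementation of that standard algorithm, not code from the paper, and the identity-dictionary example is only a toy sanity check:

```python
import numpy as np

def block_omp(A, y, block_size, n_blocks):
    """Textbook Block OMP sketch: greedily select the block of columns
    of A whose correlation with the residual has the largest norm, then
    solve least squares on the union of selected blocks."""
    m, n = A.shape
    residual = y.astype(float).copy()
    selected = []
    x = np.zeros(n)
    for _ in range(n_blocks):
        corr = A.T @ residual
        scores = [np.linalg.norm(corr[b * block_size:(b + 1) * block_size])
                  for b in range(n // block_size)]
        b = int(np.argmax(scores))
        if b not in selected:
            selected.append(b)
        idx = np.concatenate([np.arange(s * block_size, (s + 1) * block_size)
                              for s in selected])
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        x = np.zeros(n)
        x[idx] = coef
        residual = y - A @ x
    return x

# Toy exact-recovery check: one nonzero block under an identity dictionary.
A = np.eye(6)
x_true = np.array([0.0, 0.0, 3.0, 4.0, 0.0, 0.0])
x_hat = block_omp(A, A @ x_true, block_size=2, n_blocks=1)
```

The paper's result says that under a Block RIP condition of order K+1, the greedy block choice above is always correct, so a block-K-sparse signal is recovered in at most K iterations.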
Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang
Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap of the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on top of tasks involving segmenting pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.
Wei Liu, Zhiying Deng, Zhongyu Niu, Jun Wang, Haozhao Wang, YuanKai Zhang, Ruixuan Li
An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets there are spurious features that are non-causally correlated with the label yet also attain high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize spurious features (e.g., an invariance penalty, an intervention penalty, etc.) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label; significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for \textbf{M}aximizing the \textbf{R}emaining \textbf{D}iscrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by overlap with human-annotated rationales) by up to $10.4\%$ compared to several recent competitive MMI variants. Code: \url{https://github.com/jugechengzi/Rationalization-MRD}.