Yu Zhao, Joohyun Lee, Wei Chen
This paper proposes a learning algorithm that finds a scheduling policy achieving an optimal delay-power trade-off in communication systems. Reinforcement learning (RL) is used to minimize the expected latency under a given energy constraint in environments where factors such as traffic arrival rates or channel conditions can change over time. To this end, the problem is formulated as an infinite-horizon Markov Decision Process (MDP) with constraints, and the constrained optimization problem is handled with the Lagrangian relaxation technique. We then propose a variant of Q-learning, Q-greedyUCB, that combines the Q-learning algorithm for \emph{average} reward with the Upper Confidence Bound (UCB) policy to solve this decision-making problem. We prove through mathematical analysis that the Q-greedyUCB algorithm converges. Simulation results show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with $\varepsilon$-greedy and the Average-payoff RL algorithm in terms of the cumulative reward (i.e., the weighted sum of delay and energy) and the convergence speed. We also show that our algorithm can reduce the regret by up to 12% compared to Q-learning with $\varepsilon$-greedy and the Average-payoff RL algorithm.
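The ingredients named in the abstract above (a Lagrangian cost, UCB action selection, and an average-reward Q-learning update) can be illustrated with a minimal tabular sketch. It is not the authors' Q-greedyUCB algorithm: the function names, the exploration constant `c`, the step sizes `alpha` and `beta`, and the generic differential update rule are all assumptions chosen for illustration.

```python
import math

def lagrangian_cost(delay, power, lam):
    """Weighted sum of delay and power; lam is an assumed Lagrange multiplier."""
    return delay + lam * power

def ucb_action(q_row, counts_row, t, c=1.0):
    """Pick an action for one state using Q-values plus a UCB bonus.

    q_row: Q-values of each action in the current state.
    counts_row: how often each action was tried in this state.
    t: 1-based global step count (assumed >= 1).
    """
    # Try every action at least once before trusting the bonus term.
    for a, n in enumerate(counts_row):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_row, counts_row)]
    return max(range(len(scores)), key=scores.__getitem__)

def avg_reward_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One generic average-reward (differential) Q-learning step.

    rho is the running estimate of the average reward per step.
    Q is updated in place; the new rho is returned.
    """
    td = r - rho + max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * td
    return rho + beta * td
```

In a scheduling setting, the reward passed to `avg_reward_update` would typically be the negative of `lagrangian_cost`, so that maximizing average reward minimizes the weighted delay-power cost.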
Yu Zhao
In this paper, we start the study of Feigin-Odesskii wheel conditions from a geometric viewpoint, and generalize them to the K-theory Hall algebra of any surface.
Yu Zhao, Jia Song, Huali Feng, Fuzhen Zhuang, Qing Li, Xiaojie Wang, Ji Liu
Keyphrases provide accurate information about document content in a form that is highly compact, concise, and rich in meaning, and they are widely used for discourse comprehension, organization, and text retrieval. Though previous studies have made substantial efforts toward automated keyphrase extraction and generation, surprisingly few studies have addressed \textit{keyphrase completion} (KPC). KPC aims to generate more keyphrases for a document (e.g., a scientific publication) by taking advantage of the document content along with a very limited number of known keyphrases, and it can be applied to improve text indexing systems, among other uses. In this paper, we propose a novel KPC method with an encoder-decoder framework. We name it \textit{deep keyphrase completion} (DKPC) since it attempts to capture the deep semantic meaning of the document content together with the known keyphrases via a deep learning framework. Specifically, the encoder and the decoder in DKPC play different roles to make full use of the known keyphrases. The former considers the keyphrase-guiding factors, which aggregate information from the known keyphrases into the context. In contrast, the latter considers the keyphrase-inhibited factor to inhibit the generation of semantically repeated keyphrases. Extensive experiments on benchmark datasets demonstrate the efficacy of our proposed model.
Yu Zhao, Fangfang Zhu, Biao Chen
A simple encoding scheme based on Sato's non-naïve frequency division is proposed for a class of Gaussian interference channels with mixed interference. The achievable region is shown to be equivalent to Costa's noiseberg region for the one-sided Gaussian interference channel. This allows for an indirect proof that this simple achievable rate region is indeed equivalent to the Han-Kobayashi (HK) region with Gaussian input and with time sharing for this class of Gaussian interference channels with mixed interference.
Yu Zhao, Rennong Yang, Guillaume Chevalier, Rajiv Shah, Rob Romijnders
Data analytics helps basketball teams to create tactics. However, manual data collection and analytics are costly and ineffective. Therefore, we applied a deep bidirectional long short-term memory (BLSTM) and mixture density network (MDN) approach. This model is not only capable of predicting a basketball trajectory based on real data, but it also can generate new trajectory samples. It is an excellent application to help coaches and players decide when and where to shoot. Its structure is particularly suitable for dealing with time series problems. BLSTM receives forward and backward information at the same time, while stacking multiple BLSTMs further increases the learning ability of the model. Combined with BLSTMs, MDN is used to generate a multi-modal distribution of outputs. Thus, the proposed model can, in principle, represent arbitrary conditional probability distributions of output variables. We tested our model with two experiments on three-pointer datasets from NBA SportVu data. In the hit-or-miss classification experiment, the proposed model outperformed other models in terms of the convergence speed and accuracy. In the trajectory generation experiment, eight model-generated trajectories at a given time closely matched real trajectories.
Yu Zhao, Hongyu Wang, Yuting Hu, Yidun Wan
We investigate composite systems consisting of topological orders separated by gapped domain walls. We derive a pair of domain-wall Verlinde formulae that elucidate the connection between the braiding of interdomain excitations, labeled by pairs of anyons in different domains and quasiparticles in the gapped domain wall, and their respective fusion rules. Through explicit non-Abelian examples, we showcase the calculation of such braiding and fusion, revealing that the fusion rules for interdomain excitations are generally fractional or irrational. By investigating the correspondence between composite systems and anyon condensation, we unveil the reason for designating these fusion rules as symmetry fractionalized (irrationalized) fusion rules. Our findings hold promise for applications across various fields, such as topological quantum computation, topological field theory, and conformal field theory.
Yu Zhao, Zhennan Zhou
We consider a special type of fast reaction-diffusion system in which the coefficients of the reaction terms of the two substances are much larger than those of the diffusion terms, while the diffusive motion of the substrate is negligible. Specifically, the rate constants of the reaction terms are $O(1/ε)$ while the diffusion coefficients are $O(1)$, where the parameter $ε$ is small. When the rate constants become very large, i.e., as $ε$ tends to 0, the singular limit of such a fast reaction-diffusion system is described by the Stefan problem with latent heat, which poses great challenges for numerical simulation. In this paper, we adopt a semi-implicit scheme that is first-order accurate in time and accurately approximates the interface propagation even when the reaction becomes extremely fast, that is, when $ε$ is sufficiently small. The scheme is positivity-preserving and bound-preserving, and we establish $L^2$ stability and linearized stability results for the system. For better performance in numerical simulations, we then construct a semi-implicit Runge-Kutta scheme that is second-order accurate in time. Numerous numerical tests are carried out to demonstrate these properties, such as the order of accuracy, positivity and bound preservation, and the capturing of the sharp interface for various $ε$; to simulate the dynamics of the substances and the substrate; and to explore heat transfer processes, such as solid melting or liquid solidification, in two dimensions.
Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang
Hierarchical text classification (HTC) is an important task with broad applications, and few-shot HTC has gained increasing interest recently. Although in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely ambiguous labels. In this work, we introduce the first ICL-based framework with LLMs for few-shot HTC. We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels. In particular, we equip the retrieval database with HTC label-aware representations for the input texts, which is achieved by continual training on a pretrained language model with masked language modeling (MLM), layer-wise classification (CLS, specifically for HTC), and a novel divergent contrastive learning (DCL, mainly for adjacent semantically similar labels) objective. Experimental results on three benchmark datasets demonstrate the superior performance of our method, which achieves state-of-the-art results in few-shot HTC.
Yu Zhao, Fang Liu
This survey examines the most effective retrieval algorithms utilized in ad recommendation and content recommendation systems. Ad targeting algorithms rely on detailed user profiles and behavioral data to deliver personalized advertisements, thereby driving revenue through targeted placements. Conversely, organic retrieval systems aim to improve user experience by recommending content that matches user preferences. This paper compares these two applications and explains the most effective methods employed in each.
Yike Wu, Yu Zhao, Shiwan Zhao, Ying Zhang, Xiaojie Yuan, Guoqing Zhao, Ning Jiang
Despite the great progress of Visual Question Answering (VQA), current VQA models rely heavily on the superficial correlation between the question type and its corresponding frequent answers (i.e., language priors) to make predictions, without really understanding the input. In this work, we define training instances with the same question type but different answers as \textit{superficially similar instances}, and attribute the language priors to the confusion of the VQA model on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between the superficially similar instances. Specifically, for each training instance, we first construct a set that contains its superficially similar counterparts. Then we exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to focus on the other parts of the input beyond the question type, which helps to overcome the language priors. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Code is available at \href{https://github.com/wyk-nku/Distinguishing-VQA.git}{Distinguishing-VQA}.
Yu Zhao, Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-seng Chua
Grammar induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In the process, features from distinct modalities essentially play complementary roles to each other. With this intuition, this work introduces a novel \emph{unsupervised visual-audio-text grammar induction} task (named \textbf{VAT-GI}), to induce constituent grammar trees from parallel image, text, and speech inputs. Inspired by the fact that language grammar natively exists beyond text, we argue that text need not be the predominant modality in grammar induction. Thus we further introduce a \emph{textless} setting of VAT-GI, wherein the task relies solely on visual and auditory inputs. To approach the task, we propose a visual-audio-text inside-outside recursive autoencoder (\textbf{VaTiora}) framework, which leverages rich modal-specific and complementary features for effective grammar parsing. In addition, a more challenging benchmark dataset is constructed to assess the generalization ability of VAT-GI systems. Experiments on two benchmark datasets demonstrate that our proposed VaTiora system is more effective in incorporating the various multimodal signals, and also achieves new state-of-the-art performance on VAT-GI.
Shivam Verma, Hannes Karlbom, Yu Zhao, Nick Topping, Vivian Chen, Kieran Stanley, Bharath Rengarajan
We present a unified multi-objective model for targeting both advertisements and promotions within the Spotify podcast ecosystem. Our approach addresses key challenges in personalization and cold-start initialization, particularly for new advertising objectives. By leveraging transfer learning from large-scale ad and content interactions within a multi-task learning (MTL) framework, a single joint model can be fine-tuned or directly applied to new or low-data targeting tasks, including in-app promotions. This multi-objective design jointly optimizes podcast outcomes such as streams, clicks, and follows for both ads and promotions using a shared representation over user, content, context, and creative features, effectively supporting diverse business goals while improving user experience. Online A/B tests show up to a 22% reduction in effective Cost-Per-Stream (eCPS), particularly for less-streamed podcasts, and an 18-24% increase in podcast stream rates. Offline experiments and ablations highlight the contribution of ancillary objectives and feature groups to cold-start performance. Our experience shows that a unified modeling strategy improves maintainability, cold-start performance, and coverage, while breaking down historically siloed targeting pipelines. We discuss practical trade-offs of such joint models in a real-world advertising system.
Qiumei Huang, Xu Wang, Yu Zhao
In this work, we propose a modified Hybrid Parallel Kolmogorov--Arnold Network and Multilayer Perceptron Physics-Informed Neural Network to overcome the high-frequency and multiscale challenges inherent in Physics-Informed Neural Networks. This proposed model features a trainable weighting parameter to optimize the convex combination of outputs from the Kolmogorov--Arnold Network and the Multilayer Perceptron, thus maximizing the networks' capabilities to capture different frequency components. Furthermore, we adopt an overlapping domain decomposition technique to decompose complex problems into subproblems, which alleviates the challenge of global optimization. Benchmark results demonstrate that our method reduces training costs and improves computational efficiency compared with manual hyperparameter tuning in solving high-frequency multiscale problems.
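The trainable convex combination described above can be sketched in a few lines. This is a scalar illustration under stated assumptions, not the authors' model: the sigmoid parametrization that keeps the weight in (0, 1), the names `hybrid_output` and `grad_w`, and the squared-error loss are all hypothetical choices, and `u_kan` and `u_mlp` simply stand for the outputs of the two branches at one point.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hybrid_output(u_kan, u_mlp, w):
    """Convex combination alpha * u_kan + (1 - alpha) * u_mlp.

    alpha = sigmoid(w) stays strictly between 0 and 1 for any real w,
    so w can be trained without constraints (an assumed parametrization).
    """
    alpha = sigmoid(w)
    return alpha * u_kan + (1.0 - alpha) * u_mlp

def grad_w(u_kan, u_mlp, w, target):
    """Gradient of the squared error (output - target)**2 with respect to w."""
    alpha = sigmoid(w)
    out = alpha * u_kan + (1.0 - alpha) * u_mlp
    return 2.0 * (out - target) * alpha * (1.0 - alpha) * (u_kan - u_mlp)
```

A gradient step `w -= lr * grad_w(...)` then shifts the weight toward whichever branch output lies closer to the target at that point.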
Ying Zhang, Yu Zhao, Xuhui Sui, Baohang Zhou, Xiangrui Cai, Li Shen, Xiaojie Yuan, Dacheng Tao
With the increasing multimodal knowledge privatization requirements, multimodal knowledge graphs (MKGs) in different institutes are usually decentralized, lacking an effective collaboration system with both stronger reasoning ability and transmission safety guarantees. In this paper, we propose the Federated Multimodal Knowledge Graph Completion (FedMKGC) task, which aims at training over federated MKGs to better predict the missing links in clients without sharing sensitive knowledge. We propose a framework named MMFeD3-HidE to address the multimodal uncertain unavailability and multimodal client heterogeneity challenges of FedMKGC. (1) Inside the clients, our proposed Hyper-modal Imputation Diffusion Embedding model (HidE) recovers the complete multimodal distributions from incomplete entity embeddings constrained by available modalities. (2) Among clients, our proposed Multimodal FeDerated Dual Distillation (MMFeD3) transfers knowledge mutually between clients and the server with logit and feature distillation to improve both global convergence and semantic consistency. We propose a FedMKGC benchmark for a comprehensive evaluation, consisting of a general FedMKGC backbone named MMFedE, datasets with heterogeneous multimodal information, and three groups of constructed baselines. Experiments conducted on our benchmark validate the effectiveness, semantic consistency, and convergence robustness of MMFeD3-HidE.
Lucas Maystre, Alvaro Ortega Gonzalez, Charles Park, Rares Dolga, Tudor Berariu, Yu Zhao, Kamil Ciosek
Embedding models trained separately on similar data often produce representations that encode stable information but are not directly interchangeable. This lack of interoperability raises challenges in several practical applications, such as model retraining, partial model upgrades, and multimodal search. Driven by these challenges, we study when two sets of embeddings can be aligned by an orthogonal transformation. We show that if pairwise dot products are approximately preserved, then there exists an isometry that closely aligns the two sets, and we provide a tight bound on the alignment error. This insight yields a simple alignment recipe, Procrustes post-processing, that makes two embedding models interoperable while preserving the geometry of each embedding space. Empirically, we demonstrate its effectiveness in three applications: maintaining compatibility across retrainings, combining different models for text retrieval, and improving mixed-modality search, where it achieves state-of-the-art performance.
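Aligning two embedding sets by an orthogonal transformation is the classical orthogonal Procrustes problem, which has a closed-form solution via the singular value decomposition. The sketch below shows that textbook solution with NumPy; it is an assumption that the paper's Procrustes post-processing takes this form, and the function name `procrustes_align` and the row-per-embedding layout are illustrative choices.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F.

    X, Y: arrays of shape (n, d) whose i-th rows are the embeddings of the
    same item under two different models (assumed layout).
    """
    # Classical solution: with X^T Y = U S V^T, the minimizer is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Because `W` is orthogonal, `X @ W` keeps every pairwise dot product of `X` unchanged, so the geometry of the first embedding space is preserved while it is rotated or reflected onto the second.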
Junyao Peng, Yu Zhao
We prove a Serre relation in the $K$-theoretic Hall algebra of surfaces constructed by Kapranov-Vasserot and the second author.
Yong Zhang, Yu Zhao, Zhennan Zhou
In this paper, we consider a nonlinear and nonlocal parabolic model for multi-species ionic fluids and introduce a semi-implicit finite volume scheme, which is second-order accurate in space, first-order accurate in time, and satisfies the following properties: positivity preservation, mass conservation, and energy dissipation. In addition, our scheme involves a fast algorithm for the convolution terms with singular but integrable kernels, which would otherwise impede the accuracy and efficiency of the whole scheme. Error estimates for the fast convolution algorithm are then shown. Numerous numerical tests are provided to demonstrate the properties, such as unconditional stability, order of convergence, energy dissipation, and the complexity of the fast convolution algorithm. Furthermore, extensive numerical experiments are carried out to explore the modeling effects in specific examples, such as steric repulsion, the concentration of ions at the boundary, and the blowup phenomenon of the Keller-Segel equations.
Jian-Guo Liu, Jinhuan Wang, Yu Zhao, Zhennan Zhou
In this paper, we consider the field model for complex ionic fluids with an energy variational structure, and analyze the well-posedness of this model with regularized kernels. Furthermore, we derive an estimate of the maximal density function to quantify the finite size effect. On the numerical side, we apply a finite volume scheme to the field model, which satisfies the following properties: positivity preservation, mass conservation, and energy dissipation. In addition, a series of numerical experiments is provided to demonstrate the properties of the steady state and the finite size effect by showing the equilibrium profiles for different values of the parameter in the kernel.
Yu Zhao, Guanghui Hu, Baoqiang Yan
In this paper, we consider inverse time-harmonic acoustic and electromagnetic scattering from locally perturbed rough surfaces in three dimensions. The scattering interface is assumed to be the graph of a Lipschitz continuous function with compact support. It is proved that an acoustically sound-soft or sound-hard surface can be uniquely determined by the far-field patterns of an infinite number of incident plane waves with distinct directions. Moreover, a single point source or plane wave can be used to uniquely determine a scattering surface of polyhedral type. These uniqueness results apply to the Maxwell equations with the perfectly conducting boundary condition. Our arguments rely on the mixed reciprocity relation in a half space and the reflection principle for the Helmholtz and Maxwell equations.
Yu Zhao
In this paper, we define the $K$-theoretic Hall algebra for $0$-dimensional coherent sheaves on a smooth projective surface, prove that the algebra is associative and construct a homomorphism to a redefined shuffle algebra analogous to Negut.