Zhu Zhang, Zhijie Lin, Zhou Zhao, Jieming Zhu, Xiuqiang He
Video moment retrieval aims to localize the target moment in a video according to a given sentence. The weakly supervised setting provides only video-level sentence annotations during training. Most existing weakly supervised methods apply a multiple-instance learning (MIL) framework to develop inter-sample confrontment, but ignore the intra-sample confrontment between moments with semantically similar contents. Thus, these methods fail to distinguish the target moment from plausible negative moments. In this paper, we propose a novel Regularized Two-Branch Proposal Network to simultaneously consider the inter-sample and intra-sample confrontments. Concretely, we first devise a language-aware filter to generate an enhanced video stream and a suppressed video stream. We then design a sharable two-branch proposal module to generate positive proposals from the enhanced stream and plausible negative proposals from the suppressed one for sufficient confrontment. Further, we apply proposal regularization to stabilize the training process and improve model performance. Extensive experiments show the effectiveness of our method. Our code is publicly released.
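The language-aware filter mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released implementation: the scoring rule (a scaled dot product followed by min-max normalisation), the function name `language_aware_filter`, and the random inputs are assumptions made purely for illustration.

```python
import numpy as np

def language_aware_filter(frames, sentence):
    """Split frame features into an enhanced and a suppressed stream.

    frames:   (T, D) array of per-frame features (hypothetical input).
    sentence: (D,) pooled sentence embedding (hypothetical input).
    """
    # Scaled dot-product relevance of each frame to the sentence (assumed scoring rule).
    logits = frames @ sentence / np.sqrt(frames.shape[1])
    # Min-max normalise into [0, 1]; the epsilon guards against constant scores.
    scores = (logits - logits.min()) / (logits.max() - logits.min() + 1e-8)
    enhanced = frames * scores[:, None]            # emphasises language-relevant frames
    suppressed = frames * (1.0 - scores[:, None])  # emphasises the remaining frames
    return enhanced, suppressed, scores

rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))   # 6 frames, 4-dimensional features
sentence = rng.normal(size=4)
enhanced, suppressed, scores = language_aware_filter(frames, sentence)
```

Under this assumed weighting the two streams are complementary, so `enhanced + suppressed` recovers the original frame features; a two-branch proposal module would then score proposals on each stream separately.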
Renjun Duan, Feimin Huang, Yong Wang, Zhu Zhang
In this paper, assuming that the motion of rarefied gases in a bounded domain is governed by the angular cutoff Boltzmann equation with diffuse reflection boundary, we study the effects of both soft intermolecular interaction and non-isothermal wall temperature upon the long-time dynamics of solutions to the corresponding initial boundary value problem. Specifically, we prove the existence and dynamical stability of stationary solutions whenever the boundary temperature has suitably small variations around a positive constant. For the proof of existence, we introduce a new mild formulation of solutions to the steady boundary-value problem along the speeded backward bicharacteristic, and develop uniform estimates on approximate solutions in both $L^2$ and $L^\infty$. This mild formulation proves to be useful for treating the steady problem with soft potentials even over unbounded domains. In showing the dynamical stability, a new point is that we obtain the sub-exponential time-decay rate in $L^\infty$ without losing any velocity weight. This differs from the classical results in [11,43] for the torus domain and is essentially due to the diffuse reflection boundary and the boundedness of the domain.
Zhu Zhang, Chang Zhou, Jianxin Ma, Zhijie Lin, Jingren Zhou, Hongxia Yang, Zhou Zhao
Existing reasoning tasks often assume that the input contents can always be accessed while reasoning, which requires unlimited storage resources and suffers from severe time delay on long sequences. To achieve efficient reasoning on long sequences with limited storage resources, memory-augmented neural networks introduce a human-like write-read memory to compress and memorize the long input sequence in one pass, and try to answer subsequent queries based only on the memory. But they have two serious drawbacks: 1) they continually update the memory from current information and inevitably forget the early contents; 2) they do not distinguish what information is important and treat all contents equally. In this paper, we propose the Rehearsal Memory (RM) to enhance long-sequence memorization by self-supervised rehearsal with a history sampler. To alleviate the gradual forgetting of early information, we design self-supervised rehearsal training with recollection and familiarity tasks. Further, we design a history sampler to select informative fragments for rehearsal training, making the memory focus on the crucial information. We evaluate the performance of our rehearsal memory on the synthetic bAbI task and several downstream tasks, including text/video question answering and recommendation on long sequences.
Andrew Yang, Zhu Zhang
It is well known that at high Reynolds numbers, the linearized Navier-Stokes equations around an inviscid stable shear profile admit growing mode solutions due to the destabilizing effect of the viscosity. This phenomenon, called Tollmien-Schlichting instability, has been rigorously justified by Grenier-Guo-Nguyen [Adv. Math. 292 (2016); Duke J. Math. 165 (2016)] for Poiseuille flows and boundary layers in the incompressible fluid. To reveal this intrinsic instability mechanism in the compressible setting, we study in this paper the long-time instability of the Poiseuille flow in a channel. Note that this instability arises in a low-frequency regime, rather than in the high-frequency regime relevant to the Prandtl boundary layer. The proof is based on the quasi-compressible-Stokes iteration introduced by Yang-Zhang in [50] and a subtle analysis of the dispersion relation for the instability. Note that we do not require symmetry conditions on the background shear flow or perturbations.
Shengxin Li, Tong Yang, Zhu Zhang
Despite the physical importance, there are limited mathematical theories for the compressible Navier-Stokes equations with strong boundary layers. This is mainly due to the absence of a stream function structure, unlike the extensively studied incompressible fluid dynamics in two dimensions. This paper aims to establish the structural stability of boundary layer profiles in the form of shear flow for the two-dimensional steady compressible Navier-Stokes equations. Our estimates are uniform across the entire subsonic regime, where the Mach number $m\in (0,1)$. As a byproduct, we provide the first result concerning the low Mach number limit in the presence of Prandtl boundary layers. The proof relies on the quasi-compressible-Stokes iteration introduced in [38], along with a subtle analysis of the interplay between density and velocity variables in different frequency regimes, and the identification of cancellations in higher-order estimates.
Anna Peters, Zhu Zhang, Sanli Faez
We present detailed design and operation instructions for a single-objective inverted microscope. Our design is suitable for two dark-field modes of operation: (1) total internal reflection scattering and (2) cross-polarization backscattering. The user can switch between the two modes by exchanging one mode-steering element, which is also adapted to the Thorlabs cage system. To establish a stable background speckle for differential microscopy, the imaging plane is stabilized with active feedback. We validate the stabilization efficacy by performing long-term scattering measurements on single nanoparticles. This setup can be extended for simultaneous scattering, fluorescence, and confocal imaging modes.
Zhu Zhang, Haolan Tao, Cheng Lian, René van Roij, Sanli Faez
Cyclic voltammetry (CV) is the most commonly used method in electrochemistry to characterize electrochemical reactions, usually involving macroscopic electrodes. Here we demonstrate an optical CV technique called Opto-iontronic Microscopy, which is capable of monitoring electrochemical processes at the nanoscale. By integrating optical microscopy with nanohole electrodes, we enhance sensitivity in detecting redox reactions within volumes as small as an attoliter ($(100~\text{nm})^{3}$). This technique uses total internal reflection illumination, electric-double-layer (EDL) modulation, cyclic voltammetry, and lock-in detection to probe ion dynamics inside nanoholes. We applied this method to study EDL (dis)charging coupled to ferrocenedimethanol (Fc(MeOH)$_2$) redox reactions. Experimental results were validated against a theoretical Poisson-Nernst-Planck-Butler-Volmer model, providing insights into ion concentration changes of reaction species that contribute to the optical contrast. This work opens up opportunities for high-sensitivity, label-free analysis of electrochemical reactions in nanoconfined environments, with potential applications in pure nanocrystal growth and monitoring.
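Lock-in detection, one of the ingredients listed in this abstract, can be sketched generically as follows. This is a textbook digital lock-in, not the authors' acquisition pipeline: the function name `lock_in`, the sampling rate, the 50 Hz modulation frequency, and the synthetic signal are all assumptions chosen for illustration.

```python
import numpy as np

def lock_in(signal, t, f_ref):
    """Estimate amplitude and phase of the component of `signal` at f_ref.

    Assumes the record spans an integer number of reference periods,
    so that plain averaging acts as the low-pass filter.
    """
    ref_i = np.cos(2 * np.pi * f_ref * t)   # in-phase reference
    ref_q = np.sin(2 * np.pi * f_ref * t)   # quadrature reference
    x = 2 * np.mean(signal * ref_i)         # equals  A*cos(phi) for A*cos(wt + phi)
    y = 2 * np.mean(signal * ref_q)         # equals -A*sin(phi) for A*cos(wt + phi)
    return np.hypot(x, y), np.arctan2(-y, x)

# Synthetic test signal: weak modulation on a large background plus noise.
rng = np.random.default_rng(1)
t = np.arange(10_000) / 10_000.0            # 1 s sampled at 10 kHz (assumed)
f_mod = 50.0                                # hypothetical modulation frequency in Hz
signal = 1.0 + 0.02 * np.cos(2 * np.pi * f_mod * t + 0.3) + rng.normal(0, 0.01, t.size)
amplitude, phase = lock_in(signal, t, f_mod)
```

The constant background and most of the broadband noise average out, which is why a small periodic contrast can be recovered from a much larger static scattering signal.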
Cheng-Jie Liu, Mengjun Ma, Di Wu, Zhu Zhang
We study the stability properties of boundary layer-type shear flows for the three-dimensional Navier-Stokes equations in the limit of small viscosity $0<\nu\ll 1$. When the streamwise and spanwise velocity profiles are linearly independent near the boundary, we construct an unstable mode that exhibits rapid growth at the rate of $e^{t/\sqrt{\nu}}$. Our results reveal an analytic instability in the three-dimensional Navier-Stokes equations around generic boundary layer profiles. This instability arises from the interplay between spanwise flow and three-dimensional perturbations, and does not occur in purely two-dimensional flows.
Zhu Zhang, Zhou Zhao, Zhijie Lin, Baoxing Huai, Nicholas Jing Yuan
Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to the given sentence. Currently, most existing grounding methods are restricted to well-aligned segment-sentence pairs. In this paper, we explore spatio-temporal video grounding on unaligned data and multi-form sentences. This challenging task requires capturing critical object relations to identify the queried target. However, existing approaches cannot distinguish notable objects and are stuck with ineffective relation modeling between unnecessary objects. Thus, we propose a novel object-aware multi-branch relation network for object-aware relation discovery. Concretely, we first devise multiple branches to develop object-aware region modeling, where each branch focuses on a crucial object mentioned in the sentence. We then propose multi-branch relation reasoning to capture critical object relationships between the main branch and auxiliary branches. Moreover, we apply a diversity loss to make each branch attend only to its corresponding object and to boost multi-branch learning. Extensive experiments show the effectiveness of our proposed method.
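The abstract does not state the form of its diversity loss. One common way to discourage several attention branches from focusing on the same regions is to penalise the pairwise overlap of their attention distributions, sketched below. The function name `diversity_penalty` and the overlap measure are assumptions for illustration and should not be read as the loss used in the paper.

```python
import numpy as np

def diversity_penalty(attn):
    """Mean pairwise overlap between branch attention distributions.

    attn: (K, N) array with K >= 2 branches; each row is a distribution
          over N regions (hypothetical input layout).
    """
    gram = attn @ attn.T                       # (K, K) pairwise dot products
    off_diag = gram - np.diag(np.diag(gram))   # drop each branch's self-overlap
    k = attn.shape[0]
    return off_diag.sum() / (k * (k - 1))      # average over ordered branch pairs

# Branches attending to disjoint regions incur no penalty ...
disjoint = np.eye(3, 5)
# ... while branches sharing the same attention pattern are penalised.
identical = np.tile(np.array([0.5, 0.5, 0.0, 0.0, 0.0]), (3, 1))
low = diversity_penalty(disjoint)
high = diversity_penalty(identical)
```

Minimising such a term alongside the grounding objective pushes each branch toward a distinct object, which is the behaviour the abstract describes.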
Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He
Open-ended video question answering aims to automatically generate a natural-language answer from the referenced video contents according to the given question. Currently, most existing approaches focus on short-form video question answering with multi-modal recurrent encoder-decoder networks. Although these works have achieved promising performance, they may still be ineffective for long-form video question answering because they lack long-range dependency modeling and incur a heavy computational cost. To tackle these problems, we propose a fast Hierarchical Convolutional Self-Attention encoder-decoder network (HCSA). Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context. We then devise a multi-scale attentive decoder to incorporate multi-layer video representations for answer generation, which avoids the information loss of relying on the top encoder layer alone. Extensive experiments show the effectiveness and efficiency of our method.
Zhu Zhang, Zhou Zhao, Yang Zhao, Qi Wang, Huasheng Liu, Lianli Gao
In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG). Given an untrimmed video and a declarative/interrogative sentence depicting an object, STVG aims to localize the spatio-temporal tube of the queried object. STVG has two challenging settings: (1) we need to localize spatio-temporal object tubes from untrimmed videos, where the object may only exist in a very small segment of the video; (2) we deal with multi-form sentences, including declarative sentences with explicit objects and interrogative sentences with unknown objects. Existing methods cannot tackle the STVG task due to ineffective tube pre-generation and the lack of object relationship modeling. Thus, we propose a novel Spatio-Temporal Graph Reasoning Network (STGRN) for this task. First, we build a spatio-temporal region graph to capture the region relationships with temporal object dynamics, which involves the implicit and explicit spatial subgraphs in each frame and the temporal dynamic subgraph across frames. We then incorporate textual clues into the graph and develop multi-step cross-modal graph reasoning. Next, we introduce a spatio-temporal localizer with a dynamic selection method to directly retrieve the spatio-temporal tubes without tube pre-generation. Moreover, we contribute a large-scale video grounding dataset, VidSTG, based on the video relation dataset VidOR. Extensive experiments demonstrate the effectiveness of our method.
Cheng-Jie Liu, Tong Yang, Zhu Zhang
In this paper, we study the instability induced by the Tollmien-Schlichting wave governed by the MHD system in the Prandtl-Hartmann regime. The interaction of the inviscid mode and the viscous mode that leads to the instability is analyzed by introducing a new decomposition of the Orr-Sommerfeld operator on the velocity and magnetic fields. The critical Gevrey index for the instability is justified by constructing the growing mode in the essential frequency, and it is shown to be the same as that for the incompressible Navier-Stokes equations in the Prandtl regime. This result rigorously justifies the physical understanding that the magnetic field transverse to the boundary in the Prandtl-Hartmann regime has no extra stabilizing effect on the Tollmien-Schlichting wave.
Zhu Zhang, Jie Yang, Cheng Lian, Sanli Faez
We present modulation of the electric potential on a conducting electrode as a way to generate an optical contrast for scattering microscopy that is sensitive to both surface charge and local topography. We dub this method Electric-Double-Layer-Modulation microscopy. We numerically compute the change in the local ion concentration that is the origin of this optical contrast for three experimentally relevant geometries: nanosphere, nanowire, and nanohole. In the absence of plasmonic effects and physical absorption, the observable optical contrast is proportional to the derivative of the ion concentration with respect to the modulated potential. We demonstrate that this derivative depends on the size of the object and, less intuitively, also on its surface charge. This dependence is key to measuring the surface charge, in an absolute way, using this method. Our results help to identify the experimental conditions, such as dynamic range and sensitivity, that will be necessary for detecting elementary charge jumps. We conclude that the nanohole is the most suitable geometry among these three for achieving elementary charge sensitivity.
Renjun Duan, Shuangqian Liu, Zhu Zhang
This paper is concerned with the propagation of ion-acoustic shock waves in a collision-dominated plasma. We first establish the existence and uniqueness of a small-amplitude smooth travelling wave. We then justify its approximation to the shock profile of the KdV-Burgers equations in a suitable asymptotic regime where dissipation, in terms of the viscosity coefficient, is much stronger than dispersion by the Debye length. Finally, we prove the large-time asymptotic stability of travelling waves under suitably small smooth perturbations.
Renjun Duan, Shuangqian Liu, Tong Yang, Zhu Zhang
In this paper, we study the 1D steady Boltzmann flow in a channel. The walls of the channel are assumed to have vanishing velocity and given temperatures $\theta_0$ and $\theta_1$. This problem was studied by Esposito et al. [13,14], who showed that the solution tends to a local Maxwellian with parameters satisfying the compressible Navier-Stokes equations with no-slip boundary condition. However, many numerical experiments reveal that the fluid layer does not entirely stick to the boundary. In the regime where the Knudsen number is reasonably small, the slip phenomenon is significant near the boundary. Thus, we revisit this problem by taking into account the slip boundary conditions. Following the lines of [9], we first give a formal asymptotic analysis showing that the flow governed by the Boltzmann equation is accurately approximated by a superposition of the solution to a steady compressible Navier-Stokes (CNS) system with a temperature jump condition and two Knudsen layers located at the end points. We then establish a uniform $L^\infty$ estimate on the remainder and rigorously derive the slip boundary condition for the compressible Navier-Stokes equations.
Zhu Zhang, Zhijie Lin, Zhou Zhao, Zhenxin Xiao
Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query. Existing works often focus on only one aspect of this emerging task, such as query representation learning, video context modeling, or multi-modal fusion, and thus fail to develop a comprehensive system for further performance improvement. In this paper, we introduce a novel Cross-Modal Interaction Network (CMIN) to consider multiple crucial factors for this challenging task, including (1) the syntactic structure of natural language queries; (2) long-range semantic dependencies in video context; and (3) sufficient cross-modal interaction. Specifically, we devise a syntactic GCN to leverage the syntactic structure of queries for fine-grained representation learning, propose a multi-head self-attention to capture long-range semantic dependencies from video context, and then employ a multi-stage cross-modal interaction to explore the potential relations of video and query contents. Extensive experiments demonstrate the effectiveness of our proposed method.
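To make the multi-head self-attention component concrete, here is a minimal NumPy sketch of standard scaled dot-product multi-head self-attention over a sequence of frame features. It is a generic formulation, not CMIN's code: the function names, the random projection matrices, and the sizes (8 frames, 16 dimensions, 4 heads) are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (T, D) sequence; w_*: (D, D) projections; D must be divisible by num_heads."""
    T, D = x.shape
    d_head = D // num_heads

    def split(m):
        # (T, D) -> (heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Every position attends to every other position, so distant frames interact directly.
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, T, T)
    heads = weights @ v                                            # (heads, T, d_head)
    merged = heads.transpose(1, 0, 2).reshape(T, D)                # concatenate the heads
    return merged @ w_o, weights

rng = np.random.default_rng(0)
T, D, H = 8, 16, 4                                   # hypothetical sizes
x = rng.normal(size=(T, D))
w_q, w_k, w_v, w_o = (rng.normal(size=(D, D)) * 0.1 for _ in range(4))
out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, H)
```

Because the attention matrix is `T x T` for each head, the first and last frames are connected in a single step, which is what lets this kind of layer model long-range dependencies in the video context.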
Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai
Action localization in untrimmed videos is an important topic in the field of video understanding. However, existing action localization methods are restricted to a pre-defined set of actions and cannot localize unseen activities. Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential contents in image queries; (2) how to deal with the fuzzy localization of inaccurate image queries; (3) how to determine the precise boundaries of target segments. We then propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. We next adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset ActivityIBAL by reorganizing the ActivityNet dataset. The extensive experiments show the effectiveness of our method.
Cheng-Jie Liu, Tong Yang, Zhu Zhang
This paper is concerned with the vanishing viscosity and magnetic resistivity limit for the two-dimensional steady incompressible MHD system on the half plane with no-slip boundary condition on velocity field and perfectly conducting wall condition on magnetic field. We prove the nonlinear stability of shear flows of Prandtl type with nondegenerate tangential magnetic field, but without any positivity or monotonicity assumption on the velocity field. It is in sharp contrast to the steady Navier-Stokes equations and reflects the stabilization effect of magnetic field. Unlike the unsteady MHD system, we manage the degeneracy on the boundary caused by no-slip boundary condition and obtain the estimates of solutions by introducing an intrinsic weight function and some good auxiliary functions.
Zhu Zhang
This paper is concerned with a kinetic model of a Vlasov-Fokker-Planck system used to describe the evolution of two species of particles interacting through a potential and a thermal reservoir at given temperature. We prove that at low temperature, the homogeneous equilibrium is dynamically unstable under certain perturbations. Our work is motivated by a problem arising in \cite{EGM1}.
Tong Yang, Zhu Zhang
The stability and instability of different hydrodynamic patterns in various physical settings is a classical problem in fluid dynamics, in particular in the high Reynolds number limit of laminar flow with a boundary layer. However, there are very few mathematical results on compressible fluids, despite the extensive studies of fluids governed by the incompressible Navier-Stokes equations. This paper introduces a new approach to studying the compressible Navier-Stokes equations in the subsonic and high Reynolds number regime, in which a subtle quasi-compressible and Stokes iteration is developed. As a byproduct, we show the spectral instability of the subsonic boundary layer.