Yilang Zhang, Bingcong Li, Georgios B. Giannakis
Utilizing task-invariant prior knowledge extracted from related tasks, meta-learning is a principled framework that empowers learning a new task especially when data records are limited. A fundamental challenge in meta-learning is how to quickly "adapt" the extracted prior in order to train a task-specific model within a few optimization steps. Existing approaches deal with this challenge using a preconditioner that enhances convergence of the per-task training process. Though effective in representing locally a quadratic training loss, these simple linear preconditioners can hardly capture complex loss geometries. The present contribution addresses this limitation by learning a nonlinear mirror map, which induces a versatile distance metric to enable capturing and optimizing a wide range of loss geometries, hence facilitating the per-task training. Numerical tests on few-shot learning datasets demonstrate the superior expressiveness and convergence of the advocated approach.
Yilang Zhang, Bingcong Li, Niao He, Georgios B. Giannakis
Scaling network depth has been a central driver behind the success of modern foundation models, yet recent investigations suggest that deep layers are often underutilized. This paper revisits the default mechanism for deepening neural networks, namely residual connections, from an optimization perspective. Rigorous analysis proves that the layout of residual connections can fundamentally shape convergence behavior, and even induces an exponential gap in convergence rates. Prompted by this insight, we introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data. ANCRe adaptively reassigns residual connections with negligible computational and memory overhead ($<1\%$), while enabling more effective utilization of network depth. Extensive numerical tests across pre-training of large language models, diffusion models, and deep ResNets demonstrate consistently accelerated convergence, boosted performance, and enhanced depth efficiency over conventional residual connections.
Yilang Zhang, Bingcong Li, Georgios B. Giannakis
Targeting solutions over `flat' regions of the loss landscape, sharpness-aware minimization (SAM) has emerged as a powerful tool to improve generalizability of deep neural network based learning. While several SAM variants have been developed to this end, a unifying approach that also guides principled algorithm design has been elusive. This contribution leverages preconditioning (pre) to unify SAM variants and provide not only unifying convergence analysis, but also valuable insights. Building upon preSAM, a novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates. Extensive numerical tests demonstrate the superiority of infoSAM across various benchmarks.
Yilang Zhang, Bingcong Li, Georgios B. Giannakis
Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix. Albeit efficient, LoRA exhibits suboptimal convergence and noticeable performance degradation, due to inconsistent and imbalanced weight updates induced by its nonunique low-rank factorizations. To overcome these limitations, this article identifies the optimal low-rank factorization per step that minimizes an upper bound on the loss. The resultant refactored low-rank adaptation (RefLoRA) method promotes a flatter loss landscape, along with consistent and balanced weight updates, thus speeding up stable convergence. Extensive experiments evaluate RefLoRA on natural language understanding, and commonsense reasoning tasks with popular large language models including DeBERTaV3, LLaMA-7B, LLaMA2-7B and LLaMA3-8B. The numerical tests corroborate that RefLoRA converges faster, outperforms various benchmarks, and enjoys negligible computational overhead compared to state-of-the-art LoRA variants.
Yilang Zhang, Bingcong Li, Shijian Gao, Georgios B. Giannakis
Meta-learning owns unique effectiveness and swiftness in tackling emerging tasks with limited data. Its broad applicability is revealed by viewing it as a bi-level optimization problem. The resultant algorithmic viewpoint however, faces scalability issues when the inner-level optimization relies on gradient-based iterations. Implicit differentiation has been considered to alleviate this challenge, but it is restricted to an isotropic Gaussian prior, and only favors deterministic meta-learning approaches. This work markedly mitigates the scalability bottleneck by cross-fertilizing the benefits of implicit differentiation to probabilistic Bayesian meta-learning. The novel implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty. Furthermore, the ultimate complexity is well controlled regardless of the inner-level optimization trajectory. Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one. Extensive numerical tests are also carried out to empirically validate the performance of the proposed method.
Yilang Zhang, Xiaodong Yang, Yiwei Cai, Georgios B. Giannakis
As large language models (LLMs) continue to scale in size, the computational overhead has become a major bottleneck for task-specific fine-tuning. While low-rank adaptation (LoRA) effectively curtails this cost by confining the weight updates to a low-dimensional subspace, such a restriction can hinder effectiveness and slow convergence. This contribution deals with these limitations by accumulating progressively a high-rank weight update from consecutive low-rank increments. Specifically, the per update optimal low-rank matrix is identified to minimize the loss function and closely approximate full fine-tuning. To endow efficient and seamless optimization without restarting, this optimal choice is formed by appropriately scaling the columns of the original low-rank matrix. Rigorous performance guarantees reveal that the optimal scaling can be found analytically. Extensive numerical tests with popular LLMs scaling up to 12 billion parameters demonstrate a consistent performance gain and fast convergence relative to state-of-the-art LoRA variants on diverse tasks including natural language understanding, commonsense reasoning, and mathematical problem solving.
Yilang Zhang, Bingcong Li, Georgios B. Giannakis
Utilizing task-invariant knowledge acquired from related tasks as prior information, meta-learning offers a principled approach to learning a new task with limited data records. Sample-efficient adaptation of this prior information is a major challenge facing meta-learning, and plays an important role because it facilitates training the sought task-specific model with just a few optimization steps. Past works deal with this challenge through preconditioning that speeds up convergence of the per-task training. Though effective in representing locally quadratic loss curvatures, simple linear preconditioning can be hardly potent with complex loss geometries. Instead of relying on a quadratic distance metric, the present contribution copes with complex loss metrics by learning a versatile distance-generating function, which induces a nonlinear mirror map to effectively capture and optimize a wide range of loss geometries. With suitable parameterization, this generating function is effected by an expressive neural network that is provably a valid distance. Analytical results establish convergence of not only the proposed method, but also all meta-learning approaches based on preconditioning. To attain gradient norm less than $ε$, the convergence rate of $\mathcal{O}(ε^{-2})$ is on par with standard gradient-based meta-learning methods. Numerical tests on few-shot learning datasets demonstrate the superior empirical performance of the novel algorithm, as well as its rapid per-task convergence, which markedly reduces the number of adaptation steps, hence also accommodating large-scale meta-learning models.
Bingcong Li, Yilang Zhang, Georgios B. Giannakis
Sharpness-aware minimization (SAM) has well-documented merits in enhancing generalization of deep neural network models. Accounting for sharpness in the loss function geometry, where neighborhoods of `flat minima' heighten generalization ability, SAM seeks `flat valleys' by minimizing the maximum loss provoked by an adversarial perturbation within the neighborhood. Although critical to account for sharpness of the loss function, in practice SAM suffers from `over-friendly adversaries,' which can curtail the outmost level of generalization. To avoid such `friendliness,' the present contribution fosters stabilization of adversaries through variance suppression (VASSO). VASSO offers a general approach to provably stabilize adversaries. In particular, when integrating VASSO with SAM, improved generalizability is numerically validated on extensive vision and language tasks. Once applied on top of a computationally efficient SAM variant, VASSO offers a desirable generalization-computation tradeoff.
Shuxin Zhao, Bo Lang, Nan Xiao, Yilang Zhang
Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness,existing methods are inherently limited in both capability and robustness due to their dependence on single-trigger-single-object mappings and fragile pixel-level cues. We propose CIS-BA, a novel backdoor attack paradigm that redefines trigger design by shifting from static object features to continuous inter-object interaction patterns that describe how objects co-occur and interact in a scene. By modeling these patterns as a continuous interaction space, CIS-BA introduces space triggers that, for the first time, enable a multi-trigger-multi-object attack mechanism while achieving robustness through invariant geometric relations. To implement this paradigm, we design CIS-Frame, which constructs space triggers via interaction analysis, formalizes them as class-geometry constraints for sample poisoning, and embeds the backdoor during detector training. CIS-Frame supports both single-object attacks (object misclassification and disappearance) and multi-object simultaneous attacks, enabling complex and coordinated effects across diverse interaction states. Experiments on MS-COCO and real-world videos show that CIS-BA achieves over 97% attack success under complex environments and maintains over 95% effectiveness under dynamic multi-trigger conditions, while evading three state-of-the-art defenses. In summary, CIS-BA extends the landscape of backdoor attacks in interaction-intensive scenarios and provides new insights into the security of object detection systems.
Yilang Zhang, Abraham Jaeger Mountain, Bingcong Li, Georgios B. Giannakis
Meta-learning offers a principled framework leveraging \emph{task-invariant} priors from related tasks, with which \emph{task-specific} models can be fine-tuned on downstream tasks, even with limited data records. Gradient-based meta-learning (GBML) relies on gradient descent (GD) to adapt the prior to a new task. Albeit effective, these methods incur high computational overhead that scales linearly with the number of GD steps. To enhance efficiency and scalability, existing methods approximate the gradient of prior parameters (meta-gradient) via truncated backpropagation, yet suffer large approximation errors. Targeting accurate approximation, this work puts forth binomial GBML (BinomGBML), which relies on a truncated binomial expansion for meta-gradient estimation. This novel expansion endows more information in the meta-gradient estimation via efficient parallel computation. As a running paradigm applied to model-agnostic meta-learning (MAML), the resultant BinomMAML provably enjoys error bounds that not only improve upon existing approaches, but also decay super-exponentially under mild conditions. Numerical tests corroborate the theoretical analysis and showcase boosted performance with slightly increased computational overhead.
Bingcong Li, Yilang Zhang, Georgios B. Giannakis
Low-rank adaptation (LoRA) has emerged as the de facto standard for parameter-efficient fine-tuning (PEFT) of foundation models, enabling the adaptation of billion-parameter networks with minimal computational and memory overhead. Despite its empirical success and rapid proliferation of variants, it remains elusive which architectural choices, optimization techniques, and deployment constraints should guide practical method selection. This overview revisits LoRA through the lens of signal processing (SP), bridging modern adapter designs with classical low-rank modeling tools and inverse problems, as well as highlighting how SP principles can inform principled advances of fine-tuning approaches. Rather than providing a comprehensive enumeration and empirical comparisons of LoRA variants, emphasis is placed on the technical mechanisms underpinning these approaches to justify their effectiveness. These advances are categorized into three complementary axes: architectural design, efficient optimization, and pertinent applications. The first axis builds on singular value decomposition (SVD)-based factorization, rank-augmentation constructions, and cross-layer tensorization, while the second axis deals with initialization, alternating solvers, gauge-invariant optimization, and parameterization-aware methods. Beyond fine-tuning, emerging applications of LoRA are accounted across the entire lifecycle of large models, ranging from pre- and post-training to serving/deployment. Finally, open research directions are outlined at the confluence of SP and deep learning to catalyze a bidirectional frontier: classical SP tools provide a principled vocabulary for designing principled PEFT methods, while the unique challenges facing modern deep learning, especially the overwhelming scale and prohibitive overhead, also offer new research lines benefiting the SP community in return.
Yilang Zhang, Neal N. Xiong, Zheng Wei, Xin Yuan, Jian Wang
As a novel method eliminating chromatic aberration on objects, computational color constancy has becoming a fundamental prerequisite for many computer vision applications. Among algorithms performing this task, the learning-based ones have achieved great success in recent years. However, they fail to fully consider the spatial information of images, leaving plenty of room for improvement of the accuracy of illuminant estimation. In this paper, by exploiting the spatial information of images, we propose a color constancy algorithm called Attention Dense Color Constancy (ADCC) using convolutional neural network (CNN). Specifically, based on the 2D log-chrominance histograms of the input images as well as their specially augmented ones, ADCC estimates the illuminant with a self-attention DenseNet. The augmented images help to tell apart the edge gradients, edge pixels and non-edge ones in log-histogram, which contribute significantly to the feature extraction and color-ambiguity elimination, thereby advancing the accuracy of illuminant estimation. Simulations and experiments on benchmark datasets demonstrate that the proposed algorithm is effective for illuminant estimation compared to the state-of-the-art methods. Thus, ADCC offers great potential in promoting applications of smart cities, such as smart camera, where color is an important factor for distinguishing objects.