Ning Yu, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Michal Lukac
This paper addresses the problem of interpolating visual textures. We formulate this problem by requiring (1) by-example controllability and (2) realistic and smooth interpolation among an arbitrary number of texture samples. To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and projected back onto the image domain, thus ensuring both intuitive control and realistic results. We show our method outperforms a number of baselines according to a comprehensive suite of metrics as well as a user study. We further show several applications based on our technique, which include texture brush, texture dissolve, and animal hybridization.
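The latent-space interpolation at the heart of this formulation can be pictured in a few lines. This is an illustrative sketch only: `lerp` and `blend` are hypothetical stand-ins for operations on latent codes, not the paper's actual encoder/decoder network:

```python
def lerp(z1, z2, t):
    """Linearly interpolate two latent vectors at weight t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(z1, z2)]

def blend(latents, weights):
    """Convex combination of several latent codes (weights sum to 1),
    mirroring interpolation among an arbitrary number of samples."""
    assert abs(sum(weights) - 1.0) < 1e-9
    dim = len(latents[0])
    return [sum(w * z[i] for w, z in zip(weights, latents))
            for i in range(dim)]

z_mid = lerp([0.0, 2.0], [2.0, 0.0], 0.5)  # midpoint of two latent codes
```

In the paper's setting, codes like `z_mid` would then be projected back onto the image domain by the decoder to yield the interpolated texture.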
Saurabh Sharma, Ning Yu, Mario Fritz, Bernt Schiele
Deep learning enables impressive performance in image recognition using large-scale, artificially balanced datasets. However, real-world datasets exhibit highly class-imbalanced distributions, yielding two main challenges: relative imbalance amongst the classes and data scarcity for medium-shot or few-shot classes. In this work, we address the problem of long-tailed recognition, wherein the training set is highly imbalanced and the test set is kept balanced. In contrast to existing paradigms relying on data resampling, cost-sensitive learning, online hard example mining, loss objective reshaping, and/or memory-based modeling, we propose an ensemble of class-balanced experts that combines the strengths of diverse classifiers. Our ensemble of class-balanced experts reaches results close to the state-of-the-art, and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition. We conduct extensive experiments to analyse the performance of the ensembles, and discover that in modern large-scale datasets, relative imbalance is a harder problem than data scarcity. The training and evaluation code is available at https://github.com/ssfootball04/class-balanced-experts.
Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz
Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images. Yet the equitable allocation of their modeling capacity among subgroups has received less attention, which, left uncontrolled, can lead to biases against underrepresented minorities. In this work, we first formalize the problem of minority inclusion as one of data coverage, and then propose to improve data coverage by harmonizing adversarial training with reconstructive generation. Experiments show that our method outperforms existing state-of-the-art methods in terms of data coverage on both seen and unseen data. We develop an extension that allows explicit control over the minority subgroups the model should include, and validate its effectiveness with little compromise to overall performance on the entire dataset. Code, models, and supplemental videos are available on GitHub.
Ning Yu, Dingwei Zhang, Xiaofeng Luo
We propose the transverse velocity ($β_T$) dependence of the anti-deuteron to deuteron ratio as a new observable in the search for the QCD critical point in heavy-ion collisions. The QCD critical point can attract the system's evolution trajectory in the QCD phase diagram, a phenomenon known as the focusing effect. To quantify this effect, we employ a thermal model and a hadronic transport model to simulate dynamical particle emission along a hypothetical focusing trajectory near the critical point. We find that the focusing effect can lead to an anomalous $β_T$ dependence of the $\bar{p}/p$, $\bar{d}/d$, and $^3\overline{\text{He}}/^3\text{He}$ ratios. We examine the $β_T$ dependence of the $\bar{p}/p$ and $\bar{d}/d$ ratios in central Au+Au collisions at $\sqrt{s_{NN}}$ = 7.7 to 200 GeV measured by the STAR experiment at RHIC. Surprisingly, we observe a negative slope in the $β_T$ dependence of the $\bar{d}/d$ ratio only at $\sqrt{s_{NN}}$ = 19.6 GeV, which indicates that the evolution trajectory has passed through the critical region. In the future, the location of the critical point and/or the width of the critical region could be constrained by precise measurements of the $β_T$ dependence of the $\bar{d}/d$ ratio at different energies and rapidities.
Ning Yu, Timothy Haskins
Regional rainfall forecasting is an important issue in hydrology and meteorology. This paper aims to design an integrated tool by applying various machine learning algorithms, especially state-of-the-art deep learning algorithms, including Deep Neural Network, Wide Neural Network, Deep and Wide Neural Network, Reservoir Computing, Long Short-Term Memory, Support Vector Machine, and K-Nearest Neighbor, for forecasting regional precipitation over different catchments in Upstate New York. Through the experimental results and a comparison among machine learning models covering both classification and regression, we find that KNN outperforms the other models in handling the uncertainty in the precipitation data. Data normalization methods such as ZScore and MinMax are also evaluated and discussed.
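Two of the ingredients named above, MinMax normalization and K-Nearest-Neighbor regression, can be sketched in a few lines. This is a generic sketch over scalar features, not the paper's actual pipeline or data:

```python
def min_max(xs):
    """MinMax normalization: rescale values linearly into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def knn_predict(train_x, train_y, query, k=3):
    """Plain KNN regression: average the targets of the k training
    points closest to the query feature."""
    order = sorted(range(len(train_x)),
                   key=lambda i: abs(train_x[i] - query))
    return sum(train_y[i] for i in order[:k]) / k

# Normalize a feature column, then predict at a query point.
normed = min_max([0.0, 5.0, 10.0])
pred = knn_predict([0.0, 1.0, 2.0, 10.0], [0.0, 1.0, 2.0, 10.0], 1.0, k=3)
```

In practice, KNN's prediction is a local average of observed precipitation, which is one intuition for why it copes well with noisy, uncertain measurements.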
Ning Yu, Vladislav Skripniuk, Dingfan Chen, Larry Davis, Mario Fritz
Over the past years, deep generative models have achieved a new level of performance. Generated data has become difficult, if not impossible, to distinguish from real data. While there are plenty of use cases that benefit from this technology, there are also strong concerns about how this new technology can be misused to generate deep fakes and enable misinformation at scale. Unfortunately, current deep fake detection methods are not sustainable, as the gap between real and fake continues to close. In contrast, our work enables a responsible disclosure of such state-of-the-art generative models that allows model inventors to fingerprint their models, so that generated samples containing a fingerprint can be accurately detected and attributed to their source. Our technique achieves this by an efficient and scalable ad-hoc generation of a large population of models with distinct fingerprints. Our recommended operating point uses a 128-bit fingerprint, which in principle results in more than $10^{38}$ identifiable models. Experiments show that our method fulfills key properties of a fingerprinting mechanism and achieves effectiveness in deep fake detection and attribution. Code and models are available at https://github.com/ningyu1991/ScalableGANFingerprints.
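The capacity claim is simple to check: a 128-bit fingerprint admits $2^{128} \approx 3.4 \times 10^{38}$ distinct codes, which indeed exceeds $10^{38}$:

```python
# A 128-bit fingerprint admits 2**128 distinct codes; verify the claim
# that this exceeds 10**38 identifiable models.
capacity = 2 ** 128
assert capacity > 10 ** 38
print(f"2**128 = {capacity:.3e}")  # roughly 3.403e+38
```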
Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang
Generative models have demonstrated revolutionary success in various visual creation tasks, but in the meantime, they have been exposed to the threat of leaking private information from their training data. Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or non-member. However, these attacks suffer from major limitations, such as requiring shadow models and white-box access, and either ignoring or focusing only on properties unique to diffusion models, which blocks their generalization across multiple generative models. In contrast, we propose the first generalized membership inference attack against a variety of generative models, including generative adversarial networks, variational autoencoders, implicit functions, and the emerging diffusion models. We leverage only generated distributions from target generators and auxiliary non-member datasets, thereby treating target generators as black boxes and remaining agnostic to their architectures and application scenarios. Experiments validate that all these generative models are vulnerable to our attack. For instance, our attack achieves AUC $>0.99$ against DDPM, DDIM, and FastDPM trained on CIFAR-10 and CelebA, and AUC $>0.90$ against VQGAN, LDM (for text-conditional generation), and LIIF. As a result, we appeal to our community to be aware of such privacy leakage risks when designing and publishing generative models.
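One way to picture the black-box setting is a nearest-neighbor membership test over generated samples. This is an illustrative simplification, not the paper's exact attack (which additionally calibrates against auxiliary non-member data); `threshold` is a hypothetical tuning parameter:

```python
import math

def nearest_distance(query, generated):
    """Distance from a query point to its closest generated sample."""
    return min(math.dist(query, g) for g in generated)

def infer_membership(query, generated, threshold):
    """Flag the query as a likely training member if some generated
    sample reproduces it closely. Black-box: the attacker only needs
    samples drawn from the target generator, not its weights."""
    return nearest_distance(query, generated) < threshold
```

The intuition matches the abstract: a generator tends to place probability mass near its training points, so points unusually close to the generated distribution are more likely to be members.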
Tao Luo, Yu Ning, Xiande Zhang
The determination of the quantum chromatic number of graphs has attracted considerable attention recently. However, there are few families of graphs whose quantum chromatic numbers have been determined; a notable exception is the family of orthogonality graphs, whose quantum chromatic numbers are fully determined. In this paper, we extend these results by determining the exact quantum chromatic number of several subgraphs of the orthogonality graphs. Using the technique of combinatorial designs, we also determine the quantum chromatic number of the distance-2 Hamming graph, whose vertices are binary vectors and whose edges connect vectors at Hamming distance 2, for infinitely many lengths.
Zuman Zhang, Ning Yu, Hongge Xu
Using the multiphase transport (AMPT) model, we study the relative neutron density fluctuation and the neutron-proton correlation in matter produced by Au+Au collisions at $\sqrt{s_\text{NN}} =$ 7.7-200 GeV. The rapidity, centrality, and energy dependence of these two observables are also discussed. The light nuclei yield ratio of protons, deuterons, and tritons, $N_tN_p/N_d^2$, calculated directly from the relative neutron density fluctuation and the neutron-proton correlation, decreases with rapidity coverage and increases with collision centrality. We also find that the ratio does not exhibit any non-monotonic behavior as a function of collision energy. Since there is no first-order phase transition or critical physics in the AMPT model, our work provides a reference for extracting the relative neutron density fluctuation from light nuclei production in experiments.
Hongge Xu, Ning Yu, Zuman Zhang, Guoying Chen
We demonstrate that the propagator derived from an Effective Field Theory (EFT) incorporating Weinberg's compositeness theorem provides a more general formula for describing S-wave near-threshold states. By fitting the lineshape with this propagator, we can extract the $Z$ factor for these states and elucidate their structures.
Ning Yu, Zuman Zhang, Hongge Xu, Minxuan Song
In this study, the chemical freeze-out of hadrons produced in Au+Au collisions at the Relativistic Heavy Ion Collider (RHIC), including light- and strange-flavor particles and light nuclei, was investigated. Using the Thermal-FIST thermodynamic statistical model, we analyzed various particle sets: those inclusive of light nuclei, those exclusive of light nuclei, and those solely comprising light nuclei. We determined the chemical freeze-out parameters at $\sqrt{s_\text{NN}}=$ 7.7--200 GeV and four different centralities. A significant finding was the decrease in the chemical freeze-out temperature $T_{\textrm{ch}}$ when light nuclei are included, with an even more pronounced reduction when considering light nuclei yields exclusively. This suggests that light nuclei formation occurs at a later stage of the system's evolution at RHIC energies. We present parameterized formulas that describe the energy dependence of $T_{\textrm{ch}}$ and the baryon chemical potential $μ_B$ for the three particle sets in central Au+Au collisions at RHIC energies. Our results reveal at least three distinct values of $T_{\textrm{ch}}$ at RHIC energies, corresponding to different freeze-out hypersurfaces: a light-flavor freeze-out temperature $T_L$ = 150.2$\pm$6 MeV, a strange-flavor freeze-out temperature $T_s$ = 165.1$\pm$2.7 MeV, and a light-nuclei freeze-out temperature $T_{\textrm{ln}}$ = 141.7$\pm$1.4 MeV. Notably, in Pb+Pb collisions at 2.76 TeV at the Large Hadron Collider (LHC), the expected lower freeze-out temperature for light nuclei was not observed; instead, the $T_{\textrm{ch}}$ for light nuclei was found to be approximately 10 MeV higher than that for light-flavor hadrons.
Dingfan Chen, Ning Yu, Yang Zhang, Mario Fritz
Deep learning has achieved overwhelming success, spanning from discriminative models to generative models. In particular, deep generative models have enabled a new level of performance in a myriad of areas, ranging from media manipulation to sanitized dataset generation. Despite this great success, the potential risks of privacy breaches caused by generative models have not been analyzed systematically. In this paper, we focus on membership inference attacks against deep generative models, which reveal information about the training data used for victim models. Specifically, we present the first taxonomy of membership inference attacks, encompassing not only existing attacks but also our novel ones. In addition, we propose the first generic attack model that can be instantiated in a wide range of settings and is applicable to various kinds of deep generative models. Moreover, we provide a theoretically grounded attack calibration technique that consistently boosts attack performance across different attack settings, data modalities, and training configurations. We complement the systematic analysis of attack performance with a comprehensive experimental study that investigates the effectiveness of various attacks w.r.t. model type and training configuration, over three diverse application scenarios (i.e., images, medical data, and location data).
Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, Mario Fritz
Generative Adversarial Networks (GANs) produce impressive results on unconditional image generation when powered with large-scale image datasets. Yet generated images are still easy to spot, especially on datasets with high variance (e.g., bedroom, church). In this paper, we propose several improvements to further push the boundaries of image generation. Specifically, we propose a novel dual contrastive loss and show that, with this loss, the discriminator learns more generalized and distinguishable representations to incentivize generation. In addition, we revisit attention and extensively experiment with different attention blocks in the generator. We find that attention remains an important module for successful image generation, even though it was not used in recent state-of-the-art models. Lastly, we study different attention architectures in the discriminator and propose a reference attention mechanism. By combining the strengths of these remedies, we improve the compelling state-of-the-art Fréchet Inception Distance (FID) by at least 17.5% on several benchmark datasets. We obtain even more significant improvements on compositional synthetic scenes (up to 47.5% in FID). Code and models are available at https://github.com/ningyu1991/AttentionDualContrastGAN.
Ning Yu, Chia-Chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground content. We propose LayoutDETR that inherits the high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal foreground elements in a layout. Our solution sets a new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Code, models, dataset, and demos are available at https://github.com/salesforce/LayoutDETR.
Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu
Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://eyeline-labs.github.io/Go-with-the-Flow. Source code and model checkpoints are available on GitHub: https://github.com/Eyeline-Labs/Go-with-the-Flow.
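The pull-back idea behind warping noise along optical flow can be sketched with an integer flow field on a small grid. This nearest-pixel version is illustrative only; the paper's real-time algorithm is more sophisticated and preserves spatial Gaussianity of the field, not merely the per-pixel marginals:

```python
def warp_noise(noise, flow):
    """Advect a 2D noise field along an integer flow field: each output
    pixel pulls its value from the source location the flow points to
    (clamped at the borders). Each output value is a copy of some input
    sample, so per-pixel marginals stay Gaussian while temporal
    correlation now follows the motion."""
    h, w = len(noise), len(noise[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = noise[sy][sx]
    return out

# A uniform flow of (0, 1) shifts the field one pixel to the left.
shifted = warp_noise([[1.0, 2.0], [3.0, 4.0]],
                     [[(0, 1), (0, 1)], [(0, 1), (0, 1)]])
```

Feeding such motion-correlated noise to a video diffusion model is what lets the sampled video inherit the prescribed motion without any architectural change.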
Ning Yu, Jie Zhang, Sandeep Mitra, Rebecca Smith, Adam Rich
This study introduces the AI-Educational Development Loop (AI-EDL), a theory-driven framework that integrates classical learning theories with human-in-the-loop artificial intelligence (AI) to support reflective, iterative learning. Implemented in EduAlly, an AI-assisted platform for writing-intensive and feedback-sensitive tasks, the framework emphasizes transparency, self-regulated learning, and pedagogical oversight. A mixed-methods study was piloted at a comprehensive public university to evaluate the alignment between AI-generated feedback, instructor evaluations, and student self-assessments; the impact of iterative revision on performance; and student perceptions of AI feedback. Quantitative results demonstrated statistically significant improvement between first and second attempts, with agreement between student self-evaluations and final instructor grades. Qualitative findings indicated that students valued the immediacy, specificity, and opportunities for growth that AI feedback provided. These findings validate the framework's potential to enhance student learning outcomes through developmentally grounded, ethically aligned, and scalable AI feedback systems. The study concludes with implications for future interdisciplinary applications and the refinement of AI-supported educational technologies.
Ning Yu, Larry Davis, Mario Fritz
Recent advances in Generative Adversarial Networks (GANs) have shown increasing success in generating photorealistic images. But they also raise challenges to visual forensics and model attribution. We present the first study of learning GAN fingerprints towards image attribution and using them to classify an image as real or GAN-generated. For GAN-generated images, we further identify their sources. Our experiments show that (1) GANs carry distinct model fingerprints and leave stable fingerprints in their generated images, which support image attribution; (2) even minor differences in GAN training can result in different fingerprints, which enables fine-grained model authentication; (3) fingerprints persist across different image frequencies and patches and are not biased by GAN artifacts; (4) fingerprint finetuning is effective in immunizing against five types of adversarial image perturbations; and (5) comparisons also show our learned fingerprints consistently outperform several baselines in a variety of setups.
Ning Yu, Zachary Tuttle, Carl Jake Thurnau, Emmanuel Mireku
Since the first Graphical User Interface (GUI) prototype was invented in the 1970s, GUI systems have been deployed into various personal computer systems and server platforms. Recently, with the development of artificial intelligence (AI) technology, malicious malware powered by AI has emerged as a potential threat to GUI systems. This paper explores this type of AI-based cybersecurity attack targeting GUI systems, in two parts: (1) a malware program is designed to attack existing GUI systems using AI-based object recognition techniques; (2) defensive methods, including the generation of adversarial examples, are developed to alleviate the threat of such intelligent GUI attacks. The results show that a generic GUI attack can be implemented straightforwardly with current AI techniques, and that its countermeasures, while temporary, are so far effective in mitigating the threat of GUI attacks.
Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes
In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, in our experiments, we show that our model has predictions with much higher consistency with human judgments than low-level methods as well as several baseline CNN models. Our model also performs better than an average human from our user study.
Ning Yu, Zuman Zhang, Hongge Xu, Zhong Zhu
This research uses the AMPT model of Au+Au collisions to study the influence of the three-nucleon correlation $C_{n^2p}$ on light nuclei yield ratios. It is found that neglecting $C_{n^2p}$ leads to an overestimate of the extracted relative neutron density fluctuation. Including $C_{n^2p}$ enhances the agreement with experimental results through higher yield ratios, yet it does not change the energy dependence of the yield ratio. Since there is no first-order phase transition or critical physics in the AMPT model, our work does not reproduce the experimentally observed peak around $\sqrt{s_\text{NN}} =$ 20-30 GeV. Our work may offer a baseline for investigating critical physics phenomena using light nuclei production as a probe.