Parsa Saadatpanah, Ali Shafahi, Tom Goldstein
It is well known that many machine learning models are susceptible to adversarial attacks, in which an attacker evades a classifier by making small perturbations to inputs. This paper discusses how industrial copyright detection tools, which serve a central role on the web, are susceptible to adversarial attacks. We discuss a range of copyright detection systems, and why they are particularly vulnerable to attacks. These vulnerabilities are especially apparent for neural-network-based systems. As a proof of concept, we describe a well-known music identification method, and implement this system in the form of a neural net. We then attack this system using simple gradient methods. Adversarial music created this way successfully fools industrial systems, including the AudioTag copyright detector and YouTube's Content ID system. Our goal is to raise awareness of the threats posed by adversarial examples in this space, and to highlight the importance of hardening copyright detection systems to attacks.
W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein
Data poisoning -- the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data -- is an emerging threat in the context of neural networks. Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose: it works not only in fine-tuning scenarios, but also for end-to-end training from scratch, which until now has not been feasible for clean-label attacks with deep nets. MetaPoison can achieve arbitrary adversary goals -- like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate for the first time successful data poisoning of models trained on the black-box Google Cloud AutoML API. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks. In this work, we demonstrate that non-stochastic full-batch training can achieve comparably strong performance to SGD on CIFAR-10 using modern architectures. To this end, we show that the implicit regularization of SGD can be completely replaced with explicit regularization even when comparing against a strong and well-researched baseline. Our observations indicate that the perceived difficulty of full-batch training may be the result of its optimization properties and the disproportionate time and effort spent by the ML community tuning optimizers and hyperparameters for small-batch training.
Oscar Castañeda, Sven Jacobsson, Giuseppe Durisi, Mikael Coldrey, Tom Goldstein, Christoph Studer
Massive multiuser (MU) multiple-input multiple-output (MIMO) will be a core technology in fifth-generation (5G) wireless systems as it offers significant improvements in spectral efficiency compared to existing multi-antenna technologies. The presence of hundreds of antenna elements at the base station (BS), however, results in excessively high hardware costs and power consumption, and requires high interconnect throughput between the baseband-processing unit and the radio unit. Massive MU-MIMO that uses low-resolution analog-to-digital and digital-to-analog converters (DACs) has the potential to address all these issues. In this paper, we focus on downlink precoding for massive MU-MIMO systems with 1-bit DACs at the BS. The objective is to design precoders that simultaneously mitigate multi-user interference (MUI) and quantization artifacts. We propose two nonlinear 1-bit precoding algorithms and corresponding very-large-scale integration (VLSI) designs. Our algorithms rely on biconvex relaxation, which enables the design of efficient 1-bit precoding algorithms that achieve superior error-rate performance compared to that of linear precoding algorithms followed by quantization. To showcase the efficacy of our algorithms, we design VLSI architectures that enable efficient 1-bit precoding for massive MU-MIMO systems in which hundreds of antennas serve tens of user equipments. We present corresponding field-programmable gate array (FPGA) implementations to demonstrate that 1-bit precoding enables reliable and high-rate downlink data transmission in practical systems.
Kevin Kuo, Anthony Ostuni, Elizabeth Horishny, Michael J. Curry, Samuel Dooley, Ping-yeh Chiang, Tom Goldstein, John P. Dickerson
The design of revenue-maximizing auctions with strong incentive guarantees is a core concern of economic theory. Computational auctions enable online advertising, sourcing, spectrum allocation, and myriad financial markets. Analytic progress in this space is notoriously difficult; since Myerson's 1981 work characterizing single-item optimal auctions, there has been limited progress outside of restricted settings. A recent paper by Dütting et al. circumvents analytic difficulties by applying deep learning techniques to, instead, approximate optimal auctions. In parallel, new research from Ilvento et al. and other groups has developed notions of fairness in the context of auction design. Inspired by these advances, in this paper, we extend techniques for approximating auctions using deep learning to address concerns of fairness while maintaining high revenue and strong incentive guarantees.
Tom Goldstein, Min Li, Xiaoming Yuan, Ernie Esser, Richard Baraniuk
The Primal-Dual hybrid gradient (PDHG) method is a powerful optimization scheme that breaks complex problems into simple sub-steps. Unfortunately, PDHG methods require the user to choose stepsize parameters, and the speed of convergence is highly sensitive to this choice. We introduce new adaptive PDHG schemes that automatically tune the stepsize parameters for fast convergence without user inputs. We prove rigorous convergence results for our methods, and identify the conditions required for convergence. We also develop practical implementations of adaptive schemes that formally satisfy the convergence requirements. Numerical experiments show that adaptive PDHG methods have advantages over non-adaptive implementations in terms of both efficiency and simplicity for the user.
Micah Goldblum, Steven Reich, Liam Fowl, Renkun Ni, Valeriia Cherepanova, Tom Goldstein
Meta-learning algorithms produce feature extractors which achieve state-of-the-art performance on few-shot classification. While the literature is rich with meta-learning methods, little is known about why the resulting feature extractors perform so well. We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models which are trained classically. In doing so, we introduce and verify several hypotheses for why meta-learned models perform better. Furthermore, we develop a regularizer which boosts the performance of standard training routines for few-shot classification. In many cases, our routine outperforms meta-learning while simultaneously running an order of magnitude faster.
Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein
As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
Avi Schwarzschild, Eitan Borgnia, Arjun Gupta, Arpit Bansal, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein
We describe new datasets for studying generalization from easy to hard examples.
Micah Goldblum, Avi Schwarzschild, Ankit B. Patel, Tom Goldstein
Algorithmic trading systems are often completely automated, and deep learning is increasingly receiving attention in this domain. Nonetheless, little is known about the robustness properties of these models. We study valuation models for algorithmic trading from the perspective of adversarial machine learning. We introduce new attacks specific to this domain with size constraints that minimize attack costs. We further discuss how these attacks can be used as an analysis tool to study and evaluate the robustness properties of financial models. Finally, we investigate the feasibility of realistic adversarial attacks in which an adversarial trader fools automated trading systems into making inaccurate predictions.
Ping-yeh Chiang, Michael J. Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, Tom Goldstein
Despite the vulnerability of object detectors to adversarial attacks, very few defenses are known to date. While adversarial training can improve the empirical robustness of image classifiers, a direct extension to object detection is very expensive. This work is motivated by recent progress on certified classification by randomized smoothing. We start by presenting a reduction from object detection to a regression problem. Then, to enable certified regression, where standard mean smoothing fails, we propose median smoothing, which is of independent interest. We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks. The code for all experiments in the paper is available at http://github.com/Ping-C/CertifiedObjectDetection .
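The contrast between mean and median smoothing mentioned above can be illustrated with a minimal sketch. It assumes Gaussian input noise, as is usual in randomized smoothing, and the function names `median_smooth` and `mean_smooth`, the toy regressor, and all parameter values are hypothetical choices made here, not taken from the paper. The sketch computes no certificate; it only shows that a few unbounded outputs can dominate a mean while leaving the median almost unchanged.

```python
import random
import statistics


def _noisy_outputs(f, x, sigma, n_samples, seed):
    """Evaluate f on n_samples copies of x perturbed by Gaussian noise."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        outputs.append(f(noisy))
    return outputs


def median_smooth(f, x, sigma=0.25, n_samples=1001, seed=0):
    """Monte Carlo estimate of the median of f(x + noise)."""
    return statistics.median(_noisy_outputs(f, x, sigma, n_samples, seed))


def mean_smooth(f, x, sigma=0.25, n_samples=1001, seed=0):
    """Monte Carlo estimate of the mean of f(x + noise)."""
    return statistics.fmean(_noisy_outputs(f, x, sigma, n_samples, seed))


# Toy regressor (an assumption for illustration): it returns a huge value
# on a small region of input space and behaves like the identity elsewhere.
def toy_regressor(v):
    return 1e6 if v[0] > 0.5 else v[0]


if __name__ == "__main__":
    print("median:", median_smooth(toy_regressor, [0.0]))
    print("mean:  ", mean_smooth(toy_regressor, [0.0]))
```

In this toy setting the rare large outputs all lie above the median, so the median estimate stays near the unperturbed prediction while the mean is pulled far away from it.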
Tom Goldstein, Christoph Studer, Richard Baraniuk
This is a user manual for the software package FASTA.
Eitan Borgnia, Jonas Geiping, Valeriia Cherepanova, Liam Fowl, Arjun Gupta, Amin Ghiasi, Furong Huang, Micah Goldblum, Tom Goldstein
Data poisoning and backdoor attacks manipulate training data to induce security breaches in a victim model. These attacks can be provably deflected using differentially private (DP) training methods, although this comes with a sharp decrease in model performance. The InstaHide method has recently been proposed as an alternative to DP training that leverages supposed privacy properties of the mixup augmentation, although without rigorous guarantees. In this work, we show that strong data augmentations, such as mixup and random additive noise, nullify poison attacks while enduring only a small accuracy trade-off. To explain these findings, we propose a training method, DP-InstaHide, which combines the mixup regularizer with additive noise. A rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy advantages, and that training with k-way mixup provably yields at least k times stronger DP guarantees than a naive DP mechanism. Because mixup (as opposed to noise) is beneficial to model performance, DP-InstaHide provides a mechanism for achieving stronger empirical performance against poisoning attacks than other known DP methods.
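As a rough illustration of combining k-way mixup with additive noise, the sketch below forms one augmented training example from a small dataset. It is a sketch under stated assumptions, not the DP-InstaHide implementation: the function name `kway_mixup_with_noise`, the use of normalized uniform mixing weights, the Gaussian noise distribution, and the default values of `k` and `noise_scale` are all choices made here for illustration.

```python
import random


def kway_mixup_with_noise(inputs, labels, k=4, noise_scale=0.05, rng=None):
    """Return one (x, y) pair: a convex mix of k samples plus input noise.

    inputs: list of feature vectors (lists of floats, equal length)
    labels: list of one-hot label vectors (lists of floats, equal length)
    """
    rng = rng or random.Random(0)
    # Pick k distinct examples to mix.
    idx = rng.sample(range(len(inputs)), k)
    # Random convex weights (assumption: normalized uniform draws in (0, 1]).
    raw = [1.0 - rng.random() for _ in range(k)]
    total = sum(raw)
    weights = [w / total for w in raw]

    dim = len(inputs[0])
    n_classes = len(labels[0])
    # Mixed input with additive noise (assumption: Gaussian noise).
    x = [
        sum(w * inputs[i][j] for w, i in zip(weights, idx))
        + rng.gauss(0.0, noise_scale)
        for j in range(dim)
    ]
    # Mixed soft label; noise is applied to the input only in this sketch.
    y = [
        sum(w * labels[i][c] for w, i in zip(weights, idx))
        for c in range(n_classes)
    ]
    return x, y


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    onehot = [[1, 0], [0, 1], [0, 1], [1, 0], [1, 0]]
    print(kway_mixup_with_noise(data, onehot, k=4))
```

Because the weights are convex, the mixed label is a valid probability vector, and each individual training example contributes only a fraction of any augmented sample.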
W. Ronny Huang, Zeyad Emam, Micah Goldblum, Liam Fowl, J. K. Terry, Furong Huang, Tom Goldstein
The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.
Tom Goldstein, Gavin Taylor, Kawika Barabin, Kent Sayre
Recent approaches to distributed model fitting rely heavily on consensus ADMM, where each node solves small sub-problems using only local data. We propose iterative methods that solve {\em global} sub-problems over an entire distributed dataset. This is possible using transpose reduction strategies that allow a single node to solve least-squares over massive datasets without putting all the data in one place. This results in simple iterative methods that avoid the expensive inner loops required for consensus methods. To demonstrate the efficiency of this approach, we fit linear classifiers and sparse linear models to datasets over 5 Tb in size using a distributed implementation with over 7000 cores in far less time than previous approaches.
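The idea of solving a least-squares problem without gathering the data in one place can be sketched as follows, assuming the rows of the data matrix A are split across nodes. Each node reduces its shard to a small Gram matrix and vector, and a central node sums them and solves the normal equations, using the identity A^T A = sum_i A_i^T A_i. The function names and the simulation of nodes as a Python list of shards are hypothetical, and this toy version is not the paper's distributed implementation.

```python
def local_gram(A_i, b_i):
    """Per-node step: compute (A_i^T A_i, A_i^T b_i) from local rows only."""
    d = len(A_i[0])
    G = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    for row, y in zip(A_i, b_i):
        for j in range(d):
            c[j] += row[j] * y
            for k in range(d):
                G[j][k] += row[j] * row[k]
    return G, c


def solve(G, c):
    """Gaussian elimination with partial pivoting for the d x d system G x = c."""
    d = len(c)
    M = [G[i][:] + [c[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for k in range(col, d + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * d
    for i in reversed(range(d)):
        s = M[i][d] - sum(M[i][k] * x[k] for k in range(i + 1, d))
        x[i] = s / M[i][i]
    return x


def transpose_reduction_lstsq(shards):
    """Central step: sum the small per-node reductions, then solve once."""
    d = len(shards[0][0][0])
    G = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    for A_i, b_i in shards:  # in a real system each shard lives on its own node
        G_i, c_i = local_gram(A_i, b_i)
        for j in range(d):
            c[j] += c_i[j]
            for k in range(d):
                G[j][k] += G_i[j][k]
    return solve(G, c)


if __name__ == "__main__":
    # Two simulated nodes; the data satisfy b = 2*a0 - 3*a1 exactly.
    shards = [
        ([[1.0, 0.0], [0.0, 1.0]], [2.0, -3.0]),
        ([[1.0, 1.0], [2.0, 1.0]], [-1.0, 1.0]),
    ]
    print(transpose_reduction_lstsq(shards))
```

Only d x d matrices and length-d vectors cross the network in this scheme, so the communication cost depends on the number of features rather than the number of rows.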
Zuxuan Wu, Ser-Nam Lim, Larry Davis, Tom Goldstein
We present a systematic study of adversarial attacks on state-of-the-art object detection frameworks. Using standard detection datasets, we train patterns that suppress the objectness scores produced by a range of commonly used detectors, and ensembles of detectors. Through extensive experiments, we benchmark the effectiveness of adversarially trained patches under both white-box and black-box settings, and quantify transferability of attacks between datasets, object classes, and detector models. Finally, we present a detailed study of physical world attacks using printed posters and wearable clothes, and rigorously quantify the performance of such attacks with different metrics.
Eric Lei, Oscar Castañeda, Olav Tirkkonen, Tom Goldstein, Christoph Studer
Neural networks have been proposed recently for positioning and channel charting of user equipments (UEs) in wireless systems. Both of these approaches process channel state information (CSI) that is acquired at a multi-antenna base-station in order to learn a function that maps CSI to location information. CSI-based positioning using deep neural networks requires a dataset that contains both CSI and associated location information. Channel charting (CC) only requires CSI information to extract relative position information. Since CC builds on dimensionality reduction, it can be implemented using autoencoders. In this paper, we propose a unified architecture based on Siamese networks that can be used for supervised UE positioning and unsupervised channel charting. In addition, our framework enables semisupervised positioning, where only a small set of location information is available during training. We use simulations to demonstrate that Siamese networks achieve similar or better performance than existing positioning and CC approaches with a single, unified neural network architecture.
Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu
Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models. In this work, we propose a novel adversarial training algorithm, FreeLB, that promotes higher invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples. To validate the effectiveness of the proposed approach, we apply it to Transformer-based models for natural language understanding and commonsense reasoning tasks. Experiments on the GLUE benchmark show that when applied only to the finetuning stage, it is able to improve the overall test scores of the BERT-base model from 78.3 to 79.4, and of the RoBERTa-large model from 88.5 to 88.8. In addition, the proposed approach achieves state-of-the-art single-model test accuracies of 85.44\% and 67.75\% on ARC-Easy and ARC-Challenge. Experiments on the CommonsenseQA benchmark further demonstrate that FreeLB can be generalized and boost the performance of the RoBERTa-large model on other tasks as well. Code is available at https://github.com/zhuchen03/FreeLB .
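The phrase "minimizes the maximal risk" corresponds to a min-max objective, which in one common generic form can be written as below. The notation is an assumption introduced here for illustration: $f_\theta$ is the model with parameters $\theta$, $E(x)$ denotes the word embeddings of input $x$, $\delta$ is the perturbation added to those embeddings, $\epsilon$ is the norm bound, $L$ is the training loss, and $\mathcal{D}$ is the data distribution. This states only the generic objective; the specific FreeLB procedure for approximating the inner maximization is not reproduced.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\left[ \max_{\|\delta\| \le \epsilon}
L\big( f_{\theta}(E(x) + \delta),\, y \big) \right]
```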
Ahmed Abdelkader, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, Chen Zhu
Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these \emph{headless attacks}. We first demonstrate successful transfer attacks against a victim network using \textit{only} its feature extractor. This motivates the introduction of a label-blind adversarial attack. This transfer attack method does not require any information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40\%.
Rohan Chandra, Ziyuan Zhong, Justin Hontz, Val McCulloch, Christoph Studer, Tom Goldstein
Phase retrieval deals with the estimation of complex-valued signals solely from the magnitudes of linear measurements. While there has been a recent explosion in the development of phase retrieval algorithms, the lack of a common interface has made it difficult to compare new methods against the state-of-the-art. The purpose of PhasePack is to create a common software interface for a wide range of phase retrieval algorithms and to provide a common testbed using both synthetic data and empirical imaging datasets. PhasePack is able to benchmark a large number of recent phase retrieval methods against one another to generate comparisons using a range of different performance metrics. The software package handles single method testing as well as multiple method comparisons. The algorithm implementations in PhasePack differ slightly from their original descriptions in the literature in order to achieve faster speed and improved robustness. In particular, PhasePack uses adaptive stepsizes, line-search methods, and fast eigensolvers to speed up and automate convergence.