Matthew Jones
We describe a new analysis of Upsilon(nS) to mu+mu- decays produced in proton-antiproton collisions and collected with the CDF II detector at the Fermilab Tevatron. This analysis measures the angular distributions of the final-state muons in the Upsilon rest frame, providing new information about Upsilon production polarization. We find the angular distributions to be nearly isotropic up to an Upsilon transverse momentum of 40 GeV/c, consistent with previous measurements by CDF, but inconsistent with results obtained by the D0 experiment. The results are compared with recent NLO calculations based on color-singlet matrix elements and non-relativistic QCD with color-octet matrix elements.
Daniel S. Katz, Sou-Cheng T. Choi, Nancy Wilkins-Diehr, Neil Chue Hong, Colin C. Venters, James Howison, Frank Seinstra, Matthew Jones, Karen Cranston, Thomas L. Clune, Miguel de Val-Borro, Richard Littauer
This technical report records and discusses the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2). The report includes a description of the alternative, experimental submission and review process, two workshop keynote presentations, a series of lightning talks, a discussion on sustainability, and five discussions from the topic areas of exploring sustainability; software development experiences; credit & incentives; reproducibility & reuse & sharing; and code testing & code review. For each topic, the report includes a list of tangible actions that were proposed and that would lead to potential change. The workshop recognized that reliance on scientific software is pervasive in all areas of world-leading research today. The workshop participants then proceeded to explore different perspectives on the concept of sustainability. Key enablers of and barriers to sustainable scientific software were identified from their experiences. In addition, recommendations with new requirements such as software credit files and software prize frameworks were outlined for improving practices in sustainable software engineering. There was also broad consensus that formal training in software development or engineering was rare among the practitioners. Significant strides need to be made in building a sense of community via training in software and technical practices, in increasing the size and scope of such training, and in better integrating it directly into graduate education programs. Finally, journals can define and publish policies to improve reproducibility, whereas reviewers can insist that authors provide sufficient information and access to data and software to allow them to reproduce the results in the paper. Hence, a list of criteria is compiled for journals to provide to reviewers so as to make it easier to review software submitted for publication as a "Software Paper."
Huy Lê Nguyen, Thy Nguyen, Matthew Jones
We study the problem of fairness in k-centers clustering on data with disjoint demographic groups. Specifically, this work proposes a variant of fairness which restricts each group's number of centers with both a lower bound (minority-protection) and an upper bound (restricted-domination), and provides both an offline and a one-pass streaming algorithm for the problem. In the special case where the lower bound and the upper bound are the same, our offline algorithm preserves the same time complexity and approximation factor as the previous state of the art. Furthermore, our one-pass streaming algorithm improves on the approximation factor, running time, and space complexity in this special case compared to previous works. Specifically, the approximation factor of our algorithm is 13 compared to the previous 17-approximation algorithm, and the previous algorithms' time complexities depend on the metric space's aspect ratio, which can be arbitrarily large, whereas our algorithm's running time does not depend on the aspect ratio.
Matthew Jones, Huy Lê Nguyen, Thy Nguyen
Recently, a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare instead of the total utility. Unfortunately, previous algorithms are either inefficient or achieve sub-optimal regret in terms of the number of rounds $T$. We propose a new efficient algorithm with lower regret than even previous inefficient ones. For $N$ agents, $K$ arms, and $T$ rounds, our approach has a regret bound of $\tilde{O}(\sqrt{NKT} + NK)$. This improves on the previous approach, which has a regret bound of $\tilde{O}( \min(NK, \sqrt{N} K^{3/2})\sqrt{T})$. We also complement our efficient algorithm with an inefficient approach with $\tilde{O}(\sqrt{KT} + N^2K)$ regret. The experimental findings confirm the effectiveness of our efficient algorithm compared to the previous approaches.
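To make the objective concrete, one common formalization of Nash-social-welfare regret is sketched below; the notation is assumed for illustration and is not taken verbatim from the paper. With arm distribution $p_t$ played at round $t$, mean utility $\mu_{i,k}$ of agent $i$ for arm $k$, and $p^*$ the distribution maximizing the Nash social welfare,
$$\mathrm{NSW}(p) = \prod_{i=1}^{N} \Big( \sum_{k=1}^{K} p_k\, \mu_{i,k} \Big)^{1/N}, \qquad R_T = \sum_{t=1}^{T} \big( \mathrm{NSW}(p^*) - \mathrm{NSW}(p_t) \big),$$
so that, unlike total-utility regret, a policy that starves any single agent of utility is heavily penalized.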
Matthew Jones, Huy Lê Nguyen, Thy Nguyen
This paper studies the problem of clustering in metric spaces while preserving the privacy of individual data. Specifically, we examine differentially private variants of the k-medians and Euclidean k-means problems. We present polynomial-time algorithms with constant multiplicative error and lower additive error than the previous state of the art for each problem. Additionally, our algorithms use a clustering algorithm without differential privacy as a black box. This allows practitioners to control the trade-off between runtime and approximation factor by choosing a suitable clustering algorithm to use.
Elizabeth J Cross, Timothy J Rogers, Daniel J Pitchforth, Samuel J Gibson, Matthew R Jones
Despite the growing availability of sensing and data in general, we remain unable to fully characterise many in-service engineering systems and structures from a purely data-driven approach. The vast data and resources available to capture human activity are unmatched in our engineered world, and, even in cases where data could be referred to as ``big,'' they will rarely hold information across operational windows or life spans. This paper pursues the combination of machine learning technology and physics-based reasoning to enhance our ability to make predictive models with limited data. By explicitly linking the physics-based view of stochastic processes with a data-based regression approach, a spectrum of possible Gaussian process models is introduced, enabling the incorporation of different levels of expert knowledge of a system. Examples illustrate how these approaches can significantly reduce reliance on data collection whilst also increasing the interpretability of the model, another important consideration in this context.
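As a concrete illustration of the grey-box idea, the minimal sketch below combines a physics-derived mean function with a data-driven squared-exponential kernel in a Gaussian process. The physics model, data, and hyperparameters are hypothetical and chosen only to show the structure of such a model, not to reproduce the paper's examples.

```python
# Minimal grey-box GP sketch (assumptions, not the authors' code): a
# physics-derived mean function carries prior knowledge, and a
# squared-exponential kernel models the data-driven residual.
import numpy as np

def physics_mean(x, stiffness=2.0):
    # Hypothetical physics model, e.g. a simple linear load-deflection law.
    return x / stiffness

def se_kernel(xa, xb, variance=1.0, lengthscale=0.5):
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    # Condition the GP prior (physics mean + SE kernel) on the observations.
    K = se_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = se_kernel(x_test, x_train)
    resid = y_train - physics_mean(x_train)
    mean = physics_mean(x_test) + Ks @ np.linalg.solve(K, resid)
    cov = se_kernel(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

# Toy usage: sparse, noisy observations of a system the physics only
# partially explains.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 3, 8))
y_train = x_train / 2.0 + 0.1 * np.sin(4 * x_train) + 0.02 * rng.standard_normal(8)
mu, cov = gp_posterior(x_train, y_train, np.linspace(0, 3, 50))
```

Away from the training inputs the posterior mean reverts to the physics-based prediction, which is the behaviour that makes grey-box models of this kind attractive when data are sparse.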
Matthew I. Jones, Matthew Chervenak, Nicholas A. Christakis
When confronted with a host of issues, groups often save time and energy by compiling many issues into a single bundle when making decisions. This reduces the time and cost of group decision-making, but it also leads to suboptimal outcomes as individuals lose the ability to express their preferences on each specific issue. We examine this trade-off by quantifying the value of bundled voting compared to a more tedious issue-by-issue voting process. Our research investigates multi-issue bundles and their division into multiple subbundles, confirming that bundling generally yields positive outcomes for the group. However, bundling and issue-by-issue voting can easily yield opposite results regardless of the number of votes the bundle receives. Furthermore, we show that most combinations of voters and issues are vulnerable to manipulation if the subbundling is controlled by a bad actor. By carefully crafting bundles, such an antagonist can achieve the minority preference on almost every issue. Thus, naturally occurring undemocratic outcomes may be rare, but they can be easily manufactured. To thoroughly investigate this problem, we employ three techniques throughout the paper: mathematical analysis, computer simulations, and the analysis of American voter survey data. This study provides valuable insights into the dynamics of bundled voting and its implications for group decision-making. By highlighting the potential for manipulation and suboptimal outcomes, our findings add another layer to our understanding of voting paradoxes and offer insights for those designing group decision-making systems that are safer and fairer.
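A rough sense of the bundling trade-off can be obtained from a small Monte Carlo experiment; the preference model below (uniform signed intensities, simple majority on each issue or on the whole bundle) is an assumption made for illustration and is not the paper's model or data.

```python
# Illustrative simulation: compare realized group utility when issues are
# decided as one bundle versus issue by issue (assumed toy model).
import numpy as np

rng = np.random.default_rng(1)

def simulate(n_voters=101, n_issues=10, n_trials=2000):
    bundle_util, separate_util = [], []
    for _ in range(n_trials):
        # Signed preference intensities, one per voter per issue.
        prefs = rng.uniform(-1, 1, size=(n_voters, n_issues))
        # Issue by issue: each issue passes iff a majority favors it; a voter
        # gains their intensity when the outcome matches their preference.
        passes = np.sign(prefs).sum(axis=0) > 0
        separate_util.append(np.where(passes, prefs, -prefs).sum())
        # Bundled: a voter supports the bundle iff their total utility from
        # passing every issue is positive; a majority decides the whole bundle.
        bundle_passes = np.sign(prefs.sum(axis=1)).sum() > 0
        bundle_util.append(prefs.sum() if bundle_passes else -prefs.sum())
    return np.mean(bundle_util), np.mean(separate_util)

print("mean utility (bundle, issue-by-issue):", simulate())
```

The two averages give a crude measure of the welfare cost or benefit of bundling under this assumed preference model; richer variants, such as adversarially chosen subbundles, would compare realized utility across voting rules in the same way.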
Logan E. Hillberry, Matthew T. Jones, David L. Vargas, Patrick Rall, Nicole Yunger Halpern, Ning Bao, Simone Notarnicola, Simone Montangero, Lincoln D. Carr
Cellular automata are interacting classical bits that display diverse emergent behaviors, from fractals to random-number generators to Turing-complete computation. We discover that quantum cellular automata (QCA) can exhibit complexity in the sense of the complexity science that describes biology, sociology, and economics. QCA exhibit complexity when evolving under "Goldilocks rules" that we define by balancing activity and stasis. Our Goldilocks rules generate robust dynamical features (entangled breathers), network structure and dynamics consistent with complexity, and persistent entropy fluctuations. Present-day experimental platforms -- Rydberg arrays, trapped ions, and superconducting qubits -- can implement our Goldilocks protocols, making testable the link between complexity science and quantum computation exposed by our QCA.
Matthew I. Jones, Antonio D. Sirianni, Feng Fu
The median voter theorem has long been the default model of voter behavior and candidate choice. While contemporary work on the distribution of political opinion has emphasized polarization and an increasing gap between the "left" and the "right" in democracies, the median voter theorem presents a model of anti-polarization: competing candidates move to the center of the ideological distribution to maximize vote share, regardless of the underlying ideological distribution of voters. These anti-polarization results, however, largely depend on the "single-peakedness" of voter preferences, an assumption that is rapidly losing relevance in the age of polarization. This article presents a model of voter choice that examines three potential mechanisms that can undermine this finding: a relative cost of voting that deters voters who are sufficiently indifferent to both candidates, ideologically motivated third-party alternatives that attract extreme voters, and a bimodal distribution of voter ideology. Under reasonable sets of conditions and empirically observed voter opinion distributions, these mechanisms can be sufficient to cause strategically minded candidates to fail to converge to the center, or even to become more polarized than their electorate.
Amarpal Sahota, Amber Roguski, Matthew W Jones, Zahraa S. Abdallah, Raul Santos-Rodriguez
We evaluate the effectiveness of combining brain connectivity metrics with signal statistics for early-stage Parkinson's Disease (PD) classification using electroencephalogram (EEG) data. The data are from five arousal states: wakefulness and four sleep stages (N1, N2, N3, and REM). Our pipeline uses an AdaBoost model for classification on a challenging early-stage PD classification task with only 30 participants (11 PD, 19 healthy controls). Evaluating nine brain connectivity metrics, we find that the best connectivity metric differs across arousal states, with Phase Lag Index achieving the highest individual classification accuracy of 86\% on N1 data. Our pipeline achieves an accuracy of 78\% using regional signal statistics alone, 86\% using brain connectivity alone, and a best accuracy of 91\% when combining the two. This best performance is achieved on N1 data using Phase Lag Index (PLI) combined with statistics derived from the frequency characteristics of the EEG signal; this model also achieves a recall of 80\% and a precision of 96\%. Furthermore, we find that, on data from each arousal state, combining PLI with regional signal statistics improves classification accuracy versus using signal statistics or brain connectivity alone. We therefore conclude that combining brain connectivity statistics with regional EEG statistics is optimal for classifier performance on early-stage Parkinson's. Additionally, we find that N1 EEG outperforms the other arousal states for classification of Parkinson's, which we expect could be due to disrupted N1 sleep in PD; this should be explored in future work.
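Schematically, the feature-combination step can be sketched as follows; the feature dimensions, synthetic inputs, and cross-validation setup are placeholders for illustration, not the study's actual EEG-derived features or evaluation protocol.

```python
# Sketch of combining connectivity features with regional signal statistics
# in an AdaBoost classifier (synthetic placeholder data, assumed dimensions).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_subjects = 30                                  # 11 PD, 19 healthy controls
y = np.array([1] * 11 + [0] * 19)

# Placeholder feature blocks: pairwise Phase Lag Index values between EEG
# regions, and per-region statistics of the signal's frequency content.
pli_features = rng.normal(size=(n_subjects, 36))
signal_stats = rng.normal(size=(n_subjects, 20))
X_combined = np.hstack([pli_features, signal_stats])

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X_combined, y, cv=5)
print("mean CV accuracy (synthetic features):", scores.mean())
```

With real data, the pli_features block would be computed from sensor-pair phase relationships in a given arousal state and the signal_stats block from per-region spectral summaries, before concatenation into a single feature matrix.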
Matthew I. Jones, Scott D. Pauls, Feng Fu
Global coordination is required to solve a wide variety of challenging collective action problems from network colorings to the tragedy of the commons. Recent empirical study shows that the presence of a few noisy autonomous agents can greatly improve collective performance of humans in solving networked color coordination games. To provide further analytical insights into the role of behavioral randomness, here we study myopic artificial agents attempting to solve similar network coloring problems using decision update rules that are based only on local information but allow random choices at various stages of their heuristic reasoning. We consider agents distributed over a random bipartite network, which is guaranteed to be solvable with two colors. Using agent-based simulations and theoretical analysis, we show that the resulting efficacy of resolving color conflicts depends on the specific implementation of the agents' random behavior, including the fraction of noisy agents and the decision stage at which noise is introduced. Moreover, behavioral randomness can be finely tuned to the specific underlying population structure, such as network size and average network degree, in order to produce advantageous results in finding collective coloring solutions. Our work demonstrates that distributed greedy optimization algorithms exploiting local information should be deployed in combination with occasional exploration via random choices in order to overcome local minima and achieve global coordination.
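The flavour of such a myopic update with tunable randomness can be illustrated with a toy simulation; the graph generator, update rule, and placement of the noise below are assumptions for demonstration and do not reproduce the paper's exact protocols.

```python
# Toy myopic coloring on a random bipartite graph: each selected agent either
# makes a greedy conflict-minimizing move or, with some probability, a random
# one (assumed update rule for illustration).
import random

random.seed(0)

def random_bipartite(n_per_side=20, p=0.2):
    # Two-colorable by construction: edges only run between the two sides.
    adj = {v: set() for v in range(2 * n_per_side)}
    for u in range(n_per_side):
        for v in range(n_per_side, 2 * n_per_side):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def run(adj, colors=(0, 1), noise=0.1, steps=5000):
    state = {v: random.choice(colors) for v in adj}
    nodes = list(adj)
    for _ in range(steps):
        v = random.choice(nodes)
        if random.random() < noise:
            state[v] = random.choice(colors)          # random (noisy) move
        else:
            # Greedy move: choose the color with the fewest neighbor conflicts.
            state[v] = min(colors, key=lambda c: sum(state[u] == c for u in adj[v]))
        conflicts = sum(state[a] == state[b] for a in adj for b in adj[a]) // 2
        if conflicts == 0:
            return True
    return False

print("conflict-free coloring found:", run(random_bipartite()))
```

Sweeping the noise level, or restricting noise to a subset of designated noisy agents, is the kind of experiment that reveals whether and where randomness helps the population escape locally stable but globally conflicting configurations.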
Matthew I. Jones
We introduce a new framework to study the group dynamics and game-theoretic considerations when voters in a committee are allowed to trade votes. This model represents a significant step forward by considering vote-for-vote trades in a low-information environment where voters do not know the preferences of their trading partners. All voters draw their preference intensities on two issues from a common probability distribution and then consider offering to trade with an anonymous partner. The result is a strategic game between two voters that can be studied analytically. We compute the Nash equilibria for this game and derive several interesting results involving symmetry, group heterogeneity, and more. This framework allows us to determine that trades are typically detrimental to the welfare of the group as a whole, but there are exceptions. We also expand our model to allow all voters to trade votes and derive approximate results for this more general scenario. Finally, we emulate vote trading in real groups by forming simulated committees using real voter preference intensity data and computing the resulting equilibria and associated welfare gains or losses.
Matthew R Jones, Timothy J Rogers, Elizabeth J Cross
The automated localisation of damage in structures is a challenging but critical ingredient in the path towards predictive or condition-based maintenance of high value structures. The use of acoustic emission time of arrival mapping is a promising approach to this challenge, but is severely hindered by the need to collect a dense set of artificial acoustic emission measurements across the structure, resulting in a lengthy and often impractical data acquisition process. In this paper, we consider the use of physics-informed Gaussian processes for learning these maps to alleviate this problem. In the approach, the Gaussian process is constrained to the physical domain such that information relating to the geometry and boundary conditions of the structure is embedded directly into the learning process, returning a model that guarantees that any predictions made satisfy physically-consistent behaviour at the boundary. A number of scenarios that arise when training measurement acquisition is limited are considered, including cases where training data are sparse and where they offer only limited coverage over the structure of interest. Using a complex plate-like structure as an experimental case study, we show that our approach significantly reduces the burden of data collection, where it is seen that incorporation of boundary condition knowledge significantly improves predictive accuracy as training observations are reduced, particularly when training measurements are not available across all parts of the structure.
Matthew R. Jones, Tim J. Rogers, Keith Worden, Elizabeth J. Cross
In the field of structural health monitoring (SHM), the acquisition of acoustic emissions to localise damage sources has emerged as a popular approach. Despite recent advances, the task of locating damage within composite materials and structures that contain non-trivial geometrical features still poses a significant challenge. Within this paper, a Bayesian source localisation strategy that is robust to these complexities is presented. Under this new framework, a Gaussian process is first used to learn the relationship between source locations and the corresponding difference-in-time-of-arrival values for a number of sensor pairings. As an acoustic emission event with an unknown origin is observed, a mapping is then generated that quantifies the likelihood of the emission location across the surface of the structure. The new probabilistic mapping offers multiple benefits, leading to a localisation strategy that is more informative than deterministic predictions or single-point estimates with an associated confidence bound. The performance of the approach is investigated on a structure with numerous complex geometrical features, demonstrating favourable results in comparison to other similar localisation methods.
Bertram Ludaescher, Kyle Chard, Niall Gaffney, Matthew B. Jones, Jaroslaw Nabrzyski, Victoria Stodden, Matthew Turk
We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process, including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications can be consumed directly by users of the Whole Tale environment or integrated into existing or future domain Science Gateways.
Matthew I Jones, Scott D. Pauls, Feng Fu
To curb the spread of fake news on social media platforms, recent studies have considered an online crowdsourcing fact-checking approach as one possible intervention method to reduce misinformation. However, it remains unclear under what conditions crowdsourcing fact-checking efforts deter the spread of misinformation. To address this issue, we model such distributed fact-checking as `peer policing' that will reduce the perceived payoff to share or disseminate false information (fake news) and also reward the spread of trustworthy information (real news). By simulating our model on synthetic square lattices and small-world networks, we show that the presence of social network structure enables fake news spreaders to self-organize into echo chambers, thereby providing a boost to the efficacy of fake news and thus its resistance to fact-checking efforts. Additionally, to study our model in a more realistic setting, we utilize a Twitter network dataset and study the effectiveness of deliberately choosing specific individuals to be fact-checkers. We find that targeted fact-checking efforts can be highly effective, achieving the same level of success with as few as a fifth of the number of fact-checkers, but the effect depends on the structure of the network in question. In the limit of weak selection, we obtain closed-form analytical conditions for the critical threshold of crowdsourced fact-checking in terms of the payoff values in our fact-checker/fake news game. Our work has practical implications for developing model-based mitigation strategies for controlling the spread of misinformation that interferes with the political discourse.
Adam Graham-Squire, Matthew I. Jones, David McCune
In real-world elections where voters cast preference ballots, voters often provide only a partial ranking of the candidates. Despite this empirical reality, prior social choice literature frequently analyzes fairness criteria under the assumption that all voters provide a complete ranking of the candidates. We introduce new fairness criteria for multiwinner ranked-choice elections concerning truncated ballots. In particular, we define notions of the independence of losing voters blocs and independence of winning voters blocs, which state that the winning committee of an election should not change when we remove partial ballots which rank only losing candidates, and the winning committee should change in reasonable ways when removing ballots which rank only winning candidates. Of the voting methods we analyze, the Chamberlin-Courant rule performs the best with respect to these criteria, the expanding approvals rule performs the worst, and the method of single transferable vote falls in between.
Matthew I. Jones, Nicholas A. Christakis
Consensus formation is a complex process, particularly in networked groups. When individuals are incentivized to dig in and refuse to compromise, leaders may be essential to guiding the group to consensus. Specifically, the relative geodesic position of leaders (which we use as a proxy for ease of communication between leaders) could be important for reaching consensus. Additionally, groups searching for consensus can be confounded by noisy signals in which individuals are given false information about the actions of their fellow group members. We tested the effects of the geodesic distance between leaders (ranging from 1 to 4) and of noise (at levels of 0%, 5%, and 10%) by recruiting participants (N=3,456) for a set of experiments (n=216 groups). We find that noise makes groups less likely to reach consensus, and the groups that do reach consensus take longer to find it. We find that leadership changes the behavior of both leaders and followers in important ways (for instance, being labeled a leader makes people more likely to 'go with the flow'). However, we find no evidence that the distance between leaders is a significant factor in the probability of reaching consensus. While other network properties of leaders undoubtedly impact consensus formation, the distance between leaders in network sub-groups appears not to matter.
Matthew I. Jones, Zachary Winkeler
Graph colorings have been of interest to mathematicians for a long time, but relatively recently, social scientists have also found them to be interesting tools for studying group behavior. In the last 20 years, scientists have begun to study how coloring problems can be solved by groups of individuals on a graph, which has led to new insights into network structure, group dynamics, and individual human behavior. Despite this newfound utility, the exact nature of these distributed coloring problems is not well-understood, and established mathematical tools like the chromatic polynomial miss the unique challenges that arise in these social problem-solving situations with limited information. In this paper, we provide a new framework for understanding these distributed problems by defining a new kind of graph coloring with particular relevance to consensus formation on networks, in which all vertices are trying to agree on a common color. These strict gridlock colorings represent roadblocks to consensus where the group will not reach a uniform coloring using natural update processes. We describe a recurrence relation that provides an algorithm for counting these gridlocked colorings, which establishes a mathematical measure of how much a given graph hinders consensus in a group.
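To make the counting idea concrete, the brute-force sketch below enumerates colorings that are locally stable yet not monochromatic, under one plausible (and here assumed) formalization of gridlock; the paper's precise definition and its recurrence relation are not reproduced.

```python
# Brute-force count of 'gridlocked' colorings on a small graph: non-uniform
# colorings in which no vertex can switch to a color that strictly increases
# agreement with its neighbors (assumed formalization, for illustration only).
from itertools import product

def count_gridlocked(edges, n_vertices, n_colors=2):
    adj = {v: [] for v in range(n_vertices)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def agreement(coloring, v, c):
        return sum(coloring[u] == c for u in adj[v])

    count = 0
    for coloring in product(range(n_colors), repeat=n_vertices):
        if len(set(coloring)) == 1:
            continue  # already at consensus, not a gridlock
        stable = all(
            agreement(coloring, v, coloring[v]) >= agreement(coloring, v, c)
            for v in range(n_vertices) for c in range(n_colors)
        )
        count += stable
    return count

# Example: two-color gridlocked configurations on a 6-cycle.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
print(count_gridlocked(cycle_edges, 6))
```

A recurrence over structured graph families, as described above, replaces this exponential enumeration with a count that can be computed efficiently.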
Elizabeth J Cross, Samuel J Gibson, Matthew R Jones, Daniel J Pitchforth, Sikai Zhang, Timothy J Rogers
The use of machine learning in Structural Health Monitoring is becoming more common, as many of the inherent tasks (such as regression and classification) in developing condition-based assessment fall naturally into its remit. This chapter introduces the concept of physics-informed machine learning, where one adapts ML algorithms to account for the physical insight an engineer will often have of the structure they are attempting to model or assess. The chapter will demonstrate how grey-box models, which combine simple physics-based models with data-driven ones, can improve predictive capability in an SHM setting. A particular strength of the approach demonstrated here is the capacity of the models to generalise, with enhanced predictive capability in different regimes. This is a key issue when life-time assessment is a requirement, or when monitoring data do not span the operational conditions a structure will undergo. The chapter will provide an overview of physics-informed ML, introducing a number of new approaches for grey-box modelling in a Bayesian setting. The main ML tool discussed will be Gaussian process regression; we will demonstrate how physical assumptions/models can be incorporated through constraints, through the mean function and kernel design, and finally in a state-space setting. A range of SHM applications will be demonstrated, from loads monitoring tasks for offshore and aerospace structures, through to performance monitoring for long-span bridges.