Tat-Thang Vo, Ting Ye, Ashkan Ertefaie, Samrat Roy, James Flory, Sean Hennessy, Stijn Vansteelandt, Dylan S. Small
In the standard difference-in-differences research design, the parallel trends assumption may be violated when the relationship between the exposure trend and the outcome trend is confounded by unmeasured confounders. Progress can be made if there is an exogenous variable that (i) does not directly influence the change in outcome means (i.e. the outcome trend) except through influencing the change in exposure means (i.e. the exposure trend), and (ii) is not related to the unmeasured exposure-outcome confounders on the trend scale. Such an exogenous variable is called an instrument for difference-in-differences. For continuous outcomes that lend themselves to linear modelling, so-called instrumented difference-in-differences methods have been proposed. In this paper, we propose novel multiplicative structural mean models for instrumented difference-in-differences, which allow one to identify and estimate the average treatment effect on count and rare binary outcomes, in the whole population or among the treated, when a valid instrument for difference-in-differences is available. We discuss the identifiability of these models, then develop efficient semi-parametric estimation approaches that allow the use of flexible, data-adaptive or machine learning methods to estimate the nuisance parameters. We apply our proposal to health care data to investigate the risk of moderate to severe weight gain under sulfonylurea treatment compared to metformin treatment among new users of antihyperglycemic drugs.
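The instrumented methods above extend the basic difference-in-differences contrast; as background, here is a minimal sketch of the classic 2x2 estimator on the mean scale (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def did_estimate(y_pre_treat, y_post_treat, y_pre_ctrl, y_post_ctrl):
    """Classic 2x2 difference-in-differences: the change in the treated
    group's mean outcome minus the change in the control group's mean,
    which identifies the treatment effect under parallel trends."""
    treated_change = np.mean(y_post_treat) - np.mean(y_pre_treat)
    control_change = np.mean(y_post_ctrl) - np.mean(y_pre_ctrl)
    return treated_change - control_change
```

The instrument for difference-in-differences described in the abstract is needed precisely when the parallel trends assumption behind this simple contrast is doubted.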
Mingyuan Zhang, Marshall M. Joffe, Dylan S. Small
Most of the work on the structural nested model and g-estimation for causal inference in longitudinal data assumes a discrete-time underlying data generating process. However, in some observational studies, it is more reasonable to assume that the data are generated from a continuous-time process and are only observable at discrete time points. When these circumstances arise, the sequential randomization assumption in the observed discrete-time data, which is essential in justifying discrete-time g-estimation, may not be reasonable. Under a deterministic model, we discuss other useful assumptions that guarantee the consistency of discrete-time g-estimation. In more general cases, when those assumptions are violated, we propose a controlling-the-future method that performs at least as well as g-estimation in most scenarios and which provides consistent estimation in some cases where g-estimation is severely inconsistent. We apply the methods discussed in this paper to simulated data, as well as to a data set collected following a massive flood in Bangladesh, estimating the effect of diarrhea on children's height. Results from different methods are compared in both simulation and the real application.
Kenneth E. Shirley, Dylan S. Small, Kevin G. Lynch, Stephen A. Maisto, David W. Oslin
In a clinical trial of a treatment for alcoholism, a common response variable of interest is the number of alcoholic drinks consumed by each subject each day, or an ordinal version of this response, with levels corresponding to abstinence, light drinking and heavy drinking. In these trials, within-subject drinking patterns are often characterized by alternating periods of heavy drinking and abstinence. For this reason, many statistical models for time series that assume steady behavior over time and white noise errors do not fit alcohol data well. In this paper we propose to describe subjects' drinking behavior using Markov models and hidden Markov models (HMMs), which are better suited to describe processes that make sudden, rather than gradual, changes over time. We incorporate random effects into these models using a hierarchical Bayes structure to account for correlated responses within subjects over time, and we estimate the effects of covariates, including a randomized treatment, on the outcome in a novel way. We illustrate the models by fitting them to a large data set from a clinical trial of the drug Naltrexone. The HMM, in particular, fits this data well and also contains unique features that allow for useful clinical interpretations of alcohol consumption behavior.
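The paper fits Markov and hidden Markov models with random effects in a hierarchical Bayes framework; as a much simpler illustration of the core HMM likelihood computation, here is a scaled forward-algorithm log-likelihood for a discrete-emission HMM (a generic textbook routine, not the paper's model):

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete-emission
    HMM, computed with the scaled forward algorithm.
    obs: list of symbol indices; pi: initial state probabilities;
    A[i, j]: transition probability from state i to state j;
    B[i, k]: probability that state i emits symbol k."""
    pi, A, B = np.asarray(pi), np.asarray(A), np.asarray(B)
    alpha = pi * B[:, obs[0]]          # joint prob. of state and first symbol
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()        # rescale to avoid numerical underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate states, weight by emission
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll
```

As a sanity check, with two states, two symbols, and all probabilities uniform at 0.5, any sequence of length T has likelihood 0.5^T.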
Kai Zhang, Dylan S. Small
Comment on ``The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation'' [arXiv:0910.3752]
Bingkai Wang, Chan Park, Dylan S. Small, Fan Li
Cluster-randomized experiments are increasingly used to evaluate interventions in routine practice conditions, and researchers often adopt model-based methods with covariate adjustment in the statistical analyses. However, the validity of model-based covariate adjustment is unclear when the working models are misspecified, leading to ambiguity of estimands and risk of bias. In this article, we first adapt two conventional model-based methods, generalized estimating equations and linear mixed models, with weighted g-computation to achieve robust inference for cluster-average and individual-average treatment effects. To further overcome the limitations of model-based covariate adjustment methods, we propose an efficient estimator for each estimand that allows for flexible covariate adjustment and additionally addresses cluster size variation dependent on treatment assignment and other cluster characteristics. Such cluster size variations often occur post-randomization and, if ignored, can lead to bias of model-based estimators. For our proposed efficient covariate-adjusted estimator, we prove that when the nuisance functions are consistently estimated by machine learning algorithms, the estimator is consistent, asymptotically normal, and efficient. When the nuisance functions are estimated via parametric working models, the estimator is triply-robust. Simulation studies and analyses of three real-world cluster-randomized experiments demonstrate that the proposed methods are superior to existing alternatives.
Dylan S. Small, Dan Firth, Luke Keele, Matthew Huber, Molly Passarella, Scott Lorch, Heather Burris
Surface mining has become a major method of coal mining in Central Appalachia alongside the traditional underground mining. Concerns have been raised about the health effects of this surface mining, particularly mountaintop removal mining where coal is mined upon steep mountaintops by removing the mountaintop through clearcutting forests and explosives. We have designed a matched observational study to assess the effects of surface mining in Central Appalachia on adverse birth outcomes. This protocol describes the study's background and motivation, sample selection, and analysis plan.
Siyu Heng, Hyunseung Kang, Dylan S. Small, Colin B. Fogarty
In many observational studies, the interest is in the effect of treatment on bad, aberrant outcomes rather than the average outcome. For such settings, the traditional approach is to define a dichotomous outcome indicating aberration from a continuous score and use the Mantel-Haenszel test with matched data. For example, studies of determinants of poor child growth use the World Health Organization's definition of child stunting being height-for-age z-score $\leq -2$. The traditional approach may lose power because it discards potentially useful information about the severity of aberration. We develop an adaptive approach that makes use of this information and asymptotically dominates the traditional approach. We develop our approach in two parts. First, we develop an aberrant rank approach in matched observational studies and prove a novel design sensitivity formula enabling its asymptotic comparison with the Mantel-Haenszel test under various settings. Second, we develop a new, general adaptive approach, the two-stage programming method, and use it to adaptively combine the aberrant rank test and the Mantel-Haenszel test. We apply our approach to a study of the effect of teenage pregnancy on stunting.
Pallavi Basu, Dylan S. Small
Difference-in-differences analysis with a control group that differs considerably from a treated group is vulnerable to bias from historical events that have different effects on the groups. Constructing a more closely matched control group by matching a subset of the overall control group to the treated group may result in less bias; we study this phenomenon in simulation studies. We then study the effect of mountaintop removal mining (MRM) on mortality using a difference-in-differences analysis that makes use of the increase in MRM following the 1990 Clean Air Act Amendments. For this analysis, we constructed a more closely matched control group and found a 95\% confidence interval that contains substantial adverse effects along with no effect and small beneficial effects.
Timothy G. Gaulton, Sameer K. Deshpande, Dylan S. Small, Mark D. Neuman
American football is the most popular high school sport and is among the leading causes of injury among adolescents. While there has been considerable recent attention on the link between football and cognitive decline, there is also evidence of higher than expected rates of pain, obesity, and lower quality of life among former professional players, either as a result of repetitive head injury or through different mechanisms. Previously hidden downstream effects of playing football may have far-reaching public health implications for participants in youth and high school football programs. Our proposed study is a retrospective observational study that compares 1,153 high school males who played varsity football with 2,751 male students who did not. 1,951 of the control subjects did not play any sport and the remaining 800 controls played a non-contact sport. Our primary outcome is self-rated health measured at age 65. To control for potential confounders, we adjust for pre-exposure covariates with matching and model-based covariance adjustment. We will conduct an ordered testing procedure designed to use the full pool of 2,751 controls while also controlling for possible unmeasured differences between students who played sports and those who did not. We will quantitatively assess the sensitivity of the results to potential unmeasured confounding. The study will also assess secondary outcomes of pain, difficulty with activities of daily living, and obesity, as these are important both to individual well-being and to public health.
Siyu Heng, Dylan S. Small
In observational studies, it is typically unrealistic to assume that treatments are randomly assigned, even conditional on adjusting for all observed covariates. Therefore, a sensitivity analysis is often needed to examine how hidden biases due to unobserved covariates would affect inferences on treatment effects. In matched observational studies where each treated unit is matched to one or multiple untreated controls for observed covariates, the Rosenbaum bounds sensitivity analysis is one of the most popular sensitivity analysis models. In this paper, we show that in the presence of interactions between observed and unobserved covariates, directly applying the Rosenbaum bounds will almost inevitably exaggerate the report of sensitivity of causal conclusions to hidden bias. We give sharper odds ratio bounds to fix this deficiency. We illustrate our new method through studying the effect of anger/hostility tendency on the risk of having heart problems.
Kwonsang Lee, Bhaswar B. Bhattacharya, Jing Qin, Dylan S. Small
Instrumental variable methods allow for inference about the treatment effect by controlling for unmeasured confounding in randomized experiments with noncompliance. However, many studies do not consider the observed compliance behavior in the testing procedure, which can lead to a loss of power. In this paper, we propose a novel nonparametric likelihood approach, referred to as the binomial likelihood (BL) method, that incorporates information on compliance behavior while overcoming several limitations of previous techniques and utilizing the advantages of likelihood methods. Our proposed method produces proper estimates of the counterfactual distribution functions by maximizing the binomial likelihood over the space of distribution functions. Using this, we propose two versions of a binomial likelihood ratio test for the null hypothesis of no treatment effect. We show that both versions are more powerful than existing methods in detecting any distributional change in finite samples, and are asymptotically equivalent to the two-sample Anderson-Darling test. We also develop an efficient algorithm for computing our estimates, and apply the binomial likelihood method to a study of the effect of Medicaid coverage on mental health using the Oregon Health Insurance Experiment.
Kwonsang Lee, Dylan S. Small
Malaria is a parasitic disease that is a major health problem in many tropical regions. The most characteristic symptom of malaria is fever. The fraction of fevers that are attributable to malaria, the malaria attributable fever fraction (MAFF), is an important public health measure for assessing the effect of malaria control programs and other purposes. Estimating the MAFF is not straightforward because there is no gold standard diagnosis of a malaria attributable fever; an individual can have malaria parasites in her blood and a fever, but the individual may have developed partial immunity that allows her to tolerate the parasites and the fever is being caused by another infection. We define the MAFF using the potential outcome framework for causal inference and show what assumptions underlie current estimation methods. Current estimation methods rely on an assumption that the parasite density is correctly measured. However, this assumption does not generally hold because (i) fever kills some parasites and (ii) the measurement of parasite density has measurement error. In the presence of these problems, we show current estimation methods do not perform well. We propose a novel maximum likelihood estimation method based on exponential family g-modeling. Under the assumption that the measurement error mechanism and the magnitude of the fever killing effect are known, we show that our proposed method provides approximately unbiased estimates of the MAFF in simulation studies. A sensitivity analysis can be used to assess the impact of different magnitudes of fever killing and different measurement error mechanisms. We apply our proposed method to estimate the MAFF in Kilombero, Tanzania.
Hao Chen, Dylan S. Small
We propose new tests for assessing whether covariates in a treatment group and matched control group are balanced in observational studies. The tests exhibit high power under a wide range of multivariate alternatives, some of which existing tests have little power for. The asymptotic permutation null distributions of the proposed tests are studied and the p-values calculated through the asymptotic results work well in finite samples, facilitating the application of the test to large data sets. The tests are illustrated in a study of the effect of smoking on blood lead levels. The proposed tests are implemented in an R package BalanceCheck.
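The paper's tests are designed for multivariate alternatives; as a simpler illustration of permutation-based balance checking, the sketch below uses the maximum absolute standardized mean difference across covariates as the test statistic (this choice of statistic, and all names, are assumptions for illustration, not the paper's tests):

```python
import numpy as np

def balance_pvalue(X, z, n_perm=1000, seed=0):
    """Permutation p-value for covariate balance between treated (z = 1)
    and control (z = 0) groups. Statistic: maximum absolute standardized
    mean difference over the columns of X."""
    X, z = np.asarray(X, float), np.asarray(z)
    rng = np.random.default_rng(seed)

    def stat(zz):
        diff = X[zz == 1].mean(axis=0) - X[zz == 0].mean(axis=0)
        pooled_sd = np.sqrt((X[zz == 1].var(axis=0, ddof=1)
                             + X[zz == 0].var(axis=0, ddof=1)) / 2)
        return np.max(np.abs(diff / pooled_sd))

    obs = stat(z)
    perms = [stat(rng.permutation(z)) for _ in range(n_perm)]
    # Add-one correction keeps the p-value strictly positive.
    return (1 + sum(p >= obs for p in perms)) / (n_perm + 1)
```

A small p-value indicates imbalance beyond what random group labels would produce; the paper's tests target a wider range of multivariate alternatives than this univariate-maximum statistic.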
Kwonsang Lee, Scott A. Lorch, Dylan S. Small
Two problems that arise in making causal inferences for non-mortality outcomes such as bronchopulmonary dysplasia (BPD) are unmeasured confounding and censoring by death, i.e., the outcome is only observed when subjects survive. In randomized experiments with noncompliance, instrumental variable methods can be used to control for the unmeasured confounding when there is no censoring by death. However, when there is censoring by death, the average causal treatment effect cannot be identified under the usual assumptions, though it can be studied for a specific subpopulation by using a sensitivity analysis with additional assumptions. In observational studies, however, evaluation of the local average treatment effect (LATE) in censoring by death problems with unmeasured confounding is not well studied. We develop a novel sensitivity analysis method based on instrumental variable models for studying the LATE. Specifically, we present the identification results under an additional assumption, and propose a three-step procedure for LATE estimation. We also propose an improved two-step procedure that simultaneously estimates the instrument propensity score (i.e., the probability of the instrument given covariates) and the parameters induced by the assumption. We show with simulation studies that the two-step procedure can be more robust and efficient than the three-step procedure. Finally, we apply our sensitivity analysis methods to a study of the effect of delivery at high-level neonatal intensive care units on the risk of BPD.
Colin B. Fogarty, Mark E. Mikkelsen, David F. Gaieski, Dylan S. Small
Motivated by an observational study of the effect of hospital ward versus intensive care unit admission on severe sepsis mortality, we develop methods to address two common problems in observational studies: (1) when there is a lack of covariate overlap between the treated and control groups, how to define an interpretable study population wherein inference can be conducted without extrapolating with respect to important variables; and (2) how to use randomization inference to form confidence intervals for the average treatment effect with binary outcomes. Our solution to problem (1) incorporates existing suggestions in the literature while yielding a study population that is easily understood in terms of the covariates themselves, and can be solved using an efficient branch-and-bound algorithm. We address problem (2) by solving a linear integer program to utilize the worst case variance of the average treatment effect among values for unobserved potential outcomes that are compatible with the null hypothesis. Our analysis finds no evidence of a difference in sixty-day mortality rates between admitting all patients to the ICU and admitting all patients to the hospital ward, either among less severely ill patients or among patients with cryptic septic shock. We implement our methodology in R, providing scripts in the supplementary material.
Hyunseung Kang, Benno Kreuels, Jürgen May, Dylan S. Small
Most previous studies of the causal relationship between malaria and stunting have been studies where potential confounders are controlled via regression-based methods, but these studies may have been biased by unobserved confounders. Instrumental variables (IV) regression offers a way to control for unmeasured confounders where, in our case, the sickle cell trait can be used as an instrument. However, for the instrument to be valid, it may still be important to account for measured confounders. The most commonly used instrumental variable regression method, two-stage least squares, relies on parametric assumptions on the effects of measured confounders to account for them. Additionally, two-stage least squares lacks transparency with respect to covariate balance and weighting of subjects and does not blind the researcher to the outcome data. To address these drawbacks, we propose an alternative method for IV estimation based on full matching. We evaluate our new procedure on simulated data and real data concerning the causal effect of malaria on stunting among children. We estimate that the risk of stunting among children with the sickle cell trait decreases by 0.22 times the average number of malaria episodes prevented by the sickle cell trait, a substantial effect of malaria on stunting (p-value: 0.011, 95% CI: 0.044, 1).
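For contrast with the full-matching proposal, here is a minimal numpy sketch of the two-stage least squares point estimate the abstract refers to (the function name and interface are illustrative):

```python
import numpy as np

def tsls(y, d, z, X=None):
    """Two-stage least squares: instrument z, exposure d, outcome y,
    optional covariate matrix X; an intercept is added automatically.
    Returns the estimated effect of d on y."""
    y, d, z = (np.asarray(a, float) for a in (y, d, z))
    W = np.ones((len(y), 1)) if X is None else np.column_stack([np.ones(len(y)), X])
    # Stage 1: regress the exposure on the instrument and covariates.
    Z1 = np.column_stack([z, W])
    d_hat = Z1 @ np.linalg.lstsq(Z1, d, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted exposure and covariates.
    D2 = np.column_stack([d_hat, W])
    return np.linalg.lstsq(D2, y, rcond=None)[0][0]
```

When covariates enter stage 1 and stage 2 linearly, as here, the estimate embodies exactly the parametric assumptions on measured confounders that the abstract's matching-based alternative is designed to relax.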
Jeffrey Zhang, Zhe Chen, Katherine R. Courtright, Scott D. Halpern, Michael O. Harhay, Dylan S. Small, Fan Li
While palliative care is increasingly delivered to hospitalized patients with serious illnesses, few studies have estimated its causal effects. Courtright et al. (2016) adopted a cluster-randomized stepped-wedge design to assess the effect of palliative care on a patient-centered outcome. The randomized intervention was a nudge to administer palliative care but did not guarantee receipt of palliative care, resulting in noncompliance (compliance rate of roughly 30%). A subsequent analysis using methods suited for standard trial designs produced statistically anomalous results, as an intention-to-treat analysis found no effect while an instrumental variable analysis did (Courtright et al., 2024). This highlights the need for a more principled approach to addressing noncompliance in stepped-wedge designs. We provide a formal causal inference framework for the stepped-wedge design with noncompliance by introducing a relevant causal estimand and corresponding estimators and inferential procedures. Through simulation, we compare an array of estimators across a range of stepped-wedge designs and provide practical guidance on choosing an analysis method. Finally, we apply our recommended methods to reanalyze the trial of Courtright et al. (2016), producing point estimates that suggest a larger effect than the original analysis of Courtright et al. (2024), but intervals that do not reach statistical significance.
Ruizhe Zhang, Jooyoung Kong, Dylan S. Small, William Bekerman
Adverse childhood experiences (ACEs) have been linked to a wide range of negative health outcomes in adulthood. However, few studies have investigated which specific combinations of ACEs most substantially impact mental health. In this article, we provide the protocol for our observational study of the effects of combinations of ACEs on adult depression. We use data from the 2023 Behavioral Risk Factor Surveillance System (BRFSS) to assess these effects. We will evaluate the replicability of our findings by splitting the sample into two discrete subpopulations of individuals. We employ data turnover for this analysis, enabling a single team of statisticians and domain experts to collaboratively evaluate the strength of evidence while integrating both qualitative and quantitative insights from exploratory data analysis. We outline our analysis plan using this method and conclude with a brief discussion of several specifics of our study.
Ruizhe Zhang, Jooyoung Kong, Dylan S. Small, William Bekerman
Adverse childhood experiences (ACEs) are categories of childhood abuse, neglect, and household dysfunction. Screening by a single additive ACE score (e.g., a $\ge 4$ cutoff) has poor individual-level discrimination. We instead identify replicable combinations of ACEs that elevate adult depression risk. Our data turnover framework enables a single research team to explore, confirm, and replicate within one observational dataset while controlling the family-wise error rate (FWER). We integrate isotonic subgroup selection (ISS) to estimate a higher-risk subgroup under a monotonicity assumption -- additional ACE exposure or higher intensity cannot reduce depression risk. We pre-specify a risk threshold $\tau$ corresponding to roughly a two-fold increase in the odds of depression relative to the no-ACE baseline. Within data turnover, the prespecified component improves power while maintaining FWER control, as demonstrated in simulations. Guided by exploratory data analysis, we adopt frequency coding for ACE items, retaining intensity information that reduces false positives relative to binary or score codings. The result is a replicable, pattern-based higher-risk subgroup. On held-out BRFSS 2022 data, we show that, at the same level of specificity (0.95), using our replicable subgroup as the screening rule increases sensitivity by 26\% compared with an ACE-score cutoff, yielding concrete triggers that are straightforward to implement and help target scarce clinical screening resources toward truly higher-risk profiles.
William Bekerman, Abhinandan Dalal, Carlo del Ninno, Dylan S. Small
Observational studies are valuable tools for inferring causal effects in the absence of controlled experiments. However, these studies may be biased due to the presence of some relevant, unmeasured set of covariates. One approach to mitigate this concern is to identify hypotheses likely to be more resilient to hidden biases by splitting the data into a planning sample for designing the study and an analysis sample for making inferences. We devise a powerful and flexible method for selecting hypotheses in the planning sample when an unknown number of outcomes are affected by the treatment, allowing researchers to gain the benefits of exploratory analysis and still conduct powerful inference under concerns of unmeasured confounding. We investigate the theoretical properties of our method and conduct extensive simulations that demonstrate pronounced benefits, especially at higher levels of allowance for unmeasured confounding. Finally, we demonstrate our method in an observational study of the multi-dimensional impacts of a devastating flood in Bangladesh.