The Effects of Computational Resources on Flaky Tests
/ Authors
/ Abstract
Flaky tests disrupt efficient continuous integration pipelines. When a flaky test fails, developers must decide whether to examine the failure carefully (to determine if it is a true failure or a false alarm) or simply to ignore it. While much research has focused on how to detect flaky tests, relatively little recent work has examined how to make flaky tests more reliable (showing fewer flaky failures and wasting less time). We hypothesize that, regardless of the underlying root cause of flakiness, it may be possible to reduce the rate of flaky failures by providing more computational resources (e.g., CPUs and memory) to test-running infrastructure. This talk describes our ongoing research to test this hypothesis [1]. Our methodology is to repeatedly execute the test suites of software projects under different resource configurations, collect the flaky failure rate of each test under each configuration, and then perform a statistical analysis to identify tests with significantly different failure rates under different resource constraints. We consider some "extreme" resource-constrained configurations (e.g., with as little as 10% of a single CPU), in addition to configurations that map to those offered by cloud CI providers (e.g., with 2 or 4 CPUs and 512 MiB to 16 GiB of RAM). On our dataset of 52 open-source projects written in Java, JavaScript, and Python, we find that many tests are in fact highly dependent on the number of CPU cores and the amount of memory available. Of course, provisioning more resources comes at a higher cost, so a practical solution is needed to identify a test-running "sweet spot" that minimizes flaky failures at minimal resource cost. SaaS CI pricing can amplify this cost: an 8-CPU builder might cost 10 times more than a 2-CPU builder.
This talk will describe our methodology for analyzing this cost/flakiness trade-off, and practical steps that engineers can use to reduce test flakiness rates.
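As a rough illustration of the statistical step in the methodology, the sketch below compares one test's failure counts under two resource configurations. The abstract does not name the statistical test used, so the choice of a two-sided Fisher exact test is an assumption here, and the function name, configuration labels, and failure counts are all hypothetical and invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(fail_a, runs_a, fail_b, runs_b):
    """Two-sided Fisher exact p-value for failures under two configurations.

    Assumption: this test is one plausible choice for comparing failure
    rates; it is not confirmed to be the analysis used by the authors.
    """
    total_fail = fail_a + fail_b
    total_runs = runs_a + runs_b
    denom = comb(total_runs, total_fail)

    def prob(k):
        # Hypergeometric probability that configuration A accounts for
        # exactly k of the total failures, with all margins held fixed.
        return comb(runs_a, k) * comb(runs_b, total_fail - k) / denom

    lowest = max(0, total_fail - runs_b)
    highest = min(runs_a, total_fail)
    observed = prob(fail_a)
    # Sum the probabilities of every table at least as unlikely as the
    # observed one (small tolerance guards against float rounding).
    p_value = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical (failures, runs) for a single test; not data from the study.
results = {"0.1 CPU": (30, 100), "2 CPUs": (2, 100)}

p = fisher_exact_two_sided(*results["0.1 CPU"], *results["2 CPUs"])
print(f"p-value for differing failure rates: {p:.3g}")
```

A small p-value would suggest that the test's failure rate depends on the resource configuration. When many tests are compared this way, a multiple-comparison correction would likely be needed before drawing conclusions.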
Journal: 2024 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)