A presentation at DevNexus in Atlanta, GA, USA by Brian Demers
Testing on Thin Ice: Chipping Away at Test Unpredictability
DevNexus 2025 | Testing on Thin Ice
LOTS OF RERUN TESTS
Brian Demers: Java Champion, Developer Advocate, @bdemers
François Martin: Senior Full Stack Software Engineer, @martinfrancois
$36 Billion/year (USD): lost productivity
$40 Billion/year (USD): end world hunger by 2030
🦋bdemers.io | bdemers
Topics
● What is a Flaky Test?
● How to find them
● How to deal with them
● How to fix them
Slides & eBook
🦋fmartin.ch | FrançoisMartin
What is a Flaky Test? No Flake News!
Definition: A test that produces both passing and failing results for the same code when run multiple times
Timing related
Example:
1. Enter search term into textbox
2. Click on button “Search”
3. Wait for one second
4. Click on first result
Test data related
Examples:
● Input field for first name doesn’t allow special characters
  ○ Some randomly generated first names in tests contain special characters (“François”)
● Tests reliant on predefined test data in the database
  ○ Dependencies between tests
  ○ Failing tests cause inconsistencies
Infrastructure related
● Unstable infrastructure
● Tests time out
● Most difficult to reproduce
Examples:
● Build runners under heavy load
● Microservices with networking issues
Resource contention related
Mostly when running tests in parallel
Examples: Multiple tests trying to…
● … write to the same file at the same time
  ○ File lock
  ○ Concurrent modification errors
● … create a user with the same username at the same time
  ○ Unique constraint violations
Why Worry about Flaky Tests?
● Re-running tests wastes a lot of time
● Repeated investigation is needed to determine whether a failure is flaky or legitimate
● Team starts losing trust in test results
● Bad software quality
● Pilots frequently ignoring an alarm in Boeing 737 cockpits resulted in a fatal plane crash in 2005
Unit Tests vs Integration Tests (UT vs IT)
Testing Pyramid (top to bottom): End-to-End Tests, Integration Tests, Unit Tests
Flaky Foundations: Unit Tests, Integration Tests, End-to-End Tests
[Figure: Unit Tests, Integration Tests, and End-to-End Tests; axes: Complexity, Flakiness]
How to find Flaky Tests: Flake Busters
Running tests in parallel
Can reveal many issues:
● Resource contention
● Dependency on test data
● Dependency on test execution order
Side benefit: Tests execute faster
● Playwright & WebdriverIO: parallel by default
● Cypress: only in cloud (paid)
● JUnit and Gradle: through configuration
Rerun Tests: Most test frameworks allow executing a test multiple times.
Maven Surefire Plugin
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <!-- Rerun each failing test up to 3 times -->
        <rerunFailingTestsCount>3</rerunFailingTestsCount>
        <!-- 0 = do not fail the build because of flaky tests -->
        <failOnFlakeCount>0</failOnFlakeCount>
      </configuration>
    </plugin>
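The rerun idea can be sketched in plain Java. This is a minimal illustration of rerunning a failing test and classifying the outcome, not how Surefire implements it; the class `RerunClassifier`, the `Outcome` enum, and the `classify` method are hypothetical names chosen for this example.

```java
import java.util.function.BooleanSupplier;

/** Illustrative sketch only: rerun a failing test and classify the result. */
public final class RerunClassifier {

    /** Possible outcomes after applying the rerun policy. */
    public enum Outcome { PASSED, FLAKY, FAILED }

    /**
     * Runs the test once, then reruns it up to maxReruns times while it keeps failing.
     * A test that fails first and passes on a rerun is reported as FLAKY.
     */
    public static Outcome classify(BooleanSupplier test, int maxReruns) {
        if (test.getAsBoolean()) {
            return Outcome.PASSED;
        }
        for (int rerun = 0; rerun < maxReruns; rerun++) {
            if (test.getAsBoolean()) {
                return Outcome.FLAKY;
            }
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical test that fails on its first run only.
        BooleanSupplier failsOnce = () -> ++calls[0] > 1;
        System.out.println(classify(failsOnce, 3)); // prints FLAKY
    }
}
```

A test that always fails goes through every rerun before it is reported as FAILED, so each rerun adds to the time until a real failure becomes visible.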
Rerun Gotcha: Increased time to fail
Dealing with Flaky Tests
Prevention
● In Pull / Merge Requests, run new or changed tests 3, 10, 50, or 1000 times (depending on resources and test type)
● Alternatively, run all tests multiple times nightly
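Repeating a test before merging can be sketched as follows. This is a minimal plain-Java illustration under stated assumptions: `RepeatRunner`, `countFailures`, and `isFlaky` are hypothetical names, and a test is modeled as a `BooleanSupplier` that returns `true` when it passes.

```java
import java.util.function.BooleanSupplier;

/** Illustrative sketch only: run a test many times and report how often it fails. */
public final class RepeatRunner {

    /** Runs the test the given number of times and returns how many runs failed. */
    public static int countFailures(BooleanSupplier test, int runs) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            boolean passed;
            try {
                passed = test.getAsBoolean();
            } catch (RuntimeException | AssertionError e) {
                // An exception or failed assertion counts as a failed run.
                passed = false;
            }
            if (!passed) {
                failures++;
            }
        }
        return failures;
    }

    /** A test counts as flaky here if some runs pass and some runs fail. */
    public static boolean isFlaky(BooleanSupplier test, int runs) {
        int failures = countFailures(test, runs);
        return failures > 0 && failures < runs;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical test that fails on every 10th run.
        BooleanSupplier test = () -> ++calls[0] % 10 != 0;
        System.out.println(countFailures(test, 50) + " of 50 runs failed"); // prints 5 of 50 runs failed
    }
}
```

In JUnit 5, the `@RepeatedTest` annotation offers a built-in way to repeat a single test method.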
Tell Your Team!
● Don’t let them waste time
● Uncover deeper issues
● Saves morale
New Sprint Checklist
● Fix a flaky test
● Pull a new issue
Flaky Test Days
Temporarily Quarantine Tests
    // JUnit
    @Disabled("quarantine: reason or link to issue here")
    @Test
    void flakyTest() { }

    // Jest
    // quarantine: reason or link to issue here
    it.skip('should throw an error', () => {
      expect(response).toThrowError(expected_error);
    });
Quarantine
● Quarantine a test after its first failure
● Fix quarantined tests promptly
● Set a limit for the number of tests in quarantine
● Define a time limit for quarantined tests
Delete Flaky Tests
● Does the test provide value?
● Can it be fixed?
● Keeping a flaky test provides negative value
SSH into CI Agent
    $ ssh ci-agent
    $ cd /your-project
    # rerun build
    $ ./mvnw clean install
How to fix Flaky Tests: Defeating Murphy’s Law
Test Data Consistency
    // JUnit 5
    @BeforeEach
    void setup() {
        // Create test data
    }

    @AfterEach
    void teardown() {
        // Delete test data
    }

    // Jest
    beforeEach(() => {
      // Create test data
    });

    afterEach(() => {
      // Delete test data
    });
Test Data Consistency: Using Testcontainers
    // build.gradle
    testImplementation "org.testcontainers:postgresql:1.19.8"
    testImplementation "org.testcontainers:junit-jupiter:1.19.8"

    // JUnit Test
    import org.testcontainers.containers.PostgreSQLContainer;
    import org.testcontainers.junit.jupiter.Container;
    import org.testcontainers.junit.jupiter.Testcontainers;

    @Testcontainers
    public class IntegrationTest {
        // Starts a PostgreSQL container that is managed by the test lifecycle
        @Container
        PostgreSQLContainer<?> postgresContainer =
            new PostgreSQLContainer<>("postgres:16.3");
    }
Isolate Tests: Ports, Source Data, Test Data
Don’t Reuse System Resources
● Setting system properties affects the whole JVM
    System.setProperty("key", "value");
● Use unique files for each test
    temp directories, output files, etc.
● Use a random free ephemeral port (49152 - 65535)
    new ServerSocket(0).getLocalPort()
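The three points above can be sketched with standard JDK calls only. The class `IsolatedResources` and its method names are hypothetical helpers for this example, not part of any test framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch only: helpers that give each test its own resources. */
public final class IsolatedResources {

    /**
     * Asks the OS for a currently free port by binding to port 0, then releases it.
     * Another process could still grab the port before the test uses it.
     */
    public static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a fresh, uniquely named temporary directory for one test. */
    public static Path newTempDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the action with a system property set, then restores the previous value. */
    public static void withSystemProperty(String key, String value, Runnable action) {
        String previous = System.getProperty(key);
        System.setProperty(key, value);
        try {
            action.run();
        } finally {
            if (previous == null) {
                System.clearProperty(key);
            } else {
                System.setProperty(key, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Free port: " + findFreePort());
        System.out.println("Temp dir: " + newTempDir("flaky-demo-"));
        withSystemProperty("demo.key", "value",
                () -> System.out.println("demo.key=" + System.getProperty("demo.key")));
    }
}
```

Restoring the system property in a `finally` block keeps one test's setting from leaking into the next, although tests running in parallel in the same JVM would still see the temporary value.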
In E2E Tests
● Use deep links to navigate directly to the page to be tested
● Start the test with the test user already logged in
● Make it possible to run each test spec independently
● Do not wait for fixed time intervals; wait for a condition to be fulfilled, with a timeout
● Use stable test ids in selectors
● Consider mocking external APIs
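Waiting for a condition instead of a fixed interval can be sketched as a small polling helper. This is a minimal plain-Java illustration; `WaitFor` and `waitUntil` are hypothetical names, and in a real E2E suite the condition would check the page (for example, whether the first search result is visible).

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Illustrative sketch only: poll a condition until it holds or a timeout expires. */
public final class WaitFor {

    /**
     * Checks the condition repeatedly, sleeping pollInterval between checks.
     * Returns true as soon as the condition holds, false once the timeout has passed.
     */
    public static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Hypothetical condition that becomes true after roughly 200 ms.
        boolean ready = waitUntil(
                () -> System.nanoTime() - start > 200_000_000L,
                Duration.ofSeconds(2),
                Duration.ofMillis(20));
        System.out.println("Condition met: " + ready);
    }
}
```

The test continues as soon as the condition holds, so it is both faster than a fixed one-second wait on a quick machine and more tolerant on a slow one. Libraries such as Awaitility or Selenium's `WebDriverWait` provide this pattern ready-made.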
Convert ITs and E2E into UTs
[Figure: Unit Tests, Integration Tests, and End-to-End Tests; axis: Flakiness]
Convert ITs and E2E into UTs
[Figure: Unit Tests, Integration Tests, and End-to-End Tests; axis: Stability]
Fix the Culture of Flaky Tests
● Brown Bag / Lunch & Learn
● Make flaky tests visible
● Track them
● Use manager terms 💰
THANKS! QUESTIONS?
bdemers.io | bdemers
fmartin.ch | FrançoisMartin
CREDITS: This presentation template was created by Slidesgo and includes icons by Flaticon, infographics and images by Freepik, and images generated by OpenAI’s DALL-E model via ChatGPT.
Slides & eBook
Ever tried to catch melting snowflakes? That’s the challenge of dealing with flaky tests: those annoying, unpredictable tests that fail randomly and pass when rerun. In this talk, we’ll slide down the slippery slope of why flaky tests are more than just a nuisance. They’re time-sinks, morale crushers, and silent code quality killers.
We’ll skate across real-life scenarios to understand how flaky tests can freeze your development in its tracks, and why sweeping them under the rug is like ignoring a crack in the ice. From delayed releases to lurking bugs, the stakes are high, and the costs are real.
But don’t pack your parkas just yet! We’re here to share expert strategies and insights on how to identify, analyze, and ultimately melt away these flaky tests. Through our combined experience, we’ll provide actionable techniques and tools to make sure snow is the only flakiness you experience, ensuring a smoother, more reliable journey in software development.