How Good Automated Testing Supports GxP Compliance in Software Development
Automated testing goes beyond catching bugs or getting a green status bar in CI pipelines. When done right, it drives better software delivery, improves code quality, and enhances team productivity. However, when done poorly, it can hinder progress and inflate costs.
Wondering how GxP validation impacts clinical software development? Explore the key principles and processes behind ensuring compliance and quality in regulated environments.
Having spent over three years at Appsilon refining methods for testing R code, I’ve seen both sides of this equation. From quick two-week Shiny app prototypes to complex, months-long projects, automated testing has played a critical role in software delivery. Along the way, I contributed to the R community by creating the {cucumber} package, which simplifies writing high-level, user-focused behavior tests.
So, should you invest in automated testing? The answer isn’t always straightforward.
Let’s explore how to maximize the value of automated testing while balancing costs and benefits.
Watch the Video from the GxP Validation Summit
Economics of Test Automation
Automated tests can be integrated during development (the Agile way of creating software), written after software completion (the "classic" Waterfall approach), or outsourced to a QA team.
No matter what the approach is, the same rules apply:
- Writing good test code is difficult; it takes a while to learn.
- Maintaining bad test code is even harder.
Exploring how to set up automated tests for your Shiny apps? Learn how to build a strong foundation for reliable software delivery.
Good Tests
The introduction of automated testing has its costs. It takes time to learn testing, it takes time to write and maintain tests. We need to be aware of those costs.
Skipping testing in the early stages of a project may seem like a way to save time, but it can lead to significant costs down the line. While development without testing might feel faster initially, the benefits of testing often become apparent as the project progresses.
Good tests strike a balance. Their cost is outweighed by the savings they bring. These savings come from reduced manual testing, fewer debugging and troubleshooting sessions, and lower remediation costs for defects that would otherwise go unnoticed until later stages or early production.
With reliable tests in place, teams can shift their focus from time-intensive manual checks to delivering real value to users.
Bad Tests
But what happens if we don't succeed in building good tests?
Unfortunately, this is often discovered only in the later stages of a project, when the test suite has become large. At that point, tests become a burden and can halt the development process, as changing the production code requires changing a ton of tests.
If tests become too difficult or time-consuming to maintain, it's easy to give up on adding them. We might even be tempted to remove some of the test code because changing it is no longer feasible.
This situation is dangerous because it creeps in slowly and can be hard to notice early. But there are heuristics that can help us spot and avoid it, like the Testing Pyramid and separation of concerns.
Automate Testing or Not?
Building automated testing has its cost. Maintaining the test code has its cost. However, not having automated tests also has its cost.
As with everything, there are tradeoffs.
So what are the benefits of automated testing beyond possible savings on manual testing? What else could make it worth the investment?
Why Invest in Automated Testing?
The benefits of automated testing can be grouped into the following categories:
- Better code quality
- Documentation
- Reproducibility
- Scalability
- Reliability
Curious about validating Shiny apps in GxP environments? Discover insights into overcoming validation challenges in regulated industries.
Better code quality
There are two facets to the quality of software:
- Have we built the correct software?
- Have we built the software correctly?
Tests as Specification
If we are practicing a development process in which we write tests before writing production code, tests give us a way to capture what we expect from the system before we build it.
Tests allow us to define the system's behavior across different scenarios in a format that can be executed. To ensure we're "building the right software," it's essential that our tests accurately reflect how the system will be used in practice.
Thinking through these scenarios in sufficient detail to turn them into tests helps uncover areas where the requirements may be unclear or contradictory.
In this sense, tests help us reduce the risk of building the wrong thing.
Example
Let's imagine we want to allow users to get insights about the subjects in a given clinical trial.
We could start with this vague idea captured as a User Story:
As a User I want to see a visual representation of the subjects in a clinical trial so that I can easily understand characteristics of the population in the study.
This is vague, and it can't possibly be translated into software. We need to break it down into smaller, more precise pieces.
We might start by thinking about what those visuals should be, what data should be displayed, and how the user should interact with it.
We might end up with a list of requirements like:
- As a User, I want to see a histogram of subjects by treatment group.
- As a User, I want to see a distribution of subjects' ages.
- As a User, I want to see all subjects belonging to a specific treatment group.
Those requirements are still a bit vague, but now we are more specific about the value we want to provide to the user; we have a clearer idea of what we want to achieve.
The next step in making those specifications precise enough to translate into tests is to create concrete examples of what we want to achieve.
We could write those scenarios using a language called Gherkin, which is used in Behavior-Driven Development (BDD). It provides keywords like Given, When, and Then that help us structure the scenarios.
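For example, the requirement about viewing subjects in a specific treatment group could become a scenario like this (the study identifier, population, and group names are made-up values, purely for illustration):

```gherkin
Feature: Subject insights

  Scenario: Viewing subjects from a specific treatment group
    Given the study "CDISCPILOT01" is selected
    And the analysis population is set to "Safety"
    When I select the treatment group "Placebo"
    Then I see the list of subjects assigned to "Placebo"
```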
When working on examples like this, with real values, we'll discover exactly what we need in order to implement the feature.
For example, in the earlier versions of the requirements we didn't realize that without specifying the study and the population, we can't display the subjects at all.
This is an outside-in approach to testing, in which we start with the user's perspective and work our way down to the implementation details.
At this level, we don't even know if this is a Markdown report, a web app, or a Neuralink that uploads images into the user's mind. And that's the point. We don't care about the implementation details. We care about the user's perspective: whether they can get what they need, not how they get it.
Tests of this type can be implemented with just the {testthat} package or with {cucumber}.
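As a rough sketch of the {testthat} route: get_subjects() below is a hypothetical function standing in for whatever eventually implements the feature; the test asserts only the observable behavior, not how it's produced.

```r
library(testthat)

# get_subjects() is hypothetical: it represents whichever code ends up
# implementing the scenario, be it a Shiny module or a report chunk.
test_that("user sees subjects from a specific treatment group", {
  subjects <- get_subjects(
    study = "CDISCPILOT01",
    population = "Safety",
    treatment_group = "Placebo"
  )
  expect_gt(nrow(subjects), 0)
  expect_true(all(subjects$treatment_group == "Placebo"))
})
```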
Preparing for FDA submissions? Learn 7 essential categories of Good Software Engineering Practices to get started.
Behavior-Driven Development and the Pyramid
This approach to development and testing is called Behavior-Driven Development (BDD) because it puts a high emphasis on satisfying users' needs. It's a method for ensuring that we are building the right software, while unit tests of individual functions ensure that we are building the software correctly.
Documentation
I bet you've at least once come across a situation when you wanted to use a piece of code, looked at how it's used in one context and wanted to apply it in another, but it didn't work as expected. If not, you're lucky.
Tests can help us understand how the code is supposed to work by showing us how it behaves in expected and unexpected scenarios.
Even better, with tests there's an easily accessible place where the code runs. If we're not satisfied with the examples, we can plug in, run the debugger, and see step by step how the code works.
How's that different from just documentation?
Examples in documentation can easily get outdated and misleading. That's why, by default, examples in R package function documentation are run during package checks. But just running examples won't assert that they work correctly.
Tests can be the most precise documentation, because they not only check that the code runs, but that it runs correctly.
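For example, a pair of tests like the following documents both the happy path and the failure mode. normalize_scores() is a made-up function, used here only to illustrate the idea:

```r
library(testthat)

# The tests spell out the contract: what valid input produces,
# and how invalid input is rejected.
test_that("normalize_scores() scales values to the 0-1 range", {
  expect_equal(normalize_scores(c(0, 5, 10)), c(0, 0.5, 1))
})

test_that("normalize_scores() rejects non-numeric input", {
  expect_error(normalize_scores("not a number"), "numeric")
})
```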
Interested in automating GxP-compliant reports? See how we get it done to boost accuracy and efficiency in document generation.
Reproducibility
Mistakes happen, there's no way around it. It's what we do about this fact that matters.
When a bug is found in the software, we need to know what caused it.
If we have made our unit tests fairly small by testing only a single behavior in each, we should be able to pinpoint the bug pretty quickly based on which test is failing.
This is one of the big advantages of unit tests over manual tests. The manual test will tell us that some behavior expected by the user isn't working. The unit test will tell us why.
We call this Defect Localization: tests should not only tell us that something is wrong, but also precisely what failed. This makes the fix easier, as we save time on debugging and searching.
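For instance, keeping each test focused on a single behavior makes a failure self-explanatory. filter_by_age() below is hypothetical, for illustration only:

```r
library(testthat)

# Two narrow tests: whichever one fails points directly at the broken behavior.
test_that("filter_by_age() keeps subjects within the range", {
  adsl <- data.frame(AGE = c(34, 52, 71))
  expect_equal(filter_by_age(adsl, min = 40, max = 80)$AGE, c(52, 71))
})

test_that("filter_by_age() returns zero rows when no subject matches", {
  adsl <- data.frame(AGE = c(34, 52))
  expect_equal(nrow(filter_by_age(adsl, min = 90, max = 100)), 0)
})
```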
But this benefit only comes when we write good tests and there are enough of them. If a bug slips through uncaught, it's time to add a test for it.
All of those benefits are wonderful, but we won't get them if we don't write tests we can trust. It's critical that tests are as simple as possible, so that there are no bugs in the test code itself.
Another facet of reproducibility is that tests show us what's been implemented and they're proof that the software works.
When we have a test suite that covers all the features of the software, we can be confident that the software works as expected as it has already run in a specific environment.
If we're running our test suite in CI (as you always should), we have snapshots of the exact circumstances in which the software is runnable, and proof that it runs as expected.
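If the project is an R package hosted on GitHub, one low-effort way to get such a pipeline (one option among several) is to let {usethis} scaffold a standard workflow:

```r
# Scaffolds a GitHub Actions workflow that runs R CMD check,
# including the test suite, on every push and pull request.
usethis::use_github_action("check-standard")
```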
Scalability
Tests affect scalability in the context of growing the codebase.
Making changes in a codebase without automated tests is a risky business. We never know whether a change breaks something. Even if we're very careful, we might miss something. Even if we check the output manually and everything looks fine, we might have made mistakes while checking. And even if we haven't, something might still surface later when a new scenario comes up.
This risk forces us to go very slowly, and even then we're not 100% safe.
When relying on manual testing, it's easy to get very conservative about introducing changes, even the simple ones. And when deadlines are tight and resources for manual testing are scarce, we might decide to skip some changes or do a quick check and hope for the best.
Imagine your team implements an optimization that shouldn't change the app's behavior and could ship with the next release to deliver value to users. But since there are no automated tests and the deadline is tight, the team is afraid to merge it because they don't know what could break. They decide to skip the optimization and deliver the app without it. This is a lost opportunity to improve the app.
When working with code that has a good test suite we can work much more quickly. We can be bolder about introducing changes, because we know that if we break something, we'll know about it quickly. We can be more confident that the changes we introduce won't break anything.
Code Coverage Can Be a Measure of Confidence
We can easily learn how much code we exercise during tests with packages like {covr}. Code coverage can give us a good idea of how much of the code is tested and how much is not, but there are caveats to this approach. We can have 100% code coverage and still have bugs in the code. We can have 0% code coverage and no bugs at all. Code coverage is a useful metric, but it is not a guarantee of quality: it only measures how much code was exercised; it doesn't verify the quality of the tests.
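For an R package, measuring this with {covr} can be as simple as running the following from the package root:

```r
library(covr)

# Run the package's test suite while tracking which lines get executed.
coverage <- package_coverage()

# Overall percentage of lines exercised by the tests.
percent_coverage(coverage)

# Browsable per-file report showing exactly which lines were missed.
report(coverage)
```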
Let's take advice from Kent Beck, the creator of Extreme Programming:
"I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence" – Kent Beck
and find a balance between the number of tests and the confidence we get from them.
The effectiveness of this safety net is amplified by version control. If we go in small steps, we can make bold changes, fail fast, and improve or roll back if needed.
Looking for best practices for R package validation? Explore our guide to mastering validation processes in pharma.
Reliability
Yes, tests can help us find bugs, but that isn't their core purpose. When we talk about tests and bugs, we talk about preventing bugs from happening.
Everyone who has ever written code has introduced an error at some point, even one as simple as a typo.
Automated tests are a bug repellent in the sense that they keep mix-ups as simple as typos from crashing the software in production. When a good portion of the code surface is covered by tests, we can be more confident that the changes we introduce won't break anything, provided we run our tests before checking our changes into the codebase.
Example
We might have changed something about how we handle the ADSL data, and now a table is affected. If we have a test that checks the rendering of that table, we'll know immediately that something is wrong and can investigate the issue.
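A minimal sketch of such a test using {testthat} snapshot testing: build_demographics_table() is a hypothetical function, and the data frame mimics a few ADSL columns.

```r
library(testthat)

# If a change in ADSL handling alters the rendered table, the snapshot
# comparison fails and shows a diff of what changed.
test_that("demographics table renders as expected", {
  adsl <- data.frame(
    USUBJID = c("01-001", "01-002"),
    ARM = c("Placebo", "Xanomeline High Dose"),
    AGE = c(63, 71)
  )
  expect_snapshot(build_demographics_table(adsl))
})
```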
Efficiency of Automated Testing
Another aspect of reliability is that tests let us verify the behavior of the software in highly improbable circumstances that cannot easily be created during manual testing.
These could be inputs that show up once in a million runs. Or it could be a test of what happens when 10 users connect at the same time, while we only have 2 testers available.
So, much like a high-throughput screening robotics platform, automated tests can probe vast combinations of inputs and states that would be very difficult to cover manually. They do this job more reliably, more quickly, and more cheaply than manual testing.
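A sketch of what this can look like with {testthat}: the loop probes every combination in a parameter grid, something no manual tester would enjoy doing. summarize_population() is hypothetical:

```r
library(testthat)

test_that("summarize_population() handles all filter combinations", {
  combinations <- expand.grid(
    population = c("Safety", "ITT", "Per Protocol"),
    arm = c("Placebo", "Low Dose", "High Dose"),
    stringsAsFactors = FALSE
  )
  # Nine combinations here, but the same pattern scales to thousands.
  for (i in seq_len(nrow(combinations))) {
    expect_no_error(
      summarize_population(combinations$population[i], combinations$arm[i])
    )
  }
})
```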
Defense Against Changes From Outside
Another important aspect of reliability is that tests can help us protect against changes in our dependencies.
R is open source, and its power comes from a vast ecosystem of packages. When we use those packages, we rely on their maintainers to keep them up to date and bug-free. We need to trust that they work correctly and won't break our code.
Code is a living thing; it changes and evolves. And when we rely on external packages, we need to be sure that changes in those packages won't break our code.
When we have a good test suite, we can be confident that changes in the dependencies won't break our code. We can run our tests after updating the dependencies and see if everything still works as expected. If something breaks, we can fix it now, instead of finding out in production.
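As a sketch, assuming the project manages dependencies with {renv} and is structured as a package, the update-then-verify loop can be as short as:

```r
# Bring dependencies up to their latest versions.
renv::update()

# Re-run the full test suite against the updated packages;
# any breakage shows up now, not in production.
devtools::test()
```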
Good Automated Testing
We went through 5 major types of benefits we can get from Good Automated Testing:
- Better code quality
- Documentation
- Reproducibility
- Scalability
- Reliability
Those benefits come at a cost. We need to invest time in learning how to write good tests, time in writing them, and time in maintaining them.
Testing is not reserved for specialized software engineering teams. It's for everyone who writes code and looks for ways to improve the quality of their work.
What's important is to approach automated testing with a clear goal in mind. We need to know what we want to achieve with tests, and not write them just for the sake of a green light on CI or a specific level of code coverage.
What’s Next?
Testing, just as with coding, is something you need to learn. And to get better at it you need to practice.
Getting more effective at something is also a matter of tools. I think in the R community, there's an unspoken rule:
If a problem exists, there's a package for it. If not, you should write one.
The need in pharma has sparked the pharmaverse. There are already packages that help with assessing the safety of a package or with reproducibility.
The same can happen with testing. There are already some efforts in this direction:
- There is {verifyr2}, a package you can use for comparing RTF files.
- There is also {shinyValidator}, a package that aims to combine multiple checks of Shiny apps into a single report for easier validation.
What innovations could we see in the future? Only time will tell.
Did you find this post useful? Explore our checklist for GxP validation in pharma. You can also check out other sessions from our GxP Validation Summit.