A Guide to R Package Validation in Pharma


Picture this: a crucial clinical trial is underway, and every data point matters. The pharma industry is increasingly turning to open-source tools like R to handle complex data analysis, drawn by their flexibility, innovation, added value, and cost-effectiveness. But with these benefits come new challenges — particularly when it comes to ensuring these tools meet rigorous regulatory standards.

Curious about how to make sure your clinical software meets GxP standards? Check out our easy-to-understand guide on GxP validation.

Validating R packages isn't just about ticking regulatory boxes. It's about ensuring that the software you rely on produces accurate, reproducible, and compliant results every time. As the landscape shifts towards greater transparency and reliance on open-source tools, having a solid validation strategy is essential to keep your data — and your decisions — above reproach. 

In this guide, we’ll navigate the essentials of R package validation, unpack the approaches that work, and explore tools that can help ensure compliance without compromising agility.

Table of Contents

  • What is R Package Validation and Why Does It Matter?
  • Industry Shift Towards Open-Source Tools
  • Approaches to R Package Validation
    • Software-Based Validation
    • Risk-Based Validation
  • Key Validation Criteria
  • Steps to Achieve Package Validation
  • Other Useful Tools to Support Validation
  • Additional Considerations for R Package Validation
  • Example of a Successful R Package Validation
  • Conclusion
  • Resources and Communities

What is R Package Validation and Why Does It Matter?

R package validation is the process of ensuring that the tools you use for data analysis consistently produce reliable, accurate, and reproducible results. In the pharmaceutical industry, this is not just a best practice — it’s a critical requirement. Regulatory bodies like the FDA and EMA expect that all software used in clinical trials and drug development adheres to strict standards for quality and integrity.

This is where GxP — “Good Practice” guidelines — comes into play. Whether it’s Good Laboratory Practice (GLP), Good Clinical Practice (GCP), Good Manufacturing Practice (GMP), or Good Programming Practice (GPP), these standards require that any software used in regulated activities is thoroughly validated. This means documenting that the software performs as intended, every time, under every condition.

With more companies embracing open-source tools like R, the challenge becomes ensuring these tools meet the same high standards as proprietary software. Validation demonstrates that your R packages are up to the task, supporting not just regulatory compliance but also the trustworthiness of every decision made based on their output.

By validating your R packages, you’re ensuring that the data driving critical decisions — from trial outcomes to market authorizations — is rock-solid, every step of the way.

By building validation into your software development cycle from an early stage, rather than treating it as an afterthought once development seems complete, you can reduce the total time needed to get your package ready for submission. Integrating validation into your overall development strategy also adds another layer of software quality.

Industry Shift Towards Open-Source Tools

The pharmaceutical industry is increasingly turning to open-source tools to enhance flexibility, transparency, and innovation in data analysis. Companies like Roche are moving away from legacy systems, adopting R as their primary framework for evidence generation in late-stage clinical trials. This shift reflects a broader trend toward open-source solutions that enable faster development and collaboration. 

Why is this important? Open-source tools like R add significant value through flexibility, collaboration, adaptability, and customization, fostering innovation and rapid adaptation to regulatory and scientific changes. Leveraging a global developer community enhances this value with frameworks and shared knowledge for compliance and best practices. By reducing reliance on costly proprietary software, these tools also cut costs. Communities like Pharmaverse and the R Validation Hub support this transition, helping teams navigate regulatory complexities.

However, using these tools also requires a robust validation process to meet regulatory standards for accuracy, reproducibility, and traceability. Validating R packages ensures they are fit for generating reliable evidence in regulatory submissions.

To achieve this, two main approaches can be taken: software-based validation and risk-based validation, which we will discuss in the next section.

Approaches to R Package Validation

As mentioned, two main approaches to validation are particularly effective when dealing with R packages: Software-Based Validation and Risk-Based Validation.

Let's explore each of these in detail.

Software-Based Validation

The software-based approach to validation treats R packages like any other software product, emphasizing solid software development practices throughout the entire lifecycle.

Key aspects of this approach include:

  1. Software Development Life Cycle (SDLC)
    Following a structured SDLC, typically including phases like requirements gathering, design, implementation, testing, and maintenance. This ensures a systematic approach to development and validation.
Software Development Life Cycle Model SDLC Stages
  2. Code Coverage
    Aiming for high code coverage in our testing, often using tools like covr. This helps ensure that most, if not all, of the code has been exercised during testing.
  3. Traceability
    Using version control systems (like Git) to track code changes, a test management system to register all tests performed, and monitoring of deployment, access, and usage in production. The clearer the history we maintain of the software's evolution, the easier submission verification becomes, and the more it supports our own learning, collaboration, and efficiency.
  4. Continuous Integration/Continuous Deployment (CI/CD)
    Implementing CI/CD pipelines, often using tools like GitHub Actions or GitLab CI, allows for automated testing and validation with each code change.
  5. Documentation
    Comprehensive documentation, including requirements specifications, design documents, and user manuals, is crucial for transparency and traceability.
  6. Second-Person Reviews
    Regular peer code reviews help catch issues early and ensure everyone follows coding standards and best practices. Second-person reviews are generally required for all documents, and those reviews should be registered with electronic signatures.
  7. Well-Developed Packages
    Use well-maintained R packages that follow best practices, as this lays a solid foundation for everything else we build on top. A package manager can also facilitate the distribution of packages across our organization.
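
As a sketch of the code coverage practice above, the {covr} package can report how much of a package's code is exercised by its tests (the functions shown are part of covr's documented API; the CI threshold idea is an assumption about how you might use the result):

```r
# install.packages("covr")  # if not already installed
library(covr)

# Measure test coverage for the package in the current working directory
cov <- package_coverage()

# Print an overall and per-file coverage summary
print(cov)

# Report coverage as a single percentage, e.g. to enforce a threshold in CI
percent_coverage(cov)

# Open an interactive report highlighting untested lines
report(cov)
```

Running this locally before each release makes coverage gaps visible early, rather than discovering them during submission review.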

By following these practices, we ensure that the R packages we develop are solid, well-tested, and perform their intended functions. This is particularly effective for custom-developed packages that we have complete control over.

But what about when we’re dealing with third-party or open-source packages, where we don’t have that same level of control? That’s where risk-based validation comes into play.

Risk-Based Validation

This second approach focuses on managing risk, especially when using third-party or open-source packages where the development process is outside our control.

For a comprehensive understanding of the risk-based approach, refer to the Risk-Based Approach white paper by the R Validation Hub, which outlines best practices and methodologies for ensuring compliance and mitigating risks when using R in regulated environments.

This involves:

  1. Conducting Risk Assessments to evaluate the potential risks associated with each package. This includes evaluating factors like the package's complexity, its criticality to the analysis, and its development history.
  2. Focusing resources on the high-risk or most critical components based on the risk assessment. 
  3. Designing test cases and validation procedures based on the identified risk.

Pharmaverse tools such as {riskmetric} and {riskassessment} help make data-driven decisions on where to focus validation efforts, providing a more tailored and efficient strategy.

Here’s a quick breakdown of these tools: 

{riskmetric}: This package, available on GitHub, provides a framework for assessing the risk of R packages. It evaluates various metrics like maintenance activity, community usage, and testing coverage to generate a risk score. 

For example:

# Load necessary libraries
library(dplyr)
library(riskmetric)

# Assess and score R packages
pkg_ref(c("riskmetric", "utils", "tools")) %>%
  pkg_assess() %>%
  pkg_score()

{riskassessment}: Building on riskmetric, this package, also available on GitHub, offers a more comprehensive risk assessment framework. It includes features for generating detailed reports and dashboards, which can be invaluable for regulatory submissions.

ℹ️ You can check out the R Package Risk Assessment App

ℹ️ You can find case studies of implemented risk-based approaches to validate R packages.

In practice, we often find that a combination of these two (software-based and risk-based) yields the best results. 

The software-based approach ensures a solid foundation of good development practices, while the risk-based approach allows us to tailor our validation efforts to the specific challenges posed by the R package.

Key Validation Criteria

According to the FDA, validation involves establishing documented evidence that provides a high degree of assurance. This evidence ensures that a process consistently produces accurate and reliable results that meet predetermined specifications. 

For R packages, this means ensuring accuracy, reproducibility, and traceability in all statistical analyses.

Accuracy

Ensuring that R packages deliver precise and correct results is fundamental. The R Validation Hub differentiates between different types of packages:

  • Base and Recommended Packages: Developed and maintained by the R Foundation, these packages undergo thorough testing and validation processes, minimizing the risk associated with their use in regulatory submissions.

  • Contributed Packages: Developed by the wider R community, these packages must pass basic technical checks on platforms like CRAN. However, these checks do not guarantee accuracy. A comprehensive risk assessment is necessary, focusing on package maintenance, community usage, and formal testing coverage to evaluate their reliability for regulatory use.

  • Custom Packages: When packages are custom-made for specific use cases, additional software-based validation is crucial. This involves rigorous testing to ensure that the package performs as expected under all intended conditions, with comprehensive documentation to support its functionality and compliance.

Reproducibility

To ensure consistent analytical outputs across different environments, it is essential to manage R installations effectively. Tools like Docker containers, renv, and Posit Package Manager handle dependencies and version control, maintaining a stable environment where results can be reproduced. Quarto and webR can be used to demonstrate reproducible manuscripts or processes in depth.
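
For example, {renv} pins exact package versions per project so an analysis can be restored in another environment. A minimal sketch using renv's documented functions:

```r
library(renv)

# Initialize a project-local library and a lockfile (renv.lock)
# recording the exact versions of all packages in use
renv::init()

# After installing or updating packages, record the new state
renv::snapshot()

# On another machine, or inside a container, reinstall exactly
# the versions recorded in renv.lock
renv::restore()
```

Committing renv.lock to version control alongside the analysis code gives reviewers a precise, reproducible record of the computational environment.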

Traceability

Developing systems and controls to automatically document the packages and dependencies used in R analyses is critical. Tests (usually registered in a test management system) and additional documentation should also be properly recorded and kept available. This enhances traceability, providing a clear audit trail of all software components, which is crucial for regulatory compliance.

Steps to Achieve Package Validation

 Flow Diagram of the R Package Validation Framework

1. Define Requirements

Clear requirements are essential for validating any R package, defining the software's goals, and guiding testing efforts. Work with subject matter experts (SMEs) to document these requirements in a human- and machine-readable format like Markdown (R Markdown and Officedown). 

You can also consider the following:

  • Use headers in each file to track edits, leveraging version control tools such as Git & GitHub
  • Implement Test-Driven Development so that you think through all the required tests as soon as you have requirements, get early feedback on your code, identify risks, and avoid underestimating the effort involved.
  • Perform risk assessments to identify potential defects, using tools like riskmetric to prioritize testing based on risk. 
  • Store all requirements in a “requirements” folder within the “vignettes/validation” directory to keep them organized and accessible. 

This ensures requirements are clear, compliant, and easily traceable.
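
As an illustration of this step, a single requirement file stored under the "vignettes/validation/requirements" directory could look like the following. The header fields and REQ IDs are hypothetical, not a mandated format; some frameworks, such as valtools, use a similar roxygen-style header to track edits:

```markdown
#' @title REQ-001: Mean change from baseline
#' @editor Jane Doe
#' @editDate 2024-01-15

+ REQ-001.1: The function shall return the mean change from baseline
  for a numeric endpoint, ignoring missing values.
+ REQ-001.2: The function shall raise an error when the input is not
  numeric.
```

Keeping one requirement per file, with stable IDs, makes it straightforward to trace each test case back to the requirement it covers.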

Level up your version control and collaboration skills with our free ebook - Level Up Your R/Shiny Team Skills
Example R package folder structure with the R Package Validation Framework infrastructure added

2. Develop the Package
For custom R packages, use software-based validation with good programming practices (GPP). Document each function’s edits using {roxygen2} to track ownership, roles, and responsibilities, ensuring transparency and proper attribution. This helps maintain clear accountability.

You can follow these steps as well:

  • Combine modular and comprehensive functions to meet requirements while writing clean, reusable code. 
  • Manage dependencies and compatibility (a common challenge) by using tools like renv and Posit Package Manager to handle package versions and dependencies effectively, reducing the risk of incompatibilities across different environments and R versions. 
  • Conduct regular audits to keep dependencies up to date and minimize compatibility risks.
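
The {roxygen2} documentation mentioned above can be sketched like this. The function, its behavior, and the attribution are illustrative; @author is a standard roxygen2 tag, while richer edit-history tags (editor, edit date) would come from your validation framework rather than roxygen2 itself:

```r
#' Calculate mean change from baseline
#'
#' @description Computes the mean change from baseline for a numeric
#'   endpoint, ignoring missing values.
#'
#' @param x Numeric vector of post-baseline values.
#' @param baseline Numeric vector of baseline values, same length as x.
#'
#' @return A single numeric value: the mean of (x - baseline) with
#'   missing values removed.
#'
#' @author Jane Doe (illustrative attribution for ownership tracking)
#' @export
mean_change <- function(x, baseline) {
  stopifnot(is.numeric(x), is.numeric(baseline),
            length(x) == length(baseline))
  mean(x - baseline, na.rm = TRUE)
}
```

Running devtools::document() then turns these comments into the package's help pages, keeping code and documentation in one place.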

3. Create and Execute Test Cases

Once the software is developed, it is essential to create comprehensive test cases to validate its functionality in the intended environment. Each test case should clearly define the input data, processing steps, and expected outputs to confirm that all requirements are met. Automated testing tools, like {testthat}, help ensure these tests cover both common use cases and critical edge cases.

To strengthen the testing process:

  • Prioritize test cases based on risk, with more tests dedicated to higher-risk areas to thoroughly mitigate potential issues.
  • Consider the need for manual or integration tests when relying on other systems, especially those that are not R packages, such as APIs.

After defining test cases:

  • Write the corresponding test code to implement them. Use simple, clear, and repeatable code snippets to accurately capture test results. 
  • Tools like {covr} can measure test coverage, highlighting any untested segments. 
  • Ensure that test code is executed in isolated environments to prevent unintended side effects, and utilize continuous integration tools, such as GitHub Actions, to automate testing with every code change. Store all test cases in a “test_cases” folder and test code in a “test_code” folder, maintaining consistency and traceability for validation purposes.  
  • A test management system is highly recommended for registering the execution of your tests once development is finished.
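
A {testthat} test case following this pattern defines the input data, the processing step, and the expected output explicitly. The function under test, mean_change, and the REQ IDs are hypothetical examples, defined inline here so the sketch is self-contained:

```r
library(testthat)

# Hypothetical function under test: mean change from baseline
mean_change <- function(x, baseline) {
  stopifnot(is.numeric(x), is.numeric(baseline),
            length(x) == length(baseline))
  mean(x - baseline, na.rm = TRUE)
}

test_that("REQ-001.1: mean change ignores missing values", {
  # Input data
  x        <- c(12, 15, NA, 11)
  baseline <- c(10, 10, 10, 10)
  # Expected output: mean of c(2, 5, 1) = 8/3
  expect_equal(mean_change(x, baseline), 8 / 3)
})

test_that("REQ-001.2: non-numeric input raises an error", {
  expect_error(mean_change("a", 1))
})
```

Naming each test after the requirement it verifies keeps the traceability between requirements, test cases, and test code explicit.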

4. Conduct Risk Assessment
Perform a risk assessment to evaluate the likelihood and potential impact of defects related to each requirement. Use tools like {riskmetric} to systematically analyze factors such as the complexity of code, frequency of use, maintenance activity, and dependency on external packages. This assessment helps identify high-risk areas that require more comprehensive testing.

Based on the results, prioritize the test cases to focus on the highest-risk areas first, ensuring that critical functionality is thoroughly validated. Document the risk assessment details alongside each requirement, including the rationale for prioritization, to provide transparency and support decision-making throughout the validation process. This ensures that validation efforts are both efficient and effective, focusing resources where they are most needed.

5. Generate the Validation Report
The validation report serves as objective evidence that the R package meets all requirements and can be consistently relied upon. This report compiles all the documentation, test results, and risk assessments into a single, comprehensive file that can be reviewed and approved by key stakeholders.

To create this report, utilize Quarto or R Markdown’s code-executing and document-generating capabilities. By sourcing files like requirements, test cases, and test code results, R Markdown can generate a fully customized validation report that aligns with your organization's standards, including details such as the testing environment, system dependencies, and validation team roles. This report can be tailored to include specific organizational elements, such as letterhead or logos, ensuring it meets regulatory and corporate requirements.

Once the validation report source code in R Markdown is prepared and approved, it should be compiled into a final validation report for the specific version of the package. This report must be recompiled with each new version or update to ensure that the validation aligns with the current state of the software. 

By storing the validation source code as a vignette within the “vignettes” directory of the R package, it remains accessible and can be rendered or rerun as needed, whether during version release, upon installation, or in response to changes in the environment where the package is deployed.

Validation reports can be generated at different stages: during version release to validate in the developer's environment, upon installation to validate in the user’s environment, or after installation to revalidate when environmental changes occur. 

For post-installation validation, the relevant files should be copied into the “inst/validation” folder, ensuring that they are preserved and accessible within the installed package. This guarantees that the validation remains robust and aligned with any changes in the environment, ensuring continued compliance and reliability. Tools like thevalidatoR and valtools can automate the creation and maintenance of these reports.

Ongoing Validation
Ongoing validation ensures that the package remains compliant with regulatory standards throughout its lifecycle, even as new versions are released or the environment changes. This involves continuously monitoring for changes in requirements, software updates, or external dependencies that could impact the package's functionality or compliance status.

To achieve this, integrate continuous validation practices into the development process. Regularly update test cases, requirements, and documentation to reflect any modifications or new features. Automation tools, such as GitHub Actions, can be configured to automatically rerun tests and regenerate validation reports with each code change.
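
A minimal GitHub Actions workflow along these lines might look like the following. The r-lib/actions steps are real, community-maintained actions; the file path, branch name, and trigger choices are assumptions to adapt to your repository:

```yaml
# .github/workflows/R-CMD-check.yaml (illustrative path)
name: R-CMD-check

on:
  push:
    branches: [main]
  pull_request:

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Install R and the package's declared dependencies
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck

      # Run R CMD check (tests, examples, documentation) on every change
      - uses: r-lib/actions/check-r-package@v2
```

A further step could render the validation report vignette on each tagged release, so the report always reflects the version being distributed.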

Conduct periodic audits to identify gaps or emerging risks and adjust validation efforts accordingly. By embedding validation activities into the ongoing development cycle, you ensure the package remains reliable, compliant, and ready for regulatory review at any point in time.

Other Useful Tools to Support Validation  

Several tools enhance the efficiency and effectiveness of the validation process:

  • CRAN Checks (R CMD CHECK) and devtools::check(): Validate package functionality, compatibility, and documentation. While CRAN checks are run during package submission, devtools::check() allows developers to run these checks locally, identifying and fixing issues early.
  • openVal: A repository of validated R packages tailored for use in regulated environments, helping organizations ensure compliance more efficiently.
  • pkglite: Simplifies package validation and distribution by creating lightweight versions of R packages with minimal dependencies, reducing complexity and improving maintainability in regulated environments.
  • Linting Tools (e.g., lintr): Analyze R code for syntax errors, style issues, and potential bugs, maintaining code quality and reducing errors.
  • packrat: Manages project-specific package libraries and dependencies to ensure reproducibility across different environments, similar to renv.
  • pkgdown: Quickly builds a documentation website for your package. 

Using the right tools is crucial for validation, as are general documentation practices which we’ve discussed earlier. 

Additional Considerations for R Package Validation

When validating R packages, there are a few advanced considerations we need to keep in mind to ensure everything works smoothly:

  • Operating Systems: R packages might behave differently on various operating systems (like Windows, macOS, or Linux). We need to test and validate them across these systems to ensure consistent performance.
  • Double Programming: Consider implementing this for critical analyses. This technique, widely regarded as a gold standard, involves two programmers independently developing code based on the same specifications and then comparing the results to ensure accuracy.
  • Importance of Validating Input Data: Validating the data that goes into your analysis is as crucial as validating the package itself. 

Even the best software can produce faulty results if it starts with bad data. Ensuring input data is correct and properly formatted helps maintain the accuracy, reliability, and credibility of the analysis from the get-go.

To validate input data effectively in R, consider the following:  

  • Identifying and handling any missing or duplicate data that could compromise the analysis.
  • Ensuring that all data types (e.g., numbers, dates, categories) are correct and consistent with the intended analysis.
  • Establishing rules to detect outliers or unexpected inputs that could skew results.
  • Using R packages like pointblank, assertthat, validate, or checkmate to automate data validation processes, ensuring that only clean, high-quality data is used.
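
For instance, with the {validate} package the rules above can be expressed declaratively and confronted with the data. The dataset and column names are illustrative:

```r
library(validate)

# Illustrative clinical dataset with a duplicate row and an outlier
df <- data.frame(
  subject_id = c("S01", "S02", "S02", "S04"),
  age        = c(34, 51, 51, 142),
  visit_date = as.Date(c("2024-01-05", "2024-01-06", "2024-01-06", NA))
)

# Declare rules: completeness, range checks, and duplicate detection
rules <- validator(
  !is.na(subject_id),
  !is.na(visit_date),
  age >= 18 & age <= 100,
  is_unique(subject_id)
)

# Confront the data with the rules and summarize passes and failures
out <- confront(df, rules)
summary(out)
```

The summary flags which rules failed and for how many records, giving an auditable record of data quality checks before any analysis runs.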

Example of a Successful R Package Validation

Novo Nordisk led the way in using R for regulatory submissions, completing the first R-based submission to the FDA in late 2021. The submission included SDTM and ADaM datasets, documentation, analyses, and TLFs (Tables, Listings, and Figures) generated entirely with R.

The shift to R began in 2019, with the first trial outputs produced in 2020. While statistical analyses were still performed in SAS, results were reported using R-based TLF programs. The submission covered six phase 3a trials, clinical pharmacology trials, and integrated summaries (ISE and ISS).

Challenges and Regulatory Engagements
During the submission, Novo Nordisk had to address FDA requests for detailed programming codes and resolve issues related to environment replication. Tools like {renv} and {pkglite} were employed to manage package dependencies and improve reproducibility, ultimately meeting FDA requirements.

Risk-Based Validation Approach
Novo Nordisk utilized a risk-based validation framework, following guidelines from the R Validation Hub. The process involved using {riskmetric} to assess package risks, with manual reviews to ensure compliance. This enabled Novo Nordisk to maintain a GxP-approved environment for R.

Future Directions
Looking ahead, Novo Nordisk plans to expand its use of R, improve environment setups, and embrace open-source development by minimizing internal packages and contributing to CRAN and Pharmaverse.

Implications for the R Community
Novo Nordisk’s successful R-based submission to the FDA sets a precedent for using R in regulatory processes, encouraging wider adoption in the pharmaceutical industry. This milestone highlights the value of robust validation frameworks and fosters greater collaboration and innovation within the R community, strengthening R’s role in regulated environments.

Read more about Novo Nordisk’s First R-Based Submission to the FDA

Conclusion

As the pharmaceutical industry increasingly adopts open-source tools like R for data analysis and regulatory submissions, robust validation practices are essential. Validating R packages is crucial for ensuring data integrity, accuracy, and reproducibility. By employing both software-based and risk-based validation, using the right tools, and maintaining continuous validation, organizations can meet regulatory standards while leveraging R's flexibility and innovation.

With strong community support and adherence to best practices, the future of R in biopharma is set for greater transparency, collaboration, and innovation. A solid validation strategy will be key to achieving compliance and success.

Keep your focus on delivering results while we take care of the complexities of GxP validation for your regulatory needs. Reach out to us now to simplify your submission process.

What if you could finish your GxP reporting in minutes instead of weeks? Learn how automation can simplify your workload.

Resources and Communities

Communities

Have questions or insights?

Engage with experts, share ideas and take your data journey to the next level!
Explore Possibilities

Share Your Data Goals with Us

From advanced analytics to platform development and pharma consulting, we craft solutions tailored to your needs.

Talk to our Experts