GSK’s Open-Source Shift: Training 1,000 Biostatisticians in R

In 2025, many pharmaceutical companies are seeking a blueprint for transitioning from legacy statistical software to open-source languages like R.
GSK is one of the organizations leading this change. In 2017, they embarked on an ambitious journey to integrate R into their biostatistics division. They also committed to using open-source tools for central applications and producing 50% of their code in R by 2025.
But how did GSK progress from early pilot projects to full enterprise adoption?
Scaling R adoption was not just a technical shift. It required cultural change, strong leadership support, and strategic investments in infrastructure and training. Over the past five years, GSK has implemented key initiatives such as the AccelerateR program, the development of frozen R environments, and cross-industry collaborations with initiatives like the R Validation Hub and Pharmaverse.
This article explores how GSK overcame adoption challenges, trained over 80% of its biostatistics division, built enterprise-grade tools, and established best practices for regulatory compliance.
Table of contents:
- The Early Challenges of Enterprise R Adoption
- How GSK Went From Pilot Projects to Enterprise-Wide Open-Source Adoption
- GSK's Tools and Solutions That Enable R and Open-Source
- Lessons Learned from GSK's Open-Source Journey
- Summing Up GSK's Open-Source Shift
The Early Challenges of Enterprise R Adoption
Transitioning to open-source software in a highly regulated industry like pharma is no small feat. GSK began this transition in 2017, and six years later, they made two bold commitments:
- All central tools would be built using open-source languages.
- 50% of all code would be open source by the end of 2025.
At the start of their journey in 2017, only about 5–10% of their code was written in open-source languages, and they had no open-source projects.
GSK’s biostatistics division includes more than 1,000 statisticians, data scientists, and programmers, all striving to develop higher-quality medicines and vaccines. However, a significant portion of the team struggled with what they called Old Boots Syndrome: a resistance to change stemming from reliance on familiar software and approaches that had worked well in the past.
Embarking on an open-source journey meant overcoming this resistance while also meeting several critical requirements:
- The right infrastructure: The company’s infrastructure needed to support both existing workloads and the additional demands of new open-source initiatives.
- Standardized central environments: Development and research teams had to start from a common foundation with access to consistent and reproducible programming environments.
- Management support: Teams and their leaders needed backing from management during the transition from proprietary software to open source.
- Sponsorship: Senior leaders had to fund the necessary equipment and ensure enforcement of new tools and technologies.
- A desire for change: Ultimately, the team had to want to transition to open-source. A group of knowledgeable and skilled professionals needed to drive the initiative, with support from senior leadership.
In short, adopting open-source for a single project with a small team of developers is already challenging. Scaling it to a team of over 1,000 employees with diverse backgrounds is an entirely different challenge.
Up next, we’ll explore how GSK transitioned from small pilot projects to enterprise-wide adoption of R and open-source technologies.
How GSK Went From Pilot Projects to Enterprise-Wide Open-Source Adoption
During the transition phase, GSK reached an important milestone: successfully implementing an entire study in R. However, open-source was still far from replacing proprietary tools, as GSK typically manages 50–60 studies per year.
To help bridge the gap, GSK launched intensive training: large in-person sessions, classes, and training documentation for study teams. These teams would then begin using R to generate outputs for regulatory agencies.
But this approach didn’t quite work.
After some reflection, GSK realized that the time gap between R training sessions and applying R in a new study was 12–18 months.
It became clear that large-scale, in-person training was not ideal for study teams with varying needs and timelines. However, employee training remained a critical priority, so GSK took the following steps:
- Eliminated traditional classes: All employee training was shifted to an on-demand format, supplemented by pre-existing documentation.
- Adopted an individualized approach: GSK focused on supporting and mentoring individuals in the biostatistics department.
- Launched AccelerateR: A dedicated team of R experts was created to work closely with study teams, providing hands-on training in R.
Over time, the AccelerateR initiative expanded to include the R engineering team, which provided real-time insights and feedback to support study teams. The engineering team also offered mentoring and equipped study teams with the necessary tools to deliver impactful results.
GSK's Tools and Solutions That Enable R and Open-Source
During the transition to open-source, GSK developed two tools that not only supported their employees but are now also benefiting researchers and developers in other organizations. These tools are Frozen Environments and Slushy.
Frozen environments
If you're familiar with R and the pharmaceutical industry, you've probably heard of Pharmaverse, a collection of R packages widely used by pharma experts. However, since every study is different, Pharmaverse alone isn’t enough.
To address this, GSK developed Frozen Environments, pre-configured environments designed for production work that users can modify as needed. The goal is to provide a consistent starting point across teams, which can then be expanded based on project requirements.
However, there was one challenge: a significant time gap between release versions of Frozen Environments. To bridge this gap, GSK developed another tool: Slushy.
Slushy
To prevent workflow disruptions, GSK's engineering team provided study teams with the following:
- Reproducible code using the `renv` package.
- Access to required packages via the Posit Package Manager.
- A seamless transition to future releases of Frozen Environments, as study teams effectively develop code for an upcoming environment version.
- Additional support to reduce the burden, keep projects on track, and ensure everything runs smoothly.
All of these capabilities were integrated into a new R package called Slushy. The name is fitting: it "melts down" Frozen Environments, adding flexibility and extended functionality.
In essence, Slushy provides study teams with updates to CRAN snapshots between Frozen Environment versions, allowing them to anticipate required code changes and stay ahead of updates.
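To make the idea of working against a dated CRAN snapshot more concrete, here is a minimal sketch using plain `renv` and a Posit Package Manager snapshot URL. It is not Slushy's API or GSK's actual configuration; the snapshot dates, the public Package Manager URL, and the example packages are all assumptions chosen for illustration.

```r
# Illustrative sketch only: pin a project to a dated CRAN snapshot with renv.
# The dates, URL, and package names are assumptions, not GSK's setup.

snapshot_date <- "2024-06-03"  # hypothetical date agreed on by the team
repo_url <- paste0("https://packagemanager.posit.co/cran/", snapshot_date)

# Point package installation at the dated snapshot
options(repos = c(CRAN = repo_url))

# Create a project-local library without installing anything yet
renv::init(bare = TRUE)

# Install what the study needs from the snapshot
renv::install(c("dplyr", "admiral"))

# Record the exact versions of all packages in the project library
renv::snapshot(type = "all")

# A colleague can later rebuild the same library from renv.lock
renv::restore()

# To preview a newer snapshot, move the date forward and update
newer_url <- "https://packagemanager.posit.co/cran/2024-09-02"  # hypothetical
options(repos = c(CRAN = newer_url))
renv::update()
renv::snapshot(type = "all")
```

Because the lockfile records exact package versions, a study team can try its code against a newer snapshot in a separate branch or project copy, see what breaks, and fix it before the next environment release arrives.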
Lessons Learned from GSK's Open-Source Journey
After nearly eight years, GSK's transition to open-source is still ongoing, but the summit is in sight. Most of the key challenges have been addressed.
Throughout this journey, their team has experienced numerous benefits and learned valuable lessons. Here are a few key takeaways:
- The transition to open-source brings teams closer together: Tool developers and study teams must collaborate closely. Study teams gain insight into development processes, while tool developers receive instant feedback from domain experts.
- Growth of internal tools: Slushy was initially created as an internal tool, but GSK later decided to open-source it. This decision fostered dialogue with other pharmaceutical organizations, ultimately improving the tool in the long run.
- Solving common problems: Some internally developed products were highly specific and provided little benefit to other companies, while others were more general and could be valuable when open-sourced.
- Open-source is not just for packages: For example, CAMIS is a platform that contains code samples for various statistical analyses across different programming languages. Thanks to GSK's initiatives (and those of other like-minded companies), CAMIS has evolved into a central industry-wide repository.
Summing Up GSK's Open-Source Shift
GSK's transition to open-source was (and still is) no trivial process. Making this shift in isolation would have been challenging enough, but navigating the transition with more than 1,000 statisticians, data scientists, and programmers, while simultaneously delivering 50–60 studies per year, is nothing short of remarkable.
A successful transition to open-source often begins as an idea from a handful of employees. However, long-term success requires comprehensive planning, the right knowledge, skills, and experience, as well as strong sponsorship from senior leaders and managers.
In GSK's case, their hard work has paid off. Today, they are a well-recognized contributor to open-source in the pharmaceutical industry. By solving their own challenges, they are also helping to solve broader industry-wide problems.
If you have an hour to spare, we encourage you to watch GSK's full presentation.