Case Study

Production-Ready Nextflow Pipelines, Cutting Analysis Time for a Global Biopharmaceutical Leader

See how optimized Nextflow pipelines helped a global biopharma leader automate bioinformatics workflows and scale research.

astellas
Genmab
merck
johnson and johnson
World Health Organisation
Kenvue
Phuse
Pharmaverse


Before:
The research team relied on disconnected workflows with too much manual work, leading to slow data processing and inconsistent results across studies. Poor documentation made it hard to scale or adopt new tools.
After:
The research team now uses Nextflow pipelines to automate and standardize bioinformatics analyses, reducing manual effort, speeding up processing, and improving reproducibility. With documented and scalable pipelines, they can efficiently manage large datasets and integrate new tools without delays.

About the Project

A global biopharmaceutical company specializing in oncology, immunology, and other therapeutic areas faced significant challenges with fragmented bioinformatics workflows. The research teams were dedicating excessive time to manual processes, while inconsistencies in data handling limited their ability to scale analyses and adopt emerging tools. The company engaged Appsilon to develop Nextflow pipelines and improve efficiency through standardization and automation.

This transformation significantly reduced manual intervention, accelerated data processing, and established consistency across diverse research studies. As a result, researchers can now efficiently manage large-scale datasets, execute complex analyses without bottlenecks, and seamlessly incorporate new technologies as they emerge in the rapidly evolving field of bioinformatics.

The Challenge

The client had already adopted Nextflow, a workflow management system designed for reproducible bioinformatics data analysis. It integrates with cloud and high-performance computing environments and packages workflows together with their dependencies, which keeps results consistent across different computational systems.

However, several obstacles in the existing implementation were undermining research efficiency and innovation potential: 

  1.  Disjointed Bioinformatics Ecosystem: Years of accumulated workflows had created operational inefficiencies, with disconnected processes that required significant manual intervention between steps. This fragmentation was increasingly limiting productivity and the organization's capacity for scientific innovation.
  2.  Infrastructure Complexity: The technical landscape was characterized by complex interdependencies between multiple systems, including cloud platforms and an established but rigid file structure hierarchy. This complexity was compounded by insufficient documentation and varying levels of technical expertise across the research teams accessing the same resources.

These challenges not only delayed research initiatives but also restricted the organization's agility in responding to emerging scientific opportunities and technological advancements in the competitive biopharmaceutical landscape.

The Solution

Appsilon partnered with the client to update existing workflows and introduce new ones for more robust bioinformatics pipelines. The work focused on delivering the improvements through two projects:

Project #1: Developing a Production-Ready Nextflow Pipeline Connected to the Internal Framework

The goal of this project was to convert standalone scripts into a Nextflow pipeline and integrate it into the existing system that tracks, analyzes, standardizes, and streamlines various data types. The system already contained components such as data intake and validation, pipeline execution, and centralized result storage and delivery.

The project required the development of two core components:

  1. Python script to validate and intake new studies: This ensures that new study data meets required quality standards before being processed.
  2. Pipeline: Migration of standalone bioinformatics scripts into an integrated Nextflow pipeline
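
As a rough illustration of the intake check described above, the following Python sketch validates a study manifest before it is handed to the pipeline. The field names (`study_id`, `organism`, `samples`, `sample_id`, `data_path`) and the `validate_study` function are hypothetical; the client's actual quality standards are not described in this case study.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the real intake standards are not public.
REQUIRED_STUDY_FIELDS = ("study_id", "organism", "samples")
REQUIRED_SAMPLE_FIELDS = ("sample_id", "data_path")


@dataclass
class ValidationReport:
    """Collects every problem found so a submitter can fix them in one pass."""
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_study(manifest: dict) -> ValidationReport:
    """Check a study manifest for missing fields and duplicate sample IDs."""
    report = ValidationReport()

    # Study-level checks: every required field must be present and non-empty.
    for name in REQUIRED_STUDY_FIELDS:
        if not manifest.get(name):
            report.errors.append(f"missing study field: {name}")

    # Sample-level checks: required fields and unique sample IDs.
    seen_ids = set()
    for index, sample in enumerate(manifest.get("samples") or []):
        for name in REQUIRED_SAMPLE_FIELDS:
            if not sample.get(name):
                report.errors.append(f"sample {index}: missing field: {name}")
        sample_id = sample.get("sample_id")
        if sample_id in seen_ids:
            report.errors.append(f"sample {index}: duplicate sample_id: {sample_id}")
        elif sample_id:
            seen_ids.add(sample_id)
    return report
```

Returning a report rather than raising on the first failure is a design choice in this sketch: a study with several problems can be corrected in a single round trip before any pipeline resources are used.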

Project #2: Integrating Spatial Transcriptomics Workflows

The second project integrated two critical workflow steps into one Nextflow pipeline for spatial transcriptomics analysis:

  1. Workflow 1: A workflow designed for 10x Genomics Visium transcriptomics that processes spatially resolved gene count data together with spatial coordinates and image data.
  2. Workflow 2: A Bioconductor R package that employs Bayesian modeling to enhance the resolution and clustering of spatial gene expression experiments.

The key characteristics of the pipeline are:

  • Integrated processing for both raw and preprocessed data, eliminating the need for separate workflows
  • Flexible pathway selection that allows researchers to customize analytical approaches based on specific project requirements
  • Enhanced functionality including custom improvements, advanced filtering mechanisms, and optimized output processing - enabling researchers to prioritize relevant findings, reduce time-consuming manual adjustments, and extract actionable insights more efficiently.
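
One way to picture a single entry point for raw and preprocessed data with selectable analysis paths is the Python sketch below. The step names, the `plan_steps` function, and the `enhance_resolution` option are illustrative assumptions, not the pipeline's actual parameters; the delivered implementation is a Nextflow pipeline.

```python
# Hypothetical step names, for illustration only.
RAW_ONLY_STEPS = ["preprocess_counts"]
CORE_STEPS = ["load_spatial_data", "filter_spots", "cluster"]
ENHANCEMENT_STEP = "bayesian_enhancement"


def plan_steps(input_kind: str, enhance_resolution: bool = False) -> list:
    """Return the ordered list of steps for one run.

    Raw input gets an extra preprocessing step; preprocessed input
    skips it. The optional enhancement step is appended on request.
    """
    if input_kind not in ("raw", "preprocessed"):
        raise ValueError(f"unknown input kind: {input_kind!r}")

    steps = []
    if input_kind == "raw":
        steps.extend(RAW_ONLY_STEPS)
    steps.extend(CORE_STEPS)
    if enhance_resolution:
        steps.append(ENHANCEMENT_STEP)
    return steps
```

Under these assumptions, `plan_steps("raw")` includes preprocessing, while `plan_steps("preprocessed", enhance_resolution=True)` skips it and adds the enhancement step, so both kinds of input share one workflow definition.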

Results and Impact

The transformation of standalone scripts into an integrated Nextflow pipeline delivered substantial efficiency gains, enhanced research capabilities, and automated validation that new intake studies met the required internal quality standards before being processed. By streamlining workflows and introducing advanced analytical features, the project moved the pipelines from proof of concept (PoC) to production-ready.

As a result, processing time fell while data quality and insight generation improved. In addition to the technical implementations, which were integrated with the existing client infrastructure, comprehensive documentation was delivered to the client for both projects.

The client experienced significant improvements across several areas:

  1. Increased efficiency: Modernized workflows significantly reduced manual intervention through automation and standardization, resulting in faster analysis times through optimized parallel execution capabilities.
  2. Enhanced scalability: Standardized pipelines enabled seamless scaling of bioinformatics analyses across multiple research teams, accommodating growing data volumes without proportional increases in processing time.
  3. Improved reproducibility: The fully documented and standardized pipelines ensured consistent research results across different environments, enhancing confidence in findings.
  4. Future-ready: The flexible, adaptable pipeline architecture allows for the adoption of emerging technologies and tools to stay at the forefront of bioinformatics innovation.
  5. Comprehensive knowledge repository: Detailed technical documentation ensured continuity and efficient knowledge transfer, reducing dependency on specific team members.
  6. Data integrity assurance: The pipeline delivered consistent, unbiased data processing regardless of sample characteristics, maintaining analytical integrity across diverse studies.
  7. Production-level evolution: The transformation from experimental to production-ready R&D workflows enables the client to reliably process large-scale datasets.

Appsilon worked closely with the client to understand and adapt to their current infrastructure and processes, ensuring successful adoption and a smooth integration. 

The client's team awarded Appsilon a perfect net promoter score of 10/10, specifically highlighting three key strengths: 

  • Technical expertise in pipeline development and Nextflow implementation
  • Exceptional adaptability to complex requirements and infrastructure constraints
  • Transparent communication with consistently prompt responses throughout the engagement

Summary

With validated, production-ready Nextflow workflows, the client can now automate and standardize bioinformatics analyses without infrastructure bottlenecks. Previously, inefficiencies slowed research; the improved pipelines now enable seamless study intake, reproducibility, and scalability. With Nextflow Tower (Seqera), the client can run analyses at scale, reduce processing time, and integrate new tools with little effort. Moving to production means they can process larger datasets efficiently and generate consistent, reproducible results, accelerating discovery and innovation.

Contact us to see how we can help you develop scalable and efficient bioinformatics pipelines.
