Diamond Age Data Science: GPU Speed meets Biological Expertise. Precision RNA-Seq at the pace of Discovery.

RNA-Seq analysis often seems straightforward, like just pushing a button. But to do it well, especially for finding new drugs, you need more than just a basic approach. Every study, every experiment, and every set of data is unique. Getting RNA-Seq analysis right means you need to understand biology, be good with statistics, and really dig into the data.

Here’s where pre-canned workflows can fall short:

Alignment and Quantification: It Starts with How You Prepare the Samples

Best practices for alignment and quantification include tools like STAR for alignment and Salmon or kallisto for quantification, or, for even faster analyses, GPU-accelerated NVIDIA Parabricks. Check your assumptions before designing an analysis to avoid pitfalls.

Unusual Samples: If you’re working with special models, like those from patient tumors (PDX), a standard method might incorrectly match human reads to mouse DNA and vice-versa. An expert knows to set up a two-step alignment before starting to keep these reads separate.
Knowing Your Library: Did the lab use a strand-specific library? If your analysis settings don’t match the library’s strandedness, reads get counted against the wrong strand, and your results will be misleading.
New Biology Needs Adjustments: In drug discovery, we often look at unusual genomic variations or organisms that aren’t well-studied. Default settings in an alignment tool can miss widespread RNA editing events or large changes in tumor DNA. Reads that carry real genomic differences in an unusual model organism can be filtered out as “low quality” unless you adjust STAR settings like outFilterMismatchNoverLmax.
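One way to check the strandedness assumption before trusting any counts is to look at the per-strand columns STAR writes when run with --quantMode GeneCounts. The sketch below is a minimal illustration, not a validated QC tool: the function name, the 0.8 cutoff, and the toy counts are all assumptions made for this example.

```python
def infer_strandedness(lines, min_ratio=0.8):
    """Guess library strandedness from rows of STAR's ReadsPerGene.out.tab.

    Columns: gene ID, unstranded counts, counts with read 1 on the RNA strand
    (htseq-count -s yes), counts with read 2 on the RNA strand
    (htseq-count -s reverse).
    min_ratio is an illustrative cutoff chosen for this sketch, not a STAR default.
    """
    forward = reverse = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue  # skip summary rows such as N_unmapped and N_noFeature
        forward += int(fields[2])
        reverse += int(fields[3])
    total = forward + reverse
    if total == 0:
        return "undetermined"
    if forward / total >= min_ratio:
        return "forward"
    if reverse / total >= min_ratio:
        return "reverse"
    return "unstranded"


# Toy rows with made-up counts, in the same tab-separated layout
example_rows = [
    "N_unmapped\t100\t100\t100\n",
    "N_noFeature\t50\t700\t60\n",
    "GeneA\t1000\t20\t980\n",
    "GeneB\t500\t15\t485\n",
]
print(infer_strandedness(example_rows))
```

If the guess disagrees with what the lab reported, that is worth resolving before quantification, because the wrong column silently produces plausible-looking but incorrect counts.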

Speed Matters

Knowing which tool to use for the job is critical. The GPU-enabled Parabricks¹ rna_fq2bam (GPU-accelerated STAR) finishes in minutes alignments that used to take hours, allowing users to quickly iterate over an analysis to find the optimal parameters. With a fast test loop like this, analysts can do the job properly: testing different settings, checking the results against a known standard, and iterating until the results capture the real biology. That kind of iteration is impractical when every run takes hours. Then, once the parameters are optimized, the same tools run faster and more cheaply in pipelines that need to operate at scale. Anecdotally, at Diamond Age, we’ve found that GPU-based tools can reduce the runtime of resource-heavy tasks from 10 hours to 15 minutes, a benefit appreciated by our consultants and clients alike.
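As a rough sketch of what such a test loop might look like, the snippet below builds one alignment command per candidate value of outFilterMismatchNoverLmax. It uses standard STAR command-line options rather than Parabricks flags, which should be checked against the Parabricks documentation. The file names, index directory, output prefixes, and candidate values are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def star_command(genome_dir, fastq_r1, fastq_r2, mismatch_ratio, out_prefix, threads=8):
    """Return a STAR command (as an argument list) for one parameter setting."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,
        "--readFilesCommand", "zcat",  # inputs are assumed to be gzipped
        "--outFilterMismatchNoverLmax", str(mismatch_ratio),
        "--quantMode", "GeneCounts",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


def sweep(genome_dir, fastq_r1, fastq_r2, ratios):
    """Build one command per candidate mismatch ratio, keyed by the ratio."""
    return {
        ratio: star_command(genome_dir, fastq_r1, fastq_r2, ratio, f"sweep_mm{ratio}_")
        for ratio in ratios
    }


# Hypothetical inputs; 0.3 is STAR's documented default for this option
commands = sweep("star_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", [0.1, 0.3, 0.5])
for ratio, cmd in commands.items():
    print(shlex.join(cmd))
```

Each run’s mapping rate and junction calls can then be compared against a known standard before settling on a setting for the full cohort.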

It’s not just standard alignment that can be sped up. GPU-accelerated tools especially shine for the more complex and long-running alignment-based tools. For example, DeepSAP² not only accelerates splice junction detection but also improves accuracy, achieving an F1 score of 0.971 compared with 0.821 for STAR. Its transformer-based splice scoring module can also be integrated with STAR to further enhance performance.
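For readers less familiar with the metric, the F1 score is the harmonic mean of precision (how many called junctions are real) and recall (how many real junctions are called). The short sketch below shows the calculation. The junction counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the DeepSAP preprint.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, from raw junction counts."""
    if true_pos == 0:
        return 0.0  # no correct calls: precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 900 correct junctions, 100 false calls, 100 missed
print(f1_score(true_pos=900, false_pos=100, false_neg=100))
```

Because the harmonic mean is pulled toward the smaller of the two values, a tool cannot reach a high F1 by trading away either precision or recall.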

In drug discovery, where you’re often looking for subtle splicing changes, that difference between 0.82 and 0.97 is the difference between finding real signals and chasing false positives.

This doesn’t mean you can just run these tools and trust every fusion or splice junction, but it does mean that you’ll be working with higher-quality results that haven’t already thrown away important biological signals. So while you can run your data through tools like Parabricks and get results quickly, whether those results are meaningful depends entirely on whether someone with expertise set up the analysis correctly, evaluated quality metrics properly, and understood what the numbers actually mean for specific biological questions. GPU acceleration is a powerful tool, and like any tool, it’s only as effective as the expertise guiding its use.

Tertiary Analysis: Where Discoveries Happen

This is the stage where the most important discoveries are made, and where a “one-size-fits-all” approach completely fails. Even a simple comparison between a treated group and a control group needs a careful, customized statistical plan.

QC: A thorough review of the QC reports is an important go/no-go point you don’t want to rush. It takes an experienced analyst to know which QC flags are common, or even expected, which flags you can correct for, and which require sample omission or rework.

Statistical Models: Often, you’re not just comparing two groups. You might have different drug doses, time points, or patient samples that need to be adjusted for things like age, sex, or how the samples were processed. A strong statistical method, like a generalized linear model (GLM), must be used to account for these variables before you even look at charts like volcano plots.
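Before any model is fit, it helps to write out the design and check that the variable of interest is not tangled up with a nuisance variable. The sketch below builds a simple treatment-coded design matrix and flags the case where every batch contains only one treatment level. It covers only the bookkeeping that precedes a GLM, not the fit itself, and the sample sheet, column names, and function names are hypothetical.

```python
from collections import defaultdict


def design_columns(samples, factors):
    """Build an intercept plus dummy columns; each factor's first sorted level is the reference."""
    levels = {f: sorted({s[f] for s in samples}) for f in factors}
    names = ["intercept"]
    for f in factors:
        names += [f"{f}:{lvl}" for lvl in levels[f][1:]]
    rows = []
    for s in samples:
        row = [1]
        for f in factors:
            row += [1 if s[f] == lvl else 0 for lvl in levels[f][1:]]
        rows.append(row)
    return names, rows


def is_fully_confounded(samples, factor, covariate):
    """True if each level of the covariate holds only one level of the factor."""
    seen = defaultdict(set)
    for s in samples:
        seen[s[covariate]].add(s[factor])
    return all(len(found) == 1 for found in seen.values())


# Hypothetical sample sheet: both treatments are present in both batches
samples = [
    {"id": "s1", "treatment": "control", "batch": "A"},
    {"id": "s2", "treatment": "drug", "batch": "A"},
    {"id": "s3", "treatment": "control", "batch": "B"},
    {"id": "s4", "treatment": "drug", "batch": "B"},
]
names, rows = design_columns(samples, ["treatment", "batch"])
print(names)
for row in rows:
    print(row)
print(is_fully_confounded(samples, "treatment", "batch"))
```

If all controls had been processed in one batch and all treated samples in another, no statistical model could separate the drug effect from the batch effect, which is why this check belongs before the analysis rather than after the volcano plot.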

Gene Set Enrichment Analysis (GSEA): GSEA is powerful, but just running it against common pathways isn’t enough. In drug discovery, you need to work with a partner who will:

Use Your Own Data: Run GSEA against custom gene sets (like your company’s list of resistance genes) that are specific to what you’re studying.
Look Deeper: If you’re trying to confirm a target, you need to specifically test for changes in pathways upstream and downstream of that target, not just general inflammation pathways that show up everywhere.
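To make the custom gene set idea concrete, here is a minimal sketch of an over-representation test using the hypergeometric distribution. Note that this is a simpler relative of GSEA, not GSEA’s rank-based running-sum statistic, and it is shown only because it fits in a few lines. The gene names, the background, and the “resistance” set are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, hits_size, overlap):
    """P(X >= overlap) when hits_size genes are drawn from the universe without replacement."""
    max_overlap = min(set_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def custom_set_enrichment(background, gene_set, de_genes):
    """Test whether differentially expressed genes are over-represented in a custom set."""
    background = set(background)
    gene_set = set(gene_set) & background  # keep only genes that were actually measured
    de_genes = set(de_genes) & background
    overlap = gene_set & de_genes
    p_value = hypergeom_enrichment_p(
        len(background), len(gene_set), len(de_genes), len(overlap)
    )
    return sorted(overlap), p_value


# Invented example: 20 measured genes, a 5-gene custom set, 4 DE genes
background = [f"G{i}" for i in range(20)]
resistance_set = ["G0", "G1", "G2", "G3", "G4"]
de_genes = ["G0", "G1", "G2", "G10"]
shared, p_value = custom_set_enrichment(background, resistance_set, de_genes)
print(shared, p_value)
```

The choice of background matters as much as the gene set: restricting it to genes that were actually detected in the experiment avoids inflating the significance of the overlap.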

Custom Visualization and Interpretation: Showing What Matters

Anyone can make a basic heatmap. But an expert knows which charts will best show the results to your specific audience (like a CEO versus a scientist). This involves creating custom visuals that combine gene expression with patient or experimental details, and then carefully explaining the results by:

Checking findings against public data and published research.
Identifying whether results are consistent within the dataset.
Suggesting the best next experiments or analyses, such as a specific single-cell RNA-Seq experiment to understand complex tissue signals, or prioritizing drug candidates based on how they’re expected to work.

Don’t Settle for “Almost Right”

You can run your data through a simple, pre-made process and get results that are “almost right.” But when you’re trying to find subtle biological clues that will guide a multimillion-dollar drug discovery decision, do you really want to risk your project on “almost”? Most of the truly valuable insights are subtle and easily lost if the analysis is poorly set up or reviewed. You need a partner who understands both the technical details and the biology: someone who can spot a statistical error that leads to a false positive, recognize a missed opportunity from ignoring important factors in the data, and quickly adjust the analysis in response. At Diamond Age, we have a lot of experience and have solved many complex problems, and we’re ready to help ensure your RNA-Seq data provides its full, clear signal.

References

¹ Parabricks: GPU Accelerated Universal Pan-Instrument Genomics Analysis Software Suite. Tong Zhu, Pankaj Vats, Seth Onken, Al Dunstan, Babak Zamirai, Daniel F. Puleri, Abhishek Nair, Marco Oliva, Anil Gaihre, Priyanka Sadhnani, Sam Li, Kamesh Arumugam, Alex Chacon, Milos Maric, Jonathan Cohen, Ankit Sethia, Mehrzad Samadi. bioRxiv 2025.07.23.666378; doi: https://doi.org/10.1101/2025.07.23.666378

² DeepSAP: Improved RNA-Seq Alignment by Integrating Transcriptome Guidance with Transformer-Based Splice Junction Scoring. Fadel Berakdar, Thomas D. Wu, Tong Zhu, Mehrzad Samadi, Pankaj Vats. bioRxiv 2025.04.23.650072; doi: https://doi.org/10.1101/2025.04.23.650072

Getting RNA-Seq Right and Getting it Fast with GPU-Accelerated Computing