AI in the Wild Part 1: AI Readiness and Preparation

Eleanor Howe

2 weeks ago

We’ve said it before, and we’ll say it again: AI rewards preparation, not ambition.

Teams that invest in data quality and experimental design consistently outperform those that dive headfirst into the latest technology. We know this because we are often called in to course-correct those who did not heed this advice.

In this first part of our AI in the Wild blog series, we’ll share two examples of how we helped clients prepare for AI success: one through tool evaluation and the other through data standardization.

Clinical-Stage Biopharma Needs AI Readiness Support: Evaluating Foundation Models for Single-Cell Genomics

A clinical-stage biopharma company approached Diamond Age to research and assess transformer-based foundation models for single-cell genomics. With a sense of well-earned skepticism, the client wanted to assess the practical feasibility of integrating these AI/ML tools into their operations before diving in.

While they had a broad understanding of the tools, the stakeholders lacked the bandwidth and expertise to conduct the detailed research needed to make decisions.

Taking a hands-on, iterative approach, we set out to answer the questions: Could the technology practically predict changes in gene expression after genetic perturbation? Could it classify cell types from single-cell genomics data?

The Process of Elimination

The project began with upwards of 15 models, each claiming to be a better alternative to traditional methods such as linear regression. Each week, we would research and test new models and report our findings back to the team.

In order to eliminate those that did not live up to their claims or simply did not perform, we took the following actions:

Reviewed existing literature, which often led to deep dives to find basic information, such as memory requirements and necessary computer specifications.
Benchmarked claims against recent independent evaluations and reviews.
Tested models to determine hardware requirements and data compatibility, as outdated software and complex installations were often concerns.
Conducted experiments on GPU infrastructure to test fine-tuning pipelines for cell-type classification.

In the end, our research showed that many of these sophisticated models lose to simple linear regression. By taking the time to conduct due diligence and bring in expert support, the client saved considerable time and resources. We prevented them from investing in impractical or nonfunctional models and came away with several key takeaways:

Models are not plug-and-play and require significant bioinformatics expertise to install and configure. In one stress test, we barely managed to install the model due to outdated software and poor code.
Academic papers can be misleading about what models actually do. During testing, it became clear that some of the models perform well on the specific datasets they were trained on but not on novel data, which is why independent benchmarking is essential.
Many AI methods perform poorly. Simple baselines, such as traditional statistics or linear regression, often yield results as good as or better than complex AI approaches.
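One way to make the baseline comparison concrete is to score every candidate model against ordinary least squares on the same held-out split. The sketch below is a minimal illustration on synthetic data, not the evaluation we ran for the client. The function names (fit_linear_baseline, r2_score, beats_baseline), the margin parameter, and the stand-in "candidate" predictions are all hypothetical.

```python
import numpy as np

def fit_linear_baseline(X_train, y_train):
    """Fit ordinary least squares with an intercept and return a predict function."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

    def predict(X):
        return np.column_stack([np.ones(len(X)), X]) @ coef

    return predict

def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def beats_baseline(candidate_pred, baseline_pred, y_test, margin=0.01):
    """True only if the candidate clears the baseline R^2 by `margin` (hypothetical threshold)."""
    return bool(r2_score(y_test, candidate_pred) > r2_score(y_test, baseline_pred) + margin)

# Synthetic data standing in for real expression measurements (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

baseline_pred = fit_linear_baseline(X_train, y_train)(X_test)

# Stand-in for a complex model's output: here it just predicts the training mean.
candidate_pred = np.full_like(y_test, y_train.mean())

print("baseline R^2:", r2_score(y_test, baseline_pred))
print("candidate beats baseline:", beats_baseline(candidate_pred, baseline_pred, y_test))
```

In practice, candidate_pred would come from the foundation model under evaluation. The important design choice is that both models are scored on exactly the same held-out data, so a model only advances if it clearly outperforms the simplest reasonable alternative.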

The client also learned that fine-tuning a model on the data can improve results. Overall, the client has a clear path forward and confirmation that it’s essential to perform research before committing to an AI tool.

But that’s not where AI-readiness ends. Research is just one part of the process.

A Large Pharmaceutical Company Faces a Data Standardization Issue: Building AI-Powered Systems to Distill Data

The most common failure mode we encounter is poor data capture: experiments are recorded inconsistently, metadata is incomplete, or key context is absent from shared systems. Diamond Age is often brought in after previous AI efforts have stalled. In many cases, the team attempted to incorporate AI, but no one stopped to assess whether the data itself was suitable for modeling.

This was the exact problem a large pharmaceutical company was experiencing when they reached out. Researchers were running assays in completely different ways. Data was siloed, critical metadata wasn’t captured consistently, and outputs were incompatible.

These practices led to extensive manual manipulation to harmonize results, which meant each experiment took much longer than necessary. The team was consistently at risk of missing critical variables that could explain assay variations.

The rise of AI/ML only made the situation worse. Messy data doesn’t mesh well with AI. What the team needed was a solution to standardize the data before handing it over to AI.

In other words, they needed an AI solution to further enhance AI.

Building an ML-Powered Web App

Two other teams had tried (and failed) to help with this company’s data standardization problem. Thanks to an internal AI/ML grant and a vendor partnership, the team was able to bring us in to do what we do best: building systems to distill complex (and in this case, siloed) data.

Starting with a single standard assay, we built a web application that handles the full workflow from plate map design through quality control. The app provides four core capabilities, outlined below.

Plate map creation: Users can generate plate maps indicating the locations of samples in assay plates by entering sample information into a pre-existing template. Users describe their samples in a chat interface powered by an LLM that recognizes common templates and patterns. The resulting sample layout is displayed in an editable format, allowing users to confirm or correct the design before proceeding.
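To illustrate what an editable plate map might look like as data, here is a minimal sketch that assigns samples to wells. It is not the app's internal representation, which we don't describe here. The 96-well layout, the row-by-row fill order, the replicate count, and the build_plate_map name are all assumptions made for the example.

```python
from string import ascii_uppercase

def build_plate_map(samples, rows=8, cols=12, replicates=2):
    """Assign each sample to `replicates` consecutive wells, filling row by row.

    Returns a dict mapping well IDs such as "A1" to sample names.
    """
    wells = [f"{ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]
    needed = len(samples) * replicates
    if needed > len(wells):
        raise ValueError(f"{needed} wells needed but the plate has only {len(wells)}")

    plate_map = {}
    for i, sample in enumerate(samples):
        for k in range(replicates):
            plate_map[wells[i * replicates + k]] = sample
    return plate_map

# Hypothetical sample names: two standards and one unknown, in duplicate.
plate = build_plate_map(["STD_1", "STD_2", "SAMPLE_A"])
for well, sample in plate.items():
    print(well, sample)
```

A plain mapping from well ID to sample name is easy to render as a grid and easy for a user to correct one well at a time, which is the kind of confirm-or-edit step described above.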

Results file ingestion: An ML model, trained to extract assay results, can process raw CSV files generated by the team’s plate readers. Extracted results are displayed back in a plate-format visualization for easy review.

Unified downstream processing: The app maps assay results to sample identities, including the necessary controls for creating a standard curve. It then generates standard curve visualizations and statistics, calculates assay results, and delivers them in a standardized output format ready for downstream use.
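As a rough illustration of the standard curve step, the sketch below fits a straight line to control wells of known concentration and inverts it to estimate an unknown sample. This assumes a linear signal response, which is a simplification; many plate-based assays call for a nonlinear fit instead. The function names and all numbers are made up for the example.

```python
import numpy as np

def fit_standard_curve(concentrations, signals):
    """Fit signal = slope * concentration + intercept and report R^2 of the fit."""
    conc = np.asarray(concentrations, dtype=float)
    sig = np.asarray(signals, dtype=float)
    slope, intercept = np.polyfit(conc, sig, 1)
    predicted = slope * conc + intercept
    ss_res = np.sum((sig - predicted) ** 2)
    ss_tot = np.sum((sig - sig.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

def interpolate_concentration(signal, slope, intercept):
    """Invert the fitted line to estimate concentration from a measured signal."""
    return (signal - intercept) / slope

# Hypothetical standards: known concentrations and their measured signals.
std_conc = [0.0, 1.0, 2.0, 4.0, 8.0]
std_signal = [0.05, 0.25, 0.45, 0.85, 1.65]

slope, intercept, r2 = fit_standard_curve(std_conc, std_signal)
sample_conc = interpolate_concentration(0.65, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
print(f"estimated sample concentration: {sample_conc:.2f}")
```

Reporting the fit statistics alongside the calculated results gives reviewers a quick way to spot a poor standard curve before trusting the sample values derived from it.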

LLM-generated quality control: An LLM produces a QC assessment of each result set, investigating replicate variability and overall assay quality to flag potential issues for the user.
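Replicate variability can also be checked deterministically, and a simple check like the one below could sit alongside an LLM-written assessment. This sketch is not the LLM step itself. It computes the coefficient of variation (CV) for each sample's replicates and flags those above a cutoff. The 15% cutoff, the function names, and the values are assumptions for illustration.

```python
import statistics

def replicate_cv(values):
    """Coefficient of variation (percent) using the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("inf")
    return 100.0 * statistics.stdev(values) / abs(mean)

def flag_high_variability(replicates_by_sample, max_cv_percent=15.0):
    """Return {sample: CV%} for samples whose replicate CV exceeds the cutoff."""
    flagged = {}
    for sample, values in replicates_by_sample.items():
        if len(values) < 2:
            continue  # CV is undefined for a single replicate, so skip it
        cv = replicate_cv(values)
        if cv > max_cv_percent:
            flagged[sample] = round(cv, 1)
    return flagged

# Hypothetical calculated results, three replicates per sample.
results = {
    "SAMPLE_A": [1.00, 1.02, 0.98],
    "SAMPLE_B": [1.00, 1.60, 0.70],
}
flags = flag_high_variability(results)
print(flags)
```

Whatever produces the QC summary, the flagged samples still need a scientist's review to decide whether the variability reflects a pipetting error, a plate effect, or real biology.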

While the end-to-end prototype was fully realized for a single standard assay, several components were designed to extend to additional assay types, providing a foundation that the client can build on.

Human Validation Cannot Be Skipped

Having human validation built into the workflow was a critical component because, as we know, AI tools do not think critically.

Panelists on a recent Diamond Age-led panel agreed that without experts validating the outputs, it is easy to mistake plausible-looking results for real insight.

The problem of siloed, inconsistent data is universal. It affects labs of all sizes and is simply more apparent at scale in larger organizations like this one. Each company’s solution will look different, but the takeaway remains the same.

Standardized data collection is the foundation that makes AI possible.

Preparation Is the Key to Success in AI-Supported Drug Discovery

Investing early in consistent data capture and documentation is the surest way to avoid wasting resources on the wrong AI approach. Involve data scientists and statisticians at the experiment design stage, not after the data is already generated.

If you’re eager to incorporate AI into your workflow, Diamond Age can help you evaluate and pressure-test your approach before you commit time and budget.

Our highly regarded team helps biotech and pharmaceutical teams make confident decisions about science and AI.

Reach out to learn how we can support your company in addressing complex biological questions with urgency, adaptability, and scientific rigor.