Working from Home

Lots of folks in the life sciences are finding themselves suddenly thrown into an involuntary work-from-home situation. Here at Diamond Age we’ve been working remotely from our homes for some time, so we’d like to say “Welcome!” and offer a few hard-won tips to make the transition (hopefully) easier. A few of these are common to all of us:

  • Put together an ergonomic desk setup. It’s entirely too easy to damage yourself, sometimes permanently, hunching over a laptop keyboard. Get a real office chair, a monitor at the correct height, an ergonomic keyboard if that suits you, etc. This work-from-home situation isn’t temporary, and even a week with bad posture can hurt you.
  • Don’t rely on email alone to communicate with your coworkers. If you find yourself trying to write an email of more than three sentences, it probably needs to move to a phone call or a video meeting. That goes double for an email sent to more than one person.
  • Keep in contact with your professional network. You’re not meeting in the halls or at the coffee shop anymore, so you have to act deliberately to keep up those contacts. Schedule catch-ups explicitly for video and bring your own coffee. Your social and professional network needs active upkeep.
  • Take breaks during the day. Walk around your neighborhood if you can. It’s important to get away from your desk and reset your brain. Sitting in one room all day is suffocating.

And then we each cope with the home work environment a little differently. I asked around the team for their favorite less-common tips:

  • Katie: “I don’t like to sit in one chair all day so I have four or five options to rotate through. Also when I work from home I tend to endlessly snack, but if I force myself to drink water I eat less of my kids’ Valentine’s day candy.”
  • Mike: “Check what’s in view of your camera during video calls. You may need to move your cat’s litterbox.”
  • Eleanor: “Don’t keep unhealthy snacks in the house – you will eat them all. Stock up on carrot sticks instead. And be sure to feed the cats before your 4pm meeting: they will harass you on video if you don’t.”
  • Erica: “Sometimes I plan calls with friends for my breaks ahead of time — if I know I’ll take lunch around noon, I’ll reach out to a few people in the morning before I start working to see if they want to connect around that time.”

Hang in there, everyone. We’re all adjusting to this new normal.

Reach out if you’d like to talk to us about either the joys of home office work or bioinformatics. We’re always happy to chat.


Come visit us December 10th

We’ll be hosting the venerable Boston Computational Biology and Bioinformatics Meetup in Cambridge, at the Asgard, December 10th. Please come join us for food and drinks and much discussion of all things bioinformatics.

This meetup is a fantastic place to meet folks who are in the field, whether they’re just starting out or long-time veterans. I’ve made tremendously valuable connections there, and I hope to keep doing so. I’ll be there on the 10th, as will several of my Diamond Age colleagues. Please come find us, and let us talk your ear off about what consulting is like.


The challenge of single-cell RNA-seq and differential expression

One of the common analysis tasks we have at Diamond Age is to analyze single cell RNA-seq data. Our customers are largely therapeutics-development biotechs who use this new technology to assess the impact of their development candidates on gene expression in selected cell types. scRNA-seq is a very different beast than its apparent predecessor, bulk RNA-seq. There are gotchas in both the experimental design and analysis of this data that simply didn’t apply to the older technology. One of them relates to appropriate experimental design for differential expression studies.

The problem

Folks often think that when designing a scRNA-seq experiment, they need only collect data from one sample per treatment group to reliably find differences in gene expression. They are then surprised when we tell them that they need multiple biological replicates, even though each replicate provides them with measurements from 1000+ cells of their cell type of interest. A recent Twitter thread started by John Hogenesch (@jbhclock) makes it clear that this misconception is widespread.

Vito Zanotelli (@ZanotelliVRT) summed up the problem rather succinctly:

Vito Zanotelli (@ZanotelliVRT) tweets: People tend to forget that the statistically independent entity of single cell experiments is mostly still the biological sample and not the cells. Distributions/features of cells can be used to calculate properties of that sample that need to be confirmed by replication.

He’s right; the gene expression profiles of the individual cells in a sample aren’t independent measurements. They are more accurately described as repeated measurements of the sample.

Consider a single patient; we expect that any B cells collected from one patient would have a more-similar expression profile to each other than to B cells collected from another patient. If we dose one patient with drug and the other with vehicle, how do we know that differences in expression between those two patients’ B cells aren’t driven by biological differences between the patients? Short answer: we don’t. We *must* collect data from more than one patient, so we must have more than one patient (or animal, or dish of cells) in each treatment group, no matter how many cells we collect from each.

Experimental design

To drive home the point: imagine if we measured blood glucose from a mouse 1000 times. Exsanguination aside, all 1000 of those replicated measurements give us a very good idea of what is happening in that one animal, but they don’t tell us much about the rest of all mouse-kind. In single-cell RNA-seq, the gene expression profiles collected from 1000 different B cells from that mouse are analogous to those glucose measurements.

If we want to figure out how a drug affects B cells generally across all mice, we must treat multiple mice with the drug, and compare the gene expression of, say, 1000 B cells from each animal in one treatment group against the profiles from the other group. We treat those 1000 cells as repeated measurements of one animal, or one biological replicate. That means that our N is still counted in animals: three animals means we have three replicates, not 3000.
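The arithmetic is easy to see in a toy simulation. This is a minimal sketch with made-up effect sizes (the animal-to-animal and cell-to-cell standard deviations are invented for illustration), not real data: whenever animal-to-animal variability exists, treating every cell as an independent replicate drastically understates the uncertainty in a group mean.

```python
import random
import statistics

random.seed(0)

# Hypothetical numbers, chosen only for illustration:
# 3 animals, 1000 cells each, for one gene with no drug effect at all.
N_ANIMALS, N_CELLS = 3, 1000
ANIMAL_SD, CELL_SD = 1.0, 2.0   # biological vs cell-level variability

cells = []          # every cell's expression value, pooled
animal_means = []   # one summary value per animal
for _ in range(N_ANIMALS):
    baseline = random.gauss(5.0, ANIMAL_SD)   # this animal's true level
    animal_cells = [random.gauss(baseline, CELL_SD) for _ in range(N_CELLS)]
    cells.extend(animal_cells)
    animal_means.append(statistics.mean(animal_cells))

# Naive standard error: pretend every cell is an independent replicate (n = 3000)
naive_se = statistics.stdev(cells) / len(cells) ** 0.5

# Correct standard error: each animal is one replicate (n = 3)
correct_se = statistics.stdev(animal_means) / len(animal_means) ** 0.5

print(f"naive per-cell SE: {naive_se:.3f}")
print(f"per-animal SE:     {correct_se:.3f}")
# A per-cell test would use the naive SE, overstating precision and
# yielding spuriously small p-values for ordinary biological noise.
```

The naive SE shrinks with the number of cells, but the between-animal variability does not, which is exactly why more cells cannot substitute for more animals.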

The upshot of this is that a properly-powered single-cell RNA-seq experiment can get quite expensive. As of this writing, the total cost of a scRNA-seq experiment is in the thousands of dollars *per sample*. If we need a minimum of three samples per group (and we do), that’s a hefty price tag. But it’s worth it to get real data.

Analyzing the data

Once we have a well-designed experiment with biological replicates, how do we handle the analysis? Most differential expression methods for single-cell analysis are only suitable for within-sample comparisons: they treat each cell as an independent measurement, so they can only reliably tell you how one group of cells compares to another group within the same sample. Used to compare treatment groups, these methods produce improbably low p-values.

One tool that does handle differential expression across multiple samples properly is an R package called MAST. It does this by essentially grouping the expression profiles from each sample together and comparing those groups, rather than comparing individual cells; it uses what’s called a mixed-effects model to accomplish this. It’s quite computationally intensive, but the results are solid. I’d love to hear from folks who have found other tools that do good work on these experimental designs.
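MAST’s mixed model is more than a snippet, but the core idea it rests on — collapse cells to per-sample summaries and compare samples, not cells — can be sketched in a few lines of Python. This is a toy illustration with invented sample IDs and expression values, not MAST itself:

```python
import statistics
from collections import defaultdict

# Toy input for one gene: (sample_id, group, expression) per cell.
# Sample names and values are made up for illustration.
cells = [
    ("mouse1", "drug", 5.1), ("mouse1", "drug", 4.9), ("mouse1", "drug", 5.3),
    ("mouse2", "drug", 6.2), ("mouse2", "drug", 6.0),
    ("mouse3", "drug", 5.6), ("mouse3", "drug", 5.8),
    ("mouse4", "vehicle", 4.0), ("mouse4", "vehicle", 4.2),
    ("mouse5", "vehicle", 3.9), ("mouse5", "vehicle", 4.1), ("mouse5", "vehicle", 4.3),
    ("mouse6", "vehicle", 4.5), ("mouse6", "vehicle", 4.4),
]

# Step 1: collapse cells to one summary value per biological sample.
per_sample = defaultdict(list)
group_of = {}
for sample, group, value in cells:
    per_sample[sample].append(value)
    group_of[sample] = group
sample_means = {s: statistics.mean(v) for s, v in per_sample.items()}

# Step 2: the unit of analysis is now the sample, so N = number of animals.
drug = [m for s, m in sample_means.items() if group_of[s] == "drug"]
vehicle = [m for s, m in sample_means.items() if group_of[s] == "vehicle"]

# Welch-style t statistic on the sample-level means (n = 3 per group).
se = (statistics.variance(drug) / len(drug)
      + statistics.variance(vehicle) / len(vehicle)) ** 0.5
t_stat = (statistics.mean(drug) - statistics.mean(vehicle)) / se
print(f"n per group: {len(drug)} animals, t = {t_stat:.2f}")
```

The test statistic here is driven by three values per group, however many cells each animal contributed — which is the whole point. MAST goes further by modeling the cell-level distributions rather than flattening them to means, but the replication structure is the same.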

Getting the most from the data

One of the hardest things to do in this business is to tell clients that the experiment they ran – the one that cost so much – isn’t going to give them the answers they need. Single-cell RNA-seq experiments are repeat offenders in this space because they are expensive and very new; despite the somewhat familiar name, they are very different beasts from good old bulk RNA-seq.

We hate seeing good mice (or even cell lines) go to waste. Reach out if you’d like to chat about experimental design and making sure your investment pays off.


2018 in Review

Here we are in that funny week at the end of the year, when the projects have mostly wrapped up and we take a breather and look at both the year behind and the year ahead. In short, 2018 has been an amazing, exciting year with a lot of work and a lot of success. I’d like to share a little bit of that with all of you.

Over the course of the past twelve months, Diamond Age has served a wide variety of clients from across the therapeutics discovery space, in metabolic disease, hearing loss, neuroscience, cancer and many more. We’ve continued our work in using genomics to support discovery biology and drug development and expanded into building data processing pipelines, developing analysis methods and doing technology transfer. Beyond therapeutics, we have moved into biotechnology and diagnostics, assisting our clients with data analysis for process improvement and streamlining.

All of this work meant that we needed to expand the team. In July, Chris Friedline joined us, bringing with him extensive experience in sequencing, evolutionary biology, information technology and software engineering / infrastructure. In August, Somdutta Saha brought to the team chemistry, cheminformatics, microbiome analysis and drug discovery/development experience.

In December, we presented at the North Shore Technology Council’s First Friday seminar series. We talked about how to hire computational biology folks, how bioinformatics relates to data science, and using machine learning and AI in discovery biology. We co-presented with Huseyin Mehmet from Zafgen, Inc; he described how Diamond Age helps Zafgen get its data analysis needs met efficiently, and what makes a good collaboration with a bioinformatics team. Check out the slides if you’d like to learn more.

As for the year ahead, 2019 looks to me like another year of exciting work with great companies, and perhaps some even more interesting developments, besides. I want to say a personal thank you to everyone at Diamond Age, plus all of our clients and our supporters. There is certainly more to come.



Taking your bioinformatics to the next level

If you missed our presentation at the North Shore Technology Council’s First Friday event, you can check out the slides on SlideShare.

We co-presented with Huseyin Mehmet of Zafgen, Inc. Huseyin talked about what we do for them, from cross-dataset analysis to technology recommendations to app development. I talked about what you should look for in a bioinformatics team and how to decide whether a deep learning/AI approach is right for your computational questions.



We’re presenting!

Want to learn more about computational biology in a therapeutics context? I’ll be co-presenting with Huseyin Mehmet of Zafgen next Friday, December 7th at the Woburn Trade Center. We’ll be talking about making effective use of computational biology data for drug discovery and development, and also describing how the collaboration between Diamond Age and Zafgen works.

This event is the North Shore Technology Council’s usual First Friday gathering. Register here to attend and come say hello. Several Diamond Age folks will be there.

Hope to see you there!



Thanksgiving 2018

It’s Thanksgiving Day, and I’d like to take a minute to say “Thank you” to the people who have made Diamond Age possible, starting with our customers. We’ll have a proper ‘Customers’ page up here soon enough, but in the meantime, thank you to Decibel Therapeutics, Voyager Therapeutics, 1CellBio, Aquinnah Pharmaceuticals, and Zafgen, Inc. As for the rest of you who will not be named, you know who you are, and we appreciate the trust you’ve placed in us.

Also, thank you Chris and Somdutta, who took the risk of throwing in with me by joining Diamond Age full-time this year. And thank you to Dave, Max, Nick, Mike, Zarko, Bruce and Chris for making the time to work with us. Diamond Age would not exist without you.

Happy Thanksgiving!

Three reasons to use Git for bioinformatics projects

Source control is not just for software engineers. Using the tools that coders have written to support their work can make a computational biologist’s life massively easier. You’ll find that having a versioned, trackable backup of your analytic scripts is a lifesaver, over and over again.

  1. Backups. If you are checking code into a Git repository, you have at least one other location for your code. Dropping your laptop into the river doesn’t *have* to be a disaster.
  2. Collaboration. Working with other people is hard enough without stepping on each other’s toes while making edits. Git’s merging capabilities are excellent, and will help you figure out who did what, when, and whose changes should remain in the final document.
  3. Reproducibility. When you look back at that analysis you did last year, do you know what code you used to run it? Git does. Just ask it! Then you can tell your boss why your results are different from last time.

We’re probably preaching to the choir here, but I wanted to make sure everyone had at least heard the gospel.


BioIT World 2018

It’s that time of year again, folks: Bioinformatics Christmas. Or is it rather Thanksgiving, where the whole family gets together whether we like it or not?

Either way, I’ll be there next week, bells on and business cards in hand. I’d love to catch up with you, if you’ll be there. Send an email or find me on the contact page if you want to get a coffee and chat.

I’m very much looking forward to Carl Zimmer’s talk, as well as some talks by colleagues and friends of mine: John Keilty and Karina Chmielewski from Third Rock, Mike Dinsmore from Editas, and Iain McFaydden from Moderna.

Who do I really want to meet this year? Tanya Cashorali, one of the plenary keynote panelists and another woman who has started her own data science company. There aren’t many of those. We have to stick together.

After all, we’re family.


Bioinformatics vs Computational Biology

The world of quantitative biology is large, diffuse and sometimes overwhelming. It can be hard to even figure out what someone means when they say “bioinformatics”, which makes it tricky to pin down what part of the field a person works in.

One way to break it down is to describe bioinformatics as the building of tools and methods for processing and managing biological data, and computational biology as the pursuit of biological science using computational methods. In this framing, bioinformatics is more of an engineering discipline and computational biology more of a scientific one.

It’s helpful to think about these distinctions, subtle as they seem. It takes a certain mindset and skillset to build a robust sequencing analysis pipeline that will serve the needs of a large group of scientists for years. That mindset and skillset may be very different from the one required to do a deep investigation of the variants that impact risk of heart disease.

We can argue about the naming conventions all we want, but the label we apply to these two types of specialist doesn’t really matter. What matters is what they do; the person I would call a computational biologist writes code, yes, but does it in pursuit of a particular biological problem, and they would love to write less code and more manuscripts. The bioinformatician, on the other hand, wants to spend their time writing robust, high-quality code that does interesting and powerful computations. Papers are more of a nice side-effect.

The truth of the matter is that most programming biologists are a mix of the two disciplines.

When hiring for a small department or a startup, the distinction between these two caricatures becomes very important. Some people will be in the field for the biology specifically, and will choke when pressed to develop a tool for use by a team. Others will jump at the chance to write such a thing. Every group needs both of these. Consider the current needs; will this person be building a pipeline that will be re-used again and again? Or will they investigate particular variants, or particular compound response profiles? Fitting the right person to the job will ensure a happy employee and high productivity.

Figuring out what kind of background and preferences someone has can be as simple as asking them. Their resume or LinkedIn profile can also give clues. A software-focused person will tend to have one or more large, open-source bioinformatics software tools prominently listed. Their reference list may include a few papers describing this project and others (potentially many others) that use that tool. A manuscript-focused person will not be as likely to have a major tool-building segment of their resume. Instead, they will list a series of biology or dataset-focused projects, with manuscripts describing each.

Data Science

But where does data science fit into all this? That, at least, is simple; bioinformatics/computational biology is data science with a biology application, just as computational chemistry is data science for chemistry. Physicists have figured out that they’re all data scientists already, so there is no need for a name for them beyond “physicist”. I hope in the future we’ll do the same and just call ourselves “biologists”.