Last week was the annual BioIT World Convention and Expo. It’s three days of talks, workshops, and row upon row of technology solutions vendors putting their best foot forward. Also, biotech folks from all over the country converge upon Boston, so I get to catch up with many colleagues I see but once a year.
As usual, the conference was a firehose of information about new technology, new methods, and new software. Also as usual, participating in the Best in Show judging was the highlight of the event. It’s a real pleasure to see the best new technologies our community has to offer. Congratulations to the winners!
The terms probability and likelihood are often used interchangeably in day-to-day conversation. They have specific meanings in the world of statistics, however, and understanding the difference is helpful in understanding statistical methods.
We’ll start with an example. Take a coin flip. If you know the coin is fair, a lifetime of experience gives you a model describing its behavior: half the time, a flip will come up heads. You use this probability of 0.5 to decide whether you want to take a bet on the outcome of a flip.
If, on the other hand, you wanted to test whether that coin was fair, you might flip it many times. Say you flip it 1000 times, and you observe 505 heads and 495 tails. Now you want to know: is this coin fair? Is my model of the coin’s behavior correct? Now you are talking about a likelihood: given the flips you observed, what is the likelihood that this is a fair coin?
In short, a probability quantifies how often you expect to observe a certain outcome of a test, given a certain model of the process that generates the data. A likelihood quantifies how well a model explains a set of data that has already been observed. Probabilities describe test outcomes, while likelihoods describe models.
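To make the distinction concrete, here is a minimal sketch in Python, assuming each flip is independent so that the number of heads follows a binomial distribution. The function names are ours, chosen for illustration. The same formula is used both ways: hold the model fixed and ask about outcomes (a probability), or hold the data fixed and compare models (a likelihood).

```python
import math


def binomial_log_pmf(heads, flips, p_heads):
    """Log of the binomial probability of `heads` successes in `flips` trials."""
    # lgamma keeps the binomial coefficient in log space, avoiding overflow.
    log_choose = (
        math.lgamma(flips + 1)
        - math.lgamma(heads + 1)
        - math.lgamma(flips - heads + 1)
    )
    return (
        log_choose
        + heads * math.log(p_heads)
        + (flips - heads) * math.log(1 - p_heads)
    )


def probability(heads, flips, p_heads):
    """Model is fixed (p_heads); the outcome (heads) is what we ask about."""
    return math.exp(binomial_log_pmf(heads, flips, p_heads))


def log_likelihood(p_heads, heads, flips):
    """Data are fixed (heads, flips); the model (p_heads) is what we ask about."""
    return binomial_log_pmf(heads, flips, p_heads)


if __name__ == "__main__":
    # Probability: assume a fair coin, ask how often we would see exactly 505 heads.
    print("P(505 heads | fair coin):", probability(505, 1000, 0.5))

    # Likelihood: keep the observed 505/1000 fixed, compare candidate coins.
    for p in (0.4, 0.5, 0.505, 0.6):
        print(f"log-likelihood of p={p}:", log_likelihood(p, 505, 1000))

    # Ratio of the fair-coin likelihood to the best-fitting coin (p = 505/1000).
    ratio = math.exp(log_likelihood(0.5, 505, 1000) - log_likelihood(0.505, 505, 1000))
    print("likelihood ratio, fair vs. best fit:", ratio)
```

The likelihood ratio at the end is one simple way to put a number on "is this coin fair?": a value close to 1 means the fair-coin model explains the observed flips nearly as well as the best-fitting coin does.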
Source control is not just for software engineers. Using the tools that coders have written to support their work can make a computational biologist’s life massively easier. You’ll find that having a versioned, trackable backup of your analytic scripts is a lifesaver, over and over again.
There are several version control tools out there – we find that distributed source-control tools like Git work best for teams. Here is what you get in return:
- Backups. If you are checking code into a Git repository and pushing it to a remote, you have at least one other location for your code. Dropping your laptop into the river doesn’t *have* to be a disaster.
- Collaboration. Working with other people is hard enough without stepping on each other’s toes while making edits. Git’s merging capabilities are excellent, and will help you figure out who did what, when, and whose changes should remain in the final document.
- Reproducibility. When you look back at that analysis you did last year, do you know what code you used to run it? Git does. Just ask it! Then you can tell your team why your results are different from last time.
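One way to make the reproducibility point practical is to record which commit produced each set of results. Below is a minimal sketch in Python, assuming the analysis script lives inside a Git working copy and that `git` is on your PATH. The helper names `current_commit` and `stamp_results` are hypothetical, made up for this example; `git rev-parse HEAD` is the standard command for printing the hash of the checked-out commit.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the hash of the commit checked out in repo_dir, or 'unknown'."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir is not a repository, or it has no commits.
        return "unknown"
    return result.stdout.strip()


def stamp_results(results, repo_dir="."):
    """Return a copy of `results` with the code version recorded alongside."""
    return {"code_version": current_commit(repo_dir), **results}


if __name__ == "__main__":
    # Hypothetical analysis output, tagged with the commit that produced it.
    print(stamp_results({"n_samples": 12, "mean_expression": 3.4}))
```

If you save that hash next to your output files, "what code did I use last year?" becomes a matter of checking out that one commit.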
Try it with your writing too – you’ll find that having your manuscript draft safely stashed away in a repository adds a lot of peace of mind.
There is plenty of great documentation out there on using these tools. Here are a few to begin with.
Getting Started with Version Control
Git Best Practices