Friday, March 29, 2013

Grice Lab Website

The Grice lab has a new website! It just went live, so feel free to check it out:

Thursday, March 7, 2013

Public Repositories


I was recently asked by a Library Science graduate student to take a survey on my data archiving practices. Part of the survey asked me to comment on the databases I use for archiving data and to explain why I chose them. Here are my comments on the public repositories for molecular data that I use:

GenBank/NCBI: Depositing large data sets here is difficult (although usability has been improving somewhat), but it is the standard database for molecular biology. A major advantage is that the data are integrated into a larger framework, so future researchers can use all sorts of NCBI tools and programs to find and analyze the data alongside data from nearly every other project in molecular biology.

Data Dryad: It accepts an extremely wide range of data formats (which makes my work more reproducible for other scientists), and it's easy to deposit any type of biological data.

MG-RAST: This database is good for metagenomic data sets, but completing a submission and making it public can be moderately difficult and time-consuming. However, once the data are public, the repository's tools let researchers explore various aspects of large sequence data sets, and some comparisons can be made between data sets.

TreeBASE: Phylogenetic trees and DNA/protein sequence alignments can be deposited here in a very strictly regulated format, although there is really no integration of data sets (e.g., comparison or combination of data sets). Journals often require that files for evolutionary biology studies be deposited here, but doing so is usually difficult because of the formatting requirements.

Overall, I think it's best to use multiple databases. I prefer to put all data and analysis files in Dryad, and then use the other databases when appropriate for the specific types of data. The more places the data are, the more likely people are to find them!

- Brendan

Disclaimer: These comments are the opinions of the author based on personal experiences and do not represent the views of any affiliated institutions or funding agencies.