Squamules: Using R for community analyses

For the central chapter of my doctoral dissertation (published in Environmental Microbiology), I wanted to visualize similarities and differences between lichen-associated bacterial communities, find which factors were most strongly correlated with community differences, and determine the significance of these correlations. When communities are characterized using large sets of DNA sequence data, this can be achieved by using both genetic measures (e.g., UniFrac distance) and taxon-based measures (e.g., Bray-Curtis OTU-based dissimilarity).

This post gives an example of a script (or a series of commands) to be run with the R statistical package (R Development Core Team 2010) for performing ANOSIM and NMDS analyses of (1) an OTU-based relative abundance matrix derived from a set of clone libraries and (2) a UniFrac distance matrix derived from a 454 amplicon library (see example files at the bottom of this post). These examples are derived from Hodkinson et al. (2012), although the scripts and example files shown here are abbreviated in order to save space. Full example files can be downloaded from the Dryad data package associated with this publication [Hodkinson et al. 2011; http://dx.doi.org/10.5061/dryad.t99b1]. The matrices were obtained using the Fast UniFrac program (see instructions here) and Mothur (using the get.relabund command; for the full pipeline that I authored to process these data, see the Dryad data package). Some minor modifications were required to get the matrices into the precise format needed for R (edits were performed in Microsoft Excel and files were saved in .csv format... note that all of the spaces/tabs in the examples below would actually be commas in the raw text versions of the files).

Running the script as it is presented below (with hash marks in front of the 'plot' commands) will allow one to obtain an output file with the numerical results of all analyses. However, in order to visualize the NMDS plots, one can run each of the commands line-by-line in the R-GUI (of course, all 'anosim' commands can be ignored if the only goal is NMDS), removing the hash marks in front of the plot commands (note that R has many fancy options for visualization using 'plot'; these may be worth investigating further for your own purposes). Here is a link to the version of R that I used for these analyses: http://cran.r-project.org/bin/windows/base/old/2.12.0/. Once R is installed, you can call the script with a command like this one:
C:\\"Program Files"\R\R-2.12.0\bin\i386\Rterm.exe --verbose --no-restore --file=C:\\ANOSIM_NMDS_Clon_454_win.R > C:\\ANOSIM_NMDS_Clon_454_win.out

Here is the script, named 'ANOSIM_NMDS_Clon_454_win.R' (abbreviated example script; full file available at http://dx.doi.org/10.5061/dryad.t99b1):

# load the necessary libraries
library(ecodist)
library(vegan)
# set the output file
sink("ANOSIM_nmds.out", append=TRUE, split=TRUE)
# load the clone data set
samplesClon<-read.csv("C:\\SamplesClon.csv", header=TRUE)
samplesClon
# load the Bray-Curtis OTU-based matrix
OTUsClon<-read.csv("C:\\HodkinsonClon16S_OTUs_1per_171.csv", head = FALSE, row.names = 1)
# transform the OTU matrix into a Bray-Curtis matrix
OTUsClon.dist<-vegdist(OTUsClon, "bray")
# run Bray-Curtis OTU-based ANOSIM
anosim(dis = OTUsClon.dist, grouping = samplesClon$Photobiont, strata = samplesClon$Site)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Mycobiont, strata = samplesClon$Site)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Mycobiont, strata = samplesClon$Photobiont)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Site, strata = samplesClon$Photobiont)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Site, strata = samplesClon$Mycobiont)
# run Bray-Curtis OTU-based NMDS
nmds <- nmds(as.dist(OTUsClon.dist))
nmin <- nmds.min(nmds)
nmin
#plot(nmin, pch=as.numeric(samplesClon$Photobiont))
#plot(nmin, pch=as.numeric(samplesClon$Mycobiont))
#plot(nmin, pch=as.numeric(samplesClon$Site))
# load the 454 data set
samples454<-read.csv("C:\\Samples454.csv", header=TRUE)
samples454
# load the normalized weighted UniFrac matrix
unifracnw454<-read.csv("C:\\DistUFnw454.csv", head = FALSE, row.names = 1)
# run normalized weighted UniFrac ANOSIM
anosim(dat = as.dist(unifracnw454), grouping = samples454$Photobiont, strata = samples454$Site)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Site, strata = samples454$Mycobiont)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Site, strata = samples454$Photobiont)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Mycobiont, strata = samples454$Photobiont)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Mycobiont, strata = samples454$Site)
# run normalized weighted UniFrac NMDS
nmds <- nmds(as.dist(unifracnw454))
nmin <- nmds.min(nmds)
nmin
#plot(nmin, pch=as.numeric(samples454$Photobiont))
#plot(nmin, pch=as.numeric(samples454$Mycobiont))
#plot(nmin, pch=as.numeric(samples454$Site))
# close the output file
sink()
# unload the libraries
detach("package:ecodist")
detach("package:vegan")

Samples454.csv (referenced in the above script)

SampleID	Photobiont	Site	Mycobiont
CLCl	Ch	C	Cl
CLDi	Cy	C	Di
CLLe	Cy	C	Le
CLPe	Cy	C	Pe
CLSt	Cy	C	St
CLUs	Ch	C	Us
ELCl	Ch	E	Cl
ELFl	Ch	E	Fl
ELOp	Ch	E	Op
ELPe	Cy	E	Pe
ELUm	Ch	E	Um
HLCl	Ch	H	Cl
HLLe	Cy	H	Le
HLPe	Cy	H	Pe
HLSt	Cy	H	St
HLUs	Ch	H	Us
NLCl	Ch	N	Cl
NLFl	Ch	N	Fl
NLOp	Ch	N	Op
NLPe	Cy	N	Pe
NLUm	Ch	N	Um

DistUFnw454.csv (referenced in the above script; full file available at http://dx.doi.org/10.5061/dryad.t99b1)

CLCl		0.16	0.17	0.11	0.08	0.12	...
CLDi	0.16		0.24	0.18	0.13	0.18	...
CLLe	0.17	0.24		0.11	0.16	0.20	...
CLPe	0.11	0.18	0.11		0.10	0.18	...
CLSt	0.08	0.13	0.16	0.10		0.13	...
CLUs	0.12	0.18	0.20	0.18	0.13		...
ELCl	0.20	0.22	0.30	0.25	0.19	0.17
ELFl	0.14	0.18	0.24	0.19	0.14	0.15	...
ELOp	0.33	0.35	0.40	0.37	0.34	0.31	...
ELPe	0.16	0.21	0.24	0.19	0.15	0.19	...
ELUm	0.22	0.24	0.32	0.27	0.22	0.22	...
HLCl	0.19	0.18	0.27	0.24	0.17	0.15	...
HLLe	0.13	0.16	0.20	0.15	0.11	0.12	...
HLPe	0.09	0.17	0.15	0.11	0.11	0.16	...
HLSt	0.14	0.16	0.21	0.18	0.12	0.13	...
HLUs	0.10	0.20	0.18	0.15	0.14	0.11	...
NLCl	0.19	0.21	0.28	0.25	0.19	0.15	...
NLFl	0.18	0.21	0.27	0.24	0.19	0.15	...
NLOp	0.24	0.27	0.32	0.29	0.25	0.25	...
NLPe	0.10	0.18	0.13	0.09	0.10	0.14	...
NLUm	0.09	0.18	0.21	0.15	0.13	0.16	...

SamplesClon.csv (referenced in the above script)

SampleID	Photobiont	Site	Mycobiont
CL01	Cy	C	Peltigera
CL02	Ch	C	Alectoria
CL03	Ch	C	Cladonia
CL04	Tr	C	Placopsis
CL05	Cy	C	Erioderma
CL06	Cy	C	Sticta
CL07	Cy	C	Dictyonema
CL08	Cy	C	Leptogium
CL09	Tr	C	Stereocaulon
CL10	Ch	C	Usnea
EL01	Cy	E	Peltigera
EL01t	Tr	E	Peltigera
EL02	Ch	E	Alectoria
EL03	Ch	E	Cladonia
EL04	Ch	E	Rhizocarpon
EL05	Ch	E	Ophioparma
EL06	Ch	E	Flavocetraria
EL07	Ch	E	Arctoparmelia
EL08	Ch	E	Umbilicaria
EL09	Ch	E	Masonhalea
EL10	Ch	E	Dactylina
HL01	Cy	H	Peltigera
HL02	Ch	H	Platismatia
HL03	Ch	H	Cladonia
HL04	Ch	H	Parmotrema
HL05	Ch	H	Flavoparmelia
HL06	Cy	H	Sticta
HL07	Ch	H	Xanthoparmelia
HL08	Cy	H	Leptogium
HL09	Tr	H	Stereocaulon
HL10	Ch	H	Usnea
NL01	Cy	N	Peltigera
NL01t	Tr	N	Peltigera
NL03	Ch	N	Cladonia
NL04	Tr	N	Amygdalaria
NL05	Ch	N	Ophioparma
NL06	Ch	N	Flavocetraria
NL07	Ch	N	Parmelia
NL08	Ch	N	Umbilicaria
NL09	Tr	N	Stereocaulon
NL10	Ch	N	Cetraria

HodkinsonClon16S_OTUS_1per_171.csv (referenced in the above script; full file available at http://dx.doi.org/10.5061/dryad.t99b1)

CL01	0.00	0.00	0.00	0.00	0.00	0.00	...
CL02	0.00	0.00	0.00	0.00	0.00	0.00	...
CL03	0.00	0.25	0.00	0.00	0.00	0.00	...
CL04	0.11	0.00	0.00	0.00	0.00	0.00	...
CL05	0.00	0.03	0.34	0.05	0.00	0.00	...
CL06	0.00	0.10	0.02	0.03	0.00	0.02	...
CL07	0.00	0.56	0.00	0.00	0.00	0.19	...
CL08	0.00	0.03	0.00	0.00	0.00	0.00	...
CL09	0.02	0.13	0.06	0.15	0.00	0.00	...
CL10	0.00	0.00	0.00	0.00	0.00	0.00	...
EL01	0.00	0.00	0.00	0.00	0.00	0.17	...
EL01t	0.00	0.00	0.00	0.00	0.00	0.31	...
EL02	0.00	0.26	0.00	0.00	0.00	0.00	...
EL03	0.00	0.03	0.00	0.00	0.00	0.00	...
EL04	0.00	0.10	0.00	0.00	0.00	0.00	...
EL05	0.00	0.04	0.00	0.00	0.00	0.00	...
EL06	0.00	0.07	0.00	0.00	0.00	0.00	...
EL07	0.00	0.00	0.00	0.00	0.00	0.00	...
EL08	0.00	0.34	0.00	0.00	0.00	0.00	...
EL09	0.00	0.21	0.00	0.00	0.00	0.00	...
EL10	0.00	0.21	0.00	0.00	0.04	0.00	...
HL01	0.00	0.00	0.00	0.00	0.00	0.00	...
HL02	0.13	0.00	0.00	0.00	0.00	0.00	...
HL03	0.00	0.00	0.00	0.00	0.00	0.00	...
HL04	0.06	0.00	0.00	0.00	0.00	0.00	...
HL05	0.00	0.03	0.00	0.00	0.00	0.00	...
HL06	0.00	0.00	0.00	0.06	0.00	0.00	...
HL07	0.00	0.14	0.00	0.00	0.00	0.00	...
HL08	0.00	0.00	0.00	0.00	0.00	0.00	...
HL09	0.00	0.00	0.00	0.00	0.00	0.00	...
HL10	0.00	0.02	0.00	0.00	0.00	0.00	...
NL01	0.00	0.15	0.00	0.00	0.00	0.00	...
NL01t	0.00	0.00	0.00	0.00	0.00	0.15	...
NL03	0.00	0.00	0.00	0.00	0.00	0.00	...
NL04	0.00	0.15	0.00	0.00	0.00	0.00	...
NL05	0.00	0.17	0.00	0.00	0.00	0.00	...
NL06	0.00	0.02	0.00	0.00	0.00	0.00	...
NL07	0.00	0.33	0.00	0.00	0.00	0.00	...
NL08	0.00	0.80	0.00	0.00	0.00	0.00	...
NL09	0.00	0.00	0.00	0.00	0.00	0.00	...
NL10	0.00	0.05	0.00	0.00	0.00	0.00	...

Here are some examples of non-metric multidimensional scaling plots produced from OTU-based Bray-Curtis dissimilarities between lichen-associated bacterial communities (Hodkinson et al. 2012). Photobiont is indicated by color (light green = green algae; dark blue = Cyanobacteria; black = Tripartite), while the site is indicated with symbols (• = Eagle Summit, AK; + = Nome, AK; x = Highlands, NC, and * = Cerro de la Muerte, CR). Numbers indicate the identity of different mycobiont types (see associated publication for details). Continuous lines act as visual aids to delimit communities associated with the two major photobiont-types, whereas dashed lines delimit communities associated with chlorolichens from northern versus southern sites.

Plot A shows results obtained from clone library data of 16S sequences from the order Rhizobiales.

Plot B was produced from 454 barcoded 16S amplicon data (representing approximately half the number of samples as the clone library set but ~100 times as many sequences from a much wider range of bacterial diversity; since this plot was produced through OTU-based methods it does not precisely correlate with what would be produced through the abbreviated script above, which uses UniFrac distances to generate NMDS plots based on 454 data).

For additional information and documentation, including results of ANOSIM analyses, please see the references below!

- Brendan

---------------------------------------

References

Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]
Download publication (PDF file)
Download supplementary phylogeny (PDF file)
View data and analysis file webportal (website)
Download data and analysis file archive (ZIP file)

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository. doi:10.5061/dryad.t99b1

R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.r-project.org/

Squamules

Friday, December 16, 2011

Using R for community analyses

No comments:

Post a Comment

Search This Blog

Blog Archive

About Me