Project FOMC0000 services include NGS sequencing of the V3-V4 region of the 16S rRNA gene amplicons from the samples. First and foremost, please
download this report, as well as the raw sequence data, from the download links provided below.
These links will expire after 60 days. We cannot guarantee the availability of your data after 60 days.
Full bioinformatics analysis service was requested. We provide many analyses, starting with raw sequence quality and noise filtering, paired-read merging, and chimera filtering of the sequences, using the
DADA2 denoising algorithm and pipeline.
We also provide many downstream analyses such as taxonomy assignment, alpha and beta diversity analyses, and differential abundance analysis.
For taxonomy assignment, the most informative output is the taxonomy bar plots. We provide interactive bar plots that show the relative abundance of microbes at the taxonomy level of your choice (from phylum to species).
If you specify which groups of samples you want to compare for differential abundance, we provide both ANCOM and LEfSe differential abundance analysis.
The samples were processed and analyzed with the ZymoBIOMICS® Service: Targeted
Metagenomic Sequencing (Zymo Research, Irvine, CA).
DNA Extraction: If DNA extraction was performed, one of three different DNA
extraction kits was used, depending on the sample type and sample volume, and was
used according to the manufacturer’s instructions unless otherwise stated. The kit used
in this project is marked below:
☐ ZymoBIOMICS® DNA Miniprep Kit (Zymo Research, Irvine, CA)
☐ ZymoBIOMICS® DNA Microprep Kit (Zymo Research, Irvine, CA)
☐ ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA)
☑ N/A (DNA Extraction Not Performed)
Elution Volume: 50µL
Additional Notes: NA
Targeted Library Preparation: The DNA samples were prepared for targeted
sequencing with the Quick-16S™ NGS Library Prep Kit (Zymo Research, Irvine, CA).
These primers were custom designed by Zymo Research to provide the best coverage
of the 16S gene while maintaining high sensitivity. The primer sets used in this project
are marked below:
☐ Quick-16S™ Primer Set V1-V2 (Zymo Research, Irvine, CA)
☐ Quick-16S™ Primer Set V1-V3 (Zymo Research, Irvine, CA)
☑ Quick-16S™ Primer Set V3-V4 (Zymo Research, Irvine, CA)
☐ Quick-16S™ Primer Set V4 (Zymo Research, Irvine, CA)
☐ Quick-16S™ Primer Set V6-V8 (Zymo Research, Irvine, CA)
☐ Other: NA
Additional Notes: NA
The sequencing library was prepared using an innovative library preparation process in
which PCR reactions were performed in real-time PCR machines to control cycles and
therefore limit PCR chimera formation. The final PCR products were quantified with
qPCR fluorescence readings and pooled together based on equal molarity. The final
pooled library was cleaned up with the Select-a-Size DNA Clean & Concentrator™
(Zymo Research, Irvine, CA), then quantified with TapeStation® (Agilent Technologies,
Santa Clara, CA) and Qubit® (Thermo Fisher Scientific, Waltham, WA).
Control Samples: The ZymoBIOMICS® Microbial Community Standard (Zymo
Research, Irvine, CA) was used as a positive control for each DNA extraction, if
performed. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research,
Irvine, CA) was used as a positive control for each targeted library preparation.
Negative controls (i.e. blank extraction control, blank library preparation control) were
included to assess the level of bioburden carried by the wet-lab process.
Sequencing: The final library was sequenced on Illumina® MiSeq™ with a V3 reagent kit
(600 cycles). The sequencing was performed with 10% PhiX spike-in.
The complete report of your project, including all links in this report, can be downloaded by clicking the link provided below. The downloaded file is a compressed ZIP file. Once it is unzipped, open the file “REPORT.html” (it may be shown only as "REPORT" on your computer) by double-clicking it. Your default web browser will open it and display the exact content of this report.
Please download and save the file to your computer or another storage device. The download link will expire 60 days after you receive this report.
Complete report download link:
To view the report, please follow these steps:
1. Download the .zip file from the report link above.
2. Extract all the contents of the downloaded .zip file to your desktop.
3. Open the extracted folder and find the file "REPORT.html" (it may be shown only as "REPORT").
4. Open the REPORT.html file by double-clicking it. Your default browser will open the top page of the complete report. Within the report, there are links to view all the analyses performed for the project.
The raw NGS sequence data is available for download from the link provided below. The data is provided as a compressed ZIP file that can be unzipped into individual sequence files.
Since this is paired-end sequencing, each of your samples is represented by two sequence files: one for READ 1,
with the file extension “*_R1.fastq.gz”, and another for READ 2, with the file extension “*_R2.fastq.gz”.
The files are in FASTQ format and are compressed. FASTQ format is a text-based data format for storing both a biological sequence
and its corresponding quality scores. Most sequence analysis software will be able to open them.
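As an illustration of the file layout described above, the following Python sketch reads a gzip-compressed FASTQ file record by record and derives the READ 2 file name from a READ 1 file name. The function names are hypothetical, and the quality decoding assumes the Phred+33 encoding commonly used for current Illumina output; this is not part of the analysis pipeline used for this report.

```python
import gzip
from typing import Iterator, List, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def mate_path(r1_path: str) -> str:
    """Derive the READ 2 file name from a READ 1 file name."""
    return r1_path.replace("_R1.fastq.gz", "_R2.fastq.gz")
```

For example, `mate_path("zr0000_100V3V4_R1.fastq.gz")` gives the matching R2 file name listed in the table below, and `read_fastq` can be used to iterate over both files in parallel because paired files store their reads in the same order.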
The Sample IDs associated with the R1 and R2 fastq files are listed in the table below:
Sample ID | Original Sample ID | Read 1 File Name | Read 2 File Name
FOMC0000.S001 | 100_2014 | zr0000_100V3V4_R1.fastq.gz | zr0000_100V3V4_R2.fastq.gz
FOMC0000.S002 | 101_2014 | zr0000_101V3V4_R1.fastq.gz | zr0000_101V3V4_R2.fastq.gz
FOMC0000.S003 | 102_2014 | zr0000_102V3V4_R1.fastq.gz | zr0000_102V3V4_R2.fastq.gz
FOMC0000.S004 | 103_2014 | zr0000_103V3V4_R1.fastq.gz | zr0000_103V3V4_R2.fastq.gz
FOMC0000.S005 | 104_2014 | zr0000_104V3V4_R1.fastq.gz | zr0000_104V3V4_R2.fastq.gz
FOMC0000.S006 | 107_2014 | zr0000_105V3V4_R1.fastq.gz | zr0000_105V3V4_R2.fastq.gz
FOMC0000.S007 | 108_2014 | zr0000_106V3V4_R1.fastq.gz | zr0000_106V3V4_R2.fastq.gz
FOMC0000.S008 | 110_2014 | zr0000_107V3V4_R1.fastq.gz | zr0000_107V3V4_R2.fastq.gz
FOMC0000.S009 | 112_2014 | zr0000_108V3V4_R1.fastq.gz | zr0000_108V3V4_R2.fastq.gz
FOMC0000.S010 | 113_2014 | zr0000_109V3V4_R1.fastq.gz | zr0000_109V3V4_R2.fastq.gz
FOMC0000.S011 | 114_2014 | zr0000_10V3V4_R1.fastq.gz | zr0000_10V3V4_R2.fastq.gz
FOMC0000.S012 | 115_2014 | zr0000_110V3V4_R1.fastq.gz | zr0000_110V3V4_R2.fastq.gz
FOMC0000.S013 | 119_2014 | zr0000_111V3V4_R1.fastq.gz | zr0000_111V3V4_R2.fastq.gz
FOMC0000.S014 | 1_2014 | zr0000_112V3V4_R1.fastq.gz | zr0000_112V3V4_R2.fastq.gz
FOMC0000.S015 | 120_2014 | zr0000_113V3V4_R1.fastq.gz | zr0000_113V3V4_R2.fastq.gz
FOMC0000.S016 | 121_2014 | zr0000_114V3V4_R1.fastq.gz | zr0000_114V3V4_R2.fastq.gz
FOMC0000.S017 | 122_2014 | zr0000_115V3V4_R1.fastq.gz | zr0000_115V3V4_R2.fastq.gz
FOMC0000.S018 | 123_2014 | zr0000_116V3V4_R1.fastq.gz | zr0000_116V3V4_R2.fastq.gz
FOMC0000.S019 | 130_2014 | zr0000_117V3V4_R1.fastq.gz | zr0000_117V3V4_R2.fastq.gz
FOMC0000.S020 | 13_2014 | zr0000_118V3V4_R1.fastq.gz | zr0000_118V3V4_R2.fastq.gz
FOMC0000.S021 | 132_2014 | zr0000_119V3V4_R1.fastq.gz | zr0000_119V3V4_R2.fastq.gz
FOMC0000.S022 | 133_2014 | zr0000_11V3V4_R1.fastq.gz | zr0000_11V3V4_R2.fastq.gz
FOMC0000.S023 | 134_2014 | zr0000_120V3V4_R1.fastq.gz | zr0000_120V3V4_R2.fastq.gz
FOMC0000.S024 | 136_2014 | zr0000_121V3V4_R1.fastq.gz | zr0000_121V3V4_R2.fastq.gz
FOMC0000.S025 | 139_2014 | zr0000_122V3V4_R1.fastq.gz | zr0000_122V3V4_R2.fastq.gz
FOMC0000.S026 | 140_2014 | zr0000_123V3V4_R1.fastq.gz | zr0000_123V3V4_R2.fastq.gz
FOMC0000.S027 | 14_2014 | zr0000_124V3V4_R1.fastq.gz | zr0000_124V3V4_R2.fastq.gz
FOMC0000.S028 | 142_2014 | zr0000_125V3V4_R1.fastq.gz | zr0000_125V3V4_R2.fastq.gz
FOMC0000.S029 | 146_2014 | zr0000_126V3V4_R1.fastq.gz | zr0000_126V3V4_R2.fastq.gz
FOMC0000.S030 | 148_2014 | zr0000_127V3V4_R1.fastq.gz | zr0000_127V3V4_R2.fastq.gz
FOMC0000.S031 | 149_2014 | zr0000_128V3V4_R1.fastq.gz | zr0000_128V3V4_R2.fastq.gz
FOMC0000.S032 | 150_2014 | zr0000_129V3V4_R1.fastq.gz | zr0000_129V3V4_R2.fastq.gz
FOMC0000.S033 | 159_2014 | zr0000_12V3V4_R1.fastq.gz | zr0000_12V3V4_R2.fastq.gz
FOMC0000.S034 | 161_2014 | zr0000_130V3V4_R1.fastq.gz | zr0000_130V3V4_R2.fastq.gz
FOMC0000.S035 | 162_2014 | zr0000_131V3V4_R1.fastq.gz | zr0000_131V3V4_R2.fastq.gz
FOMC0000.S036 | 164_2014 | zr0000_132V3V4_R1.fastq.gz | zr0000_132V3V4_R2.fastq.gz
FOMC0000.S037 | 168_2014 | zr0000_133V3V4_R1.fastq.gz | zr0000_133V3V4_R2.fastq.gz
FOMC0000.S038 | 170_2014 | zr0000_134V3V4_R1.fastq.gz | zr0000_134V3V4_R2.fastq.gz
FOMC0000.S039 | 171_2014 | zr0000_135V3V4_R1.fastq.gz | zr0000_135V3V4_R2.fastq.gz
FOMC0000.S040 | 176_2014 | zr0000_136V3V4_R1.fastq.gz | zr0000_136V3V4_R2.fastq.gz
FOMC0000.S041 | 177_2014 | zr0000_137V3V4_R1.fastq.gz | zr0000_137V3V4_R2.fastq.gz
FOMC0000.S042 | 180_2014 | zr0000_138V3V4_R1.fastq.gz | zr0000_138V3V4_R2.fastq.gz
FOMC0000.S043 | 187_2014 | zr0000_139V3V4_R1.fastq.gz | zr0000_139V3V4_R2.fastq.gz
FOMC0000.S044 | 188_2014 | zr0000_13V3V4_R1.fastq.gz | zr0000_13V3V4_R2.fastq.gz
FOMC0000.S045 | 189_2014 | zr0000_140V3V4_R1.fastq.gz | zr0000_140V3V4_R2.fastq.gz
FOMC0000.S046 | 191_2014 | zr0000_141V3V4_R1.fastq.gz | zr0000_141V3V4_R2.fastq.gz
FOMC0000.S047 | 193_2014 | zr0000_142V3V4_R1.fastq.gz | zr0000_142V3V4_R2.fastq.gz
FOMC0000.S048 | 196_2014 | zr0000_143V3V4_R1.fastq.gz | zr0000_143V3V4_R2.fastq.gz
FOMC0000.S049 | 198_2014 | zr0000_144V3V4_R1.fastq.gz | zr0000_144V3V4_R2.fastq.gz
FOMC0000.S050 | 200_2014 | zr0000_145V3V4_R1.fastq.gz | zr0000_145V3V4_R2.fastq.gz
FOMC0000.S051 | 201_2014 | zr0000_146V3V4_R1.fastq.gz | zr0000_146V3V4_R2.fastq.gz
FOMC0000.S052 | 204_2014 | zr0000_147V3V4_R1.fastq.gz | zr0000_147V3V4_R2.fastq.gz
FOMC0000.S053 | 205_2014 | zr0000_148V3V4_R1.fastq.gz | zr0000_148V3V4_R2.fastq.gz
FOMC0000.S054 | 207_2014 | zr0000_149V3V4_R1.fastq.gz | zr0000_149V3V4_R2.fastq.gz
FOMC0000.S055 | 208_2014 | zr0000_14V3V4_R1.fastq.gz | zr0000_14V3V4_R2.fastq.gz
FOMC0000.S056 | 209_2014 | zr0000_150V3V4_R1.fastq.gz | zr0000_150V3V4_R2.fastq.gz
FOMC0000.S057 | 21_2014 | zr0000_151V3V4_R1.fastq.gz | zr0000_151V3V4_R2.fastq.gz
FOMC0000.S058 | 214_2014 | zr0000_152V3V4_R1.fastq.gz | zr0000_152V3V4_R2.fastq.gz
FOMC0000.S059 | 215_2014 | zr0000_153V3V4_R1.fastq.gz | zr0000_153V3V4_R2.fastq.gz
FOMC0000.S060 | 216_2014 | zr0000_15V3V4_R1.fastq.gz | zr0000_15V3V4_R2.fastq.gz
FOMC0000.S061 | 217_2014 | zr0000_16V3V4_R1.fastq.gz | zr0000_16V3V4_R2.fastq.gz
FOMC0000.S062 | 218_2014 | zr0000_17V3V4_R1.fastq.gz | zr0000_17V3V4_R2.fastq.gz
FOMC0000.S063 | 219_2014 | zr0000_18V3V4_R1.fastq.gz | zr0000_18V3V4_R2.fastq.gz
FOMC0000.S064 | 220_2014 | zr0000_19V3V4_R1.fastq.gz | zr0000_19V3V4_R2.fastq.gz
FOMC0000.S065 | 221_2014 | zr0000_1V3V4_R1.fastq.gz | zr0000_1V3V4_R2.fastq.gz
FOMC0000.S066 | 225_2014 | zr0000_20V3V4_R1.fastq.gz | zr0000_20V3V4_R2.fastq.gz
FOMC0000.S067 | 234_2014 | zr0000_21V3V4_R1.fastq.gz | zr0000_21V3V4_R2.fastq.gz
FOMC0000.S068 | 240_2014 | zr0000_22V3V4_R1.fastq.gz | zr0000_22V3V4_R2.fastq.gz
FOMC0000.S069 | 243_2014 | zr0000_23V3V4_R1.fastq.gz | zr0000_23V3V4_R2.fastq.gz
FOMC0000.S070 | 246_2014 | zr0000_24V3V4_R1.fastq.gz | zr0000_24V3V4_R2.fastq.gz
FOMC0000.S071 | 251_2014 | zr0000_25V3V4_R1.fastq.gz | zr0000_25V3V4_R2.fastq.gz
FOMC0000.S072 | 259_2014 | zr0000_26V3V4_R1.fastq.gz | zr0000_26V3V4_R2.fastq.gz
FOMC0000.S073 | 263_2014 | zr0000_27V3V4_R1.fastq.gz | zr0000_27V3V4_R2.fastq.gz
FOMC0000.S074 | 276_2014 | zr0000_28V3V4_R1.fastq.gz | zr0000_28V3V4_R2.fastq.gz
FOMC0000.S075 | 287_2014 | zr0000_29V3V4_R1.fastq.gz | zr0000_29V3V4_R2.fastq.gz
FOMC0000.S076 | 288_2014 | zr0000_2V3V4_R1.fastq.gz | zr0000_2V3V4_R2.fastq.gz
FOMC0000.S077 | 293_2014 | zr0000_30V3V4_R1.fastq.gz | zr0000_30V3V4_R2.fastq.gz
FOMC0000.S078 | 295_2014 | zr0000_31V3V4_R1.fastq.gz | zr0000_31V3V4_R2.fastq.gz
FOMC0000.S079 | 3_2014 | zr0000_32V3V4_R1.fastq.gz | zr0000_32V3V4_R2.fastq.gz
FOMC0000.S080 | 72_2014 | zr0000_33V3V4_R1.fastq.gz | zr0000_33V3V4_R2.fastq.gz
FOMC0000.S081 | 75_2014 | zr0000_34V3V4_R1.fastq.gz | zr0000_34V3V4_R2.fastq.gz
FOMC0000.S082 | 76_2014 | zr0000_35V3V4_R1.fastq.gz | zr0000_35V3V4_R2.fastq.gz
FOMC0000.S083 | 77_2014 | zr0000_36V3V4_R1.fastq.gz | zr0000_36V3V4_R2.fastq.gz
FOMC0000.S084 | 78_2014 | zr0000_37V3V4_R1.fastq.gz | zr0000_37V3V4_R2.fastq.gz
FOMC0000.S085 | 80_2014 | zr0000_38V3V4_R1.fastq.gz | zr0000_38V3V4_R2.fastq.gz
FOMC0000.S086 | 87_2014 | zr0000_39V3V4_R1.fastq.gz | zr0000_39V3V4_R2.fastq.gz
FOMC0000.S087 | 88_2014 | zr0000_3V3V4_R1.fastq.gz | zr0000_3V3V4_R2.fastq.gz
FOMC0000.S088 | 92_2014 | zr0000_40V3V4_R1.fastq.gz | zr0000_40V3V4_R2.fastq.gz
FOMC0000.S089 | 93_2014 | zr0000_41V3V4_R1.fastq.gz | zr0000_41V3V4_R2.fastq.gz
FOMC0000.S090 | 94_2014 | zr0000_42V3V4_R1.fastq.gz | zr0000_42V3V4_R2.fastq.gz
FOMC0000.S091 | 95_2014 | zr0000_43V3V4_R1.fastq.gz | zr0000_43V3V4_R2.fastq.gz
FOMC0000.S092 | 97_2014 | zr0000_44V3V4_R1.fastq.gz | zr0000_44V3V4_R2.fastq.gz
FOMC0000.S093 | 98_2014 | zr0000_45V3V4_R1.fastq.gz | zr0000_45V3V4_R2.fastq.gz
FOMC0000.S094 | 99_2014 | zr0000_46V3V4_R1.fastq.gz | zr0000_46V3V4_R2.fastq.gz
FOMC0000.S095 | 105_2016 | zr0000_47V3V4_R1.fastq.gz | zr0000_47V3V4_R2.fastq.gz
FOMC0000.S096 | 135_2016 | zr0000_48V3V4_R1.fastq.gz | zr0000_48V3V4_R2.fastq.gz
FOMC0000.S097 | 138_2016 | zr0000_49V3V4_R1.fastq.gz | zr0000_49V3V4_R2.fastq.gz
FOMC0000.S098 | 141_2016 | zr0000_4V3V4_R1.fastq.gz | zr0000_4V3V4_R2.fastq.gz
FOMC0000.S099 | 143_2016 | zr0000_50V3V4_R1.fastq.gz | zr0000_50V3V4_R2.fastq.gz
FOMC0000.S100 | 144_2016 | zr0000_51V3V4_R1.fastq.gz | zr0000_51V3V4_R2.fastq.gz
FOMC0000.S101 | 147_2016 | zr0000_52V3V4_R1.fastq.gz | zr0000_52V3V4_R2.fastq.gz
FOMC0000.S102 | 153_2016 | zr0000_53V3V4_R1.fastq.gz | zr0000_53V3V4_R2.fastq.gz
FOMC0000.S103 | 154_2016 | zr0000_54V3V4_R1.fastq.gz | zr0000_54V3V4_R2.fastq.gz
FOMC0000.S104 | 155_2016 | zr0000_55V3V4_R1.fastq.gz | zr0000_55V3V4_R2.fastq.gz
FOMC0000.S105 | 156_2016 | zr0000_56V3V4_R1.fastq.gz | zr0000_56V3V4_R2.fastq.gz
FOMC0000.S106 | 157_2016 | zr0000_57V3V4_R1.fastq.gz | zr0000_57V3V4_R2.fastq.gz
FOMC0000.S107 | 158_2016 | zr0000_58V3V4_R1.fastq.gz | zr0000_58V3V4_R2.fastq.gz
FOMC0000.S108 | 163_2016 | zr0000_59V3V4_R1.fastq.gz | zr0000_59V3V4_R2.fastq.gz
FOMC0000.S109 | 167_2016 | zr0000_5V3V4_R1.fastq.gz | zr0000_5V3V4_R2.fastq.gz
FOMC0000.S110 | 169_2016 | zr0000_60V3V4_R1.fastq.gz | zr0000_60V3V4_R2.fastq.gz
FOMC0000.S111 | 173_2016 | zr0000_61V3V4_R1.fastq.gz | zr0000_61V3V4_R2.fastq.gz
FOMC0000.S112 | 175_2016 | zr0000_62V3V4_R1.fastq.gz | zr0000_62V3V4_R2.fastq.gz
FOMC0000.S113 | 178_2016 | zr0000_63V3V4_R1.fastq.gz | zr0000_63V3V4_R2.fastq.gz
FOMC0000.S114 | 179_2016 | zr0000_64V3V4_R1.fastq.gz | zr0000_64V3V4_R2.fastq.gz
FOMC0000.S115 | 181_2016 | zr0000_65V3V4_R1.fastq.gz | zr0000_65V3V4_R2.fastq.gz
FOMC0000.S116 | 182_2016 | zr0000_66V3V4_R1.fastq.gz | zr0000_66V3V4_R2.fastq.gz
FOMC0000.S117 | 183_2016 | zr0000_67V3V4_R1.fastq.gz | zr0000_67V3V4_R2.fastq.gz
FOMC0000.S118 | 184_2016 | zr0000_68V3V4_R1.fastq.gz | zr0000_68V3V4_R2.fastq.gz
FOMC0000.S119 | 185_2016 | zr0000_69V3V4_R1.fastq.gz | zr0000_69V3V4_R2.fastq.gz
FOMC0000.S120 | 192_2016 | zr0000_6V3V4_R1.fastq.gz | zr0000_6V3V4_R2.fastq.gz
FOMC0000.S121 | 194_2016 | zr0000_70V3V4_R1.fastq.gz | zr0000_70V3V4_R2.fastq.gz
FOMC0000.S122 | 195_2016 | zr0000_71V3V4_R1.fastq.gz | zr0000_71V3V4_R2.fastq.gz
FOMC0000.S123 | 197_2016 | zr0000_72V3V4_R1.fastq.gz | zr0000_72V3V4_R2.fastq.gz
FOMC0000.S124 | 199_2016 | zr0000_73V3V4_R1.fastq.gz | zr0000_73V3V4_R2.fastq.gz
FOMC0000.S125 | 200_2016 | zr0000_74V3V4_R1.fastq.gz | zr0000_74V3V4_R2.fastq.gz
FOMC0000.S126 | 206_2016 | zr0000_75V3V4_R1.fastq.gz | zr0000_75V3V4_R2.fastq.gz
FOMC0000.S127 | 210_2016 | zr0000_76V3V4_R1.fastq.gz | zr0000_76V3V4_R2.fastq.gz
FOMC0000.S128 | 211_2016 | zr0000_77V3V4_R1.fastq.gz | zr0000_77V3V4_R2.fastq.gz
FOMC0000.S129 | 212_2016 | zr0000_78V3V4_R1.fastq.gz | zr0000_78V3V4_R2.fastq.gz
FOMC0000.S130 | 213_2016 | zr0000_79V3V4_R1.fastq.gz | zr0000_79V3V4_R2.fastq.gz
FOMC0000.S131 | 222_2016 | zr0000_7V3V4_R1.fastq.gz | zr0000_7V3V4_R2.fastq.gz
FOMC0000.S132 | 227_2016 | zr0000_80V3V4_R1.fastq.gz | zr0000_80V3V4_R2.fastq.gz
FOMC0000.S133 | 228_2016 | zr0000_81V3V4_R1.fastq.gz | zr0000_81V3V4_R2.fastq.gz
FOMC0000.S134 | 230_2016 | zr0000_82V3V4_R1.fastq.gz | zr0000_82V3V4_R2.fastq.gz
FOMC0000.S135 | 232_2016 | zr0000_83V3V4_R1.fastq.gz | zr0000_83V3V4_R2.fastq.gz
FOMC0000.S136 | 235_2016 | zr0000_84V3V4_R1.fastq.gz | zr0000_84V3V4_R2.fastq.gz
FOMC0000.S137 | 236_2016 | zr0000_85V3V4_R1.fastq.gz | zr0000_85V3V4_R2.fastq.gz
FOMC0000.S138 | 238_2016 | zr0000_86V3V4_R1.fastq.gz | zr0000_86V3V4_R2.fastq.gz
FOMC0000.S139 | 261_2016 | zr0000_87V3V4_R1.fastq.gz | zr0000_87V3V4_R2.fastq.gz
FOMC0000.S140 | 271_2016 | zr0000_88V3V4_R1.fastq.gz | zr0000_88V3V4_R2.fastq.gz
FOMC0000.S141 | 277_2016 | zr0000_89V3V4_R1.fastq.gz | zr0000_89V3V4_R2.fastq.gz
FOMC0000.S142 | 284_2016 | zr0000_8V3V4_R1.fastq.gz | zr0000_8V3V4_R2.fastq.gz
FOMC0000.S143 | 285_2016 | zr0000_90V3V4_R1.fastq.gz | zr0000_90V3V4_R2.fastq.gz
FOMC0000.S144 | 41_2016 | zr0000_91V3V4_R1.fastq.gz | zr0000_91V3V4_R2.fastq.gz
FOMC0000.S145 | 73_2016 | zr0000_92V3V4_R1.fastq.gz | zr0000_92V3V4_R2.fastq.gz
FOMC0000.S146 | 74_2016 | zr0000_93V3V4_R1.fastq.gz | zr0000_93V3V4_R2.fastq.gz
FOMC0000.S147 | 79_2016 | zr0000_94V3V4_R1.fastq.gz | zr0000_94V3V4_R2.fastq.gz
FOMC0000.S148 | 81_2016 | zr0000_95V3V4_R1.fastq.gz | zr0000_95V3V4_R2.fastq.gz
FOMC0000.S149 | 82_2016 | zr0000_96V3V4_R1.fastq.gz | zr0000_96V3V4_R2.fastq.gz
FOMC0000.S150 | 83_2016 | zr0000_97V3V4_R1.fastq.gz | zr0000_97V3V4_R2.fastq.gz
FOMC0000.S151 | 84_2016 | zr0000_98V3V4_R1.fastq.gz | zr0000_98V3V4_R2.fastq.gz
FOMC0000.S152 | 90_2016 | zr0000_99V3V4_R1.fastq.gz | zr0000_99V3V4_R2.fastq.gz
FOMC0000.S153 | 91_2016 | zr0000_9V3V4_R1.fastq.gz | zr0000_9V3V4_R2.fastq.gz
Please download and save the file to your computer or another storage device. The download link will expire 60 days after you receive this report.
DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors.
DADA2 infers sample sequences exactly, without coarse-graining into OTUs,
and resolves differences of as little as one nucleotide. DADA2 identified more real variants
and output fewer spurious sequences than other methods.
DADA2’s advantage is that it uses more of the data. The DADA2 error model incorporates quality information,
which is ignored by all other methods after filtering. The DADA2 error model incorporates quantitative abundances,
whereas most other methods use abundance ranks if they use abundance at all.
The DADA2 error model identifies the specific differences between sequences, e.g., A->C,
whereas other methods merely count the mismatches. DADA2 can parameterize its error model from the data itself,
rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.
The DADA2 pipeline includes several tools for read quality control, including quality filtering, trimming, denoising, pair merging and chimera filtering. Below are the major processing steps of DADA2:
Step 1. Read trimming based on sequence quality
The quality of Illumina NGS sequences often decreases toward the end of the reads.
DADA2 allows the poor-quality read ends to be trimmed off in order to improve error
model building and pair-merging performance.
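As a rough illustration of Step 1, the sketch below truncates a read and its quality string to a fixed length and discards reads shorter than that length, which mirrors how the DADA2 documentation describes truncation. The function name `truncate_read` and the example read are hypothetical; this is plain Python, not the DADA2 R code used for this project.

```python
from typing import Optional, Tuple


def truncate_read(sequence: str, quality: str, trunc_len: int) -> Optional[Tuple[str, str]]:
    """Cut a read to trunc_len bases; return None if the read is too short."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if len(sequence) < trunc_len:
        return None  # too short to keep after truncation
    return sequence[:trunc_len], quality[:trunc_len]


# Hypothetical 300-base forward read, truncated to 210 bases.
read = "ACGT" * 75
qual = "I" * 300
trimmed = truncate_read(read, qual, 210)
```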
Step 2. Learn the Error Rates
The DADA2 algorithm makes use of a parametric error model (err) and every
amplicon dataset has a different set of error rates. The learnErrors method
learns this error model from the data, by alternating estimation of the error
rates and inference of sample composition until they converge on a jointly
consistent solution. As in many machine-learning problems, the algorithm must
begin with an initial guess, for which the maximum possible error rates in
this data are used (the error rates if only the most abundant sequence is
correct and all the rest are errors).
Step 3. Infer amplicon sequence variants (ASVs) based on the error model built in the previous step. This step is also called sequence "denoising".
The outcome of this step is a list of ASVs that are the equivalent of oligonucleotides.
Step 4. Merge paired reads. If the sequencing products are read pairs, DADA2 will merge the R1 and R2 ASVs into single sequences.
Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding
denoised reverse reads, and then constructing the merged “contig” sequences.
By default, merged sequences are only output if the forward and reverse reads overlap by
at least 12 bases, and are identical to each other in the overlap region (but these conditions can be changed via function arguments).
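The merging rule in Step 4 can be sketched in a few lines of Python. This is a simplified illustration, not DADA2's implementation: it looks for an exact suffix/prefix match between the forward read and the reverse-complemented reverse read instead of performing an alignment, and it ignores cases where one read extends past the start of the other. The function names are hypothetical; the 12-base minimum overlap is the default stated above.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 12) -> Optional[str]:
    """Merge a read pair if the overlap is long enough and identical.

    Returns the merged contig, or None when no exact overlap of at least
    min_overlap bases exists.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    # Try the longest possible overlap first, then shrink it.
    for overlap in range(longest, min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


# Hypothetical 40-base amplicon sequenced as two 28-base reads (16-base overlap).
amplicon = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATCCTGAATGC"
fwd = amplicon[:28]
rev = reverse_complement(amplicon[12:])
merged = merge_pair(fwd, rev)
```

In this toy case `merged` equals `amplicon`; a pair whose overlap contains a mismatch, or is shorter than 12 bases, returns `None` and would be dropped.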
Step 5. Remove chimeras.
The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants
after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs.
Chimeric sequences are identified if they can be exactly reconstructed by
combining a left-segment and a right-segment from two more abundant “parent” sequences. The frequency of chimeric sequences varies substantially
from dataset to dataset, and depends on factors including experimental procedures and sample complexity.
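The chimera rule in Step 5 can be illustrated with a small Python sketch. It flags a sequence when it can be rebuilt exactly from a left segment of one more abundant sequence and a right segment of another. This is a simplified illustration with hypothetical function names and toy data; it is not DADA2's implementation, which handles alignment and abundance thresholds in more detail.

```python
from typing import Dict, List


def _shared_prefix(a: str, b: str) -> int:
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_bimera(candidate: str, parents: List[str]) -> bool:
    """True if candidate = left part of one parent + right part of another."""
    n = len(candidate)
    left = [_shared_prefix(candidate, p) for p in parents]
    right = [_shared_prefix(candidate[::-1], p[::-1]) for p in parents]
    for i in range(len(parents)):
        for j in range(len(parents)):
            if i == j:
                continue
            # Both segments must be non-empty and together cover the candidate.
            if 0 < left[i] < n and 0 < right[j] < n and left[i] + right[j] >= n:
                return True
    return False


def find_bimeras(abundance: Dict[str, int]) -> List[str]:
    """Check each sequence against all strictly more abundant sequences."""
    flagged = []
    for seq, count in abundance.items():
        parents = [p for p, c in abundance.items() if c > count]
        if is_bimera(seq, parents):
            flagged.append(seq)
    return flagged


# Toy example: the third sequence joins the left half of the first
# with the right half of the second.
toy = {
    "AAAAAAAAAACCCCCCCCCC": 100,
    "GGGGGGGGGGTTTTTTTTTT": 80,
    "AAAAAAAAAATTTTTTTTTT": 5,
    "ACGTACGTACGTACGTACGT": 3,
}
chimeras = find_bimeras(toy)
```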
Results
1. Read Quality Plots
NGS sequence analysis starts with visualizing the quality of the sequencing. Below are the quality plots of the first
sample for the R1 and R2 reads separately. The gray-scale heat map shows the frequency of each quality score at each base position. The mean
quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines.
The forward reads are usually of better quality. It is common practice to trim the last few nucleotides to avoid the less well-controlled errors
that can arise there. The trimming affects the downstream steps, including error model building, merging and chimera calling. FOMC uses an empirical
approach that tests many combinations of trim lengths in order to obtain the best final set of amplicon sequence variants (ASVs); see the next
section, “Optimal trim length for ASVs”.
Below is the link to a PDF file for viewing the quality plots for all samples:
2. Optimal trim length for ASVs
The final number of merged and chimera-filtered ASVs depends on the quality filtering (and hence the trimming) at the very beginning of the DADA2 pipeline.
In order to retain the highest percentage of reads as ASVs, an empirical approach was used:
Create a random subset of each sample consisting of 5,000 R1 and 5,000 R2 reads (to reduce computation time).
Trim 10 bases at a time from the ends of both R1 and R2, up to 50 bases.
For each combination of trimmed lengths (e.g., 300x300, 300x290, 290x290, etc.), the trimmed reads are
run through the entire DADA2 pipeline to obtain chimera-filtered merged ASVs.
The combination with the highest percentage of input reads becoming final ASVs is selected for the complete data set.
Below is the result of this procedure, showing ASV percentages of total reads for all trimming combinations (first column = R1 lengths in bases; first row = R2 lengths in bases):
R1/R2 | 251 | 241 | 231 | 221 | 211 | 201
250 | 56.07% | 66.11% | 70.19% | 70.57% | 70.67% | 71.21%
240 | 56.37% | 66.65% | 70.80% | 71.16% | 71.27% | 71.77%
230 | 56.81% | 67.73% | 71.70% | 71.99% | 72.04% | 23.38%
220 | 57.45% | 68.46% | 72.44% | 72.87% | 23.33% | 21.44%
210 | 58.07% | 69.01% | 73.05% | 23.67% | 21.73% | 0.00%
Based on the above result, the trim length combination of R1 = 210 bases and R2 = 231 bases (73.05%, the highest value in the table above) was chosen for generating the final ASVs for all sequences.
This combination retained the highest percentage of reads as merged non-chimeric ASVs and was used for downstream analyses, if requested.
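The selection described above amounts to picking the largest cell in the table. A minimal Python sketch, with the percentages copied from the table and a hypothetical helper name `best_trim`:

```python
from typing import Dict, List, Tuple

# Columns are R2 lengths; each row is keyed by its R1 length.
R2_LENGTHS: List[int] = [251, 241, 231, 221, 211, 201]
ASV_PERCENT: Dict[int, List[float]] = {
    250: [56.07, 66.11, 70.19, 70.57, 70.67, 71.21],
    240: [56.37, 66.65, 70.80, 71.16, 71.27, 71.77],
    230: [56.81, 67.73, 71.70, 71.99, 72.04, 23.38],
    220: [57.45, 68.46, 72.44, 72.87, 23.33, 21.44],
    210: [58.07, 69.01, 73.05, 23.67, 21.73, 0.00],
}


def best_trim(table: Dict[int, List[float]], r2_lengths: List[int]) -> Tuple[int, int, float]:
    """Return (R1 length, R2 length, percentage) for the highest percentage."""
    best = (0, 0, -1.0)
    for r1, row in table.items():
        for r2, pct in zip(r2_lengths, row):
            if pct > best[2]:
                best = (r1, r2, pct)
    return best


print(best_trim(ASV_PERCENT, R2_LENGTHS))  # (210, 231, 73.05)
```

In this table the percentage drops sharply whenever the combined length R1 + R2 is 431 bases or less, presumably because many read pairs then no longer overlap enough to be merged.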
3. Error plots from learning the error rates
After DADA2 builds the error model for the data set, it is always worthwhile, as a sanity check if nothing else, to visualize the estimated error rates.
The error rates for each possible transition (A→C, A→G, …) are shown below. Points are the observed error rates for each consensus quality score.
The black line shows the estimated error rates after convergence of the machine-learning algorithm.
The red line shows the error rates expected under the nominal definition of the Q-score.
The ideal result would be the estimated error rates (black line) are a good fit to the observed rates (points), and the error rates drop
with increased quality as expected.
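The red line mentioned above follows the standard Phred definition, under which a quality score Q corresponds to an error probability of 10^(-Q/10). A minimal Python sketch of that conversion (the function name is illustrative only):

```python
def nominal_error_rate(q: float) -> float:
    """Error probability implied by a Phred quality score Q: 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Q10, Q20, Q30 and Q40 correspond to roughly 1 error in 10, 100, 1,000 and 10,000 bases.
rates = {q: nominal_error_rate(q) for q in (10, 20, 30, 40)}
```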
Forward Read R1 Error Plot
Reverse Read R2 Error Plot
PDF versions of these plots are available here:
4. DADA2 Result Summary
The table below summarizes the DADA2 analysis,
tracking the paired read counts of each sample through all steps of the DADA2 denoising process,
including end-trimming (filtered), denoising (denoisedF, denoisedR), pair merging (merged) and chimera removal (nonchim).
Sample ID | input | filtered | denoisedF | denoisedR | merged | nonchim
F0000.S001 | 137,933 | 123,904 | 120,426 | 115,429 | 101,484 | 70,408
F0000.S002 | 158,464 | 146,028 | 143,774 | 136,212 | 125,063 | 104,050
F0000.S003 | 116,599 | 106,933 | 104,842 | 102,127 | 91,810 | 72,258
F0000.S004 | 128,071 | 117,804 | 114,752 | 110,139 | 98,135 | 66,394
F0000.S005 | 52,072 | 48,385 | 47,355 | 45,559 | 41,546 | 35,240
F0000.S006 | 127,997 | 121,079 | 119,873 | 117,272 | 109,830 | 84,184
F0000.S007 | 145,498 | 132,292 | 128,567 | 123,869 | 112,559 | 92,456
F0000.S008 | 108,323 | 101,512 | 99,147 | 95,708 | 87,283 | 63,450
F0000.S009 | 98,346 | 92,093 | 90,731 | 88,861 | 81,277 | 57,807
F0000.S010 | 87,499 | 81,263 | 79,598 | 77,344 | 70,946 | 52,723
F0000.S011 | 131,709 | 126,756 | 123,847 | 122,241 | 115,798 | 89,714
F0000.S012 | 127,676 | 115,865 | 113,572 | 110,254 | 99,652 | 79,047
F0000.S013 | 112,314 | 101,306 | 98,656 | 95,008 | 85,085 | 60,901
F0000.S014 | 158,078 | 149,814 | 148,329 | 144,019 | 133,703 | 107,601
F0000.S015 | 124,537 | 113,734 | 110,323 | 104,861 | 93,515 | 73,443
F0000.S016 | 125,332 | 114,934 | 113,272 | 110,396 | 99,718 | 77,918
F0000.S017 | 139,907 | 130,279 | 128,000 | 124,654 | 113,721 | 82,860
F0000.S018 | 140,357 | 129,839 | 127,124 | 123,478 | 112,040 | 70,847
F0000.S019 | 107,190 | 100,228 | 98,374 | 95,642 | 86,799 | 64,254
F0000.S020 | 103,439 | 95,621 | 93,620 | 90,827 | 81,830 | 62,282
F0000.S021 | 148,030 | 135,275 | 131,136 | 127,238 | 113,091 | 87,073
F0000.S022 | 168,282 | 160,159 | 156,780 | 154,439 | 144,541 | 106,048
F0000.S023 | 105,451 | 99,719 | 97,934 | 94,229 | 84,587 | 60,646
F0000.S024 | 113,386 | 107,011 | 104,984 | 102,022 | 91,755 | 70,960
F0000.S025 | 64,332 | 60,291 | 59,214 | 56,724 | 50,288 | 38,982
F0000.S026 | 153,168 | 141,363 | 138,124 | 133,846 | 123,733 | 105,824
F0000.S027 | 58,341 | 54,079 | 52,812 | 50,924 | 46,392 | 35,953
F0000.S028 | 136,918 | 124,507 | 121,738 | 116,998 | 106,892 | 77,951
F0000.S029 | 138,524 | 130,676 | 129,110 | 125,049 | 118,905 | 105,715
F0000.S030 | 118,186 | 109,538 | 107,272 | 103,129 | 93,299 | 68,340
F0000.S031 | 110,379 | 101,512 | 99,294 | 96,603 | 87,936 | 63,461
F0000.S032 | 118,624 | 110,304 | 107,594 | 104,009 | 92,079 | 70,224
F0000.S033 | 139,383 | 135,343 | 133,040 | 131,349 | 121,548 | 92,219
F0000.S034 | 143,666 | 129,899 | 126,392 | 121,537 | 109,160 | 84,697
F0000.S035 | 136,998 | 122,102 | 118,014 | 113,870 | 100,472 | 67,985
F0000.S036 | 105,086 | 95,885 | 93,374 | 90,219 | 81,552 | 62,055
F0000.S037 | 144,982 | 133,760 | 130,000 | 126,086 | 115,624 | 88,996
F0000.S038 | 164,653 | 151,098 | 148,388 | 144,004 | 134,012 | 110,979
F0000.S039 | 125,276 | 116,197 | 114,298 | 111,676 | 102,068 | 80,159
F0000.S040 | 127,010 | 119,147 | 117,824 | 115,311 | 106,613 | 82,610
F0000.S041 | 129,689 | 118,368 | 115,017 | 111,121 | 99,721 | 74,379
F0000.S042 | 179,172 | 163,490 | 159,729 | 155,805 | 146,770 | 127,534
F0000.S043 | 115,022 | 106,484 | 103,857 | 100,275 | 92,087 | 64,831
F0000.S044 | 124,791 | 120,756 | 118,986 | 117,941 | 110,974 | 76,528
F0000.S045 | 113,092 | 104,637 | 102,475 | 99,154 | 91,002 | 68,985
F0000.S046 | 107,250 | 97,637 | 94,629 | 91,149 | 84,333 | 73,436
F0000.S047 | 113,253 | 103,904 | 101,473 | 98,772 | 87,611 | 61,426
F0000.S048 | 88,074 | 82,349 | 81,016 | 78,240 | 74,675 | 71,047
F0000.S049 | 99,161 | 91,424 | 89,715 | 86,566 | 78,035 | 61,833
F0000.S050 | 136,953 | 128,271 | 126,803 | 123,968 | 113,062 | 84,947
F0000.S051 | 80,032 | 74,145 | 72,656 | 70,458 | 63,203 | 48,555
F0000.S052 | 131,329 | 118,898 | 115,533 | 110,381 | 101,459 | 75,669
F0000.S053 | 132,947 | 125,310 | 123,198 | 118,649 | 111,448 | 96,077
F0000.S054 | 115,598 | 105,467 | 103,308 | 99,667 | 93,400 | 81,511
F0000.S055 | 108,285 | 105,839 | 104,203 | 102,717 | 97,214 | 72,311
F0000.S056 | 95,062 | 90,469 | 89,323 | 87,165 | 80,243 | 61,860
F0000.S057 | 92,431 | 86,286 | 84,619 | 81,955 | 75,784 | 61,596
F0000.S058 | 122,725 | 115,394 | 113,899 | 111,099 | 103,011 | 82,347
F0000.S059 | 123,696 | 113,649 | 110,641 | 106,688 | 98,530 | 80,296
F0000.S060 | 125,729 | 120,980 | 119,401 | 118,388 | 110,455 | 91,660
F0000.S061 | 120,843 | 113,452 | 111,962 | 110,329 | 103,351 | 78,444
F0000.S062 | 132,563 | 127,341 | 125,181 | 123,883 | 114,971 | 85,891
F0000.S063 | 86,582 | 83,770 | 82,573 | 81,918 | 76,464 | 57,822
F0000.S064 | 114,212 | 111,427 | 109,725 | 107,956 | 99,922 | 77,154
F0000.S065 | 107,784 | 103,744 | 100,897 | 100,400 | 96,942 | 92,752
F0000.S066 | 87,269 | 84,066 | 82,876 | 81,710 | 78,229 | 70,764
F0000.S067 | 88,141 | 84,589 | 82,670 | 81,160 | 73,751 | 51,557
F0000.S068 | 120,151 | 115,590 | 113,157 | 108,226 | 102,043 | 79,936
F0000.S069 | 142,107 | 137,367 | 136,120 | 134,589 | 126,914 | 101,327
F0000.S070 | 89,696 | 86,347 | 84,280 | 83,035 | 75,340 | 58,214
F0000.S071 | 88,665 | 85,360 | 83,727 | 82,298 | 75,409 | 61,073
F0000.S072 | 142,115 | 137,536 | 135,475 | 134,127 | 126,123 | 97,955
F0000.S073 | 167,673 | 162,004 | 160,103 | 158,123 | 155,588 | 153,316
F0000.S074 | 201,920 | 195,431 | 194,004 | 191,916 | 186,381 | 162,314
F0000.S075 | 188,678 | 184,054 | 182,330 | 180,243 | 170,041 | 134,382
F0000.S076 | 173,706 | 167,474 | 165,255 | 163,137 | 160,600 | 157,890
F0000.S077 | 137,090 | 132,215 | 129,747 | 128,289 | 119,537 | 81,143
F0000.S078 | 163,271 | 158,931 | 156,379 | 153,159 | 143,198 | 113,499
F0000.S079 | 63,333 | 61,200 | 59,812 | 58,845 | 54,332 | 40,399
F0000.S080 | 162,834 | 156,305 | 153,118 | 150,568 | 142,800 | 115,389
F0000.S081 | 109,716 | 106,022 | 104,524 | 103,195 | 95,161 | 70,101
F0000.S082 | 283,113 | 272,403 | 267,388 | 263,039 | 244,799 | 176,840
F0000.S083 | 169,976 | 160,519 | 158,880 | 157,174 | 154,141 | 149,286
F0000.S084 | 114,580 | 111,118 | 109,464 | 107,331 | 99,822 | 73,959
F0000.S085 | 734 | 615 | 543 | 528 | 440 | 354
F0000.S086 | 149,709 | 144,269 | 141,635 | 139,047 | 130,079 | 87,422
F0000.S087 | 123,675 | 118,665 | 115,625 | 113,573 | 106,268 | 86,106
F0000.S088 | 78,240 | 73,009 | 71,427 | 70,413 | 65,130 | 48,244
F0000.S089 | 96,890 | 93,219 | 91,185 | 89,021 | 82,124 | 68,094
F0000.S090 | 176,746 | 170,876 | 169,815 | 168,095 | 160,187 | 106,226
F0000.S091 | 218,963 | 212,127 | 209,226 | 206,532 | 193,559 | 141,493
F0000.S092 | 172,152 | 164,557 | 161,856 | 159,218 | 147,951 | 107,894
F0000.S093 | 128,071 | 123,806 | 121,487 | 120,150 | 111,105 | 81,076
F0000.S094 | 248,827 | 240,514 | 236,595 | 233,797 | 224,199 | 166,914
F0000.S095 | 125,164 | 121,088 | 119,610 | 118,160 | 110,075 | 81,911
F0000.S096 | 119,205 | 116,146 | 114,345 | 113,055 | 106,063 | 75,484
F0000.S097 | 147,826 | 142,873 | 140,437 | 138,568 | 131,739 | 114,168
F0000.S098 | 181,290 | 175,084 | 172,665 | 171,083 | 162,439 | 130,377
F0000.S099 | 110,607 | 105,939 | 103,546 | 101,322 | 94,112 | 69,734
F0000.S100 | 120,027 | 114,363 | 111,691 | 109,315 | 101,760 | 77,578
F0000.S101 | 201,314 | 196,453 | 194,537 | 191,373 | 188,830 | 186,164
F0000.S102 | 116,175 | 112,045 | 110,188 | 108,043 | 100,027 | 71,204
F0000.S103 | 129,502 | 125,728 | 124,137 | 123,415 | 114,727 | 76,627
F0000.S104 | 118,217 | 113,833 | 112,028 | 110,878 | 101,677 | 73,995
F0000.S105 | 141,759 | 136,923 | 134,562 | 132,426 | 123,544 | 96,659
F0000.S106 | 86,128 | 83,438 | 82,006 | 80,944 | 74,248 | 59,761
F0000.S107 | 102,658 | 99,348 | 97,495 | 95,746 | 88,361 | 64,957
F0000.S108 | 140,308 | 136,095 | 133,971 | 131,999 | 124,399 | 98,236
F0000.S109 | 180,923 | 174,797 | 171,810 | 169,983 | 151,706 | 128,658
F0000.S110 | 128,955 | 123,906 | 122,204 | 118,946 | 111,844 | 86,011
F0000.S111 | 118,115 | 113,430 | 110,318 | 107,465 | 98,940 | 73,596
F0000.S112 | 144,271 | 137,893 | 135,338 | 132,539 | 122,574 | 95,252
F0000.S113 | 137,654 | 132,682 | 130,803 | 128,649 | 120,566 | 90,243
F0000.S114 | 122,178 | 113,816 | 110,033 | 107,564 | 96,875 | 71,808
F0000.S115 | 144,983 | 140,020 | 137,766 | 135,866 | 127,159 | 102,672
F0000.S116 | 126,917 | 121,877 | 119,542 | 117,527 | 109,162 | 84,633
F0000.S117 | 114,673 | 110,194 | 107,938 | 106,046 | 96,532 | 67,221
F0000.S118 | 105,467 | 101,459 | 99,726 | 98,683 | 91,149 | 66,458
F0000.S119 | 95,144 | 91,396 | 89,496 | 88,276 | 80,203 | 58,253
F0000.S120 | 155,442 | 149,071 | 147,216 | 145,825 | 138,578 | 114,351
F0000.S121 | 165,991 | 161,340 | 159,176 | 156,566 | 144,087 | 106,573
F0000.S122 | 181,548 | 174,771 | 171,217 | 169,073 | 156,665 | 126,021
F0000.S123 | 79,417 | 76,448 | 75,423 | 74,571 | 69,753 | 53,857
F0000.S124 | 141,017 | 136,532 | 134,511 | 132,922 | 123,669 | 95,077
F0000.S125 | 101,107 | 98,664 | 97,845 | 96,814 | 92,804 | 70,811
F0000.S126 | 92,597 | 89,413 | 87,571 | 85,948 | 78,521 | 60,558
F0000.S127 | 3,787 | 3,490 | 3,395 | 3,301 | 3,092 | 2,855
F0000.S128 | 133,829 | 129,205 | 126,861 | 124,378 | 114,197 | 81,293
F0000.S129 | 113,750 | 110,368 | 107,989 | 106,354 | 97,821 | 72,071
F0000.S130 | 128,885 | 123,722 | 120,504 | 119,545 | 115,238 | 109,533
F0000.S131 | 207,046 | 199,772 | 195,295 | 190,911 | 185,048 | 174,703
F0000.S132 | 251,698 | 243,550 | 241,880 | 237,981 | 233,716 | 226,948
F0000.S133 | 93,589 | 90,168 | 88,217 | 87,218 | 79,530 | 55,660
F0000.S134 | 92,409 | 89,113 | 87,539 | 85,419 | 77,917 | 59,637
F0000.S135 | 101,117 | 97,382 | 96,078 | 95,302 | 91,847 | 79,448
F0000.S136 | 119,830 | 110,959 | 108,443 | 107,145 | 102,168 | 81,111
F0000.S137 | 116,901 | 113,360 | 111,907 | 110,428 | 104,553 | 80,531
F0000.S138 | 110,044 | 106,850 | 105,582 | 104,397 | 98,854 | 77,685
F0000.S139 | 69,591 | 67,677 | 66,698 | 65,623 | 61,706 | 45,713
F0000.S140 | 146,038 | 141,878 | 139,909 | 138,966 | 129,806 | 99,157
F0000.S141 | 264,380 | 256,534 | 250,844 | 249,610 | 242,205 | 229,989
F0000.S142 | 180,835 | 175,999 | 173,174 | 170,806 | 166,563 | 162,430
F0000.S143 | 126,758 | 121,845 | 119,220 | 117,497 | 110,660 | 89,553
F0000.S144 | 150,768 | 145,505 | 143,466 | 141,800 | 138,840 | 132,519
F0000.S145 | 84,784 | 82,363 | 81,277 | 80,516 | 75,607 | 59,802
F0000.S146 | 75,980 | 73,673 | 72,515 | 71,669 | 65,992 | 49,985
F0000.S147 | 133,941 | 128,640 | 126,190 | 123,873 | 116,340 | 89,152
F0000.S148 | 109,177 | 102,425 | 100,649 | 97,730 | 89,779 | 55,368
F0000.S149 | 124,545 | 114,410 | 112,188 | 109,310 | 97,642 | 67,397
F0000.S150 | 99,952 | 93,167 | 91,373 | 89,155 | 80,813 | 62,277
F0000.S151 | 153,251 | 140,266 | 138,021 | 133,648 | 124,429 | 93,087
F0000.S152 | 116,732 | 109,164 | 107,438 | 104,119 | 93,517 | 76,893
F0000.S153 | 174,674 | 170,170 | 167,757 | 165,616 | 162,033 | 159,185
Row Sum | 19,597,202 | 18,609,933 | 18,274,586 | 17,900,251 | 16,668,960 | 13,183,662
Percentage | 100.00% | 94.96% | 93.25% | 91.34% | 85.06% | 67.27%
This table can be downloaded as an Excel table below:
5. DADA2 Amplicon Sequence Variants (ASVs). A total of 6443 unique merged and chimera-free ASV sequences were identified, and their corresponding
read counts for each sample are available in the "ASV Read Count Table", with rows for the ASV sequences and columns for the samples. This read count table can be used for
microbial profile comparison among different samples, and the sequences provided in the table can be used for taxonomy assignment.
The species-level, open-reference 16S rRNA NGS reads taxonomy assignment pipeline
Version 20210310
1. Raw sequence reads in FASTA format were BLASTN-searched against a combined set of 16S rRNA reference sequences.
The set consists of MOMD (version 0.1), the HOMD (version 15.2 http://www.homd.org/index.php?name=seqDownload&file&type=R ),
HOMD 16S rRNA RefSeq Extended Version 1.1 (EXT), GreenGene Gold (GG)
(http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/gold_strains_gg16S_aligned.fasta.gz) ,
and the NCBI 16S rRNA reference sequence set (https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz).
These sequences were screened and combined to remove short sequences (<1000nt), chimeras, duplicates, and sub-sequences,
as well as sequences with poor taxonomy annotation (e.g., without species information).
This process resulted in 1,015 from HOMD V15.22, 495 from EXT, 3,940 from GG and 18,044 from NCBI, a total of 25,120 sequences.
Altogether these sequences represent a total of 15,601 oral and non-oral microbial species.
The NCBI BLASTN version 2.7.1+ (Zhang et al, 2000) was used with the default parameters.
Reads with ≥ 98% sequence identity to the matched reference and ≥ 90% alignment length
(i.e., ≥ 90% of the read length that was aligned to the reference and was used to calculate
the sequence percent identity) were classified based on the taxonomy of the reference sequence
with the highest sequence identity. If a read matched reference sequences representing
more than one species with equal percent identity and alignment length, it was subjected
to chimera checking with the USEARCH program version v8.1.1861 (Edgar 2010). Non-chimeric reads with multi-species
best hits were considered valid and were assigned a unique species
notation (e.g., spp) denoting unresolvable multiple species.
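The assignment rule in step 1 can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs, not the pipeline's actual code: the `Hit` record, its field names, and the `multispecies(...)` label format are hypothetical, and the chimera check applied to multi-species reads is not reproduced.

```python
from collections import namedtuple

# Hypothetical record for one BLASTN hit; field names are illustrative only.
Hit = namedtuple("Hit", "species pct_identity aln_length")

MIN_IDENTITY = 98.0  # percent identity cutoff stated in the text
MIN_COVERAGE = 0.90  # aligned fraction of the read length stated in the text


def classify_read(read_length, hits):
    """Return a species name, a multi-species label, or None if unassigned."""
    passing = [
        h for h in hits
        if h.pct_identity >= MIN_IDENTITY
        and h.aln_length / read_length >= MIN_COVERAGE
    ]
    if not passing:
        return None  # the read goes to the unassigned pool (step 2)

    # Best hit: highest identity; comparing alignment length second is an
    # assumption based on the "equal percent identity and alignment length" rule.
    best_key = max((h.pct_identity, h.aln_length) for h in passing)
    best_species = sorted(
        {h.species for h in passing
         if (h.pct_identity, h.aln_length) == best_key}
    )
    if len(best_species) == 1:
        return best_species[0]
    # Several species tie for the best hit; the report runs a chimera check
    # here (omitted) before giving the read a multi-species notation.
    return "multispecies(" + ", ".join(best_species) + ")"
```

For example, a 430-base read whose best passing hit is to a single species is given that species, while a read with no hit at ≥ 98% identity and ≥ 90% coverage returns `None` and moves on to de novo OTU calling.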
2. Unassigned reads (i.e., reads with < 98% identity or < 90% alignment length) were pooled together and reads < 200 bases were
removed. The remaining reads were subjected to de novo
operational taxonomic unit (OTU) calling and chimera checking using the USEARCH program version v8.1.1861 (Edgar 2010).
The de novo OTU calling and chimera checking were done using 98% as the sequence identity cutoff, i.e., the species-level OTU.
This step produced species-level de novo clustered OTUs with 98% identity.
Representative reads from each of the OTUs/species were then BLASTN-searched
against the same reference sequence set again to determine the closest species for
these potential novel species. These potential novel species were pooled together with the reads that were assigned to the species level in
the previous step, for downstream analyses.
Reference:
Edgar RC. Search and clustering orders of magnitude faster than BLAST.
Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PubMed PMID: 20709691.
3. Designations used in the taxonomy:
1) Taxonomy levels are indicated by these prefixes:
k__: domain/kingdom
p__: phylum
c__: class
o__: order
f__: family
g__: genus
s__: species
Example:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__faecis
2) Unique level identified – known species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis
The above example shows some reads match to a single species (all levels are unique)
3) Non-unique level identified – known species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3
The above example “s__multispecies_spp123_3” indicates that certain reads match equally to 3 species of the
genus Roseburia; the “spp123” is a temporarily assigned species ID.
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__multigenus;s__multispecies_spp234_5
The above example indicates that certain reads match equally to 5 different species, which belong to multiple genera;
the “spp234” is a temporarily assigned species ID.
4) Unique level identified – unknown species, potential novel species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis_nov_97%
The above example indicates that some reads have no match to any of the reference sequences with
sequence identity ≥ 98% and alignment coverage ≥ 90%. However, this group
of reads (actually the representative read from a de novo OTU) has 97% identity to
Roseburia hominis; thus it is a potential novel species, closest to Roseburia hominis
(but not the same species).
5) Multiple level identified – unknown species, potential novel species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_sppn123_3_nov_96%
The above example indicates that some reads have no match to any of the reference sequences
with sequence identity ≥ 98% and alignment coverage ≥ 90%.
However, this group of reads (actually the representative read from a de novo OTU)
has 96% identity equally to 3 species in Roseburia. Thus there is no single
closest species; instead, this group of reads matches equally to multiple species at 96%.
Because the reads have passed the chimera check, they represent a potential novel species. “sppn123” is a
temporary ID for this potential novel species.
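For readers who want to process these lineage strings programmatically, the designations above can be parsed with a short Python sketch. The function names and the returned dictionary keys are hypothetical; the regular expressions only cover the label patterns shown in the examples above and may need adjusting for other labels.

```python
import re

# Rank prefixes as listed in item 1) above.
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}

# e.g. multispecies_spp123_3 or multispecies_sppn123_3_nov_96%
MULTI_RE = re.compile(r"^multispecies_(sppn?\d+)_(\d+)(?:_nov_(\d+(?:\.\d+)?)%)?$")
# e.g. hominis_nov_97%
NOVEL_RE = re.compile(r"^(.+)_nov_(\d+(?:\.\d+)?)%$")


def parse_lineage(lineage):
    """Split 'k__...;p__...;...' into a {rank name: taxon name} dict."""
    levels = {}
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        levels[RANKS[prefix]] = name.strip()
    return levels


def describe_species(label):
    """Classify a species label as single or multi-species, known or novel."""
    m = MULTI_RE.match(label)
    if m:
        return {"kind": "multispecies", "name": m.group(1),
                "n_species": int(m.group(2)),
                "novel": m.group(3) is not None,
                "identity": float(m.group(3)) if m.group(3) else None}
    m = NOVEL_RE.match(label)
    if m:
        return {"kind": "single", "name": m.group(1), "n_species": 1,
                "novel": True, "identity": float(m.group(2))}
    return {"kind": "single", "name": label, "n_species": 1,
            "novel": False, "identity": None}
```

Calling `describe_species(parse_lineage(lineage)["species"])` on the lineage in item 3) reports a multi-species label covering 3 species that is not novel.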
4. The taxonomy assignment algorithm is illustrated in the flow chart below:
Read Taxonomy Assignment - Result Summary *
Code  Category                                          MPC=0% (>=1 read)  MPC=0.1% (>=13170 reads)
A     Total reads                                       13,183,662         13,183,662
B     Total assigned reads                              13,170,446         13,170,446
C     Assigned reads in species with read count < MPC   0                  905,105
D     Assigned reads in samples with read count < 500   354                347
E     Total samples                                     153                153
F     Samples with reads >= 500                         152                152
G     Samples with reads < 500                          1                  1
H     Total assigned reads used for analysis (B-C-D)    13,170,092         12,264,994
I     Reads assigned to single species                  8,094,232          7,484,905
J     Reads assigned to multiple species                5,032,465          4,780,089
K     Reads assigned to novel species                   43,395             0
L     Total number of species                           1,317              114
M     Number of single species                          474                78
N     Number of multi-species                           252                36
O     Number of novel species                           590                0
P     Total unassigned reads                            13,216             13,216
Q     Chimeric reads                                    237                237
R     Reads without BLASTN hits                         499                499
S     Others: short, low quality, singletons, etc.      12,480             12,480
A=B+P=C+D+H+Q+R+S
E=F+G
B=C+D+H
H=I+J+K
L=M+N+O
P=Q+R+S
* MPC = Minimal percent (of all assigned reads) read count per species; species with read count < MPC were removed.
* Samples with reads < 500 were removed from downstream analyses.
* The assignment result from MPC=0.1% was used in the downstream analyses.
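The read-accounting identities and the MPC cutoff can be checked with simple arithmetic. The Python sketch below copies the MPC=0.1% column of the summary table. Rounding the 0.1% threshold down to a whole number of reads is an assumption, chosen because it reproduces the ">=13170 reads" shown in the column header.

```python
# Values copied from the MPC=0.1% column of the result summary table.
A = 13_183_662  # total reads
B = 13_170_446  # total assigned reads
C = 905_105     # assigned reads in species with read count < MPC
D = 347         # assigned reads in samples with read count < 500
H = 12_264_994  # total assigned reads used for analysis
I, J, K = 7_484_905, 4_780_089, 0  # single, multiple, novel species reads
P, Q, R, S = 13_216, 237, 499, 12_480  # unassigned reads and their breakdown

# 0.1% of all assigned reads, rounded down (assumption) -> 13170
min_reads_per_species = B // 1000

checks = {
    "A = B + P": A == B + P,
    "B = C + D + H": B == C + D + H,
    "H = I + J + K": H == I + J + K,
    "P = Q + R + S": P == Q + R + S,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
print("MPC=0.1% cutoff (reads per species):", min_reads_per_species)
```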
Read Taxonomy Assignment - ASV Species-Level Read Counts Table
This table shows the read counts for each sample (columns) and each species identified based on the ASV sequences.
The downstream analyses were based on this table.
Download Read Count Tables at Different Taxonomy Levels
domain
phylum
class
order
family
genus
species
The species listed in the table have full taxonomy and a dynamically assigned species ID specific to this report.
When some reads match the reference sequences of more than one species equally (i.e., same percent identity and alignment coverage),
they can't be assigned to a particular species. Instead, they are assigned to multiple species with a species notation such as
"s__multispecies_spp2_2". In this notation, spp2 is the dynamic ID assigned to these reads that hit multiple sequences and the "_2"
at the end of the notation means there are two species in spp2.
You can look up which species are included in the multi-species assignment, in this table below:
Another type of notation is "s__multispecies_sppn2_2", in which the "n" in sppn2 means it's a potential novel species because all the reads in this species
have < 98% identity to any of the reference sequences. They were grouped together based on de novo OTU clustering at a 98% identity cutoff, and then
a representative sequence was chosen for a BLASTN search against the reference database to find the closest match (which will still be < 98%). This representative
sequence also matched equally to more than one species, hence the "spp" in the label.
In ecology, alpha diversity (α-diversity) is the mean species diversity in sites or habitats at a local scale.
The term was introduced by R. H. Whittaker together with the terms beta diversity (β-diversity)
and gamma diversity (γ-diversity). Whittaker's idea was that the total species diversity in a landscape
(gamma diversity) is determined by two different things, the mean species diversity in sites or habitats
at a more local scale (alpha diversity) and the differentiation among those habitats (beta diversity).
Diversity measures are affected by the sampling depth. Rarefaction is a technique to assess species richness from the results of sampling. Rarefaction allows
the calculation of species richness for a given number of individual samples, based on the construction
of so-called rarefaction curves. This curve is a plot of the number of species as a function of the
number of samples. Rarefaction curves generally grow rapidly at first, as the most common species are found,
but the curves plateau as only the rarest species remain to be sampled.
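A rarefaction curve of the kind described above can be approximated by repeatedly subsampling a sample's reads. The Python sketch below is a toy illustration, not the tool used for this report: it uses sequencing depth (number of reads) on the x-axis, which is an assumption about how the curves are drawn for read-count data, and the counts in the usage note are made up.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {species: count} dict."""
    # Expanding every read into a list is fine for a toy example, but it is
    # memory-hungry for real samples with ~100,000 reads each.
    pool = [species for species, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    subsample = {}
    for species in rng.sample(pool, depth):
        subsample[species] = subsample.get(species, 0) + 1
    return subsample


def rarefaction_curve(counts, depths, seed=0):
    """Observed species richness at each sampling depth."""
    return [(d, len(rarefy(counts, d, seed))) for d in depths]
```

With a hypothetical sample such as `{"a": 50, "b": 30, "c": 15, "d": 4, "e": 1}`, `rarefaction_curve(sample, [10, 50, 100])` returns one (depth, richness) pair per depth. The common species are typically found at shallow depths, and the rare ones only appear as the depth approaches the full 100 reads.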
The two main factors taken into account when measuring diversity are richness and evenness.
Richness is a measure of the number of different kinds of organisms present in a particular area.
Evenness compares the similarity of the population size of each of the species present. There are
many different ways to measure the richness and evenness. These measurements are called "estimators" or "indices".
Below are 3 commonly used diversity indices, showing the values for all the samples (dots) and for the groups (boxes).
Alpha Diversity Box Plots for All Groups
Alpha Diversity Box Plots for Individual Comparisons
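To make the richness and evenness concepts concrete, the Python sketch below computes a few widely used alpha diversity indices from a vector of per-species read counts. The choice of indices (observed richness, Shannon, Gini-Simpson, Pielou's evenness) is an assumption for illustration; the report does not state here which three indices are plotted.

```python
import math


def observed_richness(counts):
    """Number of species with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i); reflects richness and evenness."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index, 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(richness)."""
    s = observed_richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0
```

Two samples with the same richness can differ sharply in evenness: counts of `[25, 25, 25, 25]` give a Pielou evenness of 1, whereas `[97, 1, 1, 1]` gives a much lower Shannon index even though both contain four species.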
Beta diversity compares the similarity (or dissimilarity) of microbial profiles between different
groups of samples. There are many different similarity/dissimilarity metrics.
In general, they can be quantitative (using sequence abundance, e.g., Bray-Curtis or weighted UniFrac)
or binary (considering only presence-absence of sequences, e.g., binary Jaccard or unweighted UniFrac).
They can also be phylogeny-based (e.g., UniFrac metrics) or not (non-UniFrac metrics, such as Bray-Curtis).
For microbiome studies, species profiles of samples can be compared with the Bray-Curtis dissimilarity,
which is based on the count data type. The pair-wise Bray-Curtis dissimilarity matrix of all samples can then be
subject to either multi-dimensional scaling (MDS, also known as PCoA) or non-metric MDS (NMDS).
MDS/PCoA is a
scaling or ordination method that starts with a matrix of similarities or dissimilarities
between a set of samples and aims to produce a low-dimensional graphical plot of the data
in such a way that distances between points in the plot are close to original dissimilarities.
NMDS is similar to MDS; however, it does not use the dissimilarity values directly. Instead, it converts them into
ranks and uses these ranks in the calculation.
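As a minimal sketch of the first step described above, the Python code below computes the Bray-Curtis dissimilarity between count vectors and then orders the sample pairs by dissimilarity, which is the rank information NMDS works from. The function names and the toy counts in the usage note are hypothetical, and the ordination step itself (PCoA or NMDS) is not shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same species."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(u) + sum(v)
    return numerator / denominator if denominator else 0.0


def ranked_pairs(samples):
    """All sample pairs sorted from most similar to most dissimilar."""
    names = sorted(samples)
    pairs = [
        (bray_curtis(samples[p], samples[q]), p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    return sorted(pairs)
```

For example, with `samples = {"s1": [6, 7, 4], "s2": [10, 0, 6], "s3": [6, 7, 5]}`, the dissimilarity between s1 and s2 is 13/33, and `ranked_pairs(samples)` lists the s1/s3 pair first because those two profiles are nearly identical.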
In our beta diversity analysis, the Bray-Curtis dissimilarity matrix was first calculated and then plotted by PCoA and
NMDS separately. Below are the beta diversity results for all groups together:
NMDS and PCoA Plots for All Groups
The above PCoA and NMDS plots are based on count data. The count data can also be transformed into centered log ratio (CLR)
for each species. The CLR data is no longer count data and cannot be used in Bray-Curtis dissimilarity calculation. Instead
CLR can be compared with Euclidean distances. When CLR data are compared by Euclidean distance, the distance is also called
Aitchison distance.
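The CLR transform and the Aitchison distance can be written out directly. In the Python sketch below, the pseudocount added to every count (so that zeros can be log-transformed) is an assumption; the report does not state how zero counts are handled.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each count minus the mean of the logs."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def aitchison_distance(u, v, pseudocount=0.5):
    """Euclidean distance between the CLR-transformed count vectors."""
    cu, cv = clr(u, pseudocount), clr(v, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cu, cv)))
```

One useful property is scale invariance: with no pseudocount, `aitchison_distance([1, 2, 7], [10, 20, 70], pseudocount=0)` is zero (up to floating-point error), because the two samples have identical proportions even though their sequencing depths differ tenfold.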
Below are the NMDS and PCoA plots of the Aitchison distances of the samples:
16S rRNA next generation sequencing (NGS) generates a fixed number of reads that reflect the proportion of different
species in a sample, i.e., the relative abundance of species, instead of the absolute abundance.
In mathematics, measurements involving probabilities, proportions, percentages, and ppm can all
be thought of as compositional data. This makes the microbiome read count data “compositional”
(Gloor et al, 2017). In general, compositional data represent parts of a whole which only
carry relative information (http://www.compositionaldata.com/).
The problem of microbiome data being compositional arises when comparing two groups of samples to
identify “differentially abundant” species. Even if a species has the same absolute abundance in two
conditions, its relative abundances in the two conditions (e.g., percent abundance) can differ
if the relative abundances of other species change greatly. This problem can lead to incorrect conclusions
about differential abundance for microbial species in the samples.
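A two-species toy example makes the effect concrete. The absolute abundances below are invented for illustration: species_A is unchanged between the two conditions, yet its relative abundance drops because species_B increases.

```python
def relative_abundance(absolute):
    """Convert absolute abundances to proportions of the sample total."""
    total = sum(absolute.values())
    return {species: n / total for species, n in absolute.items()}


# Hypothetical absolute abundances; only species_B differs between conditions.
condition_1 = {"species_A": 1000, "species_B": 1000}
condition_2 = {"species_A": 1000, "species_B": 9000}

rel_1 = relative_abundance(condition_1)
rel_2 = relative_abundance(condition_2)
print(rel_1["species_A"], rel_2["species_A"])  # 0.5 0.1
```

A test based only on percent abundance would flag species_A as decreased fivefold, although its absolute abundance did not change.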
When studying differential abundance (DA), a currently preferred approach is to transform the read count
data into log-ratio data. The ratios are calculated between the read counts of all species in a sample and
a “reference” count (e.g., mean read count of the sample). The log-ratio data allow the detection of DA
species without being affected by the percentage bias mentioned above.
In this report, a compositional DA analysis tool, “ANCOM” (analysis of composition of microbiomes),
was used. ANCOM transforms the count data into log-ratios and thus is more suitable for comparing
the composition of microbiomes in two or more populations. "ANCOM" generates a table of features with
W-statistics and whether the null hypothesis is rejected. The “W” is the W-statistic: the number of
other features against which a given feature tested as significantly different. Hence the higher the "W",
the stronger the statistical evidence that a feature/species is differentially abundant.
References:
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol.
2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134.
Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of
microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis.
2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.
Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction.
Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7.
PMID: 32665548; PMCID: PMC7360769.
Starting with version V1.2, we also include the results of ANCOM-BC (Analysis of Compositions of
Microbiomes with Bias Correction) (Lin and Peddada 2020). ANCOM-BC is an updated version of "ANCOM" that:
(a) provides a statistically valid test with appropriate p-values,
(b) provides confidence intervals for differential abundance of each taxon,
(c) controls the False Discovery Rate (FDR),
(d) maintains adequate power, and
(e) is computationally simple to implement.
The bias correction (BC) addresses a challenging problem of the bias introduced by differences in
the sampling fractions across samples. This bias has been a major hurdle in performing DA analysis of microbiome data.
ANCOM-BC estimates the unknown sampling fractions and corrects the bias induced by their differences among samples.
The absolute abundance data are modeled using a linear regression framework.
LEfSe (Linear Discriminant Analysis Effect Size) is an alternative method to find "organisms, genes, or
pathways that consistently explain the differences between two or more microbial communities" (Segata et al., 2011).
Specifically, LEfSe uses the rank-based Kruskal-Wallis (KW) sum-rank test to detect features with significant
differential (relative) abundance with respect to the class of interest. Since the test is rank-based rather than proportion-based,
the differential species identified among the comparison groups are less biased (than those based on percent abundance).
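The rank-based idea behind the KW test can be illustrated with a small Python sketch that computes the H statistic for one feature across groups. This is a simplified illustration, not LEfSe itself: the correction factor for ties and the conversion of H to a p-value are omitted, and LEfSe's subsequent steps are not shown.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without the tie correction factor)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    weighted_rank_sums, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
```

Because only the ranks enter the statistic, multiplying all abundances of a feature by a constant, or replacing them with any other values in the same order, leaves H unchanged.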
Reference:
Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011 Jun 24;12(6):R60. doi: 10.1186/gb-2011-12-6-r60. PMID: 21702898; PMCID: PMC3218848.
Network correlation analysis tools are commonly used to analyze the co-occurrence or co-exclusion between microbial species among different samples.
However, microbiome count data are compositional. If count data are normalized to the total number of counts in the
sample, the data are no longer independent, and traditional statistical metrics (e.g., correlation) for the detection
of species-species relationships can lead to spurious results. In addition, sequencing-based studies typically
measure hundreds of OTUs (species) on few samples; thus, inference of OTU-OTU association networks is severely
under-powered. Here we use SPIEC-EASI (SParse InversE Covariance Estimation
for Ecological Association Inference), a statistical method for the inference of microbial
ecological networks from amplicon sequencing datasets that addresses both of these issues (Kurtz et al., 2015).
SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model
inference framework that assumes the underlying ecological association network is sparse. SPIEC-EASI provides
two algorithms for network inference: 1) Meinshausen-Bühlmann's neighborhood selection (MB method) and 2) inverse covariance selection
(GLASSO method, i.e., graphical least absolute shrinkage and selection operator). This is fundamentally distinct from SparCC, which essentially estimates pairwise correlations. In addition
to these two methods, we provide the results of a third method, SparCC (Sparse Correlations for Compositional Data) (Friedman & Alm 2012), which
is also a method for inferring correlations from compositional data. SparCC estimates the linear Pearson correlations between
the log-transformed components.
References:
Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015 May 7;11(5):e1004226. doi: 10.1371/journal.pcbi.1004226. PMID: 25950956; PMCID: PMC4423992.
The results of this analysis are for research purposes only. They are not intended to diagnose, treat, cure, or prevent any disease. Forsyth and FOMC
are not responsible for use of information provided in this report outside the research area.