FOMC Service Report

16S rRNA Gene V1V3 Amplicon Sequencing

Version V1.52

Version History

The Forsyth Institute, Cambridge, MA, USA
January 20, 2026

Project ID: FOMC27353


I. Project Summary

Services for project FOMC27353 include NGS sequencing of 16S rRNA gene V1V3 amplicons from the submitted samples. First and foremost, please download this report and the raw sequence data from the download links provided below. These links expire after 60 days, and we cannot guarantee the availability of your data after that.

Full bioinformatics analysis service was requested. The analyses start with raw sequence quality and noise filtering, paired-read merging, and chimera filtering, using the DADA2 denoising algorithm and pipeline.

We also provide many downstream analyses such as taxonomy assignment, alpha and beta diversity analyses, and differential abundance analysis.

For taxonomy assignment, the taxonomy bar plots are the most informative. We provide interactive bar plots that show the relative abundance of microbes at the taxonomy level of your choice (from phylum to species).

If you specify which groups of samples you want to compare for differential abundance, we provide both ANCOM and LEfSe differential abundance analysis.

 

II. Workflow Checklist

1. Sample Received
2. Sample Quality Evaluated
3. Sample Prepared for Sequencing
4. Next-Gen Sequencing
5. Sequence Quality Check
6. Absolute Abundance
7. Report and Raw Sequence Data Available for Download
8. Bioinformatics Analysis - Reads Processing (DADA2 Quality Trimming, Denoising, Paired Reads Merging)
9. Bioinformatics Analysis - Reads Taxonomy Assignment
10. Bioinformatics Analysis - Alpha Diversity Analysis
11. Bioinformatics Analysis - Beta Diversity Analysis
12. Bioinformatics Analysis - Differential Abundance Analysis
13. Bioinformatics Analysis - Heatmap Profile
14. Bioinformatics Analysis - Network Association
 

III. NGS Sequencing

The samples were processed and analyzed with the ZymoBIOMICS® Service: Targeted Metagenomic Sequencing (Zymo Research, Irvine, CA).

DNA Extraction: If DNA extraction was performed, the following DNA extraction kit was used according to the manufacturer’s instructions:

ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA)
N/A (DNA Extraction Not Performed)
Elution Volume: 50µL
Additional Notes: NA

Targeted Library Preparation: The DNA samples were prepared for targeted sequencing with the Quick-16S™ NGS Library Prep Kit (Zymo Research, Irvine, CA). These primers were custom designed by Zymo Research to provide the best coverage of the 16S gene while maintaining high sensitivity. The primer sets used in this project are marked below:

Quick-16S™ Primer Set V1-V2 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V1-V3 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V3-V4 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V4 (Zymo Research, Irvine, CA)
Quick-16S™ Primer Set V6-V8 (Zymo Research, Irvine, CA)
Additional Notes: NA

The sequencing library was prepared using an innovative library preparation process in which PCR reactions were performed in real-time PCR machines to control cycles and therefore limit PCR chimera formation. The final PCR products were quantified with qPCR fluorescence readings and pooled together based on equal molarity. The final pooled library was cleaned up with the Select-a-Size DNA Clean & Concentrator™ (Zymo Research, Irvine, CA), then quantified with TapeStation® (Agilent Technologies, Santa Clara, CA) and Qubit® (Thermo Fisher Scientific, Waltham, MA).

Control Samples: The ZymoBIOMICS® Microbial Community Standard (Zymo Research, Irvine, CA) was used as a positive control for each DNA extraction, if performed. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research, Irvine, CA) was used as a positive control for each targeted library preparation. Negative controls (i.e. blank extraction control, blank library preparation control) were included to assess the level of bioburden carried by the wet-lab process.

Sequencing: The final library was sequenced on an Illumina® NextSeq 2000™ with a P1 reagent kit (600 cycles) (Illumina, San Diego, CA). The sequencing was performed with 25% PhiX spike-in.

Absolute Abundance Quantification*: A quantitative real-time PCR was set up with a standard curve. The standard curve was made with plasmid DNA containing one copy of the 16S gene and one copy of the fungal ITS2 region prepared in 10-fold serial dilutions. The primers used were the same as those used in Targeted Library Preparation. The equation generated by the plasmid DNA standard curve was used to calculate the number of gene copies in the reaction for each sample. The PCR input volume (2 µl) was used to calculate the number of gene copies per microliter in each DNA sample.
The number of genome copies per microliter DNA sample was calculated by dividing the gene copy number by an assumed number of gene copies per genome. The value used for 16S copies per genome is 4. The value used for ITS copies per genome is 200. The amount of DNA per microliter DNA sample was calculated using an assumed genome size of 4.64 × 10^6 bp, the genome size of Escherichia coli, for 16S samples, or an assumed genome size of 1.20 × 10^7 bp, the genome size of Saccharomyces cerevisiae, for ITS samples. This calculation is shown below:

Calculated Total DNA = Calculated Total Genome Copies × Assumed Genome Size (4.64 × 10^6 bp) ×
Average Molecular Weight of a DNA bp (660 g/mole/bp) ÷ Avogadro’s Number (6.022 × 10^23/mole)
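
As a worked illustration of the calculation above, the following Python sketch converts a gene copy number into genome copies and DNA mass using the constants stated in this section. The function name and the example input of 1,000,000 gene copies per microliter are hypothetical and are not taken from this project's data.

```python
# Constants as stated in the report text.
AVOGADRO = 6.022e23                                # molecules per mole
BP_MOLAR_MASS = 660.0                              # g/mole per base pair
GENE_COPIES_PER_GENOME = {"16S": 4, "ITS": 200}    # assumed copies per genome
GENOME_SIZE_BP = {"16S": 4.64e6, "ITS": 1.20e7}    # E. coli / S. cerevisiae


def dna_from_gene_copies(gene_copies_per_ul, target="16S"):
    """Return (genome copies per ul, DNA in ng per ul) for one sample."""
    genome_copies = gene_copies_per_ul / GENE_COPIES_PER_GENOME[target]
    grams = genome_copies * GENOME_SIZE_BP[target] * BP_MOLAR_MASS / AVOGADRO
    return genome_copies, grams * 1e9  # convert g to ng


# Hypothetical input: 1e6 16S gene copies per microliter.
genomes, nanograms = dna_from_gene_copies(1.0e6, "16S")
print(f"{genomes:.0f} genome copies/ul, {nanograms:.3f} ng/ul")
```

With this made-up input the result is 250,000 genome copies and roughly 1.27 ng of DNA per microliter.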


* Absolute Abundance Quantification is only available for 16S and ITS analyses.

The absolute abundance standard curve data can be viewed in Excel here:

The absolute abundance standard curve is shown below:

Absolute Abundance Standard Curve

 

IV. Complete Report Download

The complete report of your project, including all links in this report, can be downloaded by clicking the link provided below. The downloaded file is a compressed ZIP file. Once it is unzipped, open the file “REPORT.html” (it may be shown only as "REPORT" on your computer) by double-clicking it. Your default web browser will open it and you will see the exact content of this report.

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Complete report download link:

To view the report, please follow these steps:

1. Download the .zip file from the report link above.
2. Extract all the contents of the downloaded .zip file to your desktop.
3. Open the extracted folder and find the "REPORT.html" file (it may be shown only as "REPORT").
4. Open the REPORT.html file by double-clicking it. Your default browser will open the top page of the complete report. Within the report, there are links to view all the analyses performed for the project.

 

V. Raw Sequence Data Download

The raw NGS sequence data are available for download with the link provided below. Because this project used Illumina paired-end sequencing of the 16S rRNA V1V3 region, the raw sequences are provided as a single compressed ZIP file. After unzipping, you will find two sequence files for each of your samples (Read 1 and Read 2) with the file extension “*.fastq.gz”. The files are in FASTQ format and are compressed. FASTQ is a text-based format that stores both a biological sequence and its corresponding quality scores. Most sequence analysis software can open these files. The Sample IDs associated with the FASTQ files are listed in the table below:

Sample ID	Original Sample ID	Read 1 File Name	Read 2 File Name
F27353.S10	original sample ID here	zr27353_10V1V3_R1.fastq.gz	zr27353_10V1V3_R2.fastq.gz
F27353.S11	original sample ID here	zr27353_11V1V3_R1.fastq.gz	zr27353_11V1V3_R2.fastq.gz
F27353.S12	original sample ID here	zr27353_12V1V3_R1.fastq.gz	zr27353_12V1V3_R2.fastq.gz
F27353.S13	original sample ID here	zr27353_13V1V3_R1.fastq.gz	zr27353_13V1V3_R2.fastq.gz
F27353.S14	original sample ID here	zr27353_14V1V3_R1.fastq.gz	zr27353_14V1V3_R2.fastq.gz
F27353.S15	original sample ID here	zr27353_15V1V3_R1.fastq.gz	zr27353_15V1V3_R2.fastq.gz
F27353.S16	original sample ID here	zr27353_16V1V3_R1.fastq.gz	zr27353_16V1V3_R2.fastq.gz
F27353.S17	original sample ID here	zr27353_17V1V3_R1.fastq.gz	zr27353_17V1V3_R2.fastq.gz
F27353.S18	original sample ID here	zr27353_18V1V3_R1.fastq.gz	zr27353_18V1V3_R2.fastq.gz
F27353.S19	original sample ID here	zr27353_19V1V3_R1.fastq.gz	zr27353_19V1V3_R2.fastq.gz
F27353.S01	original sample ID here	zr27353_1V1V3_R1.fastq.gz	zr27353_1V1V3_R2.fastq.gz
F27353.S20	original sample ID here	zr27353_20V1V3_R1.fastq.gz	zr27353_20V1V3_R2.fastq.gz
F27353.S21	original sample ID here	zr27353_21V1V3_R1.fastq.gz	zr27353_21V1V3_R2.fastq.gz
F27353.S22	original sample ID here	zr27353_22V1V3_R1.fastq.gz	zr27353_22V1V3_R2.fastq.gz
F27353.S23	original sample ID here	zr27353_23V1V3_R1.fastq.gz	zr27353_23V1V3_R2.fastq.gz
F27353.S24	original sample ID here	zr27353_24V1V3_R1.fastq.gz	zr27353_24V1V3_R2.fastq.gz
F27353.S25	original sample ID here	zr27353_25V1V3_R1.fastq.gz	zr27353_25V1V3_R2.fastq.gz
F27353.S26	original sample ID here	zr27353_26V1V3_R1.fastq.gz	zr27353_26V1V3_R2.fastq.gz
F27353.S27	original sample ID here	zr27353_27V1V3_R1.fastq.gz	zr27353_27V1V3_R2.fastq.gz
F27353.S28	original sample ID here	zr27353_28V1V3_R1.fastq.gz	zr27353_28V1V3_R2.fastq.gz
F27353.S29	original sample ID here	zr27353_29V1V3_R1.fastq.gz	zr27353_29V1V3_R2.fastq.gz
F27353.S02	original sample ID here	zr27353_2V1V3_R1.fastq.gz	zr27353_2V1V3_R2.fastq.gz
F27353.S30	original sample ID here	zr27353_30V1V3_R1.fastq.gz	zr27353_30V1V3_R2.fastq.gz
F27353.S31	original sample ID here	zr27353_31V1V3_R1.fastq.gz	zr27353_31V1V3_R2.fastq.gz
F27353.S32	original sample ID here	zr27353_32V1V3_R1.fastq.gz	zr27353_32V1V3_R2.fastq.gz
F27353.S33	original sample ID here	zr27353_33V1V3_R1.fastq.gz	zr27353_33V1V3_R2.fastq.gz
F27353.S34	original sample ID here	zr27353_34V1V3_R1.fastq.gz	zr27353_34V1V3_R2.fastq.gz
F27353.S35	original sample ID here	zr27353_35V1V3_R1.fastq.gz	zr27353_35V1V3_R2.fastq.gz
F27353.S36	original sample ID here	zr27353_36V1V3_R1.fastq.gz	zr27353_36V1V3_R2.fastq.gz
F27353.S37	original sample ID here	zr27353_37V1V3_R1.fastq.gz	zr27353_37V1V3_R2.fastq.gz
F27353.S38	original sample ID here	zr27353_38V1V3_R1.fastq.gz	zr27353_38V1V3_R2.fastq.gz
F27353.S39	original sample ID here	zr27353_39V1V3_R1.fastq.gz	zr27353_39V1V3_R2.fastq.gz
F27353.S03	original sample ID here	zr27353_3V1V3_R1.fastq.gz	zr27353_3V1V3_R2.fastq.gz
F27353.S40	original sample ID here	zr27353_40V1V3_R1.fastq.gz	zr27353_40V1V3_R2.fastq.gz
F27353.S04	original sample ID here	zr27353_4V1V3_R1.fastq.gz	zr27353_4V1V3_R2.fastq.gz
F27353.S05	original sample ID here	zr27353_5V1V3_R1.fastq.gz	zr27353_5V1V3_R2.fastq.gz
F27353.S06	original sample ID here	zr27353_6V1V3_R1.fastq.gz	zr27353_6V1V3_R2.fastq.gz
F27353.S07	original sample ID here	zr27353_7V1V3_R1.fastq.gz	zr27353_7V1V3_R2.fastq.gz
F27353.S08	original sample ID here	zr27353_8V1V3_R1.fastq.gz	zr27353_8V1V3_R2.fastq.gz
F27353.S09	original sample ID here	zr27353_9V1V3_R1.fastq.gz	zr27353_9V1V3_R2.fastq.gz
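
For readers who want to inspect the FASTQ files listed above programmatically, here is a minimal Python sketch using only the standard library. It assumes the common four-line FASTQ record layout and Phred+33 quality encoding. The demo file name and its contents are made up for illustration, so substitute one of your own downloaded files.

```python
import gzip
import os
import tempfile


def read_fastq(path):
    """Yield (read name, sequence, quality scores) from a gzipped FASTQ file.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip()
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip()
            yield header[1:], sequence, [ord(c) - 33 for c in quality]


# Write a tiny made-up file so the sketch can be run as-is.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_R1.fastq.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("@read1\nACGT\n+\nIIII\n")

records = list(read_fastq(demo_path))
print(records)
```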

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Raw sequence data download link:

 

VI. Analysis - DADA2 Read Processing

What is DADA2?

DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors [1]. DADA2 infers sample sequences exactly, without coarse-graining into OTUs, and resolves differences of as little as one nucleotide. DADA2 identified more real variants and output fewer spurious sequences than other methods.

DADA2’s advantage is that it uses more of the data. The DADA2 error model incorporates quality information, which is ignored by all other methods after filtering. The DADA2 error model incorporates quantitative abundances, whereas most other methods use abundance ranks if they use abundance at all. The DADA2 error model identifies the differences between sequences, e.g., A->C, whereas other methods merely count the mismatches. DADA2 can parameterize its error model from the data itself, rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.

The DADA2 software package is available as an R package at: https://benjjneb.github.io/dada2/index.html

References

  1. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jul;13(7):581-3. doi: 10.1038/nmeth.3869. Epub 2016 May 23. PMID: 27214047; PMCID: PMC4927377.

Analysis Procedures:

The DADA2 pipeline includes several tools for read quality control, including quality filtering, trimming, denoising, pair merging and chimera filtering. Below are the major processing steps of DADA2:

Step 1. Read trimming based on sequence quality. The quality of Illumina NGS sequences often decreases toward the end of the reads. DADA2 allows the poor-quality read ends to be trimmed off in order to improve error model building and pair merging performance.

Step 2. Learn the Error Rates. The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The learnErrors method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors).

Step 3. Infer amplicon sequence variants (ASVs) based on the error model built in the previous step. This step is also called sequence "denoising". The outcome of this step is a list of ASVs that are the equivalent of oligonucleotides.

Step 4. Merge paired reads. If the sequencing products are read pairs, DADA2 will merge the R1 and R2 ASVs into single sequences. Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding denoised reverse reads, and then constructing the merged “contig” sequences. By default, merged sequences are only output if the forward and reverse reads overlap by at least 12 bases, and are identical to each other in the overlap region (but these conditions can be changed via function arguments).
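
The merging rule described in Step 4 can be illustrated with a short Python sketch. This is not the DADA2 implementation: it is a simplified stand-in that looks for an exact suffix/prefix overlap of at least 12 bases rather than computing an alignment, and the function names and the example amplicon are hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(forward, reverse, min_overlap=12):
    """Merge a read pair if the overlap is long enough and identical.

    Simplified: tries exact suffix/prefix overlaps, longest first,
    instead of computing a full alignment.
    """
    rc = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None  # no acceptable overlap: the pair is not merged


# Hypothetical 32-base amplicon read from both ends with 24-base reads,
# giving a 16-base overlap.
amplicon = "ATGCCGTAAGCTTGACCTGAGGATCCATTGCA"
forward = amplicon[:24]
reverse = reverse_complement(amplicon[-24:])
merged = merge_pair(forward, reverse)
print(merged == amplicon)
```

Pairs whose reads do not share an identical overlap of the required length return None here, which mirrors how unmergeable pairs drop out between the denoised and merged counts.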

Step 5. Remove chimeras. The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs. Chimeric sequences are identified if they can be exactly reconstructed by combining a left-segment and a right-segment from two more abundant “parent” sequences. The frequency of chimeric sequences varies substantially from dataset to dataset, and depends on factors including experimental procedures and sample complexity.
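
The chimera definition in Step 5 (an exact left segment from one parent joined to an exact right segment from another, more abundant parent) can be sketched in Python as follows. This is an illustrative simplification, not DADA2's code: it assumes all sequences cover the same amplicon so that prefixes and suffixes can be compared directly, and the sequences and read counts are invented.

```python
def is_bimera(seq, parents):
    """True if seq is an exact left part of one parent plus an exact
    right part of a different parent."""
    for left in parents:
        for right in parents:
            if left == right:
                continue
            for cut in range(1, len(seq)):
                if left.startswith(seq[:cut]) and right.endswith(seq[cut:]):
                    return True
    return False


def find_bimeras(abundance):
    """Flag sequences that can be rebuilt from two more abundant parents."""
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    flagged = set()
    for i, seq in enumerate(ranked):
        parents = [p for p in ranked[:i] if abundance[p] > abundance[seq]]
        if is_bimera(seq, parents):
            flagged.add(seq)
    return flagged


# Made-up ASVs and read counts for illustration only.
counts = {
    "AAAAAAAACCCCCCCC": 100,  # parent A
    "GGGGGGGGTTTTTTTT": 90,   # parent B
    "AAAAAAAATTTTTTTT": 5,    # left half of A + right half of B
    "AAAAAAAACCCCCCCG": 3,    # one-base variant of A, not a chimera
}
chimeras = find_bimeras(counts)
print(chimeras)
```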

Results

1. Read Quality Plots. NGS sequence analysis starts with visualizing the quality of the sequencing. Below are the quality plots of the first sample for the R1 and R2 reads separately. The gray-scale heat map shows the frequency of each quality score at each base position. The mean quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The forward reads are usually of better quality. It is common practice to trim the last few nucleotides to avoid less well-controlled errors that can arise there. The trimming affects the downstream steps, including error model building, merging, and chimera calling. FOMC uses an empirical approach that tests many combinations of trim lengths in order to obtain the best final amplicon sequence variants (ASVs); see the next section, “Optimal trim length for ASVs”.

Quality plots for all samples:

2. Optimal trim length for ASVs. The final number of merged and chimera-filtered ASVs depends on the quality filtering (hence trimming) at the very beginning of the DADA2 pipeline. In order to retain the highest percentage of reads as ASVs, an empirical approach was used:

  1. Create a random subset of each sample consisting of 5,000 R1 and 5,000 R2 reads (to reduce computation time)
  2. Trim 10 bases at a time from the ends of both R1 and R2, up to 50 bases
  3. For each combination of trimmed lengths (e.g., 301x301, 301x291, 291x291, etc.), run the trimmed reads through the entire DADA2 pipeline to obtain chimera-filtered merged ASVs
  4. Select the combination with the highest percentage of input reads becoming final ASVs and apply it to the complete set of data

Below is the result of such operation, showing ASV percentages of total reads for all trimming combinations (1st Column = R1 lengths in bases; 1st Row = R2 lengths in bases):

R1/R2	301	291	281	271	261	251
301	77.28%	77.66%	77.69%	78.35%	78.88%	70.87%
291	77.31%	77.67%	77.74%	78.42%	70.10%	43.30%
281	77.57%	77.99%	78.02%	69.74%	43.38%	15.94%
271	78.26%	78.62%	69.68%	43.89%	15.98%	10.86%
261	78.78%	70.22%	43.95%	16.06%	10.92%	5.49%
251	70.14%	44.43%	16.21%	10.99%	5.58%	2.79%

Based on the above result, the trim length combination of R1 = 301 bases and R2 = 261 bases (highlighted red above) was chosen for generating final ASVs for all sequences. This combination retained the highest percentage of input reads (78.88%) as merged non-chimeric ASVs and was used for downstream analyses, if requested.
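
The selection step can be reproduced from the table above with a few lines of Python. The percentages are copied from the table; the variable names are chosen here for illustration only.

```python
lengths = [301, 291, 281, 271, 261, 251]  # R1 rows and R2 columns, in bases

# Percentage of input reads ending up in final ASVs, copied from the table.
pct = [
    [77.28, 77.66, 77.69, 78.35, 78.88, 70.87],
    [77.31, 77.67, 77.74, 78.42, 70.10, 43.30],
    [77.57, 77.99, 78.02, 69.74, 43.38, 15.94],
    [78.26, 78.62, 69.68, 43.89, 15.98, 10.86],
    [78.78, 70.22, 43.95, 16.06, 10.92, 5.49],
    [70.14, 44.43, 16.21, 10.99, 5.58, 2.79],
]

# Pick the (R1, R2) combination with the highest retained percentage.
best_pct, best_r1, best_r2 = max(
    (pct[i][j], lengths[i], lengths[j])
    for i in range(len(lengths))
    for j in range(len(lengths))
)
print(f"R1={best_r1}, R2={best_r2}: {best_pct}%")
```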

3. Error plots from learning the error rates. After DADA2 builds the error model for the data set, it is always worthwhile, as a sanity check if nothing else, to visualize the estimated error rates. The error rates for each possible transition (A→C, A→G, …) are shown below. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score. Ideally, the estimated error rates (black line) fit the observed rates (points) well, and the error rates drop with increasing quality, as expected.
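
For reference, the "nominal definition of the Q-score" is the standard Phred relation, in which a quality score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below shows the conversion in both directions; the function names are illustrative and are not part of DADA2.

```python
import math


def nominal_error_rate(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def phred_score(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, nominal_error_rate(q))
```

A base with Q = 30, for example, has a nominal error probability of about 1 in 1,000.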

Forward Read R1 Error Plot


Reverse Read R2 Error Plot

PDF versions of these plots are available here:

 

4. DADA2 Result Summary. The table below summarizes the DADA2 analysis, tracking the paired read counts of each sample through all steps of the DADA2 denoising process, including end-trimming (filtered), denoising (denoisedF, denoisedR), pair merging (merged) and chimera removal (nonchim).

Sample ID	F27353.S01	F27353.S02	F27353.S03	F27353.S04	F27353.S05	F27353.S06	F27353.S07	F27353.S08	F27353.S09	F27353.S10	F27353.S11	F27353.S12	F27353.S13	F27353.S14	F27353.S15	F27353.S16	F27353.S17	F27353.S18	F27353.S19	F27353.S20	F27353.S21	F27353.S22	F27353.S23	F27353.S24	F27353.S25	F27353.S26	F27353.S27	F27353.S28	F27353.S29	F27353.S30	F27353.S31	F27353.S32	F27353.S33	F27353.S34	F27353.S35	F27353.S36	F27353.S37	F27353.S38	F27353.S39	F27353.S40	Row Sum	Percentage
input	56,811	53,723	68,013	54,158	60,452	55,281	47,900	66,118	55,509	53,228	76,358	54,420	49,534	59,116	52,359	56,506	49,026	62,488	26,195	49,431	46,034	48,883	72,062	61,425	56,334	69,516	62,102	48,616	62,503	52,761	49,545	61,467	66,907	60,700	63,187	73,167	55,354	54,027	53,921	48,399	2,273,536	100.00%
filtered	56,811	53,722	68,012	54,157	60,452	55,281	47,898	66,118	55,508	53,227	76,357	54,418	49,532	59,115	52,357	56,506	49,025	62,488	26,195	49,431	46,032	48,883	72,062	61,423	56,334	69,516	62,099	48,615	62,502	52,760	49,545	61,467	66,907	60,700	63,187	73,164	55,354	54,026	53,920	48,398	2,273,504	100.00%
denoisedF	56,132	53,100	67,108	53,359	59,777	54,493	47,224	65,111	54,892	52,354	75,100	53,640	49,007	58,176	51,885	55,678	48,238	61,723	25,764	48,768	45,483	48,242	71,272	60,562	55,141	68,798	61,487	47,859	61,585	52,256	48,952	60,945	66,209	59,937	62,578	72,488	54,843	53,262	53,226	47,836	2,244,490	98.72%
denoisedR	55,840	52,885	66,797	52,744	59,589	54,517	46,818	64,867	54,570	52,222	75,068	53,398	48,814	57,941	51,585	55,511	48,280	61,502	25,657	48,663	45,288	47,824	70,910	60,426	55,183	68,524	61,056	47,898	61,288	52,096	48,719	60,752	66,248	59,582	62,288	72,309	54,707	53,317	52,968	47,632	2,236,283	98.36%
merged	52,481	49,095	61,630	47,827	55,572	49,994	43,051	59,498	50,837	47,973	68,838	48,725	45,198	53,526	48,357	51,171	44,569	56,552	23,409	44,593	41,730	43,972	65,091	55,750	50,662	63,967	57,155	44,366	56,529	49,256	45,517	57,603	62,473	55,247	58,152	67,927	51,647	49,903	48,720	44,298	2,072,861	91.17%
nonchim	46,461	46,043	55,271	42,129	50,230	42,890	38,389	53,536	44,513	44,570	63,405	44,918	39,889	49,850	40,169	47,814	40,758	51,328	19,859	38,626	37,682	37,943	59,230	51,638	46,499	58,446	49,230	39,555	51,798	43,763	42,776	50,918	53,204	51,107	53,400	58,666	45,909	45,551	44,354	41,791	1,864,108	81.99%
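
The Percentage column is each step's row sum divided by the input row sum. The following Python sketch recomputes it from the row sums reported above; the dictionary layout is simply a convenient way to hold those numbers.

```python
# Row sums copied from the DADA2 result summary table.
row_sums = {
    "input": 2_273_536,
    "filtered": 2_273_504,
    "denoisedF": 2_244_490,
    "denoisedR": 2_236_283,
    "merged": 2_072_861,
    "nonchim": 1_864_108,
}

# Percentage of input read pairs remaining after each step.
retained = {
    step: round(100 * count / row_sums["input"], 2)
    for step, count in row_sums.items()
}
for step, pct in retained.items():
    print(f"{step}: {pct:.2f}%")
```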

This table can be downloaded as an Excel table below:

 

5. DADA2 Amplicon Sequence Variants (ASVs). A total of 7,345 unique merged and chimera-free ASV sequences were identified. Their read counts for each sample are available in the "ASV Read Count Table", with rows for the ASV sequences and columns for the samples. This read count table can be used to compare microbial profiles among samples, and the sequences provided in the table can be used for taxonomy assignment.

 

The table can be downloaded from this link:

 
 

Sample Meta Information

Download Sample Meta Information
#SampleID	SampleName	ID_Sample	Group
F27353.S01	F27353.S01	1	Cases
F27353.S02	F27353.S02	2	Cases
F27353.S03	F27353.S03	3	Cases
F27353.S04	F27353.S04	4	Cases
F27353.S05	F27353.S05	5	Cases
F27353.S06	F27353.S06	6	Cases
F27353.S07	F27353.S07	7	Cases
F27353.S08	F27353.S08	8	Cases
F27353.S09	F27353.S09	9	Cases
F27353.S10	F27353.S10	10	Cases
F27353.S11	F27353.S11	11	Cases
F27353.S12	F27353.S12	12	Cases
F27353.S13	F27353.S13	13	Cases
F27353.S14	F27353.S14	14	Cases
F27353.S15	F27353.S15	15	Cases
F27353.S16	F27353.S16	16	Cases
F27353.S17	F27353.S17	17	Cases
F27353.S18	F27353.S18	18	Cases
F27353.S19	F27353.S19	19	Cases
F27353.S20	F27353.S20	20	Cases
F27353.S21	F27353.S21	51	Controls
F27353.S22	F27353.S22	52	Controls
F27353.S23	F27353.S23	53	Controls
F27353.S24	F27353.S24	54	Controls
F27353.S25	F27353.S25	55	Controls
F27353.S26	F27353.S26	56	Controls
F27353.S27	F27353.S27	57	Controls
F27353.S28	F27353.S28	58	Controls
F27353.S29	F27353.S29	59	Controls
F27353.S30	F27353.S30	60	Controls
F27353.S31	F27353.S31	61	Controls
F27353.S32	F27353.S32	62	Controls
F27353.S33	F27353.S33	63	Controls
F27353.S34	F27353.S34	64	Controls
F27353.S35	F27353.S35	65	Controls
F27353.S36	F27353.S36	66	Controls
F27353.S37	F27353.S37	67	Controls
F27353.S38	F27353.S38	68	Controls
F27353.S39	F27353.S39	69	Controls
F27353.S40	F27353.S40	70	Controls
 
 

ASV Read Counts by Samples

#Sample ID	Read Count
F27353.S19	19,859
F27353.S21	37,682
F27353.S22	37,943
F27353.S07	38,389
F27353.S20	38,626
F27353.S28	39,555
F27353.S13	39,889
F27353.S15	40,169
F27353.S17	40,758
F27353.S40	41,791
F27353.S04	42,129
F27353.S31	42,776
F27353.S06	42,890
F27353.S30	43,763
F27353.S39	44,354
F27353.S09	44,513
F27353.S10	44,570
F27353.S12	44,918
F27353.S38	45,551
F27353.S37	45,909
F27353.S02	46,043
F27353.S01	46,461
F27353.S25	46,499
F27353.S16	47,814
F27353.S27	49,230
F27353.S14	49,850
F27353.S05	50,230
F27353.S32	50,918
F27353.S34	51,107
F27353.S18	51,328
F27353.S24	51,638
F27353.S29	51,798
F27353.S33	53,204
F27353.S35	53,400
F27353.S08	53,536
F27353.S03	55,271
F27353.S26	58,446
F27353.S36	58,666
F27353.S23	59,230
F27353.S11	63,405
 
 
 

VII. Analysis - Read Taxonomy Assignment

Read Taxonomy Assignment - Methods

 

The closed-reference taxonomy assignment of the ASV sequences using BLASTN is based on the algorithm published by Al-Hebshi et al. (2015) [2].

The species-level, open-reference 16S rRNA NGS reads taxonomy assignment pipeline

Version 20210310a
 
 

1. Raw sequence reads in FASTA format were BLASTN-searched against a combined set of 16S rRNA reference sequences, the FOMC 16S rRNA Reference Sequences version 20221029 (https://microbiome.forsyth.org/ftp/refseq/). This set consists of the HOMD (version 15.22 http://www.homd.org/index.php?name=seqDownload&file&type=R ), Mouse Oral Microbiome Database (MOMD version 5.1 https://momd.org/ftp/16S_rRNA_refseq/MOMD_16S_rRNA_RefSeq/V5.1/), and the NCBI 16S rRNA reference sequence set (https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz). These sequences were screened and combined to remove short sequences (<1000 nt), chimeras, duplicated sequences and sub-sequences, as well as sequences with poor taxonomy annotation (e.g., without species information). This process resulted in 1,015 full-length 16S rRNA sequences from HOMD V15.22, 356 from MOMD V5.1, and 22,126 from NCBI, a total of 23,497 sequences. Altogether these sequences represent a total of 17,035 oral and non-oral microbial species.

The NCBI BLASTN version 2.7.1+ (Zhang et al., 2000) [3] was used with the default parameters. Reads with ≥ 98% sequence identity to the matched reference and ≥ 90% alignment length (i.e., ≥ 90% of the read length was aligned to the reference and used to calculate the percent sequence identity) were classified based on the taxonomy of the reference sequence with the highest sequence identity. If a read matched reference sequences representing more than one species with equal percent identity and alignment length, it was subjected to chimera checking with the USEARCH program version v8.1.1861 (Edgar 2010) [4]. Non-chimeric reads with multi-species best hits were considered valid and were assigned a unique species notation (e.g., spp) denoting unresolvable multiple species.
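
The thresholding logic described above can be sketched in Python as follows. This is only an illustration of the ≥ 98% identity and ≥ 90% alignment length rule and of how tied best hits lead to a multi-species call. The `Hit` record, the function name and the output labels are hypothetical, and the chimera check applied to multi-species hits is omitted.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTN hit for a read (hypothetical minimal record)."""
    species: str
    pct_identity: float
    align_len: int


def classify(read_len, hits, min_identity=98.0, min_coverage=90.0):
    """Assign a read to the best-matching species, or report a tie."""
    passing = [
        h for h in hits
        if h.pct_identity >= min_identity
        and 100.0 * h.align_len / read_len >= min_coverage
    ]
    if not passing:
        return "unassigned"
    best = max(passing, key=lambda h: (h.pct_identity, h.align_len))
    tied = sorted({
        h.species for h in passing
        if (h.pct_identity, h.align_len) == (best.pct_identity, best.align_len)
    })
    if len(tied) == 1:
        return tied[0]
    return "multispecies: " + ", ".join(tied)


# Made-up hits for a 450-base read.
print(classify(450, [Hit("Prevotella melaninogenica", 99.5, 448)]))
print(classify(450, [Hit("Roseburia hominis", 96.0, 450)]))
```

Reads reported as unassigned by such a rule correspond to the pool that goes on to de novo OTU calling in the next step.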

2. Unassigned reads (i.e., reads with < 98% identity or < 90% alignment length) were pooled together and reads < 200 bases were removed. The remaining reads were subjected to de novo operational taxonomic unit (OTU) calling and chimera checking using the USEARCH program version v8.1.1861 (Edgar 2010) [4]. The de novo OTU calling and chimera checking were done using 98% as the sequence identity cutoff, i.e., the species-level OTU. This step produced species-level de novo clustered OTUs with 98% identity. Representative reads from each of the OTUs/species were then BLASTN-searched against the same reference sequence set again to determine the closest species for these potential novel species. These potential novel species were pooled together with the reads that were assigned at the species level in the previous step for downstream analyses.

References

  2. Al-Hebshi NN, Nasher AT, Idris AM, Chen T. Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples. J Oral Microbiol. 2015 Sep 29;7:28934. doi: 10.3402/jom.v7.28934. PMID: 26426306; PMCID: PMC4590409.
  3. Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000 Feb-Apr;7(1-2):203-14. doi: 10.1089/10665270050081478. PMID: 10890397.
  4. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PubMed PMID: 20709691.

3. Designations used in the taxonomy:

    	1) Taxonomy levels are indicated by these prefixes:
    	
    	   k__: domain/kingdom
    	   p__: phylum
    	   c__: class
    	   o__: order
    	   f__: family
    	   g__: genus  
    	   s__: species
    	
    	   Example: 
    	
    	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__faecis
    		
    	2) Unique level identified – known species:
    	   
    	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis
    	
    	   The above example shows some reads match to a single species (all levels are unique)
    	
    	3) Non-unique level identified – known species:
    
    	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3
    	   
    	   The above example “s__multispecies_spp123_3” indicates that certain reads match equally to 3 species of the 
    	   genus Roseburia; the “spp123” is a temporarily assigned species ID.
    	
    	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__multigenus;s__multispecies_spp234_5
    	   
    	   The above example indicates that certain reads match equally to 5 different species, which belong to multiple genera; 
    	   the “spp234” is a temporarily assigned species ID.
    	
    	4) Unique level identified – unknown species, potential novel species:
    	   
    	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__ hominis_nov_97%
    	   
    	   The above example indicates that some reads have no match to any of the reference sequences with 
    	   sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. However, this group 
    	   of reads (actually the representative read from a de novo OTU) has 97% identity to 
    	   Roseburia hominis; thus this is a potential novel species, closest to Roseburia hominis 
    	   (but not the same species).
    	
    	5) Multiple level identified – unknown species, potential novel species:
    	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__ multispecies_sppn123_3_nov_96%
    	
    	   The above example indicates that some reads have no match to any of the reference sequences 
    	   with sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. 
    	   However, this group of reads (actually the representative read from a de novo OTU) 
    	   has 96% identity equally to 3 species in Roseburia. Thus there is no single 
    	   closest species; instead, this group of reads matches equally to multiple species at 96%. 
    	   Since they have passed the chimera check, they represent a potential novel species. “sppn123” is a 
    	   temporary ID for this potential novel species. 
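
The designations above follow a regular pattern, so a taxonomy string can be split into ranks with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the checks for multi-species and novel-species labels simply look for the “multispecies_” prefix and the “_nov_” marker shown in the examples.

```python
# Rank prefixes as listed in the designations above.
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxonomy(lineage):
    """Split a 'k__...;p__...;...;s__...' string into a rank -> name dict."""
    result = {}
    for field in lineage.split(";"):
        prefix, _, name = field.partition("__")
        result[RANKS[prefix.strip()]] = name.strip()
    return result


def describe_species(lineage):
    """Return (species label, is multi-species, is potential novel species)."""
    species = parse_taxonomy(lineage)["species"]
    return species, species.startswith("multispecies_"), "_nov_" in species


example = ("k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;"
           "f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3")
print(parse_taxonomy(example)["genus"])
print(describe_species(example))
```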
    

 
4. The taxonomy assignment algorithm is illustrated in the flow chart below:
 
 
 
 

Read Taxonomy Assignment - Result Summary *

Code	Category	MPC=0% (>=1 read)	MPC=0.01% (>=186 reads)
A	Total reads	1,864,108	1,864,108
B	Total assigned reads	1,860,978	1,860,978
C	Assigned reads in species with read count < MPC	0	15,221
D	Assigned reads in samples with read count < 500	0	0
E	Total samples	40	40
F	Samples with reads >= 500	40	40
G	Samples with reads < 500	0	0
H	Total assigned reads used for analysis (B-C-D)	1,860,978	1,845,757
I	Reads assigned to single species	1,705,828	1,696,575
J	Reads assigned to multiple species	103,353	102,360
K	Reads assigned to novel species	51,797	46,822
L	Total number of species	748	282
M	Number of single species	439	231
N	Number of multi-species	39	13
O	Number of novel species	270	38
P	Total unassigned reads	3,130	3,130
Q	Chimeric reads	597	597
R	Reads without BLASTN hits	318	318
S	Others: short, low quality, singletons, etc.	2,215	2,215
A=B+P=C+D+H+Q+R+S
E=F+G
B=C+D+H
H=I+J+K
L=M+N+O
P=Q+R+S
* MPC = Minimal percent (of all assigned reads) read count per species, species with read count < MPC were removed.
* Samples with reads < 500 were removed from downstream analyses.
* The assignment result from MPC=0.01% was used in the downstream analyses.
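
The identities listed under the summary table can be checked directly against its two columns. The Python sketch below does this with the reported values. The variable and function names are illustrative, and rounding the 0.01% cutoff to a whole read count is an assumption made here that happens to reproduce the reported threshold of 186 reads.

```python
# Values copied from the result summary table (codes A-S).
mpc_0 = dict(A=1864108, B=1860978, C=0, D=0, E=40, F=40, G=0, H=1860978,
             I=1705828, J=103353, K=51797, L=748, M=439, N=39, O=270,
             P=3130, Q=597, R=318, S=2215)
mpc_001 = dict(A=1864108, B=1860978, C=15221, D=0, E=40, F=40, G=0,
               H=1845757, I=1696575, J=102360, K=46822, L=282, M=231,
               N=13, O=38, P=3130, Q=597, R=318, S=2215)


def is_consistent(t):
    """Check the identities listed under the summary table."""
    return (
        t["A"] == t["B"] + t["P"]
        and t["A"] == t["C"] + t["D"] + t["H"] + t["Q"] + t["R"] + t["S"]
        and t["E"] == t["F"] + t["G"]
        and t["B"] == t["C"] + t["D"] + t["H"]
        and t["H"] == t["I"] + t["J"] + t["K"]
        and t["L"] == t["M"] + t["N"] + t["O"]
        and t["P"] == t["Q"] + t["R"] + t["S"]
    )


# MPC of 0.01% of all assigned reads, expressed as a read count.
min_reads = round(mpc_001["B"] * 0.01 / 100)
print(is_consistent(mpc_0), is_consistent(mpc_001), min_reads)
```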
 
 
 

Read Taxonomy Assignment - ASV Species-Level Read Counts Table

This table shows the read counts for each sample (columns) and each species identified based on the ASV sequences. The downstream analyses were based on this table.
SPIDTaxonomyF27353.S01F27353.S02F27353.S03F27353.S04F27353.S05F27353.S06F27353.S07F27353.S08F27353.S09F27353.S10F27353.S11F27353.S12F27353.S13F27353.S14F27353.S15F27353.S16F27353.S17F27353.S18F27353.S19F27353.S20F27353.S21F27353.S22F27353.S23F27353.S24F27353.S25F27353.S26F27353.S27F27353.S28F27353.S29F27353.S30F27353.S31F27353.S32F27353.S33F27353.S34F27353.S35F27353.S36F27353.S37F27353.S38F27353.S39F27353.S40
SP1Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;melaninogenica214698814649586119965278731178789416985419058673690327413049165443234204171979174174176929248484132736682502218227471087232712682369119916061641012670276
SP10Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;hongkongensis48375395002512011171921371721027330848107142316777281454508595014427452014
SP101Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT475611374090152000700169516000000615250006816000413026012027
SP102Bacteria;Firmicutes;Erysipelotrichia;Erysipelotrichales;Erysipelotrichaceae;Solobacterium;moorei11949317346012677164294283383644544221107352358931119310498142626755543489105163105147170234098059411912413997
SP103Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT4722201475001400002001870180450000051494418019003604703309033
SP104Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;nanceiensis15313428414602591976492465601082457287506219428953416037903361798209300148681382942371912637520170357535
SP105Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;sp. HMT278921090171332111363401020211033351000026254751905216190000000004700
SP109Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT30542098257100290200330016106710844987813721061680000017031801220
SP110Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;haemolyticus13637392136201828567516414846964270900022230171131225372816704071016062212604442442
SP111Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT308104768841154560121171352630579160854901155971917951154037178101361190261445791728427916446116891811150167
SP112Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;oralis_subsp._tigurinus_clade_07002304044165000880000001900001103453640300002090261044031
SP115Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-3];bacterium HMT1001024181900007016601010905090022193500337047082962011415
SP116Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoanaerobaculum;gingivalis6293017904013075266152128107300535211930420801351861864221036616212413651184354170392422
SP12Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT4731362317543252457117568416732147348527235395882824053657350161145247338127952241104412267791565110260060019771803220
SP120Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT0662938065152086010613199209850641586300014961138039513325034321013336131111711439
SP121Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;paraphrohaemolyticus3190000680000000000000334001391948121000810581240960469
SP122Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;subflava6429672520139422034012712772001220471244600027602631965254661542906954000115048921300
SP124Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;loescheii08301309013382001000530000021001700071200000000022
SP126Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;lingnae375914039148341066197175017763231212061882711302575615419269151235360108331398565746632875861102
SP127Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;mucosa0105646327099184823842403580311502861541223320008405331774572046889890906200211493191057236048
SP128Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;shahii91142013900360146679329710624143871570017008857235760004453400011600174627
SP13Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;tobetsuensis02400037626791229331780023305380000000031303300043400118800000
SP130Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT05642146206004005001152914320011741711057628100001225039564010
SP131Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT94211293917466050517360110184070000011176804169060000412871120171310
SP132Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-6];bacterium HMT87025614313452004961457021186021417192581663533131112107617500429776119111110142112912702598149
SP133Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;endodontalis520140441580618003405203705042400090002191302203
SP135Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Tannerellaceae;Tannerella;sp. HMT2861232151104260000156080004000184541712405300419000009
SP136Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;sp. HMT1690076059010001001500015450900472590945038000254361382711031
SP137Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;vespertina191396577021130292499140000253840131246120091871174342277115438106021999041334010250288
SP138Bacteria;Firmicutes;Bacilli;Bacillales;Gemellaceae;Gemella;morbillorum644910930716591139391891108252540216158992001192125419223988884381211901014065376523898137
SP139Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;veroralis35026203462110017451000562120000283522016100590150541142200028
SP14Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Kingella;denitrificans8116336340020331304230730210470026702859905318125381230156113521166931018
SP140Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT064304002411964238142240137845201730185121460000246225035301231969201401611257504719153
SP142Bacteria;Spirochaetes;Spirochaetia;Spirochaetales;Treponemataceae;Treponema;sp. HMT23132811214286000600110500000063614000800026090000
SP143Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Mogibacterium;diversum126346105939615403033245113365432240216845211727729299264454738345121916113814
SP144Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;denticola13830029626040986000040000252611171400550000320903017
SP145Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;sp. HMT20300128000000251810011024011000003495301118000015000000
SP148Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;sp. HMT2044827100240967354800149280000001537185315904011002346274329282532
SP149Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;odontolytica8631342083781452231322063621924353145472337124457731565154777117159297377470243258802221393460852141322915651663525049
SP15Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;matruchotii010147556075224022000914991710987335227001816012003118475216521023
SP150Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sp. HMT9012944011603000000016204800000205519000020700001101004015
SP151Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Shuttleworthia;satelles000405103000000000605670006000018000043140000
SP153Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;rogosae599182168955035429251819281147228025124799718135425779601090287105542951317634405110718281126070126648460235754154
SP154Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT31466573718361302201575997000242394912150601261165200959701500983441900130
SP155Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;micans18831600201730110041000405034181700079000013430006
SP156Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Peptostreptococcaceae_[G-9];[Eubacterium]_brachy1823503060021613000907000160053021515020140400029
SP157Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudoleptotrichia;goodfellowii10960605807210000007001504052500056000652114173
SP158Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Stomatobaculum;sp. HMT09715208065005440192355120763582908102130116002702673304706415301067830005370
SP159Bacteria;Spirochaetes;Spirochaetia;Spirochaetales;Treponemataceae;Treponema;sp. HMT2571808400111700180110026000000000050001400000070003
SP16Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;leadbetteri465513817413111952313621151700906013181401116013937201195783130287140783640298404715671827
SP160Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT317162011500823470961009704600002816817701410100020760683440138
SP161Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptococcaceae;Peptococcus;sp. HMT1681216090046350342100240409500001010444500002600000051014
SP162Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;elongata73351671530196215001400300405230017211227709534001121200071181255293517
SP163Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sanguinis21914714626413177241127343313928411332138749874206321152351505905884441722391312616880381757979458128109
SP165Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Butyrivibrio;sp. HMT455722351213401926721595219061282730284796718165421093308000031035004253
SP166Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;parahaemolyticus12590820403448280002903100000001300122140203028113301398104126101145281673
SP167Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptococcaceae;Peptococcus;sp. HMT1671203016110000138000000000038681240320000162100000
SP169Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;cinerea396151011322082204959156978425046035600126090157421441390309133021193251140069135516
SP17Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Granulicatella;adiacens28333974833247466483438470029673336610183078911005463273632062286115169191239398110296778353153194232813306056821414586448439
SP173Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT3921579201882101747004842050705692400829271111891817822761704841412739532
SP175Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;sicca36200235938706142516506622520031201800000199002132411057640630003601910396540399
SP176Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;intermedius1761247415103411334920411835231196160105374113476019302291616016132749
SP179Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sp. HMT878000270700000001740000000000107000000000000000
SP180Bacteria;Firmicutes;Bacilli;Bacillales;Gemellaceae;Gemella;haemolysans17791121187175143736273470813241066412769343924110964891051408378452151418212361008841999100210421737611863126516051059296382663671
SP181Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;gingivalis4002850334601004732000001205829181214052175422612173610102029
SP182Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoanaerobaculum;saburreum0028020013860000005561520816900000000025130001000012
SP183Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;jejuni3904574876011113001402373262515655155014663627922410121611319802548717266100501157030755526565052407029824
SP184Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Atopobiaceae;Lancefieldella;parvula8318221118826914070741601691074213051220823983598341914193822292401586914711715137010915411160272703483267162
SP185Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidales_[F-2];Bacteroidales_[G-2];bacterium HMT274071215010300351004800080370010620321900003662001022
SP186Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Simonsiella;muelleri03642600004746126221501100000001000400000000000000610
SP187Bacteria;Firmicutes;Clostridia;Eubacteriales;Ruminococcaceae;Ruminococcaceae_[G-1];bacterium HMT07527911586579621521990371391133764382431126100161313412951282024129235477180422471513012319235
SP188Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;sp. HMT8984023064000570800700803000200063301496004911210092614
SP19Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;sp. HMT1720619941613102609104052186001242371143750435582849154119287020579433815908201401282282385011116
SP192Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;segnis197360370251241036369430031309022000582301542123882670003911528250128066
SP194Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT309272726003526132280124575593121283415163236156616561966559019220051467760
SP195Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT352122102471021211039244751017919423828910236110738435512217663475939215451960992980091824110106345043211156
SP196Bacteria;Absconditabacteria_(SR1);Absconditabacteria_(SR1)_[C-1];Absconditabacteria_(SR1)_[O-1];Absconditabacteria_(SR1)_[F-1];Absconditabacteria_(SR1)_[G-1];bacterium HMT87411174726210224500323702255830120100381411340050801450779610702710
SP198Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;salivarius94109133252208661268556728172744644621817031349764427235866821481368100671411933185213733229849353671286457541314
SP199Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Weeksellaceae;Weeksellaceae_[G-1];sp. HMT931250364320516012000034005000040021121093700045212500475017
SP2Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Oribacterium;sinus249273398256692589491346042504196030194263119109559646434738386415051245311969153972382410110035141011577156
SP20Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;atypica333115720823435416330649053592260263612472656076560421567421369514444915243494191565997174234825707641435339121937530785628402400997
SP201Bacteria;Spirochaetes;Spirochaetia;Spirochaetales;Treponemataceae;Treponema;sp. HMT2375000060004122503200000004004300000000260000033
SP202Bacteria;Firmicutes;Tissierellia;Tissierellales;Peptoniphilaceae;Parvimonas;micra560015700140012748220044018080075829911200000101071781243
SP204Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;periodonticum330430109698322843774000008539042700400291570411060326900130005803341200
SP206Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;cristatus_clade_5787314955156247480242849345893241698310814054657755632136401761645349651475559981441412294318744784661153111711161398
SP207Bacteria;Absconditabacteria_(SR1);Absconditabacteria_(SR1)_[C-1];Absconditabacteria_(SR1)_[O-1];Absconditabacteria_(SR1)_[F-1];Absconditabacteria_(SR1)_[G-1];bacterium HMT87511066336172930652166039010490013700894902804945401501473140552534200254240001610
SP208Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;perflava18647100315821907024543309741128138351559543143400801552801322127213153413674520120695902821263251143102900448167
SP209Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudoleptotrichia;sp. HMT22180651233768032016202104359123922112013031140460921228282478840032270140002213531401148
SP21Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Granulicatella;elegans103224479241210078932630531375421431269256120139283065512011083847472795114866770017711028141411010703348417112955741819162903733
SP210Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;trevisanii256171231207513547035015000001332303201202616050211144245950
SP212Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sp. HMT8648008000000019000000000060043210000000012011007
SP213Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;pseudopneumoniae14808420214059145099110202002400000000008000000030100009300
SP22Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;sputigena01632351315127014251604000152122146431812342730003401201300
SP220Bacteria;Proteobacteria;Gammaproteobacteria;Cardiobacteriales;Cardiobacteriaceae;Cardiobacterium;hominis32427471731274000044012410000150930179623340016261214311469
SP221Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT21266691457210249113185411035211750566003675371165301214797181413643204096128937
SP223Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;oris3281322141222061015941117223228000501917831130369410214251521612367251567432061825168
SP224Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;tannerae4442148231315241226111115661587210164804410280047674215300311100002460340000
SP226Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;endodontalis3120015732010228230256155640600000120000460001190000300400000
SP227Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;concisus23035736053658102884585628121963562336542451951705697824211096184613857284771306243326521825926259203
SP229Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Atopobiaceae;Lancefieldella;rimae00200002000152500001000000160090002040004103100023
SP23Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;nucleatum2051372173911136211512047135452112361690671811712012124144201912971036942568221216172923421215412781119
SP231Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT34900060141163052500800000120150000051400005601900022
SP233Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;sp. HMT27526011501501098200078056000000002700045000800000320
SP234Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;histicola24227422159684024893336580224467310773329604901698381544113977034551658109890316848395017121220210141619163415519482469842431846
SP235Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;shahii1630151725228054000520290010551060020012839839000235000893211483244958
SP236Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;aphrophilus1541041094004125170000282000000004271307038100606091079180190760
SP237Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;hofstadii0061202400000900000000001628390290025000362005115000
SP238Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT498007580278186000265500000250004011330016128720000000017300
SP242Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT9133125930000371301147068000000060826000392000271170994
SP243Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;scopos0219026031196260024512540030904728000019549742860700510034200000
SP247Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Peptostreptococcaceae_[G-1];[Eubacterium]_sulci6041116083171012890206120192203070104184205518912413170286324411662711496039300781303743511
SP248Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;sp. HMT0440000000609015600501120400500047000900000000040
SP25Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Megasphaera;micronuciformis1537451067414316110512122323424963301506313916684291120511817527401217111328446701115871931357580131
SP252Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;flavescens41502185860106186819410543131709758210076302577292337900057306813157387242402619971750072107000089
SP253Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;infelix04431807521001750200000000033000110160000450601200
SP258Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;sp. HMT149137010100878100154201915384468537178018210018160062402816611731548
SP259Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;naeslundii402514300406911060009007390613111304120171521310816010
SP26Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Catonella;morbi149122226267151382780500129223378224714484239845106130261941251281119028508691140859711258429996
SP261Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;nucleatum_subsp._animalis1151676036000001150000250460000033000028000085025900023
SP262Bacteria;Actinobacteria;Actinomycetia;Propionibacteriales;Propionibacteriaceae;Arachnia;propionica1526038291348761913070017174031121301000782521042361014192122537
SP263Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT3481708917900173900000260150000047017719901803317007171800302438
SP267Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;pleuritidis02500001001104750100000000760500000000510150000
SP268Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;sp. HMT171401090500009003009000391501271501700012139732000
SP273Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;nucleatum_subsp._vincentii138827502755865032108131001550260000028024580000109000332655130081
SP275Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;flava303026403637188313321100165004810821320160143019206546063314737919005281152465480491205177
SP277Bacteria;Gracilibacteria_(GN02);Gracilibacteria_(GN02)_[C-1];Gracilibacteria_(GN02)_[O-1];Gracilibacteria_(GN02)_[F-1];Gracilibacteria_(GN02)_[G-1];bacterium HMT871660293700000000000030000004271400000070000230
SP279Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sp. HMT3325601531600042335082901350000070100552020150200007371580959
SP28Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Stomatobaculum;longum28010301132301802503143304602192012321386886413502760116210220000334622811
SP282Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Peptostreptococcus;stomatis362096436130184110091194288113181471431755788471370102100455524171531920190378704233
SP283Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;hwasookii03319118001771171483005970003000000391306315016700340927000028131
SP285Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;sp. HMT51210024006305002910416500000091150589600150000000001300
SP287Bacteria;Absconditabacteria_(SR1);Absconditabacteria_(SR1)_[C-1];Absconditabacteria_(SR1)_[O-1];Absconditabacteria_(SR1)_[F-1];Absconditabacteria_(SR1)_[G-1];bacterium HMT3451190181900085005000320110000000451500037000000000290
SP288Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;showae1011191810091101503100125018013000120131545826027081743659212740633
SP29Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;downii435666011922020992071944185414911941955990011522119758097667101731979337113392952260059123577808877405238611425592140322001256269321523516
SP293Bacteria;Spirochaetes;Spirochaetia;Spirochaetales;Treponemataceae;Treponema;denticola16021146063400172016000000000000000500090130000
SP296Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;sp. HMT917002070240000000000025500002400000400000002340000
SP3Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;pasteri6422079270489451728421168619782281247167611271586438379650510521204518321411397184482414684645216599564832651454114316093608262027154
SP30Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;buccalis1186057121022039210313204025320000013290342150237860146662301655145710772597
SP302Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;sp. HMT94911019290004001140570007000470140040024000207080000
SP305Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;sputorum502806307448271023023915324770000018470032014715955001600312471687673062
SP31Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;parvula2151627196394647323395844483415757021290020397749234121671017124661054222215212351239185451172214666427120212352024695
SP311Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Ottowia;sp. HMT894270071006000300800050136602160005700027360400
SP312Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;intermedia0007701103402739008006000000001391895000000000000
SP317Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT34651603709144901354404060603031103237414430000194410020017
SP318Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-3];bacterium HMT35133897341616015984731288616053321516849101343316103156494577221267317916698172479153
SP32Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;dispar17980874551914246135455357311929715115330135193628556434989120779210232447696473044315208021762829101513837085442110750
SP320Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Weeksellaceae;Cloacibacterium;sp. HMT2061070462280064896353034418409072483828310620617880011061301056047004004320150737775130
SP33Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;oralis3469420412835180216210062593031518204081582368171429107011642963944512420746936723507735343145621201386266128287
SP332Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Rothia;mucilaginosa35587013474962298224241884961437107386397842831921061085686171216586394122606409319449063852197722493807198580011025871547139025783219819955
SP336Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-2];Saccharibacteria_(TM7)_[G-5];bacterium HMT3561040003224802049301140000000005190058000014030000
SP34Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;infantis_clade_431324776116325355739227330426212127210094491117235941331476284113147340108688658350532072336260246963213923175541969406177326
SP340Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT0741528732203578171817915251831900229661501319400201202123422401537523370157012455428694074339543108
SP343Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;gracilis257216406000343003400000170414840170000151219317018
SP344Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;sp. HMT248000067279330250001240009000210029051140001310397000021
SP347Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;aeruginosa000000000000000000000200000000000000000000
SP35Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudostreptobacillus;hongkongensis000153059960000016103000021141435000600000139140065
SP36Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;gordonii030113787737312552912002372090132402314491919845224908182832410321030614
SP365Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;influenzae000024000000960011000000086100000600000000000
SP37Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;oralis_subsp._dentisani_clade_0581401525698090916968277017327336079188340253651122662332217071921549312786249255441920017310236
SP370Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;sp. HMT0181350000003003000000100000000000404660301280002711
SP38Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;sp. HMT780105144195495527030989934951216719285145312602305148178913410965472164761120521823895139086054212
SP382Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;wadei00305302300000001060083952170049201790773323000091070140770261
SP383Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT463000140060900000800500000001140005200000000000
SP386Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Rothia;aeria288131134711281216318192024102627163956711461642548170202115166401815942564456510189917520120
SP388Bacteria;Firmicutes;Tissierellia;Tissierellales;Peptoniphilaceae;Parvimonas;sp. HMT1100000000000000000600069400000001190080600000
SP39Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;australis71632460019669136814765051168102455160506273146684397314482397227543054101014490
SP392Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sp. HMT8631601194110120050302001800000172619120004100000100041100
SP394Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;sp. HMT513037600408012901829031204300000000580000530094335548400086
SP4Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoanaerobaculum;orale4338116464747008730469111003236105213803011757510512616269916844011130720108221031178
SP40Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;sp. HMT1803033655223625250941507034023362711557612962902686641111418908111531202603914700017224323715
SP41Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Weeksellaceae;Weeksellaceae_[G-1];sp. HMT900085570001212603019050060000017012905055527001812071775
SP410Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Rothia;dentocariosa225273132267569132214248320466521039110290111872019007289972847113171527911195485435
SP411Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Kingella;oralis5105413567478215112500500012331105185622185016398559129132
SP42Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;granulosa4132172032412342800380000119190150540107510140780110918037222975013
SP420Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;aurantiaca96000000156006100534000000000018000000000000000
SP429Bacteria;Actinobacteria;Actinomycetia;Bifidobacteriales;Bifidobacteriaceae;Scardovia;wiggsiae40592360403311300020000062706040135000015002012
SP43Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;gingivalis6024926010053157290106826704303106500271263526059238874105146042211925
SP44Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;chosunense87073749411311814135332988128403140536746731432132556968920948045243844513716021259660602123565385215921383715855707668169284525262515
SP448Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;peroris0000783000000000000016970151000000000000006000
SP45Bacteria;Firmicutes;Bacilli;Bacillales;Gemellaceae;Gemella;sanguinis3012414421193802842234014313328146297330170273317474903843359050716217917013526841074354851497919683110
SP451Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;oralis37001320015307134740360700002222001500616150026240190137034
SP46Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoanaerobaculum;umeaense16000230000017004100000340010242581003400000000000
SP47Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;parasanguinis_clade_4110222596163100932135561997354194131054046993527973751344572605591535210117625704035019319717412373838748377
SP48Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;salivae20616663526217536060187839185265281379161992341007100737983096821857638490511258779013857123001343222829223125
SP49Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;sp. HMT9303479565610144142284701858993960524264132730331214401172036431920925730239185720263204623149200758112719
SP5Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;sp. HMT036130000312075201302400480060403050288565314702291130291449026
SP50Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT21558950103101569438938392330321292116712635656384454001697665045266402149906213457904435261945068129147
SP51Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;pallens170818146373029220635640888171414677371580357172013971721864180524444714836051565574461477702499242021074296146278686992
SP52Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;catoniae2135577324242366165830560577108434000011455241502210851276137371944385766
SP53Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT417477186199779883859422710740312332232521097110248532610702511793441269319106056729376700028546656711504351992
SP55Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;mitis401470142021094178248241288989414661752425735891017229212533091324758222461841314144122261636938973667109731110521816615568103642154617345011015470341392200
SP57Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Eikenella;corrodens28151428119014010304068034719000160155413003921150880191091390
SP59Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT3064011002340080347701180303174391131061471001991002933500000032106615850
SP60Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;nigrescens9778179614111222050752906000508613033730051100091064818216185144
SP61Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Weeksellaceae;Riemerella;sp. HMT3227947242632114383093671514524320296693408744112275076574254539081415689288264659
SP65Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;parainfluenzae131716632892929593901255104542048533876525321310581814502338218333392794611062093178423124911251803104754321362190437616411799170417751076
SP66Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;durum3217378240700132590115185000117810152408622218554361931155781222
SP67Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Dialister;invisus1017164610132515310760010060440001961481550451600027112000022
SP68Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;graevenitzii22653170283291301308110161705181059553544863116580571424395102973184255900601635827317167516141285
SP69Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;meyeri00000024000600015700000000012000000007000000
SP7Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;sp. HMT90811712284711082976487012501000060001001800090136081123970
SP70Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sputigena1033214118481449804223563222354111083935441962131197621110713170840013746932645010537
SP72Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;sp. HMT175000908601612040002824900021393314144120553100121136262035044
SP73Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;saccharolytica8610210080500005900000003038206019500400001270
SP74Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Peptostreptococcaceae_[G-7];[Eubacterium]_yurii_subsps._yurii_&_margaretiae000122001600000140508013000149323600600000003021
SP75Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT061531140500448427120321442132853960463533014036441435149572617222320303204188159213001511302
SP76Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT225619057441641280232524017440910000011117281631611320307660621152619002634
SP77Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;noxia0788020916002220180000003292244128200000240302500
SP78Bacteria;Firmicutes;Clostridia;Eubacteriales;Ruminococcaceae;Ruminococcaceae_[G-2];bacterium HMT0852741991991612390371103714511911601101101259614282614257173008560631415397354206122591603147114
SP79Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;cristatus440900110000597012108271350249006907831000140000000002900
SP8Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Oribacterium;asaccharolyticum00540480000000000011002251014401720000030000000000
SP81Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Lautropia;mirabilis54521439471210424121124747301131415256052181351566142210191163304638452098300161831768693212081933758255755
SP82Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;pneumoniae4000000000011320940000004800000243200000000000000
SP83Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT4234349263221348641016044010401600587530503150286481250100205082947418940683728832024612734019161487426553
SP84Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT347000201108000040012012061001301960000271100105028008120
SP85Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-2];bacterium HMT09661170409000310053000790362417001627470000001300230801003571
SP86Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;periodonticum1222255563614854713181496999310026852984675111130234891332160140880558142374817869332317822688111389160151611893218999352421786695
SP87Bacteria;Firmicutes;Bacilli;Lactobacillales;Aerococcaceae;Abiotrophia;defectiva397422363615802623868187114341691202281922532564567122227238440319991793155553124764284264199364622101
SP88Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;sp. HMT458871730756914480194755034156020700319604223521958263203022874904655422082
SP89Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT2182517982519600001104920000353011902451562210411800010100000000024362
SP9Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Oribacterium;parvum0000000022564704327280000186000000000025004219000013
SP90Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Filifactor;alocis214106231261150056200024021000470031300094000270000013
SP91Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT057560243632252309412451501125642016300771391210172317415000008000062
SP92Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;oralis_subsp._tigurinus_clade_071321927029210304871741409900366700025219134013148000036529264001286
SP93Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;anginosus1200045271519000000005123000100390344110320160390200
SP94Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT914149161119004374596280008601731322000066053196731313103532389164100124427
SP95Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT95700018000007000000000000013300063920000115000000
SP96Bacteria;Gracilibacteria_(GN02);Gracilibacteria_(GN02)_[C-1];Gracilibacteria_(GN02)_[O-1];Gracilibacteria_(GN02)_[F-1];Gracilibacteria_(GN02)_[G-1];bacterium HMT8720404565000010007705000001564200007080060620308
SP97Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;rava219005700010221003360002119115023600801170001401130044080002147
SP98Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;oulorum13145778041300262002207350520157320060500508903800033
SP99Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;sp. HMT1361760252041237532818500173005146138456141236936179770173640261740013137971253680
SPN100Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;matruchotii_nov_97.546%0000000000000000000002101602291000000000000000
SPN101Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudoleptotrichia;goodfellowii_nov_91.416%2330269017917302103510000012902395073000002127001680005300002284770
SPN102Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;gingivalis_nov_97.899%021080012000100000000300200615213180560610132300000
SPN103Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;jejuni_nov_97.755%023300017000001000000000000000000000000003100
SPN104Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;sp. HMT284 nov_97.746%04002605408091400009048000261790017000000900000
SPN106Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Streptobacillus;notomytis_nov_93.856%48317118246244403257622712324901010580310007166317311183277283380001975433300745108222
SPN107Bacteria;Actinobacteria;Actinomycetia;Corynebacteriales;Corynebacteriaceae;Corynebacterium;durum_nov_97.280%170019009240080002700140001060810128160007019018000
SPN108Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Granulicatella;elegans_nov_97.988%10100000000000063005600000000000000000000000
SPN109Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;catoniae_nov_97.746%000089020018110000300009050140007253290000000220
SPN110Bacteria;Firmicutes;Clostridia;Eubacteriales;Peptostreptococcaceae;Peptoanaerobacter;[Eubacterium] yurii_nov_90.795%3100000129401100014000000000012000000000000060
SPN111Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sanguinis_nov_97.782%445570000400193300000000000380000000000000000
SPN112Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;hwasookii_nov_97.783%166766447202006680320130054204721000003201891351870211000160237001574578
SPN146Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;sp. HMT180 nov_97.930%00000000000000001819001990101100000000000000000
SPN148Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;parasanguinis_clade_411_nov_97.773%0000476053400302100304700019411330100000010000314100
SPN15Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;histicola_nov_97.347%000000010000000001101339011101200003100000000000
SPN156Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT308 nov_97.347%00104000012302045421090000016250231009014480007201025500112
SPN169Bacteria;Firmicutes;Clostridia;Eubacteriales;Ruminococcaceae;Ruminococcaceae_[G-1];bacterium HMT075 nov_91.453%120613012001601700005300000000001510005000807006272190
SPN184Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT914 nov_97.551%42913790027411510100013402562610000034763740210541401341036270002635
SPN188Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;sp. HMT780 nov_97.628%100181000088120170904193800000001470005000024802600135550
SPN210Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;sputigena_nov_97.000%0501900068000110343500538140152230021000380000200130
SPN220Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudoleptotrichia;goodfellowii_nov_92.060%000041500000012600002030000016040600330000110000300
SPN231Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Moraxella;boevrei_nov_92.807%00178000000140220045015000000001400041004000000000
SPN240Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-2];bacterium HMT088 nov_93.096%34131274810647039688071000000001602637230530001764600103480
SPN25Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-6];bacterium HMT870 nov_96.994%8190018700101012700830400000000126805614024000000170
SPN250Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;gingivalis_nov_96.218%14607900906000023601805000004758234202159000330000149
SPN3Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Moraxella;oblonga_nov_93.776%00028008023500000573079000000045386200021500000000000684
SPN32Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT305 nov_93.865%65700820309138511770766248010119210600558916000126510057000140
SPN4Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT215 nov_97.397%821170000470000710201000100000006613000000059000000
SPN48Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;sp. HMT908 nov_96.509%00231700000000000000000000265000051000242344300020
SPN60Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;veroralis_nov_97.546%0430284072282001850001800000220112882301900000002000
SPN81Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;gingivalis_nov_90.546%0001130000000007500000020002303011738000800005115
SPN91Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT215 nov_96.963%000000000000010400027600000050000000000000000
SPP12Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;multispecies_spp12_23018992563209200038004800074392219516667000800370003517112800
SPP16Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp16_43900000000000000000000000000188000000000000
SPP17Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp17_3000000000011000000067120000015043000000000027100
SPP18Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;multispecies_spp18_3015497562775152227733646513214491582001951537804924081601133210794218949010725024462040891639210993667134148589077405
SPP20Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;multispecies_spp20_200000302722034104004000000051341600004260003121000040
SPP21Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp21_20000015000000000000000000000000000400762006990
SPP24Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;multispecies_spp24_20014433316005217062102106000000006000080033003050344180
SPP25Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;multispecies_spp25_2129173710931067652833163766206707687371405069317819797449252296764878552978804572015956917418133201715122611983860556109539
SPP28Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp28_37102787117700304430899180696510690270277471024713701261168447706623300297250
SPP37Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp37_213913204631082188127057491932925739672747712123560228147146658199272870548262811471172556842191968870163137481414644288732
SPP39Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Oribacterium;multispecies_spp39_223245572287081142601031461515713447136422571351115919621198386147254887416513421940166128251433137108
SPP4Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp4_20104205332876590141113635446050124469133202881704434643311138111514941652112842305525153078
SPP5Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnoanaerobaculum;multispecies_spp5_2021780016016094013130031011500000170504100001300000
SPPN10Bacteria;Firmicutes;Tissierellia;Tissierellales;Peptoniphilaceae;Parvimonas;multispecies_sppn10_2_nov_96.842%2213500000142231200016402090129011000025400002900490000612
SPPN2Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;multispecies_sppn2_2_nov_97.536%000000000342100296000000090132000900186000000014
SPPN3Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;multispecies_sppn3_3_nov_97.845%000003324240216600390000000034045000677000004300000
SPPN5Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;multispecies_sppn5_2_nov_97.536%1580167200000000000000000000179032000000000000
SPPN6Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_sppn6_2_nov_97.976%1917429042210011129000370022082357137031047311234174927500000353340140068
SPPN7Bacteria;Cyanobacteria;Gloeobacteria;Gloeobacterales;Gloeobacteraceae;Gloeobacter;multispecies_sppn7_2_nov_84.439%152300390041000000000000039200000300002502323731000
 
 
Download OTU Tables at Different Taxonomy Levels
Phylum: Count*, Relative**, CLR***
Class: Count*, Relative**, CLR***
Order: Count*, Relative**, CLR***
Family: Count*, Relative**, CLR***
Genus: Count*, Relative**, CLR***
Species: Count*, Relative**, CLR***
* Read count
** Relative abundance (count/total sample count)
*** Centered log ratio transformed abundance
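
As a rough illustration of the two transformed table types described in the footnotes above, the following minimal sketch computes relative abundance and CLR values for one sample. The function names, the example counts, and the pseudocount of 0.5 used to avoid taking the logarithm of zero are assumptions made for this sketch; the report does not state how zero counts are handled in the downloadable tables.

```python
import math


def relative_abundance(counts):
    """Divide each read count by the total read count of the sample."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log ratio: log of each count minus the mean log count.

    The pseudocount is an assumption of this sketch (log(0) is undefined).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for four species in one sample.
counts = [120, 30, 0, 50]
rel = relative_abundance(counts)
clr_values = clr(counts)
```

Relative abundances of a sample sum to 1, while CLR values of a sample sum to 0, which is a quick way to tell the two table types apart.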
 
Each species listed in the table has its full taxonomy and a dynamically assigned species ID specific to this report. When reads match the reference sequences of more than one species equally well (i.e., with the same percent identity and alignment coverage), they cannot be assigned to a single species. Instead, they are assigned to multiple species with the species notation "s__multispecies_spp2_2". In this notation, "spp2" is the dynamic ID assigned to the reads that hit multiple species, and the "_2" at the end means that spp2 contains two species.

You can look up which species are included in each multi-species assignment in the table below:
 
 
 
 
Another type of notation is "s__multispecies_sppn2_2", in which the "n" in sppn2 indicates a potential novel species, because all the reads in this species have < 98% identity to any of the reference sequences. These reads were grouped together by de novo OTU clustering at a 98% identity cutoff. A representative sequence was then chosen and searched with BLASTN against the reference database to find the closest match (which is still < 98% identical). This representative sequence also matched more than one species equally well, hence the "spp" in the label.
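
For readers who process the species table programmatically, the notation described above can be decoded with a small parser. This is a sketch only: the regular expression and the function name are hypothetical, and it assumes labels of the form seen in this report, with an optional "s__" prefix and an optional "_nov_<identity>%" suffix.

```python
import re

# Assumed label shape: [s__]multispecies_spp[n]<id>_<count>[_nov_<identity>%]
_MULTISPECIES = re.compile(
    r"^(?:s__)?multispecies_(spp(n?)\d+)_(\d+)(?:_nov_([\d.]+)%)?$"
)


def parse_multispecies(label):
    """Return the parts of a multispecies label, or None for other labels."""
    match = _MULTISPECIES.match(label)
    if match is None:
        return None
    group_id, novel_flag, n_species, identity = match.groups()
    return {
        "group_id": group_id,                      # e.g. "spp2" or "sppn10"
        "n_species": int(n_species),               # species sharing the hit
        "potential_novel": novel_flag == "n",      # "n" marks < 98% identity
        "closest_identity": float(identity) if identity else None,
    }


known = parse_multispecies("s__multispecies_spp2_2")
novel = parse_multispecies("multispecies_sppn10_2_nov_96.842%")
single = parse_multispecies("mitis")
```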
 
 

Taxonomy Bar Plots for All Samples

 
 

Taxonomy Bar Plots for Individual Comparison Groups

 
 
Comparison No.   Comparison Name     Families    Genera      Species
Comparison 1     Cases vs Controls   PDF, SVG    PDF, SVG    PDF, SVG
 
 

VIII. Analysis - Alpha Diversity

 

In ecology, alpha diversity (α-diversity) is the mean species diversity in sites or habitats at a local scale. The term was introduced by R. H. Whittaker[5][6] together with the terms beta diversity (β-diversity) and gamma diversity (γ-diversity). Whittaker's idea was that the total species diversity in a landscape (gamma diversity) is determined by two different things, the mean species diversity in sites or habitats at a more local scale (alpha diversity) and the differentiation among those habitats (beta diversity).

 

References:

  5. Whittaker, R. H. (1960). Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30, 279-338. doi:10.2307/1943563
  6. Whittaker, R. H. (1972). Evolution and Measurement of Species Diversity. Taxon, 21, 213-251. doi:10.2307/1218190

 

Alpha Diversity Analysis by Rarefaction

Diversity measures are affected by sampling depth. Rarefaction is a technique for assessing species richness from the results of sampling. It allows species richness to be calculated for a given number of individual samples (here, sequence reads), based on the construction of so-called rarefaction curves. A rarefaction curve plots the number of species as a function of the number of samples. Rarefaction curves generally grow rapidly at first, as the most common species are found, and then plateau as only the rarest species remain to be sampled [7].
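
The idea behind a rarefaction curve can be sketched as follows: reads are drawn at random without replacement at several depths, and the number of distinct species observed is recorded at each depth. The function name, the example counts, and the chosen depths are hypothetical, and this sketch draws a single random subsample per depth, whereas production tools typically average over repeated draws.

```python
import random


def rarefaction_curve(counts, depths, seed=0):
    """Return (depth, observed species) pairs from random subsampling.

    counts: read counts per species for one sample.
    depths: numbers of reads to draw without replacement.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, labeled by species index.
    reads = [species for species, n in enumerate(counts) for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            break  # cannot draw more reads than the sample contains
        subsample = rng.sample(reads, depth)
        curve.append((depth, len(set(subsample))))
    return curve


# Hypothetical sample: two dominant species and several rare ones.
counts = [500, 300, 150, 40, 8, 2]
curve = rarefaction_curve(counts, depths=[10, 100, 500, 1000])
```

At the full depth of 1000 reads every species in the sample is observed, while shallow depths tend to miss the rare species.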


References:

  7. Willis AD. Rarefaction, Alpha Diversity, and Statistics. Front Microbiol. 2019 Oct 23;10:2407. doi: 10.3389/fmicb.2019.02407. PMID: 31708888; PMCID: PMC6819366.

 
 
 

Boxplot of Alpha-diversity Indices

The two main factors taken into account when measuring diversity are richness and evenness. Richness is a measure of the number of different kinds of organisms present in a particular area. Evenness compares the similarity of the population size of each of the species present. There are many different ways to measure richness and evenness. These measurements are called "estimators" or "indices". Below are box plots of three commonly used indices, showing the values for all samples (dots) and for each group (boxes) at the species level.
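
To make richness and evenness concrete, here is a minimal sketch that computes observed richness, the Shannon index, the Simpson index, and Pielou's evenness from one sample's read counts. Which three indices are plotted in this report is not stated in the text, so the selection here is an assumption, as are the function name and the example counts.

```python
import math


def alpha_indices(counts):
    """Compute simple alpha-diversity indices from per-species read counts."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    proportions = [c / total for c in present]
    richness = len(present)                                    # observed species
    shannon = -sum(p * math.log(p) for p in proportions)       # natural log
    simpson = 1.0 - sum(p * p for p in proportions)            # 1 - dominance
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "richness": richness,
        "shannon": shannon,
        "simpson": simpson,
        "evenness": evenness,
    }


# Same richness, very different evenness.
even_sample = alpha_indices([25, 25, 25, 25])
skewed_sample = alpha_indices([97, 1, 1, 1])
```

Both hypothetical samples contain four species, but the evenly distributed sample scores higher on the Shannon and Simpson indices than the sample dominated by one species.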

Printed on each graph is the p-value for the statistical significance of the difference between the groups. The significance is calculated using either the Kruskal-Wallis test or the Wilcoxon rank sum test. Both are non-parametric methods (since microbiome read count data are considered non-normally distributed) for testing whether samples originate from the same distribution (i.e., no difference between groups). The Kruskal-Wallis test is used to compare three or more independent groups to determine if there are statistically significant differences between their medians. The Wilcoxon rank sum test, also known as the Mann-Whitney U test, is used to compare two independent groups to determine if there is a significant difference between their distributions.
The p-value is shown at the top of each graph. A p-value < 0.05 is considered to indicate a statistically significant difference between (or among) the test groups.
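
For the two-group case, the Wilcoxon rank sum (Mann-Whitney U) test can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation used for this report: it uses the normal approximation with a tie correction and no continuity correction, so its p-values can differ from those of statistical packages, which often use exact distributions for small samples. The function names and the example index values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (U statistic for x, two-sided p-value, normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical Shannon index values for two groups of five samples.
cases = [3.1, 3.4, 3.6, 3.9, 4.2]
controls = [2.0, 2.3, 2.5, 2.8, 3.0]
u_stat, p_value = wilcoxon_rank_sum(cases, controls)
```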

 
Alpha Diversity Box Plots for All Groups - Species Level
 
 
 
 
 
 
 
 
 
Alpha Diversity Box Plots for Individual Comparisons at Species level
 
Comparison 1   Cases vs Controls   View in PDF   View in SVG
 
The above comparisons are at the species level. Comparisons at other taxonomy levels, from phylum to genus, are also available:
 
 
 

IX. Analysis - Beta Diversity

 

NMDS and PCoA Plots

Beta diversity compares the similarity (or dissimilarity) of microbial profiles between different groups of samples. There are many different similarity/dissimilarity metrics [8]. In general, they can be quantitative (using sequence abundance, e.g., Bray-Curtis or weighted UniFrac) or binary (considering only presence-absence of sequences, e.g., binary Jaccard or unweighted UniFrac). They can also be phylogeny-based (e.g., UniFrac metrics) or not (non-UniFrac metrics, such as Bray-Curtis).

For microbiome studies, species profiles of samples can be compared with the Bray-Curtis dissimilarity, which is computed from count data. The pairwise Bray-Curtis dissimilarity matrix of all samples can then be subjected to either multi-dimensional scaling (MDS, also known as PCoA) or non-metric MDS (NMDS).
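
The Bray-Curtis dissimilarity between two samples is the sum of the absolute differences in counts divided by the sum of all counts in both samples. A minimal sketch follows; the function names and the example counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(u) + sum(v)
    return numerator / denominator if denominator else 0.0


def bray_curtis_matrix(samples):
    """Pairwise dissimilarities for a dict of sample name -> count vector."""
    names = list(samples)
    return {(a, b): bray_curtis(samples[a], samples[b]) for a in names for b in names}


# Hypothetical species counts for three samples.
samples = {
    "S1": [6, 4, 0],
    "S2": [4, 6, 0],
    "S3": [0, 0, 10],
}
matrix = bray_curtis_matrix(samples)
```

The value ranges from 0 (identical count profiles) to 1 (no shared species), so S1 and S2 are close while S3 is maximally dissimilar to both.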

MDS/PCoA is a scaling or ordination method that starts with a matrix of similarities or dissimilarities between a set of samples and aims to produce a low-dimensional graphical plot of the data in such a way that distances between points in the plot are close to original dissimilarities.

NMDS is similar to MDS; however, it does not use the dissimilarity values directly. Instead, it converts them into ranks and uses these ranks in the calculation.
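
The rank conversion that distinguishes NMDS from MDS can be sketched as follows. Only the ranking step is shown, not the iterative NMDS fitting itself; the function name and the example matrix are hypothetical, and ties are simply broken by pair order in this sketch.

```python
def dissimilarity_ranks(matrix):
    """Rank the pairwise dissimilarities (upper triangle), smallest first."""
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ordered = sorted(pairs, key=lambda pair: matrix[pair[0]][pair[1]])
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


# Hypothetical dissimilarity matrix for three samples.
d = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.5],
    [0.9, 0.5, 0.0],
]
ranks = dissimilarity_ranks(d)

# Any order-preserving change of the values leaves the ranks unchanged.
squared = [[value * value for value in row] for row in d]
ranks_squared = dissimilarity_ranks(squared)
```

Because only the rank order enters the calculation, squaring the dissimilarities gives the same ranks, which is why NMDS is less sensitive to the scale of the dissimilarity metric than MDS/PCoA.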

References:

  8. Plantinga, AM, Wu, MC (2021). Beta Diversity and Distance-Based Analysis of Microbiome Data. In: Datta, S., Guha, S. (eds) Statistical Analysis of Microbiome Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-73351-3_5

In our beta diversity analysis, the Bray-Curtis dissimilarity matrix was first calculated and then plotted by PCoA and NMDS separately. Below are the beta diversity results for all groups together, at the species level:

 
 
NMDS and PCoA Plots for All Groups - Species Level
 
 
 
 
 

The above PCoA and NMDS plots are based on count data. The count data can also be transformed into centered log ratios (CLR) for each species. CLR data are no longer count data and cannot be used in the Bray-Curtis dissimilarity calculation. Instead, CLR-transformed data can be compared using Euclidean distance. When CLR data are compared by Euclidean distance, the distance is also called the Aitchison distance.
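
The Aitchison distance can be sketched as the Euclidean distance between CLR-transformed vectors. This sketch assumes strictly positive inputs, i.e., that zero counts have already been replaced by some method of the analyst's choice (the report does not specify one); the function names and example values are hypothetical.

```python
import math


def clr(values):
    """Centered log ratio of a strictly positive vector."""
    if any(v <= 0 for v in values):
        raise ValueError("CLR requires strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(u, v):
    """Euclidean distance between the CLR transforms of u and v."""
    cu, cv = clr(u), clr(v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cu, cv)))


# Same proportions at ten times the sequencing depth: distance is zero.
same_composition = aitchison_distance([1, 2, 4], [10, 20, 40])
# Different composition: distance is positive.
different = aitchison_distance([1, 1], [1, math.e])
```

Unlike Bray-Curtis on raw counts, the distance depends only on the ratios between species, so scaling a sample's counts by a constant does not change it.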

Below are the NMDS and PCoA plots of the Aitchison distances of the samples at the Species level:

 
 
 
 
 
 
 
NMDS and PCoA Plots for Individual Comparisons at Species level
 
 
Comparison No.   Comparison Name     NMDS: Bray-Curtis   NMDS: CLR Euclidean   PCoA: Bray-Curtis   PCoA: CLR Euclidean
Comparison 1     Cases vs Controls   PDF, SVG            PDF, SVG              PDF, SVG            PDF, SVG
 
 
 
 
 
 

Interactive 3D PCoA Plots - Bray-Curtis Dissimilarity

 
 
 

Interactive 3D PCoA Plots - Euclidean Distance

 
 
 

Interactive 3D PCoA Plots - Correlation Coefficients

 
 
 

X. Analysis - Differential Abundance

16S rRNA next-generation sequencing (NGS) generates a fixed number of reads that reflect the proportion of different species in a sample, i.e., the relative abundance of species, instead of the absolute abundance. In mathematics, measurements involving probabilities, proportions, percentages, and ppm can all be thought of as compositional data. This makes the microbiome read count data “compositional” (Gloor et al., 2017). In general, compositional data represent parts of a whole and carry only relative information [9].

The problem of microbiome data being compositional arises when comparing two groups of samples to identify “differentially abundant” species. Even if a species has the same absolute abundance in two conditions, its relative abundance (e.g., percent abundance) can differ between the conditions if the abundances of other species change greatly. This can lead to incorrect conclusions about the differential abundance of microbial species in the samples.

When studying differential abundance (DA), the currently preferred approach is to transform the read count data into log-ratio data. The ratios are calculated between the read count of each species in a sample and a “reference” count (e.g., the mean read count of the sample). The log-ratio data allow the detection of DA species without being affected by the percentage bias mentioned above.

In this report, a compositional DA analysis tool, “ANCOM” (analysis of composition of microbiomes), was used [10]. ANCOM transforms the count data into log-ratios and is thus more suitable for comparing the composition of microbiomes in two or more populations. ANCOM generates a table of features with W-statistics and an indication of whether the null hypothesis is rejected. The W-statistic of a feature is the number of other features against which that feature tests as significantly different. Hence, the higher the W, the stronger the evidence that a feature/species is differentially abundant.
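
A small worked example may help show the percentage bias and how a log ratio avoids it. The numbers are hypothetical absolute abundances: species A and B are unchanged between conditions, while species C blooms in the case condition. Species B is used as the reference here purely for illustration; in practice the choice of reference matters, and this sketch is not how ANCOM selects one.

```python
import math

# Hypothetical absolute abundances; only species C differs between conditions.
control = {"A": 100, "B": 100, "C": 100}
case = {"A": 100, "B": 100, "C": 1600}


def relative(sample):
    """Relative abundance of each species within one sample."""
    total = sum(sample.values())
    return {name: value / total for name, value in sample.items()}


def log_ratio(sample, species, reference):
    """Log ratio of one species to a reference species in the same sample."""
    return math.log(sample[species] / sample[reference])


rel_control = relative(control)
rel_case = relative(case)

# Relative abundance of A drops from about 33% to about 6%,
# although its absolute abundance did not change.
drop_in_a = rel_control["A"] - rel_case["A"]

# The log ratio of A to the unchanged reference B is identical in both conditions.
lr_control = log_ratio(control, "A", "B")
lr_case = log_ratio(case, "A", "B")
```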

References:

  9. Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134.
  10. Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.
 
 

ANCOM Differential Abundance Analysis

 
ANCOM Results for Individual Comparisons
Comparison No.   Comparison Name
Comparison 1     Cases vs Controls
 
 

ANCOM-BC2 Differential Abundance Analysis

 

Starting with version V1.2, we include the results of ANCOM-BC (Analysis of Compositions of Microbiomes with Bias Correction) (Lin and Peddada 2020) [11]. ANCOM-BC is an updated version of "ANCOM" that:
(a) provides statistically valid tests with appropriate p-values,
(b) provides confidence intervals for differential abundance of each taxon,
(c) controls the False Discovery Rate (FDR),
(d) maintains adequate power, and
(e) is computationally simple to implement.

The bias correction (BC) addresses a challenging problem of the bias introduced by differences in the sampling fractions across samples. This bias has been a major hurdle in performing DA analysis of microbiome data. ANCOM-BC estimates the unknown sampling fractions and corrects the bias induced by their differences among samples. The absolute abundance data are modeled using a linear regression framework.

Starting with version V1.43, ANCOM-BC2 is used instead of ANCOM-BC so that multiple pairwise directional tests can be performed (if there are more than two groups in a comparison). When performing pairwise directional tests, the mixed directional false discovery rate (mdFDR) is taken into account. The mdFDR combines the false discovery rates due to multiple testing, multiple pairwise comparisons, and directional tests within each pairwise comparison. The mdFDR is adopted from Guo, Sarkar, and Peddada (2010) [12] and Grandhi, Guo, and Peddada (2016) [13]. For a more detailed explanation and additional features of ANCOM-BC2, please see the authors' documentation.

References:

  11. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.
  12. Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010 Jun;66(2):485-92. doi: 10.1111/j.1541-0420.2009.01292.x. Epub 2009 Jul 23. PMID: 19645703; PMCID: PMC2895927.
  13. Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics. 2016 Feb 25;17:104. doi: 10.1186/s12859-016-0937-5. PMID: 26917217; PMCID: PMC4768411.
 
 
ANCOM-BC2 Results for Individual Comparisons
 
Comparison No.   Comparison Name
Comparison 1     Cases vs Controls
 
 
 
 
 

LEfSe - Linear Discriminant Analysis Effect Size

LEfSe (Linear Discriminant Analysis Effect Size) is an alternative method to find "organisms, genes, or pathways that consistently explain the differences between two or more microbial communities" (Segata et al., 2011) [14]. Specifically, LEfSe uses the non-parametric Kruskal-Wallis (KW) rank-sum test to detect features with significant differential (relative) abundance with respect to the class of interest. Because the test is rank-based rather than proportion-based, the differentially abundant species identified among the comparison groups are less biased than those found by comparing percent abundances directly.

Reference:

  14. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011 Jun 24;12(6):R60. doi: 10.1186/gb-2011-12-6-r60. PMID: 21702898; PMCID: PMC3218848.
 
Cases vs Controls
 
 
 
 
 
 
 

XI. Analysis - Heatmap Profile

 

Species vs Sample Abundance Heatmap for All Samples

 
 
 

Heatmaps for Individual Comparisons

 
A) Two-way clustering - clustered on both columns (Samples) and rows (organism)
Comparison No.   Comparison Name     Family Level   Genus Level   Species Level
Comparison 1     Cases vs Controls   PDF, SVG       PDF, SVG      PDF, SVG
 
 
B) One-way clustering - clustered on rows (organism) only
Comparison No.   Comparison Name     Family Level   Genus Level   Species Level
Comparison 1     Cases vs Controls   PDF, SVG       PDF, SVG      PDF, SVG
 
 
C) No clustering
Comparison No.   Comparison Name     Family Level   Genus Level   Species Level
Comparison 1     Cases vs Controls   PDF, SVG       PDF, SVG      PDF, SVG
 
 

XII. Analysis - Network Association

Network correlation analysis tools are commonly used to analyze the co-occurrence or co-exclusion of microbial species across samples. However, microbiome count data are compositional. If count data are normalized to the total number of counts in the sample, the values are no longer independent, and traditional statistical metrics (e.g., correlation) for detecting species-species relationships can produce spurious results. In addition, sequencing-based studies typically measure hundreds of OTUs (species) in only a few samples; thus, inference of OTU-OTU association networks is severely underpowered. We provide network association results computed with SparCC (Sparse Correlations for Compositional data) (Friedman & Alm 2012), a method for inferring correlations from compositional data. SparCC estimates the linear Pearson correlations between the log-transformed components.


References:

Friedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012;8(9):e1002687. doi: 10.1371/journal.pcbi.1002687. Epub 2012 Sep 20. PMID: 23028285; PMCID: PMC3447976.

 

Association Network Inference by SparCC

 

 

 
 

XIII. Disclaimer

The results of this analysis are for research purposes only. They are not intended to diagnose, treat, cure, or prevent any disease. Forsyth and FOMC are not responsible for the use of information provided in this report outside the research area.

 

Copyright FOMC 2026