Table of Contents
I.	Project Summary
II.	Workflow Checklist
III.	NGS Sequencing Results
IV.	Complete Report Download
V.	Raw Sequence Data Download
VI.	Analysis - DADA2 Read Processing
	Sample Meta Info
	Read Count by Sample
VII.	Analysis - Read Taxonomy Assignment
	Taxonomy Barplots
VIII.	Analysis - Alpha Diversity
IX.	Analysis - Beta Diversity
X.	Analysis - Differential Abundance
	ANCOM Result
	LEfSe Result
XI.	Analysis - Heatmap Profile
XII.	Analysis - Network Association
XIII.	Disclaimer


FOMC Service Report

16S rRNA Gene V1V3 Amplicon Sequencing

Version V1.50

Version History

The Forsyth Institute, Cambridge, MA, USA

July 25, 2025

Project ID: SRP420678

I. Project Summary

Project SRP420678 services include NGS sequencing of the V1V3 region of the 16S rRNA gene amplicons from the samples. First and foremost, please download this report, as well as the raw sequence data, from the download links provided below. These links will expire after 60 days. We cannot guarantee the availability of your data after 60 days.

Full bioinformatics analysis service was requested. We provide many analyses, starting from raw sequence quality and noise filtering, paired read merging, and chimera filtering, using the DADA2 denoising algorithm and pipeline.

We also provide many downstream analyses such as taxonomy assignment, alpha and beta diversity analyses, and differential abundance analysis.

For taxonomy assignment, the taxonomy barplots are the most informative. We provide interactive barplots that show the relative abundance of microbes at the taxonomy level of your choice (from phylum to species).

If you specify which groups of samples you want to compare for differential abundance, we provide both ANCOM and LEfSe differential abundance analysis.

II. Workflow Checklist

☑	1.	Sample Received
☑	2.	Sample Quality Evaluated
☑	3.	Sample Prepared for Sequencing
☑	4.	Next-Gen Sequencing
☑	5.	Sequence Quality Check
☑	6.	Absolute Abundance
☑	7.	Report and Raw Sequence Data Available for Download
☑	8.	Bioinformatics Analysis - Reads Processing (DADA2 Quality Trimming, Denoising, Paired Reads Merging)
☑	9.	Bioinformatics Analysis - Reads Taxonomy Assignment
☑	10.	Bioinformatics Analysis - Alpha Diversity Analysis
☑	11.	Bioinformatics Analysis - Beta Diversity Analysis
☑	12.	Bioinformatics Analysis - Differential Abundance Analysis
☑	13.	Bioinformatics Analysis - Heatmap Profile
☑	14.	Bioinformatics Analysis - Network Association

III. NGS Sequencing

The samples were processed and analyzed with the ZymoBIOMICS® Service: Targeted Metagenomic Sequencing (Zymo Research, Irvine, CA).

DNA Extraction: If DNA extraction was performed, the following DNA extraction kit was used according to the manufacturer’s instructions:

☑	ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA)
☐	N/A (DNA Extraction Not Performed)
Elution Volume: 50µL
Additional Notes: NA

Targeted Library Preparation: The DNA samples were prepared for targeted sequencing with the Quick-16S™ NGS Library Prep Kit (Zymo Research, Irvine, CA). These primers were custom designed by Zymo Research to provide the best coverage of the 16S gene while maintaining high sensitivity. The primer sets used in this project are marked below:

☐	Quick-16S™ Primer Set V1-V2 (Zymo Research, Irvine, CA)
☑	Quick-16S™ Primer Set V1-V3 (Zymo Research, Irvine, CA)
☐	Quick-16S™ Primer Set V3-V4 (Zymo Research, Irvine, CA)
☐	Quick-16S™ Primer Set V4 (Zymo Research, Irvine, CA)
☐	Quick-16S™ Primer Set V6-V8 (Zymo Research, Irvine, CA)
Additional Notes: NA

The sequencing library was prepared using an innovative library preparation process in which PCR reactions were performed in real-time PCR machines to control cycles and therefore limit PCR chimera formation. The final PCR products were quantified with qPCR fluorescence readings and pooled together based on equal molarity. The final pooled library was cleaned up with the Select-a-Size DNA Clean & Concentrator™ (Zymo Research, Irvine, CA), then quantified with TapeStation® (Agilent Technologies, Santa Clara, CA) and Qubit® (Thermo Fisher Scientific, Waltham, WA).

Control Samples: The ZymoBIOMICS® Microbial Community Standard (Zymo Research, Irvine, CA) was used as a positive control for each DNA extraction, if performed. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research, Irvine, CA) was used as a positive control for each targeted library preparation. Negative controls (i.e. blank extraction control, blank library preparation control) were included to assess the level of bioburden carried by the wet-lab process.

Sequencing: The final library was sequenced on Illumina® NextSeq 2000™ with a P1 reagent kit (600 cycles) (Illumina, San Diego, CA). The sequencing was performed with 25% PhiX spike-in.

Absolute Abundance Quantification*: A quantitative real-time PCR was set up with a standard curve. The standard curve was made with plasmid DNA containing one copy of the 16S gene and one copy of the fungal ITS2 region prepared in 10-fold serial dilutions. The primers used were the same as those used in Targeted Library Preparation. The equation generated by the plasmid DNA standard curve was used to calculate the number of gene copies in the reaction for each sample. The PCR input volume (2 µl) was used to calculate the number of gene copies per microliter in each DNA sample.
The number of genome copies per microliter DNA sample was calculated by dividing the gene copy number by an assumed number of gene copies per genome. The value used for 16S copies per genome is 4. The value used for ITS copies per genome is 200. The amount of DNA per microliter DNA sample was calculated using an assumed genome size of 4.64 x 10⁶ bp, the genome size of Escherichia coli, for 16S samples, or an assumed genome size of 1.20 x 10⁷ bp, the genome size of Saccharomyces cerevisiae, for ITS samples. This calculation is shown below:

Calculated Total DNA = Calculated Total Genome Copies × Assumed Genome Size (4.64 × 10⁶ bp) ×
Average Molecular Weight of a DNA bp (660 g/mole/bp) ÷ Avogadro’s Number (6.022 x 10²³/mole)
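
The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic for a 16S sample, using the constants stated in this section; the function names and the example qPCR value (8.0 × 10⁵ gene copies in the reaction) are hypothetical and are not results from this project.

```python
# Constants taken from the text above (16S case).
AVOGADRO = 6.022e23           # molecules per mole
BP_WEIGHT = 660.0             # average molecular weight of a DNA bp, g/mole/bp
PCR_INPUT_UL = 2.0            # PCR input volume in microliters
COPIES_PER_GENOME_16S = 4     # assumed 16S gene copies per genome
GENOME_SIZE_16S_BP = 4.64e6   # assumed genome size (Escherichia coli), bp


def genome_copies_per_ul(gene_copies_in_reaction):
    """Convert gene copies in the qPCR reaction to genome copies per microliter."""
    gene_copies_per_ul = gene_copies_in_reaction / PCR_INPUT_UL
    return gene_copies_per_ul / COPIES_PER_GENOME_16S


def dna_ng_per_ul(genome_copies):
    """Convert genome copies per microliter to nanograms of DNA per microliter."""
    grams = genome_copies * GENOME_SIZE_16S_BP * BP_WEIGHT / AVOGADRO
    return grams * 1e9  # grams to nanograms


# Hypothetical qPCR result: 8.0e5 gene copies in the reaction.
copies = genome_copies_per_ul(8.0e5)
print(f"{copies:.3g} genome copies/ul, {dna_ng_per_ul(copies):.3f} ng/ul")
```

For ITS samples, the same arithmetic would apply with 200 copies per genome and a genome size of 1.20 × 10⁷ bp.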

* Absolute Abundance Quantification is only available for 16S and ITS analyses.

The absolute abundance standard curve data can be viewed in Excel here:

The absolute abundance standard curve is shown below:

Absolute Abundance Standard Curve

IV. Complete Report Download

The complete report of your project, including all links in this report, can be downloaded by clicking the link provided below. The downloaded file is a compressed ZIP file. Once it is unzipped, open the file “REPORT.html” (it may be shown only as "REPORT" on your computer) by double-clicking it. Your default web browser will open it and you will see the exact content of this report.

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Complete report download link:

To view the report, please follow these steps:

1.	Download the .zip file from the report link above.
2.	Extract all the contents of the downloaded .zip file to your desktop.
3.	Open the extracted folder and find the "REPORT.html" file (it may be shown only as "REPORT").
4.	Open the REPORT.html file by double-clicking it. Your default browser will open the top page of the complete report. Within the report, there are links to view all the analyses performed for the project.

V. Raw Sequence Data Download

The raw NGS sequence data is available for download with the link provided below. The data is a compressed ZIP file and can be unzipped to individual sequence files. Since this is paired-end sequencing, each of your samples is represented by two sequence files: one for READ 1, with the file extension “*_R1.fastq.gz”, and another for READ 2, with the file extension “*_R2.fastq.gz”. The files are in FASTQ format and are compressed. FASTQ is a text-based format for storing both a biological sequence and its corresponding quality scores. Most sequence analysis software will be able to open them. The Sample IDs associated with the R1 and R2 fastq files are listed in the table below:


Sample ID R1 File R2 File
SRR23319523 SRR23319523_R1.fastq SRR23319523_R2.fastq
SRR23319524 SRR23319524_R1.fastq SRR23319524_R2.fastq
SRR23319525 SRR23319525_R1.fastq SRR23319525_R2.fastq
SRR23319526 SRR23319526_R1.fastq SRR23319526_R2.fastq
SRR23319527 SRR23319527_R1.fastq SRR23319527_R2.fastq
SRR23319528 SRR23319528_R1.fastq SRR23319528_R2.fastq
SRR23319529 SRR23319529_R1.fastq SRR23319529_R2.fastq
SRR23319530 SRR23319530_R1.fastq SRR23319530_R2.fastq
SRR23319531 SRR23319531_R1.fastq SRR23319531_R2.fastq
SRR23319532 SRR23319532_R1.fastq SRR23319532_R2.fastq
SRR23319533 SRR23319533_R1.fastq SRR23319533_R2.fastq
SRR23319534 SRR23319534_R1.fastq SRR23319534_R2.fastq
SRR23319535 SRR23319535_R1.fastq SRR23319535_R2.fastq
SRR23319536 SRR23319536_R1.fastq SRR23319536_R2.fastq
SRR23319537 SRR23319537_R1.fastq SRR23319537_R2.fastq
SRR23319538 SRR23319538_R1.fastq SRR23319538_R2.fastq
SRR23319539 SRR23319539_R1.fastq SRR23319539_R2.fastq
SRR23319540 SRR23319540_R1.fastq SRR23319540_R2.fastq
SRR23319541 SRR23319541_R1.fastq SRR23319541_R2.fastq
SRR23319542 SRR23319542_R1.fastq SRR23319542_R2.fastq
SRR23319543 SRR23319543_R1.fastq SRR23319543_R2.fastq
SRR23319544 SRR23319544_R1.fastq SRR23319544_R2.fastq
SRR23319545 SRR23319545_R1.fastq SRR23319545_R2.fastq
SRR23319546 SRR23319546_R1.fastq SRR23319546_R2.fastq
SRR23319547 SRR23319547_R1.fastq SRR23319547_R2.fastq
SRR23319548 SRR23319548_R1.fastq SRR23319548_R2.fastq
SRR23319549 SRR23319549_R1.fastq SRR23319549_R2.fastq
SRR23319550 SRR23319550_R1.fastq SRR23319550_R2.fastq
SRR23319552 SRR23319552_R1.fastq SRR23319552_R2.fastq
SRR23319553 SRR23319553_R1.fastq SRR23319553_R2.fastq
SRR23319554 SRR23319554_R1.fastq SRR23319554_R2.fastq
SRR23319555 SRR23319555_R1.fastq SRR23319555_R2.fastq
SRR23319556 SRR23319556_R1.fastq SRR23319556_R2.fastq
SRR23319557 SRR23319557_R1.fastq SRR23319557_R2.fastq
SRR23319558 SRR23319558_R1.fastq SRR23319558_R2.fastq
SRR23319559 SRR23319559_R1.fastq SRR23319559_R2.fastq
SRR23319560 SRR23319560_R1.fastq SRR23319560_R2.fastq
SRR23319561 SRR23319561_R1.fastq SRR23319561_R2.fastq
SRR23319562 SRR23319562_R1.fastq SRR23319562_R2.fastq
SRR23319563 SRR23319563_R1.fastq SRR23319563_R2.fastq
SRR23319564 SRR23319564_R1.fastq SRR23319564_R2.fastq
SRR23319565 SRR23319565_R1.fastq SRR23319565_R2.fastq
SRR23319566 SRR23319566_R1.fastq SRR23319566_R2.fastq
SRR23319567 SRR23319567_R1.fastq SRR23319567_R2.fastq
SRR23319568 SRR23319568_R1.fastq SRR23319568_R2.fastq
SRR23319569 SRR23319569_R1.fastq SRR23319569_R2.fastq
SRR23319570 SRR23319570_R1.fastq SRR23319570_R2.fastq
SRR23319571 SRR23319571_R1.fastq SRR23319571_R2.fastq
SRR23319572 SRR23319572_R1.fastq SRR23319572_R2.fastq
SRR23319573 SRR23319573_R1.fastq SRR23319573_R2.fastq
SRR23319574 SRR23319574_R1.fastq SRR23319574_R2.fastq
SRR23319575 SRR23319575_R1.fastq SRR23319575_R2.fastq
SRR23319576 SRR23319576_R1.fastq SRR23319576_R2.fastq
SRR23319577 SRR23319577_R1.fastq SRR23319577_R2.fastq
SRR23319578 SRR23319578_R1.fastq SRR23319578_R2.fastq
SRR23319579 SRR23319579_R1.fastq SRR23319579_R2.fastq
SRR23319580 SRR23319580_R1.fastq SRR23319580_R2.fastq
SRR23319581 SRR23319581_R1.fastq SRR23319581_R2.fastq
SRR23319582 SRR23319582_R1.fastq SRR23319582_R2.fastq
SRR23319583 SRR23319583_R1.fastq SRR23319583_R2.fastq
SRR23319584 SRR23319584_R1.fastq SRR23319584_R2.fastq
SRR23319585 SRR23319585_R1.fastq SRR23319585_R2.fastq
SRR23319586 SRR23319586_R1.fastq SRR23319586_R2.fastq
SRR23319587 SRR23319587_R1.fastq SRR23319587_R2.fastq
SRR23319588 SRR23319588_R1.fastq SRR23319588_R2.fastq
SRR23319589 SRR23319589_R1.fastq SRR23319589_R2.fastq
SRR23319590 SRR23319590_R1.fastq SRR23319590_R2.fastq
SRR23319591 SRR23319591_R1.fastq SRR23319591_R2.fastq
SRR23319593 SRR23319593_R1.fastq SRR23319593_R2.fastq
SRR23319595 SRR23319595_R1.fastq SRR23319595_R2.fastq
SRR23319596 SRR23319596_R1.fastq SRR23319596_R2.fastq
SRR23319597 SRR23319597_R1.fastq SRR23319597_R2.fastq
SRR23319598 SRR23319598_R1.fastq SRR23319598_R2.fastq
SRR23319599 SRR23319599_R1.fastq SRR23319599_R2.fastq
SRR23319600 SRR23319600_R1.fastq SRR23319600_R2.fastq
SRR23319601 SRR23319601_R1.fastq SRR23319601_R2.fastq
SRR23319602 SRR23319602_R1.fastq SRR23319602_R2.fastq
SRR23319603 SRR23319603_R1.fastq SRR23319603_R2.fastq
SRR23319604 SRR23319604_R1.fastq SRR23319604_R2.fastq
SRR23319605 SRR23319605_R1.fastq SRR23319605_R2.fastq
SRR23319606 SRR23319606_R1.fastq SRR23319606_R2.fastq
SRR23319607 SRR23319607_R1.fastq SRR23319607_R2.fastq
SRR23319608 SRR23319608_R1.fastq SRR23319608_R2.fastq
SRR23319609 SRR23319609_R1.fastq SRR23319609_R2.fastq
SRR23319610 SRR23319610_R1.fastq SRR23319610_R2.fastq
SRR23319611 SRR23319611_R1.fastq SRR23319611_R2.fastq
SRR23319612 SRR23319612_R1.fastq SRR23319612_R2.fastq
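
For readers who want to inspect the FASTQ files listed above programmatically, the following is a minimal sketch of a FASTQ reader in Python. It assumes the common four-line record layout and Phred+33 quality encoding; the function name `read_fastq` and the two-read example are hypothetical. For a real compressed file, the handle could be obtained with `gzip.open(path, "rt")` from the Python standard library.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a FASTQ text handle.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Hypothetical two-record example held in memory instead of a file.
example = io.StringIO("@read1\nACGT\n+\nII5!\n@read2\nGGCA\n+\nIIII\n")
for read_id, sequence, scores in read_fastq(example):
    print(read_id, sequence, scores)
```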

Please download and save the file to your computer storage device. The download link will expire 60 days after you receive this report.

Raw sequence data download link:

VI. Analysis - DADA2 Read Processing

What is DADA2?

DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors [1]. DADA2 infers sample sequences exactly, without coarse-graining into OTUs, and resolves differences of as little as one nucleotide. DADA2 identified more real variants and output fewer spurious sequences than other methods.

DADA2’s advantage is that it uses more of the data. The DADA2 error model incorporates quality information, which is ignored by all other methods after filtering. The DADA2 error model incorporates quantitative abundances, whereas most other methods use abundance ranks if they use abundance at all. The DADA2 error model identifies the differences between sequences, e.g., A→C, whereas other methods merely count the mismatches. DADA2 can parameterize its error model from the data itself, rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.

The DADA2 software package is available as an R package at: https://benjjneb.github.io/dada2/index.html

References

[1] Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jul;13(7):581-3. doi: 10.1038/nmeth.3869. Epub 2016 May 23. PMID: 27214047; PMCID: PMC4927377.

Analysis Procedures:

The DADA2 pipeline includes several tools for read quality control, including quality filtering, trimming, denoising, pair merging and chimera filtering. Below are the major processing steps of DADA2:

Step 1. Read trimming based on sequence quality. The quality of Illumina NGS sequences often decreases toward the end of the reads. DADA2 allows the poor-quality read ends to be trimmed off in order to improve error model building and pair merging performance.

Step 2. Learn the error rates. The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The learnErrors method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors).

Step 3. Infer amplicon sequence variants (ASVs) based on the error model built in the previous step. This step is also called sequence "denoising". The outcome of this step is a list of ASVs that are the equivalent of oligonucleotides.

Step 4. Merge paired reads. If the sequencing products are read pairs, DADA2 will merge the R1 and R2 ASVs into single sequences. Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding denoised reverse reads, and then constructing the merged “contig” sequences. By default, merged sequences are only output if the forward and reverse reads overlap by at least 12 bases, and are identical to each other in the overlap region (but these conditions can be changed via function arguments).
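
The merging rule described in Step 4 can be illustrated with a small Python sketch. This is a simplification, not DADA2's implementation: DADA2 aligns the reads, whereas the sketch below only looks for an exact suffix/prefix match of at least 12 bases between the forward read and the reverse-complement of the reverse read. The function names and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=12):
    """Merge a read pair if the overlap is exact and at least min_overlap long.

    Returns the merged sequence, or None if no acceptable overlap exists.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, down to the minimum allowed.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


# Hypothetical 34-base amplicon read from both ends with a 14-base overlap.
amplicon = "ATGCGTACGTTAGCCTAGGATCCGATCGATTGCA"
forward_read = amplicon[:24]
reverse_read = reverse_complement(amplicon[10:])
print(merge_pair(forward_read, reverse_read) == amplicon)
```

A pair whose reads overlap by fewer than 12 bases, or that disagree anywhere in the overlap, is not merged in this sketch, mirroring the default behavior described above.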

Step 5. Remove chimeras. The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs. Chimeric sequences are identified if they can be exactly reconstructed by combining a left-segment and a right-segment from two more abundant “parent” sequences. The frequency of chimeric sequences varies substantially from dataset to dataset, and depends on factors including experimental procedures and sample complexity.
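
The chimera criterion in Step 5 (exact reconstruction from a left segment of one more abundant parent and a right segment of another) can be sketched as follows. This is a simplified illustration and not DADA2's actual procedure, which works on aligned sequences; here the left segment must match the start of one parent and the right segment the end of a different parent. The function name `is_bimera` and the example sequences and counts are hypothetical.

```python
def is_bimera(candidate, abundance):
    """Return True if candidate can be rebuilt exactly from two more abundant parents.

    abundance maps each sequence (including the candidate) to its read count.
    """
    parents = [seq for seq, count in abundance.items()
               if count > abundance[candidate] and seq != candidate]
    for cut in range(1, len(candidate)):
        left, right = candidate[:cut], candidate[cut:]
        left_parents = [p for p in parents if p.startswith(left)]
        right_parents = [p for p in parents if p.endswith(right)]
        # The two segments must come from two different parent sequences.
        if any(a != b for a in left_parents for b in right_parents):
            return True
    return False


# Hypothetical ASVs: two abundant parents and one low-abundance chimera.
counts = {
    "AAAAAAAACCCCCCCC": 100,
    "GGGGGGGGTTTTTTTT": 80,
    "AAAAAAAATTTTTTTT": 5,
}
for seq in counts:
    print(seq, is_bimera(seq, counts))
```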

Results

1. Read Quality Plots. NGS sequence analysis starts with visualizing the quality of the sequencing. Below are the quality plots of the first sample for the R1 and R2 reads separately. In gray-scale is a heat map of the frequency of each quality score at each base position. The mean quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The forward reads are usually of better quality. It is common practice to trim the last few nucleotides to avoid less well-controlled errors that can arise there. The trimming affects the downstream steps, including error model building, merging and chimera calling. FOMC uses an empirical approach that tests many combinations of trim lengths in order to achieve the best final amplicon sequence variants (ASVs); see the next section, “Optimal trim length for ASVs”.

Quality plots for all samples:

quality_plots_1-20.pdf

quality_plots_21-40.pdf

quality_plots_41-60.pdf

quality_plots_61-80.pdf

quality_plots_81-87.pdf

2. Optimal trim length for ASVs. The final number of merged and chimera-filtered ASVs depends on the quality filtering (hence trimming) at the very beginning of the DADA2 pipeline. To achieve the highest number of ASVs, an empirical approach was used:

1.	Create a random subset of each sample consisting of 5,000 R1 and 5,000 R2 reads (to reduce computation time).
2.	Trim 10 bases at a time from the ends of both R1 and R2, up to 50 bases.
3.	For each combination of trimmed lengths (e.g., 300x300, 300x290, 290x290, etc.), run the trimmed reads through the entire DADA2 pipeline to obtain chimera-filtered merged ASVs.
4.	Select the combination with the highest percentage of input reads becoming final ASVs and apply it to the complete set of data.

Below is the result of such operation, showing ASV percentages of total reads for all trimming combinations (1st Column = R1 lengths in bases; 1st Row = R2 lengths in bases):

R1/R2	251	241	231	221	211	201
233	75.66%	17.81%	14.34%	0.00%	0.00%	0.00%
223	18.24%	14.65%	0.00%	0.00%	0.00%	0.00%
213	15.02%	0.00%	0.00%	0.00%	0.00%	0.00%
203	0.01%	0.00%	0.00%	0.00%	0.00%	0.00%
193	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%

Based on the above result, the trim length combination of R1 = 233 bases and R2 = 251 bases (highlighted red above) was chosen for generating final ASVs for all sequences. This combination gave the highest percentage of input reads retained as merged, non-chimeric ASVs and was used for downstream analyses, if requested.
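
The selection step can be reproduced from the table above with a short Python sketch. The percentages are copied from the table; the variable and function names are hypothetical, and the sketch only picks the maximum rather than rerunning the DADA2 pipeline.

```python
# R2 lengths (table columns) and ASV percentages per R1 length (table rows).
r2_lengths = [251, 241, 231, 221, 211, 201]
percent_asv = {
    233: [75.66, 17.81, 14.34, 0.00, 0.00, 0.00],
    223: [18.24, 14.65, 0.00, 0.00, 0.00, 0.00],
    213: [15.02, 0.00, 0.00, 0.00, 0.00, 0.00],
    203: [0.01, 0.00, 0.00, 0.00, 0.00, 0.00],
    193: [0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
}


def best_trim(table, columns):
    """Return (r1_length, r2_length, percent) for the highest-yield combination."""
    percent, r1, r2 = max(
        (pct, r1_len, r2_len)
        for r1_len, row in table.items()
        for r2_len, pct in zip(columns, row)
    )
    return r1, r2, percent


print(best_trim(percent_asv, r2_lengths))
```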

3. Error plots from learning the error rates. After DADA2 builds the error model for the data set, it is always worthwhile, as a sanity check if nothing else, to visualize the estimated error rates. The error rates for each possible transition (A→C, A→G, …) are shown below. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score. Ideally, the estimated error rates (black line) fit the observed rates (points) well, and the error rates drop with increasing quality as expected.
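
As a reminder of what the nominal definition of the Q-score means, the standard Phred relation is p = 10^(-Q/10), where p is the probability that a base call is wrong. The Python sketch below evaluates it for a few quality scores. The per-transition value assumes the error probability is split evenly among the three possible substitutions; that even split is an assumption made here for illustration, and the function names are hypothetical.

```python
def nominal_error_rate(q):
    """Probability that a base call is wrong under the Phred definition."""
    return 10 ** (-q / 10)


def nominal_rate_per_transition(q):
    """Per-substitution rate, assuming errors are split evenly across 3 bases."""
    return nominal_error_rate(q) / 3


for q in (10, 20, 30, 40):
    print(q, nominal_error_rate(q), nominal_rate_per_transition(q))
```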

Forward Read R1 Error Plot

Reverse Read R2 Error Plot

PDF versions of these plots are available here:

4. DADA2 Result Summary. The table below summarizes the DADA2 analysis, tracking the paired read counts of each sample through every step of the DADA2 denoising process: end-trimming (filtered), denoising (denoisedF, denoisedR), pair merging (merged) and chimera removal (nonchim).

Sample ID	SRR23319523	SRR23319524	SRR23319525	SRR23319526	SRR23319527	SRR23319528	SRR23319529	SRR23319530	SRR23319531	SRR23319532	SRR23319533	SRR23319534	SRR23319535	SRR23319536	SRR23319537	SRR23319538	SRR23319539	SRR23319540	SRR23319541	SRR23319542	SRR23319543	SRR23319544	SRR23319545	SRR23319546	SRR23319547	SRR23319548	SRR23319549	SRR23319550	SRR23319552	SRR23319553	SRR23319554	SRR23319555	SRR23319556	SRR23319557	SRR23319558	SRR23319559	SRR23319560	SRR23319561	SRR23319562	SRR23319563	SRR23319564	SRR23319565	SRR23319566	SRR23319567	SRR23319568	SRR23319569	SRR23319570	SRR23319571	SRR23319572	SRR23319573	SRR23319574	SRR23319575	SRR23319576	SRR23319577	SRR23319578	SRR23319579	SRR23319580	SRR23319581	SRR23319582	SRR23319583	SRR23319584	SRR23319585	SRR23319586	SRR23319587	SRR23319588	SRR23319589	SRR23319590	SRR23319591	SRR23319593	SRR23319595	SRR23319596	SRR23319597	SRR23319598	SRR23319599	SRR23319600	SRR23319601	SRR23319602	SRR23319603	SRR23319604	SRR23319605	SRR23319606	SRR23319607	SRR23319608	SRR23319609	SRR23319610	SRR23319611	SRR23319612	Row Sum	Percentage
input	106,370	156,987	154,840	106,203	122,603	143,802	126,559	153,777	118,548	118,542	146,378	103,861	127,152	150,158	129,381	117,241	114,657	118,593	113,687	104,732	143,983	113,831	140,787	136,741	160,191	89,651	153,773	136,993	115,609	103,413	128,943	144,207	129,423	124,562	139,370	125,271	148,846	125,567	146,169	131,225	125,537	135,901	134,102	141,747	149,406	144,921	105,099	108,714	114,672	114,407	79,870	97,296	103,630	108,866	112,317	98,800	92,339	92,007	97,388	105,202	88,831	113,064	104,752	68,669	87,537	108,513	96,465	97,067	86,603	96,227	104,338	99,617	115,427	100,616	127,338	108,164	92,835	93,790	106,701	123,612	106,168	148,190	151,822	109,753	106,785	124,217	110,916	10,312,864	100.00%
filtered	105,757	156,103	153,953	105,581	121,892	142,962	125,851	152,914	117,888	117,892	145,556	103,311	126,426	149,278	128,668	116,580	114,002	117,871	113,031	104,082	143,106	113,203	140,016	135,912	159,208	89,140	152,926	136,217	114,959	102,818	128,272	143,373	128,643	123,874	138,547	124,555	147,930	124,841	145,291	130,509	124,821	135,096	133,288	140,938	148,572	144,066	104,480	108,081	114,016	113,766	79,430	96,758	103,055	108,212	111,669	98,234	91,998	91,474	96,818	104,622	88,360	112,409	104,157	68,306	87,036	107,898	95,928	96,559	86,114	95,705	103,738	99,017	114,745	100,014	126,589	107,518	92,330	93,252	106,095	122,923	105,602	147,317	150,976	109,132	106,140	123,482	110,294	10,253,968	99.43%
denoisedF	104,490	154,062	152,229	104,473	120,752	140,942	124,492	151,234	116,001	116,400	142,731	101,043	124,084	146,448	126,701	114,600	112,597	116,711	111,770	103,118	141,832	112,309	138,442	134,522	156,782	88,540	149,836	134,210	113,634	101,897	126,951	140,980	126,581	122,093	136,457	122,872	146,082	123,719	143,626	129,037	122,986	132,444	130,431	139,409	146,255	141,681	102,637	106,916	111,968	111,723	78,598	95,785	101,811	106,943	110,023	97,206	90,633	90,755	95,432	102,710	86,999	110,551	102,897	67,458	85,228	107,000	94,378	95,156	85,036	94,185	102,711	96,715	113,140	98,935	124,469	106,425	91,379	91,379	104,683	121,236	103,318	145,614	148,108	107,458	104,541	122,225	109,058	10,111,908	98.05%
denoisedR	102,388	150,767	149,041	102,548	118,237	138,004	122,296	148,577	113,524	114,522	139,934	98,628	121,245	143,378	124,339	112,197	109,875	114,754	109,562	101,325	139,059	110,736	135,095	132,625	153,531	86,918	146,581	131,471	111,508	100,371	124,656	138,326	123,605	118,857	133,453	120,134	143,187	121,408	140,404	126,296	120,474	129,778	127,376	136,443	143,160	138,762	100,077	104,515	108,874	109,214	76,988	93,831	99,120	104,390	107,593	95,241	86,708	88,864	92,904	100,411	85,092	108,288	100,843	66,051	83,220	104,803	92,290	92,740	83,210	91,983	100,324	94,514	110,565	96,812	121,187	103,952	89,597	89,401	102,634	117,932	100,971	141,812	144,144	104,624	102,337	119,909	107,025	9,894,345	95.94%
merged	99,010	142,385	142,825	99,663	114,135	130,278	118,224	143,107	106,902	110,337	128,486	90,508	111,708	129,750	117,711	104,419	105,117	110,373	105,533	98,487	134,600	108,188	128,699	127,714	145,398	85,763	134,471	124,270	107,353	98,108	120,702	129,320	114,730	112,911	125,808	114,698	137,630	118,244	133,049	121,003	114,977	119,377	115,443	130,924	133,416	129,915	93,786	100,857	100,002	101,957	75,457	91,189	94,547	99,546	103,242	92,335	85,047	86,772	87,417	93,230	80,842	100,355	97,041	63,099	77,607	102,276	85,907	88,440	79,178	85,987	97,048	84,765	104,803	93,128	114,344	99,643	87,266	81,629	98,431	110,651	93,248	135,518	131,298	97,619	96,121	115,303	103,048	9,385,648	91.01%
nonchim	84,767	109,894	107,823	92,810	102,291	94,774	105,002	118,725	77,480	97,773	83,712	64,011	75,960	105,429	85,921	71,534	89,296	95,428	92,229	87,642	114,065	102,117	96,402	114,840	108,706	84,259	77,728	98,554	95,242	89,471	109,014	88,801	70,033	98,195	87,469	88,989	99,480	105,233	102,997	89,847	92,467	79,647	69,651	109,280	86,424	92,845	73,603	84,499	62,245	70,936	73,601	82,915	76,770	79,605	90,046	86,218	83,220	82,049	59,191	68,693	69,433	69,473	86,149	57,583	53,145	96,945	62,627	76,042	68,601	63,520	86,277	47,176	80,645	81,862	84,990	81,166	77,974	49,298	84,698	69,236	66,569	95,329	82,442	70,057	78,615	101,082	87,864	7,396,646	71.72%
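
The "Percentage" column in the table above is each step's row sum divided by the total input read count. A minimal Python sketch of that arithmetic, using the row sums from the table (the variable and function names are hypothetical):

```python
# Row sums copied from the "Row Sum" column of the table above.
row_sums = {
    "input": 10_312_864,
    "filtered": 10_253_968,
    "denoisedF": 10_111_908,
    "denoisedR": 9_894_345,
    "merged": 9_385_648,
    "nonchim": 7_396_646,
}


def percent_retained(sums):
    """Return the percentage of input reads remaining after each step."""
    total = sums["input"]
    return {step: round(100 * count / total, 2) for step, count in sums.items()}


for step, pct in percent_retained(row_sums).items():
    print(f"{step}\t{pct:.2f}%")
```

The same function can be applied to a single sample's column to see where that sample loses most of its reads.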

This table can be downloaded as an Excel table below:

5. DADA2 Amplicon Sequence Variants (ASVs). A total of 5,883 unique merged and chimera-free ASV sequences were identified. Their read counts for each sample are available in the "ASV Read Count Table", with rows for ASV sequences and columns for samples. This read count table can be used to compare microbial profiles among samples, and the sequences provided in the table can be used for taxonomy assignment.

The table can be downloaded from this link:

Sample Meta Information

Download Sample Meta Information

#Sample ID	Run	BioSample	Experiment	geo_loc_name_country	geo_loc_name_country_continent	geo_loc_name	Library Name	Organism	SampleName	Group
SRR23319523	SRR23319523	SAMN33013297	SRX19262136	China	Asia	China: shijiazhuang	H10	human saliva metagenome	H10	HighCA
SRR23319524	SRR23319524	SAMN33013377	SRX19262135	China	Asia	China: shijiazhuang	L30	human saliva metagenome	L30	LowCA
SRR23319525	SRR23319525	SAMN33013376	SRX19262134	China	Asia	China: shijiazhuang	L29	human saliva metagenome	L29	LowCA
SRR23319526	SRR23319526	SAMN33013375	SRX19262133	China	Asia	China: shijiazhuang	L28	human saliva metagenome	L28	LowCA
SRR23319527	SRR23319527	SAMN33013374	SRX19262132	China	Asia	China: shijiazhuang	L27	human saliva metagenome	L27	LowCA
SRR23319528	SRR23319528	SAMN33013373	SRX19262131	China	Asia	China: shijiazhuang	L26	human saliva metagenome	L26	LowCA
SRR23319529	SRR23319529	SAMN33013372	SRX19262130	China	Asia	China: shijiazhuang	L25	human saliva metagenome	L25	LowCA
SRR23319530	SRR23319530	SAMN33013371	SRX19262129	China	Asia	China: shijiazhuang	L24	human saliva metagenome	L24	LowCA
SRR23319531	SRR23319531	SAMN33013370	SRX19262128	China	Asia	China: shijiazhuang	L23	human saliva metagenome	L23	LowCA
SRR23319532	SRR23319532	SAMN33013369	SRX19262127	China	Asia	China: shijiazhuang	L22	human saliva metagenome	L22	LowCA
SRR23319533	SRR23319533	SAMN33013368	SRX19262126	China	Asia	China: shijiazhuang	L21	human saliva metagenome	L21	LowCA
SRR23319534	SRR23319534	SAMN33013296	SRX19262125	China	Asia	China: shijiazhuang	H9	human saliva metagenome	H9	HighCA
SRR23319535	SRR23319535	SAMN33013367	SRX19262124	China	Asia	China: shijiazhuang	L20	human saliva metagenome	L20	LowCA
SRR23319536	SRR23319536	SAMN33013366	SRX19262123	China	Asia	China: shijiazhuang	L19	human saliva metagenome	L19	LowCA
SRR23319537	SRR23319537	SAMN33013365	SRX19262122	China	Asia	China: shijiazhuang	L18	human saliva metagenome	L18	LowCA
SRR23319538	SRR23319538	SAMN33013364	SRX19262121	China	Asia	China: shijiazhuang	L17	human saliva metagenome	L17	LowCA
SRR23319539	SRR23319539	SAMN33013363	SRX19262120	China	Asia	China: shijiazhuang	L16	human saliva metagenome	L16	LowCA
SRR23319540	SRR23319540	SAMN33013362	SRX19262119	China	Asia	China: shijiazhuang	L15	human saliva metagenome	L15	LowCA
SRR23319541	SRR23319541	SAMN33013361	SRX19262118	China	Asia	China: shijiazhuang	L14	human saliva metagenome	L14	LowCA
SRR23319542	SRR23319542	SAMN33013360	SRX19262117	China	Asia	China: shijiazhuang	L13	human saliva metagenome	L13	LowCA
SRR23319543	SRR23319543	SAMN33013359	SRX19262116	China	Asia	China: shijiazhuang	L12	human saliva metagenome	L12	LowCA
SRR23319544	SRR23319544	SAMN33013358	SRX19262115	China	Asia	China: shijiazhuang	L11	human saliva metagenome	L11	LowCA
SRR23319545	SRR23319545	SAMN33013295	SRX19262114	China	Asia	China: shijiazhuang	H8	human saliva metagenome	H8	HighCA
SRR23319546	SRR23319546	SAMN33013357	SRX19262113	China	Asia	China: shijiazhuang	L10	human saliva metagenome	L10	LowCA
SRR23319547	SRR23319547	SAMN33013356	SRX19262112	China	Asia	China: shijiazhuang	L9	human saliva metagenome	L9	LowCA
SRR23319548	SRR23319548	SAMN33013355	SRX19262111	China	Asia	China: shijiazhuang	L8	human saliva metagenome	L8	LowCA
SRR23319549	SRR23319549	SAMN33013354	SRX19262110	China	Asia	China: shijiazhuang	L7	human saliva metagenome	L7	LowCA
SRR23319550	SRR23319550	SAMN33013353	SRX19262109	China	Asia	China: shijiazhuang	L6	human saliva metagenome	L6	LowCA
SRR23319551	SRR23319551	SAMN33013352	SRX19262108	China	Asia	China: shijiazhuang	L5	human saliva metagenome	L5	LowCA
SRR23319552	SRR23319552	SAMN33013351	SRX19262107	China	Asia	China: shijiazhuang	L4	human saliva metagenome	L4	LowCA
SRR23319553	SRR23319553	SAMN33013350	SRX19262106	China	Asia	China: shijiazhuang	L3	human saliva metagenome	L3	LowCA
SRR23319554	SRR23319554	SAMN33013349	SRX19262105	China	Asia	China: shijiazhuang	L2	human saliva metagenome	L2	LowCA
SRR23319555	SRR23319555	SAMN33013348	SRX19262104	China	Asia	China: shijiazhuang	L1	human saliva metagenome	L1	LowCA
SRR23319556	SRR23319556	SAMN33013294	SRX19262103	China	Asia	China: shijiazhuang	H7	human saliva metagenome	H7	HighCA
SRR23319557	SRR23319557	SAMN33013347	SRX19262102	China	Asia	China: shijiazhuang	M30	human saliva metagenome	M30	MediumCA
SRR23319558	SRR23319558	SAMN33013346	SRX19262101	China	Asia	China: shijiazhuang	M29	human saliva metagenome	M29	MediumCA
SRR23319559	SRR23319559	SAMN33013345	SRX19262100	China	Asia	China: shijiazhuang	M28	human saliva metagenome	M28	MediumCA
SRR23319560	SRR23319560	SAMN33013344	SRX19262099	China	Asia	China: shijiazhuang	M27	human saliva metagenome	M27	MediumCA
SRR23319561	SRR23319561	SAMN33013343	SRX19262098	China	Asia	China: shijiazhuang	M26	human saliva metagenome	M26	MediumCA
SRR23319562	SRR23319562	SAMN33013342	SRX19262097	China	Asia	China: shijiazhuang	M25	human saliva metagenome	M25	MediumCA
SRR23319563	SRR23319563	SAMN33013341	SRX19262096	China	Asia	China: shijiazhuang	M24	human saliva metagenome	M24	MediumCA
SRR23319564	SRR23319564	SAMN33013340	SRX19262095	China	Asia	China: shijiazhuang	M23	human saliva metagenome	M23	MediumCA
SRR23319565	SRR23319565	SAMN33013339	SRX19262094	China	Asia	China: shijiazhuang	M22	human saliva metagenome	M22	MediumCA
SRR23319566	SRR23319566	SAMN33013338	SRX19262093	China	Asia	China: shijiazhuang	M21	human saliva metagenome	M21	MediumCA
SRR23319567	SRR23319567	SAMN33013293	SRX19262092	China	Asia	China: shijiazhuang	H6	human saliva metagenome	H6	HighCA
SRR23319568	SRR23319568	SAMN33013337	SRX19262091	China	Asia	China: shijiazhuang	M20	human saliva metagenome	M20	MediumCA
SRR23319569	SRR23319569	SAMN33013336	SRX19262090	China	Asia	China: shijiazhuang	M19	human saliva metagenome	M19	MediumCA
SRR23319570	SRR23319570	SAMN33013335	SRX19262089	China	Asia	China: shijiazhuang	M18	human saliva metagenome	M18	MediumCA
SRR23319571	SRR23319571	SAMN33013334	SRX19262088	China	Asia	China: shijiazhuang	M17	human saliva metagenome	M17	MediumCA
SRR23319572	SRR23319572	SAMN33013333	SRX19262087	China	Asia	China: shijiazhuang	M16	human saliva metagenome	M16	MediumCA
SRR23319573	SRR23319573	SAMN33013332	SRX19262086	China	Asia	China: shijiazhuang	M15	human saliva metagenome	M15	MediumCA
SRR23319574	SRR23319574	SAMN33013331	SRX19262085	China	Asia	China: shijiazhuang	M14	human saliva metagenome	M14	MediumCA
SRR23319575	SRR23319575	SAMN33013330	SRX19262084	China	Asia	China: shijiazhuang	M13	human saliva metagenome	M13	MediumCA
SRR23319576	SRR23319576	SAMN33013329	SRX19262083	China	Asia	China: shijiazhuang	M12	human saliva metagenome	M12	MediumCA
SRR23319577	SRR23319577	SAMN33013328	SRX19262082	China	Asia	China: shijiazhuang	M11	human saliva metagenome	M11	MediumCA
SRR23319578	SRR23319578	SAMN33013292	SRX19262081	China	Asia	China: shijiazhuang	H5	human saliva metagenome	H5	HighCA
SRR23319579	SRR23319579	SAMN33013327	SRX19262080	China	Asia	China: shijiazhuang	M10	human saliva metagenome	M10	MediumCA
SRR23319580	SRR23319580	SAMN33013326	SRX19262079	China	Asia	China: shijiazhuang	M9	human saliva metagenome	M9	MediumCA
SRR23319581	SRR23319581	SAMN33013325	SRX19262078	China	Asia	China: shijiazhuang	M8	human saliva metagenome	M8	MediumCA
SRR23319582	SRR23319582	SAMN33013324	SRX19262077	China	Asia	China: shijiazhuang	M7	human saliva metagenome	M7	MediumCA
SRR23319583	SRR23319583	SAMN33013323	SRX19262076	China	Asia	China: shijiazhuang	M6	human saliva metagenome	M6	MediumCA
SRR23319584	SRR23319584	SAMN33013322	SRX19262075	China	Asia	China: shijiazhuang	M5	human saliva metagenome	M5	MediumCA
SRR23319585	SRR23319585	SAMN33013321	SRX19262074	China	Asia	China: shijiazhuang	M4	human saliva metagenome	M4	MediumCA
SRR23319586	SRR23319586	SAMN33013320	SRX19262073	China	Asia	China: shijiazhuang	M3	human saliva metagenome	M3	MediumCA
SRR23319587	SRR23319587	SAMN33013319	SRX19262072	China	Asia	China: shijiazhuang	M2	human saliva metagenome	M2	MediumCA
SRR23319588	SRR23319588	SAMN33013318	SRX19262071	China	Asia	China: shijiazhuang	M1	human saliva metagenome	M1	MediumCA
SRR23319589	SRR23319589	SAMN33013291	SRX19262070	China	Asia	China: shijiazhuang	H4	human saliva metagenome	H4	HighCA
SRR23319590	SRR23319590	SAMN33013317	SRX19262069	China	Asia	China: shijiazhuang	H30	human saliva metagenome	H30	HighCA
SRR23319591	SRR23319591	SAMN33013316	SRX19262068	China	Asia	China: shijiazhuang	H29	human saliva metagenome	H29	HighCA
SRR23319592	SRR23319592	SAMN33013315	SRX19262067	China	Asia	China: shijiazhuang	H28	human saliva metagenome	H28	HighCA
SRR23319593	SRR23319593	SAMN33013314	SRX19262066	China	Asia	China: shijiazhuang	H27	human saliva metagenome	H27	HighCA
SRR23319594	SRR23319594	SAMN33013313	SRX19262065	China	Asia	China: shijiazhuang	H26	human saliva metagenome	H26	HighCA
SRR23319595	SRR23319595	SAMN33013312	SRX19262064	China	Asia	China: shijiazhuang	H25	human saliva metagenome	H25	HighCA
SRR23319596	SRR23319596	SAMN33013311	SRX19262063	China	Asia	China: shijiazhuang	H24	human saliva metagenome	H24	HighCA
SRR23319597	SRR23319597	SAMN33013310	SRX19262062	China	Asia	China: shijiazhuang	H23	human saliva metagenome	H23	HighCA
SRR23319598	SRR23319598	SAMN33013309	SRX19262061	China	Asia	China: shijiazhuang	H22	human saliva metagenome	H22	HighCA
SRR23319599	SRR23319599	SAMN33013308	SRX19262060	China	Asia	China: shijiazhuang	H21	human saliva metagenome	H21	HighCA
SRR23319600	SRR23319600	SAMN33013290	SRX19262059	China	Asia	China: shijiazhuang	H3	human saliva metagenome	H3	HighCA
SRR23319601	SRR23319601	SAMN33013307	SRX19262058	China	Asia	China: shijiazhuang	H20	human saliva metagenome	H20	HighCA
SRR23319602	SRR23319602	SAMN33013306	SRX19262057	China	Asia	China: shijiazhuang	H19	human saliva metagenome	H19	HighCA
SRR23319603	SRR23319603	SAMN33013305	SRX19262056	China	Asia	China: shijiazhuang	H18	human saliva metagenome	H18	HighCA
SRR23319604	SRR23319604	SAMN33013304	SRX19262055	China	Asia	China: shijiazhuang	H17	human saliva metagenome	H17	HighCA
SRR23319605	SRR23319605	SAMN33013303	SRX19262054	China	Asia	China: shijiazhuang	H16	human saliva metagenome	H16	HighCA
SRR23319606	SRR23319606	SAMN33013302	SRX19262053	China	Asia	China: shijiazhuang	H15	human saliva metagenome	H15	HighCA
SRR23319607	SRR23319607	SAMN33013301	SRX19262052	China	Asia	China: shijiazhuang	H14	human saliva metagenome	H14	HighCA
SRR23319608	SRR23319608	SAMN33013300	SRX19262051	China	Asia	China: shijiazhuang	H13	human saliva metagenome	H13	HighCA
SRR23319609	SRR23319609	SAMN33013299	SRX19262050	China	Asia	China: shijiazhuang	H12	human saliva metagenome	H12	HighCA
SRR23319610	SRR23319610	SAMN33013298	SRX19262049	China	Asia	China: shijiazhuang	H11	human saliva metagenome	H11	HighCA
SRR23319611	SRR23319611	SAMN33013289	SRX19262048	China	Asia	China: shijiazhuang	H2	human saliva metagenome	H2	HighCA
SRR23319612	SRR23319612	SAMN33013288	SRX19262047	China	Asia	China: shijiazhuang	H1	human saliva metagenome	H1	HighCA

ASV Read Counts by Samples

#Sample ID	Read Count
SRR23319597	47,176
SRR23319603	49,298
SRR23319588	53,145
SRR23319587	57,583
SRR23319582	59,191
SRR23319572	62,245
SRR23319590	62,627
SRR23319595	63,520
SRR23319534	64,011
SRR23319606	66,569
SRR23319593	68,601
SRR23319583	68,693
SRR23319605	69,236
SRR23319584	69,433
SRR23319585	69,473
SRR23319566	69,651
SRR23319556	70,033
SRR23319609	70,057
SRR23319573	70,936
SRR23319538	71,534
SRR23319574	73,601
SRR23319570	73,603
SRR23319535	75,960
SRR23319591	76,042
SRR23319576	76,770
SRR23319531	77,480
SRR23319549	77,728
SRR23319602	77,974
SRR23319610	78,615
SRR23319577	79,605
SRR23319565	79,647
SRR23319598	80,645
SRR23319601	81,166
SRR23319599	81,862
SRR23319581	82,049
SRR23319608	82,442
SRR23319575	82,915
SRR23319580	83,220
SRR23319533	83,712
SRR23319548	84,259
SRR23319571	84,499
SRR23319604	84,698
SRR23319523	84,767
SRR23319600	84,990
SRR23319537	85,921
SRR23319586	86,149
SRR23319579	86,218
SRR23319596	86,277
SRR23319568	86,424
SRR23319558	87,469
SRR23319542	87,642
SRR23319612	87,864
SRR23319555	88,801
SRR23319559	88,989
SRR23319539	89,296
SRR23319553	89,471
SRR23319563	89,847
SRR23319578	90,046
SRR23319541	92,229
SRR23319564	92,467
SRR23319526	92,810
SRR23319569	92,845
SRR23319528	94,774
SRR23319552	95,242
SRR23319607	95,329
SRR23319540	95,428
SRR23319545	96,402
SRR23319589	96,945
SRR23319532	97,773
SRR23319557	98,195
SRR23319550	98,554
SRR23319560	99,480
SRR23319611	101,082
SRR23319544	102,117
SRR23319527	102,291
SRR23319562	102,997
SRR23319529	105,002
SRR23319561	105,233
SRR23319536	105,429
SRR23319525	107,823
SRR23319547	108,706
SRR23319554	109,014
SRR23319567	109,280
SRR23319524	109,894
SRR23319543	114,065
SRR23319546	114,840
SRR23319530	118,725

VII. Analysis - Read Taxonomy Assignment

Read Taxonomy Assignment - Methods

The closed-reference taxonomy assignment of the ASV sequences using BLASTN is based on the algorithm published by Al-Hebshi et al. (2015)[2].

The species-level, open-reference 16S rRNA NGS reads taxonomy assignment pipeline

Version 20210310a

1. Raw sequence reads in FASTA format were BLASTN-searched against a combined set of 16S rRNA reference sequences - the FOMC 16S rRNA Reference Sequences version 20221029 (https://microbiome.forsyth.org/ftp/refseq/). This set consists of the HOMD (version 15.22 http://www.homd.org/index.php?name=seqDownload&file&type=R ), Mouse Oral Microbiome Database (MOMD version 5.1 https://momd.org/ftp/16S_rRNA_refseq/MOMD_16S_rRNA_RefSeq/V5.1/), and the NCBI 16S rRNA reference sequence set (https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz). These sequences were screened and combined to remove short sequences (<1000 nt), chimeras, duplicates, and sub-sequences, as well as sequences with poor taxonomy annotation (e.g., without species information). This process resulted in 1,015 full-length 16S rRNA sequences from HOMD V15.22, 356 from MOMD V5.1, and 22,126 from NCBI, a total of 23,497 sequences. Altogether, these sequences represent a total of 17,035 oral and non-oral microbial species.

The NCBI BLASTN version 2.7.1+ (Zhang et al., 2000)[3] was used with the default parameters. Reads with ≥ 98% sequence identity to the matched reference and ≥ 90% alignment length (i.e., ≥ 90% of the read length was aligned to the reference and used to calculate the percent sequence identity) were classified based on the taxonomy of the reference sequence with the highest sequence identity. If a read matched reference sequences representing more than one species with equal percent identity and alignment length, it was subjected to chimera checking with the USEARCH program version v8.1.1861 (Edgar 2010)[4]. Non-chimeric reads with multi-species best hits were considered valid and were assigned a unique species notation (e.g., spp) denoting unresolvable multiple species.
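The classification rule in step 1 can be sketched in Python as follows. This is a minimal illustration only, not the pipeline's actual code: the Hit record, the classify_read function, and the tie-breaking order (highest identity first, then longest alignment) are assumptions made for the example. The 98% identity and 90% alignment-length cutoffs are the ones stated above.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One BLASTN hit for a read (hypothetical record layout)."""
    species: str         # species of the matched reference sequence
    pct_identity: float  # percent identity over the aligned region
    aln_len: int         # number of read bases aligned to the reference


MIN_IDENTITY = 98.0      # percent identity cutoff stated in the report
MIN_COVERAGE_PCT = 90    # aligned fraction of the read, in percent


def classify_read(read_len: int, hits: List[Hit]) -> Tuple[str, List[str]]:
    """Return (status, species) where status is 'single', 'multi' or 'unassigned'."""
    # Keep only hits that pass both cutoffs (integer math avoids float rounding).
    passing = [
        h for h in hits
        if h.pct_identity >= MIN_IDENTITY
        and h.aln_len * 100 >= MIN_COVERAGE_PCT * read_len
    ]
    if not passing:
        return ("unassigned", [])
    # Assumed tie-break: best identity first, then longest alignment.
    best = max(passing, key=lambda h: (h.pct_identity, h.aln_len))
    top = sorted({
        h.species for h in passing
        if (h.pct_identity, h.aln_len) == (best.pct_identity, best.aln_len)
    })
    # More than one species tied for best: candidate for chimera checking.
    return ("single", top) if len(top) == 1 else ("multi", top)
```

A read whose best hits tie across several species comes back as 'multi'. In the workflow described above, such a read would then go to chimera checking before receiving a multi-species label.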

2. Unassigned reads (i.e., reads with < 98% identity or < 90% alignment length) were pooled together, and reads < 200 bases were removed. The remaining reads were subjected to de novo operational taxonomic unit (OTU) calling and chimera checking using the USEARCH program version v8.1.1861 (Edgar 2010)[4]. The de novo OTU calling and chimera checking were done using 98% as the sequence identity cutoff, i.e., the species-level OTU. This step produced species-level de novo clustered OTUs with 98% identity. Representative reads from each of the OTUs/species were then BLASTN-searched against the same reference sequence set again to determine the closest species for these potential novel species. These potential novel species were pooled together with the reads that were assigned at the species level in the previous step for downstream analyses.
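The pooling and length filter at the start of step 2 can be sketched as below. This is a minimal illustration under stated assumptions: the function name pool_unassigned and the dictionary layout (read ID to sequence) are hypothetical, and the de novo OTU calling and chimera checking done with USEARCH are not reproduced here.

```python
from typing import Dict, Set

MIN_READ_LEN = 200  # reads shorter than 200 bases are removed (from the report)


def pool_unassigned(reads: Dict[str, str], assigned_ids: Set[str]) -> Dict[str, str]:
    """Collect reads that were not assigned in step 1 and drop the short ones.

    reads        -- hypothetical mapping of read ID to nucleotide sequence
    assigned_ids -- IDs of reads already classified at species level
    """
    return {
        read_id: seq
        for read_id, seq in reads.items()
        if read_id not in assigned_ids and len(seq) >= MIN_READ_LEN
    }
```

The returned pool is what would be passed on to de novo clustering at the 98% identity cutoff.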

Reference:

[2] Al-Hebshi NN, Nasher AT, Idris AM, Chen T. Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples. J Oral Microbiol. 2015 Sep 29;7:28934. doi: 10.3402/jom.v7.28934. PMID: 26426306; PMCID: PMC4590409.
[3] Zhang Z, Schwartz S, Wagner L, Miller W. A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000 Feb-Apr;7(1-2):203-14. doi: 10.1089/10665270050081478. PMID: 10890397.
[4] Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PMID: 20709691.

3. Designations used in the taxonomy:

	1) Taxonomy levels are indicated by these prefixes:
	
	   k__: domain/kingdom
	   p__: phylum
	   c__: class
	   o__: order
	   f__: family
	   g__: genus  
	   s__: species
	
	   Example: 
	
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__faecis
		
	2) Unique level identified – known species:
	   
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis
	
	   The above example shows reads that match a single species (all levels are unique).
	
	3) Non-unique level identified – known species:

	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3
	   
	   The above example “s__multispecies_spp123_3” indicates that certain reads match equally to 3 species of the 
	   genus Roseburia; “spp123” is a temporarily assigned species ID.
	
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__multigenus;s__multispecies_spp234_5
	   
	   The above example indicates that certain reads match equally to 5 different species, which belong to multiple genera; 
	   “spp234” is a temporarily assigned species ID.
	
	4) Unique level identified – unknown species, potential novel species:
	   
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis_nov_97%
	   
	   The above example indicates that some reads have no match to any of the reference sequences with 
	   sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. However, this group 
	   of reads (actually the representative read from a de novo OTU) has 97% identity to 
	   Roseburia hominis; thus this is a potential novel species, closest to Roseburia hominis 
	   (but not the same species).
	
	5) Multiple level identified – unknown species, potential novel species:
	   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_sppn123_3_nov_96%
	
	   The above example indicates that some reads have no match to any of the reference sequences 
	   with sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. 
	   However, this group of reads (actually the representative read from a de novo OTU) 
	   has 96% identity equally to 3 species in Roseburia. Thus there is no single 
	   closest species; instead, this group of reads matches equally to multiple species at 96%. 
	   Since they have passed the chimera check, they represent a potential novel species. “sppn123” is a 
	   temporary ID for this potential novel species.
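The designations in item 3 can be read programmatically. The following Python sketch is an illustration only: the function names and regular expressions are assumptions derived from the example strings above, not part of the pipeline, and it handles only the prefixed form (k__ ... s__) shown in those examples.

```python
import re
from typing import Dict, Optional

# Prefix-to-level mapping as listed in item 3.1 above.
PREFIXES = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
            "f": "family", "g": "genus", "s": "species"}

# Patterns inferred from the examples (assumptions, not a published grammar).
NOVEL_RE = re.compile(r"^(?P<label>.+)_nov_(?P<identity>\d+(?:\.\d+)?)%$")
MULTI_RE = re.compile(r"^multispecies_sppn?\d+_(?P<count>\d+)$")


def parse_taxonomy(lineage: str) -> Dict[str, str]:
    """Split 'k__...;p__...;...;s__...' into a level-to-name mapping."""
    levels = {}
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        levels[PREFIXES[prefix]] = name.strip()
    return levels


def describe_species(species: str) -> Dict[str, Optional[object]]:
    """Report whether a species label is novel and how many species it spans."""
    novel = NOVEL_RE.match(species)
    label = novel.group("label") if novel else species
    multi = MULTI_RE.match(label)
    return {
        "label": label,
        "novel": novel is not None,
        # Percent identity to the closest reference, present only for novel labels.
        "closest_identity": float(novel.group("identity")) if novel else None,
        "n_species": int(multi.group("count")) if multi else 1,
    }
```

For example, describe_species applied to the species part of the last example above reports a novel label with a closest identity of 96.0 and 3 tied species.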

4. The taxonomy assignment algorithm is illustrated in the flow chart below:

Read Taxonomy Assignment - Result Summary *

Code	Category	MPC=0% (>=1 read)	MPC=0.01% (>=578 reads)
A	Total reads	7,396,646	7,396,646
B	Total assigned reads	5,780,215	5,780,215
C	Assigned reads in species with read count < MPC	0	23,465
D	Assigned reads in samples with read count < 500	0	0
E	Total samples	87	87
F	Samples with reads >= 500	87	87
G	Samples with reads < 500	0	0
H	Total assigned reads used for analysis (B-C-D)	5,780,215	5,756,750
I	Reads assigned to single species	4,043,997	4,027,007
J	Reads assigned to multiple species	1,447,517	1,442,362
K	Reads assigned to novel species	288,701	287,381
L	Total number of species	545	78
M	Number of single species	208	43
N	Number of multi-species	63	7
O	Number of novel species	274	28
P	Total unassigned reads	1,616,431	1,616,431
Q	Chimeric reads	9,902	9,902
R	Reads without BLASTN hits	1,547,585	1,547,585
S	Others: short, low quality, singletons, etc.	58,944	58,944
	A=B+P=C+D+H+Q+R+S
	E=F+G
	B=C+D+H
	H=I+J+K
	L=M+N+O
	P=Q+R+S

* MPC = minimal percent read count per species, expressed as a percentage of all assigned reads. Species with a read count below the MPC were removed.

* Samples with reads < 500 were removed from downstream analyses.

* The assignment result from MPC=0.01% was used in the downstream analyses.
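The MPC cutoff and the accounting identities listed under the table can be checked with a short Python sketch. This is an illustration under stated assumptions: the function names are hypothetical, and rounding the cutoff to a whole read (with a floor of 1 read) is assumed so that it matches the ">=1 read" and ">=578 reads" column headers. The figures in the summary dictionary are copied from the MPC=0.01% column above.

```python
from typing import Dict, Tuple


def mpc_min_reads(total_assigned_reads: int, mpc_percent: float) -> int:
    """Minimum reads a species needs to be kept (assumed rounding, floor of 1)."""
    return max(1, round(total_assigned_reads * mpc_percent / 100))


def apply_mpc(species_reads: Dict[str, int], mpc_percent: float) -> Tuple[Dict[str, int], int]:
    """Drop species below the cutoff; return kept species and removed read count."""
    total = sum(species_reads.values())
    cutoff = mpc_min_reads(total, mpc_percent)
    kept = {sp: n for sp, n in species_reads.items() if n >= cutoff}
    return kept, total - sum(kept.values())


def summary_is_consistent(s: Dict[str, int]) -> bool:
    """Check the identities listed under the result summary table."""
    return (
        s["A"] == s["B"] + s["P"]
        and s["A"] == s["C"] + s["D"] + s["H"] + s["Q"] + s["R"] + s["S"]
        and s["E"] == s["F"] + s["G"]
        and s["B"] == s["C"] + s["D"] + s["H"]
        and s["H"] == s["I"] + s["J"] + s["K"]
        and s["L"] == s["M"] + s["N"] + s["O"]
        and s["P"] == s["Q"] + s["R"] + s["S"]
    )


# Values from the MPC=0.01% column of the summary table.
MPC_001 = {
    "A": 7_396_646, "B": 5_780_215, "C": 23_465, "D": 0, "E": 87, "F": 87,
    "G": 0, "H": 5_756_750, "I": 4_027_007, "J": 1_442_362, "K": 287_381,
    "L": 78, "M": 43, "N": 7, "O": 28, "P": 1_616_431, "Q": 9_902,
    "R": 1_547_585, "S": 58_944,
}
```

With these inputs, mpc_min_reads(5_780_215, 0.01) gives 578, which agrees with the ">=578 reads" header, and summary_is_consistent(MPC_001) confirms that the column satisfies all of the listed identities.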

Read Taxonomy Assignment - ASV Species-Level Read Counts Table

This table shows the read counts for each sample (columns) and each species identified based on the ASV sequences. The downstream analyses were based on this table.

SPID	Taxonomy	SRR23319523	SRR23319524	SRR23319525	SRR23319526	SRR23319527	SRR23319528	SRR23319529	SRR23319530	SRR23319531	SRR23319532	SRR23319533	SRR23319534	SRR23319535	SRR23319536	SRR23319537	SRR23319538	SRR23319539	SRR23319540	SRR23319541	SRR23319542	SRR23319543	SRR23319544	SRR23319545	SRR23319546	SRR23319547	SRR23319548	SRR23319549	SRR23319550	SRR23319552	SRR23319553	SRR23319554	SRR23319555	SRR23319556	SRR23319557	SRR23319558	SRR23319559	SRR23319560	SRR23319561	SRR23319562	SRR23319563	SRR23319564	SRR23319565	SRR23319566	SRR23319567	SRR23319568	SRR23319569	SRR23319570	SRR23319571	SRR23319572	SRR23319573	SRR23319574	SRR23319575	SRR23319576	SRR23319577	SRR23319578	SRR23319579	SRR23319580	SRR23319581	SRR23319582	SRR23319583	SRR23319584	SRR23319585	SRR23319586	SRR23319587	SRR23319588	SRR23319589	SRR23319590	SRR23319591	SRR23319593	SRR23319595	SRR23319596	SRR23319597	SRR23319598	SRR23319599	SRR23319600	SRR23319601	SRR23319602	SRR23319603	SRR23319604	SRR23319605	SRR23319606	SRR23319607	SRR23319608	SRR23319609	SRR23319610	SRR23319611	SRR23319612
SP102	Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;flavescens	0	412	62	51	750	1139	217	2290	526	4590	0	19	0	113	70	77	329	0	71	56	15	63	159	156	0	0	30	19	32	0	0	34	161	337	0	2036	0	0	127	278	8161	87	328	90	17	88	608	0	0	433	36	252	0	324	0	626	0	364	82	79	729	306	978	532	359	17	51	43	0	67	8	0	42	672	89	0	110	49	437	0	246	507	0	1963	1027	19	2828
SP105	Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;perflava	345	1829	8999	5947	5150	16582	3563	0	18393	7860	8318	2019	14408	839	6675	2314	3095	9724	6456	1379	3157	2128	13930	2968	1401	4336	339	8402	3768	835	13133	3785	804	3147	4369	1982	544	1156	9014	1253	9365	10102	12491	7159	7949	6446	1865	6401	215	10020	5893	1508	26270	139	15158	7573	18473	1590	7453	1119	1014	1134	2164	2016	53	3514	4008	11685	2672	4712	9298	6195	3616	6998	4267	922	433	60	8776	979	6808	8273	859	4677	6584	5895	207
SP125	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT914	0	412	0	0	49	0	0	0	0	0	0	71	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	24	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	109	17	0	0	0	0	0	0	0	0	0	0	0	0	43	0	14	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	19
SP129	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT218	0	1300	0	635	147	0	42	433	0	39	815	2367	0	208	0	0	29	738	206	82	0	1022	0	1116	2912	0	0	329	0	0	0	0	314	153	0	0	0	965	70	0	0	0	0	1545	0	3324	75	0	4611	181	853	0	0	1754	0	438	22	0	0	1151	0	0	2652	18	0	0	0	33	4234	728	91	0	0	195	0	330	0	0	105	0	1182	0	0	0	0	1895	158
SP138	Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;parainfluenzae	723	1204	1511	616	2603	5014	6222	6927	5264	3160	1632	486	1239	791	5051	1521	2709	4167	2326	3027	1475	3562	8276	1224	3148	954	1836	2542	5409	1655	9877	926	422	409	859	5493	454	388	649	7166	9410	3786	2345	5098	1994	4880	4260	862	282	3242	1113	1726	3584	3889	1258	1391	3969	3567	2085	3172	657	3457	1095	758	3000	2466	1968	1346	674	1251	570	3648	1863	914	2094	1317	7118	318	653	618	3907	1201	1240	5938	1285	1547	1056
SP148	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;sp. HMT278	0	0	29	0	0	58	0	0	0	0	0	0	0	63	0	0	0	0	96	0	0	71	0	0	0	0	0	32	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	159	0	0	0	0	0	0	0	0	0	0	26	0	0	0	0	48	0	0	0	0	0
SP164	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;rava	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	43	0	0	0	0	0	21	15	150	0	92	0	0	31	0	0	0	0	0	9	0	0	0	0	0	0	0	0	0	0	39	0	0	87	0	0	0	0	88	0	84	0	0	0	0	0	0	0	0	0	0	0	397	176	0	0	0	4	0	0	0	0	71	16	0	0	0	30	15	0	65	0
SP169	Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Granulicatella;elegans	0	278	307	65	689	730	656	940	61	1421	58	18	218	551	494	77	265	978	348	360	475	2545	335	89	248	1133	70	1217	992	2547	97	407	117	94	31	221	390	12	808	918	964	38	191	973	98	136	396	0	230	358	358	454	74	412	154	48	43	1044	44	58	278	2356	834	113	770	886	87	263	209	83	336	9	47	492	114	59	160	13	180	209	40	182	92	142	708	507	517
SP206	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;histicola	3345	307	83	240	0	87	132	133	394	568	0	120	77	1684	204	0	2305	937	1163	423	809	3756	247	1872	3314	393	263	1635	1001	702	380	1890	922	1177	318	127	673	1187	495	272	435	631	491	928	418	540	334	357	701	58	1393	1703	214	809	942	391	286	1096	2670	1677	2252	218	666	467	115	427	631	1176	765	681	0	533	254	606	1126	580	0	730	985	2507	164	536	1285	797	890	1097	795
SP208	Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;sp. HMT172	0	54	4158	393	1511	1307	28	337	0	5830	2980	1610	0	290	0	110	1056	571	313	0	2024	37	0	4237	0	129	392	1731	94	0	684	4739	1148	1077	251	1170	0	2470	2671	133	0	1329	330	1477	3	2013	999	0	3343	827	338	576	543	693	533	1322	952	68	0	3594	141	75	3649	1831	0	1213	834	252	307	2781	708	27	68	288	2393	0	0	890	3112	0	950	0	736	43	30	0	30
SP219	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT215	0	17	0	0	27	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	31	0	0	0	0	0	4	0	0	43	0	2	0	8	0	211	0	250	26	0	0	0	0	1	0	0	0	0	0	0	0	29	0	0	33	0	0	0	19	0	0	0	0	0	0	126	0	0	0	0	2	0	0	0	0	17	0	0	0	0	0	0	0	46	0
SP23	Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Rothia;mucilaginosa	815	2099	5583	6944	6154	4130	1861	19393	2509	3538	5017	1964	3020	1024	11790	3483	7408	3427	6552	6300	5361	4854	3677	4133	2958	5226	9560	9304	6317	8046	4186	5460	1497	9067	13167	4025	10436	15441	10888	5704	6776	5975	3174	11441	9810	6217	3257	3612	3481	630	3044	1106	10426	2842	4768	5859	7889	7544	6779	4250	6239	1479	10357	5792	2370	4759	3114	2087	2204	8640	5064	2919	4766	10860	5517	3195	5859	1691	3996	6436	3552	6043	12640	7319	3485	8288	2203
SP233	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT306	2371	29	0	507	0	0	228	0	3	148	260	244	0	201	42	246	690	1031	0	0	2104	0	0	1901	836	256	0	205	790	48	0	188	192	127	0	0	560	0	1121	243	70	0	643	68	330	959	46	94	532	119	211	0	65	256	0	11	0	101	0	3	656	82	249	9	0	382	0	434	364	329	393	513	12	208	417	811	0	749	0	201	0	270	421	0	0	1176	550
SP235	Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Ralstonia;pickettii	65	36	417	72	992	84	193	241	33	230	45	0	27	6082	40	8	8	1250	62	168	216	763	173	547	81	342	0	80	177	735	190	62	0	25	21	14	45	69	1427	35	54	0	7	79	166	45	29	50	7	468	65	175	11	650	96	57	766	139	13	781	99	1080	123	140	28	471	26	78	12	272	199	0	12	165	53	5020	188	0	113	443	21	689	1143	211	129	151	477
SP243	Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;sp. HMT180	0	630	1717	1890	2414	1089	198	734	490	322	83	732	3798	1	642	1140	1143	217	9	57	136	0	52	702	0	72	1380	308	1105	91	52	1337	1726	1611	188	28	0	39	63	41	425	1845	91	339	60	1173	294	7004	1536	477	313	7	169	300	753	918	36	469	666	1958	995	1233	573	1261	153	1248	26	1701	1088	428	10	247	271	194	333	1016	0	1661	241	107	2074	43	295	946	218	3306	563
SP25	Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;sputigena	79	0	0	132	0	0	0	0	0	9	86	0	0	0	0	11	0	7	0	0	55	0	7	617	0	0	0	0	0	0	0	0	0	41	0	0	0	0	0	0	6	0	0	0	0	0	11	0	8	0	0	0	0	33	0	0	0	0	0	0	22	0	33	0	0	28	0	0	0	18	11	0	0	0	0	0	0	61	0	0	0	0	0	0	0	11	0
SP255	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;jejuni	72	938	482	324	0	0	23	30	206	0	669	1098	88	0	0	248	289	0	117	49	0	245	0	254	134	49	62	245	347	0	32	34	170	282	0	0	204	4542	41	59	37	0	0	144	128	197	83	105	439	315	187	98	0	0	162	311	0	106	0	0	1527	83	472	961	0	145	40	260	1071	187	211	90	672	283	291	1115	0	274	290	152	344	209	43	46	0	457	0
SP27	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;melaninogenica	55	5077	1272	2866	2023	737	1647	1917	1336	141	1290	948	1520	240	1249	2438	1815	972	1956	1678	3843	4597	1237	3644	6239	903	2057	2273	1361	1328	1124	7840	551	3901	468	604	10316	2137	1957	749	1054	1548	5959	2542	945	2033	3408	6808	2630	1998	4637	5199	325	1071	3108	4443	1903	529	2322	1432	2652	1837	5501	2046	317	1766	1511	2527	862	1235	866	950	1714	2514	590	694	93	667	1142	1200	1033	1901	1570	1062	2371	3723	2647
SP270	Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Atopobiaceae;Lancefieldella;parvula	2391	789	563	1376	255	34	87	0	186	264	66	414	614	72	26	184	410	126	73	203	175	536	0	1063	784	0	95	431	632	141	0	950	4083	473	19	0	86	0	145	355	67	122	37	189	203	175	29	298	0	447	546	310	21	163	86	229	23	237	67	164	903	383	148	105	47	169	130	240	198	47	750	92	710	82	938	1712	11	922	239	0	74	220	88	93	618	261	581
SP28	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;nanceiensis	43	1196	362	737	1259	320	537	1694	535	332	697	244	368	0	311	195	480	227	759	477	321	0	70	846	45	0	709	531	1400	244	2903	1303	92	410	3256	90	262	192	359	33	2502	537	102	1379	697	756	103	182	135	73	712	371	903	433	4205	119	2397	777	988	168	308	265	561	427	304	439	0	288	170	349	96	100	915	1367	229	65	952	14	647	0	1508	843	252	518	10	446	302
SP281	Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Burkholderiaceae;Lautropia;mirabilis	0	485	88	316	151	534	2216	330	77	0	219	90	109	85	0	0	78	0	397	1418	363	770	3278	1188	0	0	64	294	0	0	326	0	0	346	743	74	37	0	127	1225	314	39	51	711	74	171	236	131	35	23	879	0	0	641	0	0	167	237	150	81	83	0	0	139	633	111	0	41	31	422	55	46	33	273	38	0	40	46	487	142	0	2235	0	0	140	332	866
SP33	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT212	0	0	25	0	0	38	108	106	49	0	0	0	0	203	0	0	0	87	0	0	68	0	0	2	0	50	0	0	8	0	0	0	0	0	0	21	0	0	0	0	0	18	0	0	0	0	15	0	0	21	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	13	0	0	0	0	0	0	9	0	0	0	0	0	0	0	5	0	0	0	0	0	0
SP338	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;pallens	439	129	200	291	0	3	38	381	0	0	13	284	36	0	0	0	0	8	1516	0	499	219	0	2516	1676	0	239	381	0	0	1265	334	435	1017	21	95	0	486	0	37	0	667	809	43	538	497	198	379	0	519	0	161	164	578	0	890	352	67	298	0	298	5	0	0	0	0	0	8	196	140	0	410	163	271	1186	234	0	262	183	0	542	119	179	257	0	529	0
SP36	Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;subflava	0	0	621	0	651	0	2524	6261	129	0	0	71	0	33	698	0	0	0	0	0	0	0	0	0	8431	0	0	0	0	5	0	0	0	205	0	0	0	0	97	0	0	0	0	113	177	0	0	0	0	0	0	213	29	0	0	0	0	11	0	0	27	2378	0	0	0	2395	0	37	0	0	0	18	0	0	36	0	8409	0	0	0	0	112	0	0	64	0	2975
SP369	Bacteria;Firmicutes;Clostridia;Eubacteriales;Lachnospiraceae;Lachnospiraceae_[G-2];bacterium HMT096	8571	90	11	167	77	0	57	25	0	49	144	499	53	60	16	1928	377	351	145	0	1477	91	74	841	384	269	627	703	81	0	10	74	90	487	102	78	577	0	239	0	11	7	601	106	86	159	19	0	179	82	91	0	88	189	140	196	0	8	220	41	503	11	283	184	13	58	87	197	591	352	225	73	0	125	304	678	0	516	58	587	55	0	42	0	350	450	224
SP370	Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-1];bacterium HMT352	0	942	2839	3086	2754	517	1016	1979	35	884	2640	1100	280	54	79	553	2185	58	523	0	923	474	0	6158	2045	772	583	3225	876	20	571	2495	613	4060	0	39	0	3734	1620	0	289	1505	666	1216	0	1143	173	1427	144	3465	793	675	43	1686	2100	456	1819	36	203	786	304	338	1945	1717	31	957	49	396	480	434	241	292	444	637	722	146	108	398	1366	54	268	0	400	327	635	2290	2042
SP371	Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;concisus	450	634	1163	223	141	117	220	51	115	584	464	274	393	118	96	521	796	180	401	213	740	264	602	184	346	105	145	415	254	201	625	184	360	42	235	1121	196	511	342	176	184	300	278	210	261	390	238	853	181	105	376	270	260	223	346	266	134	203	88	115	494	76	178	119	69	223	268	518	517	60	416	571	748	434	410	503	50	335	344	90	232	262	360	82	1036	342	281
SP382	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudoleptotrichia;sp. HMT221	496	201	0	203	88	0	185	52	0	81	1483	94	4000	154	267	0	43	80	233	0	166	80	1144	469	387	46	0	139	173	21	45	0	80	327	0	83	554	0	801	0	450	389	56	95	0	195	151	71	258	80	149	234	217	0	474	198	0	426	0	0	187	0	46	603	154	206	74	550	925	95	237	136	307	104	637	652	79	410	197	63	134	0	591	72	305	296	866
SP393	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sp. HMT074	361	577	143	813	1602	992	512	188	1099	466	461	128	387	0	217	340	152	487	477	254	440	957	1688	301	384	525	683	597	325	1143	324	871	465	188	177	614	515	205	584	755	317	1084	947	652	275	142	0	1248	677	838	472	0	1364	167	974	236	1000	573	111	717	490	324	473	1177	0	624	52	639	192	417	536	458	1020	301	216	501	511	222	451	300	765	256	934	807	385	210	748
SP43	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;sanguinis	0	607	423	705	249	474	1349	868	103	633	99	84	23	2322	207	186	85	1072	157	2144	1302	1084	847	1524	809	672	139	251	267	813	2037	236	36	96	118	73	180	836	177	1379	262	30	27	218	189	198	171	1497	0	183	366	879	195	2430	61	283	100	333	81	254	156	26	188	160	456	395	170	81	0	786	316	0	128	516	111	537	148	0	433	132	66	969	452	126	194	187	287
SP44	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;salivae	1224	646	155	358	35	40	61	135	213	64	227	690	255	134	60	600	79	155	436	76	521	655	27	1964	1344	188	320	229	150	88	53	374	434	522	340	55	982	188	268	87	243	92	236	173	330	235	168	448	154	153	165	266	86	120	319	383	15	62	1657	75	365	81	212	536	0	150	186	256	686	164	365	107	180	203	302	447	0	532	155	351	153	156	286	167	222	693	183
SP45	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;dispar	8032	8355	7330	7164	2747	1086	2645	3401	11319	4969	10387	6993	13110	3912	9402	6848	7845	6238	12655	7046	11114	5463	14207	5030	13188	9469	11588	7505	6499	6796	7216	6648	7598	8512	14595	8336	11034	11161	5789	6306	5681	8563	5308	5697	7694	14599	7163	18092	7317	6673	7317	15139	1451	6183	11473	8499	8450	5267	6980	3830	6102	2242	3862	3557	4111	4122	7701	7278	6382	7583	7802	5979	12186	6751	12396	7625	32	3420	9080	9540	5180	10425	7771	9382	3825	7158	6443
SP46	Bacteria;Saccharibacteria_(TM7);Saccharibacteria_(TM7)_[C-1];Saccharibacteria_(TM7)_[O-1];Saccharibacteria_(TM7)_[F-1];Saccharibacteria_(TM7)_[G-6];bacterium HMT870	0	881	785	418	132	23	114	97	0	470	465	0	465	51	0	1142	58	818	156	0	2544	0	0	345	0	435	0	1081	52	0	643	235	68	22	0	0	0	1667	107	0	45	73	0	158	0	0	38	0	0	78	581	0	57	54	86	206	57	11	648	0	691	0	460	0	0	124	0	0	11	622	0	286	0	405	433	0	318	377	1412	62	0	0	819	124	0	2386	21
SP47	Bacteria;Firmicutes;Bacilli;Bacillales;Gemellaceae;Gemella;haemolysans	263	1015	4363	1414	3155	2284	2380	3605	576	6204	138	93	283	2723	1631	428	923	2786	1084	2679	1275	3714	1674	1097	442	3335	335	1890	1149	3602	583	1124	264	741	1870	2215	2011	233	7280	2298	2143	151	484	1792	966	459	2231	239	221	1254	1288	2562	674	1534	314	946	366	936	488	1159	847	4014	3264	322	1161	5087	810	880	761	614	2781	119	316	1481	811	2344	1430	111	2707	685	377	2984	608	669	574	963	1795
SP50	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;atypica	11024	4703	2695	2527	535	89	737	348	1491	717	3734	5419	6034	2426	1978	14763	6840	3242	5705	858	9966	3355	2856	5504	5795	1401	4195	2983	3007	1123	3286	1300	10406	2761	3746	4082	3883	8772	2129	2391	1284	3694	2907	2064	1555	6765	2533	5544	5166	3885	2568	1765	412	3825	1823	2282	1558	1311	3275	1479	2851	1096	2316	1049	354	2976	1559	4359	5649	1276	4881	5246	2515	1933	6535	4678	0	7064	1781	1953	2736	1591	1239	1577	2848	2858	2286
SP51	Bacteria;Firmicutes;Bacilli;Bacillales;Gemellaceae;Gemella;sanguinis	221	802	1152	2817	2352	1112	1533	1097	362	1009	394	83	364	274	616	258	1630	822	1500	1100	410	590	994	1136	182	356	893	739	398	855	1513	798	288	1853	79	3032	363	0	400	2578	319	665	345	998	1574	492	164	0	187	281	223	1311	1358	516	1173	320	827	736	367	436	101	562	838	897	405	437	344	1395	166	1037	373	250	2366	640	1995	519	2322	161	1406	1313	955	1697	1623	418	254	703	992
SP53	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT417	7110	14960	137	320	113	220	163	72	0	58	9582	5908	49	0	430	2641	1346	455	222	66	4558	256	88	1222	789	122	4193	367	1079	50	1417	753	4116	499	0	936	0	100	330	0	235	571	1923	195	50	2287	469	233	2475	87	238	7	111	1065	616	1504	612	361	0	44	1032	90	415	689	0	782	425	1367	2937	125	3487	292	788	108	5176	494	0	7244	543	49	445	0	95	53	2679	256	1118
SP74	Bacteria;Firmicutes;Bacilli;Lactobacillales;Carnobacteriaceae;Granulicatella;adiacens	539	1740	776	897	5815	2455	1350	1279	1035	1699	846	832	780	823	1581	569	1982	1367	904	3434	852	1674	567	1934	931	827	803	3486	1172	2525	2374	1776	701	1042	3150	2698	1579	1275	1308	1863	1386	680	768	3135	3071	1223	337	881	344	781	799	1617	1058	650	2536	1266	1779	4303	1606	866	1021	495	1686	1205	575	977	1739	930	512	1471	2166	1976	2110	2329	1173	900	2676	259	1397	1226	867	1336	3104	1832	916	1903	1994
SP84	Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Actinomyces;graevenitzii	0	533	0	948	0	0	0	49	0	0	0	0	0	0	0	0	0	0	25	347	0	467	0	1355	0	0	0	1322	0	0	1097	0	0	0	77	0	0	0	1721	0	571	3060	0	0	49	982	0	0	4849	0	1253	554	38	0	0	1013	0	352	0	0	0	0	252	0	0	0	4098	1192	0	0	626	0	3798	1106	0	0	0	0	1447	0	60	0	257	134	0	0	0
SP86	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;parasanguinis_clade_411	1658	49	1326	1860	361	0	313	284	169	595	1160	672	509	691	743	733	1468	880	2761	1588	1744	1577	1433	2018	1190	964	527	1441	488	3446	848	230	596	296	791	3701	4315	5787	959	3560	327	513	434	926	1372	1506	367	1645	379	258	1303	2105	207	1279	1197	2231	1158	4366	749	442	982	76	1358	206	327	759	1073	449	532	1249	1486	439	776	714	2547	668	43	487	2029	2444	720	1183	2059	1610	232	3001	350
SP87	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;periodonticum	109	2854	1337	1499	1282	1219	2442	1339	546	906	1479	461	1374	67	1412	353	1231	765	1092	178	496	35	517	764	174	300	1489	808	1103	57	3021	1779	202	2991	310	978	613	1482	517	139	796	5943	1369	3853	1523	410	926	293	594	355	1837	0	2906	624	4905	665	628	341	352	49	656	194	745	604	342	367	68	1702	1551	807	226	507	1638	639	714	172	635	293	525	88	1729	717	302	388	1006	545	1399
SP88	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;pasteri	16	566	1904	1029	2926	583	3681	9734	961	884	1056	122	536	826	1373	412	526	1061	1322	756	388	733	1077	1830	185	485	931	1781	1938	432	2301	3785	111	3950	606	247	110	80	591	333	1511	1648	728	2038	2824	1063	792	1506	57	599	404	1459	4327	1243	2982	1405	2308	1965	946	350	542	673	1072	1774	417	1861	266	578	243	790	321	134	721	1357	245	0	2434	33	181	207	2264	2066	358	2209	516	730	659
SP97	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Alloprevotella;sp. HMT308	939	21	187	570	212	0	0	27	633	67	838	760	666	23	419	880	2197	102	594	148	143	407	147	439	114	97	1037	323	335	7	55	0	891	1498	118	417	825	161	52	36	239	289	673	490	146	221	475	407	1176	161	1723	176	403	430	0	1168	322	123	1111	223	1218	8	217	255	21	352	158	391	1133	117	724	209	1814	298	1255	384	0	838	191	158	91	414	82	503	123	1458	351
SPN100	Bacteria;Tenericutes;Mollicutes;Mollicutes_[O-2];Mollicutes_[F-2];Mollicutes_[G-2];bacterium_MOT-188_nov_86.079%	0	61	0	89	158	0	0	149	0	11	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	181	3	0	0	0	105	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	7	0	23	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	0	0	0	0	0	0	0	0
SPN112	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Megasphaera;micronuciformis_nov_100.000%	196	426	120	413	40	0	94	194	456	121	588	494	320	328	146	1307	510	223	860	369	153	1104	354	2163	2017	158	289	646	787	189	366	548	2054	147	53	77	384	2633	183	91	24	400	587	689	598	1128	413	1387	1011	660	717	42	18	648	230	499	147	149	268	318	1259	108	437	315	29	536	566	1165	1182	209	586	233	475	360	225	799	0	534	65	144	185	236	369	77	679	900	417
SPN124	Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;sp. HMT908 nov_100.000%	0	0	0	0	0	0	494	0	0	0	0	0	0	0	0	0	0	0	0	0	0	17	57	0	0	0	0	0	14	0	0	0	0	0	0	11	0	0	0	0	12	0	0	0	0	0	0	0	0	28	0	0	0	0	0	0	0	0	0	0	0	15	0	0	0	34	0	0	0	0	0	0	0	0	42	14	0	0	23	0	0	0	0	0	18	0	0
SPN133	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;rogosae_nov_99.768%	0	0	0	0	172	0	0	0	0	0	0	51	0	0	0	0	124	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	37	0	0	0	0	0	0	0	0	0	0	0	0	105	0	0	107	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	176
SPN144	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;sp. HMT917 nov_99.536%	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	94	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	28	0	0	0	0	0	0	0	0	548	0	88	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
SPN151	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;shahii_nov_99.022%	0	0	0	0	0	4	0	0	0	0	14	60	0	0	0	0	12	0	15	16	200	0	0	0	0	27	25	0	0	17	25	0	30	0	0	57	0	0	0	0	0	0	0	0	0	15	32	0	0	0	0	0	0	33	0	28	29	32	0	0	0	0	89	0	19	0	0	0	0	0	0	0	71	0	27	0	0	0	45	47	0	0	47	0	92	0	91
SPN155	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Leptotrichia;sp. HMT392 nov_99.022%	0	30	13	0	41	0	6	23	0	0	0	0	0	14	12	14	15	0	0	0	0	30	0	11	0	0	2	1	2	0	0	18	10	8	0	2	0	0	0	0	0	9	0	0	73	0	10	0	3	0	0	0	0	0	0	12	0	11	0	0	0	5	19	0	0	30	0	0	8	0	0	0	0	0	6	0	10	0	0	0	9	112	1	0	60	16	41
SPN166	Bacteria;Actinobacteria;Actinomycetia;Micrococcales;Micrococcaceae;Rothia;mucilaginosa_nov_99.515%	1703	2290	0	0	6111	28	1832	0	3314	0	0	1408	0	761	35	0	363	0	691	565	0	0	0	0	1715	0	0	0	4125	3865	0	0	0	0	0	0	638	0	0	1569	0	916	0	494	2506	0	0	0	0	1937	0	1305	0	0	0	1953	0	149	0	0	238	0	32	0	0	0	2006	0	0	0	0	61	1443	0	0	21	0	0	0	0	5282	0	0	0	0	0	0
SPN173	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Tannerellaceae;Tannerella;sp. HMT286 nov_100.000%	10	47	12	0	21	26	11	19	0	50	9	11	6	0	0	0	7	0	6	0	35	15	12	10	3	0	45	0	12	0	16	27	10	39	0	10	0	0	30	19	0	16	17	15	12	6	10	0	64	0	5	3	0	9	6	24	0	11	0	0	0	5	0	6	8	31	4	0	144	19	0	0	11	7	0	14	0	0	19	14	14	11	25	28	59	8	52
SPN176	Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;concisus_nov_99.754%	0	0	0	0	0	14	0	0	0	0	0	0	58	0	191	0	0	0	42	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	0	0	0	15	0	65	0	0	0	0	0	0	0	0	181	0	0	0	0	122	0	0	0	0	0	0	0	0	0	0	0	0	0	275	114	0	0	0	0	0	0	0	0	126	0	0	0	35	0	0	0
SPN18	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Leptotrichiaceae;Pseudoleptotrichia;goodfellowii_nov_94.621%	0	0	0	0	157	0	0	0	0	0	0	0	0	0	0	0	0	852	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
SPN183	Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Aggregatibacter;sp. HMT513 nov_100.000%	0	7	30	0	0	56	42	6	0	0	5	0	6	25	0	0	6	0	0	0	47	0	0	15	0	5	2	0	0	13	0	9	0	5	0	90	0	0	0	0	0	11	0	304	5	13	0	0	15	15	23	0	0	35	0	0	0	0	0	30	7	0	6	4	0	34	7	15	0	0	0	4	0	23	0	0	0	0	0	0	7	0	14	0	232	0	0
SPN194	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas;endodontalis_nov_99.532%	0	12	0	0	0	6	25	0	0	5	0	0	0	0	7	0	0	0	16	46	0	4	0	0	0	0	0	0	0	0	0	0	0	29	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	202	14	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	20	0	0	0	0	0	0	0	15	0	0	14	750	0	0	0	0
SPN216	Bacteria;Firmicutes;Clostridia;Eubacteriales;Ruminococcaceae;Ruminococcaceae_[G-1];bacterium HMT075 nov_94.568%	0	0	0	0	11	0	0	233	0	0	0	0	0	0	0	0	0	104	93	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	109	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	9	0	0	0	0	500	0	16	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	17
SPN27	Bacteria;Spirochaetes;Spirochaetia;Spirochaetales;Treponemataceae;Treponema;vincentii_nov_99.072%	0	57	6	26	13	8	37	0	1	4	6	6	3	0	3	0	0	0	6	0	60	23	0	4	0	6	30	3	23	0	41	0	0	30	0	6	0	0	0	0	3	94	6	0	0	4	0	16	0	0	0	0	6	0	127	14	0	27	0	18	3	8	0	9	0	5	0	0	10	0	0	28	7	0	0	0	0	0	0	15	23	58	85	21	0	0	12
SPN36	Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;sp. HMT863 nov_99.532%	0	7	6	0	51	14	0	0	0	23	4	0	9	0	18	0	4	0	0	14	8	7	7	16	0	14	0	0	16	17	18	0	15	0	0	11	0	0	0	0	0	0	3	6	33	0	20	0	8	0	4	0	0	39	0	21	35	0	0	53	3	0	6	22	0	37	6	0	6	19	36	0	0	12	0	15	69	0	0	4	7	57	23	0	72	20	41
SPN47	Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus;acetotolerans_nov_100.000%	0	31	28	22	13	32	46	19	14	23	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	39	40	58	0	43	51	8	0	16	0	0	0	32	23	50	27	16	26	37	10	16	19	0	0	0	36	5	0	0	0	0	0	0	0	62	0	0	0	0	0	0	0	0	0	0	57	49
SPN57	Bacteria;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Capnocytophaga;granulosa_nov_92.037%	0	13	44	0	14	0	0	37	0	0	0	0	0	0	33	0	0	20	11	0	0	0	0	242	3	0	0	110	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	26	0	0	0	0	45	0	0	7	21	0	96	0	0	0	0	21	0	0	0	0	88	0	0	0	0	0	0	0	24	0	0	6	0	0	0	28	5	0	0	10	0	21
SPN67	Bacteria;Actinobacteria;Actinomycetia;Bifidobacteriales;Bifidobacteriaceae;Scardovia;wiggsiae_nov_100.000%	13	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	7	0	3	0	65	0	0	21	0	4	0	0	0	0	0	0	0	0	8	27	0	0	9	0	0	0	13	2	80	0	2	0	0	0	5	0	8	0	58	0	0	0	0	0	0	3	5	0	7	17	10	0	21	8	7	17	10	8	55	4	0	21	300	0	0	17	0	0	19	28
SPN78	Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;sp. HMT396 nov_98.357%	0	34	0	0	0	10	22	45	32	0	0	12	9	0	66	0	0	0	0	0	0	0	0	27	0	0	55	15	11	0	11	18	0	61	14	0	0	0	0	0	12	33	0	6	9	0	0	0	6	0	0	8	7	0	46	12	6	6	0	30	9	18	0	20	0	0	0	15	18	0	0	22	12	22	0	0	17	0	0	0	26	11	0	39	0	5	14
SPN88	Bacteria;Cyanobacteria;Oscillatoriophycideae;Oscillatoriales;Microcoleaceae;Arthrospira;platensis_nov_86.199%	0	0	0	22	17	0	0	20	0	157	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	3	0	0	92	51	479	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	21	8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
SPN9	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Dialister;invisus_nov_100.000%	9	9	17	0	4	14	14	3	0	32	12	0	0	0	0	0	0	6	4	50	0	23	0	14	16	0	0	5	0	6	0	7	7	3	0	27	0	0	17	0	0	14	0	25	0	104	15	0	10	0	16	5	9	12	5	6	24	5	0	21	0	0	5	0	0	90	11	7	0	3	13	0	14	4	0	0	0	0	60	4	8	0	73	23	18	44	83
SPN91	Bacteria;Actinobacteria;Actinomycetia;Actinomycetales;Actinomycetaceae;Schaalia;sp. HMT172 nov_99.764%	121	4359	733	7348	894	436	773	608	5341	959	689	1209	1409	247	659	652	1387	741	849	409	454	1055	1217	1558	4723	365	6727	2228	749	349	3003	1550	236	4323	4460	1067	669	1526	1595	1052	601	3988	2771	729	1340	432	108	1610	471	577	638	1171	868	315	2054	1675	1254	375	1143	1298	1414	701	384	686	150	1296	246	525	3032	535	1988	1480	163	2128	205	656	496	843	134	808	722	492	513	1441	3229	2735	3168
SPP12	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp12_18	0	56	0	0	0	87	374	254	240	1187	80	106	78	0	0	0	0	1614	49	161	0	503	1692	0	73	0	94	0	1681	0	479	659	56	183	8154	0	0	0	9892	0	0	109	0	0	0	0	0	0	0	0	0	285	58	155	0	15	383	0	103	332	59	1455	0	154	1117	48	261	0	0	0	0	226	0	0	30	0	532	0	1104	0	280	468	0	0	0	42	1036
SPP20	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp20_2	1740	5896	18460	4976	4313	828	3026	1849	3898	6102	1945	951	1662	2134	8581	5037	8200	6239	10285	3716	16947	4814	6500	6771	5118	4986	2745	4889	2454	13525	6148	1113	2544	2356	1262	10109	16678	15326	3216	13286	820	755	1844	1260	3071	3092	2870	3528	553	760	4279	5853	1234	6062	3513	8474	2563	7485	4095	502	7131	642	1438	5540	737	3543	6195	6806	2493	6497	11297	1243	8135	7609	7602	15011	7320	2891	8872	11724	1628	12868	2419	1866	3741	4954	7673
SPP46	Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;multispecies_spp46_4	138	997	661	251	122	6790	16298	2019	290	2925	112	135	155	10581	255	768	1237	716	334	3680	1999	1327	4202	1181	2067	3491	45	1043	2510	1590	2887	0	30	34	222	1046	36	42	217	3223	350	6	467	4351	56	481	2879	503	0	4608	2231	648	101	0	41	0	816	415	28	318	66	64	528	0	8621	2990	1978	665	150	316	734	74	1009	1678	1350	239	885	0	1985	156	269	3181	304	941	2152	1237	663
SPP48	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp48_3	1601	297	529	587	145	0	80	241	89	170	158	89	88	97	212	778	325	848	841	166	169	218	341	1187	332	0	71	264	389	1137	1082	63	509	354	89	1349	526	1010	495	421	96	130	110	330	204	309	46	253	141	221	185	428	183	304	174	319	238	106	101	259	360	184	673	1145	64	116	136	230	458	344	243	236	972	470	178	750	297	86	204	869	164	614	267	206	588	875	458
SPP52	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp52_2	221	222	230	179	171	307	206	458	88	216	229	150	100	473	278	240	283	931	681	65	196	138	147	301	374	1306	180	203	339	544	275	418	207	203	112	414	373	43	192	545	294	228	238	434	749	400	288	89	166	213	210	391	148	641	104	245	581	120	220	431	289	541	388	267	269	614	198	114	108	353	210	204	226	137	171	131	176	115	233	441	98	533	745	526	184	391	272
SPP60	Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;multispecies_spp60_17	10436	6556	9503	9072	13073	18169	10108	13089	3206	15411	2537	3508	2537	18791	8477	2571	4095	16436	6778	13947	10147	17881	7090	5981	8197	14840	2632	8706	10502	16295	6628	12151	2789	6666	8460	7872	7443	3183	11608	11671	15340	3260	4189	17295	12352	4798	10197	4823	4229	7175	6123	11733	4410	9092	4287	3954	6114	14483	3856	11068	4541	19221	14445	4367	2440	19533	2790	4824	4373	3816	10768	2489	4264	7589	2309	6391	5456	1693	7519	8006	2186	13727	7503	6524	6967	5990	10675
SPP8	Bacteria;Firmicutes;Negativicutes;Veillonellales;Veillonellaceae;Veillonella;multispecies_spp8_3	7	1021	1965	771	727	1319	3448	2844	99	1829	226	44	917	39	472	0	447	2	33	20	15	10	87	368	1276	76	643	303	392	13	4545	902	176	661	135	50	0	0	379	2	360	63	268	451	1	123	73	69	0	17	1	0	1525	72	495	41	1	2405	0	524	76	450	60	720	72	311	41	504	253	4	1	196	0	6	4	355	2975	0	49	3	266	1072	206	372	267	69	712
SPPN1	Bacteria;Fusobacteria;Fusobacteriia;Fusobacteriales;Fusobacteriaceae;Fusobacterium;multispecies_sppn1_2_nov_97.555%	0	0	0	0	0	0	0	0	0	0	0	0	0	12	0	0	0	214	13	0	0	0	0	0	0	130	0	15	18	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	16	0	0	9	0	0	0	0	0	0	0	0	61	0	22	0	0	0	0	0	0	0	0	0	0	179	0	0	0	0	0	311	27	10
SPPN14	Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;multispecies_sppn14_2_nov_99.304%	0	12	75	5	0	0	0	21	0	178	0	0	0	21	0	0	0	0	0	0	34	60	0	0	0	6	0	0	0	72	0	0	0	0	0	24	0	0	0	285	81	0	0	14	0	0	0	0	0	0	6	0	4	23	0	0	0	0	0	23	0	0	0	0	50	0	0	0	0	0	0	9	0	13	42	114	0	0	18	0	0	0	8	0	18	0	19
SPPN21	Bacteria;Proteobacteria;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;multispecies_sppn21_2_nov_100.000%	1	23	0	0	18	74	38	3	0	26	0	0	0	0	0	7	11	1	9	37	0	0	2	14	0	0	5	0	0	3	41	9	10	9	0	74	0	0	16	12	6	0	2	16	0	4	0	0	6	0	0	0	0	12	5	6	0	0	0	4	0	22	12	0	5	12	3	8	0	0	14	0	0	0	5	0	7	0	24	12	79	0	18	0	21	0	35
SPPN35	Bacteria;Firmicutes;Negativicutes;Selenomonadales;Selenomonadaceae;Selenomonas;multispecies_sppn35_3_nov_100.000%	0	0	5	0	20	0	18	23	0	36	0	0	0	52	0	0	0	0	0	0	82	23	3	74	0	0	0	0	0	0	7	13	9	0	0	17	0	3	0	0	0	20	11	0	19	3	49	0	0	0	0	0	0	15	0	22	0	0	0	39	0	3	18	0	0	51	0	0	0	0	8	0	5	0	0	0	5	0	20	5	0	48	256	9	97	15	3
SPPN5	Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;multispecies_sppn5_4_nov_99.536%	0	0	1425	0	0	892	81	7065	0	1779	22	123	39	175	359	113	510	1288	790	265	105	5638	0	0	62	41	94	677	927	143	34	1174	422	46	0	574	0	2642	0	589	296	463	63	505	0	70	3854	0	67	0	0	2666	154	1169	16	1130	55	0	0	3598	804	0	15	159	0	0	291	302	0	34	283	0	0	0	226	118	90	0	858	836	66	30	331	892	221	386	432

Download OTU Tables at Different Taxonomy Levels
Phylum	Count*:	Relative**:	CLR***:
Class	Count*:	Relative**:	CLR***:
Order	Count*:	Relative**:	CLR***:
Family	Count*:	Relative**:	CLR***:
Genus	Count*:	Relative**:	CLR***:
Species	Count*:	Relative**:	CLR***:
* Read count
** Relative abundance (count/total sample count)
*** Centered log ratio transformed abundance
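
As a rough illustration of how the three table types relate, the sketch below derives relative abundance and CLR values from a handful of read counts. The four counts are the first-sample values of rows SP51, SP53, SP74 and SP84 above, used only as a small subset (so the proportions are not the real sample proportions). The pseudocount of 0.5 added before taking logarithms is an assumption made here to handle zero counts; the report does not state how zeros are treated.

```python
import math

def relative_abundance(counts):
    """Relative abundance: each count divided by the total count of the sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]

def clr(counts, pseudocount=0.5):
    """Centered log ratio: log of each value minus the mean log of the sample.

    The pseudocount is a hypothetical choice to avoid log(0);
    it is not taken from the report.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Illustrative subset: first-sample counts of SP51, SP53, SP74, SP84
sample = [221, 7110, 539, 0]
rel = relative_abundance(sample)
clr_values = clr(sample)
```

Relative abundances sum to 1 within a sample, while CLR values sum to 0, which is why the two cannot be used interchangeably in downstream distance calculations.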

The species listed in the table have their full taxonomy and a dynamically assigned species ID specific to this report. When some reads match the reference sequences of more than one species equally well (i.e., with the same percent identity and alignment coverage), they cannot be assigned to a particular species. Instead, they are assigned to multiple species with the species notation "s__multispecies_spp2_2". In this notation, "spp2" is the dynamic ID assigned to the reads that hit multiple sequences, and the "_2" at the end means that spp2 contains two species.

You can look up which species are included in each multi-species assignment in the table below:

Another type of notation is "s__multispecies_sppn2_2", in which the "n" in sppn2 marks a potential novel species, because all the reads in this species have < 98% identity to any of the reference sequences. These reads were grouped together by de novo OTU clustering at a 98% identity cutoff, and a representative sequence was then chosen for a BLASTN search against the reference database to find the closest match (which will still be < 98%). This representative sequence also matched more than one species equally well, hence the "spp" in the label.
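
The notation described above can be unpacked programmatically. The following is a minimal sketch, not part of the report's pipeline: the function name `parse_species_label` and the returned field names are hypothetical, and the trailing "nov_NN.NNN%" suffix seen in the table is assumed here to be the percent identity of the closest reference match.

```python
import re

# multispecies_spp12_18 or multispecies_sppn1_2_nov_97.555% (optional "s__" prefix)
MULTI = re.compile(r"^(?:s__)?multispecies_(spp(n?)\d+)_(\d+)(?:_nov_([\d.]+)%)?$")
# Single-species labels flagged as novel, e.g. "sp. HMT908 nov_100.000%"
NOVEL = re.compile(r"[ _]nov_([\d.]+)%$")

def parse_species_label(label):
    """Split a species label into its parts (field names are hypothetical)."""
    m = MULTI.match(label)
    if m:
        return {
            "multispecies": True,
            "dynamic_id": m.group(1),          # e.g. "spp12" or "sppn1"
            "n_species": int(m.group(3)),      # number of equally matching species
            "novel": m.group(2) == "n",        # "n" marks a potential novel species
            "identity": float(m.group(4)) if m.group(4) else None,
        }
    n = NOVEL.search(label)
    return {
        "multispecies": False,
        "dynamic_id": None,
        "n_species": 1,
        "novel": n is not None,
        "identity": float(n.group(1)) if n else None,  # assumed percent identity
    }
```

For example, `parse_species_label("multispecies_spp60_17")` reports the dynamic ID "spp60" with 17 equally matching species.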

Taxonomy Bar Plots for All Samples

Taxonomy Bar Plots for Individual Comparison Groups

Comparison No.	Comparison Name	Families		Genera		Species
Comparison 1	HighCA vs LowCA vs MediumCA	PDF	SVG	PDF	SVG	PDF	SVG

VIII. Analysis - Alpha Diversity

In ecology, alpha diversity (α-diversity) is the mean species diversity in sites or habitats at a local scale. The term was introduced by R. H. Whittaker[5][6] together with the terms beta diversity (β-diversity) and gamma diversity (γ-diversity). Whittaker's idea was that the total species diversity in a landscape (gamma diversity) is determined by two different things, the mean species diversity in sites or habitats at a more local scale (alpha diversity) and the differentiation among those habitats (beta diversity).

References:

Alpha Diversity Analysis by Rarefaction

Diversity measures are affected by the sampling depth. Rarefaction is a technique to assess species richness from the results of sampling. Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. This curve is a plot of the number of species as a function of the number of samples. Rarefaction curves generally grow rapidly at first, as the most common species are found, but the curves plateau as only the rarest species remain to be sampled [7].

References:

Willis AD. Rarefaction, Alpha Diversity, and Statistics. Front Microbiol. 2019 Oct 23;10:2407. doi: 10.3389/fmicb.2019.02407. PMID: 31708888; PMCID: PMC6819366.
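
To make the idea of a rarefaction curve concrete, the sketch below computes the expected number of species observed at several subsampling depths. It treats each read as one sampled individual and uses the analytical (hypergeometric) expectation rather than repeated random subsampling; both choices are assumptions of this example, and the counts are hypothetical.

```python
from math import comb

def rarefied_richness(counts, depth):
    """Expected number of species seen when `depth` reads are drawn without replacement."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = comb(total, depth)
    expected = 0.0
    for c in counts:
        if c > 0:
            # Probability that this species is missed entirely in the subsample
            p_missed = comb(total - c, depth) / all_draws
            expected += 1.0 - p_missed
    return expected

counts = [50, 30, 15, 4, 1]  # hypothetical read counts for five species
curve = [(d, rarefied_richness(counts, d)) for d in (1, 5, 10, 25, 50, 100)]
```

The values in `curve` rise quickly at low depth, where the abundant species are picked up, and approach the full richness of five only as the rarest species is finally sampled.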

Boxplot of Alpha-diversity Indices

The two main factors taken into account when measuring diversity are richness and evenness. Richness is a measure of the number of different kinds of organisms present in a particular area. Evenness compares the similarity of the population size of each of the species present. There are many different ways to measure richness and evenness. These measurements are called "estimators" or "indices". Below are three commonly used indices, showing the values for all the samples (dots) and for the groups (boxes).
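
As a small worked example of richness and evenness, the sketch below computes three common indices from a vector of read counts. The exact formulas used for the plots are not stated in this report, so two assumptions are made here: the Shannon index uses the natural logarithm, and the Simpson index is taken as 1 minus the sum of squared proportions.

```python
import math

def observed_features(counts):
    """Richness: number of species with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index with natural log (assumed base); reflects richness and evenness."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson index as 1 - sum(p_i^2) (assumed form); higher means more even."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

even = [25, 25, 25, 25]    # hypothetical: four equally abundant species
uneven = [97, 1, 1, 1]     # hypothetical: same richness, one dominant species
```

Both hypothetical communities have the same richness (four species), but the even community scores higher on the Shannon and Simpson indices because no single species dominates.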

Printed on each graph is the p-value for the statistical significance of the difference between the groups. The significance is calculated using either the Kruskal-Wallis test or the Wilcoxon rank sum test. Both are non-parametric methods (since microbiome read count data are considered non-normally distributed) for testing whether samples originate from the same distribution (i.e., no difference between groups). The Kruskal-Wallis test is used to compare three or more independent groups to determine if there are statistically significant differences between their medians. The Wilcoxon rank sum test, also known as the Mann-Whitney U test, is used to compare two independent groups to determine if there is a significant difference between their distributions.
The p-value is shown at the top of each graph. A p-value < 0.05 is considered statistically significant between/among the test groups.
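
The rank-based logic of the Kruskal-Wallis test can be sketched in a few lines. This is an illustration only, not the code used to produce the plots: the group values are hypothetical Shannon indices, and the p-value helper relies on the chi-square approximation with 2 degrees of freedom, so it applies only to comparisons of exactly three groups.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction

def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)

# Hypothetical Shannon values for three groups
high = [2.1, 2.4, 2.2, 2.6]
low = [3.0, 3.3, 3.1, 3.4]
medium = [2.7, 2.8, 2.9, 2.5]
h_stat = kruskal_wallis_h([high, low, medium])
p_val = p_value_three_groups(h_stat)
```

Because only the ranks enter the statistic, replacing every value by any monotone transformation of itself (for example its logarithm) leaves H unchanged.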

Alpha Diversity Box Plots for All Groups

Alpha Diversity Box Plots for Individual Comparisons at Species level

Comparison 1

HighCA vs LowCA vs MediumCA

View in PDF

View in SVG

The above comparisons are at the species-level. Comparisons of other taxonomy levels, from phylum to genus, are also available:

Group Significance Evaluation of Alpha-diversity Indices with QIIME2

The above comparisons and significance tests were done in the R environment. For comparison (and also because it was included in the pipeline early on), we also use the Kruskal-Wallis H test provided by the "alpha-group-significance" function in the QIIME 2 "diversity" package. As mentioned above, the Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA. Non-parametric means that the test doesn’t assume your data come from a particular distribution. The H test is used when the assumptions for ANOVA (such as normality) aren’t met. It is sometimes called the one-way ANOVA on ranks, as the ranks of the data values are used in the test rather than the actual data points. The H test determines whether the medians of two or more groups are different.

Below are the Kruskal-Wallis H test results for each comparison based on three different alpha diversity measures: 1) Observed species (features), 2) Shannon index, and 3) Simpson index.

Comparison 1.

HighCA vs LowCA vs MediumCA

Observed Features

Shannon Index

Simpson Index

IX. Analysis - Beta Diversity

NMDS and PCoA Plots

Beta diversity compares the similarity (or dissimilarity) of microbial profiles between different groups of samples. There are many different similarity/dissimilarity metrics [8]. In general, they can be quantitative (using sequence abundance, e.g., Bray-Curtis or weighted UniFrac) or binary (considering only presence-absence of sequences, e.g., binary Jaccard or unweighted UniFrac). They can also be based on phylogeny (e.g., UniFrac metrics) or not (non-UniFrac metrics, such as Bray-Curtis, etc.).

For microbiome studies, species profiles of samples can be compared with the Bray-Curtis dissimilarity, which is based on the count data type. The pair-wise Bray-Curtis dissimilarity matrix of all samples can then be subject to either multi-dimensional scaling (MDS, also known as PCoA) or non-metric MDS (NMDS).
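
The Bray-Curtis dissimilarity itself is simple to compute: the sum of absolute count differences divided by the total count of both samples. The sketch below shows this for a few hypothetical samples; the helper names and the data are illustrative and not taken from the report's pipeline.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical, 1 = no shared species)."""
    if len(u) != len(v):
        raise ValueError("samples must list the same species in the same order")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total

def pairwise_bray_curtis(samples):
    """Dissimilarity for every unordered pair of samples in a {name: counts} dict."""
    names = list(samples)
    return {
        (a, b): bray_curtis(samples[a], samples[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical species counts for three samples
samples = {
    "S1": [10, 0, 5],
    "S2": [5, 5, 5],
    "S3": [0, 20, 0],
}
matrix = pairwise_bray_curtis(samples)
```

The resulting pairwise values are what an ordination method such as PCoA or NMDS takes as input.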

MDS/PCoA is a scaling or ordination method that starts with a matrix of similarities or dissimilarities between a set of samples and aims to produce a low-dimensional graphical plot of the data in such a way that distances between points in the plot are close to original dissimilarities.

NMDS is similar to MDS; however, it does not use the dissimilarity values directly. Instead, it converts them into ranks and uses these ranks in the calculation.

In our beta diversity analysis, the Bray-Curtis dissimilarity matrix was first calculated and then plotted by PCoA and NMDS separately. Below are the beta diversity results for all groups together:

References:

Plantinga, AM, Wu, MC (2021). Beta Diversity and Distance-Based Analysis of Microbiome Data. In: Datta, S., Guha, S. (eds) Statistical Analysis of Microbiome Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-73351-3_5

NMDS and PCoA Plots for All Groups

The above PCoA and NMDS plots are based on count data. The count data can also be transformed into centered log ratio (CLR) for each species. The CLR data is no longer count data and cannot be used in Bray-Curtis dissimilarity calculation. Instead CLR can be compared with Euclidean distances. When CLR data are compared by Euclidean distance, the distance is also called Aitchison distance.
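
A minimal sketch of the Aitchison distance is given below: each sample is CLR-transformed and the Euclidean distance is taken between the transformed vectors. The pseudocount of 0.5 is an assumption of this example (needed because the logarithm of a zero count is undefined); the report does not state which zero-handling strategy is used.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log ratio transform; pseudocount is a hypothetical zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(u, v, pseudocount=0.5):
    """Euclidean distance between the CLR-transformed versions of two samples."""
    cu, cv = clr(u, pseudocount), clr(v, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cu, cv)))

# Hypothetical samples: b has the same composition as a at 10x the depth
a = [10, 20, 40]
b = [100, 200, 400]
c = [40, 20, 10]
# pseudocount=0.0 is only valid here because none of the counts are zero
same_composition = aitchison_distance(a, b, pseudocount=0.0)
different_composition = aitchison_distance(a, c, pseudocount=0.0)
```

With no pseudocount, two samples that differ only in sequencing depth have a distance of essentially zero, which illustrates why this distance depends on composition rather than on total read count.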

Below are the NMDS and PCoA plots of the Aitchison distances of the samples:

NMDS and PCoA Plots for Individual Comparisons at Species level

Comparison No.	Comparison Name	NMDS				PCoA
Comparison No.	Comparison Name	Bray-Curtis		CLR Euclidean		Bray-Curtis		CLR Euclidean
Comparison 1	HighCA vs LowCA vs MediumCA	PDF	SVG	PDF	SVG	PDF	SVG	PDF	SVG

Interactive 3D PCoA Plots - Bray-Curtis Dissimilarity

Interactive 3D PCoA Plots - Euclidean Distance

Interactive 3D PCoA Plots - Correlation Coefficients

Group Significance of Beta-diversity Indices

To test whether the between-group dissimilarities are significantly greater than the within-group dissimilarities, the "beta-group-significance" function provided in the QIIME 2 "diversity" package was used with PERMANOVA (permutational multivariate analysis of variance) as the group significance testing method.

Three beta diversity metrics were used: 1) Bray–Curtis dissimilarity, 2) correlation coefficient matrix, and 3) Aitchison distance (Euclidean distance between CLR-transformed compositions).
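
The idea behind PERMANOVA can be sketched as follows: a pseudo-F statistic compares between-group and within-group sums of squared distances, and its significance is judged by recomputing the statistic after shuffling the group labels. This is a simplified illustration, not the QIIME 2 implementation; the function names, the toy one-dimensional data, the seed and the number of permutations are all assumptions of this example.

```python
import random

def pseudo_f(dist, labels):
    """Pseudo-F from a square distance matrix and one group label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += ss / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy data: eight samples on a line, forming two well-separated groups
positions = [0, 1, 2, 3, 10, 11, 12, 13]
labels = ["A"] * 4 + ["B"] * 4
dist = [[abs(p - q) for q in positions] for p in positions]
f_obs, p_val = permanova(dist, labels)
```

Any distance matrix can be passed in, so the same procedure applies whether the distances are Bray-Curtis, correlation-based or Aitchison.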

Comparison 1.

HighCA vs LowCA vs MediumCA

Bray–Curtis

Correlation

Aitchison

X. Analysis - Differential Abundance

16S rRNA next generation sequencing (NGS) generates a fixed number of reads that reflect the proportion of different species in a sample, i.e., the relative abundance of species, instead of the absolute abundance. In mathematics, measurements involving probabilities, proportions, percentages, and ppm can all be thought of as compositional data. This makes the microbiome read count data “compositional” (Gloor et al., 2017). In general, compositional data represent parts of a whole which only carry relative information [9].

The compositional nature of microbiome data becomes a problem when comparing two groups of samples to identify “differentially abundant” species. Even if a species has the same absolute abundance in two conditions, its relative abundance (e.g., percent abundance) can differ between them if the abundance of other species changes greatly. This can lead to incorrect conclusions about the differential abundance of microbial species in the samples.
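
A small numerical example makes this effect visible. In the hypothetical data below, species A and B have identical absolute abundance in both conditions and only species C grows, yet the relative abundance of A and B drops fourfold.

```python
def relative(counts):
    """Convert absolute abundances to proportions of the sample total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}

# Hypothetical absolute abundances; only species_C differs between conditions
condition_1 = {"species_A": 100, "species_B": 100, "species_C": 100}
condition_2 = {"species_A": 100, "species_B": 100, "species_C": 1000}

rel_1 = relative(condition_1)  # species_A is 1/3 of the community
rel_2 = relative(condition_2)  # species_A is 1/12 of the community
fold_change_a = rel_2["species_A"] / rel_1["species_A"]
```

A naive comparison of percentages would report species A as strongly depleted in the second condition, even though its absolute abundance never changed.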

When studying differential abundance (DA), the currently preferred approach is to transform the read count data into log-ratio data. The ratios are calculated between the read counts of all species in a sample and a “reference” count (e.g., the mean read count of the sample). The log-ratio data allow the detection of DA species without being affected by the percentage bias mentioned above.

In this report, a compositional DA analysis tool “ANCOM” (analysis of composition of microbiomes) was used [10]. ANCOM transforms the count data into log-ratios and thus is more suitable for comparing the composition of microbiomes in two or more populations. "ANCOM" generates a table of features with W-statistics and whether the null hypothesis is rejected. The “W” is the W-statistic, or the number of features that a single feature is tested to be significantly different against. Hence, the higher the "W", the stronger the statistical evidence that a feature/species is differentially abundant.

References:

Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134.
Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.

ANCOM Differential Abundance Analysis

ANCOM Results for Individual Comparisons

Comparison No.	Comparison Name
Comparison 1.	HighCA vs LowCA vs MediumCA

ANCOM-BC2 Differential Abundance Analysis

Starting with version V1.2, we include the results of ANCOM-BC (Analysis of Compositions of Microbiomes with Bias Correction) (Lin and Peddada 2020) [11]. ANCOM-BC is an updated version of "ANCOM" that:
(a) provides a statistically valid test with appropriate p-values,
(b) provides confidence intervals for differential abundance of each taxon,
(c) controls the False Discovery Rate (FDR),
(d) maintains adequate power, and
(e) is computationally simple to implement.

The bias correction (BC) addresses a challenging problem of the bias introduced by differences in the sampling fractions across samples. This bias has been a major hurdle in performing DA analysis of microbiome data. ANCOM-BC estimates the unknown sampling fractions and corrects the bias induced by their differences among samples. The absolute abundance data are modeled using a linear regression framework.

Starting with version V1.43, ANCOM-BC2 is used instead of ANCOM-BC, so that multiple pairwise directional tests can be performed (if there are more than two groups in a comparison). When performing pairwise directional tests, the mixed directional false discovery rate (mdFDR) is taken into account. The mdFDR is the combination of the false discovery rate due to multiple testing, multiple pairwise comparisons, and directional tests within each pairwise comparison. The mdFDR is adopted from Guo, Sarkar, and Peddada 2010 [12] and Grandhi, Guo, and Peddada 2016 [13]. For a more detailed explanation and additional features of ANCOM-BC2, please see the authors' documentation.

References:

Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.
Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010 Jun;66(2):485-92. doi: 10.1111/j.1541-0420.2009.01292.x. Epub 2009 Jul 23. PMID: 19645703; PMCID: PMC2895927.
Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics. 2016 Feb 25;17:104. doi: 10.1186/s12859-016-0937-5. PMID: 26917217; PMCID: PMC4768411.

ANCOM-BC Results for Individual Comparisons

Comparison No.	Comparison Name
Comparison 1.	HighCA vs LowCA vs MediumCA

LEfSe - Linear Discriminant Analysis Effect Size

LEfSe (Linear Discriminant Analysis Effect Size) is an alternative method to find "organisms, genes, or pathways that consistently explain the differences between two or more microbial communities" (Segata et al., 2011) [14]. Specifically, LEfSe uses the rank-based Kruskal-Wallis (KW) sum-rank test to detect features with significant differential (relative) abundance with respect to the class of interest. Since the test is rank-based rather than proportion-based, the differential species identified among the comparison groups are less biased (than those based on percent abundance).

Reference:

Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011 Jun 24;12(6):R60. doi: 10.1186/gb-2011-12-6-r60. PMID: 21702898; PMCID: PMC3218848.

HighCA vs LowCA vs MediumCA

XI. Analysis - Heatmap Profile

Species vs Sample Abundance Heatmap for All Samples

Heatmaps for Individual Comparisons

A) Two-way clustering - clustered on both columns (Samples) and rows (organism)

Comparison No.	Comparison Name	Family Level		Genus Level		Species Level
Comparison 1	HighCA vs LowCA vs MediumCA	PDF	SVG	PDF	SVG	PDF	SVG

B) One-way clustering - clustered on rows (organism) only

Comparison No.	Comparison Name	Family Level		Genus Level		Species Level
Comparison 1	HighCA vs LowCA vs MediumCA	PDF	SVG	PDF	SVG	PDF	SVG

C) No clustering

Comparison No.	Comparison Name	Family Level		Genus Level		Species Level
Comparison 1	HighCA vs LowCA vs MediumCA	PDF	SVG	PDF	SVG	PDF	SVG

XII. Analysis - Network Association

To analyze the co-occurrence or co-exclusion of microbial species among different samples, network correlation analysis tools are usually used. However, microbiome count data are compositional. If count data are normalized to the total number of counts in the sample, the data are no longer independent, and traditional statistical metrics (e.g., correlation) for the detection of species-species relationships can lead to spurious results. In addition, sequencing-based studies typically measure hundreds of OTUs (species) on few samples; thus, inference of OTU-OTU association networks is severely under-powered. Here we use SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues (Kurtz et al., 2015) [15]. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. SPIEC-EASI provides two algorithms for network inference: 1) Meinshausen-Bühlmann's neighborhood selection (MB method) and 2) inverse covariance selection (GLASSO method, i.e., graphical least absolute shrinkage and selection operator). This is fundamentally distinct from SparCC, which essentially estimates pairwise correlations. In addition to these two methods, we provide the results of a third method, SparCC (Sparse Correlations for Compositional Data) (Friedman & Alm 2012) [16], which is also a method for inferring correlations from compositional data. SparCC estimates the linear Pearson correlations between the log-transformed components.

References:

Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015 May 7;11(5):e1004226. doi: 10.1371/journal.pcbi.1004226. PMID: 25950956; PMCID: PMC4423992.
Friedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012;8(9):e1002687. doi: 10.1371/journal.pcbi.1002687. Epub 2012 Sep 20. PMID: 23028285; PMCID: PMC3447976.
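
The spurious-correlation problem that motivates these methods can be reproduced with a few hypothetical numbers. In the sketch below, two taxa are uncorrelated in absolute counts, but after each sample is normalized to its total they appear strongly correlated, purely because a third, highly variable taxon drives the denominator. This is only a demonstration of the problem; it does not implement SPIEC-EASI or SparCC.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical absolute counts across four samples
taxon_a = [10, 12, 10, 12]
taxon_b = [20, 20, 24, 24]          # varies independently of taxon_a
taxon_c = [100, 500, 1000, 5000]    # dominant, highly variable taxon

# Normalize each sample to its total count (relative abundance)
totals = [a + b + c for a, b, c in zip(taxon_a, taxon_b, taxon_c)]
rel_a = [a / t for a, t in zip(taxon_a, totals)]
rel_b = [b / t for b, t in zip(taxon_b, totals)]

r_absolute = pearson(taxon_a, taxon_b)   # no association in absolute counts
r_relative = pearson(rel_a, rel_b)       # strong apparent association after normalization
```

The apparent association between the two taxa is an artifact of the shared denominator, which is the kind of false edge that compositionally aware network methods aim to avoid.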

SPIEC-EASI Network Inference by Neighborhood Selection (MB Method)

Association Network Inference by SparCC

XIII. Disclaimer

The results of this analysis are for research purposes only. They are not intended to diagnose, treat, cure, or prevent any disease. Forsyth and FOMC are not responsible for the use of information provided in this report outside the research area.